This edition is dedicated to


Professor John Butcher
on the occasion of his 60th birthday


His unforgettable lectures on Runge-Kutta methods, given in June 
1970 at the University of Innsbruck, introduced us to this subject 
which, since then, we have never ceased to love and to develop with all 
our humble abilities. 



From the Preface to the First Edition 


So far as I remember, I have never seen an Author’s Preface 
which had any purpose but one — to furnish reasons for the 
publication of the Book. (Mark Twain) 

Gauss’ dictum, “when a building is completed no one should
be able to see any trace of the scaffolding,” is often used by
mathematicians as an excuse for neglecting the motivation
behind their own work and the history of their field. Fortunately,
the opposite sentiment is gaining strength, and numerous
asides in this Essay show to which side go my sympathies.
(B.B. Mandelbrot 1982)

This gives us a good occasion to work out most of the book
until the next year.
(the Authors in a letter, dated Oct. 29, 1980, to Springer-Verlag)

There are two volumes, one on non-stiff equations, ..., the second
on stiff equations, .... The first volume has three chapters, one on
classical mathematical theory, one on Runge-Kutta and extrapolation
methods, and one on multistep methods. There is an Appendix containing
some Fortran codes which we have written for our numerical
examples.

Each chapter is divided into sections. Numbers of formulas, theorems,
tables and figures are consecutive in each section and indicate,
in addition, the section number, but not the chapter number. Cross
references to other chapters are rare and are stated explicitly. ...
References to the Bibliography are by “Author” plus “year” in parentheses.
The Bibliography makes no attempt at being complete; we have listed
mainly the papers which are discussed in the text.

Finally, we want to thank all those who have helped and encouraged
us to prepare this book. The marvellous “Minisymposium”
which G. Dahlquist organized in Stockholm in 1979 gave us the first
impulse for writing this book. J. Steinig and Chr. Lubich have read the
whole manuscript very carefully and have made extremely valuable
mathematical and linguistical suggestions. We also thank J.P. Eckmann
for his troff software with the help of which the whole manuscript
has been printed. For preliminary versions we had used textprocessing
programs written by R. Menk. Thanks also to the staff of the
Geneva computing center for their help. All computer plots have been
done on their beautiful HP plotter. Last but not least, we would like
to acknowledge the agreeable collaboration with the planning and
production group of Springer-Verlag.


October 29, 1986


The Authors 





Preface to the Second Edition 

The preparation of the second edition has presented a welcome opportunity
to improve the first edition by rewriting many sections and by
eliminating errors and misprints. In particular we have included new
material on

- Hamiltonian systems (I.14) and symplectic Runge-Kutta methods
(II.16);

- dense output for Runge-Kutta (II.6) and extrapolation methods
(II.9);

- a new Dormand & Prince method of order 8 with dense output
(II.5);

- parallel Runge-Kutta methods (II.11);

- numerical tests for first- and second order systems (II.10 and III.7).

Our sincere thanks go to many persons who have helped us with our
work:

- all readers who kindly drew our attention to several errors and misprints
in the first edition;

- those who read preliminary versions of the new parts of this edition
for their invaluable suggestions: D.J. Higham, L. Jay, P. Kaps,
Chr. Lubich, B. Moesli, A. Ostermann, D. Pfenniger, P.J. Prince,
and J.M. Sanz-Serna;

- our colleague J. Steinig, who read the entire manuscript, for his numerous
mathematical suggestions and corrections of English (and
Latin!) grammar;

- our colleague J.P. Eckmann for his great skill in manipulating
Apollo workstations, font tables, and the like;

- the staff of the Geneva computing center and of the mathematics
library for their constant help;

- the planning and production group of Springer-Verlag for numerous
suggestions on presentation and style.

This second edition now also benefits, as did Volume II, from the marvels
of TeXnology. All figures have been recomputed and printed,
together with the text, in PostScript. Nearly all computations and
text processings were done on the Apollo DN 4000 workstation of the
Mathematics Department of the University of Geneva; for some long-time
and high-precision runs we used a VAX 8700 computer and a
Sun IPX workstation.


November 29, 1992


The Authors 
Chapter I. Classical Mathematical Theory


... halte ich es immer für besser, nicht mit dem Anfang anzufangen,
der immer das Schwerste ist.

(B. Riemann copied this from F. Schiller into his notebook) 


This first chapter contains the classical theory of differential equations, which we 
judge useful and important for a profound understanding of numerical processes 
and phenomena. It will also be the occasion of presenting interesting examples of 
differential equations and their properties. 

We first retrace in Sections I.2-I.6 the historical development of classical integration
methods by series expansions, quadrature and elementary functions, from
the beginning (Newton and Leibniz) to the era of Euler, Lagrange and Hamilton.
The next part (Sections I.7-I.14) deals with theoretical properties of the solutions
(existence, uniqueness, stability and differentiability with respect to initial
values and parameters) and the corresponding flow (increase of volume, preservation
of symplectic structure). This theory was initiated by Cauchy in 1824 and
then brought to perfection mainly during the next 100 years. We close with a brief
account of boundary value problems, periodic solutions, limit cycles and strange
attractors (Sections I.15 and I.16).



I.1 Terminology


A differential equation of first order is an equation of the form

$$y' = f(x,y) \tag{1.1}$$

with a given function $f(x,y)$. A function $y(x)$ is called a solution of this equation
if for all $x$,

$$y'(x) = f\bigl(x, y(x)\bigr). \tag{1.2}$$

It was observed very early by Newton, Leibniz and Euler that the solution usually
contains a free parameter, so that it is uniquely determined only when an initial
value

$$y(x_0) = y_0 \tag{1.3}$$

is prescribed. Cauchy’s existence and uniqueness proof of this fact will be discussed
in Section I.7. Differential equations arise in many applications. We shall
see the first examples of such equations in Section I.2, and in Section I.3 how some
of them can be solved explicitly.

A differential equation of second order for $y$ is of the form

$$y'' = f(x,y,y'). \tag{1.4}$$

Here, the solution usually contains two parameters and is only uniquely determined
by two initial values

$$y(x_0) = y_0, \qquad y'(x_0) = y'_0. \tag{1.5}$$

Equations of second order can rarely be solved explicitly (see I.3). For their numerical
solution, as well as for theoretical investigations, one usually sets $y_1(x) := y(x)$,
$y_2(x) := y'(x)$, so that equation (1.4) becomes

$$\begin{aligned}
y_1' &= y_2, & y_1(x_0) &= y_0, \\
y_2' &= f(x,y_1,y_2), & y_2(x_0) &= y'_0.
\end{aligned} \tag{1.4'}$$

This is an example of a first order system of differential equations, of dimension $n$
(see Sections I.6 and I.9),

$$\begin{aligned}
y_1' &= f_1(x,y_1,\ldots,y_n), & y_1(x_0) &= y_{10}, \\
&\;\;\vdots & &\;\;\vdots \\
y_n' &= f_n(x,y_1,\ldots,y_n), & y_n(x_0) &= y_{n0}.
\end{aligned} \tag{1.6}$$





Most of the theory of this book is devoted to the solution of the initial value problem
for the system (1.6). At the end of the 19th century (Peano 1890) it became
customary to introduce the vector notation

$$y = (y_1,\ldots,y_n)^T, \qquad f = (f_1,\ldots,f_n)^T,$$

so that (1.6) becomes $y' = f(x,y)$, which is again the same as (1.1), but now with
$y$ and $f$ interpreted as vectors.
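
The reduction of (1.4) to the system (1.4') can be sketched in a few lines of Python. The helper names `second_order_to_system` and `euler`, the choice of the explicit Euler method, and the test equation $y'' = -y$ are assumptions made for this illustration only; they are not taken from the text.

```python
import math

def second_order_to_system(f):
    """Turn y'' = f(x, y, y') into the first order system (1.4')."""
    def rhs(x, y):
        y1, y2 = y                      # y1 = y, y2 = y'
        return [y2, f(x, y1, y2)]
    return rhs

def euler(rhs, x0, y0, x_end, steps):
    """Explicit Euler method, used here only as the simplest integrator."""
    h = (x_end - x0) / steps
    x, y = x0, list(y0)
    for _ in range(steps):
        dy = rhs(x, y)
        y = [yi + h * di for yi, di in zip(y, dy)]
        x += h
    return y

# Illustrative problem (an assumption): y'' = -y, y(0) = 0, y'(0) = 1,
# whose solution is y = sin x.
rhs = second_order_to_system(lambda x, y, yp: -y)
y_end = euler(rhs, 0.0, [0.0, 1.0], 1.0, 20000)
print(y_end[0], math.sin(1.0))
```

Once the equation is written in the vector form $y' = f(x,y)$, any integrator for first order systems can be applied without knowing that the problem was originally of second order.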

Another possibility for the second order equation (1.4), instead of transforming
it into a system (1.4'), is to develop methods specially adapted to second order
equations (Nyström methods). This will be done in special sections of this book
(Sections II.13 and III.10). Nothing prevents us, of course, from considering (1.4)
as a second order system of dimension $n$.

If, however, the initial conditions (1.5) are replaced by something like $y(x_0) = a$,
$y(x_1) = b$, i.e., if the conditions determining the particular solution are not all
specified at the same point $x_0$, we speak of a boundary value problem. The theory
of the existence of a solution and of its numerical computation is here much more
complicated. We give some examples in Section I.15.

Finally, a problem of the type

$$\frac{\partial u}{\partial t} = f\Bigl(t, u, \frac{\partial u}{\partial x}, \frac{\partial^2 u}{\partial x^2}\Bigr) \tag{1.7}$$

for an unknown function $u(t,x)$ of two independent variables will be called a partial
differential equation. We can also deal with partial differential equations of
higher order, with problems in three or four independent variables, or with systems
of partial differential equations. Very often, initial value problems for partial
differential equations can conveniently be transformed into a system of ordinary
differential equations, for example with finite difference or finite element approximations
in the variable $x$. In this way, the equation

$$\frac{\partial u}{\partial t} = a^2\,\frac{\partial^2 u}{\partial x^2}$$

would become

$$\frac{du_i}{dt} = \frac{a^2}{\Delta x^2}\bigl(u_{i+1} - 2u_i + u_{i-1}\bigr),$$

where $u_i(t) \approx u(t,x_i)$. This procedure is called the “method of lines” or “method
of discretization in space” (Berezin & Zhidkov 1965). We shall see in Section I.6
that this connection, the other way round, was historically the origin of partial differential
equations (d’Alembert, Lagrange, Fourier). A similar idea is the “method
of discretization in time” (Rothe 1930).
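
As a minimal sketch of the method of lines, the following Python code discretizes the heat equation above in space and integrates the resulting system of ordinary differential equations. The interval $[0,1]$, the zero boundary values, the initial function $\sin \pi x$, the explicit Euler time stepping and all function names are assumptions chosen for this illustration.

```python
import math

def heat_rhs(u, a, dx):
    """Right-hand side du_i/dt = a^2/dx^2 * (u_{i+1} - 2 u_i + u_{i-1}).

    Assumption of this sketch: boundary values u_0 = u_{n+1} = 0.
    """
    n = len(u)
    du = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        du.append(a * a / (dx * dx) * (right - 2.0 * u[i] + left))
    return du

def solve_heat(a=1.0, n=19, t_end=0.1, steps=2000):
    """Integrate the semi-discrete system with explicit Euler steps."""
    dx = 1.0 / (n + 1)
    x = [(i + 1) * dx for i in range(n)]
    u = [math.sin(math.pi * xi) for xi in x]   # initial values u(0, x_i)
    dt = t_end / steps
    for _ in range(steps):
        du = heat_rhs(u, a, dx)
        u = [ui + dt * di for ui, di in zip(u, du)]
    return x, u

x, u = solve_heat()
# For this initial function the exact solution is exp(-pi^2 t) sin(pi x).
print(u[9], math.exp(-math.pi ** 2 * 0.1) * math.sin(math.pi * x[9]))
```

The step size in time is deliberately small here; an explicit method applied to such a system becomes unstable when the time step is too large compared with $\Delta x^2$.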



I.2 The Oldest Differential Equations


... So zum Beispiel die Aufgabe der umgekehrten Tangentenmethode,
von welcher auch Descartes eingestand, dass er sie nicht in
seiner Gewalt habe. (Leibniz, 27. Aug 1676)

... et on sait que les seconds Inventeurs n’ont pas de droit a l’Invention.
(Newton, 29 mai 1716)

Il ne paroist point que M. Newton ait eu avant moy la characteristique
& l’algorithme infinitesimal ... (Leibniz)

And by these words he acknowledged that he had not yet found the 
reduction of problems to differential equations. (Newton) 


Newton 

Differential equations are as old as differential calculus. Newton considered them 
in his treatise on differential calculus (Newton 1671) and discussed their solution 
by series expansion. One of the first examples of a first order equation treated by 
Newton (see Newton (1671), Problema II, Solutio Casus II, Ex. I) was 

$$y' = 1 - 3x + y + x^2 + xy. \tag{2.1}$$

For each value $x$ and $y$, such an equation prescribes the derivative $y'$ of the solutions.
We thus obtain a vector field, which, for this particular equation, is sketched
in Fig. 2.1a. (So, contrary to the belief of many people, vector fields existed long
before Van Gogh). The solutions are the curves which respect these prescribed
directions everywhere (Fig. 2.1b).

Newton discusses the solution of this equation by means of infinite series,
whose terms he obtains recursively (“... & ils se jettent sur les series, ou M. Newton
m’a precede sans difficulté; mais ...”, Leibniz). The first term

$$y = 0 + \ldots$$

is the initial value for $x = 0$. Inserting this into the differential equation (2.1) he
obtains

$$y' = 1 + \ldots$$

which, integrated, gives

$$y = x + \ldots .$$

Again, from (2.1), we now have

$$y' = 1 - 3x + x + \ldots = 1 - 2x + \ldots$$

and by integration

$$y = x - x^2 + \ldots .$$

Fig. 2.1. a) Vector field, b) various solution curves of equation (2.1),
c) correct solution vs. approximate solution


The next round gives

$$y' = 1 - 2x + x^2 + \ldots, \qquad y = x - x^2 + \frac{x^3}{3} + \ldots .$$

Continuing this process, he finally arrives at

$$y = x - xx + \tfrac{1}{3}x^3 - \tfrac{1}{6}x^4 + \tfrac{1}{30}x^5 - \tfrac{1}{45}x^6;\ \&c. \tag{2.2}$$

These approximations, term after term, are plotted in Fig. 2.1c together with the
correct solution. It can be seen that these approximations are closer and closer
to the true solution for small values of $x$. For more examples see Exercises 1-3.
Convergence will be discussed in Section I.8.
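
The coefficients in (2.2) can be reproduced by a short computation. The sketch below does not imitate Newton's successive substitutions literally; it matches the coefficient of $x^k$ on both sides of (2.1), which yields the same series. The function name `newton_series` and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction

def newton_series(y0, n_terms):
    """Coefficients a_0, ..., a_{n_terms-1} of the power series solution of
    y' = 1 - 3x + y + x^2 + x*y with y(0) = y0."""
    a = [Fraction(y0)]
    for k in range(n_terms - 1):
        # coefficient of x^k on the right-hand side of (2.1)
        rhs = a[k]                      # from the term y
        if k >= 1:
            rhs += a[k - 1]             # from the term x*y
        if k == 0:
            rhs += 1                    # from the constant 1
        elif k == 1:
            rhs -= 3                    # from -3x
        elif k == 2:
            rhs += 1                    # from x^2
        a.append(rhs / (k + 1))         # integration: x^k -> x^(k+1)/(k+1)
    return a

coeffs = newton_series(0, 7)
print([str(c) for c in coeffs])
```

Calling `newton_series(1, 5)` gives the coefficients for the initial value $y(0) = 1$ of Exercise 1.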







Leibniz and the Bernoulli Brothers 


A second access to differential equations is the consideration of geometrical problems
such as inverse tangent problems (Debeaune 1638 in a letter to Descartes). A
particular example describes the path of a silver pocket watch (“horologio portabili
suae thecae argentae”) and was proposed around 1674 by “Claudius Perraltus
Medicus Parisinus” to Leibniz: a curve $y(x)$ is required whose tangent $AB$ is
given, say everywhere of constant length $a$ (Fig. 2.2). This leads to

$$y' = -\frac{y}{\sqrt{a^2 - y^2}}, \tag{2.3}$$

a first order differential equation. Despite the efforts of the “plus celebres mathematiciens
de Paris et de Toulouse” (from a letter of Descartes 1645, “Toulouse”
means “Fermat”) the solution of these problems had to wait until Leibniz (1684)
and above all until the famous paper of Jacob Bernoulli (1690). Bernoulli’s idea
applied to equation (2.3) is as follows: let the curve $BM$ in Fig. 2.3 be such that
$LM$ is equal to $\sqrt{a^2 - y^2}/y$. Then (2.3), written as

$$dx = -\frac{\sqrt{a^2 - y^2}}{y}\,dy, \tag{2.3'}$$

shows that for all $y$ the areas $S_1$ and $S_2$ (Fig. 2.3) are the same. Thus (“Ergo &
horum integralia aequantur”) the areas $BMLB$ and $A_1A_2C_2C_1$ must be equal
too. Hence (2.3') becomes (Leibniz 1693)

$$x = -\int_a^y \frac{\sqrt{a^2 - y^2}}{y}\,dy = -\sqrt{a^2 - y^2} - a\cdot\log\frac{a - \sqrt{a^2 - y^2}}{y}. \tag{2.3''}$$



Fig. 2.2. Illustration from Leibniz (1693)


Fig. 2.3. Jac. Bernoulli’s Solution of (2.3)
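
A quick numerical check of the closed form for $x(y)$ against equation (2.3) may be helpful. The sketch below differentiates $x(y)$ by a central difference and also verifies that the tangent segment has constant length $a$. It assumes that the tangent segment is measured from the point of the curve to the $x$-axis; the function names are hypothetical.

```python
import math

def tractrix_x(y, a):
    """x as a function of y: -sqrt(a^2 - y^2) - a*log((a - sqrt(a^2 - y^2))/y)."""
    s = math.sqrt(a * a - y * y)
    return -s - a * math.log((a - s) / y)

def slope_on_curve(y, a, h=1e-6):
    """Slope y' = 1 / (dx/dy), with dx/dy from a central difference."""
    dx_dy = (tractrix_x(y + h, a) - tractrix_x(y - h, a)) / (2.0 * h)
    return 1.0 / dx_dy

def tangent_length(y, yp):
    """Length of the tangent segment between the curve point and the x-axis."""
    return math.sqrt(y * y + (y / yp) ** 2)

a = 1.0
for y in (0.2, 0.5, 0.8):
    yp = slope_on_curve(y, a)
    # columns: y, numerical slope, slope prescribed by (2.3), tangent length
    print(y, yp, -y / math.sqrt(a * a - y * y), tangent_length(y, yp))
```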
Variational Calculus

In 1696 Johann Bernoulli invited the brightest mathematicians of the world (“Profundioris
in primis Mathesos cultori, Salutem!”) to solve the brachystochrone
(shortest time) problem, mainly in order to fault his brother Jacob, from whom
he expected a wrong solution. The problem is to find a curve $y(x)$ connecting two
points $P_0$, $P_1$, such that a point gliding on this curve under gravitation reaches $P_1$
in the shortest time possible. In order to solve his problem, Joh. Bernoulli (1697b)
imagined thin layers of homogeneous media and knew from optics (Fermat’s principle)
that a light ray with speed $v$ obeying the law of Snellius

$$\sin\alpha = Kv$$

passes through in the shortest time. Since the speed is known to be proportional to
the square root of the fallen height, he obtains, by passing to thinner and thinner
layers,

$$\sin\alpha = \frac{1}{\sqrt{1 + y'^2}} = K\sqrt{2g(y - h)}, \tag{2.4}$$

a differential equation of the first order.



Fig. 2.4. Solutions of the variational problem (Joh. Bernoulli,
Jac. Bernoulli, Euler)

The solutions of (2.4) can be shown to be cycloids (see Exercise 6 of Section I.3).
Jacob, in his reply, also furnished a solution, much less elegant but unfortunately
correct. Jacob’s method (see Fig. 2.4) was something like today’s (inverse)
“finite element” method and more general than Johann’s and led to the famous work
of Euler (1744), which gives the general solution of the problem

$$\int_{x_0}^{x_1} F(x,y,y')\,dx = \min \tag{2.5}$$

with the help of the differential equation of the second order

$$F_y(x,y,y') - \frac{d}{dx}\bigl(F_{y'}(x,y,y')\bigr) = F_y - F_{y'y'}\,y'' - F_{y'y}\,y' - F_{y'x} = 0, \tag{2.6}$$

and treated 100 variational problems in detail. Equation (2.6), in the special case
where $F$ does not depend on $x$, can be integrated to give

$$F - F_{y'}\,y' = K. \tag{2.6'}$$
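
The step from (2.6) to (2.6') is not spelled out in the text. One way to see it, given here as a supplementary sketch, is to differentiate the left-hand side of (2.6') along a solution and to use that $F$ has no explicit dependence on $x$:

```latex
\begin{aligned}
\frac{d}{dx}\bigl(F - F_{y'}\,y'\bigr)
  &= F_y\,y' + F_{y'}\,y'' - \Bigl(\frac{d}{dx}F_{y'}\Bigr)y' - F_{y'}\,y'' \\
  &= y'\Bigl(F_y - \frac{d}{dx}F_{y'}\Bigr) = 0 \qquad \text{by (2.6)},
\end{aligned}
```

so that $F - F_{y'}\,y'$ is constant along every solution of (2.6).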


Euler’s original proof used polygons in order to establish equation (2.6). Only the
ideas of Lagrange, in 1755 at the age of 19, led to the proof which is today the usual
one (letter of Aug. 12, 1755; Oeuvres vol. 14, p. 138): add an arbitrary “variation”
$\delta y(x)$ to $y(x)$ and linearize (2.5).

$$\begin{aligned}
\int_{x_0}^{x_1} & F\bigl(x, y + \delta y, y' + (\delta y)'\bigr)\,dx \\
&= \int_{x_0}^{x_1} F(x,y,y')\,dx + \int_{x_0}^{x_1} \Bigl(F_y(x,y,y')\,\delta y + F_{y'}(x,y,y')\,(\delta y)'\Bigr)\,dx + \ldots
\end{aligned} \tag{2.7}$$

The last integral in (2.7) represents the “derivative” of (2.5) with respect to $\delta y$.
Therefore, if $y(x)$ is the solution of (2.5), we must have

$$\int_{x_0}^{x_1} \Bigl(F_y(x,y,y')\,\delta y + F_{y'}(x,y,y')\,(\delta y)'\Bigr)\,dx = 0 \tag{2.8}$$

or, after partial integration,

$$\int_{x_0}^{x_1} \Bigl(F_y(x,y,y') - \frac{d}{dx}F_{y'}(x,y,y')\Bigr)\cdot\delta y(x)\,dx = 0. \tag{2.8'}$$

Since (2.8') must be fulfilled by all $\delta y$, Lagrange “sees” that

$$F_y(x,y,y') - \frac{d}{dx}F_{y'}(x,y,y') = 0 \tag{2.9}$$

is necessary for (2.5). Euler, in his reply (Sept. 6, 1755) urged a more precise proof
of this fact (which is now called the “fundamental Lemma of variational Calculus”).
For several unknown functions

$$\int F(x,y_1,y_1',\ldots,y_n,y_n')\,dx = \min \tag{2.10}$$

the same proof leads to the equations

$$F_{y_i}(x,y_1,y_1',\ldots,y_n,y_n') - \frac{d}{dx}F_{y_i'}(x,y_1,y_1',\ldots,y_n,y_n') = 0 \tag{2.11}$$

for $i = 1,\ldots,n$. Euler (1756) then gave, in honour of Lagrange, the name “Variational
calculus” to the whole subject (“... tamen gloria primae inventionis acutissimo
Geometrae Taurinensi La Grange erat reservata”).


Clairaut 

A class of equations with interesting properties was found by Clairaut (see Clairaut
(1734), Problème III). He was motivated by the movement of a rectangular wedge
(see Fig. 2.5), which led him to differential equations of the form

$$y - xy' + f(y') = 0. \tag{2.12}$$

This was the first implicit differential equation and possesses the particularity that
not only the lines $y = Cx - f(C)$ are solutions, but also their enveloping curves
(see Exercise 5). An example is shown in Fig. 2.6 with $f(C) = 5(C^3 - C)/2$.



Since the equation is of the third degree in $y'$, a given initial value may allow
up to three different solution lines. Furthermore, where a line touches an enveloping
curve, the solution may be continued either along the line or along the curve.
There is thus a huge variety of different possible solution curves. This phenomenon
attracted much interest in the classical literature (see e.g., Exercises 4 and 6). Today
we explain this curiosity by the fact that at these points no Lipschitz condition
is satisfied (see also Ince (1944), p. 538-539).
Fig. 2.6. Solutions of a Clairaut differential equation
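
The two kinds of solutions of (2.12) can be checked numerically for the example $f(C) = 5(C^3 - C)/2$. The sketch below uses the parametric form $x = f'(p)$, $y = pf'(p) - f(p)$ of the envelope, which follows from differentiating the family of lines with respect to $C$. All function names are chosen for this illustration.

```python
def f(c):
    return 5.0 * (c ** 3 - c) / 2.0

def df(c):
    """Derivative f'(c)."""
    return 5.0 * (3.0 * c ** 2 - 1.0) / 2.0

def clairaut_residual(x, y, yp):
    """Left-hand side of (2.12): y - x*y' + f(y')."""
    return y - x * yp + f(yp)

def line(c, x):
    """Straight-line solution y = C x - f(C)."""
    return c * x - f(c)

def envelope(p):
    """Point of the envelope touched by the line with slope p."""
    return df(p), p * df(p) - f(p)

# A line of the family satisfies (2.12) with y' = C.
print(clairaut_residual(1.3, line(0.4, 1.3), 0.4))

# The envelope has slope p at the parameter value p (finite difference check)
# and therefore satisfies (2.12) as well.
p, h = 0.7, 1e-6
x1, y1 = envelope(p - h)
x2, y2 = envelope(p + h)
print((y2 - y1) / (x2 - x1), clairaut_residual(*envelope(p), p))
```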


Exercises 


1. (Newton). Solve equation (2.1) with another initial value $y(0) = 1$.

Newton’s result: $y = 1 + 2x + x^3 + \frac{1}{4}x^4 + \&c.$

2. (Newton 1671, “Problema II, Solutio particulare”). Solve the total differential
equation

$$3x^2 - 2ax + ay - 3y^2y' + axy' = 0.$$

Solution given by Newton: $x^3 - ax^2 + axy - y^3 = 0$. Observe that he missed
the arbitrary integration constant $C$.


3. (Newton 1671). Solve the equations

a) $y' = 1 + \dfrac{y}{a} + \dfrac{xy}{a^2} + \dfrac{x^2y}{a^3} + \dfrac{x^3y}{a^4}$, &c.

b) $y' = -3x + 3xy + y^2 - xy^2 + y^3 - xy^3 + y^4 - xy^4 + 6x^2y - 6x^2 + 8x^3y - 8x^3 + 10x^4y - 10x^4$, &c.

Results given by Newton:

a) $y = x + \dfrac{x^2}{2a} + \dfrac{x^3}{2a^2} + \dfrac{x^4}{2a^3} + \dfrac{x^5}{2a^4} + \dfrac{x^6}{2a^5}$, &c.

b) $y = -\dfrac{3}{2}x^2 - 2x^3 - \dfrac{25}{8}x^4 - \dfrac{91}{20}x^5 - \dfrac{111}{16}x^6 - \dfrac{367}{35}x^7$, &c.
4. Show that the differential equation

$$x + yy' = y'\sqrt{x^2 + y^2 - 1}$$

possesses the solutions $2ay = a^2 + 1 - x^2$ for all $a$. Sketch these curves and
find yet another solution of the equation (from Lagrange (1774), p. 7, which
was written to explain the “Clairaut phenomenon”).

5. Verify that the envelope of the solutions $y = Cx - f(C)$ of the Clairaut equation
(2.12) is given in parametric representation by

$$x(p) = f'(p), \qquad y(p) = pf'(p) - f(p).$$

Show that this envelope is also a solution of (2.12) and calculate it for $f(C) = 5(C^3 - C)/2$
(cf. Fig. 2.6).

6. (Cauchy 1824). Show that the family $y = C(x + C)^2$ satisfies the differential
equation $(y')^3 = 8y^2 - 4xyy'$. Find yet another solution which is not included
in this family (see Fig. 2.7).

Answer: $y = -\frac{4}{27}x^3$.



Fig. 2.7. Solution family of Cauchy’s example in Exercise 6




I.3 Elementary Integration Methods


We now discuss some of the simplest types of equations, which can be solved by 
the computation of integrals. 

First Order Equations 

The equation with separable variables. 

$$y' = f(x)g(y). \tag{3.1}$$

Extending the idea of Jacob Bernoulli (see (2.3')), we divide by $g(y)$, integrate and
obtain the solution (Leibniz 1691, in a letter to Huygens)

$$\int \frac{dy}{g(y)} = \int f(x)\,dx + C.$$

A special example of this is the linear equation $y' = f(x)y$, which possesses the
solution

$$y(x) = CR(x), \qquad R(x) = \exp\Bigl(\int f(x)\,dx\Bigr).$$


The inhomogeneous linear equation.

$$y' = f(x)y + g(x). \tag{3.2}$$

Here, the substitution $y(x) = c(x)R(x)$ leads to $c'(x) = g(x)/R(x)$ (Joh. Bernoulli
1697). One thus obtains the solution

$$y(x) = R(x)\Bigl(\int_{x_0}^{x} \frac{g(s)}{R(s)}\,ds + C\Bigr). \tag{3.3}$$
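
Formula (3.3) can be evaluated directly by numerical quadrature. The following sketch assumes the normalization $R(x_0) = 1$, so that the constant $C$ equals the initial value $y(x_0)$; the Simpson rule, the function names and the test problem $y' = -y + x$ are choices made for this illustration.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def solve_linear(f, g, x0, y0, x):
    """Evaluate y(x) = R(x) * (integral of g/R from x0 to x + y0)
    for y' = f(x) y + g(x), y(x0) = y0, with R(x0) = 1."""
    def R(t):
        return math.exp(simpson(f, x0, t))
    integral = simpson(lambda s: g(s) / R(s), x0, x)
    return R(x) * (integral + y0)

# Illustrative problem (an assumption): y' = -y + x, y(0) = 1,
# whose exact solution is y = x - 1 + 2 exp(-x).
y_num = solve_linear(lambda x: -1.0, lambda x: x, 0.0, 1.0, 2.0)
y_exact = 2.0 - 1.0 + 2.0 * math.exp(-2.0)
print(y_num, y_exact)
```

The nested quadrature is wasteful but keeps the correspondence with (3.3) visible; in practice one would integrate the differential equation itself.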

Total differential equations. An equation of the form

$$P(x,y) + Q(x,y)y' = 0 \tag{3.4}$$

is found to be immediately solvable if

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}. \tag{3.5}$$

One can then find by integration a potential function $U(x,y)$ such that

$$\frac{\partial U}{\partial x} = P, \qquad \frac{\partial U}{\partial y} = Q.$$

Therefore (3.4) becomes $\frac{d}{dx}U(x,y(x)) = 0$, so that the solutions can be expressed
by $U(x,y(x)) = C$. For the case when (3.5) is not satisfied, Clairaut and Euler
investigated the possibility of multiplying (3.4) by a suitable factor $M(x,y)$, which
sometimes allows the equation $MP + MQy' = 0$ to satisfy (3.5).
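
As a worked illustration, consider Newton's equation from Exercise 2 of Section I.2, $3x^2 - 2ax + ay - 3y^2y' + axy' = 0$. Reading off $P$ and $Q$ and checking the integrability condition gives:

```latex
\begin{aligned}
P &= 3x^2 - 2ax + ay, \qquad Q = ax - 3y^2, \\
\frac{\partial P}{\partial y} &= a = \frac{\partial Q}{\partial x}, \\
U(x,y) &= x^3 - ax^2 + axy - y^3, \qquad
\frac{\partial U}{\partial x} = P, \quad \frac{\partial U}{\partial y} = Q .
\end{aligned}
```

The solutions are therefore given implicitly by $x^3 - ax^2 + axy - y^3 = C$, which is Newton's result with the arbitrary constant restored.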


Second Order Equations 


Even more than for first order equations, the solution of second order equations by
integration is very seldom possible. Besides linear equations with constant coefficients,
whose solutions for the second order case were already known to Newton,
several tricks of reduction are possible, as for example the following:

For a linear equation

$$y'' = a(x)y' + b(x)y$$

we make the substitution (Riccati 1723, Euler 1728)

$$y = \exp\Bigl(\int p(x)\,dx\Bigr). \tag{3.6}$$

The derivatives of this function contain only derivatives of $p$ of lower order

$$y' = p\cdot\exp\Bigl(\int p(x)\,dx\Bigr), \qquad y'' = (p^2 + p')\cdot\exp\Bigl(\int p(x)\,dx\Bigr)$$

so that inserting this into the differential equation, after division by $y$, leads to a
lower order equation

$$p^2 + p' = a(x)p + b(x) \tag{3.7}$$

which, however, is nonlinear.

If the equation is independent of $y$, $y'' = f(x,y')$, it is natural to put $y' = v$
which gives $v' = f(x,v)$.

An important case is that of equations independent of $x$:

$$y'' = f(y,y').$$

Here we consider $y'$ as function of $y$: $y' = p(y)$. Then the chain rule gives $y'' = p'p = f(y,p)$,
which is a first order equation. When the function $p(y)$ has been
found, it remains to integrate $y' = p(y)$, which is an equation of type (3.1) (Riccati
(1712): “Per liberare la premessa formula dalle seconde differenze, ..., chiamo p
la sunnormale BF ...”, see also Euler (1769), Problema 96, p. 33).
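
A simple example of this last reduction, added here for illustration and not taken from the text, is the equation $y'' = -y$. With $y' = p(y)$ one obtains:

```latex
\begin{aligned}
p\,\frac{dp}{dy} &= -y
  \quad\Longrightarrow\quad p^2 + y^2 = K^2, \\
y' &= \sqrt{K^2 - y^2}
  \quad\Longrightarrow\quad \int \frac{dy}{\sqrt{K^2 - y^2}} = x + c
  \quad\Longrightarrow\quad y = K\sin(x + c).
\end{aligned}
```

The first step is a separable equation for $p(y)$, the second a separable equation of type (3.1) for $y(x)$; the two integration constants $K$ and $c$ are the two free parameters of the second order equation. Only the branch with the positive square root is written out.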

The investigation of all possible differential equations which can be integrated 
by analytical methods was begun by Euler. His results have been collected, in 
more than 800 pages, in Volumes XXII and XXIII of Euler’s Opera Omnia. For a
more recent discussion see Ince (1944), p. 16-61. An irreplaceable document on 
this subject is the book of Kamke (1942). It contains, besides a description of the 
solution methods and general properties of the solutions, a systematically ordered 
list of more than 1500 differential equations with their solutions and references to 
the literature. 

The computations, even for very simple looking equations, soon become very 
complicated and one quickly began to understand that elementary solutions would 
not always be possible. It was Liouville (1841) who gave the first proof of the 
fact that certain equations, such as $y' = x^2 + y^2$, cannot be solved in terms of
elementary functions. Therefore, in the 19th century mathematicians became more 
and more interested in general existence theorems and in numerical methods for 
the computation of the solutions. 


Exercises 


1. Solve Newton’s equation (2.1) by quadrature. 

2. Solve Leibniz’ equation (2.3) in terms of elementary functions. 

Hint. The integral for $y$ might cause trouble. Use the substitution $a^2 - y^2 = u^2$,
$-y\,dy = u\,du$.

3. Solve and draw the solutions of $y' = f(y)$ where $f(y) = \sqrt{|y|}$.

4. Solve the master-and-dog problem: a dog runs with speed $w$ in the direction
of his master, who walks with speed $v$ along the $y$-axis. This leads to the
differential equation

$$xy'' = \frac{v}{w}\sqrt{1 + y'^2}.$$

5. Solve the equation $my'' = -k/y^2$, which describes a body falling according
to Newton’s law of gravitation.

6. Verify that the cycloid

$$x - x_0 = R(t - \sin t), \qquad y - h = R(1 - \cos t), \qquad R = \frac{1}{4gK^2}$$

satisfies the differential equation (2.4) for the brachystochrone problem. Solving
(2.4) in a forward manner, one arrives after some simplifications at the
integral

$$x = \int \sqrt{\frac{y}{1 - y}}\,dy,$$

which is computed by the substitution $y = (\sin t)^2$.
7. Reduce the “Bernoulli equation” (Jac. Bernoulli 1695)

$$y' + f(x)y = g(x)y^n$$

with the help of the coordinate transformation $z(x) = (y(x))^q$ and a suitable
choice of $q$, to a linear equation (Leibniz, Acta Erud. 1696, p. 145, Joh.
Bernoulli, Acta Erud. 1697, p. 113).

8. Compute the “Linea Catenaria” of the hanging rope. The solution was given
by Joh. Bernoulli (1691) and Leibniz (1691) (see Fig. 3.2) without any hint.

Hint. (Joh. Bernoulli, “Lectiones ... in usum Ill. Marchionis Hospitalii”
1691/92). Let $H$ resp. $V$ be the horizontal resp. vertical component of the
tension in the rope (Fig. 3.1). Then $H = a$ is a constant and $V = q\cdot s$ is proportional
to the arc length. This leads to $Cp = s$ or $C\,dp = ds$, i.e., $C\,dp = \sqrt{1 + p^2}\,dx$.



Fig. 3.1. Solution of the Catenary problem


Fig. 3.2. “Linea Catenaria” drawn by Leibniz (1691)





I.4 Linear Differential Equations


Lisez Euler, lisez Euler, c’est notre maître à tous. (Laplace)

[Euler] ... c’est un homme peu amusant, mais un très-grand Géomètre.
(D’Alembert, letter to Voltaire, March 3, 1766)

[Euler] ... un Géomètre borgne, dont les oreilles ne sont pas faites
pour sentir les délicatesses de la poésie.

(Frédéric II, in a letter to Voltaire)


Following in the footsteps of Euler (1743), we want to understand the general solution
of $n$th order linear differential equations. We say that the equation

$$\mathcal{L}(y) := a_n(x)y^{(n)} + a_{n-1}(x)y^{(n-1)} + \ldots + a_0(x)y = 0 \tag{4.1}$$

with given functions $a_0(x),\ldots,a_n(x)$ is homogeneous. If $n$ solutions $u_1(x),
\ldots,u_n(x)$ of (4.1) are known, then any linear combination

$$y(x) = C_1u_1(x) + \ldots + C_nu_n(x) \tag{4.2}$$

with constant coefficients $C_1,\ldots,C_n$ is also a solution of (4.1), since all derivatives
of $y$ appear only linearly in (4.1).


Equations with Constant Coefficients 

Let us first consider the special case

$$y^{(n)}(x) = 0. \tag{4.3}$$

This can be integrated once to give $y^{(n-1)}(x) = C_1$, then $y^{(n-2)}(x) = C_1x + C_2$,
etc. Replacing at the end the arbitrary constants $C_i$ by new ones, we finally obtain

$$y(x) = C_1x^{n-1} + C_2x^{n-2} + \ldots + C_n.$$

Thus there are $n$ “free parameters” in the “general solution” of (4.3). Euler’s intuition,
after some more examples, also expected the same result for the general
equation (4.1). This fact, however, only became completely clear many years later.
We now treat the general equation with constant coefficients,

$$y^{(n)} + A_{n-1}y^{(n-1)} + \ldots + A_0y = 0. \tag{4.4}$$

Our problem is to find a basis of $n$ linearly independent solutions $u_1(x),\ldots,
u_n(x)$. To this end, Euler’s inspiration was guided by the transformation (3.6),
(3.7) above: if $a(x)$ and $b(x)$ are constants, we assume $p$ constant in (3.7) so that
$p'$ vanishes, and we obtain the quadratic equation $p^2 = ap + b$. For any root of this
equation, (3.6) then becomes $y = e^{px}$. In the general case we thus assume $y = e^{px}$
with an unknown constant $p$, so that (4.4) leads to the characteristic equation

$$p^n + A_{n-1}p^{n-1} + \ldots + A_0 = 0. \tag{4.5}$$

If the roots $p_1,\ldots,p_n$ of equation (4.5) are distinct, all solutions of (4.4) are given
by

$$y(x) = C_1e^{p_1x} + \ldots + C_ne^{p_nx}. \tag{4.6}$$

It is curious to see that the “brightest mathematicians of the world” struggled for
many decades to find this solution, which appears so trivial to today’s students.
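
For $n = 2$ the recipe can be sketched in a few lines of Python: compute the roots of the characteristic equation and insert $y = e^{px}$ into the differential equation. The function names and the example $y'' - 3y' - 4y = 0$ (the homogeneous part of Exercise 2 below) are choices made for this illustration; complex arithmetic is used so that complex roots are covered as well.

```python
import cmath

def characteristic_roots(A1, A0):
    """Roots of p^2 + A1*p + A0 = 0, i.e. the characteristic equation for n = 2."""
    disc = cmath.sqrt(A1 * A1 - 4.0 * A0)
    return (-A1 + disc) / 2.0, (-A1 - disc) / 2.0

def residual(p, A1, A0, x):
    """Value of y'' + A1*y' + A0*y for y = exp(p*x)."""
    y = cmath.exp(p * x)
    return p * p * y + A1 * p * y + A0 * y

# Illustrative example: y'' - 3y' - 4y = 0.
p1, p2 = characteristic_roots(-3.0, -4.0)
print(p1, p2, abs(residual(p1, -3.0, -4.0, 0.3)))

# y'' + y = 0 has the purely imaginary roots +i and -i.
q1, q2 = characteristic_roots(0.0, 1.0)
print(q1, q2)
```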

A difficulty arises with the solution (4.6) when (4.5) does not possess $n$ distinct
roots. Consider, with Euler, the example

$$y'' - 2qy' + q^2y = 0. \tag{4.7}$$

Here $p = q$ is a double root of the corresponding characteristic equation. If we set

$$y = e^{qx}u, \tag{4.8}$$

(4.7) becomes $u'' = 0$, which brings us back to (4.3). So the general solution of
(4.7) is given by $y(x) = e^{qx}(C_1x + C_2)$ (see also Exercise 5 below). After some
more examples of this type, one sees that the transformation (4.8) effects a shift of
the characteristic polynomial, so that if $q$ is a root of multiplicity $k$, we obtain for
$u$ an equation ending with $\ldots + Bu^{(k+1)} + Cu^{(k)} = 0$. Therefore

$$e^{qx}(C_1x^{k-1} + \ldots + C_k)$$

gives us $k$ independent solutions.

Finally, for a pair of complex roots $p = \alpha \pm i\beta$ the solutions $e^{(\alpha+i\beta)x}$, $e^{(\alpha-i\beta)x}$
can be replaced by the real functions

$$e^{\alpha x}(C_1\cos\beta x + C_2\sin\beta x).$$

The study of the inhomogeneous equation

$$\mathcal{L}(y) = f(x) \tag{4.9}$$

was begun in Euler (1750), p. 13. We mention from this work the case where $f(x)$
is a polynomial, say for example the equation

$$Ay'' + By' + Cy = ax^2 + bx + c. \tag{4.10}$$

Here Euler puts $y(x) = Ex^2 + Fx + G + v(x)$. Inserting this into (4.10) and eliminating
all possible powers of $x$, one obtains

$$CE = a, \qquad CF + 2BE = b, \qquad CG + BF + 2AE = c,$$

$$Av'' + Bv' + Cv = 0.$$

This allows us, when $C$ is different from zero, to compute $E$, $F$ and $G$ and we
observe that the general solution of the inhomogeneous equation is the sum of a
particular solution of it and of the general solution of the corresponding homogeneous
equation. This is also true in the general case and can be verified by trivial
linear algebra.

The above method of searching for a particular solution with the help of unknown
coefficients works similarly if $f(x)$ is composed of exponential, sine, or
cosine functions and is often called the “fast method”. We see with pleasure that it
was historically the first method to be discovered.
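
The three conditions $CE = a$, $CF + 2BE = b$, $CG + BF + 2AE = c$ form a triangular system which can be solved one after the other. A minimal sketch in Python, with a hypothetical function name and exact rational arithmetic, might look as follows.

```python
from fractions import Fraction

def polynomial_particular(A, B, C, a, b, c):
    """Coefficients E, F, G of the particular solution y = E x^2 + F x + G of
    A y'' + B y' + C y = a x^2 + b x + c, assuming C != 0."""
    A, B, C, a, b, c = (Fraction(v) for v in (A, B, C, a, b, c))
    if C == 0:
        raise ValueError("C must be different from zero for this ansatz")
    E = a / C                            # from C E = a
    F = (b - 2 * B * E) / C              # from C F + 2 B E = b
    G = (c - B * F - 2 * A * E) / C      # from C G + B F + 2 A E = c
    return E, F, G

# Illustrative example (an assumption): y'' - 3y' - 4y = x^2.
E, F, G = polynomial_particular(1, -3, -4, 1, 0, 0)
print(E, F, G)
```

For this example the particular solution is $y = -\frac{1}{4}x^2 + \frac{3}{8}x - \frac{13}{32}$, as one checks by inserting it into the equation.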


Variation of Constants 

The general treatment of the inhomogeneous equation 

a n(. x )y {n) + • • ■ + a Q {x)y = f(x) (4.11) 

is due to Lagrange (1775) (“... par une nouvelle methode aussi simple qu’on 
puisse le desirer”, see also Lagrange (1788), seconde partie, Sec. V.) We assume 
known n independent solutions u 1 (x ),..., u n (x) of the homogeneous equation. 
We then set, in extension of the method employed for (3.2), instead of (4.2) 

y(x) = c^u^x) + .. • + c n (x)u n (x) (4.12) 

with unknown functions c-(x) (“method of variation of constants”). We have to 
insert (4.12) into (4.11) and thus compute the first derivative 

n n 

y' = Y, C i U i + J2 C i U i- 

i= 1 i= 1 

If we continue blindly to differentiate in this way, we soon obtain complicated and 
useless formulas. Therefore Lagrange astutely requires the first term to vanish and 
puts 

n 

^ =0 7=0, then also for j = 1,..., n — 2. (4.13) 

z=1 

Then repeated differentiation of y , with continued elimination of the undesired 
terms (4.13), gives 

n n 

/ \ A / (n — 1) \ ^ (n— 1) 

y=2^ c i u n ••• y ~2^ c i u i > 

i—\ i— 1 

n n 

=Y,‘>‘r 1> +H 

i= 1 i=l 

If we insert this into (4.11), we observe wonderful cancellations due to the fact
that the $u_i(x)$ satisfy the homogeneous equation, and finally obtain, together with
(4.13),

$$ \begin{pmatrix} u_1 & \dots & u_n \\ u_1' & \dots & u_n' \\ \vdots & & \vdots \\ u_1^{(n-1)} & \dots & u_n^{(n-1)} \end{pmatrix} \begin{pmatrix} c_1' \\ c_2' \\ \vdots \\ c_n' \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ f(x)/a_n(x) \end{pmatrix}. \tag{4.14} $$

This is a linear system, whose determinant is called the “Wronskian” and whose
solution yields $c_1'(x), \dots, c_n'(x)$ and after integration $c_1(x), \dots, c_n(x)$.

Much more insight into this formula will be possible in Section 1.11. 
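
As a small illustration of (4.14), the following sketch treats the resonant equation $y'' + y = \cos x$ with $u_1 = \cos x$, $u_2 = \sin x$. The Wronskian is then 1, so that $c_1' = -u_2 f$ and $c_2' = u_1 f$. The choice of example, the function names and the use of Simpson's rule for the integration are our own assumptions and not part of the historical method.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

def particular_solution(x):
    """Particular solution of y'' + y = cos x by variation of constants."""
    f = math.cos                      # right-hand side f(x); here a_2(x) = 1
    u1, u2 = math.cos, math.sin       # solutions of the homogeneous equation
    # Wronskian u1*u2' - u2*u1' = cos^2 + sin^2 = 1, so (4.14) gives
    # c1' = -u2*f and c2' = u1*f; we integrate from 0 to x.
    c1 = simpson(lambda s: -u2(s) * f(s), 0.0, x)
    c2 = simpson(lambda s: u1(s) * f(s), 0.0, x)
    return c1 * u1(x) + c2 * u2(x)
```

With the integration constants fixed by $c_1(0) = c_2(0) = 0$, the result agrees with the closed form $x \sin x / 2$, which is the secular term expected in the case of resonance.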


Exercises 


1. Find the solution “huius aequationis differentialis quarti gradus” $a^4 y^{(4)} + y = 0$,
$a^4 y^{(4)} - y = 0$; solve the equation “septimi gradus”
$y^{(7)} + y^{(5)} + y^{(4)} + y^{(3)} + y^{(2)} + y = 0$. (Euler 1743, Ex. 4,5,6).

2. Solve by Euler’s technique $y'' - 3y' - 4y = \cos x$ and $y'' + y = \cos x$.

Hint. In the first case the particular solution can be searched for in the form
$E \cos x + F \sin x$. In the second case (which corresponds to a resonance in the
equation) one puts $Ex \cos x + Fx \sin x$ just as in the solution of (4.7).

3. Find the solution of

$$ y'' - 3y' - 4y = g(x), \qquad g(x) = \begin{cases} \cos x & 0 \le x \le \pi/2 \\ 0 & \pi/2 \le x \end{cases} $$

such that $y(0) = y'(0) = 0$,

a) by using the solution of Exercise 2,

b) by the method of Lagrange (variation of constants).

4. (Reduction of the order if one solution is known). Suppose that a nonzero
solution $u_1(x)$ of $y'' + a_1(x) y' + a_0(x) y = 0$ is known. Show that a second
independent solution can be found by putting $u_2(x) = c(x) u_1(x)$.

5. Treat the case of multiple characteristic values (4.7) by considering them as a
limiting case $p_2 \to p_1$ and using the solutions

$$ u_1(x) = e^{p_1 x}, \qquad u_2(x) = \lim_{p_2 \to p_1} \frac{e^{p_2 x} - e^{p_1 x}}{p_2 - p_1} = \frac{\partial e^{p_1 x}}{\partial p_1}, \quad \text{etc.} $$

(d’Alembert (1748), p. 284: “Enfin, si les valeurs de $p$ & de $p'$ sont égales,
au lieu de les supposer telles, on supposera $p = a + \alpha$, $p' = a - \alpha$, $\alpha$ étant
quantité infiniment petite ...”).




1.5 Equations with Weak Singularities 


Der Mathematiker weiss sich ohnedies beim Auftreten von singulären
Stellen gegebenenfalls leicht zu helfen. (K. Heun 1900)


Many equations occurring in applications possess singularities, i.e., points at which
the function $f(x,y)$ of the differential equation becomes infinite. We study in some
detail the classical treatment of such equations, since numerical methods, which
will be discussed later in this book, often fail at the singular point, at least if they
are not applied carefully.


Linear Equations 


As a first example, consider the equation

$$ y' = \frac{q + bx}{x}\, y, \qquad q \neq 0 \tag{5.1} $$

with a singularity at $x = 0$. Its solution, using the method of separation of variables
(3.1), is

$$ y(x) = C x^q e^{bx} = C (x^q + b x^{q+1} + \dots). \tag{5.2} $$

These solutions are plotted in Fig. 5.1 for different values of $q$ and show the fundamental
difference in the behaviour of the solutions in dependence of $q$.



Fig. 5.1. Solutions of (5.1) for b = 1

Euler started a systematic study of equations with singularities. He asked
which type of equation of the second order can conveniently be solved by a series
as in (5.2) (Euler 1769, Problema 122, p. 177, “... quas commode per series
resolvere licet ...”). He found the equation

$$ \mathcal{L} y := x^2 (a + bx) y'' + x (c + ex) y' + (f + gx) y = 0. \tag{5.3} $$

Let us put $y = x^q (A_0 + A_1 x + A_2 x^2 + \dots)$ with $A_0 \neq 0$ and insert this into (5.3).
We observe that the powers $x^2$ and $x$ which are multiplied by $y''$ and $y'$, respectively,
just re-establish what has been lost by the differentiations and obtain by
comparing equal powers of $x$

$$ \bigl( q(q-1)a + qc + f \bigr) A_0 = 0 \tag{5.4a} $$

$$ \bigl( (q+i)(q+i-1)a + (q+i)c + f \bigr) A_i = -\bigl( (q+i-1)(q+i-2)b + (q+i-1)e + g \bigr) A_{i-1} \tag{5.4b} $$

for $i = 1, 2, 3, \dots$. In order to get $A_0 \neq 0$, $q$ has to be a root of the index equation

$$ \chi(q) := q(q-1)a + qc + f = 0. \tag{5.5} $$

For $a \neq 0$ there are two characteristic roots $q_1$ and $q_2$ of (5.5). Since the left-hand
side of (5.4b) is of the form $\chi(q+i) A_i = \dots$, this relation allows us to compute
$A_1, A_2, A_3, \dots$ at least for $q_1$ (if the roots are ordered such that $\mathrm{Re}\, q_1 \ge \mathrm{Re}\, q_2$).
Thus we have obtained a first non-zero solution of (5.3). A second linearly independent
solution for $q = q_2$ is obtained in the same way if $q_1 - q_2$ is not an integer.

Case of double roots. Euler found a second solution in this case with the inspiration
of some acrobatic heuristics (Euler 1769, p. 150: “... quod aequivaleat
ipsi Ixx ...”). Fuchs (1866, 1868) then wrote a monumental paper on the form
of all solutions for the general equation of order $n$, based on complicated calculations.
A very elegant idea was then found by Frobenius (1873): fix $A_0$, say as
$A_0(q) = 1$, completely ignore the index equation, choose $q$ arbitrarily and consider
the coefficients of the recursion (5.4b) as functions of $q$ to obtain the series

$$ y(x,q) = x^q \sum_{i=0}^{\infty} A_i(q) x^i, \tag{5.6} $$

whose convergence is discussed in Exercise 8 below. Since all conditions (5.4b)
are satisfied, with the exception of (5.4a), we have

$$ \mathcal{L} y(x,q) = \chi(q) x^q. \tag{5.7} $$

A second independent solution is now found simply by differentiating (5.7) with
respect to $q$:

$$ \mathcal{L} \frac{\partial y}{\partial q}(x,q) = \chi(q) \cdot \log x \cdot x^q + \chi'(q) \cdot x^q. \tag{5.8} $$

If we set $q = q_1$

$$ \frac{\partial y}{\partial q}(x, q_1) = \log x \cdot y(x, q_1) + x^{q_1} \sum_{i=0}^{\infty} A_i'(q_1) x^i, \tag{5.9} $$

we obtain the desired second solution since $\chi(q_1) = \chi'(q_1) = 0$ (remember that $q_1$
is a double root of $\chi$).

The case $q_1 - q_2 = m \in \mathbb{Z}$, $m \ge 1$. In this case we define a function $z(x)$ by
a series of the form (5.6) satisfying $A_0(q) = 1$ and the recursion (5.4b) for all $i$
with the exception of $i = m$. Then

$$ \mathcal{L} z = \chi(q) x^q + C x^{q+m} \tag{5.10} $$

where $C$ is some constant. For $q = q_2$ the first term in (5.10) vanishes and a
comparison with (5.8) shows that

$$ \chi'(q_1) z(x) - C \frac{\partial y}{\partial q}(x, q_1) \tag{5.11} $$

is the required second solution of (5.3).


Euler (1778) later remarked that the formulas obtained become particularly
elegant, if one starts from the differential equation

$$ x(1 - x) y'' + (c - (a + b + 1)x) y' - ab\, y = 0 \tag{5.12} $$

instead of from (5.3). Here, the above method leads to

$$ q(q - 1) + cq = 0, \qquad q_1 = 0, \quad q_2 = 1 - c, \tag{5.13} $$

$$ A_{i+1} = \frac{(a + i)(b + i)}{(c + i)(1 + i)}\, A_i \qquad \text{for } q_1 = 0. \tag{5.14} $$

The resulting solutions, later named hypergeometric functions, became particularly
famous throughout the 19th century with the work of Gauss (1812).
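
The passage from the general recursion (5.4b) to the hypergeometric recursion (5.14) can be checked mechanically. The sketch below evaluates (5.4b) in exact rational arithmetic for equation (5.12) multiplied by $x$, which has the form (5.3). The function name and the names `alpha`, `beta`, `gamma` (standing for the parameters $a$, $b$, $c$ of (5.12), to avoid a clash with the coefficients of (5.3)) are our own, and the parameter values are arbitrary.

```python
from fractions import Fraction

def frobenius_coefficients(a, b, c, e, f, g, q, n_terms):
    """Coefficients A_0 = 1, A_1, ... of y = x^q * sum A_i x^i for equation (5.3),
    computed from the recursion (5.4b)."""
    def chi(s):                       # index polynomial (5.5)
        return s * (s - 1) * a + s * c + f
    A = [Fraction(1)]
    for i in range(1, n_terms):
        rhs = -((q + i - 1) * (q + i - 2) * b + (q + i - 1) * e + g) * A[-1]
        A.append(rhs / chi(q + i))
    return A

# Equation (5.12) times x reads x^2(1 - x)y'' + x(gamma - (alpha+beta+1)x)y' - alpha*beta*x*y = 0.
alpha, beta, gamma = Fraction(1, 2), Fraction(3), Fraction(5, 2)
A = frobenius_coefficients(a=1, b=-1, c=gamma, e=-(alpha + beta + 1),
                           f=0, g=-alpha * beta, q=0, n_terms=6)
```

For the root $q_1 = 0$ the list `A` then satisfies (5.14) term by term. The sketch fails with a division by zero exactly when $\chi(q+i)$ vanishes for some $i \ge 1$, which is the integer-difference case discussed above.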

More generally, the above method works in the case of a differential equation

$$ x^2 y'' + x a(x) y' + b(x) y = 0 \tag{5.15} $$

where $a(x)$ and $b(x)$ are regular analytic functions. One then says that 0 is a
regular singular point. Similarly, we say that the equation
$(x - x_0)^2 y'' + (x - x_0) a(x) y' + b(x) y = 0$ possesses the regular singular point $x_0$.
In this case solutions can be obtained by the use of algebraic singularities $(x - x_0)^q$.

Finally, we also want to study the behaviour at infinity for an equation of the
form

$$ a(x) y'' + b(x) y' + c(x) y = 0. \tag{5.16a} $$

For this, we use the coordinate transformation $t = 1/x$, $z(t) = y(x)$ which yields

$$ t^4 a\Bigl(\frac{1}{t}\Bigr) z'' + \Bigl( 2t^3 a\Bigl(\frac{1}{t}\Bigr) - t^2 b\Bigl(\frac{1}{t}\Bigr) \Bigr) z' + c\Bigl(\frac{1}{t}\Bigr) z = 0. \tag{5.16b} $$

$\infty$ is called a regular singular point of (5.16a) if 0 is a regular singular point of
(5.16b). For examples see Exercise 9.


Nonlinear Equations 


For nonlinear equations also, the above method sometimes allows one to obtain, if 
not the complete series of the solution, at least a couple of terms. 


EXEMPLUM. Let us see what happens if we try to solve the classical brachystochrone
problem (2.4) by a series. We suppose $h = 0$ and the initial value $y(0) = 0$.
We write the equation as

$$ (y')^2 = \frac{L}{y} - 1 \qquad \text{or} \qquad y (y')^2 + y = L. \tag{5.17} $$

At the initial point $y(0) = 0$, $y'$ becomes infinite and most numerical methods
would fail. We search for a solution of the form $y = A_0 x^q$. This gives in (5.17)
$q^2 A_0^3 x^{3q-2} + A_0 x^q = L$. Due to the initial value we have that $y(x)$ becomes negligible
for small values of $x$. We thus set the first term equal to $L$ and obtain
$3q - 2 = 0$ and $q^2 A_0^3 = L$. So

$$ u(x) = \Bigl( \frac{9 L x^2}{4} \Bigr)^{1/3} \tag{5.18} $$

is a first approximate solution. The idea is now to use (5.18) just to escape from the
initial point with a small $x$, and then to continue the solution with any numerical
step-by-step procedure from the later chapters.

A more refined approximation could be tried in the form $y = A_0 x^q + A_1 x^{q+r}$.
This gives with (5.17)

$$ q^2 A_0^3 x^{3q-2} + q(3q + 2r) A_0^2 A_1 x^{3q+r-2} + A_0 x^q + \dots = L. $$

We use the second term to neutralize the third one, which gives $3q + r - 2 = q$ or
$r = q = 2/3$ and $5 q^2 A_0 A_1 = -1$. Therefore

$$ v(x) = \Bigl( \frac{9L}{4} \Bigr)^{1/3} x^{2/3} - \frac{9}{20} \Bigl( \frac{4}{9L} \Bigr)^{1/3} x^{4/3} \tag{5.19} $$

is a better approximation. The following numerical results illustrate the utility of
the approximations (5.18) and (5.19) compared with the correct solution $y(x)$ from
1.3, Exercise 6, with $L = 2$:

x = 0.10 y(x) = 0.342839 u(x) = 0.355689 v(x) = 0.343038 
x = 0.01 y(x) = 0.076042 u(x) = 0.076631 v(x) = 0.076044. 
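
The table above can be reproduced with a few lines of code. In the sketch below, `u` and `v` follow from the coefficient conditions $q^2 A_0^3 = L$ and $5 q^2 A_0 A_1 = -1$ with $q = 2/3$. For the comparison we assume that the correct solution is the cycloid $x = \frac{L}{2}(t - \sin t)$, $y = \frac{L}{2}(1 - \cos t)$, which satisfies (5.17) with $y(0) = 0$; the bisection used to invert $x(t)$ is our own choice.

```python
import math

L = 2.0
A0 = (9 * L / 4) ** (1 / 3)           # from q = 2/3 and q^2 * A0^3 = L
A1 = -9 / (20 * A0)                   # from 5 * q^2 * A0 * A1 = -1

def u(x):                             # first approximation (5.18)
    return A0 * x ** (2 / 3)

def v(x):                             # refined approximation (5.19)
    return A0 * x ** (2 / 3) + A1 * x ** (4 / 3)

def y_exact(x):
    """Cycloid solution of y*(y')^2 + y = L with y(0) = 0, in the parametric
    form x = (L/2)(t - sin t), y = (L/2)(1 - cos t); t is found by bisection."""
    lo, hi = 0.0, math.pi
    for _ in range(100):
        mid = (lo + hi) / 2
        if (L / 2) * (mid - math.sin(mid)) < x:
            lo = mid
        else:
            hi = mid
    return (L / 2) * (1 - math.cos(lo))

for x in (0.10, 0.01):
    print(f"x = {x:.2f}  y = {y_exact(x):.6f}  u = {u(x):.6f}  v = {v(x):.6f}")
```

The bisection is valid only on the first arch of the cycloid, i.e., for $0 \le x \le L\pi/2$.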
Exercises


1. Compute the general solution of the equation $x^2 y'' + x y' + g x^n y = 0$ with $g$
constant (Euler 1769, Problema 123, Exemplum 1).

2. Apply the technique of Euler to the Bessel equation

$$ x^2 y'' + x y' + (x^2 - g^2) y = 0. $$

Sketch the solutions obtained for $g = 2/3$ and $g = 10/3$.

3. Compute the solutions of the equations

$$ x^2 y'' - 2x y' + y = 0 \qquad \text{and} \qquad x^2 y'' - 3x y' + 4y = 0. $$

Equations of this type are often called Euler’s or even Cauchy’s equation. Its
solution, however, was already known to Joh. Bernoulli.


4. (Euler 1769, Probl. 123, Exempl. 2). Let

$$ y(x) = \int_0^{2\pi} \sqrt{\sin^2 s + x^2 \cos^2 s}\; ds $$

be the perimeter of the ellipse with axes 1 and $x < 1$.

a) Verify that $y(x)$ satisfies the differential equation

$$ x(1 - x^2) y'' - (1 + x^2) y' + xy = 0. \tag{5.20} $$

b) Compute the solutions of this equation.

c) Show that the coordinate change $x^2 = t$, $y(x) = z(t)$ transforms (5.20) to
a hypergeometric equation (5.12).

Hint. The computations for a) lead to the integral

$$ \int_0^{2\pi} \frac{1 - 2\cos^2 s + q^2 \cos^4 s}{(1 - q^2 \cos^2 s)^{3/2}}\; ds, \qquad q^2 = 1 - x^2, $$

which must be shown to be zero. Develop this into a power series in $q^2$.


5. Try to solve the equation

$$ x^2 y'' + (3x - 1) y' + y = 0 $$

with the help of a series (5.6) and study its convergence.


6. Find a series of the type

$$ y = A_0 x^q + A_1 x^{q+s} + A_2 x^{q+2s} + \dots $$

which solves the nonlinear “Emden-Fowler equation” of astrophysics
$(x^2 y')' + y^2 x^{-1/2} = 0$ in the neighbourhood of $x = 0$.

7. Approximate the solution of Leibniz’s equation (2.3) in the neighbourhood of
the singular initial value $y(0) = a$ by a function of the type $y(x) = a - C x^q$.
Compare the result with the correct solution of Exercise 2 of 1.3.

8. Show that the radius of convergence of series (5.6) is given by

i) $r = |a/b|$ ii) $r = 1$

for the coefficients given by (5.4) and (5.14), respectively.

9. Show that the point $\infty$ is a regular singular point for the hypergeometric equation
(5.12), but not for the Bessel equation of Exercise 2.

10. Consider the initial value problem

$$ y' = \frac{\lambda}{x}\, y + g(x), \qquad y(0) = 0. \tag{5.21} $$

a) Prove that if $\lambda \le 0$, the problem (5.21) possesses a unique solution for
$x \ge 0$;

b) If $g(x)$ is $k$-times differentiable and $\lambda \le 0$, then the solution $y(x)$ is
$(k+1)$-times differentiable for $x \ge 0$ and we have

$$ y^{(j)}(0) = \Bigl( 1 - \frac{\lambda}{j} \Bigr)^{-1} g^{(j-1)}(0), \qquad j = 1, 2, \dots. $$



1.6 Systems of Equations


En général on peut supposer que l’Equation différentio-différentielle de
la Courbe ADE est $\varphi\, dt^2 = \pm\, dde$ ... (d’Alembert 1743, p. 16)

Parmi tant de chefs-d’oeuvre que l’on doit à son génie [de Lagrange], sa
Mécanique est sans contredit le plus grand, le plus remarquable et le plus
important. (M. Delambre, Oeuvres de Lagrange, vol. 1, p. XLIX)


Newton (1687) distilled from the known solutions of planetary motion (the Kepler
laws) his “Lex secunda” together with the universal law of gravitation. It was
mainly the “Dynamique” of d’Alembert (1743) which introduced, the other way
round, second order differential equations as a general tool for computing mechanical
motion. Thus, Euler (1747) studied the movement of planets via the equations
in 3-space

$$ m \frac{d^2 x}{dt^2} = X, \qquad m \frac{d^2 y}{dt^2} = Y, \qquad m \frac{d^2 z}{dt^2} = Z, \tag{6.1} $$

where $X, Y, Z$ are the forces in the three directions. (“... & par ce moyen j’évite
quantité de recherches pénibles”).


The Vibrating String and Propagation of Sound 

Suppose a string is represented by a sequence of identical and equidistant mass
points and denote by $y_1(t), y_2(t), \dots$ the deviation of these mass points from
the equilibrium position (Fig. 6.1a). If the deviations are supposed small (“fort
petites”), the repelling force for the $i$-th mass point is proportional to
$-y_{i-1} + 2y_i - y_{i+1}$ (Brook Taylor 1715, Johann Bernoulli 1727). Therefore equations (6.1)
become

$$ \begin{aligned} y_1'' &= K^2 (-2y_1 + y_2) \\ y_2'' &= K^2 (y_1 - 2y_2 + y_3) \\ &\;\;\vdots \\ y_n'' &= K^2 (y_{n-1} - 2y_n). \end{aligned} \tag{6.2} $$

This is a system of $n$ linear differential equations. Since the finite differences
$y_{i-1} - 2y_i + y_{i+1} \approx c^2\, \partial^2 y / \partial x^2$, equation (6.2) becomes, by the “inverse” method of
lines, the famous partial differential equation (d’Alembert 1747)

$$ \frac{\partial^2 u}{\partial t^2} = a^2 \frac{\partial^2 u}{\partial x^2} $$

for the vibrating string.

The propagation of sound is modelled similarly (Lagrange 1759): we suppose
the medium to be a sequence of mass points and denote by $y_1(t), y_2(t), \dots$ their
longitudinal displacements from the equilibrium position (see Fig. 6.1b). Then by
Hooke’s law of elasticity the repelling forces are proportional to the differences
of displacements $(y_{i-1} - y_i) - (y_i - y_{i+1})$. This leads to equations (6.2) again
(“En examinant les équations, ... je me suis bientôt aperçu qu’elles ne différaient
nullement de celles qui appartiennent au problème de chordis vibrantibus ...”).



Another example, treated by Daniel Bernoulli (1732) and by Lagrange (1762,
Nr. 36), is that of mass points attached to a hanging string (Fig. 6.1c). Here the
tension in the string becomes greater in the upper part of the string and we have the
following equations of movement

$$ \begin{aligned} y_1'' &= K^2 (-y_1 + y_2) \\ y_2'' &= K^2 (y_1 - 3y_2 + 2y_3) \\ y_3'' &= K^2 (2y_2 - 5y_3 + 3y_4) \\ &\;\;\vdots \\ y_n'' &= K^2 \bigl( (n-1) y_{n-1} - (2n-1) y_n \bigr). \end{aligned} \tag{6.3} $$

In all these examples, of course, the deviations $y_i$ are supposed to be “infinitely”
small, so that linear models are realistic.

Using a notation which came into use only a century later, we write these equations
in the form

$$ y_i'' = \sum_{j=1}^{n} a_{ij} y_j, \qquad i = 1, \dots, n, \tag{6.4} $$

which is a system of 2nd order linear equations with constant coefficients. Lagrange
solves system (6.4) by putting $y_i = c_i e^{pt}$, which leads to

$$ p^2 c_i = \sum_{j=1}^{n} a_{ij} c_j, \qquad i = 1, \dots, n \tag{6.5} $$

so that $p^2$ must be an eigenvalue of the matrix $A = (a_{ij})$ and $c = (c_1, \dots, c_n)^T$
a corresponding eigenvector. We see here the first appearance of an eigenvalue
problem.

Lagrange (1762, Nr. 30) then explains that the equations (6.5) are solved by
computing $c_2/c_1, \dots, c_n/c_1$ as functions of $p$ from $n - 1$ equations and by inserting
these results into the last equation. This leads to a polynomial of degree $n$
(in fact, the characteristic polynomial) to obtain $n$ different roots for $p^2$. We thus
get $2n$ solutions $y_i^{(j)} = c_i^{(j)} \exp(\pm p_j t)$ and the general solution as linear combinations
of these.

A complication arises when the characteristic polynomial possesses multiple
roots. In this case, Lagrange (in his famous “Mécanique Analytique” of 1788,
seconde partie, sect. VI, No. 7) affirms the presence of “secular” terms similar to
the formulas following (4.8). This, however, is not completely true, as became
clear only a century later (see e.g., Weierstrass (1858), p. 243: “... um bei dieser
Gelegenheit einen Irrtum zu berichtigen, der sich in der Lagrange’schen Theorie
der kleinen Schwingungen, sowie in allen späteren mir bekannten Darstellungen
derselben, findet”). We therefore postpone this subject to Section 1.12.

We solve equations (6.2) in detail, since the results obtained are of particular
importance (Lagrange 1759). The corresponding eigenvalue problem (6.5)
becomes in this case $p^2 c_1 = K^2 (-2c_1 + c_2)$, $p^2 c_i = K^2 (c_{i-1} - 2c_i + c_{i+1})$ for
$i = 2, \dots, n - 1$ and $p^2 c_n = K^2 (c_{n-1} - 2c_n)$. We introduce $p^2/K^2 + 2 = q$, so
that

$$ c_{j+1} - q c_j + c_{j-1} = 0, \qquad c_0 = 0, \quad c_{n+1} = 0. \tag{6.6} $$

This means that the $c_j$ are the solutions of a difference equation and therefore
$c_j = A a^j + B b^j$ where $a$ and $b$ are the roots of the corresponding characteristic
equation $z^2 - qz + 1 = 0$, hence

$$ a + b = q, \qquad ab = 1. $$

The condition $c_0 = 0$ of (6.6), which means that $A + B = 0$, shows that $c_j = A(a^j - b^j)$
with $A \neq 0$. The second condition $c_{n+1} = 0$, or equivalently $(a/b)^{n+1} = 1$,
implies together with $ab = 1$ that

$$ a = \exp\Bigl( \frac{k\pi i}{n+1} \Bigr), \qquad b = \exp\Bigl( \frac{-k\pi i}{n+1} \Bigr) $$

for some $k = 1, \dots, n$. Thus we obtain

$$ q_k = 2 \cos\frac{\pi k}{n+1} \tag{6.7a} $$

$$ p_k^2 = 2K^2 \Bigl( \cos\frac{\pi k}{n+1} - 1 \Bigr) = -4K^2 \Bigl( \sin\frac{\pi k}{2n+2} \Bigr)^2. \tag{6.7b} $$

Finally, Euler’s formula from 1740, $e^{ix} - e^{-ix} = 2i \sin x$ (“... si familière
aujourd’hui aux Géomètres”) gives for the eigenvectors (with $A = -i/2$)

$$ c_j^{(k)} = \sin\frac{jk\pi}{n+1}, \qquad j, k = 1, \dots, n. \tag{6.8} $$


Since the $p_k$ are purely imaginary, we also use for $\exp(\pm p_k t)$ the “familière”
formula and obtain the general solution

$$ y_j(t) = \sum_{k=1}^{n} \sin\frac{jk\pi}{n+1}\, (a_k \cos r_k t + b_k \sin r_k t), \qquad r_k = 2K \sin\frac{\pi k}{2n+2}. \tag{6.9} $$
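
A quick numerical check of (6.7b) and (6.8) is sketched below: for each $k$ we insert the vector $c_j = \sin(jk\pi/(n+1))$ into the eigenvalue problem of system (6.2) and measure the largest residual. The function name and the values of $n$ and $K$ are arbitrary choices made for this illustration.

```python
import math

def check_eigenpair(n, k, K=1.0):
    """Largest residual of K^2 (c_{j-1} - 2 c_j + c_{j+1}) - p_k^2 c_j, j = 1..n,
    for the eigenvector (6.8) and the eigenvalue (6.7b) of system (6.2)."""
    c = [math.sin(j * k * math.pi / (n + 1)) for j in range(n + 2)]  # c_0 .. c_{n+1}
    p2 = -4 * K**2 * math.sin(math.pi * k / (2 * n + 2)) ** 2
    return max(abs(K**2 * (c[j - 1] - 2 * c[j] + c[j + 1]) - p2 * c[j])
               for j in range(1, n + 1))

worst = max(check_eigenpair(8, k, K=1.5) for k in range(1, 9))
```

The boundary entries $c_0$ and $c_{n+1}$ vanish (up to rounding) by construction, so the first and last equations of (6.2) need no special treatment, and `worst` is of the size of the rounding errors.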

Lagrange then observed after some lengthy calculations, which are today seen by
using the orthogonality relations

$$ \frac{2}{n+1} \sum_{\ell=1}^{n} \sin\frac{\ell j \pi}{n+1}\, \sin\frac{\ell k \pi}{n+1} = \begin{cases} 0 & j \neq k \\ 1 & j = k \end{cases} \qquad j, k = 1, \dots, n $$

that

$$ a_k = \frac{2}{n+1} \sum_{j=1}^{n} \sin\frac{kj\pi}{n+1}\, y_j(0), \qquad b_k = \frac{1}{r_k} \frac{2}{n+1} \sum_{j=1}^{n} \sin\frac{kj\pi}{n+1}\, y_j'(0) $$

are determined by the initial positions and velocities of the mass points. He also
studied the case where $n$, the number of mass points, tends to infinity (so that, in
the formula for $r_k$, $\sin x$ can be replaced by $x$) and stood, 50 years before Fourier,
at the portal of Fourier series theory. “Mit welcher Gewandtheit, mit welchem
Aufwande analytischer Kunstgriffe er auch den ersten Theil dieser Untersuchung
durchführte, so liess der Uebergang vom Endlichen zum Unendlichen doch viel zu
wünschen übrig...” (Riemann 1854).
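
The way the coefficients $a_k$ are recovered from the initial positions can be sketched as follows. The initial data are invented for the illustration, the string is assumed to start at rest (all $b_k = 0$), and the function names are our own.

```python
import math

def sine_coefficients(values):
    """a_k = 2/(n+1) * sum_j sin(k j pi/(n+1)) * values[j-1], k = 1..n."""
    n = len(values)
    return [2 / (n + 1) * sum(math.sin(k * j * math.pi / (n + 1)) * values[j - 1]
                              for j in range(1, n + 1))
            for k in range(1, n + 1)]

def positions(a, b, K, t):
    """General solution (6.9) at time t for given coefficients a_k, b_k."""
    n = len(a)
    out = []
    for j in range(1, n + 1):
        s = 0.0
        for k in range(1, n + 1):
            r = 2 * K * math.sin(math.pi * k / (2 * n + 2))
            s += math.sin(j * k * math.pi / (n + 1)) * (
                a[k - 1] * math.cos(r * t) + b[k - 1] * math.sin(r * t))
        out.append(s)
    return out

y0 = [0.0, 0.3, 1.0, 0.3, 0.0, -0.2]      # arbitrary initial positions
a = sine_coefficients(y0)
b = [0.0] * len(y0)                        # string initially at rest
```

By the orthogonality relations, `positions(a, b, K, 0.0)` returns the initial data `y0` again, up to rounding errors.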


Fourier 


J’ajouterai que le livre de Fourier a une importance capitale dans
l’histoire des mathématiques. (H. Poincaré 1893)

The first first-order systems were motivated by the problem of heat conduction (Biot
1804, Fourier 1807). Fourier imagined a rod to be a sequence of molecules, whose
temperatures we denote by $y_i$, and deduced from a law of Newton that the energy
which a particle passes to its neighbours is proportional to the difference of their
temperatures, i.e., $y_{i-1} - y_i$ to the left and $y_{i+1} - y_i$ to the right (“Lorsque deux
molécules d’un même solide sont extrêmement voisines et ont des températures
inégales, la molécule plus échauffée communique à celle qui l’est moins une quantité
de chaleur exactement exprimée par le produit formé de la durée de l’instant,
de la différence extrêmement petite des températures, et d’une certaine fonction de
la distance des molécules”). This long sentence means, in formulas, that the total
gain of energy of the $i$th molecule is expressed by

$$ y_i' = K^2 (y_{i-1} - 2y_i + y_{i+1}), \tag{6.10} $$

or, in general by

$$ y_i' = \sum_{j=1}^{n} a_{ij} y_j, \qquad i = 1, \dots, n, \tag{6.11} $$

a first order system with constant coefficients.

By putting $y_i = c_i e^{pt}$, we now obtain the eigenvalue problem

$$ p\, c_i = \sum_{j=1}^{n} a_{ij} c_j, \qquad i = 1, \dots, n. \tag{6.12} $$

If we suppose the rod cooled to zero at both ends ($y_0 = y_{n+1} = 0$), we can use
Lagrange’s eigenvectors from above and obtain the solution

$$ y_j(t) = \sum_{k=1}^{n} a_k \sin\frac{jk\pi}{n+1}\, \exp(-w_k t), \qquad w_k = 4K^2 \Bigl( \sin\frac{\pi k}{2n+2} \Bigr)^2. \tag{6.13} $$


By taking $n$ larger and larger, Fourier arrived from (6.10) (again the inverse
“method of lines”) at his famous heat equation

$$ \frac{\partial u}{\partial t} = a^2 \frac{\partial^2 u}{\partial x^2} \tag{6.14} $$

which was the origin of Fourier series theory.
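
The closed-form solution (6.13) can be compared with a direct numerical integration of system (6.10). The sketch below uses the classical fourth-order Runge-Kutta scheme, which is our own choice of integrator here; the values of $n$, $K$ and the coefficients $a_k$ are arbitrary.

```python
import math

def heat_rhs(y, K):
    """Right-hand side of (6.10) with y_0 = y_{n+1} = 0."""
    n = len(y)
    ext = [0.0] + list(y) + [0.0]
    return [K**2 * (ext[i - 1] - 2 * ext[i] + ext[i + 1]) for i in range(1, n + 1)]

def rk4(y, K, t_end, steps):
    """Classical Runge-Kutta method of order 4 with constant step size."""
    h = t_end / steps
    for _ in range(steps):
        k1 = heat_rhs(y, K)
        k2 = heat_rhs([yi + h / 2 * ki for yi, ki in zip(y, k1)], K)
        k3 = heat_rhs([yi + h / 2 * ki for yi, ki in zip(y, k2)], K)
        k4 = heat_rhs([yi + h * ki for yi, ki in zip(y, k3)], K)
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

def exact(a, K, t):
    """Formula (6.13) for given coefficients a_1, ..., a_n."""
    n = len(a)
    return [sum(a[k - 1] * math.sin(j * k * math.pi / (n + 1))
                * math.exp(-4 * K**2 * math.sin(math.pi * k / (2 * n + 2)) ** 2 * t)
                for k in range(1, n + 1))
            for j in range(1, n + 1)]

n, K = 5, 1.0
a = [1.0, 0.0, 0.5, 0.0, 0.0]
y_num = rk4(exact(a, K, 0.0), K, t_end=1.0, steps=200)
err = max(abs(p - q) for p, q in zip(y_num, exact(a, K, 1.0)))
```

Here `err` measures the difference between the two solutions at $t = 1$; it is small because each mode of (6.13) decays exactly with the rate $w_k$ given by the eigenvalues of the matrix.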


Lagrangian Mechanics 

Dies ist der kühne Weg, den Lagrange ..., freilich ohne ihn
gehörig zu rechtfertigen, eingeschlagen hat.

(Jacobi 1842/43, Vorl. Dynamik, p. 13)

Lagrangian mechanics combines d’Alembert’s dynamics, the “principle of least action”
of Leibniz-Maupertuis and the variational calculus; it was published in the monumental
treatise “Mécanique Analytique” (1788). It furnishes an excellent means for obtaining
the differential equations of motion for complicated mechanical systems (arbitrary
coordinate systems, constraints, etc.).

If we define (with Poisson 1809) the “Lagrange function”

$$ \mathcal{L} = T - U \tag{6.15} $$

where

$$ T = \frac{m}{2} \bigl( \dot x^2 + \dot y^2 + \dot z^2 \bigr) \qquad \text{(kinetic energy)} \tag{6.16} $$

and $U$ is the “potential energy” satisfying

$$ \frac{\partial U}{\partial x} = -X, \qquad \frac{\partial U}{\partial y} = -Y, \qquad \frac{\partial U}{\partial z} = -Z \tag{6.17} $$

then the equations of motion (6.1) are identical to Euler’s equations (2.11) for the
variational problem

$$ \int \mathcal{L}\, dt = \min \tag{6.18} $$

(this, mainly through a misunderstanding of Jacobi, is often called “Hamilton’s
principle”). The important idea is now to forget (6.16) and (6.17) and to apply
(6.15) and (6.18) to arbitrary mass points and arbitrary coordinate systems.


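Example. The spherical pendulum (Lagrange 1788, Seconde partie, Section VIII,
Chap. II, §1). Let $\ell = 1$ and

$$ x = \sin\theta \cos\varphi, \qquad y = \sin\theta \sin\varphi, \qquad z = -\cos\theta. $$

We set $m = g = 1$ and have

$$ T = \tfrac{1}{2} \bigl( \dot x^2 + \dot y^2 + \dot z^2 \bigr) = \tfrac{1}{2} \bigl( \dot\theta^2 + \sin^2\theta \cdot \dot\varphi^2 \bigr), \qquad U = z = -\cos\theta \tag{6.19} $$

so that (2.11) becomes

$$ \begin{aligned} \mathcal{L}_\theta - \frac{d}{dt} \bigl( \mathcal{L}_{\dot\theta} \bigr) &= -\sin\theta + \sin\theta \cos\theta \cdot \dot\varphi^2 - \ddot\theta = 0 \\ \mathcal{L}_\varphi - \frac{d}{dt} \bigl( \mathcal{L}_{\dot\varphi} \bigr) &= -\sin^2\theta \cdot \ddot\varphi - 2 \sin\theta \cos\theta \cdot \dot\varphi \cdot \dot\theta = 0. \end{aligned} \tag{6.20} $$

We have thus obtained, by simple calculus, the equations of motion for the problem.
These equations cannot be solved analytically. A solution, computed numerically
by a Runge-Kutta method (see Chapter II) is shown in Fig. 6.2.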
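
A minimal sketch of such a computation is given below. It solves (6.20) for $\ddot\theta$ and $\ddot\varphi$ and applies the classical fourth-order Runge-Kutta scheme with constant step size; the particular scheme and the step size are our assumptions and need not coincide with the method used for Fig. 6.2. The initial values are those stated in the caption of Fig. 6.2.

```python
import math

def rhs(s):
    """Equations (6.20) solved for the second derivatives;
    s = (theta, phi, dtheta, dphi)."""
    th, ph, dth, dph = s
    ddth = math.sin(th) * math.cos(th) * dph**2 - math.sin(th)
    ddph = -2 * math.cos(th) * dph * dth / math.sin(th)
    return (dth, dph, ddth, ddph)

def rk4_step(s, h):
    def shifted(base, k, c):
        return tuple(b + c * ki for b, ki in zip(base, k))
    k1 = rhs(s)
    k2 = rhs(shifted(s, k1, h / 2))
    k3 = rhs(shifted(s, k2, h / 2))
    k4 = rhs(shifted(s, k3, h))
    return tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def energy(s):                        # T + U from (6.19)
    th, _, dth, dph = s
    return 0.5 * (dth**2 + math.sin(th)**2 * dph**2) - math.cos(th)

def angular_momentum(s):              # sin^2(theta) * dphi
    th, _, _, dph = s
    return math.sin(th)**2 * dph

s0 = (1.0, 0.0, 0.0, 0.17)           # theta_0, phi_0, dtheta_0, dphi_0
s = s0
for _ in range(10000):               # integrate to t = 20 with h = 0.002
    s = rk4_step(s, 0.002)
```

The quantities `energy` and `angular_momentum` are invariants of the exact motion, so their drift along the numerical solution gives a simple indication of its accuracy. Note that the division by $\sin\theta$ makes the formulation unusable if the pendulum passes through the vertical axis, which does not happen for these initial values.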


In general, suppose that the mechanical system in question is described by $n$
coordinates $q_1, q_2, \dots, q_n$ and that $\mathcal{L} = T - U$ depends on $q_1, q_2, \dots, q_n$,
$\dot q_1, \dots, \dot q_n$. Then the equations of motion are

$$ \frac{d}{dt} \mathcal{L}_{\dot q_i} = \sum_{k=1}^{n} \mathcal{L}_{\dot q_i \dot q_k} \ddot q_k + \sum_{k=1}^{n} \mathcal{L}_{\dot q_i q_k} \dot q_k = \mathcal{L}_{q_i}, \qquad i = 1, \dots, n. \tag{6.21} $$

These equations allow several generalizations to time-dependent systems and
non-conventional forces.

Fig. 6.2. Solution of the spherical pendulum, a) $0 \le x \le 20$, b) $0 \le x \le 100$
($\varphi_0 = 0$, $\dot\varphi_0 = 0.17$, $\theta_0 = 1$, $\dot\theta_0 = 0$)


Hamiltonian Mechanics 


Nach dem Erscheinen der ersten Ausgabe der Mécanique analytique
wurde der wichtigste Fortschritt in der Umformung der
Differentialgleichungen der Bewegung von Poisson ... gemacht ...
im 15ten Hefte des polytechnischen Journals ... Hier führt Poisson
die Grössen p = dT/dq ... ein.

(Jacobi 1842/43, Vorl. Dynamik, p. 67)


Hamilton, having worked for many years with variational principles (Fermat’s principle)
in his researches on optics, discovered at once that his ideas, after introducing
a “principal function”, allowed very elegant solutions for Kepler’s motion of
a planet (Hamilton 1833). He then undertook in several papers (Hamilton 1834,
1835) to revolutionize mechanics. After many pages of computation he thereby discovered
that it was “more convenient in many respects” (Hamilton 1834, Math. Papers
II, p. 161) to work with the momentum coordinates (idea of Poisson)

$$ p_i = \frac{\partial \mathcal{L}}{\partial \dot q_i} \tag{6.22} $$

instead of $\dot q_i$, and with the function

$$ H = \sum_{k=1}^{n} \dot q_k p_k - \mathcal{L} \tag{6.23} $$

considered as function of $q_1, \dots, q_n$, $p_1, \dots, p_n$. This idea, to let derivatives
$\partial \mathcal{L} / \partial \dot q_i$ and independent variables $p_i$ interchange their parts in order to simplify
differential equations, is due to Legendre (1787). Differentiating (6.23) by the
chain rule, we obtain

$$ \frac{\partial H}{\partial p_i} = \sum_{k=1}^{n} \frac{\partial \dot q_k}{\partial p_i}\, p_k + \dot q_i - \sum_{k=1}^{n} \frac{\partial \mathcal{L}}{\partial \dot q_k} \frac{\partial \dot q_k}{\partial p_i} $$

and

$$ \frac{\partial H}{\partial q_i} = \sum_{k=1}^{n} \frac{\partial \dot q_k}{\partial q_i}\, p_k - \frac{\partial \mathcal{L}}{\partial q_i} - \sum_{k=1}^{n} \frac{\partial \mathcal{L}}{\partial \dot q_k} \frac{\partial \dot q_k}{\partial q_i}. $$

By (6.22) and (6.21) both formulas simplify to

$$ \dot q_i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q_i}, \qquad i = 1, \dots, n. \tag{6.24} $$

These equations are marvellously symmetric “... and to integrate these differential
equations of motion ... is the chief and perhaps ultimately the only problem of
mathematical dynamics” (Hamilton 1835). Jacobi (1843) called them canonical
differential equations.


Remark. If the kinetic energy $T$ is a quadratic function of the velocities $\dot q_i$, Euler’s
identity (Euler 1755, Caput VII, § 224, “... si V fuerit functio homogenea...”)
states that

$$ 2T = \sum_{k=1}^{n} \dot q_k \frac{\partial T}{\partial \dot q_k}. \tag{6.25} $$

If we further assume that the potential energy $U$ is independent of $\dot q_i$, we obtain

$$ H = \sum_{k=1}^{n} \dot q_k p_k - \mathcal{L} = \sum_{k=1}^{n} \dot q_k \frac{\partial T}{\partial \dot q_k} - \mathcal{L} = 2T - \mathcal{L} = T + U. \tag{6.26} $$

This is the total energy of the system.


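Example. The spherical pendulum again. From (6.19) we have

$$ p_\theta = \frac{\partial T}{\partial \dot\theta} = \dot\theta, \qquad p_\varphi = \frac{\partial T}{\partial \dot\varphi} = \sin^2\theta \cdot \dot\varphi \tag{6.27} $$

and, by eliminating the undesired variables $\dot\theta$ and $\dot\varphi$,

$$ H = T + U = \frac{1}{2} \Bigl( p_\theta^2 + \frac{p_\varphi^2}{\sin^2\theta} \Bigr) - \cos\theta. \tag{6.28} $$

Therefore (6.24) becomes

$$ \dot p_\theta = p_\varphi^2\, \frac{\cos\theta}{\sin^3\theta} - \sin\theta, \qquad \dot p_\varphi = 0, \qquad \dot\theta = p_\theta, \qquad \dot\varphi = \frac{p_\varphi}{\sin^2\theta}. \tag{6.29} $$

These equations appear to be a little simpler than Lagrange’s formulas (6.20). For
example, we immediately see that $p_\varphi = \mathrm{Const}$ (Kepler’s second law).


Exercises


1. Verify that, if $u(x)$ is sufficiently differentiable,

$$ \frac{u(x - \delta) - 2u(x) + u(x + \delta)}{\delta^2} = u''(x) + \frac{\delta^2}{12} u^{(4)}(x) + \mathcal{O}(\delta^4). $$

Hint. Use Taylor series expansions for $u(x + \delta)$ and $u(x - \delta)$. This relation
establishes the connection between (6.10) and (6.14) as well as between (6.2)
and the wave equation.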

2. Solve equation (6.3) for n = 2 and n = 3 by using the device of Lagrange 
described above (1762) and discover naturally the characteristic polynomial of 
the matrix. 


3. Solve the first order system (6.11) with initial values $y_i(0) = (-1)^i$, where the
matrix $A$ is the same as in Exercise 2, and draw the solutions. Physically, this
equation would represent a string with weights hanging, say, in honey.


4. Find the first terms of the development at the singular point $x = 0$ of the solutions
of the following system of nonlinear equations

$$ \begin{aligned} x^2 y'' + 2x y' &= 2 y z^2 + \lambda x^2 y (y^2 - 1), & y(0) &= 0 \\ x^2 z'' &= z(z^2 - 1) + x^2 y^2 z, & z(0) &= 1 \end{aligned} \tag{6.30} $$

where $\lambda$ is a constant parameter. Equations (6.30) are the Euler equations for
the variational problem

$$ I = \int_0^{\infty} \Bigl( (z')^2 + \frac{x^2 (y')^2}{2} + \frac{(z^2 - 1)^2}{2x^2} + y^2 z^2 + \frac{\lambda}{4} x^2 (y^2 - 1)^2 \Bigr)\, dx, \qquad y(\infty) = 1, \quad z(\infty) = 0 $$

which gives the mass of a “monopole” in nuclear physics (see ’t Hooft 1974).


5. Prove that the Hamiltonian function $H(q_1, \dots, q_n, p_1, \dots, p_n)$ is a first integral
for the system (6.24), i.e., every solution satisfies

$$ H(q_1(t), \dots, q_n(t), p_1(t), \dots, p_n(t)) = \mathrm{Const}. $$





1.7 A General Existence Theorem 


M. Cauchy annonce, que, pour se conformer au voeu du Conseil,
il ne s’attachera plus à donner, comme il a fait jusqu’à présent, des
démonstrations parfaitement rigoureuses.

(Conseil d’instruction de l’Ecole polytechnique, 24 nov. 1825)

You have all professional deformation of your minds; convergence
does not matter here ... (P. Henrici 1985)


We now enter a new era for our subject, more theoretical than the preceding one. It
was inaugurated by the work of Cauchy, who was not as fascinated by long numerical
calculations as was, say, Euler, but merely a fanatic for perfect mathematical
rigor and exactness. He criticized in the work of his predecessors the use of infinite
series and other infinite processes without taking much account of error estimates
or convergence results. He therefore established around 1820 a convergence theorem
for the polygon method of Euler and, some 15 years later, for the power
series method of Newton (see Section 1.8). Beyond the estimation of errors, these
results also allow the statement of general existence theorems for the solutions of
arbitrary differential equations (“d’une équation différentielle quelconque”), whose
solutions were only known before in a very few cases. A second important consequence
is to provide results about the uniqueness of the solution, which allow one
to conclude that the computed solution (numerically or analytically) is the only one
with the same initial value and that there are no others. Only then are we allowed
to speak of the solution of the problem.

His very first proof has recently been discovered on fragmentary notes (Cauchy
1824), which were never published in Cauchy’s lifetime (did his notes not satisfy
the Minister of education?: “... mais que le second professeur, M. Cauchy, n’a
présenté que des feuilles qui n’ont pu satisfaire la commission, et qu’il a été jusqu’à
présent impossible de l’amener à se rendre au voeu du Conseil et à exécuter la
décision du Ministre”).

Convergence of Euler’s Method 

Let us now, with bared head and trembling knees, follow the ideas of this historical
proof. We formulate it in a way which generalizes directly to higher dimensional
systems.

Starting with the one-dimensional differential equation

$$ y' = f(x,y), \qquad y(x_0) = y_0, \qquad y(X) = \,? \tag{7.1} $$

we make use of the method explained by Euler (1768) in the last section of his
“Institutiones Calculi Integralis I” (Caput VII, p. 424), i.e., we consider a subdivision
of the interval of integration

$$ x_0, x_1, \dots, x_{n-1}, x_n = X \tag{7.2} $$

and replace in each subinterval the solution by the first term of its Taylor series

$$ \begin{aligned} y_1 - y_0 &= (x_1 - x_0) f(x_0, y_0) \\ y_2 - y_1 &= (x_2 - x_1) f(x_1, y_1) \\ &\;\;\vdots \\ y_n - y_{n-1} &= (x_n - x_{n-1}) f(x_{n-1}, y_{n-1}). \end{aligned} \tag{7.3} $$

For the subdivision above we also use the notation

$$ h = (h_0, h_1, \dots, h_{n-1}) $$

where $h_i = x_{i+1} - x_i$. If we connect $y_0$ and $y_1$, $y_1$ and $y_2$, ... etc. by straight lines
we obtain the Euler polygon

$$ y_h(x) = y_i + (x - x_i) f(x_i, y_i) \qquad \text{for } x_i \le x \le x_{i+1}. \tag{7.3a} $$
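
The construction (7.3), (7.3a) translates directly into code. The following sketch builds the Euler polygon for an arbitrary subdivision and evaluates it on the test problem $y' = y$, $y(0) = 1$; the function names and the test problem are our own choices.

```python
import math

def euler_polygon(f, x0, y0, mesh):
    """Return the function y_h(x) of (7.3a) for the subdivision
    x0 < mesh[0] < ... < mesh[-1] = X."""
    xs, ys = [x0], [y0]
    for x_next in mesh:
        ys.append(ys[-1] + (x_next - xs[-1]) * f(xs[-1], ys[-1]))   # one line of (7.3)
        xs.append(x_next)

    def y_h(x):
        # locate the subinterval x_i <= x <= x_{i+1} and follow the straight line
        i = max(k for k in range(len(xs) - 1) if xs[k] <= x)
        return ys[i] + (x - xs[i]) * f(xs[i], ys[i])
    return y_h

f = lambda x, y: y                                   # test equation y' = y, y(0) = 1
coarse = euler_polygon(f, 0.0, 1.0, [k / 10 for k in range(1, 11)])
fine = euler_polygon(f, 0.0, 1.0, [k / 1000 for k in range(1, 1001)])
```

For this equation the polygon with $n$ equal steps takes the value $(1 + 1/n)^n$ at $X = 1$, so that `fine(1.0)` is markedly closer to $e$ than `coarse(1.0)`.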


Lemma 7.1. Assume that $|f|$ is bounded by $A$ on

$$ D = \{ (x,y) \mid x_0 \le x \le X, \ |y - y_0| \le b \}. $$

If $X - x_0 \le b/A$ then the numerical solution $(x_i, y_i)$ given by (7.3), remains in $D$
for every subdivision (7.2) and we have

$$ |y_h(x) - y_0| \le A \cdot |x - x_0|, \tag{7.4} $$

$$ \bigl| y_h(x) - \bigl( y_0 + (x - x_0) f(x_0, y_0) \bigr) \bigr| \le \varepsilon \cdot |x - x_0| \tag{7.5} $$

if $|f(x,y) - f(x_0, y_0)| \le \varepsilon$ on $D$.


Proof. Both inequalities are obtained by adding up the lines of (7.3) and using the
triangle inequality. Formula (7.4) then shows immediately that for $A(x - x_0) \le b$
the polygon remains in $D$. □


Our next problem is to obtain an estimate for the change of $y_h(x)$, when the
initial value $y_0$ is changed: let $z_0$ be another initial value and compute

$$ z_1 - z_0 = (x_1 - x_0) f(x_0, z_0). \tag{7.6} $$

We need an estimate for $|z_1 - y_1|$. Subtracting (7.6) from the first line of (7.3) we
obtain

$$ z_1 - y_1 = z_0 - y_0 + (x_1 - x_0) \bigl( f(x_0, z_0) - f(x_0, y_0) \bigr). $$

This shows that we need an estimate for $f(x_0, z_0) - f(x_0, y_0)$. If we suppose

$$ |f(x,z) - f(x,y)| \le L |z - y| \tag{7.7} $$

we obtain

$$ |z_1 - y_1| \le \bigl( 1 + (x_1 - x_0) L \bigr) |z_0 - y_0|. \tag{7.8} $$


Lemma 7.2. For a fixed subdivision $h$ let $y_h(x)$ and $z_h(x)$ be the Euler polygons
corresponding to the initial values $y_0$ and $z_0$, respectively. If

$$ \Bigl| \frac{\partial f}{\partial y}(x,y) \Bigr| \le L \tag{7.9} $$

in a convex region which contains $(x, y_h(x))$ and $(x, z_h(x))$ for all $x_0 \le x \le X$,
then

$$ |z_h(x) - y_h(x)| \le e^{L(x - x_0)} |z_0 - y_0|. \tag{7.10} $$


Proof. (7.9) implies (7.7), (7.7) implies (7.8), (7.8) implies

$$ |z_1 - y_1| \le e^{L(x_1 - x_0)} |z_0 - y_0|. $$

If we repeat the same argument for $z_2 - y_2$, $z_3 - y_3$, and so on, we finally obtain
(7.10). □
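
The estimate (7.10) can be observed numerically. In the sketch below the right-hand side $f(x,y) = \sin y + x$ is an arbitrary example of our own with $|\partial f/\partial y| \le 1$, so that $L = 1$; the subdivision and the two initial values are also invented for the illustration.

```python
import math

def euler_values(f, x0, y0, mesh):
    """Values y_0, y_1, ..., y_n of (7.3) on the subdivision x0, mesh[0], ..., mesh[-1]."""
    xs, ys = [x0], [y0]
    for x_next in mesh:
        ys.append(ys[-1] + (x_next - xs[-1]) * f(xs[-1], ys[-1]))
        xs.append(x_next)
    return xs, ys

f = lambda x, y: math.sin(y) + x          # |df/dy| = |cos y| <= 1, so L = 1
L = 1.0
mesh = [0.1, 0.25, 0.3, 0.6, 0.7, 1.0, 1.5, 2.0]   # a deliberately non-uniform subdivision
xs, ys = euler_values(f, 0.0, 0.2, mesh)
_, zs = euler_values(f, 0.0, 0.5, mesh)

bound_holds = all(abs(z - y) <= math.exp(L * (x - xs[0])) * abs(zs[0] - ys[0])
                  for x, y, z in zip(xs, ys, zs))
```

Since both polygons are linear between the mesh points and the bound is convex in $x$, it suffices to test (7.10) at the points $x_i$.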


Remark. Condition (7.7) is called a “Lipschitz condition”. It was Lipschitz (1876) who rediscovered the theory (footnote in the paper of Lipschitz: “L’auteur ne connaît pas évidemment les travaux de Cauchy ... ”) and advocated the use of (7.7) instead of the more stringent hypothesis (7.9). Lipschitz’s proof is also explained in the classical work of Picard (1891-96), Vol. II, Chap. XI, Sec. I.


If the subdivision (7.2) is refined more and more, so that

    $|h| := \max_{i=0,\ldots,n-1} h_i \to 0$,

we expect that the Euler polygons converge to a solution of (7.1). Indeed, we have

Theorem 7.3. Let $f(x,y)$ be continuous, and let $|f|$ be bounded by $A$ and satisfy the Lipschitz condition (7.7) on

    $D = \{(x,y) \mid x_0 \le x \le X,\ |y - y_0| \le b\}$.

If $X - x_0 \le b/A$, then we have:

a) For $|h| \to 0$ the Euler polygons $y_h(x)$ converge uniformly to a continuous function $\varphi(x)$.

b) $\varphi(x)$ is continuously differentiable and is a solution of (7.1) on $x_0 \le x \le X$.

c) There exists no other solution of (7.1) on $x_0 \le x \le X$.


Proof. a) Take an $\varepsilon > 0$. Since $f$ is uniformly continuous on the compact set $D$, there exists a $\delta > 0$ such that

    $|u_1 - u_2| \le \delta$   and   $|v_1 - v_2| \le A\cdot\delta$

imply

    $|f(u_1,v_1) - f(u_2,v_2)| \le \varepsilon$.    (7.11)

Suppose now that the subdivision (7.2) satisfies

    $|x_{i+1} - x_i| \le \delta$,   i.e.,   $|h| \le \delta$.    (7.12)

We first study the effect of adding new mesh-points. In a first step, we consider a subdivision $h(1)$, which is obtained by adding new points only to the first subinterval (see Fig. 7.1). It follows from (7.5) (applied to this first subinterval) that for the new refined solution $y_{h(1)}(x_1)$ we have the estimate $|y_{h(1)}(x_1) - y_h(x_1)| \le \varepsilon\,|x_1 - x_0|$. Since the subdivisions $h$ and $h(1)$ are identical on $x_1 \le x \le X$ we can apply Lemma 7.2 to obtain

    $|y_{h(1)}(x) - y_h(x)| \le e^{L(x - x_1)}\,(x_1 - x_0)\,\varepsilon$   for $x_1 \le x \le X$.

We next add further points to the subinterval $(x_1, x_2)$ and denote the new subdivision by $h(2)$. In the same way as above this leads to $|y_{h(2)}(x_2) - y_{h(1)}(x_2)| \le \varepsilon\,|x_2 - x_1|$ and

    $|y_{h(2)}(x) - y_{h(1)}(x)| \le e^{L(x - x_2)}\,(x_2 - x_1)\,\varepsilon$   for $x_2 \le x \le X$.

The entire situation is sketched in Fig. 7.1. If we denote by $\hat h$ the final refinement, we obtain for $x_i < x \le x_{i+1}$

    $|y_{\hat h}(x) - y_h(x)| \le \varepsilon\,\bigl(e^{L(x - x_1)}(x_1 - x_0) + \ldots + e^{L(x - x_i)}(x_i - x_{i-1})\bigr) + \varepsilon\,(x - x_i)$
    $\le \varepsilon \int_{x_0}^{x} e^{L(x - s)}\,ds = \frac{\varepsilon}{L}\,\bigl(e^{L(x - x_0)} - 1\bigr)$.    (7.13)

If we now have two different subdivisions $h$ and $\tilde h$, which both satisfy (7.12), we introduce a third subdivision $\hat h$ which is a refinement of both subdivisions (just as is usually done in proving the existence of Riemann’s integral), and apply (7.13) twice. We then obtain from (7.13) by the triangle inequality

    $|y_h(x) - y_{\tilde h}(x)| \le 2\,\frac{\varepsilon}{L}\,\bigl(e^{L(x - x_0)} - 1\bigr)$.

For $\varepsilon > 0$ small enough, this becomes arbitrarily small and shows the uniform convergence of the Euler polygons to a continuous function $\varphi(x)$.
b) Let

    $\varepsilon(\delta) := \sup\{\,|f(u_1,v_1) - f(u_2,v_2)|\ ;\ |u_1 - u_2| \le \delta,\ |v_1 - v_2| \le A\delta,\ (u_i,v_i) \in D\,\}$

be the modulus of continuity. If $x$ belongs to the subdivision $h$, then we obtain from (7.5) (replace $(x_0,y_0)$ by $(x, y_h(x))$ and $x$ by $x + \delta$)

    $|y_h(x + \delta) - y_h(x) - \delta\,f(x, y_h(x))| \le \varepsilon(\delta)\,\delta$.    (7.14)

Taking the limit $|h| \to 0$ we get

    $|\varphi(x + \delta) - \varphi(x) - \delta\,f(x,\varphi(x))| \le \varepsilon(\delta)\,\delta$.    (7.15)

Since $\varepsilon(\delta) \to 0$ for $\delta \to 0$, this proves the differentiability of $\varphi(x)$ and $\varphi'(x) = f(x,\varphi(x))$.

c) Let $\psi(x)$ be a second solution of (7.1) and suppose that the subdivision $h$ satisfies (7.12). We then denote by $y_h^{(i)}(x)$ the Euler polygon to the initial value $(x_i, \psi(x_i))$ (it is defined for $x_i \le x \le X$). It follows from

    $\psi(x) = \psi(x_i) + \int_{x_i}^{x} f(s, \psi(s))\,ds$

and (7.11) that

    $|\psi(x) - y_h^{(i)}(x)| \le \varepsilon\,|x - x_i|$   for $x_i \le x \le x_{i+1}$.

Using Lemma 7.2 we deduce in the same way as in part a) that

    $|\psi(x) - y_h(x)| \le \frac{\varepsilon}{L}\,\bigl(e^{L(x - x_0)} - 1\bigr)$.    (7.16)

Taking the limits $|h| \to 0$ and $\varepsilon \to 0$ we obtain $|\psi(x) - \varphi(x)| \le 0$, proving uniqueness. □


Theorem 7.3 is a local existence and uniqueness result. However, if we interpret the endpoint of the solution as a new initial value, we can apply Theorem 7.3 again and continue the solution. Repeating this procedure we obtain

Theorem 7.4. Assume $U$ to be an open set in $\mathbb{R}^2$ and let $f$ and $\partial f/\partial y$ be continuous on $U$. Then, for every $(x_0, y_0) \in U$, there exists a unique solution of (7.1), which can be continued up to the boundary of $U$ (in both directions).

Proof. Clearly, Theorem 7.3 can be rewritten to give a local existence and uniqueness result for an interval $(X, x_0)$ to the left of $x_0$. The rest follows from the fact that every point in $U$ has a neighbourhood which satisfies the assumptions of Theorem 7.3. □


It is interesting to mention that formula (7.13) for $|\hat h| \to 0$ gives the following error estimate

    $|y(x) - y_h(x)| \le \frac{\varepsilon}{L}\,\bigl(e^{L(x - x_0)} - 1\bigr)$    (7.17)

for the Euler polygon ($|h| \le \delta$). Here $y(x)$ stands for the exact solution of (7.1). The next theorem refines the above estimates for the case that $f(x,y)$ is also differentiable with respect to $x$.


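Theorem 7.5. Suppose that in a neighbourhood of the solution

    $|f| \le A$,   $\left|\frac{\partial f}{\partial y}\right| \le L$,   $\left|\frac{\partial f}{\partial x}\right| \le M$.

We then have the following error estimate for the Euler polygons:

    $|y(x) - y_h(x)| \le \frac{M + AL}{L}\,\bigl(e^{L(x - x_0)} - 1\bigr)\cdot|h|$,    (7.18)

provided that $|h|$ is sufficiently small.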


Proof. For $|u_1 - u_2| \le |h|$ and $|v_1 - v_2| \le A|h|$ we obtain, due to the differentiability of $f$, the estimate

    $|f(u_1,v_1) - f(u_2,v_2)| \le (M + AL)\,|h|$

instead of (7.11). When we insert this amount for $\varepsilon$ into (7.16), we obtain the stated result. □


The estimate (7.18) shows that the global error of Euler’s method is propor¬ 
tional to the maximal step size \h\ . Thus, for an accuracy of, say, three decimal 
digits, we would need about a thousand steps; a precision of six digits will normally 
require a million steps etc. We see thus that the present method is not recommended 
for computations of high precision. In fact, the main subject of Chapter II will be 
to find methods which converge faster. 
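The first-order behaviour expressed by (7.18) is easy to observe. The sketch below is a minimal illustration, not taken from the text: the test problem $y' = y$, $y(0) = 1$ on $[0,1]$ (exact value $e$) and the step counts are assumptions chosen here.

```python
import math

def euler_final_value(f, x0, y0, X, n):
    """Euler's method with n equal steps on [x0, X]; returns the value at X."""
    h = (X - x0) / n
    x, y = x0, y0
    for i in range(n):
        y += h * f(x, y)
        x = x0 + (i + 1) * h
    return y

# Assumed test problem: y' = y, y(0) = 1, exact solution y(1) = e.
f = lambda x, y: y
errors = {n: abs(math.e - euler_final_value(f, 0.0, 1.0, 1.0, n))
          for n in (10, 100, 1000)}

for n, err in errors.items():
    print(f"n = {n:5d}   |h| = {1.0 / n:.3f}   error = {err:.3e}")
```

Each tenfold reduction of the step size reduces the error by roughly a factor of ten, which is the behaviour described in the paragraph above.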
Existence Theorem of Peano


Si a est un complexe d’ordre n, et b un nombre réel, alors on peut
déterminer b' et f, où b' est une quantité plus grande que b, et
f est un signe de fonction qui à chaque nombre de l’intervalle de
b à b' fait correspondre un complexe (en d’autres mots, ft est un
complexe fonction de la variable réelle t, définie pour toutes les
valeurs de l’intervalle (b, b')); la valeur de ft pour t = b est a; et
dans tout l’intervalle (b, b') cette fonction ft satisfait à l’équation
différentielle donnée. (Original version of Peano’s Theorem)

The Lipschitz condition (7.7) is a crucial tool in the proof of (7.10) and finally 
of the Convergence Theorem. If we completely abandon condition (7.7) and only 
require that f(x,y) be continuous, the convergence of the Euler polygons is no 
longer guaranteed. 

An example, plotted in Fig. 7.2, is given by the equation

    $y' = 4\Bigl(\mathrm{sign}(y)\sqrt{|y|} + \max\bigl(0,\ x - \tfrac{|y|}{x}\bigr)\cdot\cos\bigl(\tfrac{\pi \log x}{\log 2}\bigr)\Bigr)$    (7.19)

with $y(0) = 0$. It has been constructed such that

    $f(h, 0) = 4(-1)^i\,h$   for $h = 2^{-i}$,

    $f(x, y) = 4\,\mathrm{sign}(y)\cdot\sqrt{|y|}$   for $|y| \ge x^2$.

Fig. 7.2. Solution curves and Euler polygons for equation (7.19)

There is an infinity of solutions for this initial value, some of which are plotted in Fig. 7.2. The Euler polygons converge for $h = 2^{-i}$ and even $i$ to the maximal solution $y = 4x^2$, and for odd $i$ to $y = -4x^2$. For other sequences of $h$ all intermediate solutions can be obtained as well.
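The alternating behaviour can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the right-hand side is coded in the form $4\bigl(\mathrm{sign}(y)\sqrt{|y|} + \max(0, x - |y|/x)\cos(\pi \log_2 x)\bigr)$, the value $f(0,0) = 0$ is supplied as continuous extension, and the function names are hypothetical.

```python
import math

def f(x, y):
    """Right-hand side of (7.19); the bump term is set to 0 at x = 0 (continuous extension)."""
    sign_sqrt = math.copysign(math.sqrt(abs(y)), y) if y != 0.0 else 0.0
    if x == 0.0:
        return 4.0 * sign_sqrt
    bump = max(0.0, x - abs(y) / x) * math.cos(math.pi * math.log2(x))
    return 4.0 * (sign_sqrt + bump)

def euler_at_one(i):
    """Euler's method with constant step h = 2^(-i) on [0, 1], starting from y(0) = 0."""
    h = 2.0 ** (-i)
    y = 0.0
    for k in range(2 ** i):
        y += h * f(k * h, y)
    return y

y_even = euler_at_one(10)   # even i: expected near the maximal solution, 4 x^2 = 4 at x = 1
y_odd = euler_at_one(11)    # odd i: expected near -4 x^2 = -4 at x = 1
print(y_even, y_odd)
```

The second Euler step evaluates $f(h, 0) = 4(-1)^i h$, which decides the sign of the whole polygon; afterwards $|y| \ge x^2$ holds and only the term $4\,\mathrm{sign}(y)\sqrt{|y|}$ is active.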


Theorem 7.6 (Peano 1890). Let $f(x,y)$ be continuous and $|f|$ be bounded by $A$ on

    $D = \{(x,y) \mid x_0 \le x \le X,\ |y - y_0| \le b\}$.

If $X - x_0 \le b/A$, then there is a subsequence of the sequence of the Euler polygons which converges to a solution of the differential equation.


The original proof of Peano is, in its crucial part on the convergence result, very brief and not clear to inexperienced readers such as us. Arzelà (1895), who took up the subject again, explains his ideas in more detail and emphasizes the need for an equicontinuity of the sequence. The proof usually given nowadays (for what has become the theorem of Arzelà-Ascoli) was only introduced later (see e.g. Perron (1918), Hahn (1921), p. 303) and is sketched as follows:


Proof. Let

    $v_1(x),\ v_2(x),\ v_3(x),\ \ldots$    (7.20)

be a sequence of Euler polygons for decreasing step sizes. It follows from (7.4) that for fixed $x$ this sequence is bounded. We choose a sequence of numbers $r_1, r_2, r_3, \ldots$ dense in the interval $(x_0, X)$. There is now a subsequence of (7.20) which converges for $x = r_1$ (Bolzano-Weierstrass), say

    $v_1^{(1)}(x),\ v_2^{(1)}(x),\ v_3^{(1)}(x),\ \ldots$    (7.21)

We next select a subsequence of (7.21) which converges for $x = r_2$,

    $v_1^{(2)}(x),\ v_2^{(2)}(x),\ v_3^{(2)}(x),\ \ldots$    (7.22)

and so on. Then take the “diagonal” sequence

    $v_1^{(1)}(x),\ v_2^{(2)}(x),\ v_3^{(3)}(x),\ \ldots$    (7.23)

which, apart from a finite number of terms, is a subsequence of each of these sequences, and thus converges for all $r_i$. Finally, with the estimate

    $|v_n^{(n)}(x) - v_n^{(n)}(r_j)| \le A\,|x - r_j|$

(see (7.4)), which expresses the equicontinuity of the sequence, we obtain

    $|v_n^{(n)}(x) - v_m^{(m)}(x)|$
    $\le |v_n^{(n)}(x) - v_n^{(n)}(r_j)| + |v_n^{(n)}(r_j) - v_m^{(m)}(r_j)| + |v_m^{(m)}(r_j) - v_m^{(m)}(x)|$
    $\le 2A\,|x - r_j| + |v_n^{(n)}(r_j) - v_m^{(m)}(r_j)|$.

For fixed $\varepsilon > 0$ we then choose a finite subset $R$ of $\{r_1, r_2, \ldots\}$ satisfying

    $\min\{\,|x - r_j|\ ;\ r_j \in R\,\} \le \varepsilon/A$   for $x_0 \le x \le X$

and secondly we choose $N$ such that

    $|v_n^{(n)}(r_j) - v_m^{(m)}(r_j)| \le \varepsilon$   for $n, m \ge N$ and $r_j \in R$.

This shows the uniform convergence of (7.23). In the same way as in part b) of the proof of Theorem 7.3 it follows that the limit function is a solution of (7.1). One only has to add an $O(|h|)$-term in (7.14), if $x$ is not a subdivision point. □


Exercises 

1. Apply Euler’s method with constant step size $x_{i+1} - x_i = 1/n$ to the differential equation $y' = ky$, $y(0) = 1$ and obtain a classical approximation for the solution $y(1) = e^k$. Give an estimate of the error.

2. Apply Euler’s method with constant step size to

a) $y' = y^2$,   $y(0) = 1$,   $y(1/2) = ?$

b) $y' = x^2 + y^2$,   $y(0) = 0$,   $y(1/2) = ?$

Make rigorous error estimates using Theorem 7.5 and compare these estimates with the actual errors. The main difficulty is to find a suitable region in which the estimates of Theorem 7.5 hold, without making the constants $A$, $L$, $M$ too large and, at the same time, ensuring that the solution curves remain inside this region (see also 1.8, Exercise 3).

3. Prove the result: if the differential equation $y' = f(x,y)$, $y(x_0) = y_0$ with $f$ continuous, possesses a unique solution, then the Euler polygons converge to this solution.

4. “There is an elementary proof of Peano’s existence theorem” (Walter 1971). Suppose that $A$ is a bound for $|f|$. Then the sequence

    $y_{i+1} = y_i + h\cdot\max\{\,f(x,y) \mid x_i \le x \le x_{i+1},\ y_i - 3Ah \le y \le y_i + Ah\,\}$

converges for all continuous $f$ to a (the maximal) solution. Try to prove this. Unfortunately, this proof does not extend to systems of equations, unless they are “quasimonotone” (see Section 1.10, Exercise 3).
A second approach to existence theory is possible with the help of an iterative refinement of approximate solutions. The first appearances of the idea are very old. For instance many examples of this type can be found in the work of Lagrange, above all in his astronomical calculations. Let us consider here the following illustrative example of a Riccati equation

    $y' = x^2 + y + 0.1\,y^2$,   $y(0) = 0$.    (8.1)

Because of the quadratic term, there is no elementary solution. A very natural idea is therefore to neglect this term, which is in fact very small at the beginning, and to solve for the moment

    $y_1' = x^2 + y_1$,   $y_1(0) = 0$.    (8.2)

This gives, with formula (3.3), a first approximation

    $y_1(x) = 2e^x - (x^2 + 2x + 2)$.    (8.3)

With the help of this solution, we now know more about the initially neglected term $0.1\,y^2$; it will be close to $0.1\,y_1^2$. So the idea lies at hand to reintroduce this solution into (8.1) and solve now the differential equation

    $y_2' = x^2 + y_2 + 0.1\cdot(y_1(x))^2$,   $y_2(0) = 0$.    (8.4)

We can use formula (3.3) again and obtain after some calculations

    $y_2(x) = y_1(x) + \frac{2}{5}\,e^{2x} - \frac{2}{15}\,e^x\,(x^3 + 3x^2 + 6x - 54) - \frac{1}{10}\,(x^4 + 8x^3 + 32x^2 + 72x + 76)$.

This is already much closer to the correct solution, as can be seen from the following comparison of the errors $e_1 = y(x) - y_1(x)$ and $e_2 = y(x) - y_2(x)$:

    x        e_1                  e_2
    0.2      0.228 × 10^{-07}     0.233 × 10^{-12}
    0.4      0.327 × 10^{-05}     0.566 × 10^{-09}
    0.8      0.534 × 10^{-03}     0.165 × 10^{-05}

It looks promising to continue this process, but the computations soon become very tedious.
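The two approximations can be compared numerically with a short Python sketch. The closed forms are those of (8.3) and of the expression for $y_2(x)$; the reference solution is an assumption of this sketch, obtained with the classical fourth-order Runge-Kutta method on a fine grid (a method not yet introduced at this point of the text), and the function names are hypothetical.

```python
import math

def rhs(x, y):
    """Right-hand side of the Riccati equation (8.1)."""
    return x * x + y + 0.1 * y * y

def reference(x_end, n=800):
    """Reference value y(x_end) by classical RK4 with n steps (assumed accurate enough)."""
    h = x_end / n
    x, y = 0.0, 0.0
    for _ in range(n):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h / 2 * k1)
        k3 = rhs(x + h / 2, y + h / 2 * k2)
        k4 = rhs(x + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y

def y1(x):
    """First approximation (8.3)."""
    return 2 * math.exp(x) - (x * x + 2 * x + 2)

def y2(x):
    """Second approximation, solution of (8.4)."""
    return (y1(x) + 0.4 * math.exp(2 * x)
            - (2 / 15) * math.exp(x) * (x**3 + 3 * x**2 + 6 * x - 54)
            - 0.1 * (x**4 + 8 * x**3 + 32 * x**2 + 72 * x + 76))

x = 0.8
e1 = reference(x) - y1(x)
e2 = reference(x) - y2(x)
print(f"x = {x}:  e1 = {e1:.3e}   e2 = {e2:.3e}")
```

For the smaller values of $x$ in the table the error $e_2$ approaches the rounding level of double precision, so the sketch only evaluates $x = 0.8$.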


Picard-Lindelöf Iteration

The general formulation of the method is the following: we try, if possible, to split up the function $f(x,y)$ of the differential equation

    $y' = f(x,y) = f_1(x,y) + f_2(x,y)$,   $y(x_0) = y_0$    (8.5)

so that any differential equation of the form $y' = f_1(x,y) + g(x)$ can be solved analytically and so that $f_2(x,y)$ is small. Then we start with a first approximation $y_0(x)$ and compute successively $y_1(x), y_2(x), \ldots$ by solving

    $y_{i+1}' = f_1(x, y_{i+1}) + f_2(x, y_i(x))$,   $y_{i+1}(x_0) = y_0$.    (8.6)

The most primitive form of this process is obtained by choosing $f_1 = 0$, $f_2 = f$, in which case (8.6) is immediately integrated and becomes

    $y_{i+1}(x) = y_0 + \int_{x_0}^{x} f(s, y_i(s))\,ds$.    (8.7)

This is called the Picard-Lindelöf iteration method. It appeared several times in the literature, e.g., in Liouville (1838), Cauchy, Peano (1888), Lindelöf (1894), Bendixson (1893). Picard (1890) considered it merely as a by-product of a similar idea for partial differential equations and analyzed it thoroughly in his famous treatise Picard (1891-96), Vol. II, Chap. XI, Sect. III.
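When $f$ is a polynomial, the iteration (8.7) can be carried out exactly on coefficient lists. The following sketch is one possible illustration, assuming the problem $y' = x^2 + y^2$, $y(0) = 0$ and the starting function $y_0(x) = 0$; the helper names are hypothetical and polynomials are stored as lists of exact fractions (constant term first).

```python
from fractions import Fraction

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_integrate(p):
    """Integral from 0 to x of the polynomial p."""
    return [Fraction(0)] + [a / (k + 1) for k, a in enumerate(p)]

def poly_eval(p, x):
    value = Fraction(0)
    for a in reversed(p):          # Horner's scheme
        value = value * x + a
    return value

def picard_step(y):
    """One step of (8.7) for f(s, y) = s^2 + y^2 with x0 = 0, y0 = 0."""
    x_squared = [Fraction(0), Fraction(0), Fraction(1)]
    return poly_integrate(poly_add(x_squared, poly_mul(y, y)))

y = [Fraction(0)]                  # starting approximation y_0(x) = 0
for _ in range(3):
    y = picard_step(y)

value = float(poly_eval(y, Fraction(1, 2)))
print(value)
```

Three iterations give $x^3/3 + x^7/63 + 2x^{11}/2079 + x^{15}/59535$. Exact rational arithmetic is used so that the coefficients can be read off directly; floating point would serve equally well for the numerical value.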

The fast convergence of the method, for $|x - x_0|$ small, is readily seen: if we subtract from formula (8.7) the same formula with $i$ replaced by $i - 1$, we have

    $y_{i+1}(x) - y_i(x) = \int_{x_0}^{x} \bigl(f(s, y_i(s)) - f(s, y_{i-1}(s))\bigr)\,ds$.    (8.8)

We now apply the Lipschitz condition (7.7) and the triangle inequality to obtain

    $|y_{i+1}(x) - y_i(x)| \le L \int_{x_0}^{x} |y_i(s) - y_{i-1}(s)|\,ds$.    (8.9)

When we assume $y_0(x) \equiv y_0$, the triangle inequality applied to (8.7) with $i = 0$ yields the estimate

    $|y_1(x) - y_0(x)| \le A\,|x - x_0|$

where $A$ is a bound for $|f|$ as in Section 1.7. We next insert this into the right-hand side of (8.9) repeatedly to obtain finally the estimate (Lindelöf 1894)

    $|y_i(x) - y_{i-1}(x)| \le A\,L^{i-1}\,\frac{|x - x_0|^i}{i!}$.    (8.10)

The right-hand side is, up to the factor $A/L$, a term of the Taylor series for $e^{L|x - x_0|}$, which converges for all $x$; we therefore conclude that $|y_{i+k} - y_i|$ becomes arbitrarily small when $i$ is large. The error is bounded by the remainder of the above exponential series. So the sequence $y_i(x)$ converges uniformly to the solution $y(x)$. For example, if $L|x - x_0| \le 1/10$ and the constant $A$ is moderate, 10 iterations would provide a numerical solution with about 17 correct digits.

The main practical drawback of the method is the need for repeated computation of integrals, which is usually not very convenient, if at all analytically possible, and soon becomes very tedious. However, its fast convergence and new machine architectures (parallelism) coupled with numerical evaluations of the integrals have made the approach interesting for large problems (see Nevanlinna 1989).

Taylor Series 


Après avoir montré l’insuffisance des méthodes d’intégration fondées
sur le développement en séries, il me reste à dire en peu de
mots ce qu’on peut leur substituer. (Cauchy)

A third existence proof can be based on a study of the convergence of the Taylor 
series of the solutions. This was mentioned in a footnote of Liouville (1836, p. 
255), and brought to perfection by Cauchy (1839-42). 

We have already seen the recursive computation of the Taylor coefficients in the work of Newton (see Section 1.2). Euler (1768) then formulated the general procedure for the higher derivatives of the solution of

    $y' = f(x,y)$,   $y(x_0) = y_0$    (8.11)

which, by successive differentiation, are obtained as

    $y'' = f_x + f_y\,y' = f_x + f_y f$
    $y''' = f_{xx} + 2 f_{xy} f + f_{yy} f^2 + f_y (f_x + f_y f)$    (8.12)

etc. Then the solution is

    $y(x_0 + h) = y(x_0) + y'(x_0)\,h + y''(x_0)\,\frac{h^2}{2!} + \ldots$ .    (8.13)

The formulas (8.12) for higher derivatives soon become very complicated. Euler therefore proposed to use only a few terms of this series with $h$ sufficiently small and to repeat the computations from the point $x_1 = x_0 + h$ (“analytic continuation”).

We shall now outline the main ideas of Cauchy’s convergence proof for the series (8.13). We suppose that $f(x,y)$ is analytic in the neighbourhood of the initial value $x_0$, $y_0$, which for simplicity of notation we assume located at the origin $x_0 = y_0 = 0$:

    $f(x,y) = \sum_{i,j \ge 0} a_{ij}\,x^i y^j$,    (8.14)

where the $a_{ij}$ are multiples of the partial derivatives occurring in (8.12). If the series (8.14) is assumed to converge for $|x| \le r$, $|y| \le r$, then the Cauchy inequalities from classical complex analysis give

    $|a_{ij}| \le \frac{M}{r^{i+j}}$,   where $M = \max_{|x| \le r,\ |y| \le r} |f(x,y)|$.    (8.15)

The idea is now the following: since all signs in (8.12) are positive, we obtain the worst possible result if we replace in (8.14) all $a_{ij}$ by the largest possible values (8.15) (“method of majorants”):

    $f(x,y) \to \sum_{i,j \ge 0} M\,\frac{x^i y^j}{r^{i+j}} = \frac{M}{(1 - x/r)(1 - y/r)}$.

However, the majorizing differential equation

    $y' = \frac{M}{(1 - x/r)(1 - y/r)}$,   $y(0) = 0$

is readily integrated by separation of variables (see Section 1.3) and has the solution

    $y = r\Bigl(1 - \sqrt{1 + 2M \log\bigl(1 - \tfrac{x}{r}\bigr)}\Bigr)$.    (8.16)

This solution has a power series expansion which converges for all $x$ such that $|2M \log(1 - x/r)| < 1$. Therefore, the series (8.13) also converges at least for all $|h| < r\,(1 - \exp(-1/2M))$. □

Recursive Computation of Taylor Coefficients 


... dieses Verfahren praktisch nicht in Frage kommen kann.
(Runge & König 1924)

The exact opposite is true, if we use the right approach ...
(R.E. Moore 1979)

The “right approach” is, in fact, an extension of Newton’s approach and has been rediscovered several times (e.g., Steffensen 1956) and implemented in computer programs by Gibbons (1960) and Moore (1966). For a more extensive bibliography see the references in Wanner (1969), p. 10-20.

The idea is the following: let

    $Y_i = \frac{1}{i!}\,y^{(i)}(x_0)$,   $F_i = \frac{1}{i!}\,\bigl(f(x, y(x))\bigr)^{(i)}\Big|_{x = x_0}$    (8.17)

be the Taylor coefficients of $y(x)$ and of $f(x, y(x))$, so that (8.13) becomes

    $y(x_0 + h) = \sum_{i=0}^{\infty} h^i\,Y_i$.

Then, from (8.11),

    $Y_{i+1} = \frac{1}{i+1}\,F_i$.    (8.18)

Now suppose that $f(x,y)$ is the composition of a sequence of algebraic operations and elementary functions. This leads to a sequence of items,

    $x,\ y,\ p,\ q,\ r,\ \ldots$, and finally $f$.    (8.19)

For each of these items we find formulas for generating the $i$th Taylor coefficient from the preceding ones as follows:

a) $r = p \pm q$:

    $R_i = P_i \pm Q_i$,   $i = 0, 1, \ldots$    (8.20a)

b) $r = pq$: the Cauchy product yields

    $R_i = \sum_{j=0}^{i} P_j\,Q_{i-j}$,   $i = 0, 1, \ldots$    (8.20b)

c) $r = p/q$: write $p = rq$, use formula b) and solve for $R_i$:

    $R_i = \frac{1}{Q_0}\Bigl(P_i - \sum_{j=0}^{i-1} R_j\,Q_{i-j}\Bigr)$,   $i = 0, 1, \ldots$    (8.20c)

There also exist formulas for many elementary functions (in fact, because these functions are themselves solutions of rational differential equations).

d) $r = \exp(p)$: use $r' = p'\cdot r$ and apply (8.20b). This gives for $i = 1, 2, \ldots$

    $R_0 = \exp(P_0)$,   $R_i = \frac{1}{i} \sum_{j=0}^{i-1} (i - j)\,R_j\,P_{i-j}$.    (8.20d)

e) $r = \log(p)$: use $p = \exp(r)$ and rearrange formula d). This gives

    $R_0 = \log(P_0)$,   $R_i = \frac{1}{P_0}\Bigl(P_i - \frac{1}{i} \sum_{j=1}^{i-1} (i - j)\,P_j\,R_{i-j}\Bigr)$.    (8.20e)

f) $r = p^c$, $c \ne 1$ constant. Use $p\,r' = c\,r\,p'$ and apply (8.20b):

    $R_0 = P_0^c$,   $R_i = \frac{1}{i P_0} \sum_{j=0}^{i-1} \bigl(ci - (c+1)j\bigr)\,R_j\,P_{i-j}$.    (8.20f)

g) $r = \cos(p)$, $s = \sin(p)$: as in d) we have

    $R_0 = \cos P_0$,   $R_i = -\frac{1}{i} \sum_{j=0}^{i-1} (i - j)\,S_j\,P_{i-j}$,
    $S_0 = \sin P_0$,   $S_i = \frac{1}{i} \sum_{j=0}^{i-1} (i - j)\,R_j\,P_{i-j}$.    (8.20g)


The alternating use of (8.20) and (8.18) then allows us to compute the Taylor 
coefficients for (8.17) to any wanted order in a very economical way. It is not dif¬ 
ficult to write subroutines for the above formulas, which have to be called in the 
same order as the differential equation (8.11) is composed of elementary opera¬ 
tions. There also exist computer programs which “compile” Fortran statements for 
f(x, y ) into this list of subroutine calls. One has been written by T. Szymanski and 
J.H. Gray (see Knapp & Wanner 1969). 
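As an illustration of how such subroutines might look, here is a minimal Python sketch of the rules (8.20c), (8.20d) and (8.20g). The function names and the list representation (coefficient $P_i$ stored at index $i$) are assumptions of this sketch and do not reproduce any of the programs cited above.

```python
import math

def div_series(P, Q):
    """Taylor coefficients of r = p/q by (8.20c); requires Q[0] != 0."""
    R = []
    for i in range(len(P)):
        R.append((P[i] - sum(R[j] * Q[i - j] for j in range(i))) / Q[0])
    return R

def exp_series(P):
    """Taylor coefficients of r = exp(p) by (8.20d)."""
    R = [math.exp(P[0])]
    for i in range(1, len(P)):
        R.append(sum((i - j) * R[j] * P[i - j] for j in range(i)) / i)
    return R

def cos_sin_series(P):
    """Taylor coefficients of r = cos(p) and s = sin(p) by (8.20g)."""
    R, S = [math.cos(P[0])], [math.sin(P[0])]
    for i in range(1, len(P)):
        r_i = -sum((i - j) * S[j] * P[i - j] for j in range(i)) / i
        s_i = sum((i - j) * R[j] * P[i - j] for j in range(i)) / i
        R.append(r_i)
        S.append(s_i)
    return R, S

# Check with p(x) = x: exp gives 1/i!, cos gives 1, 0, -1/2, 0, 1/24.
x_series = [0.0, 1.0, 0.0, 0.0, 0.0]
exp_coeffs = exp_series(x_series)
cos_coeffs, sin_coeffs = cos_sin_series(x_series)
# Check division with 1/(1 - x) = 1 + x + x^2 + ...
geometric = div_series([1.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0])
print(exp_coeffs, cos_coeffs, sin_coeffs, geometric)
```

Each rule costs $O(i)$ operations for the $i$th coefficient, which is the reason why the approach is economical compared with the explicit derivative formulas.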


Example. The differential equation $y' = x^2 + y^2$ leads to the recursion

    $Y_0 = y(0)$,   $Y_{i+1} = \frac{1}{i+1}\Bigl(P_i + \sum_{j=0}^{i} Y_j\,Y_{i-j}\Bigr)$,   $i = 0, 1, \ldots$

where $P_i = 1$ for $i = 2$ and $P_i = 0$ for $i \ne 2$ are the coefficients for $x^2$. One can imagine how much easier this is than formulas (8.12).
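The recursion of the example translates directly into code. The sketch below assumes the initial value $y(0) = 0$, the expansion point $x_0 = 0$ and 40 coefficients (all choices made here for illustration), and uses exact fractions so that the coefficients can be inspected.

```python
from fractions import Fraction

def taylor_coefficients(y0, order):
    """Coefficients Y_0, ..., Y_order of the solution of y' = x^2 + y^2 at x0 = 0."""
    Y = [Fraction(y0)]
    for i in range(order):
        P_i = 1 if i == 2 else 0                      # coefficients of x^2
        cauchy = sum(Y[j] * Y[i - j] for j in range(i + 1))
        Y.append((P_i + cauchy) / Fraction(i + 1))    # combines (8.18) and (8.20b)
    return Y

Y = taylor_coefficients(0, 40)
h = Fraction(1, 2)
value = float(sum(c * h**k for k, c in enumerate(Y)))
print(Y[3], Y[7], Y[11])
print(value)
```

The first non-vanishing coefficients are $Y_3 = 1/3$, $Y_7 = 1/63$ and $Y_{11} = 2/2079$; the truncated series evaluated at $h = 1/2$ can be compared with the value of $y(1/2)$ quoted in Exercise 3 below.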


An important property of this approach is that it can be executed in interval 
analysis and thus allows us to obtain reliable error bounds by the use of Lagrange’s 
error formula for Taylor series. We refer to the books by R.E. Moore (1966) and 
(1979) for more details. 


Exercises 

1. Obtain from (8.10) the estimate

    $|y_i(x) - y_0| \le \frac{A}{L}\,\bigl(e^{L(x - x_0)} - 1\bigr)$

and explain the similarity of this result with (7.16).

2. Apply the method of Picard to the problem $y' = Ky$, $y(0) = 1$.

3. Compute three Picard iterations for the problem $y' = x^2 + y^2$, $y(0) = 0$, $y(1/2) = ?$ and make a rigorous error estimate. Compare the result with the correct solution $y(1/2) = 0.041791146154681863220768806849179$.

4. Compute with an iteration method the solution of

    $y' = \sqrt{x} + \sqrt{y}$,   $y(0) = 0$

and observe that the method can work well for equations which pose serious problems with other methods. An even greater difference occurs for the equations

    $y' = \sqrt{x} + y^2$,   $y(0) = 0$   and   $y' = \frac{1}{\sqrt{x}} + y^2$,   $y(0) = 0$.


5. Define $f(x,y)$ by

    $f(x,y) = \begin{cases} 0 & \text{for } x \le 0 \\ 2x & \text{for } x > 0,\ y \le 0 \\ 2x - \dfrac{4y}{x} & \text{for } 0 < y < x^2 \\ -2x & \text{for } x > 0,\ x^2 \le y. \end{cases}$

a) Show that $f(x,y)$ is continuous, but not Lipschitz.

b) Show that for the problem $y' = f(x,y)$, $y(0) = 0$ the Picard iteration method does not converge.

c) Show that there is a unique solution and that the Euler polygons converge.


6. Use the method of Picard iteration to prove: if $f(x,y)$ is continuous and satisfies a Lipschitz condition (7.7) on the infinite strip $D = \{(x,y)\ ;\ x_0 \le x \le X\}$, then the initial value problem $y' = f(x,y)$, $y(x_0) = y_0$ possesses a unique solution on $x_0 \le x \le X$.

Compare this global result with Theorem 7.3. 


7. Define a function $y(x)$ (the “inverse error function”) by the relation

    $x = \frac{2}{\sqrt{\pi}} \int_0^{y} e^{-t^2}\,dt$

and show that it satisfies the differential equation

    $y' = \frac{\sqrt{\pi}}{2}\,e^{y^2}$,   $y(0) = 0$.

Obtain recursion formulas for its Taylor coefficients.
The first treatment of an existence theory for simultaneous systems of differential equations was undertaken in the last existing pages (p. 123-136) of Cauchy (1824). We write the equations as

    $y_1' = f_1(x, y_1, \ldots, y_n)$,   $y_1(x_0) = y_{10}$,   $y_1(X) = ?$
    $\ldots$                                                                    (9.1)
    $y_n' = f_n(x, y_1, \ldots, y_n)$,   $y_n(x_0) = y_{n0}$,   $y_n(X) = ?$

and ask for the existence of the $n$ solutions $y_1(x), \ldots, y_n(x)$. It is again natural to consider, in analogy to (7.3), the method of Euler

    $y_{k,i+1} = y_{ki} + (x_{i+1} - x_i)\cdot f_k(x_i, y_{1i}, \ldots, y_{ni})$    (9.2)

(for $k = 1, \ldots, n$ and $i = 0, 1, 2, \ldots$). Here $y_{ki}$ is intended to approximate $y_k(x_i)$, where $x_0 < x_1 < x_2 \ldots$ is a subdivision of the interval of integration as in (7.2).

We now try to carry over everything we have done in Section 1.7 to the new situation. Although we have no problem in extending (7.4) to the estimate

    $|y_{ki} - y_{k0}| \le A_k\,|x_i - x_0|$   if   $|f_k(x, y_1, \ldots, y_n)| \le A_k$,    (9.3)

things become a little more complicated for (7.7): we have to estimate

    $f_k(x, z_1, \ldots, z_n) - f_k(x, y_1, \ldots, y_n) = \frac{\partial f_k}{\partial y_1}\,(z_1 - y_1) + \ldots + \frac{\partial f_k}{\partial y_n}\,(z_n - y_n)$,    (9.4)

where the derivatives $\partial f_k/\partial y_i$ are taken at suitable intermediate points. Here Cauchy uses the inequality now called the “Cauchy-Schwarz inequality” (“Enfin, il résulte de la formule (13) de la 11e leçon du calcul différentiel ... ”) to obtain

    $|f_k(x, z_1, \ldots, z_n) - f_k(x, y_1, \ldots, y_n)| \le \sqrt{\Bigl(\frac{\partial f_k}{\partial y_1}\Bigr)^2 + \ldots + \Bigl(\frac{\partial f_k}{\partial y_n}\Bigr)^2}\cdot\sqrt{(z_1 - y_1)^2 + \ldots + (z_n - y_n)^2}$.    (9.5)

At this stage, we begin to feel that further development is advisable only after the introduction of vector notation.
Vector Notation


This was promoted in our subject by the papers of Peano, (1888) and (1890), who was influenced, as he says, by the famous “Ausdehnungslehre” of Grassmann and the work of Hamilton, Cayley, and Sylvester. We introduce the vectors (Peano called them “complexes”)

    $y = (y_1, \ldots, y_n)^T$,   $y_i = (y_{1i}, \ldots, y_{ni})^T$,   $z = (z_1, \ldots, z_n)^T$   etc.,

and hope that the reader will not confuse the components $y_i$ of a vector $y$ with vectors with indices. We consider the “vector function”

    $f(x,y) = (f_1(x,y), \ldots, f_n(x,y))^T$,

so that equations (9.1) become

    $y' = f(x,y)$,   $y(x_0) = y_0$,   $y(X) = ?$,    (9.1')

Euler’s method (9.2) is

    $y_{i+1} = y_i + (x_{i+1} - x_i)\,f(x_i, y_i)$,   $i = 0, 1, 2, \ldots$    (9.2')

and the Euler polygon is given by

    $y_h(x) = y_i + (x - x_i)\,f(x_i, y_i)$   for $x_i \le x \le x_{i+1}$.

There is no longer any difference in notation with the one-dimensional cases (7.1), (7.3) and (7.3a).
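In code, too, the vector form (9.2') differs from the scalar method only in that each step updates a list of components. The sketch below is an illustration under assumptions made here: the rotation system $y_1' = -y_2$, $y_2' = y_1$ with $y(0) = (1, 0)^T$ (exact solution $(\cos x, \sin x)^T$), a constant step size, and the Euclidean norm for measuring the error.

```python
import math

def euler_system(f, x0, y0, X, n):
    """Euler's method (9.2') with n equal steps; y0 and f(x, y) are lists of length n_eq."""
    h = (X - x0) / n
    x, y = x0, list(y0)
    for i in range(n):
        fy = f(x, y)
        y = [yk + h * fk for yk, fk in zip(y, fy)]
        x = x0 + (i + 1) * h
    return y

# Assumed test system: y1' = -y2, y2' = y1, y(0) = (1, 0).
f = lambda x, y: [-y[1], y[0]]
y = euler_system(f, 0.0, [1.0, 0.0], 1.0, 1000)

# Euclidean norm of the error at x = 1.
err = math.sqrt((y[0] - math.cos(1.0)) ** 2 + (y[1] - math.sin(1.0)) ** 2)
print(y, err)
```

With 1000 steps the error is of the size of the step length, in line with the first-order behaviour found for scalar equations.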

In view of estimate (9.5), we introduce for a vector $y = (y_1, \ldots, y_n)^T$ the norm (originally “modulus”)

    $\|y\| = \sqrt{y_1^2 + \ldots + y_n^2}$    (9.6)

which satisfies all the usual properties of a norm, for example the triangle inequality

    $\|y + z\| \le \|y\| + \|z\|$,   $\Bigl\|\sum_{i=1}^{n} y_i\Bigr\| \le \sum_{i=1}^{n} \|y_i\|$.    (9.7)

The Euclidean norm (9.6) is not the only one possible; we also use (“on pourrait aussi définir par mx la plus grande des valeurs absolues des éléments de x; alors les propriétés des modules sont presqu’évidentes”, Peano)

    $\|y\| = \max(|y_1|, \ldots, |y_n|)$,    (9.6')

    $\|y\| = |y_1| + \ldots + |y_n|$.    (9.6'')

We are now able to formulate estimate (9.3) as follows, in perfect analogy with (7.4): if for some norm $\|f(x,y)\| \le A$ on $D = \{(x,y) \mid x_0 \le x \le X,\ \|y - y_0\| \le b\}$ and if $X - x_0 \le b/A$, then the numerical solution $(x_i, y_i)$, given by (9.2'), remains in $D$ and we have

    $\|y_h(x) - y_0\| \le A\cdot|x - x_0|$.    (9.8)

The analogue of estimate (7.5) can be obtained similarly.
In order to prove the implication “(7.9) ⟹ (7.7)” for vector-valued functions it is convenient to work with norms of matrices.

Subordinate Matrix Norms

The relation (9.4) shows that the difference $f(x,z) - f(x,y)$ can be written as the product of a matrix with the vector $z - y$. It is therefore of interest to estimate $\|Qv\|$ and to find the best possible estimate of the form $\|Qv\| \le \beta\,\|v\|$.

Definition 9.1. Let $Q$ be a matrix ($n$ columns, $m$ rows) and $\|\ldots\|$ be one of the norms defined in (9.6), (9.6') or (9.6''). The subordinate matrix norm of $Q$ is then defined by

    $\|Q\| = \sup_{v \ne 0} \frac{\|Qv\|}{\|v\|} = \sup_{\|v\| = 1} \|Qv\|$.    (9.9)

By definition, $\|Q\|$ is the smallest number such that

    $\|Qv\| \le \|Q\|\cdot\|v\|$   for all $v$    (9.10)

holds. The following theorem gives explicit formulas for the computation of (9.9).

Theorem 9.2. The norm of a matrix $Q$ is given by the following formulas: for the Euclidean norm (9.6),

    $\|Q\| = \sqrt{\text{largest eigenvalue of } Q^T Q}$ ;    (9.11)

for the max-norm (9.6'),

    $\|Q\| = \max_{k=1,\ldots,m} \Bigl(\sum_{i=1}^{n} |q_{ki}|\Bigr)$ ;    (9.11')

for the norm (9.6''),

    $\|Q\| = \max_{i=1,\ldots,n} \Bigl(\sum_{k=1}^{m} |q_{ki}|\Bigr)$.    (9.11'')


Proof. Formula (9.11) can be seen from $\|Qv\|^2 = v^T Q^T Q v$ with the help of an orthogonal transformation of $Q^T Q$ to diagonal form.

Formula (9.11') is obtained as follows (we denote (9.6') by $\|\ldots\|_\infty$):

    $\|Qv\|_\infty = \max_{k=1,\ldots,m} \Bigl|\sum_{i=1}^{n} q_{ki}\,v_i\Bigr| \le \Bigl(\max_{k=1,\ldots,m} \sum_{i=1}^{n} |q_{ki}|\Bigr)\cdot\|v\|_\infty$    (9.12)

shows that $\|Q\| \le \max_k \sum_i |q_{ki}|$. The equality in (9.11') is then seen by choosing a vector of the form $v = (\pm 1, \pm 1, \ldots, \pm 1)^T$ for which equality holds in (9.12). The formula (9.11'') is proved along the same lines. □
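The formulas (9.11') and (9.11'') and the vectors for which equality holds in (9.10) can be checked on a small example. The matrix below and all function names are assumptions chosen for illustration; matrices are stored as lists of rows.

```python
def max_norm(v):
    return max(abs(c) for c in v)                      # norm (9.6')

def sum_norm(v):
    return sum(abs(c) for c in v)                      # norm (9.6'')

def matvec(Q, v):
    return [sum(q * c for q, c in zip(row, v)) for row in Q]

def matrix_max_norm(Q):
    """Largest absolute row sum, formula (9.11')."""
    return max(sum(abs(q) for q in row) for row in Q)

def matrix_sum_norm(Q):
    """Largest absolute column sum, formula (9.11'')."""
    return max(sum(abs(row[i]) for row in Q) for i in range(len(Q[0])))

Q = [[1.0, -2.0, 3.0],
     [-4.0, 5.0, 0.5]]

# Max-norm: take the signs of the row with the largest absolute sum.
k = max(range(len(Q)), key=lambda r: sum(abs(q) for q in Q[r]))
v_inf = [1.0 if q >= 0 else -1.0 for q in Q[k]]

# Sum-norm: take the unit vector belonging to the column with the largest absolute sum.
i = max(range(len(Q[0])), key=lambda c: sum(abs(row[c]) for row in Q))
v_one = [1.0 if c == i else 0.0 for c in range(len(Q[0]))]

print(matrix_max_norm(Q), max_norm(matvec(Q, v_inf)))   # both 9.5
print(matrix_sum_norm(Q), sum_norm(matvec(Q, v_one)))   # both 7.0
```

For the Euclidean norm (9.11) an eigenvalue computation would be needed, which is left out of this sketch.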
All these formulas remain valid for complex matrices. $Q^T$ has only to be replaced by $Q^*$ (transposed and complex conjugate). See e.g., Wilkinson (1965), p. 55-61, Bakhvalov (1976), Chap. VI, Par. 3. With these preparations it is possible to formulate the desired estimate.


Theorem 9.3. If $f(x,y)$ is differentiable with respect to $y$ in an open convex region $U$ and if

    $\left\|\frac{\partial f}{\partial y}(x,y)\right\| \le L$   for $(x,y) \in U$    (9.13)

then

    $\|f(x,z) - f(x,y)\| \le L\,\|z - y\|$   for $(x,y), (x,z) \in U$.    (9.14)

(Obviously, the matrix norm in (9.13) is subordinate to the norm used in (9.14).)


Proof. This is the “mean value theorem” and its proof can be found in every textbook on calculus. In the case where $\partial f/\partial y$ is continuous, the following simple proof is possible. We consider $\varphi(t) = f(x, y + t(z - y))$ and integrate its derivative (componentwise) from 0 to 1

    $f(x,z) - f(x,y) = \varphi(1) - \varphi(0) = \int_0^1 \varphi'(t)\,dt = \int_0^1 \frac{\partial f}{\partial y}\bigl(x, y + t(z - y)\bigr)\cdot(z - y)\,dt$.    (9.15)

Taking the norm of (9.15), using

    $\Bigl\|\int_0^1 g(t)\,dt\Bigr\| \le \int_0^1 \|g(t)\|\,dt$,    (9.16)

and applying (9.10) and (9.13) yields the estimate (9.14). The relation (9.16) is proved by applying the triangle inequality (9.7) to the finite Riemann sums which define the two integrals. □


We thus have obtained the analogue of (7.7). All that remains to do is, Da capo al fine, to read Sections 1.7 and 1.8 again: Lemma 7.2, Theorems 7.3, 7.4, 7.5, and 7.6 together with their proofs and the estimates (7.10), (7.13), (7.15), (7.16), (7.17), and (7.18) carry over to the more general case with the only changes that some absolute values are to be replaced by norms.

The Picard-Lindelöf iteration also carries over to systems of equations when in (8.7) we interpret $y_{i+1}(x)$, $y_0$ and $f(s, y_i(s))$ as vectors, integrated componentwise. The convergence result with the estimate (8.10) also remains the same; for its proof we have to use, between (8.8) and (8.9), the inequality (9.16).

The Taylor series method, its convergence proof, and the recursive generation of the Taylor coefficients also generalize in a straightforward manner to systems of equations.


Exercises 

1. Solve the system

    $y_1' = -y_2$,   $y_1(0) = 1$
    $y_2' = +y_1$,   $y_2(0) = 0$

by the methods of Euler and Picard, establish rigorous error estimates for all three norms mentioned. Verify the results using the correct solution $y_1(x) = \cos x$, $y_2(x) = \sin x$.

2. Consider the differential equations

    $y_1' = -100\,y_1 + y_2$,   $y_1(0) = 1$,   $y_1(1) = ?$
    $y_2' = y_1 - 100\,y_2$,   $y_2(0) = 0$,   $y_2(1) = ?$

a) Compute the exact solution $y(x)$ by the method explained in Section 1.6.

b) Compute the error bound for $\|z(x) - y(x)\|$, where $z(x) = 0$, obtained from (7.10).

c) Apply the method of Euler to this equation with $h = 1/10$.

d) Apply Picard’s iteration method.

3. Compute the Taylor series solution of the system with constant coefficients $y' = Ay$, $y(0) = y_0$. Prove that this series converges for all $x$. Apply this series to the equation of Exercise 1.

Result.

    $y(x) = \sum_{i=0}^{\infty} \frac{x^i}{i!}\,A^i y_0 =: e^{Ax} y_0$.



1.10 Differential Inequalities 


Differential inequalities are an elegant instrument for gaining a better understanding of the estimates (7.10), (7.17) and much new insight. This subject was inaugurated, once again, in the paper of Peano (1890) and further developed by Perron (1915), Müller (1926), Kamke (1930). A classical treatise on the subject is the book of Walter (1970).


Introduction 


The basic idea is the following: let v{x) denote the Euler polygon defined in (7.3) 
or (9.2), so that 

v\x) = f{x i ,y i ) for Xi<x<x i+1 . (10.1) 

For any chosen norm, we investigate the error 

m(x) = \\v(x) — y(x)\\ (10.2) 


as a function of x and we naturally try to estimate its growth. 

Unfortunately, $m(x)$ is not necessarily differentiable, due firstly to the corners
of the Euler polygons and secondly to corners originating from the norms,
especially the norms (9.6’) and (9.6”). Therefore we consider the so-called Dini
derivatives defined by

$$ D^+ m(x) = \limsup_{h \to 0,\, h > 0} \frac{m(x+h) - m(x)}{h}, \qquad D_+ m(x) = \liminf_{h \to 0,\, h > 0} \frac{m(x+h) - m(x)}{h} $$

(see e.g., Scheeffer (1884), Hobson (1921), Chap. V, §260, §280). The property

$$ \|w(x+h)\| - \|w(x)\| \le \|w(x+h) - w(x)\| \qquad (10.3) $$

is a simple consequence of the triangle inequality (9.7). If we divide (10.3) by
$h > 0$, we obtain the estimates

$$ D_+ \|w(x)\| \le \|w'(x+0)\|, \qquad D^+ \|w(x)\| \le \|w'(x+0)\|, \qquad (10.4) $$
where $w'(x+0)$ is the right derivative of the vector function $w(x)$. If we apply
this to $m(x)$ of (10.2), we obtain

$$ D_+ m(x) \le \|v'(x+0) - y'(x)\| = \|v'(x+0) - f(x, v(x)) + f(x, v(x)) - f(x, y(x))\| $$

and, using the triangle inequality and the Lipschitz condition (9.14),

$$ D_+ m(x) \le \delta(x) + L \cdot m(x). \qquad (10.5) $$

Here, we have introduced

$$ \delta(x) = \|v'(x+0) - f(x, v(x))\| \qquad (10.6) $$

which is called the defect of the approximate solution $v(x)$. This fundamental
quantity measures the extent to which the function $v(x)$ does not satisfy the imposed
differential equation. (7.11) together with (10.1) tell us that $\delta(x) \le \varepsilon$, so that
(10.5) can be further estimated to become

$$ D_+ m(x) \le L \cdot m(x) + \varepsilon, \qquad m(x_0) = 0. \qquad (10.7) $$

Formula (10.7) (or (10.5)) is what one calls a differential inequality. The question
is: are we allowed to replace “$\le$” by “$=$”, i.e., to solve instead of (10.7) the
equation

$$ u' = Lu + \varepsilon, \qquad u(x_0) = 0 \qquad (10.8) $$

and to conclude that $m(x) \le u(x)$? This would mean, by the formulas of Section
1.3 or 1.5, that

$$ m(x) \le \frac{\varepsilon}{L} \left( e^{L(x - x_0)} - 1 \right). \qquad (10.9) $$

We would thus have obtained (7.17) in a natural way and have furthermore discovered
an elegant and powerful tool for many kinds of new estimates.


The Fundamental Theorems 

A general theorem of the type 

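$$ \left. \begin{array}{l} D_+ m(x) \le g(x, m(x)) \\ D_+ u(x) \ge g(x, u(x)) \\ m(x_0) \le u(x_0) \end{array} \right\} \quad \Longrightarrow \quad m(x) \le u(x) \ \text{ for } x_0 \le x \qquad (10.10) $$

cannot be true. Counter-examples are provided by any differential equation with
non-unique solutions, such as

$$ g(x, y) = \sqrt{y}, \qquad m(x) = \frac{x^2}{4}, \qquad u(x) = 0. \qquad (10.11) $$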
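The counter-example (10.11) can be checked numerically. The following is a minimal sketch in plain Python, assuming the reading $g(x,y) = \sqrt{y}$, $m(x) = x^2/4$, $u(x) = 0$; the helper names (`right_derivative`, `hypotheses_hold`) are illustrative and not from the text. A forward difference quotient stands in for the Dini derivative, which is adequate here because both functions are smooth.

```python
import math

def g(x, y):
    """Right-hand side g(x, y) = sqrt(y) of the counter-example (10.11)."""
    return math.sqrt(y)

def m(x):
    return x * x / 4.0

def u(x):
    return 0.0

def right_derivative(func, x, h=1e-6):
    """Forward difference quotient, a stand-in for the Dini derivative."""
    return (func(x + h) - func(x)) / h

def hypotheses_hold(x, tol=1e-5):
    """Check D m <= g(x, m) and D u >= g(x, u) up to a discretization tolerance."""
    return (right_derivative(m, x) <= g(x, m(x)) + tol
            and right_derivative(u, x) >= g(x, u(x)) - tol)

points = [0.0, 0.5, 1.0, 2.0]
all_hypotheses = all(hypotheses_hold(x) for x in points) and m(0.0) <= u(0.0)
conclusion_fails = any(m(x) > u(x) for x in points)
print(all_hypotheses, conclusion_fails)
```

Both functions solve $y' = \sqrt{y}$ with the same initial value, so the non-strict hypotheses hold while $m(x) > u(x)$ for $x > 0$.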

The important observation, due to Peano and Perron, which allows us to overcome 
this difficulty, is that one of the first two inequalities must be replaced by a strict 
inequality (see Peano (1890), §3, Lemme 1): 
Theorem 10.1. Suppose that the functions $m(x)$ and $u(x)$ are continuous and
satisfy for $x_0 \le x < X$

a) $D_+ m(x) \le g(x, m(x))$

b) $D_+ u(x) > g(x, u(x))$ (10.12)

c) $m(x_0) \le u(x_0)$.

Then

$$ m(x) \le u(x) \quad \text{for } x_0 \le x \le X. \qquad (10.13) $$

The same conclusion is true if both $D_+$ are replaced by $D^+$.


Proof. In order to be able to compare the derivatives $D_+ m$ and $D_+ u$ in (10.12),
we consider points at which $m(x) = u(x)$. This is the main idea.

If (10.13) were not true, we could choose a point $x_2$ with $m(x_2) > u(x_2)$ and
look for the first point $x_1$ to the left of $x_2$ with $m(x_1) = u(x_1)$. Then for small
$h > 0$ we would have

$$ \frac{m(x_1 + h) - m(x_1)}{h} > \frac{u(x_1 + h) - u(x_1)}{h} $$

and, by taking limits, $D_+ m(x_1) \ge D_+ u(x_1)$. This, however, contradicts (a) and
(b), which give

$$ D_+ m(x_1) \le g(x_1, m(x_1)) = g(x_1, u(x_1)) < D_+ u(x_1). \qquad \square $$

Many variant forms of this theorem are possible, for example by using left Dini
derivates (Walter 1970, Chap. II, §8, Theorem V).

Theorem 10.2 (The “fundamental lemma”). Suppose that $y(x)$ is a solution of
the system of differential equations $y' = f(x, y)$, $y(x_0) = y_0$, and that $v(x)$ is an
approximate solution. If

a) $\|v(x_0) - y(x_0)\| \le \varrho$

b) $\|v'(x+0) - f(x, v(x))\| \le \varepsilon$

c) $\|f(x, v) - f(x, y)\| \le L \|v - y\|$,

then, for $x \ge x_0$, we have the error estimate

$$ \|y(x) - v(x)\| \le \varrho\, e^{L(x - x_0)} + \frac{\varepsilon}{L} \left( e^{L(x - x_0)} - 1 \right). \qquad (10.14) $$

Remark. The two terms in (10.14) express, respectively, the influence of the error $\varrho$
in the initial values and the influence of the defect $\varepsilon$ on the error of the approximate
solution. It implies that the error depends continuously on both, and that for
$\varrho = \varepsilon = 0$ we have $y(x) = v(x)$, i.e., uniqueness of the solution.
Proof. We put $m(x) = \|y(x) - v(x)\|$ and obtain, as in (10.7),

$$ D_+ m(x) \le L \cdot m(x) + \varepsilon, \qquad m(x_0) \le \varrho. $$

We shall try to compare this with the differential equation

$$ u' = Lu + \varepsilon, \qquad u(x_0) = \varrho. \qquad (10.15) $$

Theorem 10.1 is not directly applicable. We therefore replace in (10.15) $\varepsilon$ by
$\varepsilon + \eta$, $\eta > 0$, and solve instead

$$ u' = Lu + \varepsilon + \eta > Lu + \varepsilon, \qquad u(x_0) = \varrho. $$

Now Theorem 10.1 gives the estimate (10.14) with $\varepsilon$ replaced by $\varepsilon + \eta$. Since this
estimate is true for all $\eta > 0$, it is also true for $\eta = 0$. □
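As an illustration of the fundamental lemma, the sketch below applies the estimate (10.14) to the Euler polygon of the scalar test problem $y' = \lambda y$, $y(0) = 1$. The test problem, the step number and the function names are assumptions chosen for this example; the defect is computed from its definition $\|v'(x+0) - f(x, v(x))\|$, and the Lipschitz constant is $L = |\lambda|$.

```python
import math

def euler_polygon_nodes(lam, n, x_end=1.0):
    """Euler's method for y' = lam*y, y(0) = 1 on [0, x_end] with n equal steps."""
    h = x_end / n
    ys = [1.0]
    for _ in range(n):
        ys.append(ys[-1] + h * lam * ys[-1])
    return h, ys

def defect_bound(lam, h, ys):
    """Largest defect |v'(x+0) - f(x, v(x))| of the Euler polygon v.

    On [x_i, x_{i+1}] we have v' = lam*y_i and v(x) = y_i + (x - x_i)*lam*y_i,
    so the defect |lam*(y_i - v(x))| is largest at the right end of each step.
    """
    return max(abs(lam) * abs(h * lam * y) for y in ys[:-1])

lam, n = 1.0, 10           # Lipschitz constant L = |lam|
h, ys = euler_polygon_nodes(lam, n)
eps = defect_bound(lam, h, ys)
L = abs(lam)
rho = 0.0                  # exact initial value
x = 1.0
bound = rho * math.exp(L * x) + eps / L * (math.exp(L * x) - 1.0)   # estimate (10.14)
error = abs(math.exp(lam * x) - ys[-1])
print(f"error = {error:.4f}, bound = {bound:.4f}")
```

The bound is rigorous but not sharp: it uses the largest defect on the whole interval and the worst-case growth factor.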

Variant form of Theorem 10.2. The conditions 

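a) $\|v(x_0) - y(x_0)\| \le \varrho$

b) $\|v'(x+0) - f(x, v(x))\| \le \delta(x)$

c) $\|f(x, v) - f(x, y)\| \le \ell(x) \|v - y\|$

imply for $x \ge x_0$

$$ \|y(x) - v(x)\| \le e^{L(x)} \Bigl( \varrho + \int_{x_0}^{x} e^{-L(s)} \delta(s)\, ds \Bigr), \qquad L(x) = \int_{x_0}^{x} \ell(s)\, ds. $$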

Proof. This is simply formula (3.3). □ 

Theorem 10.3. If the function $g(x, y)$ is continuous and satisfies a Lipschitz
condition, then the implication (10.10) is true for continuous functions $m(x)$ and
$u(x)$.

Proof. Define functions $w_n(x)$, $v_n(x)$ by

$$ w_n'(x) = g(x, w_n(x)) + 1/n, \qquad w_n(x_0) = m(x_0), $$
$$ v_n'(x) = g(x, v_n(x)) - 1/n, \qquad v_n(x_0) = u(x_0), $$

so that from Theorem 10.1

$$ m(x) \le w_n(x), \qquad v_n(x) \le u(x) \quad \text{for } x_0 \le x \le X. \qquad (10.16) $$

It follows from Theorem 10.2 that the functions $w_n(x)$ and $v_n(x)$ converge for
$n \to \infty$ to the solutions of

$$ w'(x) = g(x, w(x)), \qquad w(x_0) = m(x_0), $$
$$ v'(x) = g(x, v(x)), \qquad v(x_0) = u(x_0), $$

since the defect is $\pm 1/n$. Finally, because of $m(x_0) \le u(x_0)$ and uniqueness we
have $w(x) \le v(x)$. Taking the limit $n \to \infty$ in (10.16) thus gives $m(x) \le u(x)$. □


A further generalization of Theorem 10.2 is possible if the Lipschitz condition 
(c) is replaced by something nonlinear such as 

$$ \|f(x, v) - f(x, y)\| \le \omega(x, \|v - y\|). $$

Then the differential inequality for the error $m(x)$ is to be compared with the
solution of

$$ u' = \omega(x, u) + \delta(x) + \eta, \qquad u(x_0) = \varrho, \qquad \eta > 0. $$

See Walter (1970), Chap. II, §11 for more details. 


Estimates Using One-Sided Lipschitz Conditions 


As we already observed in Exercise 2 of 1.9, and as has been known for a long time,
much information about the errors can be lost by the use of positive Lipschitz constants
$L$ (e.g., (9.11), (9.11’), or (9.11”)) in the estimates (7.16), (7.17), or (7.18).
The estimates all grow exponentially with $x$, even if the solutions and errors decay.
Therefore many efforts have been made to obtain better error estimates, as for
example in the papers of Eltermann (1955), Uhlmann (1957), Dahlquist (1959), and the
references therein. We follow with great pleasure the particularly clear presentation
of Dahlquist.

Let us estimate the derivative of $m(x) = \|v(x) - y(x)\|$ with more care than
we did in (10.5): for $h > 0$ we have

$$ \begin{aligned}
m(x+h) &= \|v(x+h) - y(x+h)\| \\
&= \|v(x) - y(x) + h(v'(x+0) - y'(x))\| + O(h^2) \qquad (10.17) \\
&\le \bigl\| v(x) - y(x) + h\bigl( f(x, v(x)) - f(x, y(x)) \bigr) \bigr\| + h\delta(x) + O(h^2)
\end{aligned} $$

by the use of (10.6) and (9.7). Here, we apply the mean value theorem to the
function $y + h f(x, y)$ and obtain

$$ m(x+h) \le \Bigl( \max_{\eta \in [y(x), v(x)]} \Bigl\| I + h \frac{\partial f}{\partial y}(x, \eta) \Bigr\| \Bigr) \cdot m(x) + h\delta(x) + O(h^2) $$

and finally for $h > 0$,

$$ \frac{m(x+h) - m(x)}{h} \le \max_{\eta \in [y(x), v(x)]} \frac{\bigl\| I + h \frac{\partial f}{\partial y}(x, \eta) \bigr\| - 1}{h}\, m(x) + \delta(x) + O(h). \qquad (10.18) $$

The expression on the right hand side of (10.18) leads us to the following definition:
Definition 10.4. Let $Q$ be a square matrix, then we call

$$ \mu(Q) = \lim_{h \to 0,\, h > 0} \frac{\|I + hQ\| - 1}{h} \qquad (10.19) $$

the logarithmic norm of $Q$.

Here are formulas for its computation (Dahlquist (1959), p. 11, Eltermann
(1955), p. 498, 499):

Theorem 10.5. The logarithmic norm (10.19) is obtained by the following formulas:
for the Euclidean norm (9.6),

$$ \mu(Q) = \lambda_{\max} = \text{largest eigenvalue of } \tfrac{1}{2}(Q^T + Q); \qquad (10.20) $$

for the max-norm (9.6’),

$$ \mu(Q) = \max_{k = 1, \ldots, n} \Bigl( q_{kk} + \sum_{i \ne k} |q_{ki}| \Bigr); \qquad (10.20’) $$

for the norm (9.6”),

$$ \mu(Q) = \max_{i = 1, \ldots, n} \Bigl( q_{ii} + \sum_{k \ne i} |q_{ki}| \Bigr). \qquad (10.20”) $$

Proofs. Formulas (10.20’) and (10.20”) follow quite trivially from (9.11’) and
(9.11”) and the definition (10.19). The point is that the presence of $I$ suppresses,
for $h$ sufficiently small, the absolute values for the diagonal elements. (10.20) is
seen from the fact that the eigenvalues of

$$ (I + hQ)^T (I + hQ) = I + h(Q^T + Q) + h^2 Q^T Q, $$

for $h \to 0$, converge to $1 + h\lambda_i$, where $\lambda_i$ are the eigenvalues of $Q^T + Q$. □

Remark. For complex-valued matrices the above formulas remain valid if one replaces
$Q^T$ by $Q^*$ and $q_{kk}$, $q_{ii}$ by $\operatorname{Re} q_{kk}$, $\operatorname{Re} q_{ii}$.
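The row and column formulas of Theorem 10.5 are easy to evaluate directly. The sketch below (plain Python, illustrative function names) implements (10.20’) and (10.20”) and compares the first with the difference quotient of the definition (10.19) for a small $h$. The test matrix is the coefficient matrix read off from Exercise 2 of Section 1.9, taken here as $\begin{pmatrix} -100 & 1 \\ 1 & -100 \end{pmatrix}$; that reading is an assumption of this example.

```python
def log_norm_max(Q):
    """Logarithmic norm for the max-norm, formula (10.20'): row-wise."""
    n = len(Q)
    return max(Q[k][k] + sum(abs(Q[k][i]) for i in range(n) if i != k)
               for k in range(n))

def log_norm_one(Q):
    """Logarithmic norm for the sum norm, formula (10.20''): column-wise."""
    n = len(Q)
    return max(Q[i][i] + sum(abs(Q[k][i]) for k in range(n) if k != i)
               for i in range(n))

def norm_max(M):
    """Matrix norm subordinate to the max-norm (largest absolute row sum)."""
    return max(sum(abs(a) for a in row) for row in M)

def log_norm_by_definition(Q, h=1e-6):
    """Difference quotient (||I + hQ|| - 1)/h from (10.19), max-norm, small h > 0."""
    n = len(Q)
    M = [[(1.0 if i == j else 0.0) + h * Q[i][j] for j in range(n)]
         for i in range(n)]
    return (norm_max(M) - 1.0) / h

Q = [[-100.0, 1.0], [1.0, -100.0]]
print(log_norm_max(Q), log_norm_one(Q), norm_max(Q), log_norm_by_definition(Q))
```

For this matrix the logarithmic norm is negative while the ordinary norm is large and positive, which is the situation in which estimates based on $\mu$ improve on those based on a Lipschitz constant.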

We now obtain from (10.18) the following improvement of Theorem 10.3. 

Theorem 10.6. Suppose that we have the estimates 

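$$ \mu\Bigl( \frac{\partial f}{\partial y}(x, \eta) \Bigr) \le \ell(x) \quad \text{for } \eta \in [y(x), v(x)] \quad \text{and} \qquad (10.21) $$
$$ \|v'(x+0) - f(x, v(x))\| \le \delta(x), \qquad \|v(x_0) - y(x_0)\| \le \varrho. $$

Then for $x > x_0$ we have

$$ \|y(x) - v(x)\| \le e^{L(x)} \Bigl( \varrho + \int_{x_0}^{x} e^{-L(s)} \delta(s)\, ds \Bigr) \qquad (10.22) $$

with $L(x) = \int_{x_0}^{x} \ell(s)\, ds$.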
Proof. Since, for a fixed $x$, the segment $[v(x), y(x)]$ is compact,

$$ K = \max_i \max_{[v(x), y(x)]} \Bigl| \frac{\partial f_i}{\partial y_i} \Bigr| $$

is finite. Then (see the proof of Theorem 10.5)

$$ \frac{\bigl\| I + h \frac{\partial f}{\partial y}(x, \eta) \bigr\| - 1}{h} = \mu\Bigl( \frac{\partial f}{\partial y}(x, \eta) \Bigr) + O(h) $$

where the $O(h)$-term is uniformly bounded in $\eta$. (For the norms (9.6’) and (9.6”)
this term is in fact zero for $h < 1/K$.) Thus the condition (10.21) inserted into
(10.18) gives

$$ D_+ m(x) \le \ell(x) m(x) + \delta(x). $$

Now the estimate (10.22) follows in the same way as that of Theorem 10.3. □


Exercises 


1. Apply Theorem 10.6 to the example of Exercise 2 of 1.9. Observe the substantial
improvement of the estimates.

2. Prove the following (a variant form of the famous “Gronwall lemma”, Gronwall
1919): suppose that a positive function $m(x)$ satisfies

$$ m(x) \le \varrho + \varepsilon (x - x_0) + L \int_{x_0}^{x} m(s)\, ds =: w(x) \qquad (10.23) $$

then

$$ m(x) \le \varrho\, e^{L(x - x_0)} + \frac{\varepsilon}{L} \left( e^{L(x - x_0)} - 1 \right); \qquad (10.24) $$

a) directly, by subtracting from (10.23)

$$ u(x) = \varrho + \varepsilon (x - x_0) + L \int_{x_0}^{x} u(s)\, ds; $$

b) by differentiating $w(x)$ in (10.23) and using Theorem 10.1.

c) Prove Theorem 10.2 with the help of the above lemma of Gronwall. The 
same interrelations are, of course, also valid in more general situations. 

3. Consider the problem $y' = \lambda y$, $y(0) = 1$ with $\lambda \ge 0$ and apply Euler’s method
with constant step size $h = 1/n$. Prove that

$$ \frac{\lambda}{1 + \lambda/n}\, y_h(x) \le D_+ y_h(x) \le \lambda\, y_h(x) $$

and derive the estimate

$$ \Bigl( 1 + \frac{\lambda}{n} \Bigr)^{n} \le e^{\lambda} \le \Bigl( 1 + \frac{\lambda}{n} \Bigr)^{n + \lambda} \quad \text{for } \lambda \ge 0. $$

4. Prove the following properties of the logarithmic norm: 

a) $\mu(\alpha Q) = \alpha\, \mu(Q)$ for $\alpha \ge 0$

b) $-\|Q\| \le \mu(Q) \le \|Q\|$

c) $\mu(Q + P) \le \mu(Q) + \mu(P)$, $\quad \mu\bigl( \int Q(t)\, dt \bigr) \le \int \mu(Q(t))\, dt$

d) $|\mu(Q) - \mu(P)| \le \|Q - P\|$

5. For the Euclidean norm (10.20), $\mu(Q)$ is the smallest number satisfying

$$ \langle v, Qv \rangle \le \mu(Q)\, \|v\|^2. $$

This property is valid for all norms associated with a scalar product. Prove this.

6. Show that for the Euclidean norm the condition (10.21) is equivalent to

$$ \langle y - z, f(x, y) - f(x, z) \rangle \le \ell(x)\, \|y - z\|^2. $$

7. Observe, using an example of the form

$$ y_1' = y_2, \qquad y_2' = -y_1, $$

that a generalization of Theorem 10.1 to systems of first order differential
equations, with inequalities interpreted component-wise, is not true in general
(Müller 1926).

However, it is possible to prove such a generalization of Theorem 10.1 under
the additional hypothesis that the functions $g_i(x, y_1, \ldots, y_n)$ are quasimonotone,
i.e., that

$$ g_i(x, y_1, \ldots, y_j, \ldots, y_n) \le g_i(x, y_1, \ldots, z_j, \ldots, y_n) \quad \text{if } y_j < z_j \text{ for all } j \ne i. $$

Try to prove this.

An important fact is that many systems from parabolic differential equations,
such as equation (6.10), are quasimonotone. This allows many interesting applications
of the ideas of this section (see Walter (1970), Chap. IV).
[Wronski] ... occupied himself with mathematics, mechanics and
physics, celestial mechanics and astronomy, statistics and political
economy, with history, politics and philosophy, ... he
tried his powers in several mechanical and technical
inventions. (S. Dickstein, III. Math. Kongr. 1904, p. 515)


With more knowledge about existence and uniqueness, and with more skill in linear
algebra, we shall now, as did the mathematicians of the 19th century, better
understand many points which had been left somewhat obscure in Sections 1.4 and
1.6 about linear differential equations of higher order.

Equation (4.9) divided by $a_n(x)$ (which is $\ne 0$ away from singular points)
becomes

$$ y^{(n)} + b_{n-1}(x) y^{(n-1)} + \ldots + b_0(x) y = g(x), \qquad b_i(x) = a_i(x)/a_n(x), \qquad (11.1) $$

with $g(x) = f(x)/a_n(x)$. Introducing $y = y_1$, $y' = y_2$, ..., $y^{(n-1)} = y_n$ we arrive
at

$$ \begin{pmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{pmatrix} =
\begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ -b_0(x) & -b_1(x) & \ldots & -b_{n-1}(x) \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} +
\begin{pmatrix} 0 \\ \vdots \\ 0 \\ g(x) \end{pmatrix}. \qquad (11.1’) $$

We again denote by $y$ the vector $(y_1, \ldots, y_n)^T$ and by $f(x)$ the inhomogeneity,
so that (11.1’) becomes a special case of the following system of linear differential
equations

$$ y' = A(x) y + f(x), \qquad (11.2) $$
$$ A(x) = (a_{ij}(x)), \qquad f(x) = (f_i(x)), \qquad i, j = 1, \ldots, n. $$

Here, the theorems of Section 1.9 and 1.10 apply without difficulty. Since the partial
derivatives of the right hand side of (11.2) with respect to $y_i$ are given by $a_{ki}(x)$,
we have the Lipschitz estimate (see condition (c) of the variant form of Theorem
10.2), where $\ell(x) = \|A(x)\|$ in any subordinate matrix norm (9.11, 11’, 11”). We
apply Theorem 7.4, and the variant form of Theorem 10.2 with $v(x) = 0$ as “approximate
solution”. We may also take $\ell(x) = \mu(A(x))$ (see (10.20, 20’, 20”)) and
apply Theorem 10.6.

Theorem 11.1. Suppose that $A(x)$ is continuous on an interval $[x_0, X]$. Then for
any initial values $y_0 = (y_{10}, \ldots, y_{n0})^T$ there exists for all $x_0 \le x \le X$ a unique
solution of (11.2) satisfying

$$ \|y(x)\| \le e^{L(x)} \Bigl( \|y_0\| + \int_{x_0}^{x} e^{-L(s)} \|f(s)\|\, ds \Bigr) \qquad (11.3) $$
$$ L(x) = \int_{x_0}^{x} \ell(s)\, ds, \qquad \ell(x) = \|A(x)\| \ \text{ or } \ \ell(x) = \mu(A(x)). $$

For $f(x) \equiv 0$, $y(x)$ depends linearly on the initial values, i.e., there is a matrix
$R(x, x_0)$ (the “resolvent”), such that

$$ y(x) = R(x, x_0)\, y_0. \qquad (11.4) $$

Proof. Since $\ell(x)$ is continuous and therefore bounded on any compact interval
$[x_0, X]$, the estimate (11.3) shows that the solutions can be continued until the end.
The linear dependence follows from the fact that, for $f \equiv 0$, linear combinations
of solutions are again solutions, and from uniqueness. □


Resolvent and Wronskian 


From uniqueness we have that the solutions with initial values $y_0$ at $x_0$ and
$y_1 = R(x_1, x_0)\, y_0$ at $x_1$ (see (11.4)) must be the same. Hence we have

$$ R(x_2, x_0) = R(x_2, x_1) R(x_1, x_0) \qquad (11.5) $$

for $x_0 \le x_1 \le x_2$. Finally by integrating backward from $x_1$, $y_1$, i.e., by the coordinate
transformation $x = x_1 - t$, $0 \le t \le x_1 - x_0$, we must arrive, again by
uniqueness, at the starting values. Hence

$$ R(x_0, x_1) = \bigl( R(x_1, x_0) \bigr)^{-1} \qquad (11.6) $$

and (11.5) is true without any restriction on $x_0$, $x_1$, $x_2$.
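The two resolvent identities (11.5) and (11.6) can be observed numerically. The sketch below builds $R(x, x_0)$ column by column, integrating the homogeneous system from the unit vectors with the classical Runge-Kutta method. The coefficient matrix (an Airy-type system), the step count and all function names are assumptions made for this illustration only.

```python
def A(x):
    """Illustrative coefficient matrix (Airy-type system y1' = y2, y2' = -x*y1)."""
    return [[0.0, 1.0], [-x, 0.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def integrate(y0, xa, xb, steps=2000):
    """Classical Runge-Kutta (RK4) for y' = A(x) y from xa to xb (xb < xa is allowed)."""
    h = (xb - xa) / steps
    x, y = xa, list(y0)
    for _ in range(steps):
        k1 = matvec(A(x), y)
        k2 = matvec(A(x + h / 2), [y[i] + h / 2 * k1[i] for i in range(2)])
        k3 = matvec(A(x + h / 2), [y[i] + h / 2 * k2[i] for i in range(2)])
        k4 = matvec(A(x + h), [y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
        x += h
    return y

def resolvent(x, x0):
    """R(x, x0): its columns are the solutions started from the unit vectors at x0."""
    cols = [integrate(e, x0, x) for e in ([1.0, 0.0], [0.0, 1.0])]
    return [[cols[j][i] for j in range(2)] for i in range(2)]

x0, x1, x2 = 0.0, 0.7, 1.5
lhs = resolvent(x2, x0)
rhs = matmul(resolvent(x2, x1), resolvent(x1, x0))
err_product = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
back = matmul(resolvent(x0, x1), resolvent(x1, x0))
err_inverse = max(abs(back[i][j] - (1.0 if i == j else 0.0))
                  for i in range(2) for j in range(2))
print(err_product, err_inverse)
```

Both printed deviations are of the size of the integration error, not exactly zero, since the resolvent is only approximated.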

Let $y_i(x) = (y_{1i}(x), \ldots, y_{ni}(x))^T$ (for $i = 1, \ldots, n$) be a set of $n$ solutions
of the homogeneous differential equation

$$ y' = A(x) y \qquad (11.7) $$

which are linearly independent at $x = x_0$ (i.e., they form a fundamental system).
We form the Wronskian matrix (Wronski 1810)

$$ W(x) = \begin{pmatrix} y_{11}(x) & \ldots & y_{1n}(x) \\ \vdots & & \vdots \\ y_{n1}(x) & \ldots & y_{nn}(x) \end{pmatrix} $$

so that

$$ W'(x) = A(x) W(x) $$

and all solutions can be written as

$$ c_1 y_1(x) + \ldots + c_n y_n(x) = W(x)\, c \quad \text{where } c = (c_1, \ldots, c_n)^T. \qquad (11.8) $$

If this solution must satisfy the initial conditions $y(x_0) = y_0$, we obtain
$c = W^{-1}(x_0)\, y_0$ and we have the formula

$$ R(x, x_0) = W(x) W^{-1}(x_0). \qquad (11.9) $$

Therefore all solutions are known if one has found $n$ linearly independent solutions.


Inhomogeneous Linear Equations 


Extending the idea of Joh. Bernoulli for (3.2) and Lagrange for (4.9), we now compute
the solutions of the inhomogeneous equation (11.2) by letting $c$ be “variable”
in the “general solution” (11.8): $y(x) = W(x) c(x)$ (Liouville 1838). Exactly as in
Section 1.3 for (3.2) we obtain from (11.2) and (11.7) by differentiation

$$ y' = W'c + Wc' = AWc + Wc' = AWc + f. $$

Hence $c' = W^{-1} f$. If we integrate this with integration constants $c$, we obtain

$$ y(x) = W(x) \int_{x_0}^{x} W^{-1}(s) f(s)\, ds + W(x)\, c. $$

The initial conditions $y(x_0) = y_0$ imply $c = W^{-1}(x_0)\, y_0$ and we obtain:


Theorem 11.2 (“Variation of constants formula”). Let $A(x)$ and $f(x)$ be continuous.
Then the solution of the inhomogeneous equation $y' = A(x) y + f(x)$ satisfying
the initial conditions $y(x_0) = y_0$ is given by

$$ \begin{aligned}
y(x) &= W(x) \Bigl( W^{-1}(x_0)\, y_0 + \int_{x_0}^{x} W^{-1}(s) f(s)\, ds \Bigr) \\
&= R(x, x_0)\, y_0 + \int_{x_0}^{x} R(x, s) f(s)\, ds.
\end{aligned} \qquad (11.10) $$
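A small numerical check of the variation of constants formula: the sketch below evaluates the second form of (11.10) by Simpson quadrature for a system whose resolvent is known in closed form. The choice $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ (for which $R(x, s)$ is a rotation by $x - s$), the constant inhomogeneity and the function names are assumptions made for this example.

```python
import math

def R(x, s):
    """Resolvent of y' = A y with A = [[0, 1], [-1, 0]]: a rotation by x - s."""
    c, d = math.cos(x - s), math.sin(x - s)
    return [[c, d], [-d, c]]

def f(s):
    """Illustrative inhomogeneity (an assumption for this demo)."""
    return [0.0, 1.0]

def variation_of_constants(x, y0, x0=0.0, n=200):
    """Evaluate R(x,x0) y0 + int R(x,s) f(s) ds with composite Simpson (n even)."""
    h = (x - x0) / n
    integral = [0.0, 0.0]
    for i in range(n + 1):
        s = x0 + i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        Rs, fs = R(x, s), f(s)
        for r in range(2):
            integral[r] += w * h / 3.0 * (Rs[r][0] * fs[0] + Rs[r][1] * fs[1])
    R0 = R(x, x0)
    return [R0[r][0] * y0[0] + R0[r][1] * y0[1] + integral[r] for r in range(2)]

x = 1.3
y = variation_of_constants(x, [0.0, 0.0])
# With y1 = y and y2 = y', the system is y'' + y = 1, y(0) = y'(0) = 0.
exact = [1.0 - math.cos(x), math.sin(x)]
print(y, exact)
```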


The Abel-Liouville-Jacobi-Ostrogradskii Identity 

We already know from (11.6) that $W(x)$ remains regular for all $x$. We now show
that the determinant of $W(x)$ can be given explicitly as follows (Abel 1827, Liouville
1838, Jacobi 1845, §17):

$$ \det(W(x)) = \det(W(x_0)) \cdot \exp\Bigl( \int_{x_0}^{x} \operatorname{tr}(A(s))\, ds \Bigr), \qquad (11.11) $$
$$ \operatorname{tr}(A(x)) = a_{11}(x) + a_{22}(x) + \ldots + a_{nn}(x), $$

which connects the determinant of $W(x)$ to the trace of $A(x)$.

For the proof of (11.11) (see also Exercise 2) we compute the derivative
$\frac{d}{dx} \det(W(x))$. Since $\det(W(x))$ is multilinear, this derivative (by the Leibniz
rule) is a sum of $n$ terms, whose first is

$$ T_1 = \det \begin{pmatrix} y_{11}' & y_{12}' & \ldots & y_{1n}' \\ y_{21} & y_{22} & \ldots & y_{2n} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \ldots & y_{nn} \end{pmatrix}. $$

We insert $y_{1i}' = a_{11}(x) y_{1i} + \ldots + a_{1n}(x) y_{ni}$ from (11.7). The terms $a_{12}(x) y_{2i}$,
..., $a_{1n}(x) y_{ni}$ disappear by subtracting multiples of lines 2 to $n$, so that
$T_1 = a_{11}(x) \det(W(x))$. Summing all these terms we obtain finally

$$ \frac{d}{dx} \det(W(x)) = \bigl( a_{11}(x) + \ldots + a_{nn}(x) \bigr) \cdot \det(W(x)) \qquad (11.12) $$

and (11.11) follows by integration. □
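The identity (11.11) lends itself to a quick numerical experiment. In the sketch below, two independent solutions of a $2 \times 2$ system are integrated with the classical Runge-Kutta method and the determinant of the resulting Wronskian matrix is compared with the prediction of (11.11). The coefficient matrix with trace $-x$, the initial matrix $W(x_0)$ and the function names are assumptions chosen for this illustration.

```python
import math

def A(x):
    """Illustrative 2x2 coefficient matrix with trace x - 2x = -x."""
    return [[x, 1.0], [-1.0, -2.0 * x]]

def rhs(x, y):
    M = A(x)
    return [M[0][0] * y[0] + M[0][1] * y[1], M[1][0] * y[0] + M[1][1] * y[1]]

def rk4(y0, xa, xb, steps=2000):
    """Classical Runge-Kutta integration of y' = A(x) y from xa to xb."""
    h = (xb - xa) / steps
    x, y = xa, list(y0)
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, [y[i] + h / 2 * k1[i] for i in range(2)])
        k3 = rhs(x + h / 2, [y[i] + h / 2 * k2[i] for i in range(2)])
        k4 = rhs(x + h, [y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
        x += h
    return y

x0, x1 = 0.0, 1.0
# Wronskian matrix with W(x0) = [[1, 2], [0, 1]]: its columns are two independent solutions.
c1 = rk4([1.0, 0.0], x0, x1)
c2 = rk4([2.0, 1.0], x0, x1)
det_W0 = 1.0 * 1.0 - 2.0 * 0.0
det_W1 = c1[0] * c2[1] - c2[0] * c1[1]
# tr A(s) = -s, so the integral over [x0, x1] equals -(x1^2 - x0^2)/2.
predicted = det_W0 * math.exp(-(x1 ** 2 - x0 ** 2) / 2.0)   # formula (11.11)
print(det_W1, predicted)
```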


Exercises 

1. Compute the resolvent matrix $R(x, x_0)$ for the two systems

$$ y_1' = y_1, \quad y_2' = 3 y_2 \qquad \text{and} \qquad y_1' = y_2, \quad y_2' = -y_1 $$

and check the validity of (11.5), (11.6) as well as (11.11).

2. Reconstruct Abel’s original proof for (11.11), which was for the case

$$ y_1'' + p y_1' + q y_1 = 0, \qquad y_2'' + p y_2' + q y_2 = 0. $$

Multiply the equations by $y_2$ and $y_1$ respectively and subtract to eliminate $q$.
Then integrate.

Use the result to obtain an identity for the two integrals

$$ y_1(a) = \int_0^\infty e^{ax - x^2} x^{\alpha - 1}\, dx, \qquad y_2(a) = \int_0^\infty e^{-ax - x^2} x^{\alpha - 1}\, dx, $$

which both satisfy

$$ \frac{d^2 y_i}{da^2} - \frac{a}{2} \frac{dy_i}{da} - \frac{\alpha}{2} y_i = 0. \qquad (11.13) $$

Hint. To verify (11.13), integrate from 0 to infinity the expression for
$\frac{d}{dx} \bigl( \exp(ax - x^2)\, x^{\alpha} \bigr)$ (Abel 1827, case IV).


3. (Kummer 1839). Show that the general solution of the equation 

$$ y^{(n)}(x) = x^m y(x) \qquad (11.14) $$

can be obtained by quadrature.

Hint. Differentiate (11.14) to obtain

$$ y^{(n+1)} = x^m y' + m x^{m-1} y. \qquad (11.15) $$

Suppose by recursion that the general solution of

$$ \psi^{(n+1)} = x^{m-1} \psi, \quad \text{i.e.,} \quad \frac{d^{n+1}}{dx^{n+1}} \psi(xu) = x^{m-1} u^{m+n} \psi(xu) \qquad (11.16) $$

is already known. Show that then

$$ y(x) = \int_0^\infty u^{m-1} \exp\Bigl( -\frac{u^{m+n}}{m+n} \Bigr) \psi(xu)\, du $$

is the general solution of (11.15), and, under some conditions on the parameters,
also of (11.14). To simplify the computations, consider the function

$$ g(u) = u^m \exp\Bigl( -\frac{u^{m+n}}{m+n} \Bigr) \psi(xu), $$

compute its derivative with respect to $u$, multiply by $x^{m-1}$, and integrate from
0 to infinity.


4. (Weak singularities for systems). Show that the linear system 

$$ y' = \frac{1}{x} \bigl( A_0 + A_1 x + A_2 x^2 + \ldots \bigr) y \qquad (11.17) $$

possesses solutions of the form

$$ y(x) = x^q \bigl( v_0 + v_1 x + v_2 x^2 + \ldots \bigr) \qquad (11.18) $$

where $v_0, v_1, \ldots$ are vectors. Determine first $q$ and $v_0$, then recursively $v_1$, $v_2$,
etc. Observe that there exist $n$ independent solutions of the form (11.18) if the
eigenvalues of $A_0$ satisfy $\lambda_i \ne \lambda_j \bmod (\mathbb{Z})$ (Fuchs 1866).

5. Find the general solution of the weakly singular systems 

y' = -( i * 1 ) y and y' = -( 4 i \ )y- (H-19) 

V T ~4/ x \~i - 1 / 

Hint. While the first is easy from Exercise 4, the second needs an additional 
idea (see formula (5.9)). A second possibility is to use the transformation 
x = e t , y(x) = z(t) , and apply the methods of Section 1.12. 
The technique of integrating linear differential equations
with constant coefficients is here developed to the utmost.

(F. Klein in Routh 1898)


Linearization 

Systems of linear differential equations with constant coefficients form a class of
equations for which the resolvent $R(x, x_0)$ can be computed explicitly. They generally
occur by linearization of time-independent (i.e., autonomous or permanent)
nonlinear differential equations

$$ y_i' = f_i(y_1, \ldots, y_n) \quad \text{or} \quad y_i'' = f_i(y_1, \ldots, y_n) \qquad (12.1) $$

in the neighbourhood of a stationary point (Lagrange (1788), see also Routh (1860),
Chap. IX, Thomson & Tait 1879). We choose the coordinates so that the stationary
point under consideration is the origin, i.e., $f_i(0, \ldots, 0) = 0$. We then expand $f_i$
in its Taylor series and neglect all nonlinear terms:

$$ y_i' = \sum_{k=1}^{n} \frac{\partial f_i}{\partial y_k}(0)\, y_k \quad \text{or} \quad y_i'' = \sum_{k=1}^{n} \frac{\partial f_i}{\partial y_k}(0)\, y_k. \qquad (12.1’) $$

This is a system of equations with constant coefficients, as introduced in Section
1.6 (see (6.4), (6.11)),

$$ y' = Ay \quad \text{or} \quad y'' = Ay. \qquad (12.1”) $$

Autonomous systems are invariant under a shift $x \to x + C$. We may therefore
always assume that $x_0 = 0$. For arbitrary $x_0$ the resolvent is given by

$$ R(x, x_0) = R(x - x_0, 0). \qquad (12.2) $$


Diagonalization 

We have seen in Section 1.6 that the assumption $y(x) = v \cdot e^{\lambda x}$ leads to

$$ Av = \lambda v \quad \text{or} \quad Av = \lambda^2 v, \qquad (12.3) $$

hence $v \ne 0$ must be an eigenvector of $A$ and $\lambda$ the corresponding eigenvalue (in
the first case; a square root of the eigenvalue in the second case, which we do not
consider any longer). From (12.3) we obtain by subtraction that there exists such a
$v \ne 0$ if and only if the determinant

$$ \chi_A(\lambda) := \det(\lambda I - A) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n) = 0. \qquad (12.4) $$

This determinant is called the characteristic polynomial of $A$.

Suppose now that for the $n$ eigenvalues $\lambda_i$ the $n$ eigenvectors $v_i$ can be chosen
linearly independent. We then have from (12.3)

$$ A\, (v_1, v_2, \ldots, v_n) = (v_1, v_2, \ldots, v_n)\, \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), $$

or, if $T$ is the matrix whose columns are the eigenvectors of $A$,

$$ T^{-1} A T = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n). \qquad (12.5) $$

On comparing (12.5) with (12.1”), we see that the differential equation simplifies
considerably if we use the coordinate transformation

$$ y(x) = T z(x), \qquad y'(x) = T z'(x) \qquad (12.6) $$

which leads to

$$ z'(x) = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\, z(x). \qquad (12.7) $$

Thus the original system of differential equations decomposes into $n$ single equations
which are readily integrated to give

$$ z(x) = \operatorname{diag}\bigl( \exp(\lambda_1 x), \exp(\lambda_2 x), \ldots, \exp(\lambda_n x) \bigr)\, z_0, $$

from which (12.6) yields

$$ y(x) = T \operatorname{diag}\bigl( \exp(\lambda_1 x), \exp(\lambda_2 x), \ldots, \exp(\lambda_n x) \bigr)\, T^{-1} y_0. \qquad (12.8) $$
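Formula (12.8) can be followed step by step for a $2 \times 2$ matrix. The sketch below is restricted, by assumption, to real matrices with real and distinct eigenvalues, so that a closed-form eigen-decomposition suffices; the helper names `eig2` and `solve` and the test matrix are illustrative choices, not part of the text.

```python
import math

def eig2(A):
    """Eigenvalues and eigenvectors of a real 2x2 matrix with real, distinct eigenvalues."""
    (a, b), (c, d) = A
    disc = (a - d) ** 2 + 4.0 * b * c
    if disc <= 0.0:
        raise ValueError("sketch only covers real, distinct eigenvalues")
    lams = [(a + d + s * math.sqrt(disc)) / 2.0 for s in (+1.0, -1.0)]
    vecs = []
    for lam in lams:
        # Solve (A - lam I) v = 0 using a row of A - lam I that is not identically zero.
        v = [b, lam - a] if (b != 0.0 or lam != a) else [lam - d, c]
        vecs.append(v)
    return lams, vecs

def solve(A, y0, x):
    """y(x) = T diag(exp(lam_i x)) T^{-1} y0, formula (12.8), for a 2x2 matrix."""
    lams, (v1, v2) = eig2(A)
    det = v1[0] * v2[1] - v2[0] * v1[1]
    # z0 = T^{-1} y0 with T = (v1, v2) as columns
    z0 = [(v2[1] * y0[0] - v2[0] * y0[1]) / det,
          (-v1[1] * y0[0] + v1[0] * y0[1]) / det]
    z = [math.exp(lams[0] * x) * z0[0], math.exp(lams[1] * x) * z0[1]]
    return [v1[0] * z[0] + v2[0] * z[1], v1[1] * z[0] + v2[1] * z[1]]

A = [[-100.0, 1.0], [1.0, -100.0]]
y = solve(A, [1.0, 0.0], 0.01)
print(y)
```

For this symmetric matrix the eigenvalues are $-99$ and $-101$ with eigenvectors $(1, 1)^T$ and $(1, -1)^T$, so the result can be compared with $\tfrac12 (e^{-99x} \pm e^{-101x})$.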


The Schur Decomposition 

The proof is easy to give. (Schur 1909) 

The foregoing theory, beautiful as it may appear, has several drawbacks: 

a) Not all $n \times n$ matrices have a set of $n$ linearly independent eigenvectors;

b) Even if it is invertible, the matrix $T$ can behave very badly (see Exercise 1).

However, for symmetric matrices a classical theory tells us that $A$ can always be
diagonalized by orthogonal transformations. Let us therefore, with Schur (1909),
extend this classical theory to non-symmetric matrices. A real matrix $Q$ is called
orthogonal if its column vectors are mutually orthogonal and of norm 1, i.e., if
$Q^T Q = I$ or $Q^T = Q^{-1}$. A complex matrix $Q$ is called unitary if $Q^* Q = I$ or
$Q^* = Q^{-1}$, where $Q^*$ is the adjoint matrix of $Q$, i.e., transposed and complex
conjugate.
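Theorem 12.1. a) (Schur 1909). For each complex matrix $A$ there exists a unitary
matrix $Q$ such that

$$ Q^* A Q = \begin{pmatrix} \lambda_1 & \times & \times & \ldots & \times \\ & \lambda_2 & \times & \ldots & \times \\ & & \ddots & & \vdots \\ & & & \ddots & \times \\ 0 & & & & \lambda_n \end{pmatrix}; \qquad (12.9) $$

b) (Wintner & Murnaghan 1931). For a real matrix $A$ the matrix $Q$ can be
chosen real and orthogonal, if for each pair of conjugate eigenvalues $\lambda, \bar\lambda = \alpha \pm i\beta$
one allows the block

$$ \begin{pmatrix} \lambda & \times \\ 0 & \bar\lambda \end{pmatrix} \quad \text{to be replaced by} \quad \begin{pmatrix} \times & \times \\ \times & \times \end{pmatrix}. $$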



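Proof. a) The matrix $A$ has at least one eigenvector with eigenvalue $\lambda_1$. We use
this (normalized) vector as the first column of a matrix $Q_1$. Its other columns are
then chosen by arbitrarily completing the first one to an orthonormal basis. Then

$$ Q_1^* A Q_1 = \begin{pmatrix} \lambda_1 & \times \ \ldots \ \times \\ 0 & A_2 \end{pmatrix}. \qquad (12.10) $$

We then apply the same argument to the $(n-1)$-dimensional matrix $A_2$. This
leads to

$$ \tilde Q_2^* A_2 \tilde Q_2 = \begin{pmatrix} \lambda_2 & \times \ \ldots \ \times \\ 0 & A_3 \end{pmatrix}. $$

With the unitary matrix

$$ Q_2 = \begin{pmatrix} 1 & 0 \\ 0 & \tilde Q_2 \end{pmatrix} $$

we obtain

$$ Q_2^* Q_1^* A Q_1 Q_2 = \begin{pmatrix} \lambda_1 & \times & \times \ \ldots \ \times \\ 0 & \lambda_2 & \times \ \ldots \ \times \\ 0 & 0 & A_3 \end{pmatrix}. $$

A continuation of this process leads finally to a triangular matrix as in (12.9) with
$Q = Q_1 Q_2 \cdots Q_{n-1}$.

b) Suppose $A$ to be a real matrix. If $\lambda_1$ is real, $Q_1$ can be chosen real and
orthogonal. Now let $\lambda_1 = \alpha + i\beta$ ($\beta \ne 0$) be a non-real eigenvalue with a corresponding
eigenvector $u + iv$, i.e.,

$$ A(u \pm iv) = (\alpha \pm i\beta)(u \pm iv) \qquad (12.11) $$

or

$$ Au = \alpha u - \beta v, \qquad Av = \beta u + \alpha v. \qquad (12.11’) $$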
Since $\beta \ne 0$, $u$ and $v$ are linearly independent. We choose an orthonormal basis $\tilde u$,
$\tilde v$ of the subspace spanned by $u$ and $v$ and take $\tilde u$, $\tilde v$ as the first two columns of
the orthogonal matrix $Q_1$. We then have from (12.11’)

$$ Q_1^T A Q_1 = \begin{pmatrix} \times & \times & \times \ \ldots \ \times \\ \times & \times & \times \ \ldots \ \times \\ 0 & 0 & A_3 \end{pmatrix}. \qquad \square $$

Schur himself was not very proud of “his” decomposition, he just derived it as
a tool for proving interesting properties of eigenvalues (see e.g., Exercise 2).

Clearly, if $A$ is real and symmetric, $Q^T A Q$ will also be symmetric, and therefore
diagonal (see also Exercise 3).


Numerical Computations 


The above theoretical proof is still not of much practical use. It requires that one
know the eigenvalues, but the computation of eigenvalues from the characteristic
polynomial is one of the best-known stupidities of numerical analysis. Good numerical
analysis turns it the other way round: the real matrix $A$ is directly reduced,
first to Hessenberg form, and then by a sequence of orthogonal transformations to
the real Schur form of Wintner & Murnaghan (“QR-algorithm” of Francis, coded
by Martin, Peters & Wilkinson, contribution II/14 in Wilkinson & Reinsch 1970).
The eigenvalues then drop out. However, the produced code, called “HQR2”, does
not give the Schur form of $A$, since it continues for the eigenvectors of $A$. Some
manipulations must therefore be done to interrupt the code at the right moment
(in the FORTRAN translation HQR2 of Eispack (1974), for example, the “340” of
statement labelled “60” has to be replaced by “1001”). Happy “Matlab”-users just
call “SCHUR”.

Whenever the Schur form has been obtained, the transformation $y(x) = Qz(x)$,
$y'(x) = Qz'(x)$ (see (12.6)) leads to

$$ \begin{pmatrix} z_1' \\ \vdots \\ z_{n-1}' \\ z_n' \end{pmatrix} =
\begin{pmatrix} \lambda_1 & b_{12} & \ldots & b_{1,n-1} & b_{1n} \\ & \ddots & & \vdots & \vdots \\ & & & \lambda_{n-1} & b_{n-1,n} \\ & & & & \lambda_n \end{pmatrix}
\begin{pmatrix} z_1 \\ \vdots \\ z_{n-1} \\ z_n \end{pmatrix}. \qquad (12.12) $$

The last equation of this system is $z_n' = \lambda_n z_n$, and it can be integrated to give
$z_n = \exp(\lambda_n x)\, z_{n0}$. Next, the equation for $z_{n-1}$ is

$$ z_{n-1}' = \lambda_{n-1} z_{n-1} + b_{n-1,n} z_n \qquad (12.12’) $$

with $z_n$ known. This is a linear equation (inhomogeneous, if $b_{n-1,n} \ne 0$) which
can be solved by Euler’s technique (Section 1.4). Two different cases arise:

a) If $\lambda_{n-1} \ne \lambda_n$ we put $z_{n-1} = E \exp(\lambda_{n-1} x) + F \exp(\lambda_n x)$, insert into
(12.12’) and compare coefficients. This gives $F = b_{n-1,n} z_{n0} / (\lambda_n - \lambda_{n-1})$
and $E = z_{n-1,0} - F$.

b) If $\lambda_{n-1} = \lambda_n$ we set $z_{n-1} = (E + Fx) \exp(\lambda_n x)$ and obtain $F = b_{n-1,n} z_{n0}$
and $E = z_{n-1,0}$.

The next stage, following the same ideas, gives $z_{n-2}$, etc. Simple recursive
formulas for the elements of the resolvent, which work in the case $\lambda_i \ne \lambda_j$, are
obtained as follows (Parlett 1976): we assume

$$ z_i(x) = \sum_{j=i}^{n} E_{ij} \exp(\lambda_j x) \qquad (12.13) $$

and insert this into (12.12). After comparing coefficients, we obtain for $i = n$,
$n-1$, $n-2$, etc.

$$ E_{ik} = \frac{1}{\lambda_k - \lambda_i} \sum_{j=i+1}^{k} b_{ij} E_{jk}, \qquad k = i+1, i+2, \ldots $$
$$ E_{ii} = z_{i0} - \sum_{j=i+1}^{n} E_{ij}. \qquad (12.13’) $$
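The recursion (12.13’) translates almost line by line into code. The following sketch assumes the reading $E_{ik} = \frac{1}{\lambda_k - \lambda_i} \sum_{j=i+1}^{k} b_{ij} E_{jk}$, $E_{ii} = z_{i0} - \sum_{j>i} E_{ij}$, which is what comparing coefficients in (12.12) gives. The upper-triangular test matrix with distinct diagonal entries, the initial values and the function names are illustrative assumptions. Correctness is checked by inserting the resulting $z(x)$ back into $z' = Bz$.

```python
import math

def parlett_coefficients(B, z0):
    """Coefficients E[i][j] of (12.13) for upper-triangular B with distinct diagonal.

    Works upwards from the last row. Indices are 0-based here,
    whereas the text counts from 1.
    """
    n = len(B)
    lam = [B[i][i] for i in range(n)]
    E = [[0.0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for k in range(i + 1, n):
            s = sum(B[i][j] * E[j][k] for j in range(i + 1, k + 1))
            E[i][k] = s / (lam[k] - lam[i])
        E[i][i] = z0[i] - sum(E[i][j] for j in range(i + 1, n))
    return lam, E

def z(x, lam, E):
    """Evaluate the ansatz (12.13) at x."""
    n = len(lam)
    return [sum(E[i][j] * math.exp(lam[j] * x) for j in range(i, n)) for i in range(n)]

def residual(x, B, lam, E):
    """Largest component of z'(x) - B z(x), with z' differentiated analytically."""
    n = len(lam)
    zx = z(x, lam, E)
    dz = [sum(E[i][j] * lam[j] * math.exp(lam[j] * x) for j in range(i, n))
          for i in range(n)]
    Bz = [sum(B[i][j] * zx[j] for j in range(i, n)) for i in range(n)]
    return max(abs(dz[i] - Bz[i]) for i in range(n))

B = [[-1.0, 2.0, 0.5],
     [0.0, -2.0, 1.0],
     [0.0, 0.0, -4.0]]
z0 = [1.0, -1.0, 2.0]
lam, E = parlett_coefficients(B, z0)
print(z(0.0, lam, E), residual(0.3, B, lam, E))
```

The division by $\lambda_k - \lambda_i$ is the reason the recursion is limited to distinct eigenvalues; close eigenvalues lead to cancellation.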


The Jordan Canonical Form 


Simpler Than You Thought 
(Amer. Math. Monthly 87 (1980) Nr. 9) 


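Whenever one is not afraid of badly conditioned matrices (see Exercise 1), and
many mathematicians are not, the Schur form obtained above can be further transformed
into the famous Jordan canonical form:


Theorem 12.2 (Jordan 1870, Livre deuxième, §5 and 6). For every matrix $A$ there
exists a non-singular matrix $T$ such that

$$ T^{-1} A T = \operatorname{diag} \left\{ \begin{pmatrix} \lambda_1 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_1 \end{pmatrix}, \begin{pmatrix} \lambda_2 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_2 \end{pmatrix}, \ldots \right\}. \qquad (12.14) $$

(The dimensions ($\ge 1$) of the blocks may vary and the $\lambda_i$ are not necessarily distinct.)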


Proof. We may suppose that the matrix is already in the Schur form. This is of
course possible in such a way that identical eigenvalues are grouped together on
the principal diagonal.

The next step (see Fletcher & Sorensen 1983) is to remove all nonzero elements
outside the upper-triangular blocks containing identical eigenvalues. We let

$$ A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix} $$

where $B$ and $D$ are upper-triangular. The diagonal elements of $B$ are all equal to
$\lambda_1$, whereas those of $D$ are $\lambda_2, \lambda_3, \ldots$ and all different from $\lambda_1$. We search for a
matrix $S$ such that

$$ \begin{pmatrix} B & C \\ 0 & D \end{pmatrix} \begin{pmatrix} I & S \\ 0 & I \end{pmatrix} = \begin{pmatrix} I & S \\ 0 & I \end{pmatrix} \begin{pmatrix} B & 0 \\ 0 & D \end{pmatrix} $$

or, equivalently,

$$ BS + C = SD. \qquad (12.15) $$

From this relation the matrix $S$ can be computed column-wise as follows: the first
column of (12.15) is $B S_1 + C_1 = \lambda_2 S_1$ (here $S_j$ and $C_j$ denote the $j$th column
of $S$ and $C$, respectively) which yields $S_1$ because $\lambda_2$ is not an eigenvalue of $B$.
The second column of (12.15) yields $B S_2 + C_2 = \lambda_3 S_2 + d_{12} S_1$ and allows us to
compute $S_2$, etc.

In the following steps we treat each of the remaining blocks separately: we thus
assume that all diagonal elements are equal to $\lambda$ and transform the block recursively
to the form stated in the theorem. Since $(A - \lambda I)^n = 0$ ($n$ is the dimension
of the matrix $A$) there exists an integer $k$ ($1 \le k \le n$) such that

$$ (A - \lambda I)^k = 0, \qquad (A - \lambda I)^{k-1} \ne 0. \qquad (12.16) $$

We fix a vector $v$ such that $(A - \lambda I)^{k-1} v \ne 0$ and put

$$ v_j = (A - \lambda I)^{k-j} v, \qquad j = 1, \ldots, k $$

so that

$$ A v_1 = \lambda v_1, \qquad A v_j = \lambda v_j + v_{j-1} \quad \text{for } j = 2, \ldots, k. $$

The vectors $v_1, \ldots, v_k$ are linearly independent, because a multiplication of the
expression $\sum_{j=1}^{k} c_j v_j = 0$ with $(A - \lambda I)^{k-1}$ yields $c_k = 0$, then a multiplication
with $(A - \lambda I)^{k-2}$ yields $c_{k-1} = 0$, etc. As in the proof of the Schur decomposition
(Theorem 12.1) we complete $v_1, \ldots, v_k$ to a basis of $\mathbb{C}^n$ in such a way that (with
$V = (v_1, \ldots, v_n)$)

$$ AV = V \begin{pmatrix} J & C \\ 0 & D \end{pmatrix}, \qquad J = \left. \begin{pmatrix} \lambda & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix} \right\} k \qquad (12.17) $$

where $D$ is upper-triangular with $\lambda$ on its diagonal.

Our next aim is to eliminate the nonzero elements of $C$ in (12.17). In analogy
to (12.15) it is natural to search for a matrix $S$ such that $JS + C = SD$. Unfortunately,
such an $S$ does not always exist because the eigenvalues of $J$ and of $D$ are
the same. However, it is possible to find $S$ such that all elements of $C$ are removed
with the exception of its last line, i.e.,

$$ \begin{pmatrix} J & C \\ 0 & D \end{pmatrix} \begin{pmatrix} I & S \\ 0 & I \end{pmatrix} = \begin{pmatrix} I & S \\ 0 & I \end{pmatrix} \begin{pmatrix} J & e_k c^T \\ 0 & D \end{pmatrix} \qquad (12.18) $$

or equivalently

$$ JS + C = e_k c^T + SD, $$

where $e_k = (0, \ldots, 0, 1)^T$ and $c^T = (c_1, \ldots, c_{n-k})$. This can be seen as follows:
the first column of this relation becomes $(J - \lambda I) S_1 + C_1 = c_1 e_k$. Its last component
yields $c_1$ and the other components determine the 2nd to $k$th elements of
$S_1$. The first element of $S_1$ can arbitrarily be put equal to zero. Then we compute
$S_2$ from $(J - \lambda I) S_2 + C_2 = c_2 e_k + d_{12} S_1$, etc. We thus obtain a matrix $S$ (with
vanishing first line) such that (12.18) holds.

We finally show that the assumption $(A - \lambda I)^k = 0$ implies $c = 0$ in (12.18).
Indeed, a simple calculation yields

$$ \begin{pmatrix} J - \lambda I & e_k c^T \\ 0 & D - \lambda I \end{pmatrix}^k = \begin{pmatrix} 0 & \tilde C \\ 0 & (D - \lambda I)^k \end{pmatrix} $$

where the first row of $\tilde C$ is equal to the row-vector $c^T$.

We have thus transformed $A$ to block-diagonal form with blocks $J$ of (12.17)
and $D$. The procedure can now be repeated with the lower-dimensional matrix $D$.
The product of all the occurring transformation matrices is then the matrix $T$ in
(12.14). □


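Corollary 12.3. For every matrix $A$ and for every number $\varepsilon \neq 0$ there exists a non-singular matrix $T$ (depending on $\varepsilon$) such that

$$T^{-1} A T = \operatorname{diag}\left\{ \begin{pmatrix} \lambda_1 & \varepsilon & & \\ & \ddots & \ddots & \\ & & \ddots & \varepsilon \\ & & & \lambda_1 \end{pmatrix}, \ldots \right\} \tag{12.14'}$$

Proof. Multiply equation (12.14) from the right by $D = \operatorname{diag}(1, \varepsilon, \varepsilon^2, \varepsilon^3, \ldots)$ and from the left by $D^{-1}$. □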
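The scaling used in this proof is easy to check on a single Jordan block. The following is a minimal sketch in plain Python; the helper names `jordan_block` and `scale_superdiagonal` are illustrative assumptions and do not come from the text.

```python
def jordan_block(lam, k):
    """k-by-k Jordan block: lam on the diagonal, ones on the superdiagonal."""
    return [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(k)]
            for i in range(k)]


def scale_superdiagonal(J, eps):
    """Return D^{-1} J D with D = diag(1, eps, eps^2, ...)."""
    k = len(J)
    d = [eps ** i for i in range(k)]
    # Entry (i, j) of D^{-1} J D is J[i][j] * d[j] / d[i].
    return [[J[i][j] * d[j] / d[i] for j in range(k)] for i in range(k)]


J = jordan_block(-2.0, 4)
J_eps = scale_superdiagonal(J, 0.25)
for row in J_eps:
    print(row)
```

The diagonal is unchanged, while every superdiagonal entry 1 is replaced by the chosen value of ε, which is the content of (12.14').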


Numerical difficulties in determining the Jordan canonical form are described in Golub & Wilkinson (1976). There also exist several computer programs, for example the one described in Kågström & Ruhe (1980).

When the matrix $A$ has been transformed to Jordan canonical form (12.14), the solutions of the differential equation $y' = Ay$ can be calculated by the method explained in (12.12'), case b):

$$y(x) = T D T^{-1} y_0 \tag{12.19}$$
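where $D$ is a block-diagonal matrix with blocks of the form

$$\begin{pmatrix} e^{\lambda x} & x e^{\lambda x} & \cdots & \dfrac{x^k}{k!} e^{\lambda x} \\ & e^{\lambda x} & \ddots & \vdots \\ & & \ddots & x e^{\lambda x} \\ & & & e^{\lambda x} \end{pmatrix}.$$

This is an extension of formula (12.8).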
Geometric Representation


The geometric shapes of the solution curves of $y' = Ay$ are presented in Fig. 12.1 for dimension $n = 2$. They are plotted as paths in the phase-space $(y_1, y_2)$. The cases a), b), c) and e) are the linearized equations of (12.20) at the four critical points (see Fig. 12.2).

Much of this structure remains valid also for nonlinear systems (12.1) in the neighbourhood of equilibrium points. Exceptions may be “structurally unstable” cases such as complex eigenvalues with $\alpha = \operatorname{Re}(\lambda) = 0$. This has been the subject of many papers discussing “critical points” or “singularities” (see e.g., the famous treatise of Poincaré (1881, 82, 85)).

In Fig. 12.2 we show solutions of the quadratic system

$$\begin{aligned} y_1' &= \tfrac{1}{3}(y_1 - y_2)(1 - y_1 - y_2) \\ y_2' &= y_1 (2 - y_2) \end{aligned} \tag{12.20}$$

which possesses four critical points of all four possible structurally stable types (Exercise 4).



Fig. 12.2. Solution flow of System (12.20) 
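The classification of the critical points can be checked numerically. The sketch below assumes that (12.20) reads $y_1' = \tfrac13 (y_1 - y_2)(1 - y_1 - y_2)$, $y_2' = y_1(2 - y_2)$; the Jacobian is differentiated by hand and the function names are illustrative only.

```python
import cmath


def f(y1, y2):
    """Right-hand side of the quadratic system, as assumed above."""
    return ((y1 - y2) * (1 - y1 - y2) / 3.0, y1 * (2 - y2))


def jacobian(y1, y2):
    """Hand-computed matrix of partial derivatives of f."""
    return [[(1 - 2 * y1) / 3.0, (2 * y2 - 1) / 3.0],
            [2 - y2, -y1]]


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr / 4.0 - det)
    return (tr / 2.0 + root, tr / 2.0 - root)


critical_points = [(0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (-1.0, 2.0)]
for point in critical_points:
    assert f(*point) == (0.0, 0.0)   # each point is an equilibrium
    print(point, eigenvalues_2x2(jacobian(*point)))
```

Under this assumption the four linearizations show a complex pair with positive real part, a pair of real eigenvalues of opposite sign, two distinct negative eigenvalues, and a double eigenvalue with a non-diagonal Jacobian.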
Exercises

1. a) Compute the eigenvectors of the matrix

$$A = \begin{pmatrix} -1 & 20 & & & \\ & -2 & 20 & & \\ & & \ddots & \ddots & \\ & & & -19 & 20 \\ & & & & -20 \end{pmatrix} \tag{12.21}$$

by solving $(A - \lambda_i I) v_i = 0$.

Result. $v_1 = (1, 0, \ldots)^T$, $v_2 = (1, -1/20, 0, \ldots)^T$, $v_3 = (1, -2/20, 2/400, 0, \ldots)^T$, $v_4 = (1, -3/20, 6/400, -6/8000, 0, \ldots)^T$, etc.

b) Compute numerically the inverse of $T = (v_1, v_2, \ldots, v_n)$ and determine its largest element (answer: $4.5 \times 10^{12}$). The matrix $T$ is thus very badly conditioned.

c) Compute numerically or analytically from (12.13) the solutions of

$$y' = Ay, \qquad y_i(0) = 1, \quad i = 1, \ldots, 20. \tag{12.22}$$

Observe the “hump” (Moler & Van Loan 1978): although all eigenvalues of $A$ are negative, the solutions first grow enormously before decaying to zero. This is typical of non-symmetric matrices and is connected with the bad condition of $T$ (see Fig. 12.3).

Result. 



Fig. 12.3. Solutions of equation (12.22) with matrix (12.21) 






2. (Schur). Prove that the eigenvalues of a matrix $A$ satisfy the estimate

$$\sum_{i=1}^{n} |\lambda_i|^2 \le \sum_{i,j=1}^{n} |a_{ij}|^2$$

and that equality holds iff $A$ is orthogonally diagonalizable (see also Exercise 3).

Hint. $\sum_{i,j} |a_{ij}|^2$ is the trace of $A^*A$ and thus invariant under unitary transformations $Q^*AQ$.

3. Show that the Schur decomposition $S = Q^*AQ$ is diagonal iff $A^*A = AA^*$. Such matrices are called normal. Examples are symmetric and skew-symmetric matrices.

Hint. The condition is equivalent to $S^*S = SS^*$.

4. Compute the four critical points of System (12.20), and for each of these points the eigenvalues and eigenvectors of the matrix $\partial f/\partial y$. Compare the results with Figs. 12.2 and 12.1.


5. Compute a Schur decomposition and the Jordan canonical form of the matrix 



4 

20 

4 



Result. The Jordan canonical form is 



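6. Reduce the matrices

$$\begin{pmatrix} \lambda & 1 & b \\ & \lambda & 1 \\ & & \lambda \end{pmatrix}, \qquad \begin{pmatrix} \lambda & 1 & b & c \\ & \lambda & 0 & d \\ & & \lambda & 1 \\ & & & \lambda \end{pmatrix}$$

to Jordan canonical form. In the second case distinguish the possibilities $b + d = 0$ and $b + d \neq 0$.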
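The “hump” of Exercise 1(c) can be observed with a few lines of code. This is only a sketch: it integrates (12.22) with the classical fourth-order Runge-Kutta method and a fixed step size, both of which are choices made here for illustration and not taken from the text.

```python
def rhs(y):
    """Right-hand side of y' = A y for the bidiagonal matrix (12.21)."""
    n = len(y)
    return [-(i + 1) * y[i] + (20.0 * y[i + 1] if i + 1 < n else 0.0)
            for i in range(n)]


def rk4_step(y, h):
    """One classical Runge-Kutta step of size h."""
    k1 = rhs(y)
    k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h * (a + 2 * b + 2 * c + d) / 6.0
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def peak_of_first_component(x_end=5.0, h=0.001):
    """Largest value of y_1 on [0, x_end] for the initial values y_i(0) = 1."""
    y = [1.0] * 20
    peak = y[0]
    for _ in range(int(round(x_end / h))):
        y = rk4_step(y, h)
        peak = max(peak, y[0])
    return peak


print(peak_of_first_component())
```

Although every eigenvalue of $A$ is negative, the first component rises far above its initial value 1 before the decay sets in.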



1.13 Stability 


The Examiners give notice that the following is the subject of 
the Prize to be adjudged in 1877: The Criterion of Dynamical 
Stability. (S.G. Phear 

(Vice-Chancellor), J. Challis, G.G. Stokes, J. Clerk Maxwell) 


Introduction 

“To illustrate the meaning of the question imagine a particle to slide down inside a 
smooth inclined cylinder along the lowest generating line, or to slide down outside 
along the highest generating line. In the former case a slight derangement of the 
motion would merely cause the particle to oscillate about the generating line, while 
in the latter case the particle would depart from the generating line altogether. The 
motion in the former case would be, in the sense of the question, stable, in the latter 
unstable ... what is desired is, a corresponding condition enabling us to decide 
when a dynamically possible motion of a system is such, that if slightly deranged 
the motion shall continue to be only slightly departed from.” (“The Examiners” in 
Routh 1877). 

Whenever no analytical solution of a problem is known, numerical solutions can only be obtained for specified initial values. But often one needs information about the stability behaviour of the solutions for all initial values in the neighbourhood of a certain equilibrium point. We again transfer the equilibrium point to the origin and define:

Definition 13.1. Let

$$y_i' = f_i(y_1, \ldots, y_n), \qquad i = 1, \ldots, n \tag{13.1}$$

be a system with $f_i(0, \ldots, 0) = 0$, $i = 1, \ldots, n$. Then the origin is called stable in the sense of Liapunov if for any $\varepsilon > 0$ there is a $\delta > 0$ such that for the solutions, $\|y(x_0)\| < \delta$ implies $\|y(x)\| < \varepsilon$ for all $x > x_0$.

The first step, taken by Routh in his famous Adams Prize essay (Routh 1877), was to study the linearized equation

$$y_i' = \sum_{j=1}^{n} a_{ij} y_j, \qquad a_{ij} = \frac{\partial f_i}{\partial y_j}(0). \tag{13.2}$$

(“The quantities x, y, z, ... etc are said to be small when their squares can be neglected”.) From the general solution of (13.2) obtained in Section 1.12, we immediately have

Theorem 13.1. The linearized equation (13.2) is stable (in the sense of Liapunov) iff all roots of the characteristic equation

$$\det(\lambda I - A) = a_0 \lambda^n + a_1 \lambda^{n-1} + \ldots + a_{n-1} \lambda + a_n = 0 \tag{13.3}$$

satisfy $\operatorname{Re}(\lambda) \le 0$, and the multiple roots, which give rise to Jordan chains, satisfy the strict inequality $\operatorname{Re}(\lambda) < 0$.

Proof. See (12.12) and (12.19). For Jordan chains the “secular” term (e.g., $E + Fx$ in the solution of (12.12), case (b)) which tends to infinity for increasing $x$, must be “killed” by an exponential with strictly negative exponent. □


The Routh-Hurwitz Criterion 

The next task, which leads to the famous Routh-Hurwitz criterion, was the verification of the conditions $\operatorname{Re}(\lambda) < 0$ directly from the coefficients of (13.3), without computing the roots. To solve this problem, Routh combined two known ideas: the first was Cauchy’s argument principle, saying that the number of roots of a polynomial $p(z) = u(z) + iv(z)$ inside a closed contour is equal to the number of (positive) rotations of the vector $(u(z), v(z))$, as $z$ travels along the boundary in the positive sense (see e.g., Henrici (1974), p. 276). An example is presented in Fig. 13.1 for the polynomial

$$z^6 + 6z^5 + 16z^4 + 25z^3 + 24z^2 + 14z + 4 = (z + 1)(z + 2)(z^2 + z + 1)(z^2 + 2z + 2). \tag{13.4}$$

On the half-circle $z = Re^{i\theta}$ ($\pi/2 \le \theta \le 3\pi/2$, $R$ very large) the argument of $p(z)$, due to the dominant term $z^n$, makes $n/2$ positive rotations. In order to have all zeros of $p$ in the negative half plane, we therefore need an additional $n/2$ positive rotations along the imaginary axis:

Lemma 13.2. Let $p(z)$ be a polynomial of degree $n$ and suppose that $p(iy) \neq 0$ for $y \in \mathbb{R}$. Then all roots of $p(z)$ are in the negative half-plane iff, along the imaginary axis, $\arg(p(iy))$ makes $n/2$ positive rotations for $y$ from $-\infty$ to $+\infty$. □


Fig. 13.1. Vector field of $\arg(p(z))$ for the polynomial $p(z)$ of (13.4)

The second idea was the use of Sturm’s theorem (Sturm 1829) which had its origin in Euclid’s algorithm for polynomials. Sturm made the discovery that in the division of the polynomial $p_{i-1}(y)$ by $p_i(y)$ it is better to take the remainder $p_{i+1}(y)$ with negative sign

$$p_{i-1}(y) = p_i(y) q_i(y) - p_{i+1}(y). \tag{13.5}$$

Then, due to the “Sturm sequence property”

$$\operatorname{sign}(p_{i+1}(y)) \neq \operatorname{sign}(p_{i-1}(y)) \quad \text{if } p_i(y) = 0, \tag{13.6}$$

the number of sign changes

$$w(y) = \text{No. of sign changes of } (p_0(y), p_1(y), \ldots, p_m(y)) \tag{13.7}$$

does not vary at the zeros of $p_1(y), \ldots, p_{m-1}(y)$. A consequence is the following

Lemma 13.3. Suppose that a sequence $p_0(y), p_1(y), \ldots, p_m(y)$ of real polynomials satisfies

i) $\deg(p_0) > \deg(p_1)$,

ii) $p_0(y)$ and $p_1(y)$ not simultaneously zero,

iii) $p_m(y) \neq 0$ for all $y \in \mathbb{R}$,

iv) and the Sturm sequence property (13.6).

Then

$$w(\infty) - w(-\infty) \tag{13.8}$$

is equal to the number of rotations, measured in the positive direction, of the vector $(p_0(y), p_1(y))$ as $y$ tends from $-\infty$ to $+\infty$.

Proof. Due to the Sturm sequence property, $w(y)$ does not change at zeros of $p_1(y), \ldots, p_{m-1}(y)$. By assumption (iii) also $p_m(y)$ has no influence. Therefore $w(y)$ can change only at zeros of $p_0(y)$. If $w(y)$ increases by one at $\hat y$, either $p_0(y)$ changes from $+$ to $-$ and $p_1(\hat y) > 0$ or it changes from $-$ to $+$ and $p_1(\hat y) < 0$ ($p_1(\hat y) = 0$ is impossible by (ii)). In both situations the vector $(p_0(y), p_1(y))$ crosses the imaginary axis in the positive direction (see Fig. 13.2). If $w(y)$ decreases by one, $(p_0(y), p_1(y))$ crosses the imaginary axis in the negative direction. The result now follows from (i), since the vector $(p_0(y), p_1(y))$ is horizontal for $y \to -\infty$ and for $y \to +\infty$. □


Fig. 13.2. Rotations of $(p_0(y), p_1(y))$ compared to $w(y)$


The two preceding lemmas together give us the desired criterion for stability: let the characteristic polynomial (13.3)

$$p(z) = a_0 z^n + a_1 z^{n-1} + \ldots + a_n = 0, \qquad a_0 > 0$$

be given. We divide $p(iy)$ by $i^n$ and separate real and imaginary parts,

$$\begin{aligned} p_0(y) &= \operatorname{Re} \frac{p(iy)}{i^n} = a_0 y^n - a_2 y^{n-2} + a_4 y^{n-4} \pm \ldots \\ p_1(y) &= -\operatorname{Im} \frac{p(iy)}{i^n} = a_1 y^{n-1} - a_3 y^{n-3} + a_5 y^{n-5} \pm \ldots \end{aligned} \tag{13.9}$$

Due to the special structure of these polynomials, the Euclidean algorithm (13.5) is here particularly simple: we write

$$p_i(y) = c_{i0} y^{n-i} + c_{i1} y^{n-i-2} + c_{i2} y^{n-i-4} + \ldots, \tag{13.10}$$

and have for the quotient in (13.5) $q_i(y) = (c_{i-1,0}/c_{i0})\, y$, provided that $c_{i0} \neq 0$. Now (13.10) inserted into (13.5) gives the following recursive formulas for the computation of the coefficients $c_{ij}$:

$$c_{i+1,j} = c_{i,j+1} \cdot \frac{c_{i-1,0}}{c_{i0}} - c_{i-1,j+1} = \frac{1}{c_{i0}} \det \begin{pmatrix} c_{i-1,0} & c_{i-1,j+1} \\ c_{i0} & c_{i,j+1} \end{pmatrix}. \tag{13.11}$$

If $c_{i0} = 0$ for some $i$, the quotient $q_i(y)$ is a higher degree polynomial and the Euclidean algorithm stops at $p_m(y)$ with $m < n$.

The sequence $(p_i(y))$ obtained in this way obviously satisfies conditions (i) and (iv) of Lemma 13.3. Condition (ii) is equivalent to $p(iy) \neq 0$ for $y \in \mathbb{R}$, and (iii) is a consequence of (ii) since $p_m(y)$ is the greatest common divisor of $p_0(y)$ and $p_1(y)$.
Theorem 13.4 (Routh 1877). All roots of the real polynomial (13.3) with $a_0 > 0$ lie in the negative half plane $\operatorname{Re} \lambda < 0$ if and only if

$$c_{i0} > 0 \quad \text{for } i = 0, 1, 2, \ldots, n. \tag{13.12}$$

Remark. Due to the condition $c_{i0} > 0$, the division by $c_{i0}$ in formula (13.11) can be omitted (common positive factor of $p_{i+1}(y)$), which leads to the same theorem (Routh (1877), p. 27: “... so that by remembering this simple cross-multiplication we may write down ...”). This, however, is not advisable for $n$ large because of possible overflow.

Proof. The coordinate systems $(p_0, p_1)$ and $(\operatorname{Re}(p), \operatorname{Im}(p))$ are of opposite orientation. Therefore, $n/2$ positive rotations of $p(iy)$ correspond to $n/2$ negative rotations of $(p_0(y), p_1(y))$. If all roots of $p(\lambda)$ lie in the negative half plane $\operatorname{Re} \lambda < 0$, it follows from Lemmas 13.2 and 13.3 that $w(\infty) - w(-\infty) = -n$, which is only possible if $w(\infty) = 0$, $w(-\infty) = n$. This implies the positivity of all leading coefficients of $p_i(y)$.

On the other hand, if (13.12) is satisfied, we see that $p_n(y) \equiv c_{n0}$. Hence the polynomials $p_0(y)$ and $p_1(y)$ cannot have a common factor and $p(\lambda) \neq 0$ on the imaginary axis. We can now apply Lemmas 13.2 and 13.3 again to obtain the result. □


Table 13.1. Routh tableau for (13.4)

          j = 0     j = 1     j = 2     j = 3
 i = 0     1       -16        24        -4
 i = 1     6       -25        14
 i = 2    11.83    -21.67      4
 i = 3    14.01    -11.97
 i = 4    11.56     -4
 i = 5     7.12
 i = 6     4


Table 13.2. Routh tableau for (13.13)

          j = 0                      j = 1     j = 2
 i = 0    1                          -q        s
 i = 1    p                          -r
 i = 2    pq - r                     -ps
 i = 3    (pq - r)r - p^2 s
 i = 4    ((pq - r)r - p^2 s)ps

Example 1. The Routh tableau (13.11) for equation (13.4) is given in Table 13.1. It clearly satisfies the conditions for stability.

Example 2 (Routh 1877, p. 27). Express the stability conditions for the biquadratic

$$z^4 + pz^3 + qz^2 + rz + s = 0. \tag{13.13}$$

The $c_{ij}$ values (without division) are given in Table 13.2. We have stability iff

$$p > 0, \quad pq - r > 0, \quad (pq - r)r - p^2 s > 0, \quad s > 0.$$
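The recursion (13.11) translates directly into a short program. The following is a minimal sketch in plain Python (with division, as in Table 13.1); the function names are illustrative, and the degenerate case $c_{i0} = 0$ is simply reported as “not stable”.

```python
def routh_leading_coefficients(a):
    """Return [c_00, c_10, ..., c_n0] for p(z) = a[0] z^n + ... + a[n], n >= 1.

    Rows 0 and 1 are the coefficients of p_0 and p_1 in (13.9); the later
    rows follow the recursion (13.11). Raises ZeroDivisionError if c_i0 = 0.
    """
    n = len(a) - 1
    row0 = [(-1) ** k * a[2 * k] for k in range(n // 2 + 1)]
    row1 = [(-1) ** k * a[2 * k + 1] for k in range((n + 1) // 2)]
    rows = [row0, row1]
    for i in range(1, n):
        prev, cur = rows[i - 1], rows[i]
        ratio = prev[0] / cur[0]
        nxt = []
        for j in range(len(prev) - 1):
            c_next = cur[j + 1] if j + 1 < len(cur) else 0.0
            nxt.append(c_next * ratio - prev[j + 1])
        rows.append(nxt)
    return [row[0] for row in rows]


def is_stable(a):
    """Routh test (13.12): all leading coefficients c_i0 must be positive."""
    try:
        return all(c > 0 for c in routh_leading_coefficients(a))
    except ZeroDivisionError:
        return False


print(routh_leading_coefficients([1, 6, 16, 25, 24, 14, 4]))
print(is_stable([1, 6, 16, 25, 24, 14, 4]))
```

For the polynomial (13.4) the first column reproduces Table 13.1 up to rounding. As the next section explains, such a test should be used in floating point only for small degrees.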
Computational Considerations


The actual computational use of Routh’s criterion, in spite of its high historical importance and mathematical elegance, has two drawbacks for higher dimensions:

1) It is not easy to compute the characteristic polynomial for higher order matrices;

2) The use of the characteristic polynomial is very dangerous in the presence of rounding errors.

So, whenever one is not working with exact algebra or high precision, it is advisable to avoid the characteristic polynomial and use numerically stable algorithms for the eigenvalue problem (e.g., Eispack 1974).


Numerical experiments. 1. The $2n \times 2n$ dimensional matrix



has the characteristic polynomial

$$p(z) = \prod_{j=1}^{n} (z^2 + 0.1 z + j^2 + 0.0025).$$

We computed the coefficients of $p$ using double precision, and then applied the Routh algorithm in single precision (machine precision $= 6 \times 10^{-8}$). The results indicated stability for $n \le 15$, but not for $n \ge 16$, although the matrix always has its eigenvalues $-0.05 \pm ki$ in the negative half plane. On the other hand, a direct computation of the eigenvalues of $A$ with the use of Eispack subroutines gave no problem for any $n$.

2. We also tested the Routh algorithm at the (scaled) numerators of the diagonal Padé approximations to $\exp(z)$

$$1 + \frac{n}{2n}\,(nz) + \frac{n(n-1)}{(2n)(2n-1)}\,\frac{(nz)^2}{2!} + \frac{n(n-1)(n-2)}{(2n)(2n-1)(2n-2)}\,\frac{(nz)^3}{3!} + \ldots, \tag{13.14}$$

which are also known to possess all zeros in $\mathbb{C}^-$. Here, the results were correct only for $n \le 21$, and wrong for larger $n$ due to rounding errors.
Liapunov Functions

We now consider the question whether the stability of the nonlinear system (13.1) “can really be determined by examination of the terms of the first order only” (Routh 1877, Chapt. VII). This theory, initiated by Routh and Poincaré, was brought to perfection in the famous work of Liapunov (1892). As a general reference to the enormous theory that has developed in the meantime we mention Rouche, Habets & Laloy (1977) and W. Hahn (1967).

Liapunov’s (and Routh’s) main tools are the so-called Liapunov functions $V(y_1, \ldots, y_n)$, which should satisfy

$$V(y_1, \ldots, y_n) \ge 0, \qquad V(y_1, \ldots, y_n) = 0 \ \text{ iff } \ y_1 = \ldots = y_n = 0 \tag{13.15}$$

and along the solutions of (13.1)

$$\frac{d}{dx} V(y_1(x), \ldots, y_n(x)) \le 0. \tag{13.16}$$

Usually $V(y)$ behaves quadratically for small $y$ and condition (13.15) means that

$$c \|y\|^2 \le V(y) \le C \|y\|^2, \qquad C \ge c > 0. \tag{13.17}$$

The existence of such a Liapunov function is then a sufficient condition for stability of the origin.

We start with the construction of a Liapunov function for the linear case

$$y' = Ay. \tag{13.18}$$

This is best done in the basis which is naturally given by the eigenvectors (or Jordan chains) of $A$. We therefore introduce $y = Tz$, $z = T^{-1}y$, so that $A$ is transformed to Jordan canonical form (12.14’) $J = T^{-1}AT$ and (13.18) becomes

$$z' = Jz. \tag{13.19}$$

If we put

$$V_0(z) = \|z\|^2 \quad \text{and} \quad V(y) = V_0(T^{-1}y) = V_0(z), \tag{13.20}$$

the derivative of $V(y(x))$ becomes

$$\frac{d}{dx} V(y(x)) = \frac{d}{dx} V_0(z(x)) = 2 \operatorname{Re} \langle z(x), z'(x) \rangle = 2 \operatorname{Re} \langle z(x), Jz(x) \rangle \le 2 \mu(J) V(y(x)). \tag{13.21}$$

By (10.20) the logarithmic norm is given by

$$2\mu(J) = \text{largest eigenvalue of } (J + J^*).$$




The matrix $J + J^*$ is block-diagonal with tridiagonal blocks

$$\begin{pmatrix} 2\operatorname{Re}\lambda_i & \varepsilon & & \\ \varepsilon & 2\operatorname{Re}\lambda_i & \ddots & \\ & \ddots & \ddots & \varepsilon \\ & & \varepsilon & 2\operatorname{Re}\lambda_i \end{pmatrix}. \tag{13.22}$$

Subtracting the diagonal and using formula (6.7a), we see that the eigenvalues of the $m$-dimensional matrix (13.22) are given by

$$2\Bigl(\operatorname{Re}\lambda_i + \varepsilon \cos\frac{\pi k}{m+1}\Bigr), \qquad k = 1, \ldots, m. \tag{13.23}$$

As a consequence of this formula or by the use of Exercise 4 we have:

Lemma 13.5. If all eigenvalues of $A$ satisfy $\operatorname{Re}\lambda_i < -\varrho < 0$, then there exists a (quadratic) Liapunov function for equation (13.18) which satisfies

$$\frac{d}{dx} V(y(x)) \le -\varrho\, V(y(x)). \tag{13.24}$$

□

This last differential inequality implies that (Theorem 10.1)

$$V(y(x)) \le V(y_0) \cdot \exp(-\varrho (x - x_0))$$

and ensures that $\lim_{x \to \infty} \|y(x)\| = 0$, i.e., asymptotic stability.

Stability of Nonlinear Systems

It is now easy to extend the same ideas to nonlinear equations. The following theorem is an example of such a result.

Theorem 13.6. Let the nonlinear system

$$y' = Ay + g(x, y) \tag{13.25}$$

be given with $\operatorname{Re}\lambda_i < -\varrho < 0$ for all eigenvalues of $A$. Further suppose that for each $\varepsilon > 0$ there is a $\delta > 0$ such that

$$\|g(x, y)\| \le \varepsilon \|y\| \quad \text{for } \|y\| < \delta, \ x \ge x_0. \tag{13.26}$$

Then the origin is (asymptotically) stable in the sense of Liapunov.

Proof. We use the Liapunov function $V(y)$ constructed for Lemma 13.5 and obtain from (13.25)

$$\frac{d}{dx} V(y(x)) \le -\varrho\, V(y(x)) + 2 \operatorname{Re} \bigl\langle T^{-1} y(x),\, T^{-1} g(x, y(x)) \bigr\rangle. \tag{13.27}$$

Cauchy’s inequality together with (13.26) yields

$$\frac{d}{dx} V(y(x)) \le \bigl(-\varrho + 2 \|T\| \cdot \|T^{-1}\|\, \varepsilon\bigr) \cdot V(y(x)). \tag{13.28}$$

For sufficiently small $\varepsilon$ the right hand side is negative and we obtain asymptotic stability. □
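The estimate of the perturbation term can be written out as follows. This is a sketch under the assumptions that the norm is Euclidean, that $z = T^{-1}y$ so that $V(y) = \|z\|^2$, and that $\|y\| < \delta$ so that (13.26) applies:

$$2 \operatorname{Re} \langle T^{-1}y, T^{-1}g \rangle \le 2 \|z\| \cdot \|T^{-1}\| \cdot \|g\| \le 2 \varepsilon \|T^{-1}\| \cdot \|z\| \cdot \|y\| \le 2 \varepsilon \|T\| \cdot \|T^{-1}\| \cdot \|z\|^2 .$$

The first step is Cauchy’s inequality, the second uses (13.26), and the last uses $\|y\| = \|Tz\| \le \|T\| \cdot \|z\|$. The right-hand side is a multiple of $V(y)$, which is what the proof needs.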


We see that, for nonlinear systems, stability is only assured in a neighbourhood of the origin. This can also be observed in Fig. 12.2. Another difference is that the stability for eigenvalues on the imaginary axis can be destroyed. An example for this (Routh 1877, pp. 95-96) is the system

$$y_1' = y_2, \qquad y_2' = -y_1 + y_2^3. \tag{13.29}$$

Here, with the Liapunov function $V = (y_1^2 + y_2^2)/2$, we obtain $V' = y_2^4$ which is $> 0$ for $y_2 \neq 0$. Therefore all solutions with initial value $\neq 0$ increase. A survey of this question (“the center problem”) together with its connection to limit cycles is given in Wanner (1983).
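The slow growth of $V$ can be watched numerically. The sketch below assumes that the system reads $y_1' = y_2$, $y_2' = -y_1 + y_2^3$ and integrates it with the classical Runge-Kutta method; the step size, the end point and the initial value are arbitrary choices made for illustration.

```python
def rhs(y1, y2):
    """Right-hand side of the system, as assumed above."""
    return y2, -y1 + y2 ** 3


def rk4_step(y1, y2, h):
    """One classical Runge-Kutta step of size h."""
    a1, a2 = rhs(y1, y2)
    b1, b2 = rhs(y1 + 0.5 * h * a1, y2 + 0.5 * h * a2)
    c1, c2 = rhs(y1 + 0.5 * h * b1, y2 + 0.5 * h * b2)
    d1, d2 = rhs(y1 + h * c1, y2 + h * c2)
    return (y1 + h * (a1 + 2 * b1 + 2 * c1 + d1) / 6.0,
            y2 + h * (a2 + 2 * b2 + 2 * c2 + d2) / 6.0)


def liapunov(y1, y2):
    """V = (y1^2 + y2^2) / 2."""
    return 0.5 * (y1 * y1 + y2 * y2)


def simulate(y1, y2, x_end=20.0, h=0.01):
    """Integrate from x = 0 to x_end and return the final state."""
    for _ in range(int(round(x_end / h))):
        y1, y2 = rk4_step(y1, y2, h)
    return y1, y2


v_start = liapunov(0.1, 0.1)
v_end = liapunov(*simulate(0.1, 0.1))
print(v_start, v_end)
```

The linearization is a centre, yet $V$ has grown noticeably at the end of the integration, in agreement with $V' = y_2^4 \ge 0$.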


Stability of Non-Autonomous Systems 


When the coefficients are not constant,

$$y' = A(x)y, \tag{13.30}$$

it is not a sufficient test of stability that the eigenvalues of $A$ satisfy the conditions of stability for each instantaneous value of $x$.

Examples. 1. (Routh 1877, p. 96).

$$y_1' = y_2, \qquad y_2' = -\frac{1}{4x^2}\, y_1 \tag{13.31}$$

which is satisfied by $y_1(x) = a\sqrt{x}$.

2. An example with eigenvalues strictly negative: we start with

$$B = \begin{pmatrix} -1 & 0 \\ 4 & -1 \end{pmatrix}, \qquad y' = By.$$

An inspection of the derivative of $V = (y_1^2 + y_2^2)/2$ shows that $V$ increases in the sector $2 - \sqrt{3} < y_2/y_1 < 2 + \sqrt{3}$. The idea is to take the initial value in this region and, for $x$ increasing, to rotate the coordinate system with the same speed as the solution rotates:

$$y' = T(x) B T(-x) y = A(x) y, \qquad T(x) = \begin{pmatrix} \cos ax & -\sin ax \\ \sin ax & \cos ax \end{pmatrix}. \tag{13.32}$$

For $y(0) = (1, 1)^T$, a good choice for $a$ is $a = 2$ and (13.32) possesses the solution

$$y(x) = \bigl( (\cos 2x - \sin 2x) e^x,\ (\cos 2x + \sin 2x) e^x \bigr)^T. \tag{13.33}$$

This solution is clearly unstable, while $-1$ remains for all $x$ the double eigenvalue of $A(x)$. For more examples see Exercises 6 and 7 below.
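Both claims of the second example can be verified numerically. The sketch below takes $B$ as the lower triangular matrix with rows $(-1, 0)$ and $(4, -1)$; this is an assumption, chosen because it is consistent with the sector $2 - \sqrt{3} < y_2/y_1 < 2 + \sqrt{3}$ and with the solution (13.33). All function names are illustrative.

```python
import math

A_SPEED = 2.0                     # rotation speed a of the coordinate system
B = [[-1.0, 0.0], [4.0, -1.0]]    # assumed constant matrix, double eigenvalue -1


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def a_matrix(x):
    """A(x) = T(x) B T(-x), with T(x) the rotation by the angle a*x."""
    return matmul(matmul(rotation(A_SPEED * x), B), rotation(-A_SPEED * x))


def y_exact(x):
    """The solution (13.33)."""
    c, s, e = math.cos(2.0 * x), math.sin(2.0 * x), math.exp(x)
    return [(c - s) * e, (c + s) * e]


def residual(x, h=1e-6):
    """Max-norm of y'(x) - A(x) y(x), with y' from a central difference."""
    yp, ym, y = y_exact(x + h), y_exact(x - h), y_exact(x)
    a = a_matrix(x)
    return max(abs((yp[i] - ym[i]) / (2.0 * h)
                   - (a[i][0] * y[0] + a[i][1] * y[1])) for i in range(2))


for x in (0.0, 0.7, 3.0):
    a = a_matrix(x)
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    print(x, trace, det, residual(x))
```

Trace $-2$ and determinant $1$ for every $x$ mean that $-1$ is a double eigenvalue of $A(x)$, while the residual confirms that the exponentially growing function (13.33) solves the equation.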

We observe that stability theory for non-autonomous systems is more complicated. Among the cases in which stability can be shown are the following:

1) $a_{ii}(x) < 0$ and $A(x)$ is diagonally dominant; then $\mu(A(x)) \le 0$ such that stability follows from Theorem 10.6.

2) $A(x) = B + C(x)$, with $B$ constant and satisfying $\operatorname{Re}\lambda_i < -\varrho < 0$ for its eigenvalues, and $\|C(x)\| < \varepsilon$ with $\varepsilon$ so small that the proof of Theorem 13.6 can be applied.

Exercises 

1. Express the stability conditions for the polynomials $z^2 + pz + q = 0$ and $z^3 + pz^2 + qz + r = 0$.

Result. a) $p > 0$ and $q > 0$; b) $p > 0$, $r > 0$ and $pq - r > 0$.

2. (Hurwitz 1895). Verify that condition (13.12) is equivalent to the positivity of the principal minors of the matrix



($a_k = 0$ for $k < 0$ and $k > n$). Understand that Routh’s algorithm (13.11) is identical to a sort of Gaussian elimination transforming $H$ to triangular form.

3. The polynomial

$$\frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6} \frac{z^5}{5!} + \frac{5 \cdot 4 \cdot 3 \cdot 2}{10 \cdot 9 \cdot 8 \cdot 7} \frac{z^4}{4!} + \frac{5 \cdot 4 \cdot 3}{10 \cdot 9 \cdot 8} \frac{z^3}{3!} + \frac{5 \cdot 4}{10 \cdot 9} \frac{z^2}{2!} + \frac{5}{10} z + 1$$

is the numerator of the (5, 5)-Padé approximation to $\exp(z)$. Verify that all its roots satisfy $\operatorname{Re} z < 0$. Try to establish the result for general $n$ (see e.g., Birkhoff & Varga (1965), Lemma 7).

4. (Gerschgorin). Prove that the eigenvalues of a matrix $A = (a_{ij})$ lie in the union of the discs

$$\Bigl\{ z \ ;\ |z - a_{ii}| \le \sum_{j \neq i} |a_{ij}| \Bigr\}.$$

Hint. Write the formula $Ax = \lambda x$ in coordinates $\sum_j a_{ij} x_j = \lambda x_i$, put the diagonal elements on the right hand side and choose $i$ such that $|x_i|$ is maximal.

5. Determine the stability of the origin for the system

$$\begin{aligned} y_1' &= -y_2 - y_1^2 - y_1 y_2, \\ y_2' &= y_1 + 2 y_1 y_2. \end{aligned}$$

Hint. Find a Liapunov function of degree 4 starting with $V = (y_1^2 + y_2^2)/2 + \ldots$ such that $V' = K (y_1^2 + y_2^2)^2 + \ldots$ and determine the sign of $K$.

6. (J. Lambert 1987). Consider the system

$$y' = A(x) \cdot y \quad \text{where} \quad A(x) = \begin{pmatrix} -1/(4x) & 1/x^2 \\ -1/4 & -1/(4x) \end{pmatrix}. \tag{13.34}$$

a) Show that both eigenvalues of $A(x)$ satisfy $\operatorname{Re}\lambda < 0$ for all $x > 0$.

b) Compute $\mu(A)$ (from (10.20)) and show that

$$\mu(A) < 0 \quad \text{iff} \quad \sqrt{5} - 1 < x < \sqrt{5} + 1.$$

c) Compute the general solution of (13.34).

Hint. Introduce the new functions $z_2(x) = y_2(x)$, $z_1(x) = x y_1(x)$ which leads to the second equation of (11.19) (Exercise 5 of Section 1.11). The solution is

$$y_1(x) = x^{-3/4} (a + b \log x), \qquad y_2(x) = x^{1/4} \Bigl( -\frac{a}{2} + b \bigl( 1 - \tfrac{1}{2} \log x \bigr) \Bigr). \tag{13.35}$$

d) Determine $a$ and $b$ such that $\|y(x)\|_2^2$ is increasing for $0 < x < \sqrt{5} - 1$.

e) Determine $a$ and $b$ such that $\|y(x)\|_2^2$ is increasing for $\sqrt{5} + 1 < x < \infty$.

Results. $b = 1.8116035 \cdot a$ for (d) and $b = 0.2462015 \cdot a$ for (e).

7. Find a counter-example for Fatou’s conjecture

If $\ddot y + A(t) y = 0$ and $\forall t \ \ 0 < C_1 \le A(t) \le C_2$ then $y(t)$ is stable

(C.R. 189 (1929), p. 967-969; for a solution see Perron (1930)).

8. Help James Watt (see original drawing from 1788 in Fig. 13.3) to solve the stability problem for his steam engine governor: if $\omega$ is the rotation speed of the engine, its acceleration is influenced by the steam supply and exterior work as follows:

$$\omega' = k \cos(\varphi + \alpha) - F, \qquad k, F > 0.$$

Here $\alpha$ is a fixed angle and $\varphi$ describes the motion of the governor. The acceleration of $\varphi$ is determined by centrifugal force, weight, and friction as

$$\varphi'' = \omega^2 \sin\varphi \cos\varphi - g \sin\varphi - b \varphi', \qquad g, b > 0.$$

Compute the equilibrium point $\varphi'' = \varphi' = \omega' = 0$ and determine under which conditions it is stable (the solution is easier for $\alpha = 0$).

Correct solutions should be sent to: James Watt, famous inventor of the steam engine, Westminster Abbey, 6HQ 1FX London.

Remark. Hurwitz’ paper (1895) was motivated by a similar practical problem, namely “... die Regulirung von Turbinen des Badeortes Davos” (the regulation of the turbines of the spa town of Davos).





Fig. 13.3. James Watt’s steam engine governor 



1.14 Derivatives with Respect to Parameters 
and Initial Values 


For a single equation, Dr. Ritt has solved the problem indicated in 
the title by a very simple and direct method ... Dr. Ritt’s proof 
cannot be extended immediately to a system of equations. 

(T.H. Gronwall 1919) 


In this section we consider the question whether the solutions of differential equa¬ 
tions are differentiable 

a) with respect to the initial values; 

b) with respect to constant parameters in the equation; 

and how these derivatives can be computed. Both questions are, of course, of 
extreme importance: once a solution has been computed (numerically) for given 
initial values, one often wants to know how small changes of these initial values 
affect the solutions. This question arises e.g. if some initial values are not known 
exactly and must be determined from other conditions, such as prescribed boundary 
values. Also, the initial values may contain errors, and the effect of these errors has 
to be studied. The same problems arise for unknown or wrong constant parameters 
in the defining equations. 

Problems (a) and (b) are equivalent: let

$$y' = f(x, y, p), \qquad y(x_0) = y_0 \tag{14.1}$$

be a system of differential equations containing a parameter $p$ (or several parameters). We can add this parameter to the solutions

$$\begin{pmatrix} y' \\ p' \end{pmatrix} = \begin{pmatrix} f(x, y, p) \\ 0 \end{pmatrix},$$

so that the parameter becomes an initial value for $p' = 0$. Conversely, for a differential system

$$y' = f(x, y), \qquad y(x_0) = y_0 \tag{14.2}$$

we can write $y(x) = z(x) + y_0$ and obtain

$$z' = f(x, z + y_0) = F(x, z, y_0), \qquad z(x_0) = 0, \tag{14.2'}$$

so that the initial value has become a parameter. Therefore, of the two problems (a) and (b), we start with (b) (as did Gronwall), because it seems simpler to us.

The Derivative with Respect to a Parameter


Usually, a given problem contains several parameters. But since we are interested in partial derivatives, we can treat one parameter after another while keeping the remaining ones fixed. It is therefore sufficient in the following theory to suppose that $f(x, y, p)$ depends only on one scalar parameter $p$.

When we replace the parameter $p$ in (14.1) by $q$ we obtain another solution, which we denote by $z(x)$:

$$z' = f(x, z, q), \qquad z(x_0) = y_0. \tag{14.3}$$

It is then natural to subtract (14.1) from (14.3) and to linearize

$$\begin{aligned} z' - y' &= f(x, z, q) - f(x, y, p) \\ &= \frac{\partial f}{\partial y}(x, y, p)(z - y) + \frac{\partial f}{\partial p}(x, y, p)(q - p) + \varrho_1 \cdot (z - y) + \varrho_2 \cdot (q - p). \end{aligned} \tag{14.4}$$

If we put $(z(x) - y(x))/(q - p) = \psi(x)$ and drop the error terms, we obtain

$$\psi' = \frac{\partial f}{\partial y}(x, y(x), p)\, \psi + \frac{\partial f}{\partial p}(x, y(x), p), \qquad \psi(x_0) = 0. \tag{14.5}$$

This equation is the key to the problem.

Theorem 14.1 (Gronwall 1919). Suppose that for $x_0 \le x \le X$ the partial derivatives $\partial f/\partial y$ and $\partial f/\partial p$ exist and are continuous in the neighbourhood of the solution $y(x)$. Then the partial derivatives

$$\frac{\partial y(x)}{\partial p} = \psi(x)$$

exist, are continuous, and satisfy the differential equation (14.5).


Proof. This theorem was the origin of the famous Gronwall lemma (see 1.10, Exercise 2). We prove it here by the equivalent Theorem 10.2. Set

$$L = \max \Bigl\| \frac{\partial f}{\partial y} \Bigr\|, \qquad A = \max \Bigl\| \frac{\partial f}{\partial p} \Bigr\| \tag{14.6}$$

where the max is taken over the domain under consideration. When we consider $z(x)$ as an approximate solution for (14.1) we have for the defect

$$\|z'(x) - f(x, z(x), p)\| = \|f(x, z(x), q) - f(x, z(x), p)\| \le A |q - p|,$$

therefore from Theorem 10.2

$$\|z(x) - y(x)\| \le \frac{A}{L} |q - p| \bigl( e^{L(x - x_0)} - 1 \bigr). \tag{14.7}$$

So for $|q - p|$ sufficiently small and $x_0 \le x \le X$, we can have $\|z(x) - y(x)\|$ arbitrarily small. By definition of differentiability and by (14.7), for each $\varepsilon > 0$ there is a $\delta$ such that the error terms in (14.4) satisfy

$$\|\varrho_1 \cdot (z - y) + \varrho_2 \cdot (q - p)\| \le \varepsilon |q - p| \quad \text{if } |q - p| \le \delta. \tag{14.8}$$

(The situation is, in fact, a little more complicated: the $\delta$ for the bounds $\|\varrho_1\| < \varepsilon$ and $\|\varrho_2\| < \varepsilon$ may depend on $x$. But due to compactness and continuity, it can then be replaced by a uniform bound. Another possibility to overcome this little obstacle would be a bound on the second derivatives. But why should we worry about this detail? Gronwall himself did not mention it.)

We now consider $(z(x) - y(x))/(q - p)$ as an approximate solution for (14.5) and apply Theorem 10.2 a second time. Its defect is by (14.8) and (14.4) bounded by $\varepsilon$ and the linear differential equation (14.5) also has $L$ as a Lipschitz constant (see (11.2)). Therefore from (10.14) we obtain

$$\Bigl\| \frac{z(x) - y(x)}{q - p} - \psi(x) \Bigr\| \le \frac{\varepsilon}{L} \bigl( e^{L(x - x_0)} - 1 \bigr)$$

which becomes arbitrarily small; this proves that $\psi(x)$ is the derivative of $y(x)$ with respect to $p$.


Continuity. The partial derivatives $\partial y/\partial p = \psi(x)$ are solutions of the differential equation (14.5), which we write as $\psi' = g(x, \psi, p)$, where by hypothesis $g$ depends continuously on $p$. Therefore the continuous dependence of $\psi$ on $p$ follows again from Theorem 10.2. □


Theorem 14.2. Let $y(x)$ be the solution of equation (14.1) and consider the Jacobian

$$A(x) = \frac{\partial f}{\partial y}(x, y(x), p). \tag{14.9}$$

Let $R(x, x_0)$ be the resolvent of the equation $y' = A(x) y$ (see (11.4)). Then the solution $z(x)$ of (14.3) with a slightly perturbed parameter $q$ is given by

$$z(x) = y(x) + (q - p) \int_{x_0}^{x} R(x, s) \frac{\partial f}{\partial p}(s, y(s), p)\, ds + o(|q - p|). \tag{14.10}$$

Proof. This is the variation of constants formula (11.10) applied to (14.5). □


It can be seen that the sensitivity of the solutions to changes of parameters is influenced firstly by the partial derivatives $\partial f/\partial p$ (which is natural), and secondly by the size of $R(x, s)$, i.e., by the stability of the differential equation with matrix (14.9).
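In practice the sensitivity equation (14.5) is integrated together with the original equation. The sketch below uses the test problem $y' = -p\,y^2$, $y(0) = 1$, which is chosen here only for illustration because its solution $y = 1/(1 + px)$ and the derivative $\partial y/\partial p = -x/(1 + px)^2$ are known in closed form.

```python
def rhs(y, psi, p):
    """f = -p*y^2 together with the right-hand side of (14.5)."""
    f = -p * y * y
    df_dy = -2.0 * p * y
    df_dp = -y * y
    return f, df_dy * psi + df_dp


def integrate(p, x_end=2.0, steps=2000):
    """Classical Runge-Kutta for (y, psi) with y(0) = 1, psi(0) = 0."""
    h = x_end / steps
    y, psi = 1.0, 0.0
    for _ in range(steps):
        k1 = rhs(y, psi, p)
        k2 = rhs(y + 0.5 * h * k1[0], psi + 0.5 * h * k1[1], p)
        k3 = rhs(y + 0.5 * h * k2[0], psi + 0.5 * h * k2[1], p)
        k4 = rhs(y + h * k3[0], psi + h * k3[1], p)
        y += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        psi += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return y, psi


y_end, psi_end = integrate(0.5)
dq = 1e-6
difference_quotient = (integrate(0.5 + dq)[0] - integrate(0.5 - dq)[0]) / (2 * dq)
print(y_end, psi_end, difference_quotient)
```

The computed $\psi$ agrees with the difference quotient $(z - y)/(q - p)$ of two neighbouring solutions, as Theorem 14.1 asserts, and avoids the cancellation that a difference quotient suffers for very small $q - p$.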
Derivatives with Respect to Initial Values


Notation. We denote by $y(x, x_0, y_0)$ the solution $y(x)$ at the point $x$ satisfying the initial values $y(x_0) = y_0$, and hope that no confusion arises from the use of the same letter $y$ for two different functions.

The following identities are trivial by definition or follow from uniqueness arguments as for (11.6):

$$\frac{\partial y(x, x_0, y_0)}{\partial x} = f(x, y(x, x_0, y_0)) \tag{14.11}$$

$$y(x_0, x_0, y_0) = y_0 \tag{14.12}$$

$$y(x_2, x_1, y(x_1, x_0, y_0)) = y(x_2, x_0, y_0). \tag{14.13}$$


Theorem 14.3. Suppose that the partial derivative of $f$ with respect to $y$ exists and is continuous. Then the solution $y(x, x_0, y_0)$ is differentiable with respect to $y_0$ and the derivative is given by the matrix

$$\frac{\partial y(x, x_0, y_0)}{\partial y_0} = \Psi(x) \tag{14.14}$$

where $\Psi(x)$ is the resolvent of the so-called “variational equation”

$$\Psi'(x) = \frac{\partial f}{\partial y}(x, y(x, x_0, y_0)) \cdot \Psi(x), \qquad \Psi(x_0) = I. \tag{14.15}$$


Proof. We know from (14.2) and (14.2’) that $\partial F/\partial z$ and $\partial F/\partial y_0$ are both equal to $\partial f/\partial y$, so the derivatives are known to exist by Theorem 14.1. In order to obtain formula (14.15), we just have to differentiate (14.11) and (14.12) with respect to $y_0$. □


We finally compute the derivative of $y(x, x_0, y_0)$ with respect to $x_0$.


Theorem 14.4. Under the same hypothesis as in Theorem 14.3, the solutions are also differentiable with respect to $x_0$ and the derivative is given by

$$\frac{\partial y(x, x_0, y_0)}{\partial x_0} = -\frac{\partial y(x, x_0, y_0)}{\partial y_0} \cdot f(x_0, y_0). \tag{14.16}$$


Proof. Differentiate the identity

$$y(x_1, x_0, y(x_0, x_1, y_1)) = y_1,$$

which follows from (14.13), with respect to $x_0$ and apply (14.11) (see Exercise 1). □
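Both theorems can be tested on a scalar example. The sketch below uses $y' = y^2$ (an illustrative choice, not from the text), for which $y(x, x_0, y_0) = y_0/(1 - y_0 (x - x_0))$; the variational equation (14.15) then reads $\Psi' = 2y\Psi$.

```python
def flow_and_variation(x0, y0, x_end, steps=2000):
    """Integrate y' = y^2 and Psi' = 2*y*Psi, Psi(x0) = 1, by classical RK4."""
    h = (x_end - x0) / steps
    y, psi = y0, 1.0

    def rhs(y, psi):
        return y * y, 2.0 * y * psi

    for _ in range(steps):
        k1 = rhs(y, psi)
        k2 = rhs(y + 0.5 * h * k1[0], psi + 0.5 * h * k1[1])
        k3 = rhs(y + 0.5 * h * k2[0], psi + 0.5 * h * k2[1])
        k4 = rhs(y + h * k3[0], psi + h * k3[1])
        y += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        psi += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return y, psi


x0, y0, x_end, d = 0.0, 0.5, 1.0, 1e-4
y_end, psi_end = flow_and_variation(x0, y0, x_end)

# Difference quotients with respect to y0 and x0.
dy_dy0 = (flow_and_variation(x0, y0 + d, x_end)[0]
          - flow_and_variation(x0, y0 - d, x_end)[0]) / (2 * d)
dy_dx0 = (flow_and_variation(x0 + d, y0, x_end)[0]
          - flow_and_variation(x0 - d, y0, x_end)[0]) / (2 * d)

print(psi_end, dy_dy0)              # Theorem 14.3
print(dy_dx0, -psi_end * y0 * y0)   # Theorem 14.4 with f(x0, y0) = y0^2
```

For these data the exact values are $y = 1$, $\partial y/\partial y_0 = 4$ and $\partial y/\partial x_0 = -1$.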
The Nonlinear Variation-of-Constants Formula


The following theorem is an extension of Theorem 11.2 to systems of non-linear differential equations.

Theorem 14.5 (Alekseev 1961, Gröbner 1960). Denote by $y$ and $z$ the solutions of

$$y' = f(x, y), \qquad y(x_0) = y_0, \tag{14.17a}$$

$$z' = f(x, z) + g(x, z), \qquad z(x_0) = y_0, \tag{14.17b}$$

respectively and suppose that $\partial f/\partial y$ exists and is continuous. Then the solutions of (14.17a) and of the “perturbed” equation (14.17b) are connected by

$$z(x) = y(x) + \int_{x_0}^{x} \frac{\partial y}{\partial y_0}(x, s, z(s)) \cdot g(s, z(s))\, ds. \tag{14.18}$$

Proof. We choose a subdivision $x_0 = s_0 < s_1 < s_2 < \ldots < s_N = x$ (see Fig. 14.1). The descending curves represent the solutions of the unperturbed equation (14.17a) with initial values $s_i$, $z(s_i)$. The differences $d_i$ are, due to the different slopes of $z(s)$ and $y(s)$ ((14.17b) minus (14.17a)), equal to $d_i = g(s_i, z(s_i)) \cdot \Delta s_i + o(\Delta s_i)$. This “error” at $s_i$ is then “transported” to the final value $x$ by the amount given in Theorem 14.3, to give

$$D_i = \frac{\partial y}{\partial y_0}(x, s_i, z(s_i)) \cdot g(s_i, z(s_i)) \cdot \Delta s_i + o(\Delta s_i). \tag{14.19}$$

Since $z(x) - y(x) = \sum_{i=1}^{N} D_i$, we obtain the integral in (14.18) after insertion of (14.19) and passing to the limit $\Delta s_i \to 0$. □



Fig. 14.1. Lady Windermere’s fan, Act 2

If we also want to take into account a possible difference in the initial values,
we may formulate:

Corollary 14.6. Let $y(x)$ and $z(x)$ be the solutions of

$$y' = f(x, y), \qquad y(x_0) = y_0,$$

$$z' = f(x, z) + g(x, z), \qquad z(x_0) = z_0,$$

then

$$z(x) = y(x) + \int_0^1 \frac{\partial y}{\partial y_0}\bigl(x, x_0, y_0 + s(z_0 - y_0)\bigr) \cdot (z_0 - y_0)\, ds
 + \int_{x_0}^{x} \frac{\partial y}{\partial y_0}\bigl(x, s, z(s)\bigr) \cdot g\bigl(s, z(s)\bigr)\, ds. \tag{14.20}$$

□

These two theorems allow many estimates of the stability of general nonlinear
systems. For linear systems, $\partial y/\partial y_0\,(x, s, z)$ is independent of $z$, and formulas
(14.20) and (14.18) become the variation-of-constants formula (11.10). Also, by
majorizing the integrals in (14.20) in a trivial way, one obtains the fundamental
lemma (10.14) and also the variant form of Theorem 10.2.


Flows and Volume-Preserving Flows 

Considérons des molécules fluides dont l’ensemble forme à l’origine
des temps une certaine figure $F_0$ ; quand ces molécules se
déplaceront, leur ensemble formera une nouvelle figure qui ira en
se déformant d’une manière continue, et à l’instant $t$ l’ensemble
des molécules envisagées formera une nouvelle figure $F$.

(H. Poincaré, Mécanique Céleste 1899, Tome III, p. 2)

We now turn our attention to a new interpretation of the Abel-Liouville-Jacobi-
Ostrogradskii formula (11.11). Liouville and above all Jacobi (in his “Dynamik”
1843) used this formula extensively to obtain “first integrals”, i.e., relations
between the solutions, so that the dimension of the system could be decreased and
the analytic integration of the differential equations of mechanics becomes a little
less hopeless. Poincaré then (see the quotation) introduced a much more geometric
point of view: for an autonomous system of differential equations¹

$$\dot y = f(y) \tag{14.21}$$

we define the flow $\varphi_t : \mathbb{R}^n \to \mathbb{R}^n$ to be the function which associates, for a given
$t$, to the initial value $y^0 \in \mathbb{R}^n$ the corresponding solution value at time $t$

$$\varphi_t(y^0) := y(t, 0, y^0). \tag{14.22}$$

¹ Due to the origin of these topics in Mechanics and Astronomy, we here use $t$ for the
independent variable.

For sets $A$ of initial values we also study their behaviour under the action of the flow
and write

$$\varphi_t(A) = \{\, y \mid y = y(t, 0, y^0),\ y^0 \in A \,\}. \tag{14.22'}$$

We can imagine, with Poincaré, sets of “molecules” moving (and being deformed)
with the flow.

Example 14.7. Fig. 14.2 shows, for the two-dimensional system (12.20) (see 
Fig. 12.2), the transformations which three sets A,B,C 2 undergo when t passes 
from 0 to 0.2, 0.4 and (for C) 0.6. It can be observed that these sets quickly lose 
very much of their beauty. 



Fig. 14.2. Transformation of three sets under a flow 


Now divide $A$ into “infinitely small” cubes $I$ of sides $dy_1^0, \ldots, dy_n^0$. The
image $\varphi_t(I)$ of such a cube is an infinitely small parallelepiped. It is created by
the columns of $\partial y/\partial y^0\,(t, 0, y^0)$ scaled by $dy_i^0$, and its volume is
$\det\bigl(\partial y/\partial y^0\,(t, 0, y^0)\bigr) \cdot dy_1^0 \ldots dy_n^0$. Adding up all these volumes (over $A$) or,
more precisely, using the transformation formula for multiple integrals
(Euler 1769b, Jacobi 1841), we obtain

$$\mathrm{Vol}\bigl(\varphi_t(A)\bigr) = \int_{\varphi_t(A)} dy = \int_A \Bigl| \det\Bigl( \frac{\partial y}{\partial y^0}(t, 0, y^0) \Bigr) \Bigr|\, dy^0 .$$

² The resemblance of these sets with a certain feline animal is not entirely accidental; we
chose it in honour of V.I. Arnol’d.

Next we use formula (11.11) together with (14.15)

$$\det\Bigl( \frac{\partial y}{\partial y^0}(t, 0, y^0) \Bigr) = \exp\Bigl( \int_0^t \mathrm{tr}\bigl( f'(y(s, 0, y^0)) \bigr)\, ds \Bigr) \tag{14.23}$$

and we obtain


Theorem 14.8. Consider the system (14.21) with continuously differentiable
function $f(y)$.

a) For a set $A \subset \mathbb{R}^n$ the total volume of $\varphi_t(A)$ satisfies

$$\mathrm{Vol}\bigl(\varphi_t(A)\bigr) = \int_A \exp\Bigl( \int_0^t \mathrm{tr}\bigl( f'(y(s, 0, y^0)) \bigr)\, ds \Bigr)\, dy^0. \tag{14.24}$$

b) If $\mathrm{tr}\bigl(f'(y)\bigr) = 0$ along the solution, the flow is volume-preserving, i.e.,

$$\mathrm{Vol}\bigl(\varphi_t(A)\bigr) = \mathrm{Vol}(A). \qquad □$$


Example 14.9. For the system (12.20) we have

$$f(y) = \begin{pmatrix} (y_1 - y_2)(1 - y_1 - y_2)/3 \\ y_1 (2 - y_2) \end{pmatrix} \qquad \text{and} \qquad \mathrm{tr}\bigl(f'(y)\bigr) = (1 - 5 y_1)/3 .$$

The trace of $f'(y)$ changes sign at the line $y_1 = 1/5$. To its left the volume
increases, to the right we have decreasing volumes. This can clearly be seen in
Fig. 14.2.

Example 14.10. For the mathematical pendulum (with $y_1$ the angle of deviation
from the vertical)

$$\begin{aligned} \dot y_1 &= y_2 \\ \dot y_2 &= -\sin y_1 \end{aligned} \qquad\qquad f'(y) = \begin{pmatrix} 0 & 1 \\ -\cos y_1 & 0 \end{pmatrix} \tag{14.25}$$

we have $\mathrm{tr}\bigl(f'(y)\bigr) = 0$. Therefore the flow, although treating the cats quite badly,
at least preserves their areas (Fig. 14.3).
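
The area preservation of Example 14.10 can be observed numerically. The sketch
below (an illustration added here, not the authors' computation) integrates the
pendulum together with the matrix variational equation of Theorem 14.3 and
evaluates the determinant of $\partial y/\partial y^0$, which by (14.23) should stay equal to 1.
The initial value, the end time, the step number and the function names are
assumptions; the classical Runge-Kutta method preserves the determinant only up
to its discretization error.

```python
import math


def rhs(u):
    # u = (y1, y2, a, b, c, d) with Psi = [[a, b], [c, d]].
    y1, y2, a, b, c, d = u
    k = -math.cos(y1)           # lower-left entry of f'(y) in (14.25)
    return (y2, -math.sin(y1),  # pendulum equations
            c, d,               # first row of f'(y) * Psi
            k * a, k * b)       # second row of f'(y) * Psi


def rk4_step(u, h):
    # One classical Runge-Kutta step for the autonomous augmented system.
    k1 = rhs(u)
    k2 = rhs(tuple(x + h / 2 * k for x, k in zip(u, k1)))
    k3 = rhs(tuple(x + h / 2 * k for x, k in zip(u, k2)))
    k4 = rhs(tuple(x + h * k for x, k in zip(u, k3)))
    return tuple(x + h / 6 * (p + 2 * q + 2 * r + s)
                 for x, p, q, r, s in zip(u, k1, k2, k3, k4))


def flow_jacobian_det(y1_0, y2_0, t_end, n=2000):
    # Determinant of the derivative of the flow with respect to the initial value.
    h = t_end / n
    u = (y1_0, y2_0, 1.0, 0.0, 0.0, 1.0)   # Psi(0) = I
    for _ in range(n):
        u = rk4_step(u, h)
    _, _, a, b, c, d = u
    return a * d - b * c


det_value = flow_jacobian_det(1.5, 0.5, 3.0)
print(det_value)  # expected to be very close to 1
```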
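Canonical Equations and Symplectic Mappings


Let $H(p_1, \ldots, p_n, q_1, \ldots, q_n)$ be a twice continuously differentiable function of
$2n$ variables and (see (6.26))

$$\dot p_i = -\frac{\partial H}{\partial q_i}(p, q), \qquad \dot q_i = \frac{\partial H}{\partial p_i}(p, q) \tag{14.26}$$

the corresponding canonical system of differential equations. Small variations of
the initial values lead to variations $\delta p_i(t)$, $\delta q_i(t)$ of the solution of (14.26). By
Theorem 14.3 (variational equation) these satisfy

$$\begin{aligned}
\delta \dot p_i &= -\sum_{j=1}^{n} \frac{\partial^2 H}{\partial p_j \partial q_i}(p, q) \cdot \delta p_j - \sum_{j=1}^{n} \frac{\partial^2 H}{\partial q_j \partial q_i}(p, q) \cdot \delta q_j \\
\delta \dot q_i &= \phantom{-}\sum_{j=1}^{n} \frac{\partial^2 H}{\partial p_j \partial p_i}(p, q) \cdot \delta p_j + \sum_{j=1}^{n} \frac{\partial^2 H}{\partial q_j \partial p_i}(p, q) \cdot \delta q_j .
\end{aligned} \tag{14.27}$$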

The upper left block of the Jacobian matrix is the negative transposed of the lower 
right block. As a consequence, the trace of the Jacobian of (14.27) is identically 
zero and the corresponding flow is volume-preserving (“Theorem of Liouville”). 

But there is much more than that (Poincaré 1899, vol. III, p. 43): consider a
two-dimensional manifold $A$ in the $2n$-dimensional flow. We represent it as a
(differentiable) map of a compact set $K \subset \mathbb{R}^2$ into $\mathbb{R}^{2n}$ (Fig. 14.4)

$$\begin{aligned} \Phi : K &\longrightarrow A \subset \mathbb{R}^{2n} \\ (u, v) &\longmapsto \bigl(p^0(u, v), q^0(u, v)\bigr). \end{aligned} \tag{14.28}$$

We let $\pi_i(A)$ be the projection of $A$ onto the $(p_i, q_i)$-coordinate plane and
consider the sum of the oriented areas of $\pi_i(A)$. We shall see that this is also an
invariant.



Fig. 14.4. Two-dimensional manifold in the flow 


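The oriented area of $\pi_i(A)$ is a surface integral over $A$ which is defined, with
the transformation formula in mind, as

$$\text{or.area}\bigl(\pi_i(A)\bigr) = \iint_K \det \begin{pmatrix} \dfrac{\partial p_i^0}{\partial u} & \dfrac{\partial p_i^0}{\partial v} \\[2mm] \dfrac{\partial q_i^0}{\partial u} & \dfrac{\partial q_i^0}{\partial v} \end{pmatrix} du\, dv . \tag{14.29}$$

For the computation of the area of $\pi_i(\varphi_t(A))$, after the action of the flow, we use
the composition $\varphi_t \circ \Phi$ as coordinate map (Fig. 14.4). This produces, with $p_i^t$, $q_i^t$
being the $i$th respectively $(n+i)$th component of this map,

$$\text{or.area}\bigl(\pi_i(\varphi_t(A))\bigr) = \iint_K \det \begin{pmatrix} \dfrac{\partial p_i^t}{\partial u} & \dfrac{\partial p_i^t}{\partial v} \\[2mm] \dfrac{\partial q_i^t}{\partial u} & \dfrac{\partial q_i^t}{\partial v} \end{pmatrix} du\, dv . \tag{14.30}$$

There is no theoretical difficulty in differentiating this expression with respect to $t$
and summing for $i = 1, \ldots, n$. This will give zero and the invariance is established.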

The proof, however, becomes more elegant if we introduce exterior differential
forms (E. Cartan 1899). These, originally “expressions purement symboliques”,
are today understood as multilinear maps on the tangent space (for more details
see “Chapter 7” of Arnol’d 1974). In our case the one-forms $dp_i$, respectively $dq_i$,
map a tangent vector $\xi$ to its $i$th, respectively $(n+i)$th, component. The exterior
product $dp_i \wedge dq_i$ is a bilinear map acting on a pair of vectors

$$(dp_i \wedge dq_i)(\xi_1, \xi_2) = \det \begin{pmatrix} dp_i(\xi_1) & dp_i(\xi_2) \\ dq_i(\xi_1) & dq_i(\xi_2) \end{pmatrix}
 = dp_i(\xi_1)\, dq_i(\xi_2) - dp_i(\xi_2)\, dq_i(\xi_1) \tag{14.31}$$

and satisfies Grassmann’s rules for exterior multiplication

$$dp_i \wedge dp_j = -dp_j \wedge dp_i , \qquad dp_i \wedge dp_i = 0 . \tag{14.32}$$

For the two tangent vectors (see Fig. 14.4)

$$\begin{aligned}
\xi_1^0 &= \Bigl( \frac{\partial p_1^0}{\partial u}(u, v), \ldots, \frac{\partial p_n^0}{\partial u}(u, v), \frac{\partial q_1^0}{\partial u}(u, v), \ldots, \frac{\partial q_n^0}{\partial u}(u, v) \Bigr)^T \\
\xi_2^0 &= \Bigl( \frac{\partial p_1^0}{\partial v}(u, v), \ldots, \frac{\partial p_n^0}{\partial v}(u, v), \frac{\partial q_1^0}{\partial v}(u, v), \ldots, \frac{\partial q_n^0}{\partial v}(u, v) \Bigr)^T
\end{aligned} \tag{14.33}$$

the expression (14.31) is precisely the integrand of (14.29). If we introduce the
differential 2-form

$$\omega^2 = \sum_{i=1}^{n} dp_i \wedge dq_i \tag{14.34}$$

then our candidate for invariance becomes

$$\sum_{i=1}^{n} \text{or.area}\bigl(\pi_i(A)\bigr) = \iint_K \omega^2(\xi_1^0, \xi_2^0)\, du\, dv .$$

After the action of the flow we have the tangent vectors

$$\xi_1^t = \varphi_t'(p^0, q^0) \cdot \xi_1^0 , \qquad \xi_2^t = \varphi_t'(p^0, q^0) \cdot \xi_2^0$$

and

$$\sum_{i=1}^{n} \text{or.area}\bigl(\pi_i(\varphi_t(A))\bigr) = \iint_K \omega^2(\xi_1^t, \xi_2^t)\, du\, dv$$

(see (14.30)). We shall see that $\omega^2(\xi_1^t, \xi_2^t) = \omega^2(\xi_1^0, \xi_2^0)$.

Definition 14.11. For a differentiable function $g : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ we define the
differential form $g^*\omega^2$ by

$$(g^*\omega^2)(\xi_1, \xi_2) := \omega^2\bigl( g'(p, q)\xi_1 , g'(p, q)\xi_2 \bigr). \tag{14.35}$$

Such a function $g$ is called symplectic (a name suggested by H. Weyl 1939, p. 165)
if

$$g^*\omega^2 = \omega^2 , \tag{14.36}$$

i.e., if the 2-form $\omega^2$ is invariant under $g$.


Theorem 14.12. The flow of a canonical system (14.26) is symplectic, i.e.,

$$(\varphi_t)^*\omega^2 = \omega^2 \qquad \text{for all } t. \tag{14.37}$$

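Proof. We compute the derivative of $\omega^2(\xi_1^t, \xi_2^t)$ (see (14.35)) with respect to $t$ by
the Leibniz rule. This gives

$$\frac{d}{dt} \Bigl( \sum_{i=1}^{n} (dp_i \wedge dq_i)(\xi_1^t, \xi_2^t) \Bigr) = \sum_{i=1}^{n} (dp_i \wedge dq_i)(\dot\xi_1^t, \xi_2^t) + \sum_{i=1}^{n} (dp_i \wedge dq_i)(\xi_1^t, \dot\xi_2^t). \tag{14.38}$$

Since the vectors $\xi_1^t$ and $\xi_2^t$ satisfy the variational equation (14.27), we have

$$\frac{d}{dt}\, \omega^2(\xi_1^t, \xi_2^t) = \sum_{i,j=1}^{n} \Bigl( -\frac{\partial^2 H}{\partial p_j \partial q_i}\, dp_j \wedge dq_i - \frac{\partial^2 H}{\partial q_j \partial q_i}\, dq_j \wedge dq_i
 + \frac{\partial^2 H}{\partial p_j \partial p_i}\, dp_i \wedge dp_j + \frac{\partial^2 H}{\partial q_j \partial p_i}\, dp_i \wedge dq_j \Bigr)(\xi_1^t, \xi_2^t). \tag{14.39}$$

The first and last terms in this formula cancel by symmetry of the partial derivatives.
Further, the properties (14.32) imply that

$$\sum_{i,j=1}^{n} \frac{\partial^2 H}{\partial p_i \partial p_j}(p, q)\, dp_i \wedge dp_j = \sum_{i<j} \Bigl( \frac{\partial^2 H}{\partial p_i \partial p_j}(p, q) - \frac{\partial^2 H}{\partial p_j \partial p_i}(p, q) \Bigr)\, dp_i \wedge dp_j$$

vanishes. Since the last remaining term cancels in the same way, the derivative
(14.38) vanishes identically. □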


Example 14.13. We use the spherical pendulum in canonical form (6.28)

$$\begin{aligned}
\dot p_1 &= p_2^2\, \frac{\cos q_1}{\sin^3 q_1} - \sin q_1 , &\qquad \dot p_2 &= 0 \\
\dot q_1 &= p_1 , &\qquad \dot q_2 &= \frac{p_2}{\sin^2 q_1}
\end{aligned} \tag{14.40}$$

and for $A$ the familiar two-dimensional cat placed in $\mathbb{R}^4$ such that its projection
to $(p_1, q_1)$ is a line; i.e., with zero area. It can be seen that with increasing $t$ the
area in $(p_1, q_1)$ increases and the area in $(p_2, q_2)$ decreases. Their sum remains
constant. Observe that for larger $t$ the left ear in $(p_1, q_1)$ is twisted, i.e., surrounded
in the negative sense, so that this part counts for negative area (Fig. 14.5). If time
proceeded in the negative sense, both areas would increase, but the first area would
be oriented negatively.


Between the two-dimensional invariant of Theorem 14.12 and the $2n$-dimensional
one of Liouville’s theorem, there are many others; e.g., the differential 4-form

$$\omega^4 = \sum_{i<j} dp_i \wedge dp_j \wedge dq_i \wedge dq_j . \tag{14.41}$$

These invariants, however, are not really new, because (14.41) is proportional to
the exterior square of $\omega^2$, $\omega^2 \wedge \omega^2 = -2\omega^4$.

Writing (14.31) in matrix notation

$$\omega^2(\xi_1, \xi_2) = \xi_1^T J\, \xi_2 \qquad \text{with} \qquad J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \tag{14.42}$$

we obtain the following criterion:

Theorem 14.14. A differentiable transformation $g : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ is symplectic if
and only if its Jacobian $R = g'(p, q)$ satisfies

$$R^T J R = J \tag{14.43}$$

with $J$ given in (14.42).

Proof. This follows at once from (see (14.35))

$$(g^*\omega^2)(\xi_1, \xi_2) = (R\xi_1)^T J (R\xi_2) = \xi_1^T R^T J R\, \xi_2 . \qquad □$$
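
The criterion (14.43) is easy to test on a computer. As a small illustration
(not taken from the text), the Python sketch below evaluates the defect
$R^T J R - J$ for the Jacobians of two one-step maps applied to the pendulum with
$H = p^2/2 - \cos q$: the map $(p, q) \mapsto (p - h \sin q,\ q + h(p - h\sin q))$, often
called the symplectic Euler method, and the explicit Euler map
$(p, q) \mapsto (p - h \sin q,\ q + hp)$. Both maps, the step size and all function
names are assumptions chosen for this sketch.

```python
import math


def matmul(A, B):
    # Product of two small dense matrices given as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def canonical_J(n):
    # The matrix J of (14.42) for variables ordered as (p_1..p_n, q_1..q_n).
    J = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1.0
        J[n + i][i] = -1.0
    return J


def symplectic_defect(R):
    # Largest entry of |R^T J R - J|; zero means that (14.43) holds.
    n = len(R) // 2
    J = canonical_J(n)
    M = matmul(matmul(transpose(R), J), R)
    return max(abs(M[i][j] - J[i][j])
               for i in range(2 * n) for j in range(2 * n))


def jac_symplectic_euler(p, q, h):
    # Jacobian of (p, q) -> (p - h sin q, q + h (p - h sin q)).
    c = math.cos(q)
    return [[1.0, -h * c], [h, 1.0 - h * h * c]]


def jac_explicit_euler(p, q, h):
    # Jacobian of (p, q) -> (p - h sin q, q + h p).
    c = math.cos(q)
    return [[1.0, -h * c], [h, 1.0]]


h, p, q = 0.1, 0.7, 0.3
defect_symplectic = symplectic_defect(jac_symplectic_euler(p, q, h))
defect_explicit = symplectic_defect(jac_explicit_euler(p, q, h))
print(defect_symplectic, defect_explicit)
```

For $n = 1$ the condition (14.43) reduces to $\det R = 1$, so the defect of the explicit
Euler map equals $h^2 |\cos q|$, while the first map satisfies (14.43) up to rounding.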


Exercises 

1. Prove the following lemma from elementary calculus which is used in the proof
of Theorem 14.4: if for a function $F(x, y)$, $\partial F/\partial y$ exists and $y(x)$ is
differentiable and such that $F(x, y(x)) = \mathit{Const}$, then $\partial F/\partial x$ exists at $(x, y(x))$
and is equal to

$$\frac{\partial F}{\partial x}(x, y(x)) = -\frac{\partial F}{\partial y}(x, y(x)) \cdot y'(x).$$

Hint. Use the identity

$$F(x_1, y(x_1)) - F(x_0, y(x_1)) = F(x_0, y(x_0)) - F(x_0, y(x_1)).$$




1.15 Boundary Value and Eigenvalue Problems 


Although our book is mainly concerned with initial value problems, we want to 
include in this first chapter some properties of boundary and eigenvalue problems. 


Boundary Value Problems 


They arise in systems of differential equations, say

$$\begin{aligned} y_1' &= f_1(x, y_1, y_2), \\ y_2' &= f_2(x, y_1, y_2), \end{aligned}$$

when there is no initial point $x_0$ at which $y_1(x_0)$ and $y_2(x_0)$ are known
simultaneously. Questions of existence and uniqueness then become much more
complicated.

Example 1. Consider the differential equation

$$y'' = \exp(y) \qquad \text{or} \qquad y_1' = y_2 , \quad y_2' = \exp(y_1) \tag{15.2a}$$

with the boundary conditions

$$y_1(0) = a , \qquad y_1(1) = b . \tag{15.2b}$$

In order to apply our existence theorems or to do numerical computations (say by
Euler’s method (7.3)), we can proceed as follows: guess the missing initial value
$y_{20}$. We can then compute the solution and check whether the computed value for
$y_1(1)$ is equal to $b$ or not. So our problem is, whether the function of the single
variable $y_{20}$

$$F(y_{20}) := y_1(1) - b \tag{15.3}$$

possesses a zero or not.

Equation (15.2a) is quasimonotone, which implies that $F(y_{20})$ depends
monotonically on $y_{20}$ (Fig. 15.1a, see Exercise 7 of 1.10). Also, for $y_{20}$ very small or
very large, $y_1(1)$ is arbitrarily small or large, or even infinite. Therefore, (15.2)
possesses for all $a$, $b$ a unique solution (see Fig. 15.1b).

Fig. 15.1. a) Solutions of (15.2a) for different initial values $y_{20} = -1.7, \ldots, -0.4$
b) Unique solution of (15.2a) for $a = 1$, $b = 2$, $y_{20} = -0.476984656$
c) Solutions of (15.4a) for $y(0) = 1$ and $y_{20} = 0, 1, 2, \ldots, 9$
d) The two solutions of (15.4a), $y(0) = 1$, $y(1) = 0.5$, $y_{20} = 7.93719$, $y_{20} = 0.97084$

The root of $F(y_{20}) = 0$ can be computed by an iterative method (bisection,
regula falsi, ...; if the derivative of $y_1(1)$ with respect to $y_{20}$ is used from Theorem
14.3 or numerically from finite differences, also by Newton’s method). The initial
value problem is then computed several times. Small problems, such as the above
example, can be done by a simple dialogue with the computer. Harder problems
with more unknown initial values need more programming skills. This method is
one of the most commonly used and is called the shooting method.
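
A minimal sketch of the shooting method for Example 1 with $a = 1$, $b = 2$ is given
below. It is only an illustration under stated assumptions: the initial value
problems are integrated with the classical Runge-Kutta method using a fixed step
size, the root of $F$ is found by bisection, and the bracket $[-2, 0.5]$ as well as
all function names are choices made here, not taken from the text.

```python
import math


def y1_at_one(y20, a=1.0, n=400):
    # Integrate y1' = y2, y2' = exp(y1) from x = 0 to x = 1 and return y1(1).
    h = 1.0 / n
    y1, y2 = a, y20
    for _ in range(n):
        k1 = (y2, math.exp(y1))
        k2 = (y2 + h / 2 * k1[1], math.exp(y1 + h / 2 * k1[0]))
        k3 = (y2 + h / 2 * k2[1], math.exp(y1 + h / 2 * k2[0]))
        k4 = (y2 + h * k3[1], math.exp(y1 + h * k3[0]))
        y1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return y1


def F(y20, a=1.0, b=2.0):
    # The function (15.3): mismatch in the right boundary condition.
    return y1_at_one(y20, a) - b


def bisect(func, lo, hi, tol=1e-12):
    # Plain bisection; requires a sign change of func on [lo, hi].
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("no sign change on the given interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


y20_root = bisect(F, -2.0, 0.5)
print(y20_root)  # to be compared with the value quoted in Fig. 15.1b
```

Bisection is used here because $F$ is monotone for this example; for Newton’s
method one would add the variational equation of Theorem 14.3 to the integration.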

Example 2. For the differential equation

$$y'' = -\exp(y) \qquad \text{or} \qquad y_1' = y_2 , \quad y_2' = -\exp(y_1) \tag{15.4a}$$

with the boundary conditions

$$y_1(0) = a , \qquad y_1(1) = b \tag{15.4b}$$

the monotonicity of $F(y_{20})$ is lost and things become more complicated: solutions
for different initial values $y_{20}$ are sketched for $a = 1$ in Fig. 15.1c. It can be seen
that for $b$ above a certain value (which is 1.499719998) there exists no solution of
the problem at all, and for $b$ below this value there exist two solutions (Fig. 15.1d).

Example 3.

$$y_1' = y_2 , \qquad y_2' = y_1^3 , \qquad y_1(0) = 1 , \qquad y_1(100) = 2 . \tag{15.5}$$

This equation is similar to (15.2) and the same statement of existence and
uniqueness holds as above. However, if one tries to compute the solutions by the
shooting method, one gets into trouble because of the length of the interval: the
solution nearly never exists on the whole interval; in fact, the correct solution is
$y_{20} = -0.70710616655$. But already for $y_{20} = -0.7071061$, $y_1(x)$ tends to $+\infty$
for $x \to 98.2$. On the other side, for $y_{20} = -0.70711$, we have $y_1(94.1) = -\infty$.
So the domain where $F(y_{20})$ of (15.3) exists is of length less than $4 \times 10^{-6}$.

In a case like this, one can use the multiple shooting technique: the interval is
split up into several subintervals, on each of which the problem is solved with
well-chosen initial values. At the endpoints of the subintervals, the solutions are then
matched together. Equation (15.3) thereby becomes a system of higher dimension
to be solved. Another possibility is to apply global methods (finite differences,
collocation). Instead of integrating a sequence of initial value problems, a global
representation of the approximate solution is sought. There exists an extensive
literature on methods for boundary value problems. As a general reference we give
Ascher, Mattheij & Russell (1988) and Deuflhard (1980).


Sturm-Liouville Eigenvalue Problems 


This subject originated with a remarkable paper of Sturm (Sturm 1836) in
Liouville’s newly founded Journal. This paper was followed by a series of papers by
Liouville and Sturm published in the following volumes. It is today considered as
the starting point of the “geometric theory”, where the main effort is not to try to
integrate the equation, but merely to obtain geometric properties of the solution,
such as its form, oscillations, sign changes, zeros, existence of maxima or minima
and so on, directly from the differential equation (“Or on peut arriver à ce but par
la seule considération des équations différentielles en elles-mêmes, sans qu’on ait
besoin de leur intégration”).

The physical origin was, as in Section 1.6, the study of heat and small
oscillations of elastic media. Let us consider the heat equation with non-constant
conductivity

$$\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\Bigl( k(x) \frac{\partial u}{\partial x} \Bigr) - \ell(x)\, u , \qquad k(x) > 0 , \tag{15.6}$$

which was studied extensively in Poisson’s “Théorie de la chaleur”. Poisson (1835)
assumes $u(x, t) = y(x) e^{-\lambda t}$, so that (15.6) becomes

$$\frac{d}{dx}\Bigl( k(x) \frac{dy}{dx} \Bigr) - \ell(x)\, y = -\lambda y . \tag{15.7}$$

We write (15.7) in the form

$$(k(x) y')' + G(x) y = 0 \tag{15.8}$$

and state the following comparison theorem of Sturm:

Theorem 15.1. Consider, with (15.8), the differential equation

$$(\hat k(x) \hat y')' + \hat G(x) \hat y = 0 , \tag{15.9}$$

and assume $k$, $\hat k$ differentiable, $G$, $\hat G$ continuous,

$$0 < \hat k(x) \le k(x) , \qquad \hat G(x) \ge G(x) \tag{15.10}$$

for all $x$ and let $y(x)$, $\hat y(x)$ be linearly independent solutions of (15.8) and (15.9),
respectively. Then, between any two zeros of $y(x)$ there is at least one zero of $\hat y(x)$,
i.e., if $y(x_1) = y(x_2) = 0$ with $x_1 < x_2$ then there exists $x_3$ in the open interval
$(x_1, x_2)$ such that $\hat y(x_3) = 0$.


Proof. The original proof of Sturm is based on the quotient

$$q(x) = \frac{y(x)}{k(x) y'(x)}$$

which is the slope of the line connecting the origin with the solution point in the
$(ky', y)$-plane and satisfies a first-order differential equation. In order to avoid the
singularities caused by the zeros of $y'(x)$, we prefer the use of polar coordinates
(Prüfer 1926)

$$k(x) y'(x) = \varrho(x) \cos \varphi(x) , \qquad y(x) = \varrho(x) \sin \varphi(x). \tag{15.11}$$

Differentiation of (15.11) yields the following differential equations for $\varphi$ and $\varrho$:

$$\varphi' = \frac{1}{k(x)} \cos^2 \varphi + G(x) \sin^2 \varphi \tag{15.12}$$

$$\varrho' = \Bigl( \frac{1}{k(x)} - G(x) \Bigr) \cdot \sin \varphi \cdot \cos \varphi \cdot \varrho . \tag{15.13}$$

In the same way we also introduce functions $\hat\varrho(x)$ and $\hat\varphi(x)$ for the second
differential equation (15.9). They satisfy analogous relations with $k(x)$ and $G(x)$
replaced by $\hat k(x)$ and $\hat G(x)$.

Suppose now that $x_1$, $x_2$ are two consecutive zeros of $y(x)$. Then $\varphi(x_1)$ and
$\varphi(x_2)$ must be multiples of $\pi$, since $\varrho(x)$ is always different from zero
(uniqueness of the initial value problem). By (15.12) $\varphi'(x)$ is positive at $x_1$ and at $x_2$.
Therefore we may assume that

$$\varphi(x_1) = 0 , \qquad \varphi(x_2) = \pi , \qquad \hat\varphi(x_1) \in [0, \pi). \tag{15.14}$$

The fact that equation (15.12) is first-order and the inequalities (15.10) allow the
application of Theorem 10.3 to give

$$\hat\varphi(x) \ge \varphi(x) \qquad \text{for} \quad x_1 \le x \le x_2 .$$

It is impossible that $\hat\varphi(x) = \varphi(x)$ everywhere, since this would imply $\hat G(x) =
G(x)$, $\cos \hat\varphi(x)/\hat k(x) = \cos \varphi(x)/k(x)$ by (15.12) and (15.10). As a consequence
of (15.13) we would have $\hat\varrho(x) = C \cdot \varrho(x)$ and the solutions $y(x)$, $\hat y(x)$ would be
linearly dependent. Therefore, there exists $x_0 \in (x_1, x_2)$ such that $\hat\varphi(x_0) > \varphi(x_0)$.
In this situation $\hat\varphi(x) > \varphi(x)$ for all $x \ge x_0$ and the existence of $x_3 \in (x_1, x_2)$ with
$\hat\varphi(x_3) = \pi$ is assured. □


The next theorem shows that our eigenvalue problem possesses an infinity of
solutions. We add to (15.7) the boundary conditions

$$y(x_0) = y(x_1) = 0 . \tag{15.15}$$


Theorem 15.2. The eigenvalue problem (15.7), (15.15) possesses an infinite
sequence of eigenvalues $\lambda_1 < \lambda_2 < \lambda_3 < \ldots$ whose corresponding solutions $y_j(x)$
(“eigenfunctions”) possess respectively $0, 1, 2, \ldots$ zeros in the interval $(x_0, x_1)$.
The zeros of $y_{j+1}(x)$ separate those of $y_j(x)$. If $0 < K_1 \le k(x) \le K_2$ and
$L_1 \le \ell(x) \le L_2$, then

$$L_1 + K_1 \frac{j^2 \pi^2}{(x_1 - x_0)^2} \le \lambda_j \le L_2 + K_2 \frac{j^2 \pi^2}{(x_1 - x_0)^2} . \tag{15.16}$$


Proof. Let $y(x, \lambda)$ be the solution of (15.7) with initial values $y(x_0) = 0$, $y'(x_0) =
1$. Theorem 15.1 (with $\hat k(x) = k(x)$, $\hat G(x) = G(x) + \Delta\lambda$) implies that for
increasing $\lambda$ the zeros of $y(x, \lambda)$ move towards $x_0$, so that the number of zeros in
$(x_0, x_1)$ is a non-decreasing function of $\lambda$.

Comparing next (15.7) with the solution ($\lambda > L_1$)

$$\sin\bigl( \sqrt{(\lambda - L_1)/K_1} \cdot (x - x_0) \bigr)$$

of $K_1 y'' + (\lambda - L_1) y = 0$ we see that for $\lambda < L_1 + K_1 j^2 \pi^2/(x_1 - x_0)^2$, $y(x, \lambda)$
has at most $j - 1$ zeros in $(x_0, x_1]$. Similarly, a comparison with

$$\sin\bigl( \sqrt{(\lambda - L_2)/K_2} \cdot (x - x_0) \bigr)$$

which is a solution of $K_2 y'' + (\lambda - L_2) y = 0$, shows that $y(x, \lambda)$ possesses at
least $j$ zeros in $(x_0, x_1)$, if $\lambda > L_2 + K_2 j^2 \pi^2/(x_1 - x_0)^2$. The statements of the
theorem are now simple consequences of these three properties. □


Example. Fig. 15.2 shows the first 5 solutions of the problem

$$\bigl( (1 - 0.8 \sin^2 x)\, y' \bigr)' - (x - \lambda)\, y = 0 , \qquad y(0) = y(\pi) = 0 . \tag{15.17}$$

The first eigenvalues are 2.1224, 3.6078, 6.0016, 9.3773, 13.7298, 19.053,
25.347, 32.609, 40.841, 50.041, etc.

Fig. 15.2. Solutions of the Sturm-Liouville eigenvalue problem (15.17)

For more details about this theory, which is a very important page of history,
we refer to the book of Reid (1980).
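
The Prüfer angle of (15.11) also gives a simple way of computing eigenvalues
numerically: by (15.12) with $G(x) = \lambda - \ell(x)$, the $j$th eigenvalue is the value of
$\lambda$ for which the solution of (15.12) with $\varphi(x_0) = 0$ satisfies $\varphi(x_1) = j\pi$. The
Python sketch below illustrates this under stated assumptions: it uses the classical
Runge-Kutta method with a fixed step size and bisection in $\lambda$, takes the
brackets for (15.17) from (15.16) with $K_1 = 0.2$, $K_2 = 1$, $L_1 = 0$, $L_2 = \pi$, and
all function names are hypothetical. It is not the method by which the values in
the example were obtained.

```python
import math


def prufer_angle(lam, k, ell, x0, x1, n=1000):
    # Solve phi' = cos^2(phi)/k(x) + (lam - ell(x)) sin^2(phi), phi(x0) = 0,
    # with n classical Runge-Kutta steps and return phi(x1).
    def rhs(x, phi):
        return math.cos(phi) ** 2 / k(x) + (lam - ell(x)) * math.sin(phi) ** 2

    h = (x1 - x0) / n
    x, phi = x0, 0.0
    for _ in range(n):
        k1 = rhs(x, phi)
        k2 = rhs(x + h / 2, phi + h / 2 * k1)
        k3 = rhs(x + h / 2, phi + h / 2 * k2)
        k4 = rhs(x + h, phi + h * k3)
        phi += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return phi


def eigenvalue(j, k, ell, x0, x1, lo, hi, tol=1e-8):
    # Bisection for phi(x1; lam) = j*pi; phi(x1; lam) increases with lam.
    target = j * math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prufer_angle(mid, k, ell, x0, x1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sanity check on a problem with known eigenvalues: y'' + lam*y = 0 on [0, pi].
lam2_const = eigenvalue(2, lambda x: 1.0, lambda x: 0.0, 0.0, math.pi, 3.5, 4.5)

# Problem (15.17): k(x) = 1 - 0.8 sin^2 x, ell(x) = x, brackets from (15.16).
def k_example(x):
    return 1.0 - 0.8 * math.sin(x) ** 2


eigs = [eigenvalue(j, k_example, lambda x: x, 0.0, math.pi,
                   0.2 * j * j, math.pi + j * j) for j in (1, 2, 3)]
print(lam2_const, eigs)
```

The printed values for (15.17) can be compared with the eigenvalues listed in the
example.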


Exercises 


1. Consider the equation

$$L(x) y'' + M(x) y' + N(x) y = 0.$$

Multiply it with a suitable function $\varphi(x)$, so that the ensuing equation is of the
form (15.8) (Sturm 1836, p. 108).


2. Prove that two solutions of (15.7), (15.15) satisfy the orthogonality relations

$$\int_{x_0}^{x_1} y_j(x)\, y_k(x)\, dx = 0 \qquad \text{for} \qquad \lambda_j \ne \lambda_k .$$

Hint. Multiply this by $\lambda_j$, replace $\lambda_j y_j(x)$ from (15.7) and do partial
integration (Liouville 1836, p. 257).


3. Solve the problem (15.5) by elementary functions. Explain why the given value
for $y_{20}$ is so close to $-\sqrt{2}/2$.

4. Show that the boundary value problem (see Collatz 1967)

$$y'' = -y^3 , \qquad y(0) = 0 , \qquad y(A) = B \tag{15.18}$$

possesses infinitely many solutions for each pair $(A, B)$ with $A \ne 0$.

Hint. Draw the solution $y(x)$ of (15.18) with $y(0) = 0$, $y'(0) = 1$. Show that
for each constant $a$, $z(x) = a\, y(ax)$ is also a solution.



1.16 Periodic Solutions, Limit Cycles, 
Strange Attractors 


2° Les demi-spirales que l’on suit sur un arc infini sans arriver à
un nœud ou à un foyer et sans revenir au point de départ ; ...

(H. Poincaré 1882, Oeuvres vol. 1, p. 54)


The phenomenon of limit cycles was first described theoretically by Poincaré
(1882) and Bendixson (1901), and has since then found many applications in
Physics, Chemistry and Biology. In higher dimensions things can become much
more chaotic and attractors may look fairly “strange”.


Van der Pol’s Equation 

I have a theory that whenever you want to get in trouble with a 
method, look for the Van der Pol equation. 

(P.E. Zadunaisky 1982)

The first practical examples were studied by Rayleigh (1883) and later by Van der
Pol (1920-1926) in a series of papers on nonlinear oscillations: the solutions of

$$y'' + \alpha y' + y = 0$$

are damped for $\alpha > 0$, and unstable for $\alpha < 0$. The idea is to change $\alpha$ (with
the help of a triode, for example) so that $\alpha < 0$ for small $y$ and $\alpha > 0$ for large
$y$. The simplest expression, which describes the physical situation in a somewhat
idealized form, would be $\alpha = \varepsilon(y^2 - 1)$, $\varepsilon > 0$. Then the above equation becomes

$$y'' + \varepsilon(y^2 - 1) y' + y = 0 , \tag{16.1}$$

or, written as a system,

$$\begin{aligned} y_1' &= y_2 \\ y_2' &= \varepsilon(1 - y_1^2) y_2 - y_1 , \qquad \varepsilon > 0 . \end{aligned} \tag{16.2}$$

In this equation, small oscillations are amplified and large oscillations are damped.
We therefore expect the existence of a stable periodic solution to which all other
solutions converge. We call this a limit cycle (Poincaré 1882, “Chap. VI”). The
original illustrations of the paper of Van der Pol are reproduced in Fig. 16.1.

Existence proof. The existence of limit cycles is studied by the method of Poincaré
sections (Poincaré 1882, “Chap. V, Théorie des conséquents”). The idea is to cut
the solutions transversally by a hyperplane $\Pi$ and, for an initial value $y_0 \in \Pi$, to
study the first point $\Phi(y_0)$ where the solution again crosses the plane $\Pi$ in the
same direction.

For our example (16.2), we choose for $\Pi$ the half-line $y_2 = 0$, $y_1 > 0$. We
then examine the signs of $y_1'$ and $y_2'$ in (16.2). The sign of $y_2'$ changes at the curve

$$y_2 = \frac{y_1}{\varepsilon(1 - y_1^2)} , \tag{16.3}$$

which is drawn as a broken line in Fig. 16.2. It follows (see Fig. 16.2) that $\Phi(y_0)$
exists for all $y_0 \in \Pi$. Since two different solutions cannot intersect (due to
uniqueness), the map $\Phi$ is monotone. Further, $\Phi$ is bounded (e.g., by every solution
starting on the curve (16.3)), so $\Phi(y_0) < y_0$ for $y_0$ large. Finally, since the origin
is unstable, $\Phi(y_0) > y_0$ for $y_0$ small. Hence there must be a fixed point of $\Phi(y_0)$,
i.e., a limit cycle. □
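
The Poincaré map of this existence proof can be evaluated numerically. The
following Python sketch (an illustration, not the computation behind Fig. 16.2)
integrates (16.2) for $\varepsilon = 1$ with the classical Runge-Kutta method, detects the
next crossing of the half-line $y_2 = 0$, $y_1 > 0$ in the same direction by a sign
change of $y_2$ with linear interpolation, and then iterates the map to approach its
fixed point. Step size, starting values and function names are assumptions of this
sketch.

```python
def vdp_rhs(y1, y2, eps):
    # Van der Pol's equation written as the system (16.2).
    return y2, eps * (1.0 - y1 * y1) * y2 - y1


def rk4_step(y1, y2, h, eps):
    k1 = vdp_rhs(y1, y2, eps)
    k2 = vdp_rhs(y1 + h / 2 * k1[0], y2 + h / 2 * k1[1], eps)
    k3 = vdp_rhs(y1 + h / 2 * k2[0], y2 + h / 2 * k2[1], eps)
    k4 = vdp_rhs(y1 + h * k3[0], y2 + h * k3[1], eps)
    return (y1 + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            y2 + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def poincare_map(y0, eps=1.0, h=1e-3, max_steps=200000):
    # Start on the half-line y2 = 0, y1 = y0 > 0 and return the y1-value of
    # the next crossing of this half-line from y2 > 0 to y2 <= 0.
    y1, y2 = y0, 0.0
    for _ in range(max_steps):
        z1, z2 = rk4_step(y1, y2, h, eps)
        if y2 > 0.0 and z2 <= 0.0 and z1 > 0.0:
            theta = y2 / (y2 - z2)          # linear interpolation in the step
            return y1 + theta * (z1 - y1)
        y1, y2 = z1, z2
    raise RuntimeError("no return to the section found")


small, large = poincare_map(0.5), poincare_map(4.0)

# Iterating the map approaches the fixed point, i.e. the limit cycle.
fixed_point = 1.0
for _ in range(6):
    fixed_point = poincare_map(fixed_point)
print(small, large, fixed_point)
```

The two sample values illustrate $\Phi(y_0) > y_0$ for small $y_0$ and $\Phi(y_0) < y_0$
for large $y_0$; the iteration converges quickly because the limit cycle is strongly
attracting for $\varepsilon = 1$.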


The limit cycle is, in fact, unique. The proof for this is more complicated and
is indicated in Exercise 8 below (Liénard 1928).

Fig. 16.2. The Poincaré map for Van der Pol’s equation, $\varepsilon = 1$

With similar ideas one proves the following general result:

Theorem 16.1 (Poincaré 1882, Bendixson 1901). Each bounded solution of a
two-dimensional system

$$y_1' = f_1(y_1, y_2) , \qquad y_2' = f_2(y_1, y_2) \tag{16.4}$$

must

i) tend to a critical point $f_1 = f_2 = 0$ for an infinity of points $x_i \to \infty$; or

ii) be periodic; or

iii) tend to a limit cycle. □


Remark. Exercise 1 below explains why the possibility (i) is written in a form
somewhat more complicated than seems necessary.


Steady-state approximations for $\varepsilon$ large. An important tool for simplifying
complicated nonlinear systems is that of steady-state approximations. Consider (16.2)
with $\varepsilon$ very large. Then, in the neighbourhood of $f_2(y_1, y_2) = 0$ for $|y_1| > 1$,
the derivative of $y_2' = f_2$ with respect to $y_2$ is very large negative. Therefore the
solution will very rapidly approach an equilibrium state in the neighbourhood of
$y_2' = f_2(y_1, y_2) = 0$, i.e., in our example, $y_2 = y_1/(\varepsilon(1 - y_1^2))$. This can be
inserted into (16.2) and leads to

$$y_1' = \frac{y_1}{\varepsilon(1 - y_1^2)} , \tag{16.5}$$

an equation of lower dimension. Using the formulas of Section 1.3, (16.5) is easily
solved to give

$$\log(y_1) - \frac{y_1^2}{2} = \frac{x}{\varepsilon} + \mathit{Const} .$$

These curves are dotted in Van der Pol’s Fig. 16.3 for $\varepsilon = 10$ and show the good
approximation of this solution.

Fig. 16.3. Solution of Van der Pol’s equation for $\varepsilon = 10$
compared with steady state approximations


Asymptotic solutions for $\varepsilon$ small. The computation of periodic solutions for
small parameters was initiated by astronomers such as Newcomb and Lindstedt
and brought to perfection by Poincaré (1893). We demonstrate the method for the
Van der Pol equation (16.1). The idea is to develop the solution as a series in powers
of $\varepsilon$. Since the period will change too, we also introduce a coordinate change

$$t = x(1 + \gamma_1 \varepsilon + \gamma_2 \varepsilon^2 + \ldots) \tag{16.6}$$

and put

$$y(x) = z(t) = z_0(t) + \varepsilon z_1(t) + \varepsilon^2 z_2(t) + \ldots . \tag{16.7}$$

Inserting now $y'(x) = z'(t)(1 + \gamma_1 \varepsilon + \ldots)$, $y''(x) = z''(t)(1 + \gamma_1 \varepsilon + \ldots)^2$ into
(16.1) we obtain

$$\begin{aligned}
&(z_0'' + \varepsilon z_1'' + \varepsilon^2 z_2'' + \ldots)\bigl(1 + 2\gamma_1 \varepsilon + (2\gamma_2 + \gamma_1^2)\varepsilon^2 + \ldots\bigr) \\
&\quad + \varepsilon\bigl((z_0 + \varepsilon z_1 + \ldots)^2 - 1\bigr)(z_0' + \varepsilon z_1' + \ldots)(1 + \gamma_1 \varepsilon + \ldots) \\
&\quad + (z_0 + \varepsilon z_1 + \varepsilon^2 z_2 + \ldots) = 0 .
\end{aligned} \tag{16.8}$$

We first compare the coefficients of $\varepsilon^0$ and obtain

$$z_0'' + z_0 = 0 . \tag{16.8;0}$$

We fix the initial value on the Poincaré section $P$, i.e., $z'(0) = 0$, so that $z_0 =
A \cos t$ with $A$, for the moment, a free parameter. Next, the coefficients of $\varepsilon$ yield

$$\begin{aligned}
z_1'' + z_1 &= -2\gamma_1 z_0'' - (z_0^2 - 1) z_0' \\
&= 2\gamma_1 A \cos t + \Bigl( \frac{A^3}{4} - A \Bigr) \sin t + \frac{A^3}{4} \sin 3t .
\end{aligned} \tag{16.8;1}$$

Here, the crucial idea is that we are looking for periodic solutions, hence the terms
in $\cos t$ and $\sin t$ on the right-hand side of (16.8;1) must disappear, in order to
avoid that $z_1(t)$ contain terms of the form $t \cdot \cos t$ and $t \cdot \sin t$ (“... et de faire
disparaître ainsi les termes dits séculaires ...”). We thus obtain $\gamma_1 = 0$ and $A = 2$.
Then (16.8;1) can be solved and gives, together with $z_1'(0) = 0$,

$$z_1 = B \cos t + \frac{3}{4} \sin t - \frac{1}{4} \sin 3t . \tag{16.9}$$

The continuation of this process is now clear: the terms in $\varepsilon^2$ in (16.8) lead to, after
insertion of (16.9) and simplification,

$$z_2'' + z_2 = \Bigl( 4\gamma_2 + \frac{1}{4} \Bigr) \cos t + 2B \sin t + 3B \sin 3t - \frac{3}{2} \cos 3t + \frac{5}{4} \cos 5t . \tag{16.8;2}$$

Secular terms are avoided if we set $B = 0$ and $\gamma_2 = -1/16$. Then

$$z_2 = C \cos t + \frac{3}{16} \cos 3t - \frac{5}{96} \cos 5t .$$

The next round will give $C = -1/8$ and $\gamma_3 = 0$, so that we have: the periodic orbit
of the Van der Pol equation (16.1) for $\varepsilon$ small is given by

$$\begin{aligned}
y(x) &= z(t) , \qquad t = x(1 - \varepsilon^2/16 + \ldots) , \\
z(t) &= 2 \cos t + \varepsilon \Bigl( \frac{3}{4} \sin t - \frac{1}{4} \sin 3t \Bigr)
 + \varepsilon^2 \Bigl( -\frac{1}{8} \cos t + \frac{3}{16} \cos 3t - \frac{5}{96} \cos 5t \Bigr) + \ldots
\end{aligned} \tag{16.10}$$

and is of period $2\pi(1 + \varepsilon^2/16 + \ldots)$.
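
The period $2\pi(1 + \varepsilon^2/16 + \ldots)$ obtained from this expansion can be compared
with a direct numerical integration. The Python sketch below does this for
$\varepsilon = 0.2$; it is only an illustration under stated assumptions: the classical
Runge-Kutta method with a fixed step size is used, the solution is started at
$y = 2$, $y' = 0$ close to the limit cycle, several returns to the section $y' = 0$, $y > 0$
are computed so that the transient dies out, and the function names are
hypothetical.

```python
import math


def vdp_step(y1, y2, h, eps):
    # One classical Runge-Kutta step for the system form of (16.1).
    def f(a, b):
        return b, eps * (1.0 - a * a) * b - a

    k1 = f(y1, y2)
    k2 = f(y1 + h / 2 * k1[0], y2 + h / 2 * k1[1])
    k3 = f(y1 + h / 2 * k2[0], y2 + h / 2 * k2[1])
    k4 = f(y1 + h * k3[0], y2 + h * k3[1])
    return (y1 + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            y2 + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def measured_period(eps, y0=2.0, h=1e-3, returns=6):
    # Record the times at which y' changes sign from + to - (with y > 0)
    # and return the time between the last two such crossings.
    y1, y2, x = y0, 0.0, 0.0
    crossings = [0.0]
    while len(crossings) <= returns:
        z1, z2 = vdp_step(y1, y2, h, eps)
        if y2 > 0.0 and z2 <= 0.0:
            crossings.append(x + h * y2 / (y2 - z2))  # linear interpolation
        y1, y2, x = z1, z2, x + h
    return crossings[-1] - crossings[-2]


eps = 0.2
period_numeric = measured_period(eps)
period_series = 2.0 * math.pi * (1.0 + eps ** 2 / 16.0)
print(period_numeric, period_series)
```

The remaining difference between the two numbers is expected to be of size
$O(\varepsilon^4)$, since $\gamma_3 = 0$.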


Chemical Reactions 

The laws of chemical kinetics give rise to differential equations which, for
multimolecular reactions, become nonlinear and have interesting properties. Some of
them possess periodic solutions (e.g. the Zhabotinski-Belousov reaction) and have
important applications to the interpretation of biological phenomena (e.g.
Prigogine, Lefever).

Let us examine in detail the model of Lefever and Nicolis (1971), the so-called
“Brusselator”: suppose that six substances $A, B, D, E, X, Y$ undergo the
following reactions:

$$\begin{array}{rcll}
A & \xrightarrow{\;k_1\;} & X & \\
B + X & \xrightarrow{\;k_2\;} & Y + D & \text{(bimolecular reaction)} \\
2X + Y & \xrightarrow{\;k_3\;} & 3X & \text{(autocatalytic trimol. reaction)} \\
X & \xrightarrow{\;k_4\;} & E &
\end{array} \tag{16.11}$$

If we denote by $A(x)$, $B(x)$, ... the concentrations of $A$, $B$, ... as functions of
the time $x$, the reactions (16.11) become by the mass action law the following
differential equations

$$\begin{aligned}
A' &= -k_1 A \\
B' &= -k_2 B X \\
D' &= k_2 B X \\
E' &= k_4 X \\
X' &= k_1 A - k_2 B X + k_3 X^2 Y - k_4 X \\
Y' &= k_2 B X - k_3 X^2 Y .
\end{aligned}$$

This system is now simplified as follows: the equations for $D$ and $E$ are left out,
because they do not influence the others; $A$ and $B$ are supposed to be maintained
constant (positive) and all reaction rates $k_i$ are set equal to 1. We further set
$y_1(x) := X(x)$, $y_2(x) := Y(x)$ and obtain

$$\begin{aligned}
y_1' &= A + y_1^2 y_2 - (B + 1) y_1 \\
y_2' &= B y_1 - y_1^2 y_2 .
\end{aligned} \tag{16.12}$$

The resulting system has one critical point $y_1' = y_2' = 0$ at $y_1 = A$, $y_2 = B/A$.
The linearized equation in the neighbourhood of this point is unstable iff $B >
A^2 + 1$. Further, a study of the domains where $y_1'$, $y_2'$, or $(y_1 + y_2)'$ is positive or
negative leads to the result that all solutions remain bounded. Thus, for $B > A^2 + 1$
there must be a limit cycle which, by numerical calculations, is seen to be unique
(Fig. 16.4).
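
The instability condition $B > A^2 + 1$ follows from the Jacobian of (16.12) at the
critical point, whose trace is $B - 1 - A^2$ and whose determinant is $A^2$. The short
Python sketch below, added as an illustration with hypothetical function names,
evaluates this Jacobian and its eigenvalues for a few parameter values.

```python
import cmath


def brusselator_jacobian(A, B):
    # Jacobian of (16.12) evaluated at the critical point y1 = A, y2 = B/A.
    y1, y2 = A, B / A
    return [[2.0 * y1 * y2 - (B + 1.0), y1 * y1],
            [B - 2.0 * y1 * y2, -y1 * y1]]


def eigenvalues_2x2(M):
    # Roots of lambda^2 - tr(M) lambda + det(M) = 0.
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def is_unstable(A, B):
    # Unstable if some eigenvalue has positive real part.
    return max(ev.real for ev in eigenvalues_2x2(brusselator_jacobian(A, B))) > 0.0


print(eigenvalues_2x2(brusselator_jacobian(1.0, 3.0)))  # parameters of Fig. 16.4
print(is_unstable(1.0, 3.0), is_unstable(1.0, 1.9))
```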



Fig. 16.4. Solutions of the Brusselator, A = 1 , B = 3 

An interesting phenomenon (Hopf bifurcation, see below) occurs, when $B$
approaches $A^2 + 1$. Then the limit cycle becomes smaller and smaller and finally
disappears in the critical point. Another example of this type is given in Exercise 2.

Limit Cycles in Higher Dimensions, Hopf Bifurcation


The Theorem of Poincaré-Bendixson is apparently true only in two dimensions.
Higher dimensional counter-examples are given by nearly every mechanical
movement without friction, as for example the spherical pendulum (6.20), see Fig. 6.2.
Therefore, in higher dimensions limit cycles are usually found by numerical studies
of the Poincaré section map $\Phi$ defined above.

There is, however, one situation where limit cycles occur quite naturally (Hopf
1942): namely when at a critical point of $y' = f(y, \alpha)$, all eigenvalues
of $(\partial f/\partial y)(y_0, \alpha)$ have strictly negative real part with the exception of one pair
which, by varying $\alpha$, crosses the imaginary axis. The eigenspace of the stable
eigenvalues then continues into an analytic two dimensional manifold, inside which
a limit cycle appears. This phenomenon is called “Hopf bifurcation”. The proof of
this fact is similar to Poincaré’s parameter expansion method (16.7) (see Exercises
6 and 7 below), so that Hopf even hesitated to publish it (“... ich glaube kaum,
dass an dem obigen Satz etwas wesentlich Neues ist ...”).

As an example, we consider the “full Brusselator” (16.11): we no longer suppose
that B is kept constant, but that B is constantly added to the mixture with
rate $\alpha$. When we set $y_3(x) := B(x)$, we obtain instead of (16.12) (with $A = 1$)

$$
\begin{aligned}
y_1' &= 1 + y_1^2 y_2 - (y_3 + 1)\,y_1 \\
y_2' &= y_1 y_3 - y_1^2 y_2 \\
y_3' &= -y_1 y_3 + \alpha .
\end{aligned}
\tag{16.13}
$$

This system possesses a critical point at $y_1 = 1$, $y_2 = y_3 = \alpha$ with derivative

$$
\frac{\partial f}{\partial y} =
\begin{pmatrix}
\alpha - 1 & 1 & -1 \\
-\alpha & -1 & 1 \\
-\alpha & 0 & -1
\end{pmatrix} .
\tag{16.14}
$$


This matrix has $\lambda^3 + (3 - \alpha)\lambda^2 + (3 - 2\alpha)\lambda + 1$ as characteristic polynomial and
satisfies the condition for stability iff $\alpha < (9 - \sqrt{17})/4 = 1.21922$ (see I.13,
Exercise 1). Thus when $\alpha$ increases beyond this value, there arises a limit cycle
which exists for all values of $\alpha$ up to approximately 1.5 (see Fig. 16.5). When $\alpha$
continues to grow, the limit cycle “explodes” and $y_1 \to 0$ while $y_2$ and $y_3 \to \infty$.
So the system (16.13) has a behaviour completely different from the simplified
model (16.12).
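The stability threshold can be reproduced with the Routh-Hurwitz criterion for a cubic $\lambda^3 + c_2\lambda^2 + c_1\lambda + c_0$, which requires $c_2 > 0$, $c_0 > 0$ and $c_2 c_1 > c_0$. The sketch below applies it to the characteristic polynomial quoted above; the function names are illustrative only.

```python
import math


def char_poly_coeffs(alpha):
    """Coefficients (c2, c1, c0) of lambda^3 + c2*lambda^2 + c1*lambda + c0."""
    return (3.0 - alpha, 3.0 - 2.0 * alpha, 1.0)


def is_stable(alpha):
    """Routh-Hurwitz test for a monic cubic: all roots have negative real part."""
    c2, c1, c0 = char_poly_coeffs(alpha)
    return c2 > 0 and c0 > 0 and c2 * c1 > c0


# c2*c1 = c0 gives 2*alpha^2 - 9*alpha + 8 = 0, whose smaller root is the threshold.
alpha_crit = (9.0 - math.sqrt(17.0)) / 4.0
print(alpha_crit, is_stable(1.2), is_stable(1.23))
```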

A famous chemical reaction with a limit cycle in three dimensions is the
“Oregonator” reaction between $\mathrm{HBrO_2}$, $\mathrm{Br^-}$, and Ce(IV) (Field & Noyes 1974)

$$
\begin{aligned}
y_1' &= 77.27\,\bigl(y_2 + y_1(1 - 8.375 \times 10^{-6}\, y_1 - y_2)\bigr) \\
y_2' &= \frac{1}{77.27}\,\bigl(y_3 - (1 + y_1)\,y_2\bigr) \\
y_3' &= 0.161\,(y_1 - y_3)
\end{aligned}
\tag{16.15}
$$

whose solutions are plotted in Fig. 16.6. This is an example of a “stiff” differential
equation whose solutions change rapidly over many orders of magnitude. It is thus
a challenging example for numerical codes and we shall meet it again in Volume II
of our book.

Our next example is taken from the theory of superconducting Josephson junctions,
coupled together by a mutual capacitance. Omitting all physical details (see
Giovannini, Weiss & Ulrich 1978), we state the resulting equations as

$$
\begin{aligned}
c\,(y_1'' - \alpha y_2'') &= i_1 - \sin(y_1) - y_1' \\
c\,(y_2'' - \alpha y_1'') &= i_2 - \sin(y_2) - y_2' .
\end{aligned}
\tag{16.16}
$$

Here, $y_1$ and $y_2$ are angles (the “quantum phase difference across the junction”)
which are thus identified modulo $2\pi$. Equation (16.16) is thus a system on the
torus $T^2$ for $(y_1, y_2)$, and on $\mathbb{R}^2$ for the voltages $(y_1', y_2')$. It is seen by numerical
computations that the system (16.16) possesses an attracting limit cycle, which
describes the phenomenon of “phase locking” (see Fig. 16.7).

Strange Attractors


“Mr. Dahlquist, when is the spring coming ?” 

“Tomorrow, at two o’clock.” 

(Weather forecast, Stockholm 1955) 

“We were so naive ...” 

(H.O. Kreiss, Stockholm 1985) 

Concerning the discovery of the famous “Lorenz model”, we best quote from 
Lorenz (1979): 

“By the middle 1950’s “numerical weather prediction”, i.e., forecasting by 
numerically integrating such approximations to the atmospheric equations as could 
feasibly be handled, was very much in vogue, despite the rather mediocre results 
which it was then yielding. A smaller but determined group favored statistical 
prediction (...) apparently because of a misinterpretation of a paper by Wiener 
(...). I was skeptical, and decided to test the idea by applying the statistical method 
to a set of artificial data, generated by solving a system of equations numerically 
(...). The first task was to find a suitable system of equations to solve (...). 
The system would have to be simple enough (... and) the general solution would 
have to be aperiodic, since the statistical prediction of a periodic series would be 
a trivial matter, once the periodicity had been detected. It was not obvious that 
these conditions could be met. (...) The break came when I was visiting Dr. Barry 
Saltzman, now at Yale University. In the course of our talks he showed me some
work on thermal convection, in which he used a system of seven ordinary differential
equations. Most of his numerical solutions soon acquired periodic behavior,
but one solution refused to settle down. Moreover, in this solution four of the
variables appeared to approach zero. Presumably the equations governing the remaining
three variables, with the terms containing the four variables eliminated,
would also possess aperiodic solutions. Upon my return I put the three equations
on our computer, and confirmed the aperiodicity which Saltzman had noted. We 
were finally in business.” 

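In a changed notation, the three equations with aperiodic solutions are

$$
\begin{aligned}
y_1' &= -\sigma y_1 + \sigma y_2 \\
y_2' &= -y_1 y_3 + r y_1 - y_2 \\
y_3' &= y_1 y_2 - b y_3 ,
\end{aligned}
\tag{16.17}
$$

where $\sigma$, $r$ and $b$ are positive constants. It follows from (16.17) that

$$
\frac{1}{2}\,\frac{d}{dx}\Bigl(y_1^2 + y_2^2 + (y_3 - \sigma - r)^2\Bigr)
= -\Bigl(\sigma y_1^2 + y_2^2 + b\bigl(y_3 - \tfrac{\sigma}{2} - \tfrac{r}{2}\bigr)^2\Bigr)
+ b\bigl(\tfrac{\sigma}{2} + \tfrac{r}{2}\bigr)^2 .
\tag{16.18}
$$

Therefore the ball

$$
R_0 = \bigl\{(y_1, y_2, y_3) \;\big|\; y_1^2 + y_2^2 + (y_3 - \sigma - r)^2 \le c^2\bigr\}
\tag{16.19}
$$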

is mapped by the flow $\varphi_1$ (see (14.22)) into itself, provided that $c$ is sufficiently
large so that $R_0$ wholly contains the ellipsoid defined by equating the right side
of (16.18) to zero. Hence, if $x$ assumes the increasing values 1, 2, 3, ..., $R_0$ is
carried into regions $R_1 = \varphi_1(R_0)$, $R_2 = \varphi_2(R_0)$ etc., which satisfy
$R_0 \supset R_1 \supset R_2 \supset R_3 \supset \ldots$ (applying $\varphi_1$ to the inclusion $R_0 \supset R_1$ gives $R_1 \supset R_2$ and so on).

Since the trace of $\partial f/\partial y$ for the system (16.17) is the negative constant
$-(\sigma + b + 1)$, the volumes of $R_k$ tend exponentially to zero (see Theorem 14.8).
Every orbit is thus ultimately trapped in a set $R_\infty = R_0 \cap R_1 \cap R_2 \cap \ldots$ of zero
volume.

System (16.17) possesses an obvious critical point $y_1 = y_2 = y_3 = 0$; this becomes
unstable when $r > 1$. In this case there are two additional critical points $C$
and $C'$ respectively given by

$$
y_1 = y_2 = \pm\sqrt{b(r - 1)}, \qquad y_3 = r - 1 .
\tag{16.20}
$$

These become unstable (e.g. by the Routh criterion, Exercise 1 of Section I.13)
when $\sigma > b + 1$ and

$$
r \ge r_c = \frac{\sigma(\sigma + b + 3)}{\sigma - b - 1} .
\tag{16.21}
$$


In the first example we shall use Saltzman’s values $b = 8/3$, $\sigma = 10$, and
$r = 28$. (“Here we note another lucky break: Saltzman used $\sigma = 10$ as a crude
approximation to the Prandtl number (about 6) for water. Had he chosen to study
air, he would probably have let $\sigma = 1$, and the aperiodicity would not have been
discovered”, Lorenz 1979). In Fig. 16.8 we have plotted the solution curve of
(16.17) with the initial value $y_1 = -8$, $y_2 = 8$, $y_3 = r - 1$, which, indeed, looks
pretty chaotic.
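As a small numerical companion to (16.17), (16.20) and (16.21), the sketch below evaluates the right-hand side of the Lorenz system, the critical points $C$, $C'$ and the threshold $r_c$ for Saltzman’s values. The function names are illustrative and not taken from the text.

```python
import math


def lorenz_rhs(y, sigma, r, b):
    """Right-hand side of the Lorenz system (16.17)."""
    y1, y2, y3 = y
    return (-sigma * y1 + sigma * y2,
            -y1 * y3 + r * y1 - y2,
            y1 * y2 - b * y3)


def critical_points(r, b):
    """The two nontrivial critical points C and C' of (16.20); requires r > 1."""
    s = math.sqrt(b * (r - 1.0))
    return (s, s, r - 1.0), (-s, -s, r - 1.0)


def r_critical(sigma, b):
    """Threshold r_c of (16.21); meaningful only for sigma > b + 1."""
    return sigma * (sigma + b + 3.0) / (sigma - b - 1.0)


sigma, r, b = 10.0, 28.0, 8.0 / 3.0
C, C_prime = critical_points(r, b)
print(r_critical(sigma, b))           # r = 28 lies above this value
print(lorenz_rhs(C, sigma, r, b))     # vanishes up to rounding
```

For $\sigma = 10$, $b = 8/3$ the threshold is $470/19 \approx 24.74$, so $r = 28$ lies in the regime where $C$ and $C'$ are unstable.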

For a clearer understanding of the phenomenon, we choose the plane
$y_3 = r - 1$, especially the square region between the critical points $C$ and $C'$, as
Poincaré section $\Pi$. The critical point $y_1 = y_2 = y_3 = 0$ possesses (since $r > 1$) one
unstable eigenvalue $\lambda_1 = \bigl(-1 - \sigma + \sqrt{(1 - \sigma)^2 + 4r\sigma}\bigr)/2$ and two stable eigenvalues
$\lambda_2 = -b$, $\lambda_3 = \bigl(-1 - \sigma - \sqrt{(1 - \sigma)^2 + 4r\sigma}\bigr)/2$. The eigenspace of the stable
eigenvalues continues into a two-dimensional manifold of initial values, whose solutions
tend to 0 for $x \to \infty$. This “stable manifold” cuts $\Pi$ in a curve E (see
Fig. 16.9). The one-dimensional unstable manifold (created by the unstable eigenvalue
$\lambda_1$) cuts $\Pi$ in the points $D$ and $D'$ (Fig. 16.9).

All solutions starting in $U_u$ above E (the dark cat) surround the above critical
point $C$ and are, at the first return, mapped to a narrow stripe $S_u$, while the
solutions starting in $U_d$ below E surround $C'$ and go to the left stripe $S_d$. At
the second return, the two stripes are mapped into two very narrow stripes inside
$S_u$ and $S_d$. After the third return, we have 8 stripes closer and closer together,
and so on. The intersection of all these stripes is a Cantor-like set and, continued

The Ups and Downs of the Lorenz Model

“Mr. Laurel and Mr. Hardy have many ups and downs — Mr. Hardy 
takes charge of the upping, and Mr. Laurel does most of the downing 
— ” (from “Another Fine Mess”, Hal Roach 1930) 



If one watches the solution $y_1(x)$ of the Lorenz equation being calculated, one
wonders who decides for the solution to go up or down in an apparently unpredictable
fashion. Fig. 16.9 shows that E cuts both stripes $S_d$ and $S_u$. Therefore
the inverse image of E (see Fig. 16.10) consists of two lines $E_0$ and $E_1$ which
cut, together with E, the plane $\Pi$ into four sets $U_{uu}$, $U_{ud}$, $U_{du}$, $U_{dd}$. If the
initial value is in one of these, the corresponding solution goes up-up, up-down,
down-up, down-down. Further, the inverse images of $E_0$ and $E_1$ lead to four lines
$E_{00}$, $E_{01}$, $E_{10}$, $E_{11}$. The plane $\Pi$ is then cut into 8 stripes and we now know
the fate of the first three ups and downs. The more inverse images of these curves
we compute, the finer the plane $\Pi$ is cut into stripes and all the future ups and
downs are coded in the position of the initial value with respect to these stripes
(see Fig. 16.10). It appears that a very small change in the initial value gives rise,
after a couple of rotations, to a totally different solution curve. This phenomenon,
discovered merely by accident by Lorenz (see Lorenz 1979), is highly interesting
and explains why the theorem of uniqueness (Theorem 7.4), of whose philosophical
consequences Laplace was so proud, has its practical limits.

Fig. 16.10. Stripes deciding for the ups and downs

Remark. It appears in Fig. 16.10 that not all stripes have the same width. The
sequences of “u”’s and “d”’s which repeat u or d a couple of times (but not
too often) are more probable than the others. More than 25 consecutive “ups”
or “downs” are (for the chosen constants and except for the initial phase) never
possible. This has to do with the position of $D$ and $D'$, the outermost frontiers of
the attractor, in the stripes of Fig. 16.10.


Feigenbaum Cascades 

However nicely the beginning of Lorenz’ (1979) paper is written, the affirmations
of his last section are only partly true. As Lorenz did, we now vary the parameter
$b$ in (16.17), letting at the same time $r = r_c$ (see (16.21)) and

$$
\sigma = b + 1 + \sqrt{2(b + 1)(b + 2)} .
\tag{16.22}
$$

This is the value of $\sigma$ for which $r_c$ is minimized. Numerical integration shows
that for $b$ very small (say $b < 0.139$), the solutions of (16.17) evidently converge
to a stable limit cycle, which cuts the Poincaré section $y_3 = r - 1$ twice at two
different locations and surrounds both critical points $C$ and $C'$. Further, for $b$
large (for example $b = 8/3$) the coefficients are not far from those studied above
and we have a strange attractor. But what happens in between? We have computed
the solutions of the Lorenz model (16.17) for $b$ varying from 0.1385 to 0.1475
with 1530 intermediate values. For each of these values, we have computed 1500
Poincaré cuts and represented in Fig. 16.11 the $y_1$-values of the intersections with
the Poincaré plane $y_3 = r - 1$. After each change of $b$, the first 300 iterations were
not drawn so that only the attractor becomes visible.

For $b$ small, there is one periodic orbit; then, at $b = b_1 = 0.13972$, it suddenly
splits into an orbit of period two, this then splits for $b = b_2 = 0.14327$ into an orbit
of period four, then for $b = b_3 = 0.14400$ into period eight, etc. There is a point
$b_\infty = 0.14422$ after which the movement becomes chaotic. Beyond this value,
however, there are again and again intervals of stable attractors of periods 5, 3, etc.
The whole picture resembles what is obtained by the recursion

$$
x_{n+1} = a\,(x_n - x_n^2)
\tag{16.23}
$$

which is discussed in many papers (e.g. May 1976, Feigenbaum 1978, Collet &
Eckmann 1980).
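The period doublings of the recursion (16.23) are easy to observe numerically. The following sketch iterates the map, discards a transient, and then looks for the smallest period of the attractor; the function name, the starting value 0.5 and the tolerances are assumptions made for illustration.

```python
def logistic_period(a, n_transient=5000, max_period=64, tol=1e-9):
    """Smallest p <= max_period with x_{n+p} = x_n (within tol) on the attractor
    of x_{n+1} = a*(x_n - x_n**2); returns None if no such period is found."""
    x = 0.5
    for _ in range(n_transient):        # let the orbit settle on the attractor
        x = a * (x - x * x)
    orbit = [x]
    for _ in range(max_period):
        x = a * (x - x * x)
        orbit.append(x)
    for p in range(1, max_period + 1):
        if abs(orbit[p] - orbit[0]) < tol:
            return p
    return None


for a in (2.8, 3.2, 3.5):
    print(a, logistic_period(a))
```

Close to a bifurcation value the convergence to the attractor is slow, so `n_transient` would have to be increased there; for parameters in the chaotic regime the function is expected to return `None`.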

But where does this resemblance come from? We study in Fig. 16.12 the
Poincaré map for the system (16.17) with $b$ chosen as 0.146 of a region
$-0.095 < y_1 < -0.078$ and $-0.087 < y_2 < -0.07$. After one return, this region is compressed
to a thin line somewhere else on the plane (Fig. 16.12b), the second return
bends this line to U-shape and maps it into the original region (Fig. 16.12c).

Fig. 16.11. Poincaré cuts $y_1$ for (16.17) as function of $b$

Fig. 16.12. Poincaré map for system (16.17) with $b = 0.146$

Therefore, the Poincaré map is essentially a map of the interval [0,1] to itself
similar to (16.23). It is a great discovery of Feigenbaum that for all maps of a
similar shape, the phenomena are always the same, in particular that

$$
\lim_{i \to \infty} \frac{b_i - b_{i-1}}{b_{i+1} - b_i} = 4.6692016091029906715\ldots
$$

is a universal constant, the Feigenbaum number. The repeated doublings of the
periods at $b_1, b_2, b_3, \ldots$ are called Feigenbaum cascades.
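As a quick plausibility check, the bifurcation values $b_1$, $b_2$, $b_3$ quoted above can be inserted into the Feigenbaum quotient. The sketch below also extrapolates the accumulation point under the assumption, made here only for illustration, that the remaining gaps shrink geometrically with the universal ratio.

```python
FEIGENBAUM = 4.6692016091029906715

# Bifurcation values quoted in the text for the Lorenz model with r = r_c.
b1, b2, b3 = 0.13972, 0.14327, 0.14400

# First available approximation of the limit (i = 2).
ratio = (b2 - b1) / (b3 - b2)

# Geometric extrapolation: b_inf = b3 + (b3 - b2) * (1/d + 1/d^2 + ...) = b3 + (b3 - b2)/(d - 1).
b_inf_estimate = b3 + (b3 - b2) / (FEIGENBAUM - 1.0)

print(ratio, b_inf_estimate)
```

With only five digits given for the $b_i$, the quotient is already within a few percent of the limit, and the extrapolated accumulation point agrees with the quoted $b_\infty = 0.14422$ to about four digits.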
Exercises


1. The Van der Pol equation (16.2) with $\varepsilon = 1$ possesses a limit cycle of period
$T = 6.6632868593231301896996820305$ passing through $y_2 = 0$, $y_1 = A$
where $A = 2.00861986087484313650940188$. Replace (16.2) by

$$
\begin{aligned}
y_1' &= y_2\,(A - y_1) \\
y_2' &= \bigl((1 - y_1^2)\,y_2 - y_1\bigr)(A - y_1)
\end{aligned}
$$

so that the limit cycle receives a stationary point. Study the behaviour of a
solution starting in the interior, e.g. at $y_{10} = 1$, $y_{20} = 0$.

2. (Frommer 1934). Consider the system

$$
y_1' = -y_2 + 2 y_1 y_2 - y_2^2, \qquad
y_2' = y_1 + (1 + \varepsilon)\,y_1^2 + 2 y_1 y_2 - y_2^2 .
\tag{16.24}
$$

Show, either by a stability analysis similar to Exercise 5 of Section I.13 or
by numerical computations, that for $\varepsilon > 0$ (16.24) possesses a limit cycle of
asymptotic radius $r = \sqrt{6\varepsilon/7}$. (See also Wanner (1983), p. 15 and I.13,
Exercise 5).

3. Solve Hilbert’s 16th Problem: what is the highest possible number of limit
cycles that a quadratic system

$$
\begin{aligned}
y_1' &= \alpha_0 + \alpha_1 y_1 + \alpha_2 y_2 + \alpha_3 y_1^2 + \alpha_4 y_1 y_2 + \alpha_5 y_2^2 \\
y_2' &= \beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_1^2 + \beta_4 y_1 y_2 + \beta_5 y_2^2
\end{aligned}
$$

can have? The mathematical community is waiting for you: nobody has been
able to solve this problem for more than 80 years. At the moment, the highest
known number is 4, as for example in the system

$$
\begin{aligned}
y_1' &= \lambda y_1 - y_2 - 10 y_1^2 + (5 + \delta)\,y_1 y_2 + y_2^2 \\
y_2' &= y_1 + y_1^2 + (-25 + 8\varepsilon - 9\delta)\,y_1 y_2 ,
\end{aligned}
\qquad
\delta = -10^{-13}, \quad \varepsilon = -10^{-52}, \quad \lambda = -10^{-200}
$$

(see Shi Songling 1980, Wanner 1983, Perko 1984).

4. Find a change of coordinates such that the equation

$$
m y'' + \bigl(-A + B\,(y')^2\bigr)\,y' + k y = 0
$$

becomes the Van der Pol equation (16.2) (see Kryloff & Bogoliuboff (1947),
p. 5).

5. Treat the pendulum equation

$$
y'' + \sin y = y'' + y - \frac{y^3}{6} + \frac{y^5}{120} \pm \ldots = 0, \qquad y(0) = \varepsilon, \quad y'(0) = 0,
$$

by the method of asymptotic expansions (16.6) and (16.7) and study the period
as a function of $\varepsilon$.

Result. The period is $2\pi(1 + \varepsilon^2/16 + \ldots)$.

6. Compute the limit cycle (Hopf bifurcation) for

$$
y'' + y = \varepsilon^2 y' - (y')^3
$$

for $\varepsilon$ small by the method of Poincaré (16.6), (16.7) with $y(0) = 0$.

7. Treat in a similar way as in Exercise 6 the Brusselator (16.12) with $A = 1$ and
$B = 2 + \varepsilon^2$.

Hint. With the new variable $y = y_1 + y_2 - 3$ the differential equation (16.12)
becomes equivalent to $y' = 1 - y_1$ and

$$
y'' + y = -\varepsilon^2 (y' - 1) - (y')^2 (y + y') + 2 y y' .
$$

Result. $z(t) = \varepsilon\,(2/\sqrt{3}) \cos t + \ldots$, $t = x(1 - \varepsilon^2/18 + \ldots)$, so that the period
is asymptotically $2\pi(1 + \varepsilon^2/18 + \ldots)$.

8. (Liénard 1928). Prove that the limit cycle of the Van der Pol equation (16.1) is
unique for every $\varepsilon > 0$.

Hint. The identity

$$
y'' + \varepsilon (y^2 - 1)\,y' = \frac{d}{dx}\Bigl(y' + \varepsilon\bigl(\tfrac{y^3}{3} - y\bigr)\Bigr)
$$

suggests the use of the coordinate system $y_1(x) = y(x)$, $y_2(x) = y' + \varepsilon(y^3/3 - y)$.
Write the resulting first order system, study the signs of $y_1'$, $y_2'$ and the
increase of the “energy” function $V(x) = (y_1^2 + y_2^2)/2$.

Also generalize the result to equations of the form $y'' + f(y)\,y' + g(y) = 0$. For
more details see e.g. Simmons (1972), p. 349.

9. (Rayleigh 1883). Compute the periodic solution of

$$
y'' + \kappa y' + \lambda (y')^3 + n^2 y = 0
$$

for $\kappa$ and $\lambda$ small.

Result. $y = A \sin(nx) + (\lambda n A^3/32) \cos(3nx) + \ldots$ where $A$ is given by
$\kappa + (3/4)\,\lambda n^2 A^2 = 0$.

10. (Bendixson 1901). If in a certain region $\Omega$ of the plane the expression

$$
\frac{\partial f_1}{\partial y_1} + \frac{\partial f_2}{\partial y_2}
$$

is always negative or always positive, then the system (16.4) cannot have closed
solutions in $\Omega$.

Hint. Apply Green’s formula

$$
\iint \Bigl(\frac{\partial f_1}{\partial y_1} + \frac{\partial f_2}{\partial y_2}\Bigr)\, dy_1\, dy_2 = \oint \bigl(f_1\, dy_2 - f_2\, dy_1\bigr) .
$$



Chapter II. Runge-Kutta 
and Extrapolation Methods 


Numerical methods for ordinary differential equations fall naturally into two 
classes: those which use one starting value at each step (“one-step methods”) and 
those which are based on several values of the solution (“multistep methods” or 
“multi-value methods”). The present chapter is devoted to the study of one-step 
methods, while multistep methods are the subject of Chapter III. Both chapters 
can, to a large extent, be read independently of each other. 

We start with the theory of Runge-Kutta methods: the derivation of order conditions
with the help of labelled trees, error estimates, convergence proofs, implementation,
methods of higher order, dense output. Section II.7 introduces implicit
Runge-Kutta methods. More attention will be drawn to these methods in Volume II
on stiff differential equations. Two sections then discuss the elegant idea of extrapolation
(Richardson, Romberg, etc.) and its use in obtaining high order codes. The
methods presented are then tested and compared on a series of problems. The potential
of parallelism is discussed in a separate section. We then turn our attention
to an algebraic theory of the composition of methods. This will be the basis for
the study of order properties for many general classes of methods in the following
chapter. The chapter ends with special methods for second order differential
equations $y'' = f(x, y)$, for Hamiltonian systems (symplectic methods) and for
problems with delay.

We illustrate the methods of this chapter with an example from Astronomy, the
restricted three body problem. One considers two bodies of masses $1 - \mu$ and $\mu$ in
circular rotation in a plane and a third body of negligible mass moving around in
the same plane. The equations are (see e.g., the classical textbook Szebehely 1967)

$$
\begin{aligned}
y_1'' &= y_1 + 2 y_2' - \mu' \frac{y_1 + \mu}{D_1} - \mu \frac{y_1 - \mu'}{D_2} \\
y_2'' &= y_2 - 2 y_1' - \mu' \frac{y_2}{D_1} - \mu \frac{y_2}{D_2} \\
D_1 &= \bigl((y_1 + \mu)^2 + y_2^2\bigr)^{3/2}, \qquad D_2 = \bigl((y_1 - \mu')^2 + y_2^2\bigr)^{3/2} \\
\mu &= 0.012277471, \qquad \mu' = 1 - \mu .
\end{aligned}
\tag{0.1}
$$
There exist initial values, for example

$$
\begin{aligned}
y_1(0) &= 0.994, \qquad y_1'(0) = 0, \qquad y_2(0) = 0, \\
y_2'(0) &= -2.00158510637908252240537862224, \\
x_{\mathrm{end}} &= 17.0652165601579625588917206249,
\end{aligned}
\tag{0.2}
$$

such that the solution is periodic with period $x_{\mathrm{end}}$. Such periodic solutions have
fascinated astronomers and mathematicians for many decades (Poincaré; extensive
numerical calculations are due to Sir George Darwin (1898)) and are now often
called “Arenstorf orbits” (see Arenstorf (1963) who did numerical computations
“on high speed electronic computers”). The problem is $C^\infty$ with the exception of
the two singular points $y_1 = -\mu$ and $y_1 = 1 - \mu$, $y_2 = 0$, therefore the Euler polygons
of Section I.7 are known to converge to the solution. But are they really numerically
useful here? We have chosen 24000 steps of step length $h = x_{\mathrm{end}}/24000$
and plotted the result in Figure 0.1. The result is not very striking.



Fig. 0.1. An Arenstorf orbit computed by equidistant Euler, 
equidistant Runge-Kutta and variable step size Dormand & Prince 


The performance of the Runge-Kutta method (left tableau of Table 1.2) is already
much better and converges faster to the solution. We have used 6000 steps of
step size $x_{\mathrm{end}}/6000$, so that the numerical work becomes equivalent. Clearly, most
accuracy is lost in those parts of the orbit which are close to a singularity. Therefore,
codes with automatic step size selection, described in Section II.4, perform
much better and the code DOPRI5 (Table 5.2) computes the orbit with a precision
of $10^{-3}$ in 98 steps (74 accepted and 24 rejected). The step size becomes very
large in some regions and the graphical representation as polygons connecting the
solution points becomes unsatisfactory. The solid line is the interpolatory solution
(Section II.6), which is also precise for all intermediate values and useful for many
other questions such as delay differential equations, event location or discontinuities
in the differential equation.

For still higher precision one needs methods of higher order. For example,
the code DOP853 (Section II.5) computes the orbit faster than DOPRI5 for more
stringent tolerances, say smaller than about $10^{-6}$. The highest possible order
is obtained by extrapolation methods (Section II.9) and the code ODEX (with
KMAX = 15) obtains the orbit with a precision of $10^{-30}$ with about 25000 function
evaluations, precisely the same amount of work as for the above Euler solution.
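The fixed step size experiments described above can be reproduced along the following lines. This is a sketch in double precision, not the Fortran codes of the Appendix; all function names are illustrative, and the problem (0.1) is rewritten as a first order system in $(y_1, y_2, y_1', y_2')$.

```python
MU = 0.012277471           # mass ratio mu from (0.1)
MU_P = 1.0 - MU            # mu' = 1 - mu

# Initial state (y1, y2, y1', y2') and period from (0.2), rounded to double precision.
U0 = (0.994, 0.0, 0.0, -2.00158510637908252240537862224)
X_END = 17.0652165601579625588917206249


def arenstorf_rhs(u):
    """First order form of (0.1): returns (y1', y2', y1'', y2'')."""
    y1, y2, v1, v2 = u
    d1 = ((y1 + MU) ** 2 + y2 ** 2) ** 1.5
    d2 = ((y1 - MU_P) ** 2 + y2 ** 2) ** 1.5
    a1 = y1 + 2.0 * v2 - MU_P * (y1 + MU) / d1 - MU * (y1 - MU_P) / d2
    a2 = y2 - 2.0 * v1 - MU_P * y2 / d1 - MU * y2 / d2
    return (v1, v2, a1, a2)


def axpy(u, h, k):
    """Componentwise u + h*k."""
    return tuple(ui + h * ki for ui, ki in zip(u, k))


def euler_step(f, u, h):
    return axpy(u, h, f(u))


def rk4_step(f, u, h):
    """Classical 4-stage Runge-Kutta step for an autonomous system."""
    k1 = f(u)
    k2 = f(axpy(u, h / 2, k1))
    k3 = f(axpy(u, h / 2, k2))
    k4 = f(axpy(u, h, k3))
    incr = tuple((a + 2 * b + 2 * c + d) / 6 for a, b, c, d in zip(k1, k2, k3, k4))
    return axpy(u, h, incr)


def integrate(step, f, u0, x_end, n):
    """n equidistant steps of the given one-step method from 0 to x_end."""
    h = x_end / n
    u = u0
    for _ in range(n):
        u = step(f, u, h)
    return u


def closing_error(u, u0=U0):
    """Distance in the (y1, y2)-plane between the end point and the start."""
    return ((u[0] - u0[0]) ** 2 + (u[1] - u0[1]) ** 2) ** 0.5
```

A possible use is `closing_error(integrate(euler_step, arenstorf_rhs, U0, X_END, 24000))` compared with `closing_error(integrate(rk4_step, arenstorf_rhs, U0, X_END, 6000))`; both runs need 24000 evaluations of the right-hand side, and for a periodic orbit the closing error measures how far the numerical solution misses its starting point.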
Die numerische Berechnung irgend einer Lösung einer gegebenen
Differentialgleichung, deren analytische Lösung man nicht kennt,
hat, wie es scheint, die Aufmerksamkeit der Mathematiker bisher
wenig in Anspruch genommen ... (C. Runge 1895)


The Euler method for solving the initial value problem

$$
y' = f(x, y), \qquad y(x_0) = y_0
\tag{1.1}
$$

was described by Euler (1768) in his “Institutiones Calculi Integralis” (Sectio Secunda,
Caput VII). The method is easy to understand and to implement. We have
studied its convergence extensively in Section I.7 and have seen that the global error
behaves like $Ch$, where $C$ is a constant depending on the problem and $h$ is
the maximal step size. If one wants a precision of, say, 6 decimals, one would thus
need about a million steps, which is not very satisfactory. On the other hand, one
knows since the time of Newton that much more accurate methods can be found, if
$f$ in (1.1) is independent of $y$, i.e., if we have a quadrature problem

$$
y' = f(x), \qquad y(x_0) = y_0
\tag{1.1'}
$$

with solution

$$
y(X) = y_0 + \int_{x_0}^{X} f(x)\, dx .
\tag{1.2}
$$
As an example consider the midpoint rule (or first Gauss formula)

$$
\begin{aligned}
y(x_0 + h_0) &\approx y_1 = y_0 + h_0\, f\bigl(x_0 + \tfrac{h_0}{2}\bigr) \\
y(x_1 + h_1) &\approx y_2 = y_1 + h_1\, f\bigl(x_1 + \tfrac{h_1}{2}\bigr) \\
&\;\;\vdots \\
y(X) &\approx Y = y_{n-1} + h_{n-1}\, f\bigl(x_{n-1} + \tfrac{h_{n-1}}{2}\bigr),
\end{aligned}
\tag{1.3'}
$$

where $h_i = x_{i+1} - x_i$ and $x_0, x_1, \ldots, x_{n-1}, x_n = X$ is a subdivision of the
integration interval. Its global error $y(X) - Y$ is known to be bounded by $Ch^2$.
Thus for a desired precision of 6 decimals, a thousand steps will usually do, i.e.,
the method here is a thousand times faster. Therefore Runge (1895) asked whether
it would also be possible to extend method (1.3’) to problem (1.1). The first step
with $h = h_0$ would read

$$
y(x_0 + h) \approx y_0 + h\, f\bigl(x_0 + \tfrac{h}{2},\; y(x_0 + \tfrac{h}{2})\bigr),
\tag{1.3}
$$
but which value should we take for $y(x_0 + h/2)$? In the absence of something
better, it is natural to use one small Euler step with step size $h/2$ and obtain from
(1.3)$^1$

$$
\begin{aligned}
k_1 &= f(x_0, y_0) \\
k_2 &= f\bigl(x_0 + \tfrac{h}{2},\; y_0 + \tfrac{h}{2} k_1\bigr) \\
y_1 &= y_0 + h k_2 .
\end{aligned}
\tag{1.4}
$$

One might of course be surprised that we propose an Euler step for the computation
of $k_2$, just half a page after preaching its inefficiency. The crucial point is, however,
that $k_2$ is multiplied by $h$ in the third expression and therefore its error becomes
less important. To be more precise, we compute the Taylor expansion of $y_1$ in (1.4)
as a function of $h$,

$$
\begin{aligned}
y_1 &= y_0 + h\, f\bigl(x_0 + \tfrac{h}{2},\; y_0 + \tfrac{h}{2} f_0\bigr) \\
&= y_0 + h f(x_0, y_0) + \frac{h^2}{2}\bigl(f_x + f_y f\bigr)(x_0, y_0) \\
&\quad + \frac{h^3}{8}\bigl(f_{xx} + 2 f_{xy} f + f_{yy} f^2\bigr)(x_0, y_0) + \ldots
\end{aligned}
\tag{1.5}
$$

where $f_0 = f(x_0, y_0)$. This can be compared with the Taylor series of the exact solution, which is obtained
from (1.1) by repeated differentiation and replacing $y'$ by $f$ every time it appears
(Euler (1768), Problema 86, §656, see also (8.12) of Chap. I)

$$
\begin{aligned}
y(x_0 + h) &= y_0 + h f(x_0, y_0) + \frac{h^2}{2}\bigl(f_x + f_y f\bigr)(x_0, y_0) \\
&\quad + \frac{h^3}{6}\bigl(f_{xx} + 2 f_{xy} f + f_{yy} f^2 + f_y f_x + f_y^2 f\bigr)(x_0, y_0) + \ldots
\end{aligned}
\tag{1.6}
$$

Subtracting these two equations, we obtain for the error of the first step

$$
y(x_0 + h) - y_1 = \frac{h^3}{24}\bigl(f_{xx} + 2 f_{xy} f + f_{yy} f^2 + 4 (f_y f_x + f_y^2 f)\bigr)(x_0, y_0) + \ldots
\tag{1.7}
$$

When all second partial derivatives of $f$ are bounded, we thus obtain

$$
\| y(x_0 + h) - y_1 \| \le K h^3 .
$$

In order to obtain an approximation of the solution of (1.1) at the endpoint
$X$, we apply formula (1.4) successively to the intervals $(x_0, x_1)$, $(x_1, x_2)$, ...,
$(x_{n-1}, X)$, very similarly to the application of Euler’s method in Section I.7.
Again similarly to the convergence proof of Section I.7, it will be shown in Section
II.3 that, as in the case (1.1’), the error of the numerical solution is bounded by
$Ch^2$ ($h$ the maximal step size). Method (1.4) is thus an improvement on the Euler
method. For high precision computations we need to find still better methods; this
will be the main task of what follows.
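The different behaviour of the global errors, $Ch$ for Euler’s method and $Ch^2$ for method (1.4), can be observed on a simple test problem. The sketch below uses $y' = y$, $y(0) = 1$ on $[0, 1]$ (a test problem chosen here for illustration, not one from the text) and measures by which factor the error decreases when the number of steps is doubled.

```python
import math


def euler(f, x0, y0, X, n):
    """n equidistant Euler steps from x0 to X."""
    h = (X - x0) / n
    x, y = x0, y0
    for _ in range(n):
        y = y + h * f(x, y)
        x += h
    return y


def runge_midpoint(f, x0, y0, X, n):
    """n equidistant steps of Runge's method (1.4)."""
    h = (X - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        y = y + h * k2
        x += h
    return y


def error_ratio(method, n):
    """Error with n steps divided by the error with 2n steps for y' = y, y(0) = 1."""
    f = lambda x, y: y
    e1 = abs(method(f, 0.0, 1.0, 1.0, n) - math.e)
    e2 = abs(method(f, 0.0, 1.0, 1.0, 2 * n) - math.e)
    return e1 / e2


print(error_ratio(euler, 100), error_ratio(runge_midpoint, 100))
```

A ratio close to 2 indicates an error proportional to $h$, a ratio close to 4 an error proportional to $h^2$.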

1 The analogous extension of the trapezoidal rule has been given in an early publication 
by Coriolis in 1837; see Chapter II.4.2 of the thesis of D. Tournes, Paris VII, 1996. 
General Formulation of Runge-Kutta Methods

Runge (1895) and Heun (1900) constructed methods by including additional Euler 
steps in (1.4). It was Kutta (1901) who then formulated the general scheme of what 
is now called a Runge-Kutta method: 

Definition 1.1. Let $s$ be an integer (the “number of stages”) and $a_{21}, a_{31}, a_{32}, \ldots,
a_{s1}, a_{s2}, \ldots, a_{s,s-1}$, $b_1, \ldots, b_s$, $c_2, \ldots, c_s$ be real coefficients. Then the method

$$
\begin{aligned}
k_1 &= f(x_0, y_0) \\
k_2 &= f(x_0 + c_2 h,\; y_0 + h a_{21} k_1) \\
k_3 &= f\bigl(x_0 + c_3 h,\; y_0 + h (a_{31} k_1 + a_{32} k_2)\bigr) \\
&\;\;\vdots \\
k_s &= f\bigl(x_0 + c_s h,\; y_0 + h (a_{s1} k_1 + \ldots + a_{s,s-1} k_{s-1})\bigr) \\
y_1 &= y_0 + h (b_1 k_1 + \ldots + b_s k_s)
\end{aligned}
\tag{1.8}
$$

is called an $s$-stage explicit Runge-Kutta method (ERK) for (1.1).

Usually, the $c_i$ satisfy the conditions

$$
c_2 = a_{21}, \quad c_3 = a_{31} + a_{32}, \quad \ldots \quad c_s = a_{s1} + \ldots + a_{s,s-1},
\tag{1.9}
$$

or briefly,

$$
c_i = \sum_{j=1}^{i-1} a_{ij} .
\tag{1.9'}
$$

These conditions, already assumed by Kutta, express that all points where $f$ is
evaluated are first order approximations to the solution. They greatly simplify the
derivation of order conditions for high order methods. For low orders, however,
these assumptions are not necessary (see Exercise 6).

Definition 1.2. A Runge-Kutta method (1.8) has order $p$ if for sufficiently smooth
problems (1.1),

$$
\| y(x_0 + h) - y_1 \| \le K h^{p+1},
\tag{1.10}
$$

i.e., if the Taylor series for the exact solution $y(x_0 + h)$ and for $y_1$ coincide up to
(and including) the term $h^p$.

With the paper of Butcher (1964b) it became customary to symbolize method 
(1.8) by the tableau (1.8’). 
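$$
\begin{array}{c|ccccc}
0 & & & & & \\
c_2 & a_{21} & & & & \\
c_3 & a_{31} & a_{32} & & & \\
\vdots & \vdots & \vdots & \ddots & & \\
c_s & a_{s1} & a_{s2} & \ldots & a_{s,s-1} & \\
\hline
 & b_1 & b_2 & \ldots & b_{s-1} & b_s
\end{array}
\tag{1.8'}
$$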

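Examples. The above method of Runge as well as methods of Runge and Heun of
order 3 are given in Table 1.1.

Table 1.1. Low order Runge-Kutta methods

$$
\begin{array}{c|cc}
0 & & \\
1/2 & 1/2 & \\
\hline
 & 0 & 1
\end{array}
\qquad
\begin{array}{c|cccc}
0 & & & & \\
1/2 & 1/2 & & & \\
1 & 0 & 1 & & \\
1 & 0 & 0 & 1 & \\
\hline
 & 1/6 & 2/3 & 0 & 1/6
\end{array}
\qquad
\begin{array}{c|ccc}
0 & & & \\
1/3 & 1/3 & & \\
2/3 & 0 & 2/3 & \\
\hline
 & 1/4 & 0 & 3/4
\end{array}
$$

Runge, order 2 (left); Runge, order 3 (middle); Heun, order 3 (right)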
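Definition 1.1 translates almost literally into code. The following sketch performs one step of (1.8) for a scalar problem, with the coefficients passed as a tableau $(A, b, c)$; the function name and the storage of $A$ as a list of rows are choices made here for illustration. The three tableaux are those of Table 1.1.

```python
def erk_step(f, x0, y0, h, A, b, c):
    """One step of the explicit Runge-Kutta method (1.8) for a scalar problem.

    A is strictly lower triangular (list of rows), so that stage i only uses
    the previously computed k_1, ..., k_{i-1}.
    """
    k = []
    for i in range(len(b)):
        yi = y0 + h * sum(A[i][j] * k[j] for j in range(i))
        k.append(f(x0 + c[i] * h, yi))
    return y0 + h * sum(bi * ki for bi, ki in zip(b, k))


# Tableaux of Table 1.1 as (A, b, c).
RUNGE2 = ([[0, 0], [1 / 2, 0]], [0, 1], [0, 1 / 2])
RUNGE3 = ([[0, 0, 0, 0], [1 / 2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
          [1 / 6, 2 / 3, 0, 1 / 6], [0, 1 / 2, 1, 1])
HEUN3 = ([[0, 0, 0], [1 / 3, 0, 0], [0, 2 / 3, 0]],
         [1 / 4, 0, 3 / 4], [0, 1 / 3, 2 / 3])

f = lambda x, y: y                      # test problem y' = y, exact solution e^x
for name, tableau in (("Runge 2", RUNGE2), ("Runge 3", RUNGE3), ("Heun 3", HEUN3)):
    print(name, erk_step(f, 0.0, 1.0, 0.1, *tableau))
```

For $y' = y$, $y_0 = 1$ one step yields a polynomial in $h$ that can be compared with $e^h = 1 + h + h^2/2 + h^3/6 + \ldots$: the order 2 method reproduces the terms up to $h^2$, the two order 3 methods those up to $h^3$.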


Discussion of Methods of Order 4 

Von den neueren Verfahren halte ich das folgende von Herrn Kutta
angegebene für das beste. (C. Runge 1905)

Our task is now to determine the coefficients of 4-stage Runge-Kutta methods (1.8)
in order that they be of order 4. We have seen above what we must do: compute
the derivatives of $y_1 = y_1(h)$ for $h = 0$ and compare them with those of the true
solution for orders 1, 2, 3, and 4. In theory, with the known rules of differential
calculus, this is a completely trivial task and, by the use of (1.9), results in the
following conditions:

$$
\begin{aligned}
\sum_i b_i &= b_1 + b_2 + b_3 + b_4 = 1 && \text{(1.11a)} \\
\sum_i b_i c_i &= b_2 c_2 + b_3 c_3 + b_4 c_4 = 1/2 && \text{(1.11b)} \\
\sum_i b_i c_i^2 &= b_2 c_2^2 + b_3 c_3^2 + b_4 c_4^2 = 1/3 && \text{(1.11c)} \\
\sum_{i,j} b_i a_{ij} c_j &= b_3 a_{32} c_2 + b_4 (a_{42} c_2 + a_{43} c_3) = 1/6 && \text{(1.11d)} \\
\sum_i b_i c_i^3 &= b_2 c_2^3 + b_3 c_3^3 + b_4 c_4^3 = 1/4 && \text{(1.11e)} \\
\sum_{i,j} b_i c_i a_{ij} c_j &= b_3 c_3 a_{32} c_2 + b_4 c_4 (a_{42} c_2 + a_{43} c_3) = 1/8 && \text{(1.11f)} \\
\sum_{i,j} b_i a_{ij} c_j^2 &= b_3 a_{32} c_2^2 + b_4 (a_{42} c_2^2 + a_{43} c_3^2) = 1/12 && \text{(1.11g)} \\
\sum_{i,j,k} b_i a_{ij} a_{jk} c_k &= b_4 a_{43} a_{32} c_2 = 1/24 . && \text{(1.11h)}
\end{aligned}
$$
These computations, which are not reproduced in Kutta’s paper (they are, however, 
in Heun 1900), are very tedious. And they grow enormously with higher orders. 
We shall see in Section II.2 that by using an appropriate notation, they can become 
very elegant. 

Kutta gave the general solution of (1.11) without comment. A clear derivation
of the solutions is given in Runge & König (1924), p. 291. We shall follow here
the ideas of J.C. Butcher, which make clear the role of the so-called simplifying
assumptions, and will also apply to higher order cases.

Lemma 1.3. If

$$
\sum_{i=j+1}^{4} b_i a_{ij} = b_j (1 - c_j), \qquad j = 1, \ldots, 4,
\tag{1.12}
$$

then the equations (d), (g), and (h) in (1.11) follow from the others.

Proof. We demonstrate this for (g):

$$
\sum_{i,j} b_i a_{ij} c_j^2 = \sum_j b_j c_j^2 - \sum_j b_j c_j^3 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}
$$

by (c) and (e). Equations (d) and (h) are derived similarly. □


We shall now show that (1.12) is also necessary in our case: 

Lemma 1.4. For $s = 4$, the equations (1.11) and (1.9) imply (1.12).

The proof of this lemma will be based on the following:

Lemma 1.5. Let $U$ and $V$ be $3 \times 3$ matrices such that

$$
UV = \begin{pmatrix} a & b & 0 \\ c & d & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} \ne 0 .
\tag{1.13}
$$

Then either $V e_3 = 0$ or $U^T e_3 = 0$ where $e_3 = (0, 0, 1)^T$.

Proof of Lemma 1.5. If $\det U \ne 0$, then $U V e_3 = 0$ implies $V e_3 = 0$. If $\det U = 0$,
there exists $x = (x_1, x_2, x_3)^T \ne 0$ such that $U^T x = 0$, and therefore $V^T U^T x = 0$.
But (1.13) implies that $x$ must be a multiple of $e_3$. □
Proof of Lemma 1.4. Define

$$
d_j = \sum_i b_i a_{ij} - b_j (1 - c_j) \qquad \text{for } j = 1, \ldots, 4,
$$

so that we have to prove $d_j = 0$. We now introduce the matrices

$$
U = \begin{pmatrix} b_2 & b_3 & b_4 \\ b_2 c_2 & b_3 c_3 & b_4 c_4 \\ d_2 & d_3 & d_4 \end{pmatrix}, \qquad
V = \begin{pmatrix}
c_2 & c_2^2 & \sum_j a_{2j} c_j - c_2^2/2 \\
c_3 & c_3^2 & \sum_j a_{3j} c_j - c_3^2/2 \\
c_4 & c_4^2 & \sum_j a_{4j} c_j - c_4^2/2
\end{pmatrix} .
$$

Multiplication of these two matrices, using the conditions of (1.11), gives

$$
UV = \begin{pmatrix} 1/2 & 1/3 & 0 \\ 1/3 & 1/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\qquad \text{with} \qquad
\det \begin{pmatrix} 1/2 & 1/3 \\ 1/3 & 1/4 \end{pmatrix} \ne 0 .
\tag{1.14}
$$

Now the last column of $V$ cannot be zero, since $c_1 = 0$ implies

$$
\sum_j a_{2j} c_j - c_2^2/2 = -c_2^2/2 \ne 0
$$

by condition (h). Thus $d_2 = d_3 = d_4 = 0$ follows from Lemma 1.5. The last identity
$d_1 = 0$ follows from $d_1 + d_2 + d_3 + d_4 = 0$, which is a consequence of (1.11a,b)
and (1.9). □


From Lemmas 1.3 and 1.4 we obtain 

Theorem 1.6. Under the assumption (1.9) the equations (1.11) are equivalent to

$$
\begin{aligned}
b_1 + b_2 + b_3 + b_4 &= 1 && \text{(1.15a)} \\
b_2 c_2 + b_3 c_3 + b_4 c_4 &= 1/2 && \text{(1.15b)} \\
b_2 c_2^2 + b_3 c_3^2 + b_4 c_4^2 &= 1/3 && \text{(1.15c)} \\
b_2 c_2^3 + b_3 c_3^3 + b_4 c_4^3 &= 1/4 && \text{(1.15e)} \\
b_3 c_3 a_{32} c_2 + b_4 c_4 (a_{42} c_2 + a_{43} c_3) &= 1/8 && \text{(1.15f)} \\
b_3 a_{32} + b_4 a_{42} &= b_2 (1 - c_2) && \text{(1.15i)} \\
b_4 a_{43} &= b_3 (1 - c_3) && \text{(1.15j)} \\
0 &= b_4 (1 - c_4) . && \text{(1.15k)}
\end{aligned}
$$

□

It follows from (1.15j) and (1.11h) that

$$
b_3 b_4 c_2 (1 - c_3) \ne 0 .
\tag{1.16}
$$

In particular this implies $c_4 = 1$ by (1.15k).
Solution of equations (1.15). Equations (a)-(e) and (k) just state that $b_i$ and $c_i$ are
the coefficients of a fourth order quadrature formula with $c_1 = 0$ and $c_4 = 1$. We
distinguish four cases for this:

1) $c_2 = u$, $c_3 = v$ and $0, u, v, 1$ are all distinct;   (1.17)

then (a)-(e) form a regular linear system for $b_1, b_2, b_3, b_4$. This system has the
solution

$$
b_1 = \frac{1 - 2(u+v) + 6uv}{12uv}, \qquad
b_2 = \frac{2v - 1}{12u(1-u)(v-u)},
$$
$$
b_3 = \frac{1 - 2u}{12v(1-v)(v-u)}, \qquad
b_4 = \frac{3 - 4(u+v) + 6uv}{12(1-u)(1-v)}.
$$

Due to (1.16) we have to assume that $u$, $v$ are such that $b_3 \neq 0$ and $b_4 \neq 0$. The
three other cases with double nodes are built upon the Simpson rule:

2) $c_3 = 0$, $c_2 = 1/2$, $b_3 = w \neq 0$, $b_1 = 1/6 - w$, $b_2 = 4/6$, $b_4 = 1/6$;

3) $c_2 = c_3 = 1/2$, $b_1 = 1/6$, $b_3 = w \neq 0$, $b_2 = 4/6 - w$, $b_4 = 1/6$;

4) $c_2 = 1$, $c_3 = 1/2$, $b_4 = w \neq 0$, $b_2 = 1/6 - w$, $b_1 = 1/6$, $b_3 = 4/6$.

Once $b_i$ and $c_i$ are chosen, we obtain $a_{43}$ from (j), and then (f) and (i) form a
linear system of two equations for $a_{32}$ and $a_{42}$. The determinant of this system is

$$
\det\begin{pmatrix} b_3 & b_4 \\ b_3c_3c_2 & b_4c_4c_2 \end{pmatrix} = b_3b_4c_2(c_4 - c_3),
$$

which is $\neq 0$ by (1.16). Finally we obtain $a_{21}$, $a_{31}$, and $a_{41}$ from (1.9).
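The construction just described can be traced in a few lines of code. The following is a minimal sketch for case (1) only, assuming exact rational arithmetic; the function name `kutta_family` and the use of Cramer's rule for the 2x2 system are illustrative choices, not part of the text.

```python
from fractions import Fraction as Fr

def kutta_family(u, v):
    """Sketch: build (c, A, b) for case (1) of (1.17) from c2 = u, c3 = v."""
    u, v = Fr(u), Fr(v)
    c = [Fr(0), u, v, Fr(1)]
    # Quadrature weights solving (1.15a,b,c,e)
    b = [
        (1 - 2 * (u + v) + 6 * u * v) / (12 * u * v),
        (2 * v - 1) / (12 * u * (1 - u) * (v - u)),
        (1 - 2 * u) / (12 * v * (1 - v) * (v - u)),
        (3 - 4 * (u + v) + 6 * u * v) / (12 * (1 - u) * (1 - v)),
    ]
    b1, b2, b3, b4 = b
    a43 = b3 * (1 - v) / b4                 # from (1.15j)
    rhs_i = b2 * (1 - u)                    # (1.15i): b3*a32 + b4*a42 = rhs_i
    rhs_f = Fr(1, 8) - b4 * a43 * v         # (1.15f) with c4 = 1, a43 term moved right
    det = b3 * b4 * u * (1 - v)             # b3*b4*c2*(c4 - c3), nonzero by (1.16)
    a32 = (rhs_i * b4 * u - b4 * rhs_f) / det
    a42 = (b3 * rhs_f - b3 * v * u * rhs_i) / det
    A = [[Fr(0)] * 4 for _ in range(4)]
    A[1][0] = u                             # row sums equal c_i, condition (1.9)
    A[2][0], A[2][1] = v - a32, a32
    A[3][0], A[3][1], A[3][2] = 1 - a42 - a43, a42, a43
    return c, A, b

if __name__ == "__main__":
    c, A, b = kutta_family(Fr(1, 3), Fr(2, 3))
    print("b =", b)
    print("A =", A)
```

With u = 1/3 and v = 2/3 the sketch should return the coefficients of the 3/8-rule; other admissible pairs (u, v) give further members of the family.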

Two particular choices of Kutta (1901) have become especially popular: case
(3) with $w = 2/6$ and case (1) with $u = 1/3$, $v = 2/3$. They are given in Table
1.2. Both methods generalize classical quadrature rules in keeping the same order.
The first is more popular, the second is more precise (“Wir werden diese Näherung
als im allgemeinen beste betrachten ...”, Kutta).


Table 1.2. Kutta’s methods

   0  |                             0  |
  1/2 | 1/2                        1/3 |  1/3
  1/2 |  0   1/2                   2/3 | -1/3    1
   1  |  0    0    1                1  |   1    -1     1
  ----+--------------------       -----+----------------------
      | 1/6  2/6  2/6  1/6             |  1/8   3/8   3/8   1/8

  “The” Runge-Kutta method         3/8-Rule
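As an illustration of how such tableaux are used, here is a minimal sketch of one explicit Runge-Kutta step driven by the coefficients of Table 1.2, for a scalar equation. The names `rk_step`, `integrate`, `CLASSICAL` and `THREE_EIGHTHS` are assumptions made for this example; the nodes are recovered from the row sums, as in condition (1.9).

```python
import math

# Each tableau is (rows of the strictly lower triangular part of A, weights b)
CLASSICAL = ([[], [0.5], [0.0, 0.5], [0.0, 0.0, 1.0]],
             [1 / 6, 2 / 6, 2 / 6, 1 / 6])
THREE_EIGHTHS = ([[], [1 / 3], [-1 / 3, 1.0], [1.0, -1.0, 1.0]],
                 [1 / 8, 3 / 8, 3 / 8, 1 / 8])

def rk_step(f, x, y, h, tableau):
    """One explicit RK step for y' = f(x, y) with the given tableau."""
    A, b = tableau
    k = []
    for row in A:
        ci = sum(row)                                   # c_i = sum_j a_ij
        yi = y + h * sum(a * kj for a, kj in zip(row, k))
        k.append(f(x + ci * h, yi))
    return y + h * sum(bj * kj for bj, kj in zip(b, k))

def integrate(f, x0, y0, X, n, tableau):
    """Apply n equal steps from x0 to X and return the final value."""
    h = (X - x0) / n
    y = y0
    for i in range(n):
        y = rk_step(f, x0 + i * h, y, h, tableau)
    return y

if __name__ == "__main__":
    f = lambda x, y: y                                  # test problem y' = y, y(0) = 1
    for n in (10, 20, 40):
        err = abs(integrate(f, 0.0, 1.0, 1.0, n, CLASSICAL) - math.e)
        print(n, err)
```

Halving the step size should reduce the printed error by a factor close to 16, as expected for order 4. On the linear problem y' = y both tableaux produce the same result up to rounding; they differ on nonlinear problems.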
“Optimal” Formulas

Much research has been undertaken in order to choose the “best” possibilities from
the variety of possible 4th order RK-formulas.

The first attempt in this direction was the very popular method of Gill (1951),
with the aim of reducing the need for computer storage (“registers”) as much as
possible. The first computers in the fifties largely used this method, which is therefore
of historical interest. Gill observed that most computer storage is needed for
the computation of $k_3$, where “registers are required to store in some form”

$$
y_0 + a_{31}hk_1 + a_{32}hk_2, \quad y_0 + a_{41}hk_1 + a_{42}hk_2, \quad y_0 + b_1hk_1 + b_2hk_2, \quad hk_3.
$$

“Clearly, three registers will suffice for the third stage if the quantities to be stored
are linearly dependent, i.e., if”

$$
\det\begin{pmatrix} 1 & a_{31} & a_{32} \\ 1 & a_{41} & a_{42} \\ 1 & b_1 & b_2 \end{pmatrix} = 0.
$$

Gill observed that this condition is satisfied for the methods of type (3) if
$w = (1 + \sqrt{0.5})/3$. The resulting method can then be reformulated as follows (“As each
quantity is calculated it is stored in the register formerly holding the corresponding
quantity of the previous stage, which is no longer required”):

$$
\begin{aligned}
& y := \text{initial value}, \quad k := hf(y), \quad y := y + 0.5k, \quad q := k, \\
& k := hf(y), \quad y := y + (1 - \sqrt{0.5})(k - q), \\
& q := (2 - \sqrt{2})k + (-2 + 3\sqrt{0.5})q, \\
& k := hf(y), \quad y := y + (1 + \sqrt{0.5})(k - q), \\
& q := (2 + \sqrt{2})k + (-2 - 3\sqrt{0.5})q, \\
& k := hf(y), \quad y := y + \frac{k}{6} - \frac{q}{3}, \quad (\to \text{compute next step}).
\end{aligned} \tag{1.18}
$$

Today, in large high-speed computers, this method is no longer used, but could still
be of interest for very high dimensional equations.
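The register scheme (1.18) translates almost literally into code. The sketch below assumes an autonomous scalar equation y' = f(y); the function names are hypothetical, and the tableau coefficients in `gill_tableau_step` were obtained here by expanding the assignments of (1.18), so they are a cross-check rather than a quotation from Gill.

```python
import math

def gill_step(f, y, h):
    """One step of the scheme (1.18), keeping only y, k and q."""
    r = math.sqrt(0.5)
    k = h * f(y); y = y + 0.5 * k; q = k
    k = h * f(y); y = y + (1 - r) * (k - q)
    q = (2 - math.sqrt(2)) * k + (-2 + 3 * r) * q
    k = h * f(y); y = y + (1 + r) * (k - q)
    q = (2 + math.sqrt(2)) * k + (-2 - 3 * r) * q
    k = h * f(y); y = y + k / 6 - q / 3
    return y

def gill_tableau_step(f, y, h):
    """The same step written as an ordinary 4-stage method (derived from (1.18))."""
    r = math.sqrt(0.5)
    k1 = f(y)
    k2 = f(y + h * 0.5 * k1)
    k3 = f(y + h * ((r - 0.5) * k1 + (1 - r) * k2))
    k4 = f(y + h * (-r * k2 + (1 + r) * k3))
    return y + h * (k1 / 6 + (1 - r) / 3 * k2 + (1 + r) / 3 * k3 + k4 / 6)

if __name__ == "__main__":
    f = lambda y: -y * y
    print(gill_step(f, 1.0, 0.1), gill_tableau_step(f, 1.0, 0.1))
```

The two functions should agree up to rounding, which confirms that the weight of the third stage is w = (1 + sqrt(0.5))/3, i.e. case (3).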

Other attempts have been made to choose $u$ and $v$ in (1.17), case (1), such that
the error terms (terms in $h^5$, see Section II.3) become as small as possible. We
shall discuss this question in Section II.3.
Numerical Example

Zu grosses Gewicht darf man natürlich solchen Beispielen nicht
beilegen ... (W. Kutta 1901)

We compare five different choices of 4th order methods on the Van der Pol equation
(I.16.2) with $\varepsilon = 1$. As initial values we take $y_1(0) = A$, $y_2(0) = 0$ on the limit
cycle and we integrate over one period $T$ (the values of $A$ and $T$ are given in
Exercise I.16.1). For a comparison of these methods with lower order ones we have
also included the explicit Euler method, Runge’s method of order 2 and Heun’s
method of order 3 (see Table 1.1).

We have applied the methods with several fixed step sizes. The errors of both
components and the number of function evaluations ($fe$) are displayed in logarithmic
scales in Fig. 1.1. Whenever the error behaves like $C \cdot h^p = C_1 \cdot (fe)^{-p}$, the
curves appear as straight lines with slope $1/p$. We have chosen the scales such that
the theoretical slope of the 4th order methods appears to be 45°.

These tests clearly show the importance of higher order methods. Among
the various 4th order methods there is usually no big difference. It is interesting to
note that in our example the method with the smallest error in $y_1$ has the biggest
error in $y_2$ and vice versa.



[Fig. 1.1. Global errors versus number of function evaluations. Methods shown:
 - classical RK (left tableau of Table 1.2)
 - Kutta’s 3/8 rule (right tableau of Table 1.2)
 - optimal formula, Ex. 3a, II.3, u = 0.3587, v = 0.6346
 - Ralston (1962), Hull (1967), u = 0.4, v = 0.45
 - Gill’s formula (1.18)]
Exercises


1. Show that every $s$-stage explicit RK method of order $s$, when applied to the
problem $y' = \lambda y$ ($\lambda$ a complex constant), gives

$$
y_1 = \Bigl(\sum_{j=0}^{s} \frac{z^j}{j!}\Bigr) y_0, \qquad z = h\lambda.
$$

Hint. Show first that $y_1/y_0$ must be a polynomial in $z$ of degree $s$ and then
determine its coefficients by comparing the derivatives of $y_1$, with respect to
$h$, to those of the true solution.


2. (Runge 1895, p. 175; see also the introduction to Adams methods in Chap.
III.1). The theoretical form of drops of fluids is determined by the differential
equation of Laplace (1805)

$$
-2z = \alpha^2 (K_1 + K_2) \tag{1.21}
$$

where $\alpha$ is a constant, $(K_1 + K_2)/2$ the mean curvature, and $z$ the height (see
Fig. 1.2). If we insert $1/K_1 = r/\sin\varphi$ and $K_2 = d\varphi/ds$, the curvature of the
meridian curve, we obtain

$$
-2z = \frac{\sin\varphi}{r} + \frac{d\varphi}{ds}, \tag{1.22}
$$

where we put $\alpha = 1$. Add

$$
\frac{dr}{ds} = \cos\varphi, \qquad \frac{dz}{ds} = -\sin\varphi, \tag{1.22'}
$$

to obtain a system of three differential equations for $\varphi(s)$, $r(s)$, $z(s)$, $s$ being
the arc length. Compute and plot different solution curves by the method of
Runge (1.4) with initial values $\varphi(0) = 0$, $r(0) = 0$ and $z(0) = z_0$ ($z_0 < 0$
for lying drops; compute also hanging drops with appropriate sign changes in
(1.22)). Use different step sizes and compare the results.

Hint. Be careful at the singularity in the beginning: from (1.22) and (1.22’) we
have for small $s$ that $r = s$, $\varphi = \zeta s$ with $\zeta = -z_0$, hence $(\sin\varphi)/r \to -z_0$.
A more precise analysis gives for small $s$ the expansions ($\zeta = -z_0$)

$$
\begin{aligned}
\varphi &= \zeta s + \frac{\zeta}{4}s^3 + \Bigl(\frac{\zeta}{48} - \frac{\zeta^3}{120}\Bigr)s^5 + \ldots \\
r &= s - \frac{\zeta^2}{6}s^3 + \Bigl(-\frac{\zeta^2}{20} + \frac{\zeta^4}{120}\Bigr)s^5 + \ldots \\
z &= -\zeta - \frac{\zeta}{2}s^2 + \Bigl(-\frac{\zeta}{16} + \frac{\zeta^3}{24}\Bigr)s^4
   + \Bigl(-\frac{\zeta}{288} + \frac{\zeta^3}{45} - \frac{\zeta^5}{720}\Bigr)s^6 + \ldots
\end{aligned}
$$
Fig. 1.2. Drops

3. Find the conditions for a 2-stage explicit RK-method to be of order two and
determine all such methods (“... wozu eine weitere Erörterung nicht mehr
nötig ist”, Kutta).

4. Find all methods of order three with three stages (i.e., solve (1.11;a-d) with
$b_4 = 0$).

Result. $c_2 = u$, $c_3 = v$, $a_{32} = v(v-u)/(u(2-3u))$, $b_2 = (2-3v)/(6u(u-v))$,
$b_3 = (2-3u)/(6v(v-u))$, $b_1 = 1 - b_2 - b_3$, $a_{31} = c_3 - a_{32}$, $a_{21} = c_2$
(Kutta 1901, p. 438).

5. Construct all methods of order 2 of the form

   0   |
  c_2  | c_2
  c_3  |  0   c_3
  -----+--------------
       |  0    0    1

Such methods “have the property that the corresponding Runge-Kutta process
requires relatively less storage in a computer” (Van der Houwen (1977),
§2.7.2). Apply them to $y' = \lambda y$ and compare with Exercise 1.

6. Determine the conditions for order two of the RK methods with two stages
which do not satisfy the conditions (1.9):

$$
\begin{aligned}
k_1 &= f(x_0 + c_1h,\ y_0) \\
k_2 &= f(x_0 + c_2h,\ y_0 + a_{21}hk_1) \\
y_1 &= y_0 + h(b_1k_1 + b_2k_2).
\end{aligned}
$$

Discuss the use of this extra freedom for $c_1$ and $c_2$ (Oliver 1975).




II.2 Order Conditions for Runge-Kutta Methods 


... I heard a lecture by Merson ... 

(J. Butcher’s first contact with RK methods) 


In this section we shall derive the general structure of the order conditions (Merson 
1957, Butcher 1963). The proof has evolved very much in the meantime, mainly 
under the influence of Butcher’s later work, many personal discussions with him, 
the proof of “Theorem 6” in Hairer & Wanner (1974), and our teaching experience. 
We shall see in Section II.11 that exactly the same ideas of proof lead to a general
theorem of composition of methods (= B-series), which gives access to order
conditions for a much larger class of methods.

A big advantage is obtained by transforming (1.1) to autonomous form by appending
$x$ to the dependent variables as

$$
\begin{pmatrix} x \\ y \end{pmatrix}' = \begin{pmatrix} 1 \\ f(x, y) \end{pmatrix}. \tag{2.1}
$$

The main difficulty in the derivation of the order conditions is to understand the
correspondence of the formulas to certain rooted labelled trees; this comes out
most naturally if we use well-chosen indices and tensor notation (as in Gill (1951),
Henrici (1962), p. 118, Gear (1971), p. 32). As is usual in tensor notation, we
denote (in this section) the components of vectors by superscript indices which, in
order to avoid confusion, we choose as capitals. Then (2.1) can be written as

$$
(y^J)' = f^J(y^1, \ldots, y^n), \qquad J = 1, \ldots, n. \tag{2.2}
$$


We next rewrite the method (1.8) for the autonomous differential equation (2.2).
In order to get a better symmetry in all formulas of (1.8), we replace $k_i$ by the
argument $g_i$ such that $k_i = f(g_i)$. Then (1.8) becomes

$$
\begin{aligned}
g_i^J &= y_0^J + \sum_{j=1}^{i-1} a_{ij}\, h f^J(g_j^1, \ldots, g_j^n), \qquad i = 1, \ldots, s \\
y_1^J &= y_0^J + \sum_{j=1}^{s} b_j\, h f^J(g_j^1, \ldots, g_j^n).
\end{aligned} \tag{2.3}
$$

If the system (2.2) originates from (2.1), then, for $J = 1$,

$$
g_i^1 = y_0^1 + \sum_{j=1}^{i-1} a_{ij} h = x_0 + c_i h
$$

by (1.9). We see that (1.9) becomes a natural condition. If it is satisfied, then for
the derivation of order conditions only the autonomous equation (2.2) has to be
considered.

As indicated in Section II.1 we have to compare the Taylor series of $y_1^J$ with
that of the exact solution. Therefore we compute the derivatives of $y_1^J$ and $g_i^J$ with
respect to $h$ at $h = 0$. Due to the similarity of the two formulas, it is sufficient to
do this for $g_i^J$. On the right hand side of (2.3) there appear expressions of the form
$h\varphi(h)$, so we make use of Leibniz’ formula

$$
\bigl(h\varphi(h)\bigr)^{(q)}\big|_{h=0} = q \cdot \bigl(\varphi(h)\bigr)^{(q-1)}\big|_{h=0}. \tag{2.4}
$$

The reader is now asked to take a deep breath, take five sheets of reversed computer 
paper, remember the basic rules of differential calculus, and begin the following 
computations: 

$q = 0$: from (2.3)

$$
(g_i^J)^{(0)}\big|_{h=0} = y_0^J. \tag{2.5;0}
$$

$q = 1$: from (2.3) and (2.4)

$$
(g_i^J)^{(1)}\big|_{h=0} = \sum_j a_{ij}\, f^J\big|_{y=y_0}. \tag{2.5;1}
$$

$q = 2$: because of (2.4) we shall need the first derivative of $f^J(g_j)$

$$
\bigl(f^J(g_j)\bigr)^{(1)} = \sum_K f^J_K(g_j) \cdot (g_j^K)^{(1)}, \tag{2.6;1}
$$

where, as usual, $f^J_K$ denotes $\partial f^J/\partial y^K$. Inserting formula (2.5;1) (with $i, j, J$
replaced by $j, k, K$) into (2.6;1) we obtain with (2.4)

$$
(g_i^J)^{(2)}\big|_{h=0} = 2 \sum_{j,k} a_{ij}a_{jk} \sum_K f^J_K f^K\big|_{y=y_0}. \tag{2.5;2}
$$

$q = 3$: we differentiate (2.6;1) to obtain

$$
\bigl(f^J(g_j)\bigr)^{(2)} = \sum_{K,L} f^J_{KL}(g_j) \cdot (g_j^K)^{(1)}(g_j^L)^{(1)} + \sum_K f^J_K(g_j)\,(g_j^K)^{(2)}. \tag{2.6;2}
$$

The derivatives $(g_j^K)^{(1)}$ and $(g_j^K)^{(2)}$ at $h = 0$ are already available in (2.5;1) and
(2.5;2). So we have from (2.3) and (2.4)

$$
\begin{aligned}
(g_i^J)^{(3)}\big|_{h=0} = {} & 3 \sum_{j,k,l} a_{ij}a_{jk}a_{jl} \sum_{K,L} f^J_{KL} f^K f^L\big|_{y=y_0} \\
& + 3 \cdot 2 \sum_{j,k,l} a_{ij}a_{jk}a_{kl} \sum_{K,L} f^J_K f^K_L f^L\big|_{y=y_0}.
\end{aligned} \tag{2.5;3}
$$

The same formula holds for $(y_1^J)^{(3)}\big|_{h=0}$ with $a_{ij}$ replaced by $b_j$.
The Derivatives of the True Solution

The derivatives of the correct solution are obtained much more easily just by
differentiating equation (2.2): first

$$
(y^J)^{(1)} = f^J(y). \tag{2.7;1}
$$

Differentiating (2.2) and inserting (2.2) again for the derivatives we get

$$
(y^J)^{(2)} = \sum_K f^J_K(y) \cdot (y^K)^{(1)} = \sum_K f^J_K(y) f^K(y). \tag{2.7;2}
$$

Differentiating (2.7;2) again we obtain

$$
(y^J)^{(3)} = \sum_{K,L} f^J_{KL}(y) f^K(y) f^L(y) + \sum_{K,L} f^J_K(y) f^K_L(y) f^L(y). \tag{2.7;3}
$$

Conditions for Order 3 

For order 3, the derivatives (2.5;1-3) (with $a_{ij}$ replaced by $b_j$) must be equal to
the derivatives (2.7;1-3), and this for every differential equation. Thus, comparing
the corresponding expressions, we obtain:

Theorem 2.1. The RK method (2.3) (and thus (1.8)) is of order 3 iff

$$
\sum_j b_j = 1, \qquad 2\sum_{j,k} b_j a_{jk} = 1, \qquad
3\sum_{j,k,l} b_j a_{jk} a_{jl} = 1, \qquad 6\sum_{j,k,l} b_j a_{jk} a_{kl} = 1. \tag{2.8}
$$

□

Inserting $\sum_k a_{jk} = c_j$ from (1.9), we can simplify these expressions still further
and obtain formulas (a)-(d) of (1.11).
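The four conditions (2.8) are easy to check mechanically. The following sketch evaluates their residuals in exact arithmetic; the helper name `order3_residuals` is hypothetical, and the two tableaux used in the usage example (the midpoint form of Runge's second order method and the third order tableau commonly attributed to Heun) are assumptions about the methods referred to in Table 1.1.

```python
from fractions import Fraction as Fr

def order3_residuals(A, b):
    """Left-hand sides of (2.8) minus 1, for a full s x s matrix A and weights b."""
    s = range(len(b))
    r1 = sum(b) - 1
    r2 = 2 * sum(b[j] * A[j][k] for j in s for k in s) - 1
    r3 = 3 * sum(b[j] * A[j][k] * A[j][l] for j in s for k in s for l in s) - 1
    r4 = 6 * sum(b[j] * A[j][k] * A[k][l] for j in s for k in s for l in s) - 1
    return (r1, r2, r3, r4)

# Assumed tableau of Heun's third order method: c = (0, 1/3, 2/3)
HEUN_A = [[Fr(0), Fr(0), Fr(0)],
          [Fr(1, 3), Fr(0), Fr(0)],
          [Fr(0), Fr(2, 3), Fr(0)]]
HEUN_B = [Fr(1, 4), Fr(0), Fr(3, 4)]

# Assumed tableau of Runge's second order method: y1 = y0 + h f(x0 + h/2, y0 + h/2 f0)
RUNGE_A = [[Fr(0), Fr(0)], [Fr(1, 2), Fr(0)]]
RUNGE_B = [Fr(0), Fr(1)]

if __name__ == "__main__":
    print(order3_residuals(HEUN_A, HEUN_B))
    print(order3_residuals(RUNGE_A, RUNGE_B))
```

All four residuals vanish for the third order tableau, whereas the second order method satisfies only the first two conditions.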


Trees and Elementary Differentials 

But without a more convenient notation, it would be difficult to 
find the corresponding expressions ... This, however, can be at 
once effected by means of the analytical forms called trees ... 

(A. Cayley 1857) 

The continuation of this process, although theoretically clear, soon leads to very
complicated formulas. It is therefore advantageous to use a graphical representation:
indeed, the indices $j, k, l$ and $J, K, L$ in the terms of (2.5;3) are linked
together as pairs of indices in $a_{jk}, a_{jl}, \ldots$ in exactly the same way as upper and
lower indices in the expressions $f^J_{KL}$, $f^J_K$, namely

[Figure (2.9): the labelled trees $t_{31}$ (root $j$ with the two sons $k$ and $l$) and $t_{32}$ (root $j$ with son $k$, which has the son $l$)]

for the first and second term respectively. We call these objects labelled trees, because
they are connected graphs (trees) whose vertices are labelled with summation
indices. They can also be represented as mappings, e.g.,

$$
l \mapsto j,\ k \mapsto j \qquad\text{and}\qquad l \mapsto k,\ k \mapsto j \tag{2.9'}
$$

for the above trees. This mapping indicates to which lower letter the corresponding
vertices are attached.

Definition 2.2. Let $A$ be an ordered chain of indices $A = \{j < k < l < m < \ldots\}$
and denote by $A_q$ the subset consisting of the first $q$ indices. A (rooted) labelled
tree of order $q$ ($q \ge 1$) is a mapping (the son-father mapping)

$$
t : A_q \setminus \{j\} \to A_q
$$

such that $t(z) < z$ for all $z \in A_q \setminus \{j\}$. The set of all labelled trees of order $q$ is
denoted by $LT_q$. We call “$z$” the son of “$t(z)$” and “$t(z)$” the father of “$z$”. The
vertex “$j$”, the forefather of the whole dynasty, is called the root of $t$. The order
$q$ of a labelled tree is equal to the number of its vertices and is usually denoted by
$q = \varrho(t)$.


Definition 2.3. For a labelled tree $t \in LT_q$ we call

$$
F^J(t)(y) = \sum_{K,L,\ldots} f^J_{K,\ldots}(y)\, f^K_{\ldots}(y)\, f^L_{\ldots}(y) \cdots
$$

the corresponding elementary differential. The summation is over $q - 1$ indices
$K, L, \ldots$ (which correspond to $A_q \setminus \{j\}$) and the summand is a product of $q$ $f$’s,
where the upper index runs through all vertices of $t$ and the lower indices are the
corresponding sons. We denote by $F(t)(y)$ the vector $(F^1(t)(y), \ldots, F^n(t)(y))$.

If the set $A_q$ is written as

$$
A_q = \{j_1 < j_2 < \ldots < j_q\}, \tag{2.10}
$$

then we can write the definition of $F(t)$ as follows:

$$
F^{J_1}(t) = \sum_{J_2, \ldots, J_q} \prod_{i=1}^{q} f^{J_i}_{t^{-1}(J_i)}, \tag{2.11}
$$

since the sons of an index are its inverse images under the map $t$.
Examples of elementary differentials are

$$
\sum_{K,L} f^J_{KL} f^K f^L \qquad\text{and}\qquad \sum_{K,L} f^J_K f^K_L f^L
$$

for the labelled trees $t_{31}$ and $t_{32}$ above. These expressions appear in formulas
(2.5;3) and (2.7;3).

The three labelled trees

[Figure (2.12): three labelled trees of order 4 with vertices $j, k, l, m$; in each the root $j$ has two sons, one of which has a son of its own]

all look topologically alike, moreover the corresponding elementary differentials

$$
\sum_{K,L,M} f^J_{KM} f^M f^K_L f^L, \qquad
\sum_{K,L,M} f^J_{KL} f^L f^K_M f^M, \qquad
\sum_{K,L,M} f^J_{LK} f^K f^L_M f^M
$$

are the same, because they just differ by an exchange of the summation indices.
Thus we give

Definition 2.4. Two labelled trees $t$ and $u$ are equivalent, if they have the same
order, say $q$, and if there exists a permutation $\sigma : A_q \to A_q$, such that $\sigma(j) = j$ and
$t\sigma = \sigma u$ on $A_q \setminus \{j\}$.

This clearly defines an equivalence relation. 

Definition 2.5. An equivalence class of $q$th order labelled trees is called a (rooted)
tree of order $q$. The set of all trees of order $q$ is denoted by $T_q$. The order of a tree
is defined as the order of a representative and is again denoted by $\varrho(t)$. Furthermore
we denote by $\alpha(t)$ (for $t \in T_q$) the number of elements in the equivalence class $t$;
i.e., the number of possible different monotonic labellings of $t$.

Geometrically, a tree is distinguished from a labelled tree by omitting the labels.
Often it is advantageous to include $\emptyset$, the empty tree, as the only tree of
order 0. The only tree of order 1 is denoted by $\tau$. The number of trees of orders
$1, 2, \ldots, 10$ are given in Table 2.1. Representatives of all trees of order $\le 5$ are
shown in Table 2.2.


Table 2.1. Number of trees up to order 10

  q            1    2    3    4    5    6    7    8     9     10
  card(T_q)    1    1    2    4    9    20   48   115   286   719
Table 2.2. Trees and elementary differentials up to order 5
(the graphs are written in the bracket notation (2.20))

  q   t         graph            γ(t)   α(t)   F^J(t)(y)                                      Φ_j(t)
  0   ∅         ∅                1      1      $y^J$
  1   τ         τ                1      1      $f^J$                                          $1$
  2   t_21      [τ]              2      1      $\sum_K f^J_K f^K$                             $\sum_k a_{jk}$
  3   t_31      [τ,τ]            3      1      $\sum_{K,L} f^J_{KL} f^K f^L$                  $\sum_{k,l} a_{jk}a_{jl}$
      t_32      [[τ]]            6      1      $\sum_{K,L} f^J_K f^K_L f^L$                   $\sum_{k,l} a_{jk}a_{kl}$
  4   t_41      [τ,τ,τ]          4      1      $\sum_{K,L,M} f^J_{KLM} f^K f^L f^M$           $\sum_{k,l,m} a_{jk}a_{jl}a_{jm}$
      t_42      [[τ],τ]          8      3      $\sum_{K,L,M} f^J_{KM} f^K_L f^L f^M$          $\sum_{k,l,m} a_{jk}a_{kl}a_{jm}$
      t_43      [[τ,τ]]          12     1      $\sum_{K,L,M} f^J_K f^K_{LM} f^L f^M$          $\sum_{k,l,m} a_{jk}a_{kl}a_{km}$
      t_44      [[[τ]]]          24     1      $\sum_{K,L,M} f^J_K f^K_L f^L_M f^M$           $\sum_{k,l,m} a_{jk}a_{kl}a_{lm}$
  5   t_51      [τ,τ,τ,τ]        5      1      $\sum f^J_{KLMP} f^K f^L f^M f^P$              $\sum a_{jk}a_{jl}a_{jm}a_{jp}$
      t_52      [[τ],τ,τ]        10     6      $\sum f^J_{KMP} f^K_L f^L f^M f^P$             $\sum a_{jk}a_{kl}a_{jm}a_{jp}$
      t_53      [[τ,τ],τ]        15     4      $\sum f^J_{KP} f^K_{LM} f^L f^M f^P$           $\sum a_{jk}a_{kl}a_{km}a_{jp}$
      t_54      [[[τ]],τ]        30     4      $\sum f^J_{KP} f^K_L f^L_M f^M f^P$            $\sum a_{jk}a_{kl}a_{lm}a_{jp}$
      t_55      [[τ],[τ]]        20     3      $\sum f^J_{KM} f^K_L f^L f^M_P f^P$            $\sum a_{jk}a_{kl}a_{jm}a_{mp}$
      t_56      [[τ,τ,τ]]        20     1      $\sum f^J_K f^K_{LMP} f^L f^M f^P$             $\sum a_{jk}a_{kl}a_{km}a_{kp}$
      t_57      [[[τ],τ]]        40     3      $\sum f^J_K f^K_{LP} f^L_M f^M f^P$            $\sum a_{jk}a_{kl}a_{lm}a_{kp}$
      t_58      [[[τ,τ]]]        60     1      $\sum f^J_K f^K_L f^L_{MP} f^M f^P$            $\sum a_{jk}a_{kl}a_{lm}a_{lp}$
      t_59      [[[[τ]]]]        120    1      $\sum f^J_K f^K_L f^L_M f^M_P f^P$             $\sum a_{jk}a_{kl}a_{lm}a_{mp}$

(For order 5 the sums in $F^J(t)(y)$ run over $K, L, M, P$ and those in $\Phi_j(t)$ over $k, l, m, p$.)


The Taylor Expansion of the True Solution 

We can now state the general result for the $q$th derivative of the true solution:

Theorem 2.6. The exact solution of (2.2) satisfies

$$
(y^J)^{(q)}(x_0) = \sum_{t \in LT_q} F^J(t)(y_0) = \sum_{t \in T_q} \alpha(t)\, F^J(t)(y_0). \tag{2.7;q}
$$
Proof. The theorem is true for $q = 1, 2, 3$ (see (2.7;1-3) above). For the computation
of, say, the 4th derivative, we have to differentiate (2.7;3). This consists of two
terms (corresponding to the two trees of (2.9)), each of which contains three factors
$f^{\ldots}_{\ldots}$ (corresponding to the three nodes of these trees). The differentiation of these
by Leibniz’ rule and insertion of (2.2) for the derivatives is geometrically just the
addition of a new branch with a new summation letter to each vertex (Fig. 2.1).

[Fig. 2.1. Derivatives of exact solution: the labelled trees belonging to (2.7;1), (2.7;2), (2.7;3) and (2.7;4)]

It is clear that by this process all labelled trees of order $q$ appear for the $q$th
derivative, each of them exactly once.

If we group together the terms with identical elementary differentials, we obtain
the second expression of (2.7;q). □
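The counting argument of this proof can be reproduced by brute force. The sketch below enumerates all son-father mappings of Definition 2.2 (with vertices numbered 1, ..., q instead of j, k, l, ...), reduces each labelled tree to a canonical unlabelled shape, and counts how often each shape occurs, which gives α(t). The encoding of shapes as sorted nested tuples and all function names are assumptions of this example.

```python
from collections import Counter
from itertools import product

def labelled_trees(q):
    """All mappings z -> t(z) with t(z) < z on the vertices 2..q (vertex 1 is the root)."""
    return product(*(range(1, z) for z in range(2, q + 1)))

def shape(fathers):
    """Canonical unlabelled form of a labelled tree: sorted nested tuples, () is tau."""
    q = len(fathers) + 1
    sons = {v: [] for v in range(1, q + 1)}
    for z, father in enumerate(fathers, start=2):
        sons[father].append(z)

    def canon(v):
        return tuple(sorted(canon(s) for s in sons[v]))

    return canon(1)

def alpha(q):
    """Map each tree of order q to its number of monotonic labellings."""
    return Counter(shape(t) for t in labelled_trees(q))

if __name__ == "__main__":
    for q in range(1, 6):
        counts = alpha(q)
        print(q, len(counts), sorted(counts.values()))
```

For each q the number of distinct shapes should agree with Table 2.1, and the multiplicities with the column α(t) of Table 2.2; their sum is the number of labelled trees, (q-1)!.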


Faà di Bruno’s Formula


Our next goal will be the computation of the $q$th derivative of the numerical solution
$y_1$ and of the $g_i$. For this, we have first to generalize the formulas (2.6;1) (the
chain rule) and (2.6;2) for the $q$th derivative of the composition of two functions.
We represent these two formulas graphically in Fig. 2.2.

Formula (2.6;2) consists of two terms; the first term contains three factors,
the second contains only two. Here the node “$l$” is a “dummy” node, not really
present in the formula, and just indicates that we have to take the second derivative.
The derivation of (2.6;2) will thus lead to five terms which we write down for the
convenience of the reader (but not for the convenience of the printer ...)

[Fig. 2.2. Derivatives of $f^J(g)$: the special labelled trees belonging to (2.6;1), (2.6;2) and (2.6;3)]

$$
\begin{aligned}
\bigl(f^J(g)\bigr)^{(3)} = {} & \sum_{K,L,M} f^J_{KLM}(g) \cdot (g^K)^{(1)}(g^L)^{(1)}(g^M)^{(1)} \\
& + \sum_{K,L} f^J_{KL}(g) \cdot (g^K)^{(2)}(g^L)^{(1)} + \sum_{K,L} f^J_{KL}(g) \cdot (g^K)^{(1)}(g^L)^{(2)} \\
& + \sum_{K,M} f^J_{KM}(g) \cdot (g^K)^{(2)}(g^M)^{(1)} + \sum_K f^J_K(g) \cdot (g^K)^{(3)}.
\end{aligned} \tag{2.6;3}
$$

The corresponding trees are represented in the third line of Fig. 2.2. Each time we
differentiate, we have to

i) differentiate the first factor $f^J_{K\ldots}$; i.e., we add a new branch to the root $j$;

ii) increase the derivative numbers of each of the $g$’s by 1; we represent this by
lengthening the corresponding branch.

Each time we add a new label. All trees which are obtained in this way are those
“special” trees which have no ramifications except at the root.

Definition 2.7. We denote by $LS_q$ the set of special labelled trees of order $q$ which
have no ramifications except at the root.

Lemma 2.8 (Faà di Bruno’s formula). For $q \ge 1$ we have

$$
\bigl(f^J(g)\bigr)^{(q-1)} = \sum_{u \in LS_q}\ \sum_{K_1, \ldots, K_m} f^J_{K_1, \ldots, K_m}(g)\,(g^{K_1})^{(\delta_1)} \cdots (g^{K_m})^{(\delta_m)}. \tag{2.6;q-1}
$$

Here, for $u \in LS_q$, $m$ is the number of branches leaving the root and $\delta_1, \ldots, \delta_m$
are the numbers of nodes in each of these branches, such that $q = 1 + \delta_1 + \ldots + \delta_m$.

□

Remark. The usual multinomial coefficients are absent here, as we use labelled
trees.
The Derivatives of the Numerical Solution

It is difficult to keep a cool head when discussing the various
derivatives ... (S. Gill 1956)

In order to generalize (2.5;1-3), we need the following definitions:

Definition 2.9. Let $t$ be a labelled tree with root $j$; we denote by

$$
\Phi_j(t) = \sum_{k,l,\ldots} a_{jk}\, a_{\ldots} \cdots
$$

the sum over the $q - 1$ remaining indices $k, l, \ldots$ (as in Definition 2.3). The summand
is a product of $q - 1$ $a$’s, where all fathers stand two by two with their sons
as indices. If the set $A_q$ is written as in (2.10), we have

$$
\Phi_{j_1}(t) = \sum_{j_2, \ldots, j_q} a_{t(j_2),j_2} \cdots a_{t(j_q),j_q}. \tag{2.13}
$$


Definition 2.10. For $t \in LT_q$ let $\gamma(t)$ be the product of $\varrho(t)$ and all orders of
the trees which appear, if the roots, one after another, are removed from $t$. (See
Fig. 2.3 or formula (2.17)).

[Fig. 2.3. Example for the definition of $\gamma(t)$: $\gamma(t) = 9 \cdot 2 \cdot 6 \cdot 4 = 432$]

The above expressions are of course independent of the labellings, so $\Phi_j(t)$ as
well as $\gamma(t)$ also make sense in $T_q$. Examples are given in Table 2.2.

Theorem 2.11. The derivatives of $g_i$ satisfy

$$
(g_i^J)^{(q)}\big|_{h=0} = \sum_{t \in LT_q} \gamma(t) \sum_j a_{ij}\, \Phi_j(t)\, F^J(t)(y_0). \tag{2.5;q}
$$

The numerical solution $y_1$ of (2.3) satisfies

$$
(y_1^J)^{(q)}\big|_{h=0} = \sum_{t \in LT_q} \gamma(t) \sum_j b_j\, \Phi_j(t)\, F^J(t)(y_0)
= \sum_{t \in T_q} \alpha(t)\gamma(t) \sum_j b_j\, \Phi_j(t)\, F^J(t)(y_0). \tag{2.14}
$$
Proof. Because of the similarity of $y_1$ and $g_i$ (see (2.3)) we only have to prove
the first equation. We do this by induction on $q$, in exactly the same way as we
obtained (2.5;1-3): we first apply Leibniz’ formula (2.4) to obtain

$$
(g_i^J)^{(q)}\big|_{h=0} = q \sum_j a_{ij} \bigl(f^J(g_j)\bigr)^{(q-1)}\big|_{h=0}. \tag{2.15}
$$

Next we use Faà di Bruno’s formula (Lemma 2.8). Finally we insert for the derivatives
$(g_j^{K_s})^{(\delta_s)}$, which appear in (2.6;q-1) with $\delta_s < q$, the induction hypothesis
(2.5;1) - (2.5;q-1) and rearrange the sums. This gives

$$
\begin{aligned}
(g_i^J)^{(q)}\big|_{h=0} = q \sum_{u \in LS_q}\ \sum_{t_1 \in LT_{\delta_1}} \cdots \sum_{t_m \in LT_{\delta_m}}
& \gamma(t_1) \cdots \gamma(t_m) \\
& \cdot \sum_j a_{ij} \sum_{k_1} a_{jk_1}\Phi_{k_1}(t_1) \cdots \sum_{k_m} a_{jk_m}\Phi_{k_m}(t_m) \\
& \cdot \sum_{K_1, \ldots, K_m} f^J_{K_1, \ldots, K_m}(y_0)\, F^{K_1}(t_1)(y_0) \cdots F^{K_m}(t_m)(y_0).
\end{aligned} \tag{2.16}
$$

The main difficulty is now to understand that to each tuple

$$
(u, t_1, \ldots, t_m) \quad\text{with}\quad u \in LS_q,\ t_s \in LT_{\delta_s}
$$

there corresponds a labelled tree $t \in LT_q$ such that

$$
\gamma(t) = q \cdot \gamma(t_1) \cdots \gamma(t_m) \tag{2.17}
$$

$$
F^J(t)(y) = \sum_{K_1, \ldots, K_m} f^J_{K_1, \ldots, K_m}(y)\, F^{K_1}(t_1)(y) \cdots F^{K_m}(t_m)(y) \tag{2.18}
$$

$$
\Phi_j(t) = \sum_{k_1, \ldots, k_m} a_{jk_1} \cdots a_{jk_m}\, \Phi_{k_1}(t_1) \cdots \Phi_{k_m}(t_m). \tag{2.19}
$$

This labelled tree $t$ is obtained if the branches of $u$ are replaced by the trees
$t_1, \ldots, t_m$ and the corresponding labels are taken over in a natural way, i.e., in
the same order (see Fig. 2.4 for some examples).

In this way, all trees $t \in LT_q$ appear exactly once. Thus (2.16) becomes (2.5;q)
after inserting (2.17), (2.18) and (2.19). □


The above construction of $t$ can also be used for a recursive definition of trees.
We first observe that the equivalence class of $t$ (in Fig. 2.4) depends only on the
equivalence classes of $t_1, \ldots, t_m$.

Definition 2.12. We denote by

$$
t = [t_1, \ldots, t_m] \tag{2.20}
$$

the tree, which leaves over the trees $t_1, \ldots, t_m$ when its root and the adjacent
branches are chopped off (Fig. 2.5).

[Fig. 2.4. Example for the bijection $(u, t_1, \ldots, t_m) \leftrightarrow t$]

[Fig. 2.5. Recursive definition of trees: $t = [t_1, \ldots, t_m]$]

With (2.20) all trees can be expressed in terms of $\tau$; e.g., $t_{21} = [\tau]$, $t_{31} = [\tau, \tau]$,
$t_{32} = [[\tau]]$, ... etc.
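The recursive notation (2.20) suggests a direct way of generating all trees of a given order: choose the orders of the subtrees as a partition of q - 1 and insert all known trees of lower order. The sketch below does this with trees encoded as sorted nested tuples (the empty tuple standing for τ) and evaluates γ(t) by the recursion (2.17); the encoding and the function names are assumptions of this example.

```python
from functools import lru_cache
from itertools import product

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

@lru_cache(maxsize=None)
def trees(q):
    """All trees of order q as canonical nested tuples t = [t1, ..., tm]."""
    result = set()
    for part in partitions(q - 1):
        for subtrees in product(*(trees(m) for m in part)):
            result.add(tuple(sorted(subtrees)))   # order of the subtrees is irrelevant
    return tuple(sorted(result))

def order(t):
    """Number of vertices of t."""
    return 1 + sum(order(s) for s in t)

def gamma(t):
    """gamma(t) = order(t) * gamma(t1) * ... * gamma(tm), formula (2.17)."""
    g = order(t)
    for s in t:
        g *= gamma(s)
    return g

if __name__ == "__main__":
    print([len(trees(q)) for q in range(1, 11)])
    print(sorted(gamma(t) for t in trees(5)))
```

The first printed list can be compared with Table 2.1 and the second with the column γ(t) of Table 2.2.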


The Order Conditions 

Comparing Theorems 2.6 and 2.11 we now obtain:

Theorem 2.13. A Runge-Kutta method (1.8) is of order $p$ iff

$$
\sum_{j=1}^{s} b_j \Phi_j(t) = \frac{1}{\gamma(t)} \tag{2.21}
$$

for all trees of order $\le p$.

Proof. While the “if” part is clear from the preceding discussion, the “only if”
part needs the fact that the elementary differentials for different trees are actually
independent. See Exercises 3 and 4 below. □
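Condition (2.21) can be tested tree by tree with a few lines of code. In the sketch below a tree is a nested tuple in the sense of (2.20), with the empty tuple for τ; Φ_j(t) is evaluated by the recursion (2.19) and γ(t) by (2.17). The function names and the hand-written list of trees are assumptions of this example; the tableau is the classical method of Table 1.2.

```python
from fractions import Fraction as Fr

def order(t):
    return 1 + sum(order(s) for s in t)

def gamma(t):
    g = order(t)
    for s in t:
        g *= gamma(s)                       # (2.17)
    return g

def phi(t, A):
    """The vector (Phi_j(t))_j, computed with the recursion (2.19)."""
    s = len(A)
    out = [Fr(1)] * s                       # Phi_j(tau) = 1 (empty product)
    for child in t:
        pc = phi(child, A)
        out = [out[j] * sum(A[j][k] * pc[k] for k in range(s)) for j in range(s)]
    return out

def satisfies(t, A, b):
    """True if the order condition (2.21) holds for the tree t."""
    return sum(bj * pj for bj, pj in zip(b, phi(t, A))) == Fr(1, gamma(t))

tau = ()
t21 = (tau,)
t31, t32 = (tau, tau), (t21,)
t41, t42, t43, t44 = (tau, tau, tau), (t21, tau), (t31,), (t32,)
TREES_UP_TO_4 = [tau, t21, t31, t32, t41, t42, t43, t44]

# Classical Runge-Kutta method (left tableau of Table 1.2)
A = [[Fr(0), Fr(0), Fr(0), Fr(0)],
     [Fr(1, 2), Fr(0), Fr(0), Fr(0)],
     [Fr(0), Fr(1, 2), Fr(0), Fr(0)],
     [Fr(0), Fr(0), Fr(1), Fr(0)]]
b = [Fr(1, 6), Fr(1, 3), Fr(1, 3), Fr(1, 6)]

if __name__ == "__main__":
    print([satisfies(t, A, b) for t in TREES_UP_TO_4])
    print(satisfies((tau, tau, tau, tau), A, b))   # a tree of order 5
```

All eight conditions up to order 4 hold for this tableau, while the condition for the bushy tree of order 5 fails, in agreement with the method being of order exactly 4.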
From Table 2.1 we then obtain the following number of order conditions (see
Table 2.3). One can thus understand that the construction of higher order
Runge-Kutta formulas is not an easy task.


Table 2.3. Number of order conditions

  order p              1    2    3    4    5    6    7    8     9     10
  no. of conditions    1    2    4    8    17   37   85   200   486   1205


Example. For the tree $t_{42}$ of Table 2.2 we have (using (1.9) for the second expression)

$$
\sum_{j,k,l,m} b_j a_{jk} a_{kl} a_{jm} = \sum_{j,k} b_j a_{jk} c_j c_k = \frac{1}{8},
$$

which is (1.11;f). All remaining conditions of (1.11) correspond to the other trees
of order $\le 4$.


Exercises 


1. Find all trees of order 6 and order 7.

Hint. Search for all representations of $p - 1$ as a sum of positive integers, and
then insert all known trees of lower order for each term in the sum. You may
also use a computer for general $p$.

2. (A. Cayley 1857). Denote the number of trees of order $q$ by $a_q$. Prove that

$$
a_1 + a_2x + a_3x^2 + a_4x^3 + \ldots = (1 - x)^{-a_1}(1 - x^2)^{-a_2}(1 - x^3)^{-a_3} \cdots
$$

Compare the result with Table 2.1.

3. Compute the elementary differentials of Table 2.2 for the case of the scalar
non-autonomous equation (2.1), i.e., $f^1 = 1$, $f^2 = f(x, y)$. One imagines the
complications met by the first authors (Kutta, Nyström, Huťa) in looking for
higher order conditions. Observe also that in this case the expressions for $t_{54}$
and $t_{57}$ are the same, so that here Theorem 2.13 is sufficient, but not necessary
for order 5.

Hint. For, say, $t_{54}$ we have non-zero derivatives only if $K = L = 2$. Letting
$M$ and $P$ run from 1 to 2 we then obtain

$$
F^2(t) = (f_x + ff_y)(f_{yx} + ff_{yy})f_y
$$

(see also Butcher 1963a).
4. Show that for every $t \in T_q$ there is a system of differential equations such that
$F^1(t)(y_0) = 1$ and $F^1(u)(y_0) = 0$ for all other trees $u$.

Hint. For $t_{54}$ this system would be

$$
y_1' = y_2y_5, \quad y_2' = y_3, \quad y_3' = y_4, \quad y_4' = 1, \quad y_5' = 1
$$

with all initial values $= 0$. Understand this and the general formula

$$
y'_{\text{father}} = \prod y_{\text{sons}}.
$$

5. Kutta (1901) claimed that the scheme given in Table 2.4 is of order 5. Was he
correct in his statement? Try to correct these values.

Result. The values for $a_{6j}$ ($j = 1, \ldots, 5$) should read $(6, 36, 10, 8, 0)/75$; the
correct values for $b_j$ are $(23, 0, 125, 0, -81, 125)/192$ (Nyström 1925).

Table 2.4. A method of Kutta

   0  |
  1/3 |  1/3
  2/5 |  4/25     6/25
   1  |  1/4     -3       15/4
  2/3 |  6/81    90/81   -50/81    8/81
  4/5 |  7/30    18/30    -5/30    4/30     0
  ----+---------------------------------------------------
      | 48/192    0      125/192    0     -81/192  100/192


6. Verify

$$
\sum_{\varrho(t) = p} \alpha(t) = (p - 1)!
$$

7. Prove that a Runge-Kutta method, when applied to a linear system

$$
y' = A(x)y + g(x), \tag{2.22}
$$

is of order $p$ iff

$$
\begin{aligned}
\sum_j b_j c_j^{q-1} &= 1/q && \text{for } q \le p \\
\sum_{j,k} b_j c_j^{q-1} a_{jk} c_k^{r-1} &= 1/((q+r)r) && \text{for } q + r \le p \\
\sum_{j,k,l} b_j c_j^{q-1} a_{jk} c_k^{r-1} a_{kl} c_l^{s-1} &= 1/((q+r+s)(r+s)s) && \text{for } q + r + s \le p
\end{aligned}
$$

... etc (write (2.22) in autonomous form and investigate which elementary
differentials vanish identically; see also Crouzeix 1975).



II.3 Error Estimation and Convergence
for RK Methods


Es fehlt indessen noch der Beweis dass diese Näherungs-Verfahren
convergent sind oder, was practisch wichtiger ist, es fehlt
ein Kriterium, um zu ermitteln, wie klein die Schritte gemacht
werden müssen, um eine vorgeschriebene Genauigkeit zu erreichen.
(Runge 1905)


Since the work of Lagrange (1797) and, above all, of Cauchy, a numerically established
result should be accompanied by a reliable error estimation (“... l’erreur
commise sera inférieure à ...”). Lagrange gave the well-known error bounds for
the Taylor polynomials and Cauchy derived bounds for the error of the Euler polygons
(see Section I.7). A couple of years after the first success of the Runge-Kutta
methods, Runge (1905) also required error estimates for these methods.


Rigorous Error Bounds 


Runge’s device for obtaining bounds for the error in one step (“local error”) can be
described in a few lines (free translation):

“For a method of order $p$ consider the local error

$$
e(h) = y(x_0 + h) - y_1 \tag{3.1}
$$

and use its Taylor expansion

$$
e(h) = e(0) + he'(0) + \ldots + \frac{h^p}{p!}\,e^{(p)}(\theta h) \tag{3.2}
$$

with $0 < \theta < 1$ and $e(0) = e'(0) = \ldots = e^{(p)}(0) = 0$. Now compute explicitly
$e^{(p)}(h)$, which will be of the form

$$
e^{(p)}(h) = E_1(h) + hE_2(h), \tag{3.3}
$$

where $E_1(h)$ and $E_2(h)$ contain partial derivatives of $f$ up to order $p - 1$ and
$p$ respectively. Further, because of $e^{(p)}(0) = 0$, we have $E_1(0) = 0$. Thus, if all
partial derivatives of $f$ up to order $p$ are bounded, we have $E_1(h) = O(h)$ and
$E_2(h) = O(1)$. So there is a constant $C$ such that $|e^{(p)}(h)| \le Ch$ and

$$
|e(h)| \le C\,\frac{h^{p+1}}{p!}. \text{”} \tag{3.4}
$$
A slightly different approach is adopted by Bieberbach (1923, 1. Abschn., Kap.
II, §7), explained in more detail in Bieberbach (1951): we write

$$
e(h) = y(x_0 + h) - y_1 = y(x_0 + h) - y_0 - h\sum_{i=1}^{s} b_ik_i \tag{3.5}
$$

and use the Taylor expansions

$$
\begin{aligned}
y(x_0 + h) &= y_0 + y'(x_0)h + y''(x_0)\frac{h^2}{2!} + \ldots + y^{(p+1)}(x_0 + \theta h)\frac{h^{p+1}}{(p+1)!} \\
k_i(h) &= k_i(0) + k_i'(0)h + \ldots + k_i^{(p)}(\theta_i h)\frac{h^p}{p!},
\end{aligned} \tag{3.6}
$$

where, for vector valued functions, the formula is valid componentwise with possibly
different $\theta$’s. The first terms in the $h$ expansion of (3.5) vanish because of the
order conditions. Thus we obtain


Theorem 3.1. If the Runge-Kutta method (1.8) is of order $p$ and if all partial
derivatives of $f(x, y)$ up to order $p$ exist (and are continuous), then the local error
of (1.8) admits the rigorous bound

$$
\|y(x_0 + h) - y_1\| \le h^{p+1}\Bigl(\frac{1}{(p+1)!}\max_{t \in [0,1]}\|y^{(p+1)}(x_0 + th)\|
+ \frac{1}{p!}\sum_{i=1}^{s}|b_i|\max_{t \in [0,1]}\|k_i^{(p)}(th)\|\Bigr) \tag{3.7}
$$

and hence also

$$
\|y(x_0 + h) - y_1\| \le Ch^{p+1}. \tag{3.8}
$$

□

□ 


Let us demonstrate this result on Runge’s first method (1.4), which is of order
$p = 2$, applied to a scalar differential equation. Differentiating (1.1) we obtain

$$
y^{(3)}(x) = \bigl(f_{xx} + 2f_{xy}f + f_{yy}f^2 + f_y(f_x + f_yf)\bigr)(x, y(x)) \tag{3.9}
$$

while the second derivative of $k_2(h) = f(x_0 + \frac{h}{2}, y_0 + \frac{h}{2}f_0)$ is given by

$$
k_2^{(2)}(h) = \frac{1}{4}\Bigl(f_{xx}\bigl(x_0 + \tfrac{h}{2}, y_0 + \tfrac{h}{2}f_0\bigr) + 2f_{xy}(\ldots)f_0 + f_{yy}(\ldots)f_0^2\Bigr) \tag{3.10}
$$

($f_0$ stands for $f(x_0, y_0)$). Under the assumptions of Theorem 3.1 we see that the
expressions (3.9) and (3.10) are bounded by a constant independent of $h$, which
gives (3.8).
The Principal Error Term

For higher order methods rigorous error bounds, like (3.7), become very impractical.
It is therefore much more realistic to consider the first non-zero term in the
Taylor expansion of the error. For autonomous systems of equations (2.2), the error
term is best obtained by subtracting the Taylor series and using (2.14) and (2.7;q).

Theorem 3.2. If the Runge-Kutta method is of order $p$ and if $f$ is $(p + 1)$-times
continuously differentiable, we have

$$
y^J(x_0 + h) - y_1^J = \frac{h^{p+1}}{(p+1)!}\sum_{t \in T_{p+1}} \alpha(t)\,e(t)\,F^J(t)(y_0) + O(h^{p+2}) \tag{3.11}
$$

where

$$
e(t) = 1 - \gamma(t)\sum_{j=1}^{s} b_j\Phi_j(t). \tag{3.12}
$$

$\gamma(t)$ and $\Phi_j(t)$ are given in Definitions 2.9 and 2.10; see also formulas (2.17)
and (2.19). The expressions $e(t)$ are called the error coefficients.

Example 3.3. For the two-parameter family of 4th order RK methods (1.17) the
error coefficients for the 9 trees of Table 2.2 are ($c_2 = u$, $c_3 = v$):

$$
\begin{aligned}
e(t_{51}) &= -\frac{1}{4} + \frac{5}{12}(u + v) - \frac{5}{6}uv, &
e(t_{52}) &= \frac{5}{12}v - \frac{1}{4}, \\
e(t_{53}) &= \frac{5}{8}u - \frac{1}{4}, &
e(t_{54}) &= -\frac{1}{4}, \\
e(t_{55}) &= 1 - \frac{5\bigl(b_4 + b_3(3 - 4v)^2\bigr)}{144\,b_3b_4(1 - v)^2}, &
e(t_{56}) &= -4e(t_{51}), \\
e(t_{57}) &= -4e(t_{52}), \qquad e(t_{58}) = -4e(t_{53}), &
e(t_{59}) &= -4e(t_{54}).
\end{aligned} \tag{3.13}
$$

Proof. The last four formulas follow from (1.12). $e(t_{59})$ is trivial, $e(t_{58})$ and
$e(t_{57})$ follow from (1.11h). Further

$$
e(t_{51}) = 5\int_0^1 t(t - 1)(t - u)(t - v)\,dt
$$

expresses the quadrature error. For $e(t_{55})$ one best introduces $d_i = \sum_j a_{ij}c_j$ such
that $e(t_{55}) = 1 - 20\sum_i b_id_i^2$. Then from (1.11d,f) one obtains

$$
d_2 = 0, \qquad b_3d_3 = \frac{1}{24(1 - v)}, \qquad b_4d_4 = \frac{3 - 4v}{24(1 - v)}.
$$
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For the classical 4 th order method (Table 1.2a) these error coefficients are 
given by Kutta (1901), p. 448 (see also Lotkin 1951) as follows 

_i 1 _i J H _I o 

V 24’ 24’ 16’ 4’ 3’ 6’ 6’ 4’ / 

Kutta remarked that for the second method (Table 1.2b) (“Als besser noch erweist
sich ... ”) the error coefficients become

$$\Bigl(-\frac1{54},\ \frac1{36},\ -\frac1{24},\ -\frac14,\ -\frac19,\ \frac2{27},\ -\frac19,\ \frac16,\ 1\Bigr)$$

which, with the exception of the 4th and 9th term, are all smaller than for the above
method. A tedious calculation was undertaken by Ralston (1962) (and by many
others) to determine optimal coefficients of (1.17). For solutions which minimize
the constants (3.13), see Exercise 3 below.
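
The error coefficients (3.12) can also be evaluated directly from a Butcher tableau. The following is a minimal sketch in exact rational arithmetic for the second method of Table 1.2 (the 3/8 rule). The helper names (`times`, `apply_A`, `power`, `error_coefficient`) are illustrative only, and the assignment of the labels $t_{51},\ldots,t_{59}$ to tree shapes is an assumption inferred from the relations in (3.13), not taken from Table 2.2.

```python
from fractions import Fraction as F

# 3/8 rule: coefficients a_ij (strictly lower triangular) and weights b_i
A = [[F(0), F(0), F(0), F(0)],
     [F(1, 3), F(0), F(0), F(0)],
     [F(-1, 3), F(1), F(0), F(0)],
     [F(1), F(-1), F(1), F(0)]]
b = [F(1, 8), F(3, 8), F(3, 8), F(1, 8)]
c = [sum(row) for row in A]            # c_i = sum_j a_ij, condition (1.9)

def times(u, v):
    """Componentwise product of two stage vectors."""
    return [x * y for x, y in zip(u, v)]

def apply_A(v):
    """Stage vector with components sum_j a_ij v_j."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power(v, n):
    """Componentwise n-th power of a stage vector."""
    return [x ** n for x in v]

d = apply_A(c)                         # d_i = sum_j a_ij c_j

# (gamma(t), Phi(t)) for the nine trees of order 5 (labelling assumed, see lead-in)
trees = [
    (5,   power(c, 4)),                      # t_51 = [tau, tau, tau, tau]
    (10,  times(power(c, 2), d)),            # t_52 = [tau, tau, [tau]]
    (15,  times(c, apply_A(power(c, 2)))),   # t_53 = [tau, [tau, tau]]
    (30,  times(c, apply_A(d))),             # t_54 = [tau, [[tau]]]
    (20,  times(d, d)),                      # t_55 = [[tau], [tau]]
    (20,  apply_A(power(c, 3))),             # t_56 = [[tau, tau, tau]]
    (40,  apply_A(times(c, d))),             # t_57 = [[tau, [tau]]]
    (60,  apply_A(apply_A(power(c, 2)))),    # t_58 = [[[tau, tau]]]
    (120, apply_A(apply_A(d))),              # t_59 = [[[[tau]]]]
]

def error_coefficient(gamma, phi):
    """e(t) = 1 - gamma(t) * sum_j b_j Phi_j(t), formula (3.12)."""
    return 1 - gamma * sum(bj * pj for bj, pj in zip(b, phi))

coeffs = [error_coefficient(g, phi) for g, phi in trees]
print(coeffs)
```

Replacing `A` and `b` by another 4-stage tableau of the family (1.17) gives the corresponding nine coefficients, which can be compared with the closed forms (3.13) for $u = c_2$, $v = c_3$.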


Estimation of the Global Error 

Das war auch eine aufregende Zeit ... (P. Henrici 1983)

The global error is the error of the computed solution after several steps. Suppose
that we have a one-step method which, given an initial value $(x_0, y_0)$ and a step
size $h$, computes a numerical solution $y_1$ approximating $y(x_0 + h)$. We shall
denote this process by Henrici’s notation

$$y_1 = y_0 + h\Phi(x_0, y_0, h) \tag{3.14}$$

and call $\Phi$ the increment function of the method.

The numerical solution for a point $X > x_0$ is then obtained by a step-by-step
procedure

$$y_{i+1} = y_i + h_i\Phi(x_i, y_i, h_i), \qquad h_i = x_{i+1} - x_i, \qquad x_N = X \tag{3.15}$$

and our task is to estimate the global error

$$E = y(X) - y_N. \tag{3.16}$$

This estimate is found in a simple way, very similar to Cauchy’s convergence proof 
for Theorem 7.3 of Chapter I: the local errors are transported to the final point x N 
and then added up. This “error transport” can be done in two different ways: 

a) either along the exact solution curves (see Fig. 3.1); this method can yield
sharp results when sharp estimates of error propagation for the exact solutions
are known, e.g., from Theorem 10.6 of Chapter I based on the logarithmic norm
$\mu(\partial f/\partial y)$.

b) or along $N - i$ steps of the numerical method (see Fig. 3.2); this is the
method used in the proofs of Cauchy (1824) and Runge (1905), it generalizes
easily to multistep methods (see Chapter III) and will be an important tool for
the existence of asymptotic expansions (see II.8).

Fig. 3.1. Global error estimation, method (a)

Fig. 3.2. Global error estimation, method (b)


In both cases we first estimate the local errors $e_i$ with the help of Theorem 3.1
to obtain

$$\|e_i\| \le C\cdot h_{i-1}^{p+1}. \tag{3.17}$$

Warning. The $e_i$ of Fig. 3.1 and Fig. 3.2, for $i \ne 1$, are not the same, but they
allow similar estimates.

We then estimate the transported errors $E_i$: for method (a) we use the known
results from Chapter I, especially Theorem I.10.6, Theorem I.10.2, or formula
(I.7.17). The result is

Theorem 3.4. Let $U$ be a neighbourhood of $\{(x, y(x)) \mid x_0 \le x \le X\}$ where $y(x)$
is the exact solution of (1.1). Suppose that in $U$

$$\Bigl\|\frac{\partial f}{\partial y}\Bigr\| \le L \qquad\text{or}\qquad
  \mu\Bigl(\frac{\partial f}{\partial y}\Bigr) \le L, \tag{3.18}$$

and that the local error estimates (3.17) are valid in $U$. Then the global error (3.16)
can be estimated by

$$\|E\| \le h^p\,\frac{C'}{L}\bigl(\exp(L(X - x_0)) - 1\bigr) \tag{3.19}$$

where $h = \max h_i$,

$$C' = \begin{cases} C & L \ge 0\\ C\exp(-Lh) & L < 0,\end{cases}$$

and $h$ is small enough for the numerical solution to remain in $U$.

Remark. For $L \to 0$ the estimate (3.19) tends to $h^pC(x_N - x_0)$.












Proof. From Theorem I.10.2 (with $\varepsilon = 0$) or Theorem I.10.6 (with $\delta = 0$) we
obtain

$$\|E_i\| \le \exp(L(x_N - x_i))\,\|e_i\|. \tag{3.20}$$

We then insert this together with (3.17) into

$$\|E\| \le \sum_{i=1}^{N}\|E_i\|.$$

Using $h_{i-1}^{p+1} \le h^p\cdot h_{i-1}$ this leads to

$$\|E\| \le h^pC\bigl(h_0\exp(L(x_N - x_1)) + h_1\exp(L(x_N - x_2)) + \ldots\bigr).$$

The expression in large brackets can be bounded by

$$\int_{x_0}^{x_N}\exp(L(x_N - x))\,dx \quad\text{for } L \ge 0, \tag{3.21}$$

$$\int_{x_0}^{x_N}\exp(L(x_N - h - x))\,dx \quad\text{for } L < 0 \tag{3.22}$$

(see Fig. 3.3). This gives (3.19). □



Fig. 3.3. Estimation of Riemann sums 


For the second method (b) we need an estimate for $\|z_{i+1} - y_{i+1}\|$ in terms of
$\|z_i - y_i\|$, where, besides (3.15),

$$z_{i+1} = z_i + h_i\Phi(x_i, z_i, h_i)$$

is a second pair of numerical solutions. For RK-methods $z_{i+1}$ is defined by

$$\ell_1 = f(x_i, z_i), \qquad \ell_2 = f(x_i + c_2h_i,\ z_i + h_ia_{21}\ell_1), \quad\text{etc.}$$

We now subtract formulas (1.8) from this and obtain

$$\|\ell_1 - k_1\| \le L\|z_i - y_i\|, \qquad
  \|\ell_2 - k_2\| \le L(1 + |a_{21}|h_iL)\|z_i - y_i\|, \quad\text{etc.}$$

This leads to the following

Lemma 3.5. Let $L$ be a Lipschitz constant for $f$ and let $h_i \le h$. Then the
increment function $\Phi$ of method (1.8) satisfies

$$\|\Phi(x_i, z_i, h_i) - \Phi(x_i, y_i, h_i)\| \le \Lambda\|z_i - y_i\| \tag{3.23}$$

where

$$\Lambda = L\Bigl(\sum_i|b_i| + hL\sum_{i,j}|b_ia_{ij}| + h^2L^2\sum_{i,j,k}|b_ia_{ij}a_{jk}| + \ldots\Bigr). \tag{3.24}$$

From (3.23) we obtain

$$\|z_{i+1} - y_{i+1}\| \le (1 + h_i\Lambda)\|z_i - y_i\| \le \exp(h_i\Lambda)\|z_i - y_i\| \tag{3.25}$$

and for the errors in Fig. 3.2,

$$\|E_i\| \le \exp(\Lambda(x_N - x_i))\,\|e_i\| \tag{3.26}$$

instead of (3.20). The same proof as for Theorem 3.4 now gives us

Theorem 3.6. Suppose that the local error satisfies, for initial values on the exact
solution,

$$\|y(x+h) - y(x) - h\Phi(x, y(x), h)\| \le Ch^{p+1}, \tag{3.27}$$

and suppose that in a neighbourhood of the solution the increment function $\Phi$
satisfies

$$\|\Phi(x, z, h) - \Phi(x, y, h)\| \le \Lambda\|z - y\|. \tag{3.28}$$

Then the global error (3.16) can be estimated by

$$\|E\| \le h^p\,\frac{C}{\Lambda}\bigl(\exp(\Lambda(x_N - x_0)) - 1\bigr) \tag{3.29}$$

where $h = \max h_i$.

□
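
The factor $h^p$ in (3.29) can be observed numerically. The sketch below is an illustration only: it assumes the classical 4th order method, the test problem $y' = y$, $y(0) = 1$ on $[0, 1]$ and constant step sizes (none of which is prescribed by the theorem), and compares the global errors for $h$, $h/2$ and $h/4$. For $p = 4$ each halving should reduce the error by a factor close to $2^4 = 16$.

```python
import math

def rk4_step(f, x, y, h):
    """One step of the classical 4th order Runge-Kutta method (scalar y)."""
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h / 2 * k1)
    k3 = f(x + h / 2, y + h / 2 * k2)
    k4 = f(x + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def global_error(n_steps):
    """Global error |y(1) - y_N| for y' = y, y(0) = 1 with N equal steps."""
    f = lambda x, y: y
    h = 1.0 / n_steps
    y = 1.0
    for i in range(n_steps):
        y = rk4_step(f, i * h, y, h)
    return abs(math.e - y)

errors = [global_error(n) for n in (10, 20, 40)]
ratios = [errors[i] / errors[i + 1] for i in range(2)]
print(errors)
print(ratios)   # observed reduction factors when h is halved
```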

Exercises

1. (Runge 1905). Show that for explicit Runge-Kutta methods with $b_i \ge 0$, $a_{ij} \ge 0$
(all $i, j$) of order $s$ the Lipschitz constant $\Lambda$ for $\Phi$ satisfies

$$1 + h\Lambda < \exp(hL)$$

and that (3.29) is valid with $\Lambda$ replaced by $L$.

2. Show that $e(t_{55})$ of (3.13) becomes

$$e(t_{55}) = 1 - 5\,\frac{(4v^2 - 15v + 9) - u(6v^2 - 42v + 27) - u^2(26v - 18)}
  {12(1 - 2u)(6uv - 4(u+v) + 3)}$$

after inserting (1.17).

3. Determine $u$ and $v$ in (1.17) such that in (3.13)

a) $\max_{i=5,6,7,8}|e(t_{5i})| = \min$          b) $\sum_{i=1}^{9}|e(t_{5i})| = \min$

c) $\max_{i=5,6,7,8}\alpha(t_{5i})|e(t_{5i})| = \min$   d) $\sum_{i=1}^{9}\alpha(t_{5i})|e(t_{5i})| = \min$

Results.

a) u = 0.3587, v = 0.6346, min = 0.1033;

b) u = 0.3995, v = 0.6, min = 1.55;

c) u = 0.3501, v = 0.5839, min = 0.1248;

d) u = 0.3716, v = 0.6, min = 2.53.

Such optimal formulas were first studied by Ralston (1962), Hull & Johnston 
(1964), and Hull (1967). 

4. Apply an explicit Runge-Kutta method to the problem $y' = f(x, y)$, $y(0) = 0$,
where

$$f(x, y) = \begin{cases}\dfrac{\lambda}{x}\,y + g(x) & \text{if } x > 0\\[1ex]
  (1-\lambda)^{-1}g(0) & \text{if } x = 0,\end{cases}$$

$\lambda \le 0$ and $g(x)$ is sufficiently differentiable (see Exercise 10 of Section I.5).

a) Show that the error after the first step is given by

$$y(h) - y_1 = C_2h^2g'(0) + O(h^3)$$

where $C_2$ is a constant depending on $\lambda$ and on the coefficients of the
method. Also for high order methods we have in general $C_2 \ne 0$.

b) Compute $C_2$ for the classical 4th order method (Table 1.2).


Ich glaube indessen, dass ein practischer Rechner sich meistens
mit der geringeren Sicherheit begnügen wird, die er aus der
Uebereinstimmung seiner Resultate für grössere und kleinere Schritte
gewinnt. (C. Runge 1895)


Even the simplified error estimates of Section II.3, which are content with the leading
error term, are of little practical interest, because they require the computation
and majorization of several partial derivatives of high orders. But the main advantage
of Runge-Kutta methods, compared with Taylor series, is precisely that the
computation of derivatives should be no longer necessary. However, since practical
error estimates are necessary (on the one hand to ensure that the step sizes
$h_i$ are chosen sufficiently small to yield the required precision of the computed
results, and on the other hand to ensure that the step sizes are sufficiently large to
avoid unnecessary computational work), we shall now discuss alternative methods
for error estimates.

The oldest device, used by Runge in his numerical examples, is to repeat the
computations with halved step sizes and to compare the results: those digits which
haven’t changed are assumed to be correct (“... woraus ich schliessen zu dürfen
glaube ...”).


Richardson Extrapolation 

... its usefulness for practical computations can hardly be
overestimated. (G. Birkhoff & G.C. Rota)

The idea of Richardson, announced in his classical paper Richardson (1910) which
treats mainly partial differential equations, and explained in full detail in
Richardson (1927), is to use more carefully the known behaviour of the error as a
function of $h$.

Suppose that, with a given initial value $(x_0, y_0)$ and step size $h$, we compute
two steps, using a fixed Runge-Kutta method of order $p$, and obtain the numerical
results $y_1$ and $y_2$. We then compute, starting from $(x_0, y_0)$, one big step with step
size $2h$ to obtain the solution $w$. The error of $y_1$ is known to be (Theorem 3.2)

$$e_1 = y(x_0 + h) - y_1 = C\cdot h^{p+1} + O(h^{p+2}) \tag{4.1}$$

where $C$ contains the error coefficients of the method and the elementary
differentials $F^J(t)(y_0)$ of order $p + 1$. The error of $y_2$ is composed of two parts: the
transported error of the first step, which is

$$\Bigl(I + h\frac{\partial f}{\partial y} + O(h^2)\Bigr)e_1,$$

and the local error of the second step, which is the same as (4.1), but with the
elementary differentials evaluated at $y_1 = y_0 + O(h)$. Thus we obtain

$$\begin{aligned}
e_2 = y(x_0 + 2h) - y_2 &= \bigl(I + O(h)\bigr)Ch^{p+1} + \bigl(C + O(h)\bigr)h^{p+1} + O(h^{p+2})\\
  &= 2Ch^{p+1} + O(h^{p+2}).
\end{aligned} \tag{4.2}$$

Similarly to (4.1), we have for the big step

$$y(x_0 + 2h) - w = C(2h)^{p+1} + O(h^{p+2}). \tag{4.3}$$

Neglecting the terms $O(h^{p+2})$, formulas (4.2) and (4.3) allow us to eliminate the
unknown constant $C$ and to “extrapolate” a better value $\hat y_2$ for $y(x_0 + 2h)$, for
which we obtain:

Theorem 4.1. Suppose that $y_2$ is the numerical result of two steps with step size $h$
of a Runge-Kutta method of order $p$, and $w$ is the result of one big step with step
size $2h$. Then the error of $y_2$ can be extrapolated as

$$y(x_0 + 2h) - y_2 = \frac{y_2 - w}{2^p - 1} + O(h^{p+2}) \tag{4.4}$$

and

$$\hat y_2 = y_2 + \frac{y_2 - w}{2^p - 1} \tag{4.5}$$

is an approximation of order $p + 1$ to $y(x_0 + 2h)$. □


Formula (4.4) is a very simple device to estimate the error of y 2 and formula 
(4.5) allows one to increase the precision by one additional order (“... The better 
theory of the following sections is complicated, and tends thereby to suggest that 
the practice may also be complicated; whereas it is really simple.” Richardson). 
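
As an illustration of (4.4) and (4.5), the following sketch applies the device to the classical 4th order method. The test problem $y' = y$, $y(0) = 1$, the step size $h = 0.1$ and the function names are assumptions chosen for this example; they are not taken from the text.

```python
import math

def rk4_step(f, x, y, h):
    """One step of the classical 4th order Runge-Kutta method (scalar y)."""
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h / 2 * k1)
    k3 = f(x + h / 2, y + h / 2 * k2)
    k4 = f(x + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def richardson(f, x0, y0, h, p=4):
    """Two steps of size h, one step of size 2h, then (4.4) and (4.5)."""
    y1 = rk4_step(f, x0, y0, h)
    y2 = rk4_step(f, x0 + h, y1, h)
    w = rk4_step(f, x0, y0, 2 * h)
    est = (y2 - w) / (2 ** p - 1)       # estimate of y(x0 + 2h) - y2, formula (4.4)
    return y2, est, y2 + est            # last entry: extrapolated value (4.5)

f = lambda x, y: y
y2, est, y2_hat = richardson(f, 0.0, 1.0, 0.1)
exact = math.exp(0.2)
print("true error of y2   :", exact - y2)
print("estimated error    :", est)
print("error after (4.5)  :", exact - y2_hat)
```

The estimate costs three steps in total where two would advance the solution, which is the price that the embedded formulas of the next subsection try to avoid.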


Embedded Runge-Kutta Formulas 

Scraton is right in his criticism of Merson’s process, although 
Merson did not claim as much for his process as some people 
expect. (R. England 1969) 

The idea is, rather than using Richardson extrapolation, to construct Runge-Kutta
formulas which themselves contain, besides the numerical approximation $y_1$, a
second approximation $\hat y_1$. The difference then yields an estimate of the local error
for the less precise result and can be used for step size control (see below). Since
it is at our disposal at every step, this gives more flexibility to the code and makes
step rejections less expensive.

We consider two Runge-Kutta methods (one for $y_1$ and one for $\hat y_1$) such that
both use the same function values. We thus have to find a scheme of coefficients

$$\begin{array}{c|ccccc}
0 & & & & & \\
c_2 & a_{21} & & & & \\
c_3 & a_{31} & a_{32} & & & \\
\vdots & \vdots & \vdots & \ddots & & \\
c_s & a_{s1} & a_{s2} & \cdots & a_{s,s-1} & \\ \hline
 & b_1 & b_2 & \cdots & b_{s-1} & b_s \\ \hline
 & \hat b_1 & \hat b_2 & \cdots & \hat b_{s-1} & \hat b_s
\end{array} \tag{4.6}$$

such that

$$y_1 = y_0 + h(b_1k_1 + \ldots + b_sk_s) \tag{4.7}$$

is of order $p$, and

$$\hat y_1 = y_0 + h(\hat b_1k_1 + \ldots + \hat b_sk_s) \tag{4.7'}$$

is of order $\hat p$ (usually $\hat p = p - 1$ or $\hat p = p + 1$). The approximation $y_1$ is used to
continue the integration.

From Theorem 2.13, we have to satisfy the conditions

$$\sum_{j=1}^{s}b_j\Phi_j(t) = \frac{1}{\gamma(t)} \quad\text{for all trees of order} \le p, \tag{4.8}$$

$$\sum_{j=1}^{s}\hat b_j\Phi_j(t) = \frac{1}{\gamma(t)} \quad\text{for all trees of order} \le \hat p. \tag{4.8'}$$

The first methods of this type were proposed by Merson (1957), Ceschino (1962),
and Zonneveld (1963). Those of Merson and Zonneveld are given in Tables 4.1 and
4.2. Here, “name $p(\hat p)$” means that the order of $y_1$ is $p$ and the order of the error
estimator $\hat y_1$ is $\hat p$. Merson’s $\hat y_1$ is of order 5 only for linear equations with constant
coefficients; for nonlinear problems it is of order 3. This method works quite well
and has been used very often, especially by NAG users. Further embedded methods
were then derived by Sarafyan (1966), England (1969), and Fehlberg (1964, 1968,
1969). Let us start with the construction of some low order embedded methods.


Methods of order 3(2). It is a simple task to construct embedded formulas of order
3(2) with $s = 3$ stages. Just take a 3-stage method of order 3 (Exercise II.1.4) and
put $\hat b_3 = 0$, $\hat b_2 = 1/(2c_2)$, $\hat b_1 = 1 - 1/(2c_2)$.

Methods of order 4(3). With $s = 4$ it is impossible to find a pair of order 4(3)
(see Exercise 2). The idea is to add $y_1$ as 5th stage of the process (i.e., $a_{5i} = b_i$
for $i = 1, \ldots, 4$) and to search for a third order method which uses all five function
values. Whenever the step is accepted this represents no extra work, because
$f(x_0 + h, y_1)$ has to be computed anyway for the following step. This idea is called
FSAL (First Same As Last). Then the order conditions (4.8’) with $\hat p = 3$ represent
4 linear equations for the five unknowns $\hat b_1, \ldots, \hat b_5$. One can arbitrarily fix $\hat b_5 \ne 0$
and solve the system for the remaining parameters. With $\hat b_5$ chosen such that $\hat b_4 = 0$
the result is

$$\hat b_1 = 2b_1 - 1/6, \quad \hat b_2 = 2(1 - c_2)b_2, \quad \hat b_3 = 2(1 - c_3)b_3, \quad
  \hat b_4 = 0, \quad \hat b_5 = 1/6. \tag{4.9}$$
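
Formula (4.9) can be checked in exact arithmetic. The sketch below assumes the 3/8 rule of Table 1.2 as the underlying 4th order method, appends the FSAL stage $a_{5i} = b_i$, and tests the four order conditions (4.8’) for $\hat p = 3$. The variable names are illustrative.

```python
from fractions import Fraction as F

# Weights of the 3/8 rule and its tableau extended by the FSAL stage a_5i = b_i
b = [F(1, 8), F(3, 8), F(3, 8), F(1, 8)]
A = [[F(0), F(0), F(0), F(0), F(0)],
     [F(1, 3), F(0), F(0), F(0), F(0)],
     [F(-1, 3), F(1), F(0), F(0), F(0)],
     [F(1), F(-1), F(1), F(0), F(0)],
     b + [F(0)]]
c = [sum(row) for row in A]                       # c = (0, 1/3, 2/3, 1, 1)

# Embedded weights according to (4.9)
b_hat = [2 * b[0] - F(1, 6),
         2 * (1 - c[1]) * b[1],
         2 * (1 - c[2]) * b[2],
         F(0),
         F(1, 6)]

d = [sum(a * cj for a, cj in zip(row, c)) for row in A]   # d_i = sum_j a_ij c_j

# The four order conditions up to order 3: trees tau, [tau], [tau,tau], [[tau]]
checks = {
    "sum b_hat_i = 1":         sum(b_hat) == 1,
    "sum b_hat_i c_i = 1/2":   sum(w * ci for w, ci in zip(b_hat, c)) == F(1, 2),
    "sum b_hat_i c_i^2 = 1/3": sum(w * ci ** 2 for w, ci in zip(b_hat, c)) == F(1, 3),
    "sum b_hat_i d_i = 1/6":   sum(w * di for w, di in zip(b_hat, d)) == F(1, 6),
}
print(b_hat)
print(checks)
```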


Automatic Step Size Control 

D’ordinaire, on se contente de multiplier ou de diviser par 2 la 
valeur du pas ... (Ceschino 1961) 

We now want to write a code which automatically adjusts the step size in order to
achieve a prescribed tolerance of the local error.

Whenever a starting step size $h$ has been chosen, the program computes two
approximations to the solution, $y_1$ and $\hat y_1$. Then an estimate of the error for the
less precise result is $y_1 - \hat y_1$. We want this error to satisfy componentwise

$$|y_{1i} - \hat y_{1i}| \le sc_i, \qquad sc_i = Atol_i + \max(|y_{0i}|, |y_{1i}|)\cdot Rtol_i \tag{4.10}$$

where $Atol_i$ and $Rtol_i$ are the desired tolerances prescribed by the user (relative
errors are considered for $Atol_i = 0$, absolute errors for $Rtol_i = 0$; usually both
tolerances are different from zero; they may depend on the component of the
solution). As a measure of the error we take

$$err = \sqrt{\frac1n\sum_{i=1}^{n}\Bigl(\frac{y_{1i} - \hat y_{1i}}{sc_i}\Bigr)^2}\,; \tag{4.11}$$

other norms, such as the max norm, are also of frequent use. Then $err$ is compared
to 1 in order to find an optimal step size. From the error behaviour $err \approx C\cdot h^{q+1}$
and from $1 \approx C\cdot h_{opt}^{q+1}$ (where $q = \min(p, \hat p)$) the optimal step size is obtained as
(“... le procédé connu”, Ceschino 1961)

$$h_{opt} = h\cdot(1/err)^{1/(q+1)}. \tag{4.12}$$

Some care is now necessary for a good code: we multiply (4.12) by a safety factor
$fac$, usually $fac = 0.8$, $0.9$, $(0.25)^{1/(q+1)}$, or $(0.38)^{1/(q+1)}$, so that the error will
be acceptable the next time with high probability. Further, $h$ is not allowed to
increase nor to decrease too fast. For example, we may put

$$h_{new} = h\cdot\min\bigl(facmax,\ \max(facmin,\ fac\cdot(1/err)^{1/(q+1)})\bigr) \tag{4.13}$$

for the new step size. Then, if $err \le 1$, the computed step is accepted and the
solution is advanced with $y_1$ and a new step is tried with $h_{new}$ as step size. Else,
the step is rejected and the computations are repeated with the new step size $h_{new}$.
The maximal step size increase $facmax$, usually chosen between 1.5 and 5, prevents
the code from too large step increases and contributes to its safety. It is clear
that, when chosen too small, it may also unnecessarily increase the computational
work. It is also advisable to put $facmax = 1$ in the steps right after a step-rejection
(Shampine & Watts 1979).
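
The control loop described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the experiments in this section: it takes the 3/8 rule with the embedded formula (4.9) as the pair, scalar tolerances, and the values `fac = 0.9`, `facmin = 0.2`, `facmax = 5` as one possible choice. The function names and the guard `max(err, 1e-16)` against a zero error estimate are additions made for this example.

```python
import math

def step_rk38(f, x, y, h, k1):
    """One step of the 3/8 rule; returns y1, the difference y1 - yhat1, and k5."""
    n = len(y)
    def shift(coeffs, ks):
        return [y[i] + h * sum(a * k[i] for a, k in zip(coeffs, ks)) for i in range(n)]
    k2 = f(x + h / 3, shift([1 / 3], [k1]))
    k3 = f(x + 2 * h / 3, shift([-1 / 3, 1], [k1, k2]))
    k4 = f(x + h, shift([1, -1, 1], [k1, k2, k3]))
    y1 = shift([1 / 8, 3 / 8, 3 / 8, 1 / 8], [k1, k2, k3, k4])
    k5 = f(x + h, y1)                  # FSAL stage, reused as k1 of the next step
    # b - bhat from (4.9) for the 3/8 rule: (1/24, -1/8, 1/8, 1/8, -1/6)
    diff = [h * (k1[i] / 24 - k2[i] / 8 + k3[i] / 8 + k4[i] / 8 - k5[i] / 6)
            for i in range(n)]
    return y1, diff, k5

def integrate(f, x0, y0, x_end, h, atol=1e-4, rtol=1e-4,
              fac=0.9, facmin=0.2, facmax=5.0, q=3):
    """Variable step size integration with the controller (4.10)-(4.13)."""
    x, y = x0, list(y0)
    k1 = f(x, y)
    accepted = rejected = 0
    limit = facmax
    while x < x_end:
        last = h >= x_end - x
        if last:
            h = x_end - x              # do not step beyond the end point
        y1, diff, k5 = step_rk38(f, x, y, h, k1)
        sc = [atol + max(abs(u), abs(v)) * rtol for u, v in zip(y, y1)]      # (4.10)
        err = math.sqrt(sum((e / s) ** 2 for e, s in zip(diff, sc)) / len(y))  # (4.11)
        err = max(err, 1e-16)          # guard against division by zero
        h_new = h * min(limit, max(facmin, fac * (1 / err) ** (1 / (q + 1))))  # (4.13)
        if err <= 1.0:                 # accept: advance with y1
            x = x_end if last else x + h
            y, k1 = y1, k5
            accepted += 1
            limit = facmax
        else:                          # reject: repeat, no increase right afterwards
            rejected += 1
            limit = 1.0
        h = h_new
    return y, accepted, rejected

y_end, n_acc, n_rej = integrate(lambda x, y: [-y[0]], 0.0, [1.0], 2.0, h=0.1)
print(y_end[0], math.exp(-2.0), n_acc, n_rej)
```

Since the solution is advanced with the 4th order result while the tolerance is imposed on the difference to the 3rd order result, the true local errors are typically smaller than the estimate suggests.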

Whenever $y_1$ is of lower order than $\hat y_1$, then the difference $y_1 - \hat y_1$ is (at least
asymptotically) an estimate of the local error and the above algorithm keeps this
estimate below the given tolerance. But isn’t it more natural to continue the
integration with the higher order approximation? Then the concept of “error estimation”
is abandoned and the difference $y_1 - \hat y_1$ is only used for the purpose of step size
selection. This is justified by the fact that, due to unknown stability and instability
properties of the differential system, the local errors have in general very little in
common with the global errors. The procedure of continuing the integration with
the higher order result is called “local extrapolation”.

A modification of the above procedure (PI step size control), which is particularly
interesting when applied to mildly stiff problems, is described in Section IV.2
(Volume II).


Starting Step Size


If anything has been made foolproof, a better fool will be
developed. (Heard from Dr. Pirkl, Baden)

For many years, the starting step size had to be supplied to a code. Users were
assumed to have a rough idea of a good step size from mathematical knowledge
or previous experience. Anyhow, a bad starting choice for $h$ was quickly repaired
by the step size control. Nevertheless, when this happens too often and when the
choices are too bad, much computing time can be wasted. Therefore, several people
(e.g., Watts 1983, Hindmarsh 1980) developed ideas to let the computer do this
choice. We take up an idea of Gladwell, Shampine & Brankin (1987) which is
based on the hypothesis that

$$\text{local error} \approx Ch^{p+1}y^{(p+1)}(x_0).$$

Since $y^{(p+1)}(x_0)$ is unknown we shall replace it by approximations of the first and
second derivative of the solution. The resulting algorithm is the following one:

a) Do one function evaluation $f(x_0, y_0)$ at the initial point. It is in any case
needed for the first RK step. Then put $d_0 = \|y_0\|$ and $d_1 = \|f(x_0, y_0)\|$, where
the norm is that of (4.11) with $sc_i = Atol_i + |y_{0i}|\cdot Rtol_i$.

b) As a first guess for the step size let

$$h_0 = 0.01\cdot(d_0/d_1)$$

so that the increment of an explicit Euler step is small compared to the size of
the initial value. If either $d_0$ or $d_1$ is smaller than $10^{-5}$ we put $h_0 = 10^{-6}$.

c) Perform one explicit Euler step, $y_1 = y_0 + h_0f(x_0, y_0)$, and compute
$f(x_0 + h_0, y_1)$.

d) Compute $d_2 = \|f(x_0 + h_0, y_1) - f(x_0, y_0)\|/h_0$ as an estimate of the second
derivative of the solution; the norm being the same as in (a).

e) Compute a step size $h_1$ from the relation

$$h_1^{p+1}\cdot\max(d_1, d_2) = 0.01.$$

If $\max(d_1, d_2) \le 10^{-15}$ we put $h_1 = \max(10^{-6}, h_0\cdot10^{-3})$.

f) Finally we propose as starting step size

$$h = \min(100\cdot h_0, h_1). \tag{4.14}$$

An algorithm like the one above, or a similar one, usually gives a good guess for the
initial step size (or at least avoids a very bad choice). Sometimes, more information
about $h$ is known, e.g., from previous experience or computations of similar
problems.
Numerical Experiments

As a representative of 4-stage 4th order methods we consider the “3/8 Rule” of
Table 1.2. We equipped it with the embedded formula (4.9) of order 3.

Step control mechanism. Fig. 4.1 presents the results of the step control mechanism
(4.13) described above. As an example we choose the Brusselator (see Section I.16)

$$\begin{aligned}
y_1' &= 1 + y_1^2y_2 - 4y_1\\
y_2' &= 3y_1 - y_1^2y_2
\end{aligned} \tag{4.15}$$

with initial values $y_1(0) = 1.5$, $y_2(0) = 3$, integration interval $0 \le x \le 20$ and
$Atol = Rtol = 10^{-4}$. The following results are plotted in this figure:

i) At the top, the solutions $y_1(x)$ and $y_2(x)$ with all accepted integration steps;

ii) then all step sizes used; the accepted ones are connected by a polygon; the
rejected ones are indicated by ×;

iii) the third graph shows the local error estimate $err$, the exact local error and the
global error; the desired tolerance is indicated by a broken horizontal line.

It can be seen that, due to the instabilities of the solutions with respect to the initial
values, quite large global errors occur during the integration with small local
tolerances everywhere. Further many step rejections can be observed in regions where
the step size has to be decreased. This cannot easily be prevented, because right
after an accepted step, the step size proposed by formula (4.13) is (apart from the
safety factor) always increasing.

Numerical comparison. We are now curious to see the behaviour of the variable
step size code, when compared to a fixed step size implementation. We applied
both implementations to the Brusselator problem (4.15) with the initial values used
there. The tolerances ($Atol = Rtol$) are chosen between $10^{-2}$ and $10^{-10}$ with ratio
$\sqrt{10}$. The results are then plotted in Fig. 4.2. There, the abscissa is the global error
at the endpoint of integration (the “precision”), and the ordinate is the number of
function evaluations (the “work”). We observe that for this problem the variable
step size code is about twice as fast as the fixed step size code. There are, of
course, problems (such as equation (0.1)) where variable step sizes are much more
important than here.



Fig. 4.2. Precision-Work diagram 

In this comparison we have included some higher order methods, which will
be discussed in Section II.5. The code RKF45 (written by H.A. Watts and L.F.
Shampine) is based on an embedded method of order 5(4) due to Fehlberg. The
codes DOPRI5 (order 5(4)) and DOP853 (order 8(5,3)) are based on methods of
Dormand & Prince. They will be discussed in the following section. It can clearly
be seen that higher order methods are, especially for higher precision, more
efficient than lower order methods. We shall also understand why the 5th order
method of Dormand & Prince is clearly superior to RKF45.


Exercises 

1. Show that Runge’s method (1.4) can be interpreted as two Euler steps (with
step size $h/2$), followed by a Richardson extrapolation.

2. Prove that no 4-stage Runge-Kutta method of order 4 admits an embedded
formula of order 3.

Hint. Replace $d_j$ by $\hat b_j - b_j$ in the proof of Lemma 1.4 and deduce that $\hat b_j = b_j$
for all $j$, which is a contradiction.

3. Show that the step size strategy (4.13) is invariant with respect to a rescaling
of the independent variable. This means that it produces equivalent step size
sequences when applied to the two problems

$$y' = f(x, y), \qquad y(0) = y_0, \qquad y(x_{end}) = \,?$$

$$z' = \sigma\cdot f(\sigma t, z), \qquad z(0) = y_0, \qquad z(x_{end}/\sigma) = \,?$$

with initial step sizes $h_0$ and $h_0/\sigma$, respectively.

Remark. This is no longer the case if one replaces $err$ in (4.13) by $err/h$ and
$q$ by $q - 1$ (“error per unit step”).


Gehen wir endlich zu Näherungen von der fünften Ordnung über,
so werden die Verhältnisse etwas andere. (W. Kutta 1901)


This section describes the construction of Runge-Kutta methods of higher orders,
particularly of orders $p = 5$ and $p = 8$. As can be seen from Table 2.3, the
complexity and number of the order conditions to be solved increases rapidly with $p$.
An increasingly skilful use of simplifying assumptions will be the main tool for
this task.


The Butcher Barriers


For methods of order 5 there are 17 order conditions to be satisfied (see Table 2.2).
If we choose $s = 5$ we have 15 free parameters. Already Kutta raised the question
whether there might nevertheless exist a solution (“Nun wäre es zwar möglich
...”), but he had no hope for this and turned straight away to the case $s = 6$ (see
II.2, Exercise 5). Kutta’s question remained open for more than 60 years and was
answered around 1963 by three authors independently (Ceschino & Kuntzmann
1963, p. 89, Shanks 1966, Butcher 1964b, 1965b). Butcher’s work is the farthest
reaching and we shall mainly follow his ideas in the following:

Theorem 5.1. For $p \ge 5$ no explicit Runge-Kutta method exists of order $p$ with
$s = p$ stages.

Proof. We first treat the case $s = p = 5$: define the matrices $U$ and $V$ by

$$U = \begin{pmatrix}
\sum_i b_ia_{i2} & \sum_i b_ia_{i3} & \sum_i b_ia_{i4}\\
\sum_i b_ia_{i2}c_2 & \sum_i b_ia_{i3}c_3 & \sum_i b_ia_{i4}c_4\\
g_2 & g_3 & g_4
\end{pmatrix}, \qquad
V = \begin{pmatrix}
c_2 & c_2^2 & \sum_j a_{2j}c_j - c_2^2/2\\
c_3 & c_3^2 & \sum_j a_{3j}c_j - c_3^2/2\\
c_4 & c_4^2 & \sum_j a_{4j}c_j - c_4^2/2
\end{pmatrix} \tag{5.1}$$

where

$$g_k = \sum_{i,j}b_ia_{ij}a_{jk} - \frac12\sum_i b_ia_{ik}(1 - c_k). \tag{5.2}$$

Then the order conditions for order 5 imply

$$UV = \begin{pmatrix} 1/6 & 1/12 & 0\\ 1/12 & 1/20 & 0\\ 0 & 0 & 0\end{pmatrix}. \tag{5.3}$$

Lemma 1.5 gives $g_4 = 0$ and consequently $c_4 = 1$ as in Lemma 1.4. Next we put
in (5.1)

$$g_j = \Bigl(\sum_i b_i(1 - c_i)a_{ij}\Bigr)(c_j - c_5). \tag{5.4}$$

Again it can be verified by trivial computations that $UV$ is the same as above. This
time it follows that $c_4 = c_5$, hence $c_5 = 1$. Consequently, the expression

$$\sum_{i,j,k}b_i(1 - c_i)a_{ij}a_{jk}c_k \tag{5.5}$$

must be zero (because of $2 \le k < j < i$). However, by multiplying out and using
two fifth-order conditions, the expression in (5.5) should be 1/120, a contradiction.

The case $p = s = 6$ is treated by considering all “one-leg trees”, i.e., the trees
which consist of one leg above the root and the 5th order trees grafted on. The
corresponding order conditions have the form

$$\sum_{i,j,\ldots}b_ia_{ij}\,(a_{j\ldots}\ \text{expressions for order 5}) = \frac{1}{\gamma(t)}.$$

If we let $\tilde b_j = \sum_i b_ia_{ij}$ we are back in the 5th order 5-stage business and can
follow the above ideas again. However, the $\gamma(t)$ values are not the same as before;
as a consequence, the product $UV$ in (5.3) now becomes

$$UV = \begin{pmatrix}
\dfrac{1}{(s-2)!} & \dfrac{2!}{(s-1)!} & 0\\[1.5ex]
\dfrac{2!}{(s-1)!} & \dfrac{3!}{s!} & 0\\[1.5ex]
0 & 0 & 0
\end{pmatrix} \qquad (s = 6). \tag{5.3'}$$

Further, for $p = s = 7$ we use the “stork-trees” with order conditions

$$\sum_{i,j,k,\ldots}b_ia_{ij}a_{jk}\,(a_{k\ldots}\ \text{expressions for order 5}) = \frac{1}{\gamma(t)}$$

and let $\tilde b_k = \sum_{i,j}b_ia_{ij}a_{jk}$ and so on. The general case $p = s > 5$ is now clear.

□


6-Stage, 5th Order Processes

We now demonstrate the construction of 5th order processes with 6 stages in full
detail following the ideas which allowed Butcher (1964b) to construct 7-stage, 6th
order formulas.

“In searching for such processes we are guided by the analysis of the previous
section to make the following assumptions:”

$$\sum_{i=1}^{6}b_ia_{ij} = b_j(1 - c_j), \qquad j = 1, \ldots, 6, \tag{5.6}$$

$$\sum_{j=1}^{i-1}a_{ij}c_j = \frac{c_i^2}{2}, \qquad i = 3, \ldots, 6, \tag{5.7}$$

$$b_2 = 0. \tag{5.8}$$

The advantage of condition (5.6) is known to us already from Section II.1 (see
Lemma 1.3): we can disregard all one-leg trees other than $t_{21}$.

Fig. 5.1. Use of simplifying assumptions 

Condition (5.7) together with (5.8) has a similar effect: for $[[\tau], t_2, \ldots, t_m]$
and $[\tau, \tau, t_2, \ldots, t_m]$ of Fig. 5.1 (with identical but arbitrary subtrees $t_2, \ldots, t_m$)
the order conditions read

$$\sum_{i,j}b_ia_{ij}c_j\Phi_i = \frac{1}{r\cdot2} \qquad\text{and}\qquad \sum_i b_ic_i^2\Phi_i = \frac1r \tag{5.9}$$

with known values for $\Phi_i$ and $r$. Since $b_2 = 0$ by (5.8) it follows from (5.7) that
both conditions of (5.9) are equivalent (the condition $b_2 = 0$ is necessary for this
reduction, because (5.7) cannot be satisfied for $i = 2$; otherwise we would have
$c_2 = 0$ and the method would be equivalent to one of fewer stages).

The only trees left after the above reduction are the quadrature conditions

$$\sum_{i=1}^{6}b_ic_i^{q-1} = \frac1q, \qquad q = 1, 2, 3, 4, 5 \tag{5.10}$$

and the two equations

$$\sum_{i,j,k}b_ic_ia_{ij}a_{jk}c_k = \frac{1}{5\cdot3\cdot2}, \tag{5.11}$$

$$\sum_{i,j}b_ic_ia_{ij}c_j^2 = \frac{1}{5\cdot3}. \tag{5.12}$$

We multiply (5.12) by 1/2 and then subtract both equations to obtain

$$\sum_{i,j}b_ic_ia_{ij}\Bigl(\sum_k a_{jk}c_k - c_j^2/2\Bigr) = 0.$$

From (5.7) the parenthesis is zero except when $j = 2$, and therefore

$$\sum_{i=3}^{6}b_ic_ia_{i2} = 0 \tag{5.13}$$

replaces (5.11). Our last simplification is to subtract other order conditions from
(5.12) to obtain

$$\sum_{i,j}b_i(1 - c_i)a_{ij}c_j(c_j - c_3) = \frac1{60} - \frac{c_3}{24}, \tag{5.14}$$

which has fewer terms than before, in particular because $c_6 = 1$ by (5.6) with $j = 6$.
The resulting reduced system (5.6)-(5.8), (5.10), (5.13), (5.14) can easily be solved
as follows:

Algorithm 5.2 (construction of 6-stage 5th order Runge-Kutta methods).

a) $c_1 = 0$ and $c_6 = 1$ from (5.6) with $j = 6$; $c_2, c_3, c_4, c_5$ can be chosen as free
parameters subject only to some trivial exceptions;

b) $b_2 = 0$ from (5.8) and $b_1, b_3, b_4, b_5, b_6$ from the linear system (5.10);

c) $a_{32}$ from (5.7), $i = 3$; $a_{42} = \lambda$ arbitrary; $a_{43}$ from (5.7), $i = 4$;

d) $a_{52}$ and $a_{62}$ from the two linear equations (5.13) and (5.6), $j = 2$;

e) $a_{54}$ from (5.14) and $a_{53}$ from (5.7), $i = 5$;

f) $a_{63}, a_{64}, a_{65}$ from (5.6), $j = 3, 4, 5$;

g) finally $a_{i1}$ ($i = 2, \ldots, 6$) from (1.9).

Condition (5.6) for $j = 1$ and (5.7) for $i = 6$ are automatically satisfied. This
follows as in the proof of Lemma 1.4.

Embedded Formulas of Order 5 


Methods of Fehlberg. The methods obtained from Algorithm 5.2 do not all possess
an embedded formula of order 4. Fehlberg, interested in the construction of
Runge-Kutta pairs of order 4(5), looked mainly for simplifying assumptions which
depend only on $c_i$ and $a_{ij}$, but not on the weights $b_i$. In this case the simplifying
assumptions are useful for the embedded method too. Therefore Fehlberg (1969)
considered (5.7), (5.8) and replaced (5.6) by

$$\sum_{j=1}^{i-1}a_{ij}c_j^2 = \frac{c_i^3}{3}, \qquad i = 3, \ldots, 6. \tag{5.15}$$

As with (5.9) this allows us to disregard all trees of the form $[[\tau, \tau], t_2, \ldots, t_m]$. In
order that the reduction process of Fig. 5.1 also work on a higher level, we suppose,
in addition to $b_2 = 0$, that

$$\sum_i b_ia_{i2} = 0, \qquad \sum_i b_ic_ia_{i2} = 0, \qquad \sum_{i,j}b_ia_{ij}a_{j2} = 0. \tag{5.16}$$

Then the last equations to be satisfied are

$$\sum_{i,j}b_ia_{ij}c_j^3 = \frac1{20} \tag{5.17}$$

and the quadrature conditions (5.10). We remark that the equations (5.7) and (5.15)
for $i = 3$ imply

$$c_2 = \frac23c_3. \tag{5.18}$$

We now want the method to possess an embedded formula of order 4. Analogously
to (5.8) we set $\hat b_2 = 0$. Then conditions (5.7) and (5.15) simplify the conditions
of order 4 to 5 linear equations (the 4 quadrature conditions and $\sum_i\hat b_ia_{i2} = 0$)
for the 5 unknowns $\hat b_1, \hat b_3, \hat b_4, \hat b_5, \hat b_6$. This system has a second solution (other than
the $b_i$) only if it is singular, which is the case if (see Exercise 1 below)

$$c_4 = \frac{3c_2}{4 - 24c_2 + 45c_2^2}. \tag{5.19}$$

With $c_2, c_5, c_6$ as free parameters, the above system can be solved and yields an
embedded formula of order 4(5). The coefficients of a very popular method,
constructed by Fehlberg (1969), are given in Table 5.1.


Table 5.1. Fehlberg 4(5)

    0    |
   1/4   |     1/4
   3/8   |     3/32        9/32
  12/13  |  1932/2197  -7200/2197    7296/2197
    1    |   439/216       -8        3680/513     -845/4104
   1/2   |    -8/27         2       -3544/2565    1859/4104    -11/40
  -------+-----------------------------------------------------------------------
   y_1   |    25/216        0        1408/2565    2197/4104     -1/5        0
   ŷ_1   |    16/135        0        6656/12825  28561/56430    -9/50      2/55
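
The simplifying assumptions used in Fehlberg’s construction can be checked on the coefficients of Table 5.1 in exact arithmetic. The sketch below is only a verification aid: `b4` denotes the weights of the 4th order result $y_1$ and `b5` those of the 5th order result $\hat y_1$, and these variable names are chosen for this example.

```python
from fractions import Fraction as F

# Coefficients of Table 5.1 (Fehlberg 4(5)); row i holds a_i1, ..., a_{i,i-1}
c = [F(0), F(1, 4), F(3, 8), F(12, 13), F(1), F(1, 2)]
A = [[],
     [F(1, 4)],
     [F(3, 32), F(9, 32)],
     [F(1932, 2197), F(-7200, 2197), F(7296, 2197)],
     [F(439, 216), F(-8), F(3680, 513), F(-845, 4104)],
     [F(-8, 27), F(2), F(-3544, 2565), F(1859, 4104), F(-11, 40)]]
b4 = [F(25, 216), F(0), F(1408, 2565), F(2197, 4104), F(-1, 5), F(0)]
b5 = [F(16, 135), F(0), F(6656, 12825), F(28561, 56430), F(-9, 50), F(2, 55)]

def row_moment(i, k):
    """sum_j a_ij c_j^k for stage i (0-based index)."""
    return sum(a * cj ** k for a, cj in zip(A[i], c))

def quadrature(w, q):
    """sum_i w_i c_i^(q-1), to be compared with 1/q as in (5.10)."""
    return sum(wi * ci ** (q - 1) for wi, ci in zip(w, c))

stages = range(2, 6)                       # stages i = 3, ..., 6 in the text
checks = {
    "(1.9)":  all(sum(A[i]) == c[i] for i in range(6)),
    "(5.7)":  all(row_moment(i, 1) == c[i] ** 2 / 2 for i in stages),
    "(5.15)": all(row_moment(i, 2) == c[i] ** 3 / 3 for i in stages),
    "(5.18)": c[1] == F(2, 3) * c[2],
    "(5.19)": c[3] == 3 * c[1] / (4 - 24 * c[1] + 45 * c[1] ** 2),
    "second weight zero": b4[1] == 0 and b5[1] == 0,
    "quadrature, order 5": all(quadrature(b5, q) == F(1, q) for q in range(1, 6)),
    "quadrature, order 4": all(quadrature(b4, q) == F(1, q) for q in range(1, 5)),
    "sum_i w_i a_i2 = 0": all(sum(w[i] * A[i][1] for i in stages) == 0
                              for w in (b4, b5)),
}
print(checks)
```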
All of the methods of Fehlberg are of the type $p(\hat p)$ with $p < \hat p$. Hence, the
lower order approximation is intended to be used as initial value for the next step. In
order to make his methods optimal, Fehlberg tried to minimize the error coefficients
for the lower order result $y_1$. This has the disadvantage that the local extrapolation
mode (continue the integration with the higher order result) does not make sense
and the estimated “error” can become substantially smaller than the true error.

It is possible to do a lot better than the pair of Fehlberg currently 
regarded as “best.” (L.F. Shampine 1986) 


Table 5.2. Dormand-Prince 5(4) (DOPRI5)

    0    |
   1/5   |      1/5
   3/10  |      3/40         9/40
   4/5   |     44/45       -56/15        32/9
   8/9   |  19372/6561  -25360/2187   64448/6561   -212/729
    1    |   9017/3168    -355/33     46732/5247     49/176    -5103/18656
    1    |     35/384        0          500/1113    125/192    -2187/6784     11/84
  -------+-------------------------------------------------------------------------------------
   y_1   |     35/384        0          500/1113    125/192    -2187/6784     11/84       0
   ŷ_1   |   5179/57600      0         7571/16695   393/640   -92097/339200  187/2100    1/40

Dormand & Prince pairs. The first efforts at minimizing the error coefficients of
the higher order result, which is then used as numerical solution, were undertaken
by Dormand & Prince (1980). Their methods of order 5 are constructed with the
help of Algorithm 5.2 under the additional hypothesis (5.15). This condition is
achieved by fixing the parameters $c_3$ and $a_{42}$ in such a way that (5.15) holds for
$i = 3$ and $i = 4$. The remaining two relations ($i = 5, 6$) are then automatically
satisfied. To see this, multiply the difference $e_i = \sum_{j=1}^{i-1} a_{ij} c_j^2 - c_i^3/3$ by $b_i$ and
$b_i c_i$, respectively, sum up and deduce that all $e_i$ must vanish.

In order to equip the method with an embedded formula, Dormand & Prince
propose to use the FSAL idea (i.e., add $y_1$ as 7th stage). In this way the restriction
(5.19) for $c_4$ is no longer necessary. We fix arbitrarily $\hat b_7 \ne 0$, put $\hat b_2 = 0$ (as in
(5.8)) and compute the remaining $\hat b_i$, as above for the Fehlberg case, from the 4
quadrature conditions and from $\sum_i \hat b_i a_{i2} = 0$.

We have thus obtained a family of 5th order Runge-Kutta methods with 4th
order embedded solution with $c_2, c_4, c_5$ as free parameters. Dormand & Prince
(1980) have undertaken an extensive search to determine these parameters in order
to minimize the error coefficients for $y_1$ and found that $c_2 = 1/5$, $c_4 = 4/5$
and $c_5 = 8/9$ was a close rational approximation to an optimal choice. Table 5.2
presents the coefficients of this method. The corresponding code of the Appendix 
is called DOPRI5. 
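
To make the role of the two weight rows of Table 5.2 concrete, here is a minimal sketch of a single step of the Dormand-Prince 5(4) pair for a scalar problem. It is not the DOPRI5 code of the Appendix: it has no step size control, and the names `dopri5_step`, `C`, `A`, `B`, `B_HAT` are hypothetical choices for this illustration. Only the coefficients are taken from Table 5.2.

```python
import math

# Coefficients of Table 5.2 (Dormand-Prince 5(4)).
C = [0, 1/5, 3/10, 4/5, 8/9, 1, 1]
A = [
    [],
    [1/5],
    [3/40, 9/40],
    [44/45, -56/15, 32/9],
    [19372/6561, -25360/2187, 64448/6561, -212/729],
    [9017/3168, -355/33, 46732/5247, 49/176, -5103/18656],
    [35/384, 0, 500/1113, 125/192, -2187/6784, 11/84],  # 7th stage = y1 (FSAL)
]
B = [35/384, 0, 500/1113, 125/192, -2187/6784, 11/84, 0]        # order 5
B_HAT = [5179/57600, 0, 7571/16695, 393/640,
         -92097/339200, 187/2100, 1/40]                         # order 4


def dopri5_step(f, x0, y0, h):
    """One step for a scalar ODE y' = f(x, y).

    Returns the 5th order result y1 and the difference y1 - y1_hat,
    which serves as an estimate of the error of the 4th order result.
    """
    k = []
    for i in range(7):
        yi = y0 + h * sum(a * kj for a, kj in zip(A[i], k))
        k.append(f(x0 + C[i] * h, yi))
    y1 = y0 + h * sum(b * ki for b, ki in zip(B, k))
    y1_hat = y0 + h * sum(b * ki for b, ki in zip(B_HAT, k))
    return y1, y1 - y1_hat


# Usage: one step of size 0.1 for y' = y, y(0) = 1.
y1, err = dopri5_step(lambda x, y: y, 0.0, 1.0, 0.1)
print(y1 - math.exp(0.1), err)
```

The last stage is evaluated at the new point, so in a real code its function value would be reused as the first stage of the following step.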


Higher Order Processes 

Order 6. By Theorem 5.1 at least 7 stages are necessary for order 6. A. Huta
(1956) constructed 6th order processes with 8 stages. Finally, methods with $s = 7$,
the optimal number, were derived by Butcher (1964b) along similar lines as above.
He arrived at an algorithm where $c_2, c_3, c_5, c_6$ are free parameters.

Order 7. The existence of such a method with 8 stages is impossible by the fol¬ 
lowing barrier: 

Theorem 5.3 (Butcher 1965b). For $p \ge 7$ no explicit Runge-Kutta method exists of
order $p$ with $s = p + 1$ stages.

Since the proof of this theorem is much more complicated than that of Theo¬ 
rem 5.1, we do not reproduce it here. 

This raises the question, whether 7 th order methods with 9 stages exist. Such 
methods, announced by Butcher (1965b), do exist; see Verner (1971). 

Order 8. As to methods of order 8, Curtis (1970) and Cooper & Verner (1972) have
constructed such processes with $s = 11$. It was for a long time an open question
whether there exist methods with 10 stages. John Butcher’s dream of settling this
difficult question before his 50th birthday did not come true. But he finally
succeeded in proving the non-existence for Dahlquist’s 60th birthday:

Theorem 5.4 (Butcher 1985b). For $p \ge 8$ no explicit Runge-Kutta method exists of
order $p$ with $s = p + 2$ stages.

For the proof, which is still more complicated, we again refer to Butcher’s 
original paper. 

Order 10. These are the highest order explicitly constructed explicit Runge-Kutta 
methods. Curtis (1975) constructed an 18-stage method of order 10. His con¬ 
struction was based solely on simplifying assumptions of the type (5.7), (5.8) and 
their extensions. Hairer (1978) then constructed a 17-stage method by using the 
complete arsenal of simplifying ideas. For more details, see the first edition, p. 189.

Embedded Formulas of High Order


It was mainly the formula manipulation genius Fehlberg who first derived high 
order embedded formulas. His greatest success was his 7th order formula with 8th 
order error estimate (Fehlberg 1968) which is of frequent use in all high precision 
computations, e.g., in astronomy. The coefficients are reproduced in Table 5.3. 



Fehlberg’s methods suffer from the fact that they give identically zero error
estimates for quadrature problems $y' = f(x)$. The first high order embedded formulas
which avoid this drawback were constructed by Verner (1978). One of Verner’s
methods (see Table 5.4) has been implemented by T.E. Hull, W.H. Enright and
K.R. Jackson as DVERK and is widely used.

Table 5.4. Verner’s method of order 6(5) (DVERK)

$$
\begin{array}{c|cccccccc}
\hat y_1 & \frac{13}{160} & 0 & \frac{2375}{5984} & \frac{5}{16} & \frac{12}{85} & \frac{3}{44} & 0 & 0
\end{array}
$$


An 8th Order Embedded Method


The first high order methods with small error constants of the higher order solution 
were constructed by Prince & Dormand (1981, Code DOPRI8 of the first edition). 
In the following we describe the construction of a new Dormand & Prince pair of 
order 8(6) which will also allow a cheap and accurate dense output (see Section 
II.6). This method has been announced, but not published, in Dormand & Prince 
(1989, p. 983). We are grateful to P. Prince for mailing us the coefficients and for 
his help in recovering their construction. 

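The essential difficulty for the construction of a high order Runge-Kutta method
is to set up a “good” reduced system which implies all order conditions of
Theorem 2.13. At the same time it should be simple enough to be easily solved.
In extending the ideas for the construction of a 5th order process (see above),
Dormand & Prince proceed as follows:

Reduced system. Suppose $s = 12$ and consider for the coefficients $c_i$, $b_i$ and $a_{ij}$
the equations:

$$
\begin{aligned}
&\sum_{i=1}^{s} b_i c_i^{q-1} = \frac{1}{q}, && q = 1,\dots,8 && \text{(5.20a)}\\
&\sum_{j=1}^{i-1} a_{ij} = c_i, && i = 1,\dots,s && \text{(5.20b)}\\
&\sum_{j=1}^{i-1} a_{ij} c_j = \frac{c_i^2}{2}, && i = 3,\dots,s && \text{(5.20c)}\\
&\sum_{j=1}^{i-1} a_{ij} c_j^2 = \frac{c_i^3}{3}, && i = 3,\dots,s && \text{(5.20d)}\\
&\sum_{j=1}^{i-1} a_{ij} c_j^3 = \frac{c_i^4}{4}, && i = 6,\dots,s && \text{(5.20e)}\\
&\sum_{j=1}^{i-1} a_{ij} c_j^4 = \frac{c_i^5}{5}, && i = 6,\dots,s && \text{(5.20f)}\\
&b_2 = b_3 = b_4 = b_5 = 0 && && \text{(5.20g)}\\
&a_{i2} = 0 \ \text{for } i \ge 4, \quad a_{i3} = 0 \ \text{for } i \ge 6 && && \text{(5.20h)}\\
&\sum_{i=j+1}^{s} b_i a_{ij} = b_j(1 - c_j), && j = 4, 5, 10, 11, 12 && \text{(5.20i)}\\
&\sum_{i=j+1}^{s} b_i c_i a_{ij} = 0, && j = 4, 5 && \text{(5.20j)}\\
&\sum_{i=j+1}^{s} b_i c_i^2 a_{ij} = 0, && j = 4, 5 && \text{(5.20k)}\\
&\sum_{i=k+2}^{s} b_i c_i \sum_{j=k+1}^{i-1} a_{ij} a_{jk} = 0, && k = 4, 5 && \text{(5.20l)}\\
&\sum_{i=1}^{s} b_i c_i \sum_{j=1}^{i-1} a_{ij} c_j^5 = \frac{1}{48}. && && \text{(5.20m)}
\end{aligned}
$$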


Verification of the order conditions. The equations (5.20a) are the order conditions
for the bushy trees $[\tau,\dots,\tau]$ and (5.20m) is that for the tree $[\tau,[\tau,\tau,\tau,\tau,\tau]]$.
For the verification of further order conditions we shall show that the reduced system
implies

$$ \sum_{i=j+1}^{s} b_i a_{ij} = b_j(1 - c_j) \quad \text{for all } j. \tag{5.21} $$

If we denote the difference by $d_j = \sum_{i=j+1}^{s} b_i a_{ij} - b_j(1 - c_j)$, then $d_2 = d_3 = 0$ by
(5.20g,h) and $d_4 = d_5 = d_{10} = d_{11} = d_{12} = 0$ by (5.20i). The conditions (5.20a-g)
imply

$$ \sum_{j=1}^{s} d_j c_j^{q-1} = 0 \quad \text{for } q = 1,\dots,5. \tag{5.22} $$

Hence, the remaining 5 values must also vanish if $c_1, c_6, c_7, c_8, c_9$ are distinct.
The significance of condition (5.21) is already known from Lemma 1.3 and from
formula (5.6). It implies that all one-leg trees $t = [t_1]$ can be disregarded.


Fig. 5.2. Use of simplifying assumptions 

Conditions (5.20c-f) are an extension of (5.6) and (5.15). Their importance
will be, once more, demonstrated on an example. Consider the two trees of Fig. 5.2
and suppose that their encircled parts are identical. Then the corresponding order
conditions are

$$ \sum_{i,j=1}^{s} \Theta_i a_{ij} c_j^3 = \frac{1}{r \cdot 5 \cdot 4} \quad\text{and}\quad \sum_{i=1}^{s} \Theta_i c_i^4 = \frac{1}{r \cdot 5} \tag{5.23} $$

with known values for $\Theta_i$ and $r$. If (5.20e) is satisfied and if

$$ \Theta_2 = \Theta_3 = \Theta_4 = \Theta_5 = 0 \tag{5.24} $$

then both conditions are equivalent so that the left-hand tree can be neglected. The
conditions (5.20g,i-l) correspond to (5.24) for certain trees. Finally the assumption
(5.20h) together with (5.20g,i-k) implies that for arbitrary $\Phi_i$, $\Psi_i$ and for $q \in \{1,2,3\}$,

$$ \sum_i b_i \Phi_i a_{i2} = 0, \quad \sum_{i,j} b_i \Phi_i a_{ij} \Psi_j a_{j2} = 0 \quad\text{and}\quad \sum_{i,j,k} b_i c_i^{q-1} a_{ij} \Psi_j a_{jk} \Phi_k a_{k2} = 0, $$

$$ \sum_i b_i \Phi_i a_{i3} = 0, \quad \sum_{i,j} b_i c_i^{q-1} a_{ij} \Phi_j a_{j3} = 0, $$

which are again conditions of type (5.24). Using these relations the verification
of the order conditions (order 8) is straightforward; all trees are reduced to those
corresponding to (5.20a) and (5.20m).


Solving the reduced system. Compared to the original 200 order conditions of
Theorem 2.13 for the 78 coefficients $b_i$, $a_{ij}$ (the $c_i$ are defined by (5.20b)), the
74 conditions of the reduced system present a considerable simplification. We can
hope for a solution with 4 degrees of freedom.

We start by expressing the coefficients $b_i$, $a_{ij}$ in terms of the $c_i$. Because
of (5.20g), condition (5.20a) represents a linear system for $b_1, b_6, \dots, b_{12}$, which
has a unique solution if $c_1, c_6, \dots, c_{12}$ are distinct. For a fixed $i$ ($1 \le i \le 8$)
conditions (5.20b-f) represent a linear system for $a_{i1}, \dots, a_{i,i-1}$. Since there are
sometimes fewer unknowns than equations (mainly due to (5.20h)) restrictions have
to be imposed on the $c_i$. One verifies (similarly to (5.18)) that the relations

$$ c_1 = 0, \quad c_2 = \frac{2}{3} c_3, \quad c_3 = \frac{2}{3} c_4, \quad c_4 = \frac{6 - \sqrt{6}}{10}\, c_6, \quad c_5 = \frac{6 + \sqrt{6}}{10}\, c_6, \quad c_6 = \frac{4}{3} c_7 \tag{5.25a} $$

allow the computation of the $a_{ij}$ with $i \le 8$ (Step 1 in Fig. 5.3).

If $b_{12} \ne 0$ (which will be assumed in our construction), condition (5.20i) for
$j = 12$ implies

$$ c_{12} = 1, \tag{5.25b} $$

and for $j = 11$ it yields the value for $a_{12,11}$. We next compute the expressions

$$ e_j = \sum_{i=j+1}^{s} b_i c_i a_{ij} - \frac{b_j}{2}(1 - c_j^2), \quad j = 1,\dots,s. \tag{5.26} $$

    j =     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    i =  2  1
    i =  3  1  1
    i =  4  1  0  1
    i =  5  1  0  1  1
    i =  6  1  0  0  1  1
    i =  7  1  0  0  1  1  1
    i =  8  1  0  0  1  1  1  1
    i =  9  4  0  0  3  3  4  4  4
    i = 10  4  0  0  3  3  4  4  4  4
    i = 11  4  0  0  3  3  4  4  4  4  2
    i = 12  4  0  0  3  3  4  4  4  4  2  2
    i = 13  1  0  0  0  0  1  1  1  1  1  1  1
    i = 14  5  0  0  0  0  0  5  5  5  5  5  5  5
    i = 15  5  0  0  0  0  5  5  5  0  0  5  5  5  5
    i = 16  5  0  0  0  0  5  5  5  5  0  0  0  5  5  5

Fig. 5.3. Steps in the construction of an 8th order RK method;
the entries 0 indicate vanishing coefficients;
the stages i = 14, 15, 16 will be used for dense output, see II.6.


We have $e_{12} = 0$ by (5.25b), $e_{11} = b_{12} a_{12,11} - b_{11}(1 - c_{11}^2)/2$ is known and
$e_2 = e_3 = e_4 = e_5 = 0$ by (5.20g,h,j). The remaining 6 values are determined by the
system

$$ \sum_{j=1}^{s} e_j c_j^{q-1} = 0, \quad q = 1,\dots,6 \tag{5.27} $$

which follows from (5.20a-f,m). The conditions (5.20i) and (5.26) for $j = 10$ then
yield $a_{12,10}$ and $a_{11,10}$ (Step 2 in Fig. 5.3).

We next compute $a_{ij}$ ($i = 9, 10, 11, 12$; $j = 4, 5$) from the remaining 8 equations
of (5.20i-l). This is indicated as Step 3 in Fig. 5.3. Finally, we use the
conditions (5.20b-f) with $i \ge 9$ for the computation of the remaining coefficients
(Step 4). A difficulty still arises from the case $i = 9$, where only 4 parameters for
five equations are at our disposal. A tedious computation shows that this system
has a solution if (see Exercise 6 below)

$$ 2c_9 = \frac{3\sigma_1 - 28\sigma_2 + 189\sigma_3 + 14\sigma_1\sigma_2 - 168\sigma_1\sigma_3 + 98\sigma_2\sigma_3}{6 - 21\sigma_1 + 35\sigma_2 - 42\sigma_3 + 21\sigma_1^2 + 98\sigma_2^2 + 735\sigma_3^2 - 84\sigma_1\sigma_2 + 168\sigma_1\sigma_3 - 490\sigma_2\sigma_3} \tag{5.25c} $$

where

$$ \sigma_1 = c_6 + c_7 + c_8, \quad \sigma_2 = c_6 c_7 + c_6 c_8 + c_7 c_8, \quad \sigma_3 = c_6 c_7 c_8. \tag{5.28} $$

The reduced system (5.20) leaves $c_7, c_8, c_{10}, c_{11}$ as free parameters. Dormand
& Prince propose the following numerical values:

$$ c_7 = 1/4, \quad c_8 = 4/13, \quad c_{10} = 3/5, \quad c_{11} = 6/7. $$

All remaining coefficients are then determined by the above procedure. Since $c_4$
and $c_5$ (see (5.25a)) are not rational, there is no easy way to present the coefficients
in a tableau.
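
The dependent nodes can be evaluated from the free parameters in a few lines. The sketch below assumes the relations $c_6 = \frac{4}{3}c_7$, $c_4 = \frac{6-\sqrt6}{10}c_6$, $c_5 = \frac{6+\sqrt6}{10}c_6$ for (5.25a) and reads (5.25c) with the left-hand side $2c_9$; both are reconstructions from a damaged source and should be treated as assumptions. The variable names are hypothetical.

```python
import math
from fractions import Fraction as F

# Free parameters proposed by Dormand & Prince.
c7, c8 = F(1, 4), F(4, 13)

# Assumed form of (5.25a): c6 = (4/3) c7, c4 and c5 proportional to c6.
c6 = F(4, 3) * c7
c4 = (6 - math.sqrt(6)) / 10 * float(c6)   # irrational, hence floating point
c5 = (6 + math.sqrt(6)) / 10 * float(c6)

# Elementary symmetric functions (5.28).
s1 = c6 + c7 + c8
s2 = c6 * c7 + c6 * c8 + c7 * c8
s3 = c6 * c7 * c8

# Assumed form of (5.25c): 2*c9 = num / den.
num = 3*s1 - 28*s2 + 189*s3 + 14*s1*s2 - 168*s1*s3 + 98*s2*s3
den = (6 - 21*s1 + 35*s2 - 42*s3 + 21*s1**2 + 98*s2**2 + 735*s3**2
       - 84*s1*s2 + 168*s1*s3 - 490*s2*s3)
c9 = num / den / 2

print(c6, c9)        # prints 1/3 127/195
print(c4, c5, float(c9))
```

Under these assumptions $c_9 = 127/195 \approx 0.65128$, which agrees with the ninth node commonly quoted for DOP853 implementations.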

Embedded method. We look for a second method with the same $c_i$, $a_{ij}$ but with
different weights, say $\hat b_i$. If we require that

$$
\begin{aligned}
&\sum_{i=1}^{s} \hat b_i c_i^{q-1} = \frac{1}{q}, && q = 1,\dots,6 && \text{(5.29a)}\\
&\hat b_2 = \hat b_3 = \hat b_4 = \hat b_5 = 0 && && \text{(5.29b)}\\
&\sum_{i=j+1}^{s} \hat b_i a_{ij} = 0, && j = 4, 5 && \text{(5.29c)}
\end{aligned}
$$

then one can verify (similarly as above for the 8th order method) that the corresponding
Runge-Kutta method is of order 6. The system (5.29) consists of 12 linear
equations for 12 unknowns. A comparison with (5.20) shows that $b_1, \dots, b_{12}$
is a solution of (5.29). Furthermore, the corresponding homogeneous system has
the nontrivial solution $e_1, \dots, e_{12}$ (see (5.27) and (5.20l)). Therefore

$$ \hat b_i = b_i + \alpha e_i \tag{5.30} $$

is a solution of (5.29) for all values of $\alpha$. Dormand & Prince suggest taking $\alpha$ in
such a way that $\hat b_6 = 2$.

A program based on this method (with a different error estimator, see Section
II.10) has been written and is called DOP853. It is documented in the Appendix.
The performance of this code, compared to methods of lower order, is impressive. 
See for example the results for the Brusselator in Fig. 4.2. 


Exercises 

1. Consider a Runge-Kutta method with $s$ stages that satisfies (5.7)-(5.8), (5.15),
(5.17) and the first two relations of (5.16).

a) If the relation (5.19) holds, then the method possesses an embedded formula
of order 4.

b) The condition (5.19) implies that the last relation of (5.16) is automatically
satisfied.

Hint. The order conditions for the embedded method constitute a linear system
for the $\hat b_i$ which has to be singular. This implies that

$$ a_{i2} = \alpha c_i + \beta c_i^2 + \gamma c_i^3 \quad \text{for } i \ne 2. \tag{5.31} $$

Multiplying (5.31) with $b_i$ and $b_i c_i$ and summing up, yields two relations for
$\alpha, \beta, \gamma$. These together with (5.31) for $i = 3, 4$ yield (5.19).

2. Construct a 6-stage 5th order formula with $c_3 = 1/3$, $c_4 = 1/2$, $c_5 = 2/3$
possessing an embedded formula of order 4.

3. (Butcher). Show that for any Runge-Kutta method of order 5,

$$ \sum_i b_i \Bigl( \sum_j a_{ij} c_j - \frac{c_i^2}{2} \Bigr)^2 = 0. $$

Consequently, there exists no explicit Runge-Kutta method of order 5 with all
$b_i > 0$.

Hint. Multiply out and use order conditions.

4. Write a code with a high order Runge-Kutta method (or take one) and solve
numerically the Arenstorf orbit of the restricted three body problem (0.1) (see
the introduction) with initial values

$$ y_1(0) = 0.994, \quad y_1'(0) = 0, \quad y_2(0) = 0, $$

$$ y_2'(0) = -2.0317326295573368357302057924. $$

Compute the solutions for

$$ x_{end} = 11.124340337266085134999734047. $$

The initial values are chosen such that the solution is periodic to this precision.
The plotted solution curve has one loop less than that of the introduction.

5. (Shampine 1979). Show that the storage requirement of a Runge-Kutta method
can be substantially decreased if $s$ is large.

Hint. Suppose, for example, that $s = 15$.

After computing (see (1.8)) $k_1, k_2, \dots, k_9$, compute the sums

$$ \sum_{j=1}^{9} a_{ij} k_j \ \ \text{for } i = 10, 11, 12, 13, 14, 15, \qquad \sum_{j=1}^{9} b_j k_j, \qquad \sum_{j=1}^{9} \hat b_j k_j; $$

then the memories occupied by $k_2, k_3, \dots, k_9$ are not needed any longer. Another
possibility for reducing the memory requirement is offered by the zero-pattern
of the coefficients.

6. Show that the reduced system (5.20) implies (5.25c).

Hint. The equations (5.20b-f) imply that for $i \in \{1, 6, 7, 8, 9\}$

$$ \alpha a_{i4} + \beta a_{i5} = \sigma_3 \frac{c_i^2}{2} - \sigma_2 \frac{c_i^3}{3} + \sigma_1 \frac{c_i^4}{4} - \frac{c_i^5}{5} \tag{5.32} $$

with $\sigma_i$ given by (5.28). The constants $\alpha$ and $\beta$ are not important. Further,
for the same values of $i$ one has

$$ 0 = c_i(c_i - c_6)(c_i - c_7)(c_i - c_8)(c_i - c_9) = \sigma_3 c_9 c_i - (\sigma_3 + c_9\sigma_2) c_i^2 + (\sigma_2 + c_9\sigma_1) c_i^3 - (\sigma_1 + c_9) c_i^4 + c_i^5. \tag{5.33} $$

Multiplying (5.32) and (5.33) by $e_i$, $b_i$, $b_i c_i$, $b_i c_i^2$, summing up from $i = 1$ to
$s$ and using (5.20) gives the matrix relation (5.34), in which “×” indicates certain
values. The entries of (5.34) involve the sums $\sum_i b_i c_i^{j-1}(\cdot)$ of the right-hand sides
of (5.32) and (5.33), which by (5.20a) equal

$$ \frac{\sigma_3}{2(j+2)} - \frac{\sigma_2}{3(j+3)} + \frac{\sigma_1}{4(j+4)} - \frac{1}{5(j+5)} \quad\text{and}\quad \frac{\sigma_3 c_9}{j+1} - \frac{\sigma_3 + c_9\sigma_2}{j+2} + \frac{\sigma_2 + c_9\sigma_1}{j+3} - \frac{\sigma_1 + c_9}{j+4} + \frac{1}{j+5}. $$

Deduce from (5.34) and $e_{11} \ne 0$ that the
leftmost matrix of (5.34) is singular. This implies that the right-hand matrix
of (5.34) is of rank 2 and yields equation (5.25c).


7. Prove that the 8th order method given by (5.20; $s = 12$) does not possess a 6th
order embedding with $\hat b_{12} \ne b_{12}$, not even if one adds the numerical result $y_1$
as 13th stage (FSAL).

... providing “interpolation” for Runge-Kutta methods. ... this
capability and the features it makes possible will be the hallmark
of the next generation of Runge-Kutta codes.

(L.F. Shampine 1986)


The present section is mainly devoted to the construction of dense output formulas 
for Runge-Kutta methods. This is important for many practical questions such as 
graphical output, event location or the treatment of discontinuities in differential 
equations. Further, the numerical computation of derivatives with respect to initial 
values and parameters is discussed, which is particularly useful for the integration 
of boundary value problems. 


Dense Output 

Classical Runge-Kutta methods are inefficient if the number of output points becomes
very large (Shampine, Watts & Davenport 1976). This motivated the construction
of dense output formulas (Horn 1983). These are Runge-Kutta methods
which provide, in addition to the numerical result $y_1$, cheap numerical approximations
to $y(x_0 + \theta h)$ for the whole integration interval $0 \le \theta \le 1$. “Cheap” means
without or, at most, with only a few additional function evaluations.

We start from an $s$-stage Runge-Kutta method with given coefficients $c_i$, $a_{ij}$
and $b_j$, possibly add $s^* - s$ new stages, and consider formulas of the form

$$ u(\theta) = y_0 + h \sum_{i=1}^{s^*} b_i(\theta) k_i, \tag{6.1} $$

where

$$ k_i = f\Bigl(x_0 + c_i h,\ y_0 + h \sum_{j=1}^{i-1} a_{ij} k_j\Bigr), \quad i = 1,\dots,s^* \tag{6.2} $$

and $b_i(\theta)$ are polynomials to be determined such that

$$ u(\theta) - y(x_0 + \theta h) = O(h^{p^*+1}). \tag{6.3} $$

Usually $s^* \ge s + 1$ since we include (at least) the first function evaluation of the
subsequent step $k_{s+1} = f(x_0 + h, y_1)$ in the formula with $a_{s+1,j} = b_j$ for all $j$.
A Runge-Kutta method, provided with a formula (6.1), will be called a continuous
Runge-Kutta method.

Theorem 6.1. The error of the approximation (6.1) is of order $p^*$ (i.e., the local
error satisfies (6.3)), if and only if

$$ \sum_{j=1}^{s^*} b_j(\theta) \Phi_j(t) = \frac{\theta^{\varrho(t)}}{\gamma(t)} \quad \text{for } \varrho(t) \le p^* \tag{6.4} $$

with $\varrho(t)$, $\gamma(t)$ given in Section II.2.

Proof. The $q$th derivative (with respect to $h$) of the numerical approximation is
given by (2.14) with $b_j$ replaced by $b_j(\theta)$; that of the exact solution $y(x_0 + \theta h)$ is
$\theta^q y^{(q)}(x_0)$. The statement thus follows as in Theorem 2.13. □


Corollary 6.2. Condition (6.4) implies that the derivatives of (6.1) approximate
the derivatives of the exact solution as

$$ h^{-k} u^{(k)}(\theta) - y^{(k)}(x_0 + \theta h) = O(h^{p^*-k+1}). \tag{6.5} $$

Proof. Comparing the $q$th derivative (with respect to $h$) of $u'(\theta)$ with that of
$hy'(x_0 + \theta h)$ we find that (6.5) (for $k = 1$) is equivalent to

$$ \sum_{j=1}^{s^*} b_j'(\theta) \Phi_j(t) = \frac{\varrho(t)\,\theta^{\varrho(t)-1}}{\gamma(t)} \quad \text{for } \varrho(t) \le p^*. $$

This, however, follows from (6.4) by differentiation. The case $k > 1$ is obtained
similarly. □


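We write the polynomials $b_j(\theta)$ as

$$ b_j(\theta) = \sum_{q=1}^{p^*} b_{jq} \theta^q, \tag{6.6} $$

so that the equations (6.4) become a system of simultaneous linear equations of the
form

$$
\underbrace{\begin{pmatrix}
1 & 1 & \cdots & 1 \\
\Phi_1(t_{21}) & \Phi_2(t_{21}) & \cdots & \Phi_{s^*}(t_{21}) \\
\Phi_1(t_{31}) & \Phi_2(t_{31}) & \cdots & \Phi_{s^*}(t_{31}) \\
\vdots & \vdots & & \vdots
\end{pmatrix}}_{\Phi}
\underbrace{\begin{pmatrix}
b_{11} & b_{12} & b_{13} & \cdots \\
b_{21} & b_{22} & b_{23} & \cdots \\
\vdots & \vdots & \vdots & \\
b_{s^*1} & b_{s^*2} & b_{s^*3} & \cdots
\end{pmatrix}}_{B}
=
\underbrace{\begin{pmatrix}
1 & 0 & 0 & \cdots \\
0 & \frac{1}{2} & 0 & \cdots \\
0 & 0 & \frac{1}{3} & \cdots \\
0 & 0 & \frac{1}{6} & \cdots \\
\vdots & \vdots & \vdots &
\end{pmatrix}}_{G}
\tag{6.4'}
$$

where the $\Phi_j(t)$ are known numbers depending on $a_{ij}$ and $c_i$. Using standard
linear algebra the solution of this system can easily be discussed. It may happen,
however, that the order $p^*$ of the dense output is smaller than the order $p$ of the
underlying method.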


Example. For “the” Runge-Kutta method of Table 1.2 (with $s^* = s = 4$) equations
(6.4’) with $p^* = 3$ produce a unique solution

$$ b_1(\theta) = \theta - \frac{3\theta^2}{2} + \frac{2\theta^3}{3}, \qquad b_2(\theta) = b_3(\theta) = \theta^2 - \frac{2\theta^3}{3}, \qquad b_4(\theta) = -\frac{\theta^2}{2} + \frac{2\theta^3}{3} $$

which constitutes a dense output solution which is globally continuous but not $C^1$.
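
The example can be tried out directly. The following sketch performs one step of the classical 4-stage method (assumed here to be the method of Table 1.2, with nodes 0, 1/2, 1/2, 1) for a scalar problem and returns the continuous extension built from the polynomials $b_i(\theta)$ above. The function name `rk4_dense` is a hypothetical choice.

```python
import math


def rk4_dense(f, x0, y0, h):
    """Classical RK4 step for scalar y' = f(x, y).

    Returns y1 and a function u(theta) giving the 3rd order dense output
    on 0 <= theta <= 1, using the weights b_i(theta) of the example.
    """
    k1 = f(x0, y0)
    k2 = f(x0 + h / 2, y0 + h / 2 * k1)
    k3 = f(x0 + h / 2, y0 + h / 2 * k2)
    k4 = f(x0 + h, y0 + h * k3)

    def u(theta):
        b1 = theta - 3 * theta**2 / 2 + 2 * theta**3 / 3
        b23 = theta**2 - 2 * theta**3 / 3          # b2(theta) = b3(theta)
        b4 = -theta**2 / 2 + 2 * theta**3 / 3
        return y0 + h * (b1 * k1 + b23 * (k2 + k3) + b4 * k4)

    return u(1.0), u


# Usage: y' = y, y(0) = 1, step size 0.1; compare with exp at the midpoint.
y1, u = rk4_dense(lambda x, y: y, 0.0, 1.0, 0.1)
print(u(0.5) - math.exp(0.05), y1 - math.exp(0.1))
```

At $\theta = 1$ the weights reduce to 1/6, 1/3, 1/3, 1/6, so the dense output reproduces $y_1$; between grid points the accuracy is one order lower than at the grid points.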


Hermite interpolation. A much easier way (than solving (6.4’)) and more efficient
for low order dense output formulas is the use of Hermite interpolation (Shampine
1985). Whatever the method is, we have two function values $y_0$, $y_1$ and two
derivatives $f_0 = f(x_0, y_0)$, $f_1 = f(x_0 + h, y_1)$ at our disposal and can thus do
cubic polynomial interpolation. The resulting formula is

$$ u(\theta) = (1 - \theta) y_0 + \theta y_1 + \theta(\theta - 1)\bigl((1 - 2\theta)(y_1 - y_0) + (\theta - 1) h f_0 + \theta h f_1\bigr). \tag{6.7} $$

Inserting the definition of $y_1$ into (6.7) shows that Hermite interpolation is a special
case of (6.1). Whenever the underlying method is of order $p \ge 3$ we thus obtain a
continuous Runge-Kutta method of order 3.

Since the function and derivative values on the right side of the first interval
coincide with those on the left side of the second interval, Hermite interpolation
leads to a globally $C^1$ approximation of the solution.
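
Formula (6.7) translates into a one-line function. The sketch below is an illustration only; the name `hermite` and the argument names `hf0`, `hf1` (standing for $hf_0$ and $hf_1$) are hypothetical. Since (6.7) is the cubic Hermite interpolant, it reproduces any cubic polynomial exactly, which the usage lines check on $p(\theta) = 1 + 2\theta - \theta^2 + 3\theta^3$.

```python
def hermite(theta, y0, y1, hf0, hf1):
    """Cubic Hermite dense output (6.7).

    hf0 = h*f(x0, y0) and hf1 = h*f(x0 + h, y1) are the scaled derivatives
    at the two ends of the step; theta runs from 0 to 1.
    """
    return ((1 - theta) * y0 + theta * y1
            + theta * (theta - 1) * ((1 - 2 * theta) * (y1 - y0)
                                     + (theta - 1) * hf0 + theta * hf1))


# Usage: data taken from p(theta) = 1 + 2*theta - theta**2 + 3*theta**3,
# i.e. p(0) = 1, p(1) = 5, p'(0) = 2, p'(1) = 9.
print(hermite(0.5, 1.0, 5.0, 2.0, 9.0))   # prints 2.125, the value p(0.5)
```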


The 4-stage 4th order methods of Section II.1 do not possess a dense output
of order 4 without any additional function evaluations (see Exercise 1). Therefore
the question arises whether it is really important to have a dense output of the same
order. Let us consider an interval far away from the initial value, say $[x_n, x_{n+1}]$,
and denote by $z(x)$ the local solution, i.e., the solution of the differential equation
which passes through $(x_n, y_n)$. Then the error of the dense output is composed of
two terms:

$$ u(\theta) - y(x_n + \theta h) = \bigl(u(\theta) - z(x_n + \theta h)\bigr) + \bigl(z(x_n + \theta h) - y(x_n + \theta h)\bigr). $$

The term to the far right reflects the global error of the method and is of size $O(h^p)$.
In order that both terms be of the same order of magnitude it is thus sufficient to
require $p^* = p - 1$.

The situation changes, if we also need accurate values of the derivative $y'(x_n + \theta h)$
(see Section 5 of Enright, Jackson, Nørsett & Thomsen (1986) for a discussion
of problems where this is important). We have

$$ h^{-1} u'(\theta) - y'(x_n + \theta h) = \bigl(h^{-1} u'(\theta) - z'(x_n + \theta h)\bigr) + \bigl(z'(x_n + \theta h) - y'(x_n + \theta h)\bigr) $$

and the term to the far right is of size $O(h^p)$ if $f(x, y)$ satisfies a Lipschitz condition.
A comparison with (6.5) shows that we need $p^* = p$ in order that both error
terms be of comparable size.

Boot-strapping process (Enright, Jackson, Nørsett & Thomsen 1986). This is a
general procedure for increasing iteratively the order of dense output formulas.

Suppose that we already have a 3rd order dense output at our disposal (e.g.,
from Hermite interpolation). We then fix arbitrarily an $\alpha \in (0,1)$ and denote the
3rd order approximation at $x_0 + \alpha h$ by $y_\alpha$. The idea is now that $hf(x_0 + \alpha h, y_\alpha)$
is a 4th order approximation to $hy'(x_0 + \alpha h)$. Consequently, the 4th degree polynomial
$u(\theta)$ defined by

$$
\begin{aligned}
u(0) &= y_0, & u'(0) &= hf(x_0, y_0) \\
u(1) &= y_1, & u'(1) &= hf(x_0 + h, y_1) \\
& & u'(\alpha) &= hf(x_0 + \alpha h, y_\alpha)
\end{aligned}
\tag{6.8}
$$

(which exists uniquely for $\alpha \ne 1/2$) yields the desired formula. The interpolation
error is $O(h^5)$ and each quantity of (6.8) approximates the corresponding exact
solution value with an error of $O(h^5)$.

The extension to arbitrary order is straightforward. Suppose that a dense output
formula $u_0(\theta)$ of order $p^* < p$ is known. We then evaluate this polynomial at
$p^* - 2$ distinct points $\alpha_i \in (0,1)$ and compute the values $f(x_0 + \alpha_i h, u_0(\alpha_i))$.

The interpolation polynomial $u_1(\theta)$ of degree $p^* + 1$, defined by

$$
\begin{aligned}
u_1(0) &= y_0, & u_1'(0) &= hf(x_0, y_0) \\
u_1(1) &= y_1, & u_1'(1) &= hf(x_0 + h, y_1) \\
& & u_1'(\alpha_i) &= hf(x_0 + \alpha_i h, u_0(\alpha_i)), \quad i = 1,\dots,p^* - 2,
\end{aligned}
\tag{6.9}
$$

yields an interpolation formula of order $p^* + 1$. Obviously, the $\alpha_i$ in (6.9) have to
be chosen such that the corresponding interpolation problem admits a solution.
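
The first boot-strapping step, the quartic defined by (6.8), can be sketched as follows for scalar problems. The helper names `det3` and `bootstrap_quartic` are hypothetical, and solving the small linear system by Cramer's rule is a choice made only to keep the sketch free of dependencies. Writing $u(\theta) = \sum_{m=0}^{4} a_m \theta^m$, the conditions at $\theta = 0$ give $a_0$ and $a_1$ directly, and the remaining three conditions form a 3x3 system whose determinant is $2\alpha(2\alpha - 1)(\alpha - 1)$, which vanishes for $\alpha = 1/2$.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def bootstrap_quartic(y0, y1, hf0, hf1, alpha, hf_alpha):
    """Coefficients [a0, ..., a4] of the quartic u(theta) defined by (6.8).

    hf0, hf1, hf_alpha are the scaled derivative values h*f at theta = 0,
    theta = 1 and theta = alpha.
    """
    M = [[1, 1, 1],                                  # u(1) = y1
         [2, 3, 4],                                  # u'(1) = hf1
         [2 * alpha, 3 * alpha**2, 4 * alpha**3]]    # u'(alpha) = hf_alpha
    rhs = [y1 - y0 - hf0, hf1 - hf0, hf_alpha - hf0]
    d = det3(M)        # equals 2*alpha*(2*alpha - 1)*(alpha - 1)
    if d == 0:
        raise ValueError("alpha = 0, 1/2 or 1 gives a singular problem")
    coeffs = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = rhs[r]
        coeffs.append(det3(Mc) / d)                  # Cramer's rule
    return [y0, hf0] + coeffs


# Usage with exact data from p(theta) = 1 + theta + theta^2/2 + theta^3/6 + theta^4/24.
p = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 24)]
val = lambda t: sum(c * t**m for m, c in enumerate(p))
der = lambda t: sum(m * c * t**(m - 1) for m, c in enumerate(p) if m > 0)
al = Fraction(1, 3)
print(bootstrap_quartic(val(Fraction(0)), val(Fraction(1)),
                        der(Fraction(0)), der(Fraction(1)), al, der(al)))
```

In an actual integrator `hf_alpha` would be obtained by one extra function evaluation at the 3rd order value $y_\alpha$, for instance from Hermite interpolation.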


Continuous Dormand & Prince Pairs 


The method of Dormand & Prince (Table 5.2) is of order 5(4) so that we are mainly
interested in dense output formulas with $p^* = 4$ and $p^* = 5$.

Order 4. A continuous formula of order 4 can be obtained without any additional
function evaluation. Since the coefficients satisfy (5.7), it follows from the difference
of the order conditions for the trees $t_{31}$ and $t_{32}$ (notation of Table 2.2)
that

$$ b_2(\theta) = 0 \tag{6.10} $$

is necessary. This condition together with (5.7) and (5.15) then implies that the
order conditions are equivalent for the following pairs of trees: $t_{31}$ and $t_{32}$, $t_{41}$
and $t_{42}$, $t_{41}$ and $t_{43}$. Hence, for order 4, only 5 conditions have to be considered
(the four quadrature conditions and $\sum_i b_i(\theta) a_{i2} = 0$). We can arbitrarily choose
$b_7(\theta)$ and the coefficients $b_1(\theta), b_3(\theta), \dots, b_6(\theta)$ are then uniquely determined.
As for the choice of $b_7(\theta)$, Shampine (1986) proposed minimizing, for each $\theta$, the
error coefficients (Theorem 3.2)

$$ e(t) = \theta^5 - \gamma(t) \sum_{j=1}^{7} b_j(\theta) \Phi_j(t) \quad \text{for } t \in T_5, \tag{6.11} $$

weighted by $\alpha(t)$ of Definition 2.5, in the square norm. These expressions can be
seen to depend linearly on $b_7(\theta)$,

$$ \alpha(t) e(t) = \zeta(t, \theta) - b_7(\theta)\,\eta(t), $$

thus the minimal value is found for

$$ b_7(\theta) = \sum_{t \in T_5} \zeta(t, \theta)\,\eta(t) \Big/ \sum_{t \in T_5} \eta^2(t). $$

The resulting formula, given by Dormand & Prince (1986), is

$$ b_7(\theta) = \theta^2(\theta - 1) + \theta^2(\theta - 1)^2\, 10 \cdot (7414447 - 829305\,\theta)/29380423. \tag{6.12} $$

The other coefficients, written in a fashion which makes the Hermite-part clearly
visible, are then given by

$$
\begin{aligned}
b_1(\theta) &= \theta^2(3 - 2\theta) \cdot b_1 + \theta(\theta - 1)^2 - \theta^2(\theta - 1)^2\, 5 \cdot (2558722523 - 31403016\,\theta)/11282082432 \\
b_3(\theta) &= \theta^2(3 - 2\theta) \cdot b_3 + \theta^2(\theta - 1)^2\, 100 \cdot (882725551 - 15701508\,\theta)/32700410799 \\
b_4(\theta) &= \theta^2(3 - 2\theta) \cdot b_4 - \theta^2(\theta - 1)^2\, 25 \cdot (443332067 - 31403016\,\theta)/1880347072 \\
b_5(\theta) &= \theta^2(3 - 2\theta) \cdot b_5 + \theta^2(\theta - 1)^2\, 32805 \cdot (23143187 - 3489224\,\theta)/199316789632 \\
b_6(\theta) &= \theta^2(3 - 2\theta) \cdot b_6 - \theta^2(\theta - 1)^2\, 55 \cdot (29972135 - 7076736\,\theta)/822651844.
\end{aligned}
\tag{6.13}
$$

It can be directly verified that the interpolation polynomial $u(\theta)$ defined by (6.10),
(6.12) and (6.13) satisfies

$$
\begin{aligned}
u(0) &= y_0, & u'(0) &= hf(x_0, y_0), \\
u(1) &= y_1, & u'(1) &= hf(x_0 + h, y_1),
\end{aligned}
\tag{6.14}
$$

so that it produces globally a $C^1$ approximation of the solution.
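
The weights (6.10), (6.12) and (6.13) are easy to evaluate in exact rational arithmetic, which gives a simple consistency check. The sketch below assumes the integer constants as read from (6.12) and (6.13) and the weights $b_i$ of Table 5.2; the names `B`, `R` and `dense_weights` are hypothetical. It returns the seven values $b_1(\theta), \dots, b_7(\theta)$, whose sum must equal $\theta$ by the first quadrature condition.

```python
from fractions import Fraction as F

# Fifth order weights b_i of Table 5.2 (b_2 = b_7 = 0).
B = {1: F(35, 384), 3: F(500, 1113), 4: F(125, 192),
     5: F(-2187, 6784), 6: F(11, 84)}

# (signed factor, constant, slope, denominator) of the terms
# theta^2 (theta-1)^2 * factor * (constant - slope*theta) / denominator in (6.13).
R = {
    1: (-5, 2558722523, 31403016, 11282082432),
    3: (100, 882725551, 15701508, 32700410799),
    4: (-25, 443332067, 31403016, 1880347072),
    5: (32805, 23143187, 3489224, 199316789632),
    6: (-55, 29972135, 7076736, 822651844),
}


def dense_weights(theta):
    """Return {i: b_i(theta)} for i = 1..7 according to (6.10), (6.12), (6.13)."""
    theta = F(theta)
    q = theta**2 * (theta - 1)**2
    w = {2: F(0)}                                   # (6.10)
    for i, (fac, const, slope, den) in R.items():
        w[i] = theta**2 * (3 - 2 * theta) * B[i] + q * fac * (const - slope * theta) / den
    w[1] += theta * (theta - 1)**2                  # extra Hermite term in b_1
    w[7] = theta**2 * (theta - 1) + q * 10 * (7414447 - 829305 * theta) / 29380423
    return w


# Usage: the weights at the midpoint and their sum.
w = dense_weights(F(1, 2))
print(sum(w.values()))   # prints 1/2
```

At $\theta = 1$ the function returns the weights of Table 5.2 again, in agreement with $u(1) = y_1$ in (6.14).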

Instead of using the above 5th degree polynomial $u(\theta)$, Shampine (1986) suggests
evaluating it only at the midpoint, $y_{1/2} = u(1/2)$, and then doing quartic
polynomial interpolation with the five values $y_0$, $hf(x_0, y_0)$, $y_1$, $hf(x_0 + h, y_1)$,
$y_{1/2}$. This dense output is also $C^1$, is easier to implement and the difference to the
above formula “... is not significant” (Dormand & Prince 1986).

We have implemented Shampine’s dense output in the code DOPRI5 (see Appendix).
The advantages of such a dense output for graphical representations of the
solution can already be seen from Fig. 0.1 of the introduction to Chapter II. For a
more thorough study we have applied DOPRI5 to the Brusselator (4.15) with initial
values $y_1(0) = 1.5$, $y_2(0) = 3$, integration interval $0 \le x \le 10$ and error tolerance
$Atol = Rtol = 10^{-4}$. The global error of the above 4th order continuous solution
is displayed in Fig. 6.1 for both components. The error shows the same quality
throughout; the grid points, which are represented by the symbols □ and ○, are
by no means outstanding.



Order 5. For a dense output of order $p^* = 5$ for the Dormand & Prince method
the linear system (6.4’) has no solution since

$$ \operatorname{rank}(\Phi|G) = 9 \quad\text{and}\quad \operatorname{rank}(\Phi) = 7 \tag{6.15} $$

as can be verified by Gaussian elimination. Such a linear system has a solution
if and only if the two ranks in (6.15) are equal. So we must append additional
stages to the method. Each new stage adds a new column to the matrix $\Phi$, thus
may increase the rank of $\Phi$ by one without changing $\operatorname{rank}(\Phi|G)$. Therefore we
obtain

Lemma 6.3 (Owren & Zennaro 1991). Consider a Runge-Kutta method of order
$p$. For the construction of a continuous extension of order $p^* = p$ one has to add
at least

$$ \delta := \operatorname{rank}(\Phi|G) - \operatorname{rank}(\Phi) \tag{6.16} $$

stages. □


For the Dormand & Prince method we thus need at least two additional stages. 

There are several possibilities for constructing such dense output formulas: 

a) Shampine (1986) shows that one new function evaluation allows one to compute
a 5th order approximation at the midpoint $x_0 + h/2$. If one evaluates
anew the function at this point to get an approximation of $y'(x_0 + h/2)$, one
can do quintic Hermite interpolation to get a dense output of order 5.

b) Use the 4th order formula constructed above at two different output points and
do boot-strapping. This has been done by Calvé & Vaillancourt (1990).

c) Add two arbitrary new stages and solve the order conditions. This leads to
methods with 10 free parameters (Calvo, Montijano & Rández 1992) which
can then be used to minimize the error terms. This seems to give the best
output formulas.

New methods. If the Dormand & Prince pair needs two additional function
evaluations for a 5th order dense output anyway, it is natural to search
for completely new methods which use all stages for the solutions $y_1$ and $\hat y_1$ as
well. Owren & Zennaro (1992) constructed an 8-stage continuous Runge-Kutta
method of order 5(4). It uses the FSAL idea so that the effective cost is 7 function
evaluations (fe) per step. Bogacki & Shampine (1989) present a 7-stage method
of order 5(4) with very small error coefficients, so that it nearly behaves like a 6th
order method. The effective cost of its dense output is 10 fe. A method of order
6(5) with a dense output of order $p^* = 5$ is given by Calvo, Montijano & Rández
(1990).

Dense Output for DOP853 

We are interested in a continuous extension of the 8 th order method of Section 
II.5 (formula (5.20)). A dense output of order 6 can be obtained for free (add y x 
as 13 th stage and solve the linear system (6.19a-c) below with s*=s + 1 = 13). 
Following Dormand & Prince we shall construct a dense output of order p* = 7. 
We add three further stages (by Lemma 6.3 this is the minimal number of additional 
stages). The values for c 14 , c 15 , c 16 are chosen arbitrarily as 

Ci 4 = 0.1, c 15 = 0.2, c 16 = 7/9 (6.17) 

and the coefficients a ( / are assumed to satisfy, for i e {14,15,16}, 

= c i/<L q= 1,...,6 (6.18a) 

a i2 = a i3 = a i4 = a i5 = ° ( 6 - 18b > 

Ei=fc+1 a H a jk = 0> A = 4,5. (6.18c) 

This system can easily be solved (step 5 of Fig. 5.3). We are still free to set some 
coefficients equal to 0 (see Fig. 5.3). 

We next search for polynomials $b_i(\theta)$ such that the conditions (6.4) are satisfied
for all trees of order $\le 7$. We find the following necessary conditions ($s^* = 16$)

$$\sum_{i=1}^{s^*} b_i(\theta) c_i^{q-1} = \theta^q / q, \qquad q = 1, \ldots, 7 \tag{6.19a}$$

$$b_2(\theta) = b_3(\theta) = b_4(\theta) = b_5(\theta) = 0 \tag{6.19b}$$

$$\sum_{i=j+1}^{s^*} b_i(\theta) a_{ij} = 0, \qquad j = 4, 5 \tag{6.19c}$$

$$\sum_{i=j+1}^{s^*} b_i(\theta) c_i a_{ij} = 0, \qquad j = 4, 5 \tag{6.19d}$$

$$\sum_{i,j=1}^{s^*} b_i(\theta) a_{ij} c_j^5 = \theta^7 / 42. \tag{6.19e}$$

Here (6.19a,e) are order conditions for $[\tau, \ldots, \tau]$ and $[[\tau, \tau, \tau, \tau, \tau]]$. The property
$b_2(\theta) = 0$ follows from

$$0 = \sum_i b_i(\theta) \Bigl( \sum_j a_{ij} c_j - c_i^2/2 \Bigr) = -b_2(\theta)\, c_2^2 / 2$$

and the other three conditions of (6.19b) are a consequence of the relations
$\sum_i b_i(\theta) c_i^{q-1} \bigl( \sum_j a_{ij} c_j^3 - c_i^4/4 \bigr) = 0$ for $q = 1, 2, 3$. The necessity of the
conditions (6.19c,d) is seen similarly.

On the other hand, the conditions (6.19) are also sufficient for the dense output
to be of order 7. We first remark that (6.19), (6.18) and (5.20) imply

$$\sum_{i,j=k+1}^{s^*} b_i(\theta) a_{ij} a_{jk} = 0, \qquad k = 4, 5 \tag{6.20}$$

(see Exercise 3). The verification of the order conditions (6.4) is then possible
without difficulty.

System (6.19) consists of 16 linear equations for 16 unknowns and possesses
a unique solution. An interesting property of the continuous solution (6.1) obtained
in this manner is that it yields a global $C^1$-approximation to the solution, i.e.,

$$u(0) = y_0, \qquad u(1) = y_1, \qquad u'(0) = h f(y_0), \qquad u'(1) = h f(y_1). \tag{6.21}$$

For the verification of this property we define a polynomial $q(\theta)$ of degree 7 by the
relations (6.21) and by $q(\theta_i) = u(\theta_i)$ for 4 distinct values $\theta_i$ which are different
from 0 and 1. Obviously, $q(\theta)$ is of the form (6.1) and defines a dense output of
order 7. Due to the uniqueness of the $b_i(\theta)$ we must have $q(\theta) = u(\theta)$ so that
(6.21) is verified.

Event Location 

Often the output value $x_{end}$ for which the solutions are wanted is not known in
advance, but depends implicitly on the computed solutions. An example of such a
situation is the search for periodic solutions and limit cycles discussed in Section
I.16, where we wanted to know when the solution reaches the Poincaré section for
the first time.

Such problems are very easily treated when a dense output $u(x)$ is available.
Suppose we want to determine $x$ such that

$$g(x, y(x)) = 0. \tag{6.22}$$

Algorithm 6.4. Compute the solution step-by-step until a sign change appears between
$g(x_i, y_i)$ and $g(x_{i+1}, y_{i+1})$ (this is, however, not completely safe because
$g$ may change sign twice in an integration interval; use the dense output at intermediate
values if more safety is needed). Then replace $y(x)$ in (6.22) by the
approximation $u(x)$ and solve the resulting equation numerically, e.g. by bisection
or Newton iterations.

This algorithm can be conveniently implemented in the subroutine SOLOUT, which is
called after every accepted step (see Appendix). If the value of $x$ satisfying (6.22)
has been found, the integration is stopped by setting IRTRN = -1.

Whenever the function $g$ of (6.22) also depends on $y'(x)$, it is advisable to use
a dense output of order $p^* = p$.
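The steps of Algorithm 6.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classical Runge-Kutta method with fixed step size and a cubic Hermite interpolant stand in for DOPRI5 and its dense output, and the names `rk4_step`, `hermite` and `locate_event` are hypothetical, not part of the codes in the Appendix.

```python
import math


def rk4_step(f, x, y, h):
    """One classical RK4 step for y' = f(x, y); y is a list of floats."""
    k1 = f(x, y)
    k2 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(x + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def hermite(theta, h, y0, f0, y1, f1):
    """Cubic Hermite interpolant u(x0 + theta*h), used here as dense output."""
    t = theta
    w00 = (1 + 2 * t) * (1 - t) ** 2
    w10 = t * (1 - t) ** 2
    w01 = t * t * (3 - 2 * t)
    w11 = t * t * (t - 1)
    return [w00 * a + w10 * h * fa + w01 * b + w11 * h * fb
            for a, fa, b, fb in zip(y0, f0, y1, f1)]


def locate_event(f, g, x0, y0, h, n_steps, tol=1e-13):
    """Return (x, y) at the first sign change of g(x, y), or None if none is seen."""
    x, y = x0, list(y0)
    for _ in range(n_steps):
        y_new = rk4_step(f, x, y, h)
        g_old, g_new = g(x, y), g(x + h, y_new)
        if g_old * g_new < 0 or g_new == 0:
            f0, f1 = f(x, y), f(x + h, y_new)
            lo, hi, g_lo = 0.0, 1.0, g_old
            while hi - lo > tol:          # bisection on theta in [0, 1]
                mid = (lo + hi) / 2
                g_mid = g(x + mid * h, hermite(mid, h, y, f0, y_new, f1))
                if g_lo * g_mid <= 0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            theta = (lo + hi) / 2
            return x + theta * h, hermite(theta, h, y, f0, y_new, f1)
        x, y = x + h, y_new
    return None


# Usage: y'' = -y with y(0) = 0, y'(0) = 1; the event y' = 0 occurs at x = pi/2.
x_event, y_event = locate_event(lambda x, y: [y[1], -y[0]],
                                lambda x, y: y[1], 0.0, [0.0, 1.0], 0.1, 30)
err = abs(x_event - math.pi / 2)
```

As the algorithm warns, a sign change is only tested at the step points, so two sign changes inside one step would go unnoticed in this sketch.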


Discontinuous Equations 

If you write some software which is half-way useful, sooner or 
later someone will use it on discontinuities. You have to scope 
about ... (A.R. Curtis 1986) 


In many applications the function defining a differential equation is not analytic or
continuous everywhere. A common example is a problem which (at least locally)
can be written in the form

$$y' = \begin{cases} f_I(y) & \text{if } g(y) > 0 \\ f_{II}(y) & \text{if } g(y) < 0 \end{cases} \tag{6.23}$$

with sufficiently differentiable functions $g$, $f_I$ and $f_{II}$. The derivative of the
solution is thus in general discontinuous on the surface

$$S = \{ y \,;\; g(y) = 0 \}.$$

The function $g(y)$ is called a switching function.

In order to understand the situations which can occur when the solution of
(6.23) meets the surface $S$ in a point $y_0$ (i.e., $g(y_0) = 0$), we consider the scalar
products

$$a_I = \langle \operatorname{grad} g(y_0), f_I(y_0) \rangle, \qquad a_{II} = -\langle \operatorname{grad} g(y_0), f_{II}(y_0) \rangle \tag{6.24}$$

which can be approximated numerically by $a_I \approx g(y_0 + \delta f_I(y_0))/\delta$ (and analogously,
with the opposite sign, for $a_{II}$) with small enough $\delta$. Since the vector $\operatorname{grad} g(y_0)$
points towards the domain of $f_I$, the inequality $a_I < 0$ tells us that the flow for $f_I$
is “pushing” against $S$, while for $a_I > 0$ the flow is “pulling”. The same argument
holds for $a_{II}$ and the flow for $f_{II}$. Therefore, apart from degenerate cases where
either $a_I$ or $a_{II}$ vanishes, we can distinguish the following four cases (see Fig. 6.2):

1) $a_I > 0$, $a_{II} < 0$: the flow traverses $S$ from $g < 0$ to $g > 0$.

2) $a_I < 0$, $a_{II} > 0$: the flow traverses $S$ from $g > 0$ to $g < 0$.

3) $a_I > 0$, $a_{II} > 0$: the flow “pulls” on both sides; the solution is not unique;
except in the case of an unhappily chosen initial value, this situation would
normally not occur.

4) $a_I < 0$, $a_{II} < 0$: here both flows push against $S$; the solution is trapped in $S$
and the problem no longer has a classical solution.

Fig. 6.2. Solutions near the surface of discontinuity (four panels, from left to right:
$a_I > 0$, $a_{II} < 0$; $a_I < 0$, $a_{II} > 0$; $a_I > 0$, $a_{II} > 0$; $a_I < 0$, $a_{II} < 0$)
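The case distinction above is easy to automate. The following Python sketch approximates $a_I$ and $a_{II}$ by the difference quotients mentioned after (6.24) and returns the case number; the function name `classify_crossing` and the default value of `delta` are assumptions made for this illustration only.

```python
def classify_crossing(g, f_I, f_II, y0, delta=1e-6):
    """Return the case number 1-4 at a point y0 with g(y0) = 0.

    a_I and a_II are approximated by difference quotients of g along the
    two vector fields; degenerate cases (a_I = 0 or a_II = 0) return None.
    """
    a_I = g([y + delta * v for y, v in zip(y0, f_I(y0))]) / delta
    a_II = -g([y + delta * v for y, v in zip(y0, f_II(y0))]) / delta
    if a_I == 0 or a_II == 0:
        return None
    if a_I > 0 and a_II < 0:
        return 1          # flow crosses S from g < 0 to g > 0
    if a_I < 0 and a_II > 0:
        return 2          # flow crosses S from g > 0 to g < 0
    if a_I > 0 and a_II > 0:
        return 3          # both flows pull away from S
    return 4              # both flows push against S (solution trapped)


# Scalar toy problem with g(y) = y: both flows point towards S = {0}.
case = classify_crossing(lambda y: y[0],
                         lambda y: [-1.0], lambda y: [1.0], [0.0])
```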

Crossing a discontinuity. The numerical computation of a solution crossing a
discontinuity (cases 1 and 2) can be performed as follows:

a) Ignoring the discontinuity: apply a variable step size code with local error
control (such as DOPRI5) and hope that the step size mechanism will handle
the discontinuity appropriately. Consider the example (which represents the
flow of the second picture of Fig. 6.2)

$$y' = \begin{cases} x^2 + 2y^2 & \text{if } (x + 0.05)^2 + (y + 0.15)^2 \le 1 \\ 2x^2 + 3y^2 - 2 & \text{if } (x + 0.05)^2 + (y + 0.15)^2 > 1 \end{cases} \tag{6.25}$$

with initial value $y(0) = 0.3$. The discontinuity for this problem occurs at
$x \approx 0.6234$ and the code, applied with Atol = Rtol = $10^{-5}$, detects the
discontinuity fairly well by means of numerous rejected steps (see Fig. 6.3; this
figure, however, is much less dramatic than an analogous drawing (see Gear &
Østerby 1984) for multistep methods). The numerical solution for $x = 1$ then
has an error of $5.9 \cdot 10^{-4}$.

b) Singularity detecting codes. Concepts have been developed (Gear & Østerby
(1984) for multistep methods, Enright, Jackson, Nørsett & Thomsen (1988) for
Runge-Kutta methods) to modify existing codes in such a way that singularities
are detected more precisely and handled more appropriately. These concepts
are mainly based on the behaviour of the local error estimate compared to the
step size.

c) Use the switching function: stop the computation at the surface of discontinuity
using Algorithm 6.4 and restart the integration with the new right-hand side.
One has to take care that during one integration step only function values of
either $f_I$ or $f_{II}$ are used. This algorithm, applied to Example (6.25), uses less
than half as many function evaluations as the “ignoring algorithm” and gives an
error of $6.6 \cdot 10^{-6}$ at the point $x = 1$. It is thus not only faster, but also much
more reliable.
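Approach (c) can be tried on Example (6.25) with a few lines of Python. The sketch below is not the DOPRI5 computation reported above: it assumes a fixed step size RK4 method, and instead of a dense output it simply shortens the last step by bisection so that it ends on the surface of discontinuity (which also guarantees that only one right-hand side is used per step). It further assumes that the surface is crossed only once on $[0, 1]$. All function names are hypothetical.

```python
def f_I(x, y):            # right-hand side inside the circle, g > 0
    return x * x + 2 * y * y


def f_II(x, y):           # right-hand side outside the circle, g < 0
    return 2 * x * x + 3 * y * y - 2


def g(x, y):
    """Switching function of (6.25): positive inside, negative outside."""
    return 1 - (x + 0.05) ** 2 - (y + 0.15) ** 2


def rk4_step(f, x, y, h):
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h / 2 * k1)
    k3 = f(x + h / 2, y + h / 2 * k2)
    k4 = f(x + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def solve_with_switch(h=0.01, x_end=1.0):
    x, y, f, x_switch = 0.0, 0.3, f_I, None
    while x_end - x > 1e-12:
        step = min(h, x_end - x)
        y_new = rk4_step(f, x, y, step)
        if x_switch is None and g(x + step, y_new) < 0:
            lo, hi = 0.0, step            # bisection on the step length
            for _ in range(60):
                mid = (lo + hi) / 2
                if g(x + mid, rk4_step(f, x, y, mid)) < 0:
                    hi = mid
                else:
                    lo = mid
            step = lo
            y_new = rk4_step(f, x, y, step)
            x_switch, f = x + step, f_II  # restart with the new right-hand side
        x, y = x + step, y_new
    return x_switch, y


x_switch, y_end = solve_with_switch()
```

The computed switching point can be compared with the value $x \approx 0.6234$ quoted above.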


Example 6.5. Coulomb’s law of friction (Coulomb 1785), which states that the
force of friction is independent of the speed, gives rise to many situations with
discontinuous differential equations. Consider the example (see Den Hartog 1930,
Reissig 1954, Taubert 1976)

$$y'' + 2Dy' + \mu \operatorname{sign} y' + y = A \cos(\omega x), \tag{6.26}$$

where the Coulomb force $\mu \operatorname{sign} y'$ is accompanied by a viscosity term $Dy'$. We
fix the parameters as $D = 0.1$, $\mu = 4$, $A = 2$ and $\omega = \pi$, and choose the initial
values

$$y(0) = 3, \qquad y'(0) = 4. \tag{6.27}$$

Equation (6.26), written in the form (6.23), is

$$y' = v, \qquad v' = -0.2v - y + 2\cos(\pi x) - \begin{cases} 4 & \text{if } v > 0 \\ -4 & \text{if } v < 0. \end{cases} \tag{6.28}$$

Its solution is plotted in Fig. 6.4.

The initial value (6.27) is in the region $v > 0$ and we follow the solution until
it hits the manifold $v = 0$ for the first time. This happens for $x_1 \approx 0.5628$. An
investigation of the values

$$a_I = -y(x_1) + 2\cos(\pi x_1) - 4, \qquad a_{II} = y(x_1) - 2\cos(\pi x_1) - 4 \tag{6.29}$$

shows that $a_I < 0$, $a_{II} > 0$, so that we have to continue the integration into the
region $v < 0$. The next intersection of the solution with the manifold of discontinuity
is at $x_2 \approx 2.0352$. Here $a_I < 0$, $a_{II} < 0$, so that a classical solution does not
exist beyond this point and the solution remains “trapped” in the manifold ($v = 0$,
$y = \text{Const} = y(x_2)$) until one of the values $a_I$ or $a_{II}$ changes sign. This happens
for $a_{II}$ at the point $x_3 \approx 2.6281$ and we can continue the integration of (6.28) in
the region $v < 0$ (see Fig. 6.4). The same situation then repeats periodically.

Solutions in the manifold. In the case $a_I < 0$, $a_{II} < 0$ the solution of (6.23) can
neither be continued along the flow of $y' = f_I(y)$ nor along that of $y' = f_{II}(y)$.
However, the physical process, described by the differential equation (6.23), possesses
a solution (see Example 6.5). Early papers on this subject studied the convergence
of Euler polygons, pushed across the border again and again by the conflicting
vector fields (see, e.g., Taubert 1976). Later it became clear that it is much
more advantageous to pursue the solution in the manifold $S$, i.e., solve a so-called
differential algebraic problem. This approach is advocated by Eich (1992), who
attributes the ideas to the thesis of G. Bock, by Eich, Kastner-Maresch & Reich
(unpublished manuscript, 1991), and by Stewart (1990). We must decide, however,
which vector field in $S$ should determine the solution. Several motivations (see
Exercises 8 and 9 below) suggest searching for this field in the convex hull

$$f(y, \lambda) = (1 - \lambda) f_I(y) + \lambda f_{II}(y) \tag{6.30}$$

of $f_I$ and $f_{II}$. This coincides, for the special problem (6.23), with Filippov’s
“generalized solution” (Filippov 1960); but other homotopies may be of interest as
well. The value of $\lambda$ must be chosen in such a way that the solution remains in $S$.
This means that we have to solve the problem

$$y' = f(y, \lambda) \tag{6.31a}$$

$$0 = g(y). \tag{6.31b}$$

Differentiating (6.31b) with respect to time yields

$$0 = \operatorname{grad} g(y)\, y' = \operatorname{grad} g(y)\, f(y, \lambda). \tag{6.32}$$

If this relation allows $\lambda$ to be expressed as a function of $y$, say as $\lambda = G(y)$, then
(6.31a) becomes the ordinary differential equation

$$y' = f(y, G(y)) \tag{6.33}$$

which can be solved by standard integration methods. Obviously, the solution of
(6.33) together with $\lambda = G(y)$ satisfies (6.32) and after integration also (6.31b)
(because the initial value satisfies $g(y_0) = 0$).

For the homotopy (6.30) the relation (6.32) becomes

$$(1 - \lambda)\, a_I(y) - \lambda\, a_{II}(y) = 0, \quad \text{i.e.,} \quad \lambda = \frac{a_I(y)}{a_I(y) + a_{II}(y)}, \tag{6.34}$$

where $a_I(y)$ and $a_{II}(y)$ are given in (6.24).
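Formulas (6.30) and (6.34) translate directly into code. The Python sketch below evaluates $\lambda$ and the resulting vector field for the Coulomb problem (6.28), which is made autonomous by using the state $(x, y, v)$; the function name `sliding_field` and the sample state $(2.2, 1, 0)$ are assumptions chosen for illustration (the state is not a point of the actual trajectory).

```python
import math


def sliding_field(f_I, f_II, grad_g, y):
    """Return (lam, f(y, lam)) with lam from (6.34) and f from (6.30)."""
    n = grad_g(y)
    v_I, v_II = f_I(y), f_II(y)
    a_I = sum(ni * vi for ni, vi in zip(n, v_I))
    a_II = -sum(ni * vi for ni, vi in zip(n, v_II))
    lam = a_I / (a_I + a_II)
    return lam, [(1 - lam) * p + lam * q for p, q in zip(v_I, v_II)]


# Coulomb example (6.28) with state s = (x, y, v) and switching function g = v.
def f_I(s):               # region v > 0
    x, y, v = s
    return [1.0, v, -0.2 * v - y + 2 * math.cos(math.pi * x) - 4]


def f_II(s):              # region v < 0
    x, y, v = s
    return [1.0, v, -0.2 * v - y + 2 * math.cos(math.pi * x) + 4]


lam, field = sliding_field(f_I, f_II, lambda s: [0.0, 0.0, 1.0],
                           [2.2, 1.0, 0.0])
```

For this state both $a_I$ and $a_{II}$ are negative, and the combined field has vanishing $y$- and $v$-components: the solution stays at rest in the manifold, in agreement with the discussion of Example 6.5.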

Remark. Problem (6.31) is a “differential-algebraic system of index 2” and direct
numerical methods are discussed in Chapter VI of Volume II. The instances
where $a_I$ or $a_{II}$ change sign can again be computed by using a dense output and
Algorithm 6.4.


Numerical Computation of Derivatives with Respect 
to Initial Values and Parameters 


For the efficient computation of boundary value problems by a shooting technique
as explained in Section I.15, we need to compute the derivatives of the solutions
with respect to (the missing) initial values. Also, if we want to adjust unknown
parameters from given data, say by a nonlinear least squares procedure, we have to
compute the derivatives of the solutions with respect to parameters in the differential
equation.

We shall restrict our discussion to the problem

$$y' = f(x, y, B), \qquad y(x_0) = y_0(B) \tag{6.35}$$

where the right-hand side function and the initial values depend on a real parameter
$B$. The generalization to more than one parameter is straightforward. There are
several possibilities for computing the derivative $\partial y / \partial B$.

External differentiation. Denote the numerical solution, obtained by a variable
step size code with a fixed tolerance, by $y_{Tol}(x_{end}, x_0, B)$. Then the simplest
device is to approximate the derivative by a finite difference

$$\frac{1}{\Delta B} \bigl( y_{Tol}(x_{end}, x_0, B + \Delta B) - y_{Tol}(x_{end}, x_0, B) \bigr). \tag{6.36}$$

However, due to the error control mechanism with its IF’s and THEN’s and step
rejections, the function $y_{Tol}(x_{end}, x_0, B)$ is by no means a smooth function of the
parameter $B$. Therefore, the errors of the two numerical results in (6.36) are not
correlated, so that the error of (6.36) as an approximation to $\partial y / \partial B\,(x_{end}, x_0, B)$
is of size $O(Tol/\Delta B) + O(\Delta B)$, the second term coming from the discretization
(6.36). This suggests taking for $\Delta B$ something like $\sqrt{Tol}$, and the error of (6.36)
becomes of size $O(\sqrt{Tol})$.

Internal differentiation. We know from Section I.14 that $\Psi = \partial y / \partial B$ is the
solution of the variational equation

$$\Psi' = \frac{\partial f}{\partial y}(x, y, B)\, \Psi + \frac{\partial f}{\partial B}(x, y, B), \qquad \Psi(x_0) = \frac{\partial y_0}{\partial B}(B). \tag{6.37}$$

Here $y$ is the solution of (6.35). Hence, (6.35) and (6.37) together constitute a
differential system for $y$ and $\Psi$, which can be solved simultaneously by any code. If
the partial derivatives $\partial f / \partial y$ and $\partial f / \partial B$ are available analytically, then the error
of $\partial y / \partial B$, obtained by this procedure, is obviously of size $Tol$. This algorithm is
equivalent to “internal differentiation” as introduced by Bock (1981).

If $\partial f / \partial y$ and $\partial f / \partial B$ are not available one can approximate them by finite
differences so that (6.37) becomes

$$\Psi' = \frac{1}{\Delta B} \bigl( f(x, y + \Delta B \cdot \Psi, B + \Delta B) - f(x, y, B) \bigr). \tag{6.38}$$

The solution of (6.38), when inserted into (6.37), gives rise to a defect of size
$O(\Delta B) + O(eps/\Delta B)$, where $eps$ is the precision of the computer (independent
of $Tol$). By Theorem I.10.2, the difference of the solutions of (6.38) and (6.37)
is of the same size. Choosing $\Delta B \approx \sqrt{eps}$, the error of the approximation to
$\partial y / \partial B$, obtained by solving (6.35), (6.38), will be of order $Tol + \sqrt{eps}$, so that
for $Tol \ge \sqrt{eps}$ the result is as precise as that obtained by integration of (6.37).
Observe that external differentiation and the numerical solution of (6.35), (6.38)
need about the same number of function evaluations.

As an example we consider the Brusselator

$$y_1' = 1 + y_1^2 y_2 - (B + 1) y_1, \qquad y_2' = B y_1 - y_1^2 y_2 \tag{6.39}$$

and compute $\partial y / \partial B$ at $x = 20$ for various $B$ ranging from $B = 2.88$ to $B = 3.08$.
We applied the code DOPRI5 with Atol = Rtol = Tol = $10^{-4}$. The numerical
result is displayed in Fig. 6.5. External differentiation has been applied, once with
$\Delta B = \sqrt{Tol}$ and a second time with $\Delta B = 4\,Tol$. This numerical example clearly
demonstrates that internal differentiation is to be preferred.

Fig. 6.5. Derivatives of the solution of (6.39) with respect to $B$
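Internal differentiation for the Brusselator can be sketched as follows. The Python code integrates the right-hand side of the Brusselator together with the variational equation (6.37), using analytic partial derivatives. Several ingredients are assumptions of this sketch and not taken from the text: a fixed step size RK4 method replaces DOPRI5, the initial values are set to $y_1(0) = 1.3$, $y_2(0) = B$ (so that $\Psi(0) = (0, 1)^T$), and the interval is shortened to $[0, 2]$.

```python
def brusselator_aug(u, B):
    """Brusselator right-hand side together with the variational equation (6.37)."""
    y1, y2, p1, p2 = u
    f1 = 1 + y1 * y1 * y2 - (B + 1) * y1
    f2 = B * y1 - y1 * y1 * y2
    # Psi' = (df/dy) Psi + df/dB, with df/dB = (-y1, y1)
    q1 = (2 * y1 * y2 - (B + 1)) * p1 + y1 * y1 * p2 - y1
    q2 = (B - 2 * y1 * y2) * p1 - y1 * y1 * p2 + y1
    return [f1, f2, q1, q2]


def integrate(B, x_end=2.0, n=200):
    """Fixed step RK4 for the augmented system; returns (y1, y2, Psi1, Psi2)."""
    h = x_end / n
    u = [1.3, B, 0.0, 1.0]    # assumed initial values; Psi(0) = d y0 / dB
    for _ in range(n):
        k1 = brusselator_aug(u, B)
        k2 = brusselator_aug([a + h / 2 * b for a, b in zip(u, k1)], B)
        k3 = brusselator_aug([a + h / 2 * b for a, b in zip(u, k2)], B)
        k4 = brusselator_aug([a + h * b for a, b in zip(u, k3)], B)
        u = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(u, k1, k2, k3, k4)]
    return u


internal = integrate(3.0)[2:]

# Reference: central difference with identical (frozen) step sizes.
d = 1e-5
plus, minus = integrate(3.0 + d), integrate(3.0 - d)
frozen = [(a - b) / (2 * d) for a, b in zip(plus[:2], minus[:2])]
```

Because the step sizes are the same in all three runs, the finite difference here is a smooth function of $B$ and agrees closely with the internal derivative; the difficulties of external differentiation described above arise only when a step size control is active.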


Exercises 

1. (Owren & Zennaro 1991, Carnicer 1991). The 4-stage 4th order methods of
Section II.1 do not possess a dense output of order 4 (even if the numerical
solution $y_1$ is included as 5th stage). Prove this statement.

2. Consider a Runge-Kutta method of order $p$ and use Richardson extrapolation
for step size control. Besides the numerical solution $y_0$, $y_1$, $y_2$ we consider the
extrapolated values (see Section II.4)

$$\hat y_1 = y_1 + \frac{y_2 - w}{(2^p - 1)\, 2}, \qquad \hat y_2 = y_2 + \frac{y_2 - w}{2^p - 1}$$

and do quintic polynomial interpolation based on $y_0$, $f(x_0, y_0)$, $\hat y_1$,
$f(x_0 + h, \hat y_1)$, $\hat y_2$, $f(x_0 + 2h, \hat y_2)$. Prove that the resulting dense output
formula is of order $p^* = \min(5, p + 1)$.

Remark. It is not necessary to evaluate $f$ at $\hat y_1$.

3. Prove that the conditions (6.19), (6.18) and (5.20) imply (6.20).

Hint. The system (6.19) together with one relation of (6.20) is overdetermined.
However, it possesses the solution $b_i$ for $\theta = 1$. Further, the values $b_i c_i$ also
solve this system if the right-hand side of (6.19a) is adapted. These properties
imply that for $k \in \{4, 5\}$ and for $i \in \{1, 6, \ldots, 16\}$

$$\sum_{j=k+1}^{i-1} a_{ij} a_{jk} = \alpha a_{i4} + \beta a_{i5} + \gamma c_i a_{i4} + \delta c_i a_{i5} + \varepsilon \Bigl( \sum_{j=1}^{i-1} a_{ij} c_j^5 - \frac{c_i^6}{6} \Bigr),$$

where the parameters $\alpha$, $\beta$, $\gamma$, $\delta$, $\varepsilon$ may depend on $k$.

4. (Butcher). Try your favorite code on the example

$$y_1' = f_1(y_1, y_2), \quad y_1(0) = 1, \qquad y_2' = f_2(y_1, y_2), \quad y_2(0) = 0$$

where $f$ is defined as follows.

If ($|y_1| > |y_2|$) then
   $f_1 = 0$, $f_2 = \operatorname{sign}(y_1)$
Else
   $f_2 = 0$, $f_1 = -\operatorname{sign}(y_2)$
End If.

Compute $y_1(8)$, $y_2(8)$. Show that the exact solution is periodic.

5. Do numerical computations for the problem $y' = f(y)$, $y(0) = 1$, $y(3) = ?$
where

$$f(y) = \begin{cases} y^2 & \text{if } 0 \le y \le 2 \\ \text{a) } 1, \quad \text{b) } 4, \quad \text{c) } -4 + 4y & \text{if } 2 < y. \end{cases}$$

Remark. The correct answer would be (a) 4.5, (b) 12, (c) exp(10) + 1.

6. Consider an $s$-stage Runge-Kutta method and denote by $\tilde s$ the number of
distinct $c_i$. Prove that the order of any continuous extension is $\le \tilde s$.

Hint. Let $q(x)$ be a polynomial of degree $\tilde s$ satisfying $q(c_i) = 0$ (for $i =
1, \ldots, s$) and investigate the expression $\sum_i b_i(\theta) q(c_i)$.

7. (Step size freeze). Consider the following algorithm for the computation of
$\partial y / \partial B$: first compute numerically the solution of (6.35) and denote it by
$y_h(x_{end}, B)$. At the same time memorize all the selected step sizes. This
step size sequence is then used to solve (6.35) with $B$ replaced by $B + \Delta B$.
The result is denoted by $y_h(x_{end}, B + \Delta B)$. Then approximate the derivative
$\partial y / \partial B$ by

$$\frac{1}{\Delta B} \bigl( y_h(x_{end}, B + \Delta B) - y_h(x_{end}, B) \bigr).$$

Prove that this algorithm is equivalent to the solution of the system (6.35),
(6.38), if only the components of $y$ are considered for error control and step
size selection.

Remark. For large systems this algorithm needs less storage than
internal differentiation, in particular if the derivative with respect to several
parameters is computed.

8. (Taubert 1976). Show that for the discontinuous problem (6.23) the Euler polygons
converge to Filippov’s solution (6.30), (6.31).

Hint. The difference quotient of a piece of the Euler polygon lies in the convex
hull of points $f_I(y)$ and $f_{II}(y)$.

Remark. This result can either be interpreted as pleading for myriads of Euler
steps, or as a motivation for the homotopy (6.30).

9. Another motivation for formula (6.30): suppose that a small particle of radius
$\varepsilon$ is transported in a possibly discontinuous flow. Then its movement might be
described by the mean of $f$

$$f_\varepsilon(y) = \int_{B_\varepsilon(y)} f(z)\, dz \Big/ \int_{B_\varepsilon(y)} dz$$

which is continuous in $y$. Show that the solution of $y_\varepsilon' = f_\varepsilon(y_\varepsilon)$ becomes, for
$\varepsilon \to 0$, that of (6.33) and (6.34).

It has been traditional to consider only explicit processes

(J.C. Butcher 1964a) 

The high speed computing machines make it possible to enjoy 
the advantage of intricate methods 

(P.C. Hammer & J.W. Hollingsworth 1955) 


The first implicit RK methods were used by Cauchy (1824) for the sake of (you
have guessed correctly) error estimation (Méthodes diverses qui peuvent être
employées au Calcul numérique ...; see Exercise 5). Cauchy inserted the mean
value theorem into the integral studied in Sections I.8 and II.1,

$$y(x_1) = y(x_0) + \int_{x_0}^{x_1} f(x, y(x))\, dx, \tag{7.1}$$

to obtain

$$y_1 = y_0 + h f\bigl( x_0 + \theta h,\; y_0 + \Theta (y_1 - y_0) \bigr) \tag{7.2}$$

with $0 \le \theta, \Theta \le 1$ (the “$\theta$-method”). The extreme cases are $\theta = \Theta = 0$ (the explicit
Euler method) and $\theta = \Theta = 1$

$$y_1 = y_0 + h f(x_1, y_1), \tag{7.3}$$

which we call the implicit or backward Euler method.

For the sake of more efficient numerical processes, we apply, as we did in
Section II.1, the midpoint rule ($\theta = \Theta = 1/2$) and obtain from (7.2) by setting
$k_1 = (y_1 - y_0)/h$:

$$k_1 = f\Bigl( x_0 + \frac{h}{2},\; y_0 + \frac{h}{2} k_1 \Bigr), \qquad y_1 = y_0 + h k_1. \tag{7.4}$$

This method is called the implicit midpoint rule.

Still another possibility is to approximate (7.1) by the trapezoidal rule and to
obtain

$$y_1 = y_0 + \frac{h}{2} \bigl( f(x_0, y_0) + f(x_1, y_1) \bigr). \tag{7.5}$$

Let us also look at the Radau scheme

$$y(x_1) - y(x_0) = \int_{x_0}^{x_0 + h} f(x, y(x))\, dx \approx \frac{h}{4} \Bigl( f(x_0, y_0) + 3 f\bigl( x_0 + \tfrac{2}{3} h,\; y(x_0 + \tfrac{2}{3} h) \bigr) \Bigr).$$

Here we need to approximate $y(x_0 + 2h/3)$. One idea would be the use of quadratic
interpolation based on $y_0$, $y_0'$ and $y(x_1)$,

$$y(x_0 + \tfrac{2}{3} h) \approx \frac{5}{9} y_0 + \frac{4}{9} y(x_1) + \frac{2}{9} h f(x_0, y_0).$$

The resulting method, given by Hammer & Hollingsworth (1955), is

$$\begin{aligned} k_1 &= f(x_0, y_0) \\ k_2 &= f\bigl( x_0 + \tfrac{2}{3} h,\; y_0 + \tfrac{h}{3} (k_1 + k_2) \bigr) \\ y_1 &= y_0 + \tfrac{h}{4} (k_1 + 3 k_2). \end{aligned} \tag{7.6}$$

All these schemes are of the form (1.8) if the summations are extended up to “$s$”.

Definition 7.1. Let $b_i$, $a_{ij}$ ($i, j = 1, \ldots, s$) be real numbers and let $c_i$ be defined
by (1.9). The method

$$\begin{aligned} k_i &= f\Bigl( x_0 + c_i h,\; y_0 + h \sum_{j=1}^s a_{ij} k_j \Bigr), \qquad i = 1, \ldots, s \\ y_1 &= y_0 + h \sum_{i=1}^s b_i k_i \end{aligned} \tag{7.7}$$

is called an $s$-stage Runge-Kutta method. When $a_{ij} = 0$ for $i \le j$ we have an
explicit (ERK) method. If $a_{ij} = 0$ for $i < j$ and at least one $a_{ii} \ne 0$, we have a
diagonal implicit Runge-Kutta method (DIRK). If in addition all diagonal elements
are identical ($a_{ii} = \gamma$ for $i = 1, \ldots, s$), we speak of a singly diagonal implicit
(SDIRK) method. In all other cases we speak of an implicit Runge-Kutta method
(IRK).

The tableau of coefficients used above for ERK-methods is obviously extended
to include all the other non-zero $a_{ij}$’s above the diagonal. For methods (7.3), (7.4)
and (7.6) it is given in Table 7.1.

Renewed interest in implicit Runge-Kutta methods arose in connection with 
stiff differential equations (see Volume II). 

Table 7.1. Implicit Runge-Kutta methods

$$\begin{array}{c|c} 1 & 1 \\ \hline & 1 \end{array} \qquad\qquad \begin{array}{c|c} 1/2 & 1/2 \\ \hline & 1 \end{array} \qquad\qquad \begin{array}{c|cc} 0 & 0 & 0 \\ 2/3 & 1/3 & 1/3 \\ \hline & 1/4 & 3/4 \end{array}$$

(from left to right: Implicit Euler, Implicit midpoint rule, Hammer & Hollingsworth)

Existence of a Numerical Solution


For implicit methods, the $k_i$’s can no longer be evaluated successively, since (7.7)
constitutes a system of implicit equations for the determination of $k_i$. For DIRK-
methods we have a sequence of implicit equations of dimension $n$ for $k_1$, then
for $k_2$, etc. For fully implicit methods $s \cdot n$ unknowns ($k_i$, $i = 1, \ldots, s$; each of
dimension $n$) have to be determined simultaneously, which still increases the difficulty.
A natural question is therefore (the reason for which the original version of
Butcher (1964a) was returned by the editors): do equations (7.7) possess a solution
at all?


Theorem 7.2. Let $f : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ be continuous and satisfy a Lipschitz condition
with constant $L$ (with respect to $y$). If

$$h < \frac{1}{L \max_i \sum_j |a_{ij}|} \tag{7.8}$$

there exists a unique solution of (7.7), which can be obtained by iteration. If $f(x, y)$
is $p$ times continuously differentiable, the functions $k_i$ (as functions of $h$) are also
in $C^p$.


Proof. We prove the existence by iteration (“... on la résoudra facilement par des
approximations successives ...”, Cauchy 1824)

$$k_i^{(m+1)} = f\Bigl( x_0 + c_i h,\; y_0 + h \sum_{j=1}^s a_{ij} k_j^{(m)} \Bigr).$$

We define $K \in \mathbb{R}^{sn}$ as $K = (k_1, \ldots, k_s)^T$ and use the norm $\|K\| = \max_i (\|k_i\|)$.
Then (7.7) can be written as $K = F(K)$ where

$$F_i(K) = f\Bigl( x_0 + c_i h,\; y_0 + h \sum_{j=1}^s a_{ij} k_j \Bigr), \qquad i = 1, \ldots, s.$$

The Lipschitz condition and a repeated use of the triangle inequality then show that

$$\|F(K_1) - F(K_2)\| \le h L \max_{i=1,\ldots,s} \sum_{j=1}^s |a_{ij}| \cdot \|K_1 - K_2\|$$

so that, by (7.8), $F$ is a contraction. The contraction mapping principle then ensures
the existence and uniqueness of the solution and the convergence of the fixed-point
iteration.

The differentiability result is ensured by the Implicit Function Theorem of classical
analysis: (7.7) is written as $\Phi(h, K) = K - F(K) = 0$. The matrix of partial
derivatives $\partial \Phi / \partial K$ for $h = 0$ is the identity matrix and therefore the solution of
$\Phi(h, K) = 0$, which for $h = 0$ is $k_i = f(x_0, y_0)$, is continuously differentiable in
a neighbourhood of $h = 0$. □

If the assumptions on $f$ in Theorem 7.2 are only satisfied in a neighbourhood
of the initial value, then further restrictions on $h$ are needed in order that the argument
of $f$ remains in this neighbourhood. Uniqueness is then only of local nature.

The step size restriction (7.8) becomes useless for stiff problems ($L$ large). We
return to this question in Vol. II, Sections IV.8 and IV.14.
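The fixed-point iteration used in the proof of Theorem 7.2 can be written down directly. The following Python sketch performs one step of a general implicit Runge-Kutta method (7.7) for a scalar equation; the function name `irk_step`, the stopping tolerance and the iteration limit are assumptions of this illustration, and the iteration is only meaningful when $h$ satisfies a restriction of the type (7.8).

```python
def irk_step(f, x0, y0, h, A, b, c, tol=1e-14, max_iter=100):
    """One step of the s-stage method (7.7), stages solved by fixed-point iteration."""
    s = len(b)
    k = [f(x0, y0)] * s                      # starting values k_i = f(x0, y0)
    for _ in range(max_iter):
        k_new = [f(x0 + c[i] * h,
                   y0 + h * sum(A[i][j] * k[j] for j in range(s)))
                 for i in range(s)]
        change = max(abs(p - q) for p, q in zip(k_new, k))
        k = k_new
        if change <= tol:
            break
    return y0 + h * sum(b[i] * k[i] for i in range(s))


# Implicit midpoint rule (7.4) applied to y' = -2y, y(0) = 1, with h = 0.1.
# Here L = 2 and max_i sum_j |a_ij| = 1/2, so (7.8) requires h < 1.
y1 = irk_step(lambda x, y: -2.0 * y, 0.0, 1.0, 0.1, [[0.5]], [1.0], [0.5])
```

For this linear problem the implicit midpoint rule gives $y_1 = y_0 (1 - h)/(1 + h)$, which allows a direct check of the iteration. For stiff problems the iteration would be replaced by a Newton-type method, as indicated above.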

The definition of order is the same as for explicit methods and the order conditions
are derived in precisely the same way as in Section II.2.


Example 7.3. Let us study implicit two-stage methods of order 3: the order conditions
become (see Theorem 2.1)

$$\begin{aligned} & b_1 + b_2 = 1, \qquad b_1 c_1 + b_2 c_2 = \frac{1}{2}, \qquad b_1 c_1^2 + b_2 c_2^2 = \frac{1}{3}, \\ & b_1 (a_{11} c_1 + a_{12} c_2) + b_2 (a_{21} c_1 + a_{22} c_2) = \frac{1}{6}. \end{aligned} \tag{7.9}$$

The first three equations imply the following orthogonality relation (from the theory
of Gaussian integration):

$$\int_0^1 (x - c_1)(x - c_2)\, dx = 0, \quad \text{i.e.,} \quad c_2 = \frac{2 - 3 c_1}{3 - 6 c_1} \quad (c_1 \ne 1/2) \tag{7.10}$$

and

$$b_1 = \frac{c_2 - 1/2}{c_2 - c_1}, \qquad b_2 = \frac{c_1 - 1/2}{c_1 - c_2}.$$

In the fourth equation we insert $a_{21} = c_2 - a_{22}$, $a_{11} = c_1 - a_{12}$ and consider $a_{12}$
and $c_1$ as free parameters. This gives

$$a_{22} = \frac{1/6 - b_1 a_{12} (c_2 - c_1) - c_1/2}{b_2 (c_2 - c_1)}. \tag{7.11}$$

For $a_{12} = 0$ we obtain a one-parameter family of DIRK-methods of order 3. An
SDIRK-method is obtained if we still require $a_{11} = a_{22}$ (Nørsett 1974b, Crouzeix
1975, see Table 7.2). For order 4 we have 4 additional conditions, with only two
free parameters left. Nevertheless there exists a unique solution (see Table 7.3).
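The construction of Example 7.3 is easily checked numerically. The Python sketch below computes the coefficients from the free parameters $c_1$ and $a_{12}$ via the relation $c_2 = (2 - 3c_1)/(3 - 6c_1)$ from (7.10) and the formula (7.11) for $a_{22}$, and evaluates the residuals of the four order conditions (7.9). The function names are hypothetical; the choice $c_1 = (3 + \sqrt{3})/6$, $a_{12} = 0$ is the SDIRK case mentioned above.

```python
import math


def two_stage_order3(c1, a12):
    """Coefficients of the two-stage family of Example 7.3 (requires c1 != 1/2)."""
    c2 = (2 - 3 * c1) / (3 - 6 * c1)                      # from (7.10)
    b1 = (c2 - 0.5) / (c2 - c1)
    b2 = (c1 - 0.5) / (c1 - c2)
    a22 = (1 / 6 - b1 * a12 * (c2 - c1) - c1 / 2) / (b2 * (c2 - c1))  # (7.11)
    A = [[c1 - a12, a12], [c2 - a22, a22]]
    return A, [b1, b2], [c1, c2]


def residuals(A, b, c):
    """Residuals of the four order conditions (7.9)."""
    return [b[0] + b[1] - 1,
            b[0] * c[0] + b[1] * c[1] - 1 / 2,
            b[0] * c[0] ** 2 + b[1] * c[1] ** 2 - 1 / 3,
            sum(b[i] * A[i][j] * c[j]
                for i in range(2) for j in range(2)) - 1 / 6]


gamma = (3 + math.sqrt(3)) / 6
A, b, c = two_stage_order3(gamma, 0.0)
res = residuals(A, b, c)
```

With these parameters the computed matrix has both diagonal entries equal to $\gamma$ and the lower left entry equal to $1 - 2\gamma$.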


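Table 7.2. SDIRK method, order 3

$$\begin{array}{c|cc} \gamma & \gamma & 0 \\ 1 - \gamma & 1 - 2\gamma & \gamma \\ \hline & 1/2 & 1/2 \end{array} \qquad\qquad \gamma = \frac{3 \pm \sqrt{3}}{6}$$

Table 7.3. Hammer & Hollingsworth, order 4

$$\begin{array}{c|cc} \frac{1}{2} - \frac{\sqrt{3}}{6} & \frac{1}{4} & \frac{1}{4} - \frac{\sqrt{3}}{6} \\ \frac{1}{2} + \frac{\sqrt{3}}{6} & \frac{1}{4} + \frac{\sqrt{3}}{6} & \frac{1}{4} \\ \hline & 1/2 & 1/2 \end{array}$$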





The Methods of Kuntzmann and Butcher of Order 2s 


It is clear that formula (7.4) and the method of Table 7.3 extend the one-point
and two-point Gaussian quadrature formulas, respectively. Kuntzmann (1961) (see
Ceschino & Kuntzmann 1963, p. 106) and Butcher (1964a) then discovered that
for all $s$ there exist IRK-methods of order $2s$. The main tools of proof are the
following simplifying assumptions

$$B(p): \qquad \sum_{i=1}^s b_i c_i^{q-1} = \frac{1}{q}, \qquad q = 1, \ldots, p;$$

$$C(\eta): \qquad \sum_{j=1}^s a_{ij} c_j^{q-1} = \frac{c_i^q}{q}, \qquad i = 1, \ldots, s, \quad q = 1, \ldots, \eta;$$

$$D(\zeta): \qquad \sum_{i=1}^s b_i c_i^{q-1} a_{ij} = \frac{b_j}{q} (1 - c_j^q), \qquad j = 1, \ldots, s, \quad q = 1, \ldots, \zeta.$$

Condition $B(p)$ simply means that the quadrature formula $(b_i, c_i)$ is of order $p$
or, equivalently, that the order conditions (2.21) are satisfied for the bushy trees
$[\tau, \ldots, \tau]$ up to order $p$.

The assumption $C(\eta)$ implies that the pairs of trees in Fig. 7.1 give identical
order conditions for $q \le \eta$. In contrast to explicit Runge-Kutta methods (see (5.7)
and (5.15)) there is no need to require conditions such as $b_2 = 0$ (see (5.8)), because
$\sum_j a_{ij} c_j^{q-1} = c_i^q / q$ is valid for all $i$.

The assumption $D(\zeta)$ is an extension of (1.12). It means that the order condition
of the left-hand tree of Fig. 7.2 is implied by those of the two right-hand trees
if $q \le \zeta$.

Fig. 7.1. Reduction with $C(\eta)$
Fig. 7.2. Reduction with $D(\zeta)$


Theorem 7.4 (Butcher 1964a). If $B(p)$, $C(\eta)$ and $D(\zeta)$ are satisfied with $p \le
2\eta + 2$ and $p \le \zeta + \eta + 1$, then the method is of order $p$.

Proof. The above reduction by $C(\eta)$ implies that it is sufficient to consider trees
$t = [t_1, \ldots, t_m]$ of order $\le p$, where the subtrees $t_1, \ldots, t_m$ are either equal to $\tau$
or of order $\ge \eta + 1$. Since $p \le 2\eta + 2$ either all subtrees are equal to $\tau$ or there
is exactly one subtree different from $\tau$. In the second case the number of $\tau$’s is
$\le \zeta - 1$ by $p \le \eta + \zeta + 1$ and the reduction by $D(\zeta)$ can be applied. Therefore,
after all these reductions, only the bushy trees are left and they are satisfied by
$B(p)$. □

To obtain the formulas of order $2s$, Butcher assumed $B(2s)$ (i.e., the $c_i$ and
$b_i$ are the coefficients of the Gaussian quadrature formula) and $C(s)$. This implies
$D(s)$ (see Exercise 7) so that Theorem 7.4 can be applied with $p = 2s$, $\eta = s$ and
$\zeta = s$. Hence the method, obtained in this way, is of order $2s$. For $s = 3$ and 4 the
coefficients are given in Tables 7.4 and 7.5. They can still be expressed by radicals
for $s = 5$ and are given in Butcher (1964a), p. 57.
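The simplifying assumptions can be verified numerically for a given tableau. The Python sketch below does this for the two-stage method of Table 7.3 ($s = 2$). It assumes the usual form of the conditions, namely $B(p)$: $\sum_i b_i c_i^{q-1} = 1/q$ for $q \le p$; $C(\eta)$: $\sum_j a_{ij} c_j^{q-1} = c_i^q/q$ for $q \le \eta$; and $D(\zeta)$: $\sum_i b_i c_i^{q-1} a_{ij} = b_j (1 - c_j^q)/q$ for $q \le \zeta$. The function names and the tolerance are choices made for this illustration.

```python
import math


def holds_B(b, c, p, tol=1e-12):
    return all(abs(sum(bi * ci ** (q - 1) for bi, ci in zip(b, c)) - 1 / q) < tol
               for q in range(1, p + 1))


def holds_C(A, c, eta, tol=1e-12):
    s = len(c)
    return all(abs(sum(A[i][j] * c[j] ** (q - 1) for j in range(s))
                   - c[i] ** q / q) < tol
               for i in range(s) for q in range(1, eta + 1))


def holds_D(A, b, c, zeta, tol=1e-12):
    s = len(c)
    return all(abs(sum(b[i] * c[i] ** (q - 1) * A[i][j] for i in range(s))
                   - b[j] * (1 - c[j] ** q) / q) < tol
               for j in range(s) for q in range(1, zeta + 1))


r = math.sqrt(3) / 6
A = [[1 / 4, 1 / 4 - r], [1 / 4 + r, 1 / 4]]     # coefficients of Table 7.3
b = [1 / 2, 1 / 2]
c = [1 / 2 - r, 1 / 2 + r]

checks = (holds_B(b, c, 4), holds_C(A, c, 2), holds_D(A, b, c, 2))
```

All three checks succeed, so Theorem 7.4 applies with $p = 4$, $\eta = \zeta = 2$; the condition $B(5)$, on the other hand, is violated, as expected for a two-point Gaussian formula.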

Impressive numerical results from celestial mechanics for these methods were 
first reported in the thesis of D. Sommer (see Sommer 1965). 


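Table 7.4. Kuntzmann & Butcher method, order 6

$$\begin{array}{c|ccc}
\frac{1}{2} - \frac{\sqrt{15}}{10} & \frac{5}{36} & \frac{2}{9} - \frac{\sqrt{15}}{15} & \frac{5}{36} - \frac{\sqrt{15}}{30} \\
\frac{1}{2} & \frac{5}{36} + \frac{\sqrt{15}}{24} & \frac{2}{9} & \frac{5}{36} - \frac{\sqrt{15}}{24} \\
\frac{1}{2} + \frac{\sqrt{15}}{10} & \frac{5}{36} + \frac{\sqrt{15}}{30} & \frac{2}{9} + \frac{\sqrt{15}}{15} & \frac{5}{36} \\ \hline
 & \frac{5}{18} & \frac{4}{9} & \frac{5}{18}
\end{array}$$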

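Table 7.5. Kuntzmann & Butcher method, order 8

$$\begin{array}{c|cccc}
\frac{1}{2} - \omega_2 & \omega_1 & \omega_1' - \omega_3 + \omega_4' & \omega_1' - \omega_3 - \omega_4' & \omega_1 - \omega_5 \\
\frac{1}{2} - \omega_2' & \omega_1 - \omega_3' + \omega_4 & \omega_1' & \omega_1' - \omega_5' & \omega_1 - \omega_3' - \omega_4 \\
\frac{1}{2} + \omega_2' & \omega_1 + \omega_3' + \omega_4 & \omega_1' + \omega_5' & \omega_1' & \omega_1 + \omega_3' - \omega_4 \\
\frac{1}{2} + \omega_2 & \omega_1 + \omega_5 & \omega_1' + \omega_3 + \omega_4' & \omega_1' + \omega_3 - \omega_4' & \omega_1 \\ \hline
 & 2\omega_1 & 2\omega_1' & 2\omega_1' & 2\omega_1
\end{array}$$

$$\omega_1 = \frac{1}{8} - \frac{\sqrt{30}}{144}, \qquad \omega_1' = \frac{1}{8} + \frac{\sqrt{30}}{144},$$

$$\omega_2 = \frac{1}{2} \sqrt{\frac{15 + 2\sqrt{30}}{35}}, \qquad \omega_2' = \frac{1}{2} \sqrt{\frac{15 - 2\sqrt{30}}{35}},$$

$$\omega_3 = \omega_2 \Bigl( \frac{1}{6} + \frac{\sqrt{30}}{24} \Bigr), \qquad \omega_3' = \omega_2' \Bigl( \frac{1}{6} - \frac{\sqrt{30}}{24} \Bigr),$$

$$\omega_4 = \omega_2 \Bigl( \frac{1}{21} + \frac{5\sqrt{30}}{168} \Bigr), \qquad \omega_4' = \omega_2' \Bigl( \frac{1}{21} - \frac{5\sqrt{30}}{168} \Bigr),$$

$$\omega_5 = \omega_2 - 2\omega_3, \qquad \omega_5' = \omega_2' - 2\omega_3'.$$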


An important interpretation of the assumption $C(\eta)$ is the following:

Lemma 7.5. The assumption $C(\eta)$ implies that the internal stages

$$g_i = y_0 + h \sum_{j=1}^s a_{ij} k_j, \qquad k_j = f(x_0 + c_j h, g_j) \tag{7.12}$$

satisfy for $i = 1, \ldots, s$

$$g_i - y(x_0 + c_i h) = O(h^{\eta + 1}). \tag{7.13}$$

Proof. Because of $C(\eta)$ the exact solution satisfies (Taylor expansion)

$$y(x_0 + c_i h) = y_0 + h \sum_{j=1}^s a_{ij}\, y'(x_0 + c_j h) + O(h^{\eta + 1}). \tag{7.14}$$

Subtracting (7.14) from (7.12) yields

$$g_i - y(x_0 + c_i h) = h \sum_{j=1}^s a_{ij} \Bigl( f(x_0 + c_j h, g_j) - f\bigl( x_0 + c_j h, y(x_0 + c_j h) \bigr) \Bigr) + O(h^{\eta + 1})$$

and Lipschitz continuity of $f$ proves (7.13). □


IRK Methods Based on Lobatto Quadrature 

Lobatto quadrature rules (Lobatto 1852, Radau 1880, p. 307) modify the idea of
Gaussian quadrature by requiring that the first and the last node coincide with the
interval ends, i.e., $c_1 = 0$, $c_s = 1$. These points are easier to handle and, in a
step-by-step procedure, can be used twice. The remaining $c$’s are then adjusted
optimally, i.e., as the zeros of the Jacobi orthogonal polynomial $P_{s-2}^{(1,1)}(x)$ or of
$P_{s-1}'(x)$ (see e.g., Abramowitz & Stegun 1964, 25.4.32 for the interval $[-1, 1]$) and
lead to formulas of order $2s - 2$.

J.C. Butcher (1964a, p. 51, 1964c) then found that Lobatto quadrature rules can
be extended to IRK-methods whose coefficient matrix is zero in the first line and
the last column. The first and the last stage then become explicit and the number
of implicit stages reduces to $s - 2$. The methods are characterized by $B(2s - 2)$
and $C(s - 1)$. As in Exercise 7 this implies $D(s - 1)$ so that by Theorem 7.4 the
method is of order $2s - 2$. For $s = 3$ and 4, the coefficients are given in Table 7.6.

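We shall see in Volume II (Section IV.3, Table 3.1) that these methods, although
preferable as concerns the relation between order and implicit stages, are
not sufficiently stable for stiff differential equations.

Table 7.6. Butcher’s Lobatto formulas of orders 4 and 6

$$\begin{array}{c|ccc}
0 & 0 & 0 & 0 \\
\frac{1}{2} & \frac{1}{4} & \frac{1}{4} & 0 \\
1 & 0 & 1 & 0 \\ \hline
 & \frac{1}{6} & \frac{2}{3} & \frac{1}{6}
\end{array}
\qquad\qquad
\begin{array}{c|cccc}
0 & 0 & 0 & 0 & 0 \\
\frac{5 - \sqrt{5}}{10} & \frac{5 + \sqrt{5}}{60} & \frac{1}{6} & \frac{15 - 7\sqrt{5}}{60} & 0 \\
\frac{5 + \sqrt{5}}{10} & \frac{5 - \sqrt{5}}{60} & \frac{15 + 7\sqrt{5}}{60} & \frac{1}{6} & 0 \\
1 & \frac{1}{6} & \frac{5 - \sqrt{5}}{12} & \frac{5 + \sqrt{5}}{12} & 0 \\ \hline
 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12}
\end{array}$$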

Collocation Methods 

Es ist erstaunlich dass die Methode trotz ihrer Primitivität und
der geringen Rechenarbeit in vielen Fällen ... sogar gute Ergebnisse
liefert. (L. Collatz 1951)

Nous allons montrer l’équivalence de notre définition avec la
définition traditionnelle de certaines formules de Runge Kutta
implicites. (Guillou & Soulé 1969)

The concept of collocation is old and universal in numerical analysis (see e.g., 
pp. 28,29,32,181,411,453,483,495 of Collatz 1960, Frazer, Jones & Skan 1937). 
For ordinary differential equations it consists in searching for a polynomial of de¬ 
gree s whose derivative coincides (“co-locates”) at s given points with the vector 
field of the differential equation (Guillou & Soule 1969, Wright 1970). Still an¬ 
other approach is to combine Galerkin’s method with numerical quadrature (see 
Hulme 1972). 

Definition 7.6. For $s$ a positive integer and $c_1, \ldots, c_s$ distinct real numbers
(typically between 0 and 1), the corresponding collocation polynomial $u(x)$ of
degree $s$ is defined by

$$u(x_0) = y_0 \quad \text{(initial value)} \tag{7.15a}$$

$$u'(x_0 + c_i h) = f\bigl(x_0 + c_i h,\, u(x_0 + c_i h)\bigr), \qquad i = 1, \ldots, s. \tag{7.15b}$$

The numerical solution is then given by

$$y_1 = u(x_0 + h). \tag{7.15c}$$

If some of the $c_i$ coincide, the collocation condition (7.15b) will contain higher
derivatives and lead to multi-derivative methods (see Section II.13). Accordingly,
for the moment, we suppose them all distinct.
Theorem 7.7 (Guillou & Soulé 1969, Wright 1970). The collocation method (7.15)
is equivalent to the s-stage IRK-method (7.7) with coefficients

$$a_{ij} = \int_0^{c_i} \ell_j(t)\,dt, \qquad b_j = \int_0^1 \ell_j(t)\,dt, \qquad i, j = 1, \ldots, s, \tag{7.16}$$

where the $\ell_j(t)$ are the Lagrange polynomials

$$\ell_j(t) = \prod_{k \ne j} \frac{t - c_k}{c_j - c_k}. \tag{7.17}$$


Proof. Put $u'(x_0 + c_i h) = k_i$, so that

$$u'(x_0 + th) = \sum_{j=1}^{s} k_j\, \ell_j(t) \qquad \text{(Lagrange)}.$$

Then integrate

$$u(x_0 + c_i h) = y_0 + h \int_0^{c_i} u'(x_0 + th)\,dt \tag{7.18}$$

and insert into (7.15b) together with (7.16). The IRK-method (7.7) then comes out.

□
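
The formulas (7.16) and (7.17) can be evaluated directly for any set of distinct nodes. The following is a minimal sketch, assuming plain Python lists for the coefficients; the function names `poly_mul_linear`, `integral_from_zero` and `collocation_coefficients` are illustrative and not taken from the text. For the two Gauss nodes $c_{1,2} = 1/2 \mp \sqrt{3}/6$ it should reproduce the familiar two-stage Gauss coefficients.

```python
import math


def poly_mul_linear(p, root):
    """Multiply the polynomial p (coefficients, lowest power first) by (t - root)."""
    q = [0.0] * (len(p) + 1)
    for i, a in enumerate(p):
        q[i + 1] += a
        q[i] -= root * a
    return q


def integral_from_zero(p, t):
    """Integral of the polynomial p over the interval [0, t]."""
    return sum(a * t ** (i + 1) / (i + 1) for i, a in enumerate(p))


def collocation_coefficients(c):
    """Return (A, b) from (7.16) for distinct collocation nodes c_1, ..., c_s."""
    s = len(c)
    A = [[0.0] * s for _ in range(s)]
    b = [0.0] * s
    for j in range(s):
        # Build the Lagrange polynomial l_j(t) of (7.17).
        lj, denom = [1.0], 1.0
        for k in range(s):
            if k != j:
                lj = poly_mul_linear(lj, c[k])
                denom *= c[j] - c[k]
        lj = [a / denom for a in lj]
        for i in range(s):
            A[i][j] = integral_from_zero(lj, c[i])   # a_ij = int_0^{c_i} l_j(t) dt
        b[j] = integral_from_zero(lj, 1.0)           # b_j  = int_0^1 l_j(t) dt
    return A, b


if __name__ == "__main__":
    r = math.sqrt(3.0) / 6.0
    A, b = collocation_coefficients([0.5 - r, 0.5 + r])
    print(A)
    print(b)
```

With the single node $c_1 = 1/2$ the same routine returns $a_{11} = 1/2$, $b_1 = 1$, i.e. the implicit midpoint rule.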


As a consequence of this result, the existence and uniqueness of the collocation 
polynomial (for sufficiently small h ) follows from Theorem 7.2. 


Theorem 7.8. An implicit Runge-Kutta method with all $c_i$ different and of order at
least $s$ is a collocation method iff $C(s)$ is true.


Proof. $C(s)$ determines the $a_{ij}$ uniquely. We write it as

$$\sum_{j=1}^{s} a_{ij}\, p(c_j) = \int_0^{c_i} p(t)\,dt \tag{7.19}$$

for all polynomials $p$ of degree $\le s - 1$. The $a_{ij}$ given by (7.16) satisfy this
relation, because (7.16) inserted into (7.19) is just the Lagrange interpolation formula.

□


Theorem 7.9. Let $M(t) = \prod_{i=1}^{s} (t - c_i)$ and suppose that $M$ is orthogonal to
polynomials of degree $r - 1$,

$$\int_0^1 M(t)\, t^{q-1}\,dt = 0, \qquad q = 1, \ldots, r, \tag{7.20}$$

then method (7.15) has order $p = s + r$.
Proof. The following proof uses the Gröbner & Alekseev Formula, which gives
nice insight into the background of the result. An alternative proof is indicated in
Exercise 7 below. One can also linearize the equation, apply the linear
variation-of-constants formula and estimate the error (Guillou & Soulé 1969).

The orthogonality condition (7.20) means that the quadrature formula

$$\int_{x_0}^{x_0+h} g(t)\,dt = h \sum_{j=1}^{s} b_j\, g(x_0 + c_j h) + \mathrm{err}(g) \tag{7.21}$$

is of order $s + r = p$, and its error is bounded by

$$|\mathrm{err}(g)| \le C h^{p+1} \cdot \max |g^{(p)}(x)|. \tag{7.22}$$

The principal idea of the proof is now the following: we consider

$$u'(x) = f(x, u(x)) + \bigl(u'(x) - f(x, u(x))\bigr)$$

as a perturbation of

$$y'(x) = f(x, y(x))$$

and integrate the Gröbner & Alekseev Formula (I.14.18) with the quadrature
formula (7.21). Due to (7.15b), the result is identically zero, since at the collocation
points the defect is zero. Thus from (7.21) and (7.22)

$$\|y(x_0 + h) - u(x_0 + h)\| = \|\mathrm{err}(g)\| \le C \cdot h^{p+1} \cdot \max_{x_0 \le t \le x_0 + h} \|g^{(p)}(t)\|, \tag{7.23}$$

where

$$g(t) = \frac{\partial y}{\partial y_0}\bigl(x, t, u(t)\bigr) \cdot \bigl(u'(t) - f(t, u(t))\bigr), \qquad x = x_0 + h,$$

and we see that the local error behaves like $O(h^{p+1})$.

There remains, however, a small technical detail: to show that the derivatives
of $g(t)$ remain bounded for $h \to 0$. These derivatives contain partial derivatives
of $f(t, y)$ and derivatives of $u(t)$. We shall see in the next theorem that these
derivatives remain bounded for $h \to 0$. □


Theorem 7.10. The collocation polynomial $u(x)$ gives rise to a continuous IRK
method of order $s$, i.e., for all $x_0 \le x \le x_0 + h$ we have

$$\|y(x) - u(x)\| \le C \cdot h^{s+1}. \tag{7.24}$$

Moreover, for the derivatives of $u(x)$ we have

$$\|y^{(k)}(x) - u^{(k)}(x)\| \le C \cdot h^{s+1-k}, \qquad k = 0, \ldots, s. \tag{7.25}$$

Proof. The exact solution $y(x)$ satisfies the collocation condition everywhere,
hence also at the points $x_0 + c_i h$. So, in exactly the same way as in the proof
of Theorem 7.7, we apply the Lagrange interpolation formula to $y'(x)$:

$$y'(x_0 + th) = \sum_{j=1}^{s} f\bigl(x_0 + c_j h,\, y(x_0 + c_j h)\bigr)\, \ell_j(t) + h^s R(t, h)$$

where $R(t, h)$ is a smooth function of both variables. Integration and subtraction
from (7.18) gives

$$y(x_0 + th) - u(x_0 + th) = h \sum_{j=1}^{s} \Delta f_j \cdot \int_0^t \ell_j(\tau)\,d\tau + h^{s+1} \int_0^t R(\tau, h)\,d\tau, \tag{7.26}$$

where

$$\Delta f_j = f\bigl(x_0 + c_j h,\, y(x_0 + c_j h)\bigr) - f\bigl(x_0 + c_j h,\, u(x_0 + c_j h)\bigr).$$

The $k$th derivative of (7.26) with respect to $t$ is

$$h^k \bigl(y^{(k)}(x_0 + th) - u^{(k)}(x_0 + th)\bigr) = h \sum_{j=1}^{s} \Delta f_j \cdot \ell_j^{(k-1)}(t) + h^{s+1} \frac{\partial^{k-1} R}{\partial t^{k-1}}(t, h),$$

so that the result follows from the boundedness of the derivatives of $R(t, h)$ and
from $\Delta f_j = O(h^{s+1})$, which is a consequence of Lemma 7.5. □

Remark. Only some IRK methods are collocation methods. An extension of the
collocation idea (“Perturbed Collocation”, see Nørsett & Wanner 1981) applies to
all IRK methods.


Exercises 

1. Compute the one-point collocation method ($s = 1$) with $c_1 = \theta$ and compare
with (7.2). Determine its order in dependence of $\theta$.

2. Compute all collocation methods with $s = 2$ of order 2 in dependence of $c_1$
and $c_2$.

3. Specify in the method of Exercise 2 $c_1 = 1/3$, $c_2 = 1$ as well as $c_1 = 0$, $c_2 = 2/3$.
Determine the orders of the obtained methods and explain.

4. Interpret the implicit midpoint rule (7.4) and the explicit Euler method as
collocation methods. Is method (7.5) a collocation method? Method (7.6)?

5. (Cauchy 1824). Find from equation (7.2) conditions for the function $f(x, y)$
such that for scalar differential equations

$$y_1 \text{ (explicit Euler)} \ge y(x_1) \ge y_1 \text{ (implicit Euler)}.$$

Compute five steps with $h = 0.2$ with both methods to obtain upper and lower
bounds for $y(1)$, the solution of

$$y' = \cos\frac{x + y}{5}, \qquad y(0) = 0.$$

Cauchy’s result: $0.9659 \le y(1) \le 0.9810$. For one single step with $h = 1$ he
obtained $0.926 \le y(1) \le 1$.

Compute the exact solution by elementary integration.

6. Determine the orders of the methods of Table 7.7. Generalize to arbitrary s 
(Ehle 1968). 

Hint. Use Theorems 7.8 and 7.9. 


Table 7.7. Methods of Ehle 

Radau IIA, order 5 Lobatto IIIA, order 4 



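7. (Butcher 1964a). Give an algebraic proof of Theorem 7.9.

Hint. From Theorem 7.8 we have $C(s)$.

Next the condition $B(p)$ with $p = s + r$ (theory of Gaussian quadrature
formulas) implies $D(r)$. To see this, multiply the two vectors $u_j = \sum_i b_i c_i^{q-1} a_{ij}$
and $v_j = b_j (1 - c_j^q)/q$ $(j = 1, \ldots, s)$ by the Vandermonde matrix

$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ c_1 & c_2 & \cdots & c_s \\ \vdots & \vdots & & \vdots \\ c_1^{s-1} & c_2^{s-1} & \cdots & c_s^{s-1} \end{pmatrix}.$$

Finally apply Theorem 7.4.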



II.8 Asymptotic Expansion of the Global Error 


Mein Verzicht auf das Restglied war leichtsinnig ... 

(W. Romberg 1979) 


Our next goal will be to perfect Richardson’s extrapolation method (see Section
II.4) by doing repeated extrapolation and eliminating more and more terms $C h^{p+k}$
of the error. A sound theoretical basis for this procedure is given by the study
of the asymptotic behaviour of the global error. For problems of the type
$y' = f(x)$, which lead to integration, the answer is given by the Euler-Maclaurin formula
and has been exploited by Romberg (1955) and his successors. The first rigorous
treatments for differential equations are due to Henrici (1962) and Gragg (1964)
(see also Stetter 1973). We shall follow here the successive elimination of the
error terms given by Hairer & Lubich (1984), which also generalizes to multistep
methods.

Suppose we have a one-step method which we write, in Henrici’s notation, as

$$y_{n+1} = y_n + h\,\Phi(x_n, y_n, h). \tag{8.1}$$

If the method is of order $p$, it possesses at each point of the solution $y(x)$ a local
error of the form

$$y(x + h) - y(x) - h\,\Phi(x, y(x), h) = d_{p+1}(x) h^{p+1} + \ldots + d_{N+1}(x) h^{N+1} + O(h^{N+2}) \tag{8.2}$$

whenever the differential equation is sufficiently differentiable. For Runge-Kutta
methods these error terms were computed in Section II.2 (see also Theorem 3.2).


The Global Error 


Let us now set $y_n =: y_h(x)$ for the numerical solution at $x = x_0 + nh$. We then
know from Theorem 3.6 that the global error behaves like $h^p$. We shall search for
a function $e_p(x)$ such that

$$y(x) - y_h(x) = e_p(x) h^p + o(h^p). \tag{8.3}$$

The idea is to consider

$$y_h(x) + e_p(x) h^p =: \hat{y}_h(x) \tag{8.4a}$$

as the numerical solution of a new method

$$\hat{y}_{n+1} = \hat{y}_n + h\,\hat{\Phi}(x_n, \hat{y}_n, h). \tag{8.4b}$$

By comparison with (8.1), we see that the increment function for the new method
is

$$\hat{\Phi}(x, \hat{y}, h) = \Phi\bigl(x, \hat{y} - e_p(x) h^p, h\bigr) + \bigl(e_p(x + h) - e_p(x)\bigr) h^{p-1}. \tag{8.5}$$

Our task is to find a function $e_p(x)$, with $e_p(x_0) = 0$, such that the method with
increment function $\hat{\Phi}$ is of order $p + 1$.

Expanding the local error of the one-step method $\hat{\Phi}$ into powers of $h$ we obtain

$$y(x + h) - y(x) - h\,\hat{\Phi}(x, y(x), h) = \Bigl(d_{p+1}(x) + \frac{\partial f}{\partial y}(x, y(x))\, e_p(x) - e_p'(x)\Bigr) h^{p+1} + O(h^{p+2}) \tag{8.6}$$

where we have used

$$\frac{\partial \Phi}{\partial y}(x, y, 0) = \frac{\partial f}{\partial y}(x, y). \tag{8.7}$$

The term in $h^{p+1}$ vanishes if $e_p(x)$ is defined as the solution of

$$e_p'(x) = \frac{\partial f}{\partial y}(x, y(x))\, e_p(x) + d_{p+1}(x), \qquad e_p(x_0) = 0. \tag{8.8}$$

By Theorem 3.6, applied to the method $\hat{\Phi}$, we now have

$$y(x) - y_h(x) = e_p(x) h^p + O(h^{p+1}) \tag{8.9}$$

and the first term of the desired asymptotic expansion has been determined.

We now repeat the procedure with the method with increment function $\hat{\Phi}$. It is
of order $p + 1$ and again satisfies condition (8.7). The final result of this procedure
is the following


Theorem 8.1 (Gragg 1964). Suppose that a given method with sufficiently smooth
increment function $\Phi$ satisfies the consistency condition $\Phi(x, y, 0) = f(x, y)$ and
possesses an expansion (8.2) for the local error. Then the global error has an
asymptotic expansion of the form

$$y(x) - y_h(x) = e_p(x) h^p + \ldots + e_N(x) h^N + E_h(x) h^{N+1} \tag{8.10}$$

where the $e_j(x)$ are solutions of inhomogeneous differential equations of the form
(8.8) with $e_j(x_0) = 0$ and $E_h(x)$ is bounded for $x_0 \le x \le x_{\text{end}}$ and $0 \le h \le h_0$.

□


The differentiability properties of the $e_j(x)$ depend on those of $f$ and $\Phi$ (see
(8.8) and (8.2)). The expansion (8.10) will be the theoretical basis for all
discussions of extrapolation methods.
Examples. 1. For the equation $y' = y$ and Euler’s method we have with $h = 1/n$
and $x = 1$, using the binomial theorem,

$$y_h(1) = \Bigl(1 + \frac{1}{n}\Bigr)^n = 1 + 1 + \frac{(1 - h)}{2!} + \frac{(1 - h)(1 - 2h)}{3!} + \ldots .$$

By multiplying out, this gives

$$y(1) - y_h(1) = -\sum_{i=1}^{\infty} h^i \sum_{j=1}^{\infty} \frac{S_{i+j}^{(j)}}{(i+j)!} = 1.359\,h - 1.246\,h^2 \pm \ldots$$

where the $S_i^{(j)}$ are the Stirling numbers of the first kind (1730, see Abramowitz &
Stegun 1964, Section 24.1.3). This is, of course, the Taylor series for the function

$$e - (1 + h)^{1/h} = e - \exp\Bigl(1 - \frac{h}{2} + \frac{h^2}{3} - \frac{h^3}{4} \pm \ldots\Bigr) = e\Bigl(\frac{h}{2} - \frac{11}{24} h^2 + \frac{7}{16} h^3 \pm \ldots\Bigr)$$

with convergence radius $r = 1$.

2. For the differential equation $y' = f(x)$ and the trapezoidal rule (7.5), the
expansion (8.10) becomes

$$\int_0^1 f(x)\,dx - y_h(1) = -\sum_{k=1}^{N} \frac{h^{2k}}{(2k)!} B_{2k} \bigl(f^{(2k-1)}(1) - f^{(2k-1)}(0)\bigr) + O(h^{2N+1}),$$

the well known Euler-Maclaurin formula (1736). For $N \to \infty$, the series will
usually diverge, due to the fast growth of the Bernoulli numbers for large $k$. It may,
however, be useful for small values of $N$ and we call it an asymptotic expansion
(Poincaré 1893).
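
The leading coefficients of the first example can be observed numerically. The following is a small sketch, assuming Euler’s method for $y' = y$, $y(0) = 1$ on $[0, 1]$; the function names `euler_exp` and `scaled_errors` are illustrative. The quotient $(y(1) - y_h(1))/h$ should approach $e/2 \approx 1.359$, and the next scaled difference should approach $-11e/24 \approx -1.246$.

```python
import math


def euler_exp(n):
    """Euler's method for y' = y, y(0) = 1 on [0, 1] with n steps of size h = 1/n."""
    h = 1.0 / n
    y = 1.0
    for _ in range(n):
        y += h * y
    return y


def scaled_errors(n):
    """Return (err/h, (err/h - e/2)/h) for the global error err = e - y_h(1)."""
    h = 1.0 / n
    first = (math.e - euler_exp(n)) / h
    second = (first - math.e / 2.0) / h
    return first, second


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, scaled_errors(n))
```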


Variable h 

Theorem 8.1 is not only valid for equal step sizes. A reasonable assumption for the
case of variable step sizes is the existence of a function $\tau(x) > 0$ such that the step
sizes depend as

$$x_{n+1} - x_n = \tau(x_n)\, h \tag{8.11}$$

on a parameter $h$. Then the local error expansion (8.2) becomes

$$y(x + \tau(x) h) - y(x) - h \tau(x)\, \Phi(x, y(x), \tau(x) h) = d_{p+1}(x)\, \tau^{p+1}(x)\, h^{p+1} + \ldots$$

and instead of (8.5) we have

$$\hat{\Phi}(x, \hat{y}, \tau(x) h) = \Phi\bigl(x, \hat{y} - e_p(x) h^p, \tau(x) h\bigr) + \frac{h^{p-1}}{\tau(x)} \bigl(e_p(x + \tau(x) h) - e_p(x)\bigr).$$

With this the local error expansion for the new method becomes, instead of (8.6),

$$y(x + \tau(x) h) - y(x) - h \tau(x)\, \hat{\Phi}(x, y(x), \tau(x) h) = \tau(x) \Bigl(d_{p+1}(x)\, \tau^{p}(x) + \frac{\partial f}{\partial y}(x, y(x))\, e_p(x) - e_p'(x)\Bigr) h^{p+1} + O(h^{p+2})$$

and the proof of Theorem 8.1 generalizes with slight modifications.


Negative h 


The most important extrapolation algorithms will use asymptotic expansions with
even powers of $h$. In order to provide a theoretical basis for these methods, we
need to explain the meaning of $y_h(x)$ for $h$ negative.

Motivation. We write (8.1) as

$$y_h(x + h) = y_h(x) + h\,\Phi(x, y_h(x), h) \tag{8.1'}$$

and replace $h$ by $-h$ to obtain

$$y_{-h}(x - h) = y_{-h}(x) - h\,\Phi(x, y_{-h}(x), -h).$$

Next we replace $x$ by $x + h$ which gives

$$y_{-h}(x) = y_{-h}(x + h) - h\,\Phi(x + h, y_{-h}(x + h), -h). \tag{8.12}$$

This is an implicit equation for $y_{-h}(x + h)$, which possesses a unique solution for
sufficiently small $h$ (by the implicit function theorem). We write this solution in
the form

$$y_{-h}(x + h) = y_{-h}(x) + h\,\Phi^*(x, y_{-h}(x), h). \tag{8.13}$$

The comparison of (8.12) and (8.13) (with $A = y_{-h}(x + h)$, $B = y_{-h}(x)$) leads
us to the following definition.


Definition 8.2. Let $\Phi(x, y, h)$ be the increment function of a method. Then we
define the increment function $\Phi^*(x, y, h)$ of the adjoint method by the pair of
formulas

$$\begin{aligned} B &= A - h\,\Phi(x + h, A, -h) \\ A &= B + h\,\Phi^*(x, B, h). \end{aligned} \tag{8.14}$$


Example. The adjoint method of explicit Euler is implicit Euler.
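
Definition 8.2 can be turned into a computation for any increment function. The following is a minimal sketch, assuming scalar problems and a plain fixed-point iteration to solve the implicit relation $B = A - h\,\Phi(x + h, A, -h)$ for $A$ (adequate only for small $h$); the names `explicit_euler_phi`, `adjoint_step` and `adjoint_phi` are illustrative. For $y' = -y$ the adjoint of explicit Euler should reproduce the implicit Euler step $y_1 = y_0/(1 + h)$.

```python
def explicit_euler_phi(f):
    """Increment function Phi(x, y, h) = f(x, y) of the explicit Euler method."""
    return lambda x, y, h: f(x, y)


def adjoint_step(phi, x, B, h, iterations=60):
    """One step A = B + h*Phi^*(x, B, h), obtained from B = A - h*Phi(x + h, A, -h).

    The implicit relation is solved by fixed-point iteration, which is an
    assumption of this sketch and converges only for sufficiently small h.
    """
    A = B
    for _ in range(iterations):
        A = B + h * phi(x + h, A, -h)
    return A


def adjoint_phi(phi):
    """Increment function Phi^* recovered from the adjoint step."""
    return lambda x, y, h: (adjoint_step(phi, x, y, h) - y) / h


if __name__ == "__main__":
    f = lambda x, y: -y
    phi = explicit_euler_phi(f)
    print(adjoint_step(phi, 0.0, 1.0, 0.1), 1.0 / 1.1)
```

Applying `adjoint_phi` twice and stepping once more returns, up to rounding, the explicit Euler value again.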

Theorem 8.3. Let $\Phi$ be the Runge-Kutta method (7.7) with coefficients $a_{ij}$, $b_j$,
$c_i$ $(i, j = 1, \ldots, s)$. Then the adjoint method $\Phi^*$ is equivalent to a Runge-Kutta
method with $s$ stages and with coefficients

$$\begin{aligned} c_i^* &= 1 - c_{s+1-i} \\ a_{ij}^* &= b_{s+1-j} - a_{s+1-i,\,s+1-j} \\ b_j^* &= b_{s+1-j}. \end{aligned}$$

Proof. The formulas (8.14) indicate that for the definition of the adjoint method we
have, starting from (7.7), to exchange $y_0 \leftrightarrow y_1$, $h \leftrightarrow -h$ and replace $x_0 \to x_0 + h$.
This then leads to

$$k_i = f\Bigl(x_0 + (1 - c_i) h,\; y_0 + h \sum_{j=1}^{s} (b_j - a_{ij})\, k_j\Bigr)$$

$$y_1 = y_0 + h \sum_{j=1}^{s} b_j k_j.$$

In order to preserve the usual natural ordering of $c_1, \ldots, c_s$, we also permute the
$k_i$-values and replace all indices $i$ by $s + 1 - i$. □
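
The coefficient formulas of Theorem 8.3 are easy to apply mechanically. The sketch below assumes the tableau is stored as plain Python lists; the function name `adjoint_rk` is illustrative. It checks that the adjoint of explicit Euler is implicit Euler and that the trapezoidal rule, written as a two-stage method, coincides with its own adjoint.

```python
def adjoint_rk(A, b, c):
    """Coefficients (A*, b*, c*) of the adjoint Runge-Kutta method (Theorem 8.3)."""
    s = len(b)
    # a*_ij = b_{s+1-j} - a_{s+1-i, s+1-j}  (indices shifted to 0-based lists)
    A_star = [[b[s - 1 - j] - A[s - 1 - i][s - 1 - j] for j in range(s)]
              for i in range(s)]
    b_star = [b[s - 1 - j] for j in range(s)]          # b*_j = b_{s+1-j}
    c_star = [1.0 - c[s - 1 - i] for i in range(s)]    # c*_i = 1 - c_{s+1-i}
    return A_star, b_star, c_star


if __name__ == "__main__":
    # Explicit Euler: the adjoint is implicit Euler (a_11 = 1, b_1 = 1, c_1 = 1).
    print(adjoint_rk([[0.0]], [1.0], [0.0]))
    # Trapezoidal rule as a two-stage method: unchanged by taking the adjoint.
    print(adjoint_rk([[0.0, 0.0], [0.5, 0.5]], [0.5, 0.5], [0.0, 1.0]))
```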


Properties of the Adjoint Method 

Theorem 8.4. $\Phi^{**} = \Phi$.

Proof. This property, which is the reason for the name “adjoint”, is seen by
replacing $h \to -h$ and then $x \to x + h$, $B \to A$, $A \to B$ in (8.14). □

Theorem 8.5. The adjoint method has the same order as the original method. Its
principal error term is the error term of the first method multiplied by $(-1)^p$.

Proof. We replace $h$ by $-h$ in (8.2), then $x \to x + h$ and rearrange the terms. This
gives (using $d_{p+1}(x + h) = d_{p+1}(x) + O(h)$)

$$y(x) + d_{p+1}(x) h^{p+1} (-1)^p + O(h^{p+2}) = y(x + h) - h\,\Phi(x + h, y(x + h), -h).$$

Here we let $B$ be the left-hand side of this identity, $A = y(x + h)$, and use (8.14).
This leads to

$$y(x + h) = y(x) + d_{p+1}(x) h^{p+1} (-1)^p + h\,\Phi^*(x, y(x), h) + O(h^{p+2}),$$

which expresses the statement of the theorem. □

Theorem 8.6. The adjoint method has exactly the same asymptotic expansion
(8.10) as the original method, with $h$ replaced by $-h$.

Proof. We repeat the procedure which led to the proof of Theorem 8.1, with $h$
negative. The first separated term corresponding to (8.9) will be

$$y(x) - y_{-h}(x) = e_p(x) (-h)^p + O(h^{p+1}). \tag{8.9'}$$

This is true because the solution of (8.8) with initial value $e_p(x_0) = 0$ has the same
sign change as the inhomogeneity $d_{p+1}(x)$. This settles the first term. To continue,
we prove that the transformation (8.4b) commutes with the adjunction operation,
i.e., that

$$(\hat{\Phi})^{*} = (\Phi^{*})^{\wedge}. \tag{8.15}$$

In order to prove (8.15), we obtain from (8.4a) and the definition of $\hat{\Phi}$

$$y_h(x + h) + e_p(x + h) h^p = y_h(x) + e_p(x) h^p + h\,\hat{\Phi}\bigl(x, y_h(x) + e_p(x) h^p, h\bigr).$$

Here again, we substitute $h \to -h$ followed by $x \to x + h$. Finally, we apply
(8.14) with $B = y_{-h}(x) + e_p(x)(-h)^p$ and $A = y_{-h}(x + h) + e_p(x + h)(-h)^p$ to
obtain

$$y_{-h}(x + h) + e_p(x + h)(-h)^p = y_{-h}(x) + e_p(x)(-h)^p + h\,(\hat{\Phi})^{*}\bigl(x, y_{-h}(x) + e_p(x)(-h)^p, h\bigr). \tag{8.16}$$

On the other hand, if we perform the transformation (see Theorem 8.5)

$$\hat{y}_{-h}(x) = y_{-h}(x) + e_p(x)(-h)^p \tag{8.4'}$$

and insert this into (8.13), we obtain (8.16) again, but this time with $(\Phi^{*})^{\wedge}$ instead
of $(\hat{\Phi})^{*}$. This proves (8.15). □


Symmetric Methods 

Definition 8.7. A method is symmetric if $\Phi = \Phi^*$.

Example. The trapezoidal rule (7.5) and the implicit mid-point rule (7.4) are
symmetric: the exchanges $y_1 \leftrightarrow y_0$, $h \leftrightarrow -h$ and $x_0 \leftrightarrow x_0 + h$ leave these methods
invariant. The following two theorems (Wanner 1973) characterize symmetric IRK
methods.

Theorem 8.8. If

$$a_{s+1-i,\,s+1-j} + a_{ij} = b_{s+1-j} = b_j, \qquad i, j = 1, \ldots, s, \tag{8.17}$$

then the corresponding Runge-Kutta method is symmetric. Moreover, if the $b_i$ are
nonzero and the $c_i$ distinct and ordered as $c_1 < c_2 < \ldots < c_s$, then condition (8.17)
is also necessary for symmetry.

Proof. The sufficiency of (8.17) follows from Theorem 8.3. The condition
$c_i = 1 - c_{s+1-i}$ can be verified by adding up (8.17) for $j = 1, \ldots, s$.

Symmetry implies that the original method (with coefficients $c_i$, $a_{ij}$, $b_j$) and
the adjoint method ($c_i^*$, $a_{ij}^*$, $b_j^*$) give identical numerical results. If we apply both
methods to $y' = f(x)$ we obtain

$$\sum_{i=1}^{s} b_i f(c_i) = \sum_{i=1}^{s} b_i^* f(c_i^*)$$

for all $f(x)$. Our assumption on $b_i$ and $c_i$ thus yields

$$b_i^* = b_i, \qquad c_i^* = c_i \qquad \text{for all } i.$$

We next apply both methods to $y_1' = f(x)$, $y_2' = x^q y_1$ and obtain

$$\sum_{i,j=1}^{s} b_i c_i^q a_{ij} f(c_j) = \sum_{i,j=1}^{s} b_i^* (c_i^*)^q a_{ij}^* f(c_j^*).$$

This implies $\sum_i b_i c_i^q a_{ij} = \sum_i b_i c_i^q a_{ij}^*$ for $q = 0, 1, \ldots$ and hence also $a_{ij}^* = a_{ij}$
for all $i, j$. □


Theorem 8.9. A collocation method based on symmetrically distributed
collocation points is symmetric.

Proof. If $c_i = 1 - c_{s+1-i}$, the Lagrange polynomials satisfy $\ell_i(t) = \ell_{s+1-i}(1 - t)$.
Condition (8.17) is then an easy consequence of (7.19). □


The following important property of symmetric methods, known intuitively for
many years, now follows from the above results.

Theorem 8.10. If in addition to the assumptions of Theorem 8.1 the underlying
method is symmetric, then the asymptotic expansion (8.10) contains only even
powers of $h$:

$$y(x) - y_h(x) = e_{2q}(x) h^{2q} + e_{2q+2}(x) h^{2q+2} + \ldots \tag{8.18}$$

with $e_{2j}(x_0) = 0$.

Proof. If $\Phi^* = \Phi$, we have $y_{-h}(x) = y_h(x)$ from (8.13) and the result follows
from Theorem 8.6. □
Exercises

1. Assume the one-step method (8.1) to be of order $p \ge 2$ and in addition to
$\Phi(x, y, 0) = f(x, y)$ assume

$$\frac{\partial \Phi}{\partial h}(x, y, 0) = \frac{1}{2} \Bigl(\frac{\partial f}{\partial x}(x, y) + \frac{\partial f}{\partial y}(x, y) \cdot f(x, y)\Bigr). \tag{8.19}$$

Show that the principal local error term of the method $\hat{\Phi}$ defined in (8.5) is
then given by

$$\hat{d}_{p+2}(x) = d_{p+2}(x) - \frac{1}{2} \frac{\partial f}{\partial y}(x, y(x))\, d_{p+1}(x) - \frac{1}{2}\, d_{p+1}'(x).$$

Verify that (8.19) is satisfied for all RK-methods of order $\ge 2$.

2. Consider the second order method

$$\begin{array}{c|cc} 0 & & \\ 1 & 1 & \\ \hline & 1/2 & 1/2 \end{array}$$

applied to the problem $y' = y$, $y(0) = 1$. Show that



Show that for this method 


d 3 0) = ^ (F(t 3 2 )(y( x )) ~ \ F (hi)(y( x )i) 

d 4 O) = (^^X^O)) + \ F (t43)(y( x )) ~ \ ^41)0/0))) 

in the notation of Table 2.2. Show that this implies

$$\hat{d}_4(x) = 0 \quad \text{and} \quad e_3(x) = 0,$$

so that one step of Richardson extrapolation increases the order of the method
by two. Find a connection between this method and the GBS-algorithm of
Section II.9.


4. Discuss the symmetry of the IRK methods of Section II.7. 
II.9 Extrapolation Methods


The following method of approximation may or may not be new,
but as I believe it to be of practical importance... 

(S.A. Corey 1906) 

The h 2 -extrapolation was discovered by a hint from theory fol¬ 
lowed by arithmetical experiments, which gave pleasing results. 

(L.F. Richardson 1927) 

Extrapolation constitutes a powerful means ... 

(R. Bulirsch & J. Stoer 1966) 

Extrapolation does not appear to be a particularly effective way 
..., our tests raise the question as to whether there is any point to 
pursuing it as a separate method. 

(L.F. Shampine & L.S. Baca 1986) 


Definition of the Method 

Let $y' = f(x, y)$, $y(x_0) = y_0$ be a given differential system and $H > 0$ a basic step
size. We choose a sequence of positive integers

$$n_1 < n_2 < n_3 < \ldots \tag{9.1}$$

and define the corresponding step sizes $h_1 > h_2 > h_3 > \ldots$ by $h_i = H/n_i$. We
then choose a numerical method of order $p$ and compute the numerical results of
our initial value problem by performing $n_i$ steps with step size $h_i$ to obtain

$$y_{h_i}(x_0 + H) =: T_{i,1} \tag{9.2}$$

(the letter “T” stands historically for “trapezoidal rule”). We then eliminate as
many terms as possible from the asymptotic expansion (8.10) by computing the
interpolation polynomial

$$p(h) = \hat{y} - e_p h^p - e_{p+1} h^{p+1} - \ldots - e_{p+k-2} h^{p+k-2} \tag{9.3}$$

such that

$$p(h_i) = T_{i,1}, \qquad i = j, j-1, \ldots, j-k+1. \tag{9.4}$$

Finally we “extrapolate to the limit” $h \to 0$ and use

$$p(0) = \hat{y} =: T_{j,k}$$

as numerical result. Conditions (9.4) consist of $k$ linear equations for the $k$
unknowns $\hat{y}, e_p, \ldots, e_{p+k-2}$.


Example. For k = 2, n 1 = 1, n 2 = 2 the above definition is identical to Richard¬ 
son’s extrapolation discussed in Section II.4. 
Theorem 9.1. The value $T_{j,k}$ represents a numerical method of order $p + k - 1$.


Proof. We compare (9.4) and (9.3) with the asymptotic expansion (8.10) which we
write in the form (with $N = p + k - 1$)

$$T_{i,1} = y(x_0 + H) - e_p(x_0 + H) h_i^p - \ldots - e_{p+k-2}(x_0 + H) h_i^{p+k-2} - \Delta_i, \tag{9.4'}$$

where

$$\Delta_i = e_{p+k-1}(x_0 + H) h_i^{p+k-1} + E_{h_i}(x_0 + H) h_i^{p+k} = O(H^{p+k})$$

because $e_{p+k-1}(x_0) = 0$ and $h_i \le H$. This is a linear system for the unknowns
$y(x_0 + H)$, $H^p e_p(x_0 + H)$, ..., $H^{p+k-2} e_{p+k-2}(x_0 + H)$ with the Vandermonde-like
matrix

$$A = \begin{pmatrix} 1 & \dfrac{1}{n_j^{p}} & \cdots & \dfrac{1}{n_j^{p+k-2}} \\ \vdots & \vdots & & \vdots \\ 1 & \dfrac{1}{n_{j-k+1}^{p}} & \cdots & \dfrac{1}{n_{j-k+1}^{p+k-2}} \end{pmatrix}.$$

It is the same as (9.4), just with the right-hand side perturbed by the $O(H^{p+k})$-terms
$\Delta_i$. The matrix $A$ is invertible (see Exercise 6). Therefore by subtraction
we obtain

$$|y(x_0 + H) - \hat{y}| \le \|A^{-1}\|_{\infty} \cdot \max |\Delta_i| = O(H^{p+k}). \qquad □$$


Remark. The case $p = 1$ (as well as $p = 2$ with expansions in $h^2$) can also be
treated by interpreting the difference $y(x_0 + H) - \hat{y}$ as an interpolation error (see
(9.21)).

A great advantage of the method is that it provides a complete table of
numerical results

$$\begin{array}{llll} T_{11} & & & \\ T_{21} & T_{22} & & \\ T_{31} & T_{32} & T_{33} & \\ T_{41} & T_{42} & T_{43} & T_{44} \\ \ldots & \ldots & \ldots & \ldots \end{array} \tag{9.5}$$

which form a sequence of embedded methods and allow easy estimates of the local
error and strategies for variable order. Several step-number sequences are in use
for (9.1):

The “Romberg sequence” (Romberg 1955):

$$1, 2, 4, 8, 16, 32, 64, 128, 256, 512, \ldots \tag{9.6}$$

The “Bulirsch sequence” (see also Romberg 1955):

$$1, 2, 3, 4, 6, 8, 12, 16, 24, 32, \ldots \tag{9.7}$$

alternating powers of 2 with 1.5 times $2^k$. This sequence needs fewer function
evaluations for higher orders than the previous one and became prominent through
the success of the “Gragg-Bulirsch-Stoer algorithm” (Bulirsch & Stoer 1966).

The above sequences have the property that for integration problems
$y' = f(x)$ many function values can be saved and re-used for smaller $h_i$. Further,
$\liminf (n_{i+1}/n_i)$ remains bounded away from 1 (“Toeplitz condition”) which
allows convergence proofs for $j = k \to \infty$ (Bauer, Rutishauser & Stiefel 1963).
However, if we work with differential equations and with fixed or bounded order, the
most economic sequence is the “harmonic sequence” (Deuflhard 1983)

$$1, 2, 3, 4, 5, 6, 7, 8, 9, 10, \ldots . \tag{9.8}$$


The Aitken - Neville Algorithm 


For the case $p = 1$, (9.3) and (9.4) become a classical interpolation problem and
we can compute the values of $T_{j,k}$ economically by the use of classical methods.
Since we need only the values of the interpolation polynomials at the point $h = 0$,
the most economical algorithm is that of “Aitken - Neville” (Aitken 1932, Neville
1934, based on ideas of Jordan 1928) which leads to

$$T_{j,k+1} = T_{j,k} + \frac{T_{j,k} - T_{j-1,k}}{(n_j / n_{j-k}) - 1}. \tag{9.9}$$

If the basic method used is symmetric, we know that the underlying asymptotic
expansion is in powers of $h^2$ (Theorem 8.10), and each extrapolation eliminates two
powers of $h$. We may thus simply replace in (9.3) $h$ by $h^2$ and for $p = 2$ (i.e., $q = 1$
in (8.18)) also use the Aitken - Neville algorithm with this modification. This leads
to

$$T_{j,k+1} = T_{j,k} + \frac{T_{j,k} - T_{j-1,k}}{(n_j / n_{j-k})^2 - 1} \tag{9.10}$$

instead of (9.9).


Numerical example. We solve the problem

$$y' = (-y \sin x + 2 \tan x)\, y, \qquad y(\pi/6) = 2/\sqrt{3} \tag{9.11}$$

with true solution $y(x) = 1/\cos x$ and basic step size $H = 0.2$ by Euler’s method.
Fig. 9.1 represents, for each of the entries $T_{j,k}$ of the extrapolation tableau, the
numerical work $(1 + n_j - 1 + n_{j-1} - 1 + \ldots + n_{j-k+1} - 1)$ compared to the
precision $(|T_{j,k} - y(x_0 + H)|)$ in double logarithmic scale. The first picture is for the
Romberg sequence (9.6), the second for the Bulirsch sequence (9.7), and the last
for the harmonic sequence (9.8). In pictures 2 and 3 the results of the foregoing
graphics are repeated as a shaded “ghost” (... of Canterville) in order to
demonstrate how the results are better than those for the predecessor. Nobody is perfect,
however. The “best” method in these comparisons, the harmonic sequence, suffers
for high orders from a strong influence of rounding errors (see Exercise 5 below;
the computations of Fig. 9.1, 9.2 and 9.4 have been made in quadruple precision).

Fig. 9.1. $h$-extrap. expl. Euler

Fig. 9.2. $h^2$-extrap. impl. midpoint

The analogous results for the symmetric implicit mid-point rule (7.4) are
presented in Fig. 9.2. Although implicit, this method is easy to implement for this
particular example. We again use the same basic step size $H = 0.2$ as above and
the same step-number sequences (9.6), (9.7), (9.8). Here, the “numerical work”
$(n_j + n_{j-1} + \ldots + n_{j-k+1})$ represents implicit stages and therefore cannot be
compared to the values of the explicit method. The precisions, however, show a
drastic improvement.
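
The $h$-extrapolation of Euler’s method for problem (9.11) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors’ implementation: it uses double precision instead of the quadruple precision mentioned above, the harmonic sequence (9.8), and the recursion (9.9) (or (9.10) when `power=2`); the names `euler`, `extrapolation_tableau` and `f_911` are hypothetical.

```python
import math


def euler(f, x0, y0, H, n):
    """n explicit Euler steps of size h = H/n starting from (x0, y0)."""
    h = H / n
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)
        x += h
    return y


def extrapolation_tableau(f, x0, y0, H, seq, power=1, base=euler):
    """Lower triangular tableau T[j][k] (0-based) built with Aitken-Neville.

    power=1 corresponds to (9.9), power=2 to the h^2-version (9.10).
    """
    T = []
    for j, nj in enumerate(seq):
        row = [base(f, x0, y0, H, nj)]
        for k in range(1, j + 1):
            ratio = (nj / seq[j - k]) ** power
            row.append(row[k - 1] + (row[k - 1] - T[j - 1][k - 1]) / (ratio - 1.0))
        T.append(row)
    return T


def f_911(x, y):
    """Right-hand side of problem (9.11); the exact solution is 1/cos(x)."""
    return (-y * math.sin(x) + 2.0 * math.tan(x)) * y


if __name__ == "__main__":
    x0, H = math.pi / 6.0, 0.2
    T = extrapolation_tableau(f_911, x0, 2.0 / math.sqrt(3.0), H, [1, 2, 3, 4, 5, 6])
    exact = 1.0 / math.cos(x0 + H)
    for j, row in enumerate(T):
        print(j + 1, abs(row[j] - exact))   # error of the diagonal entry
```

A convenient sanity check is the problem $y' = y$: with the harmonic sequence each diagonal entry is a polynomial in $H$ of degree equal to its order, so it coincides with the corresponding Taylor polynomial of $e^H$.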

Rational Extrapolation. Many authors in the sixties claimed that it is better to use
rational functions instead of polynomials in (9.3). In this case the formula (9.9)
must be replaced by (Bulirsch & Stoer 1964)

$$T_{j,k+1} = T_{j,k} + \frac{T_{j,k} - T_{j-1,k}}{\left(\dfrac{n_j}{n_{j-k}}\right) \left(1 - \dfrac{T_{j,k} - T_{j-1,k}}{T_{j,k} - T_{j-1,k-1}}\right) - 1} \tag{9.12}$$

where

$$T_{j,0} = 0.$$

For systems of differential equations the division of vectors is to be understood
componentwise.

Later numerical experiments (Deuflhard 1983) showed that rational
extrapolation is nearly never more advantageous than polynomial extrapolation.


The Gragg or GBS Method 

Since it is fully explicit GRAGG’s algorithm is so ideally suited as 
a basis for RICHARDSON extrapolation that no other symmetric 
two-step algorithm can compete with it. (H.J. Stetter 1970) 


Here we cannot do better than quote from Stetter (1970): “Expansions in powers of
$h^2$ are extremely important for an efficient application of Richardson extrapolation.
Therefore it was a great achievement when Gragg proved in 1963 that the quantity
$S_h(x)$ produced by the algorithm ($x = x_0 + 2nh$, $x_i = x_0 + ih$)

$$y_1 = y_0 + h f(x_0, y_0) \tag{9.13a}$$

$$y_{i+1} = y_{i-1} + 2h f(x_i, y_i), \qquad i = 1, 2, \ldots, 2n \tag{9.13b}$$

$$S_h(x) = \frac{1}{4} \bigl(y_{2n-1} + 2 y_{2n} + y_{2n+1}\bigr) \tag{9.13c}$$

possesses an asymptotic expansion in even powers of $h$ and has satisfactory
stability properties. This led to the construction of the very powerful
G(ragg)-B(ulirsch)-S(toer)-extrapolation algorithm ...”.

Gragg’s proof of this property was very long and complicated and it was again
“a great achievement” that Stetter had the elegant idea of interpreting (9.13b) as a
one-step algorithm by rewriting (9.13) in terms of odd and even indices: for this
purpose we define

$$\begin{aligned} & h^* = 2h, \qquad x_k^* = x_0 + k h^*, \qquad u_0 = v_0 = y_0, \\ & u_k = y_{2k}, \qquad v_k = y_{2k+1} - h f(x_{2k}, y_{2k}) = \frac{1}{2} \bigl(y_{2k+1} + y_{2k-1}\bigr). \end{aligned} \tag{9.14}$$

Then the method (9.13) can be rewritten as (see Fig. 9.3)

$$\begin{pmatrix} u_{k+1} \\ v_{k+1} \end{pmatrix} = \begin{pmatrix} u_k \\ v_k \end{pmatrix} + h^* \begin{pmatrix} f\bigl(x_k^* + \frac{h^*}{2},\; v_k + \frac{h^*}{2} f(x_k^*, u_k)\bigr) \\[4pt] \frac{1}{2} \bigl(f(x_k^* + h^*, u_{k+1}) + f(x_k^*, u_k)\bigr) \end{pmatrix}. \tag{9.15}$$
(9.15) 
This method, which maps the pair $(u_k, v_k)$ to $(u_{k+1}, v_{k+1})$, can be seen from
Fig. 9.3 to be symmetric. The symmetry can also be checked analytically (see
Definition 8.7) by exchanging $u_{k+1} \leftrightarrow u_k$, $v_{k+1} \leftrightarrow v_k$, $h^* \leftrightarrow -h^*$, $x_k^* \leftrightarrow x_k^* + h^*$.
A trivial calculation then shows that this leaves formula (9.15) invariant. Method
(9.15) is consistent with the differential equation (let $h^* \to 0$ in the increment
function)

$$\begin{aligned} u' &= f(x, v), \qquad u(x_0) = y_0 \\ v' &= f(x, u), \qquad v(x_0) = y_0, \end{aligned} \tag{9.16}$$

whose exact solution is simply $u(x) = v(x) = y(x)$. Therefore, we have from
Theorem 8.10 that

$$y(x) - u_{h^*}(x) = \sum_{j=1}^{\ell} a_{2j}(x) (h^*)^{2j} + (h^*)^{2\ell+2} A(x, h^*) \tag{9.17a}$$

$$y(x) - v_{h^*}(x) = \sum_{j=1}^{\ell} b_{2j}(x) (h^*)^{2j} + (h^*)^{2\ell+2} B(x, h^*) \tag{9.17b}$$

and $a_{2j}(x_0) = b_{2j}(x_0) = 0$. We see from (9.14) and (9.17a) that $y_h(x)$ possesses
an expansion in even powers of $h$, provided that the number of steps is even; i.e.,
for $x = x_0 + 2nh$,

$$y(x) - y_h(x) = \sum_{j=1}^{\ell} \tilde{a}_{2j}(x) h^{2j} + h^{2\ell+2} \tilde{A}(x, h) \tag{9.18}$$

where $\tilde{a}_{2j}(x) = 2^{2j} a_{2j}(x)$ and $\tilde{A}(x, h) = 2^{2\ell+2} A(x, 2h)$.

The so-called smoothing step, i.e., formula

$$S_h(x_0 + 2nh) = \frac{1}{4} \bigl(y_{2n-1} + 2 y_{2n} + y_{2n+1}\bigr) = \frac{1}{2} (u_n + v_n)$$

(see (9.13c) and (9.14)) had its historical origin in the “weak stability” of the
explicit midpoint rule (9.13b) (see also Fig. III.9.2). However, since the method is
anyway followed by extrapolation, this step is not of great importance (Shampine
& Baca 1983). It is a little more costly and increases the “stability domain” by
approximately the same amount (see Fig. IV.2.3 of Vol. II). Further, it has the
advantage of evaluating the function $f$ at the end of the basic step.

Theorem 9.2. Let $f(x, y) \in C^{2\ell+2}$, then the numerical solution defined in (9.13)
possesses for $x = x_0 + 2nh$ an asymptotic expansion of the form

$$y(x) - S_h(x) = \sum_{j=1}^{\ell} e_{2j}(x) h^{2j} + h^{2\ell+2} C(x, h) \tag{9.19}$$

with $e_{2j}(x_0) = 0$ and $C(x, h)$ bounded for $x_0 \le x \le \bar{x}$ and $0 \le h \le h_0$.

Proof. By adding (9.17a) and (9.17b) and using $h^* = 2h$ we obtain (9.19) with
$e_{2j}(x) = \bigl(a_{2j}(x) + b_{2j}(x)\bigr)\, 2^{2j-1}$. □


This method can thus be used for Richardson extrapolation in the same way as
symmetric methods above: we choose a step-number sequence, with the condition
that the $n_j$ are even, i.e.,

$$2, 4, 8, 16, 32, 64, 128, 256, \ldots \tag{9.6'}$$

$$2, 4, 6, 8, 12, 16, 24, 32, 48, \ldots \tag{9.7'}$$

$$2, 4, 6, 8, 10, 12, 14, 16, 18, \ldots \tag{9.8'}$$

set

$$T_{i1} := S_{h_i}(x_0 + H)$$

and compute the extrapolated expressions $T_{i,j}$, based on the $h^2$-expansion, by the
Aitken-Neville formula (9.10).
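
The procedure just described can be sketched as follows. This is a minimal illustration assuming scalar problems, double precision and a fixed number of columns; it is not the GBS code of the Appendix, and the names `gragg` and `gbs_tableau` are hypothetical. Note that the argument `n` of `gragg` is the total (even) number of steps $n_j$, which is written $2n$ in (9.13).

```python
import math


def gragg(f, x0, y0, H, n):
    """Gragg's algorithm (9.13) with n (even) steps of size h = H/n and smoothing."""
    h = H / n
    ys = [y0, y0 + h * f(x0, y0)]                              # (9.13a)
    for i in range(1, n + 1):                                  # (9.13b)
        ys.append(ys[i - 1] + 2.0 * h * f(x0 + i * h, ys[i]))
    return 0.25 * (ys[n - 1] + 2.0 * ys[n] + ys[n + 1])        # (9.13c)


def gbs_tableau(f, x0, y0, H, seq):
    """Tableau T[j][k] (0-based) from T_{j,1} = S_{h_j}(x0 + H) and formula (9.10)."""
    T = []
    for j, nj in enumerate(seq):
        row = [gragg(f, x0, y0, H, nj)]
        for k in range(1, j + 1):
            ratio = (nj / seq[j - k]) ** 2
            row.append(row[k - 1] + (row[k - 1] - T[j - 1][k - 1]) / (ratio - 1.0))
        T.append(row)
    return T


if __name__ == "__main__":
    # Problem (9.11) with H = 0.2 and the even harmonic sequence (9.8').
    f = lambda x, y: (-y * math.sin(x) + 2.0 * math.tan(x)) * y
    x0, H = math.pi / 6.0, 0.2
    T = gbs_tableau(f, x0, 2.0 / math.sqrt(3.0), H, [2, 4, 6, 8])
    exact = 1.0 / math.cos(x0 + H)
    for j, row in enumerate(T):
        print(j + 1, [abs(t - exact) for t in row])
```

For $y' = y$, $y(0) = 1$ and $n_1 = 2$ a short hand calculation gives $S_h(H) = 1 + H + H^2/2 + H^3/8$, which is a convenient check of the implementation.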

Numerical example. Fig. 9.4 represents the numerical results of this algorithm
applied to Example (9.11) with step size $H = 0.2$. The step size sequences are
Romberg (9.6’) (above), Bulirsch (9.7’) (middle), and harmonic (9.8’) (below). The
algorithm with smoothing step (numerical work $= 1 + n_j + n_{j-1} + \ldots + n_{j-k+1}$)
is represented left, the results without smoothing step (numerical work
$= 1 + n_j - 1 + n_{j-1} - 1 + \ldots + n_{j-k+1} - 1$) are on the right.

The results are nearly identical to those for the implicit midpoint rule (Fig. 9.2),
but much more valuable, since here the method is explicit. In the pictures on the
left the values for extrapolated Euler (from Fig. 9.1) are repeated as a “ghost” and
demonstrate clearly the importance of the $h^2$-expansion, especially in the diagonal
$T_{kk}$ for large values of $k$. The ghosts in the pictures on the right are the values with
smoothing step from the left; the differences are seen to be tiny.

Fig. 9.4. Precision of $h^2$-extrapolated Gragg method for Example (9.11)


Asymptotic Expansion for Odd Indices 


For completeness, we still want to derive the existence of an $h^2$ expansion for
$y_{2k+1}$ from (9.17b), although this is of no practical importance for the numerical
algorithm described above.

Theorem 9.3 (Gragg 1964). For $x = x_0 + (2k+1)h$ we have

$$y(x) - y_h(x) = \sum_{j=1}^{\ell} \tilde{b}_{2j}(x) h^{2j} + h^{2\ell+2} \tilde{B}(x, h) \tag{9.20}$$

where the coefficients $\tilde{b}_{2j}(x)$ are in general different from those for even indices
and $\tilde{b}_{2j}(x_0) \ne 0$.

Proof. $y_{2k+1}$ can be computed (see Fig. 9.3) either from $v_k$ by a forward step or
from $v_{k+1}$ by a backward step. For the sake of symmetry, we take the mean of
both expressions and write

$$y_{2k+1} = \frac{1}{2} (v_k + v_{k+1}) + \frac{h}{2} \bigl(f(x_k^*, u_k) - f(x_{k+1}^*, u_{k+1})\bigr).$$

We now subtract the exact solution and obtain

$$\begin{aligned} 2 \bigl(y_h(x) - y(x)\bigr) ={}& v_{2h}(x - h) - y(x - h) \\ &+ v_{2h}(x + h) - y(x + h) + y(x - h) - 2 y(x) + y(x + h) \\ &+ h \bigl(f(x - h, u_{2h}(x - h)) - f(x + h, u_{2h}(x + h))\bigr). \end{aligned}$$

Due to the symmetry of $u_{2h}(x)$ ($u_{2h}(x) = u_{-2h}(x)$) and of $v_{2h}(x)$ the whole
expression becomes symmetric in $h$. Thus the asymptotic expansion for $y_{2k+1}$
contains no odd powers of $h$. □


Both expressions, for even and for odd indices, can still be combined into a
single formula (see Exercise 2).


Existence of Explicit RK Methods of Arbitrary Order 


Each of the expressions $T_{j,k}$ clearly represents an explicit RK-method (see Exercise 1).
If we apply the well-known error formula for polynomial interpolation (see
e.g., Abramowitz & Stegun 1964, formula 25.2.27) to (9.19), we obtain

$$y(x_0 + H) - T_{j,k} = \frac{(-1)^{k-1}}{n_j^2 \cdots n_{j-k+1}^2}\, e_{2k}(x_0 + H)\, H^{2k} + O(H^{2k+2}). \tag{9.21}$$

Since $e_{2k}(x_0) = 0$, we have

$$y(x_0 + H) - T_{j,k} = \frac{(-1)^{k-1}}{n_j^2 \cdots n_{j-k+1}^2}\, e_{2k}'(x_0)\, H^{2k+1} + O(H^{2k+2}). \tag{9.22}$$

This shows that $T_{j,k}$ represents an explicit Runge-Kutta method of order $2k$. As
an application of this result we have:

Theorem 9.4 (Gragg 1964). For $p$ even, there exists an explicit RK-method of
order $p$ with $s = p^2/4 + 1$ stages.


Proof. This result is obtained by counting the number of necessary function evaluations
of the GBS-algorithm using the harmonic sequence and without the final
smoothing step. □

Remark. The extrapolated Euler method leads to explicit Runge-Kutta methods
with $s = p(p-1)/2 + 1$ stages. This shows once again the importance of the $h^2$-expansion.
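The two stage counts can be checked by a direct count. The sketch below assumes the harmonic sequence $n_j = 2j$ for the GBS algorithm without smoothing step and $n_j = j$ for extrapolated Euler, and uses the work formula "one shared evaluation plus $n_j - 1$ per line"; the function names are illustrative.

```python
def gbs_stages(p):
    """Evaluations for T_kk of order p = 2k, sequence n_j = 2j, no smoothing."""
    k = p // 2
    return 1 + sum(2 * j - 1 for j in range(1, k + 1))

def euler_stages(p):
    """Evaluations for extrapolated Euler of order p, sequence n_j = j."""
    return 1 + sum(j - 1 for j in range(1, p + 1))

if __name__ == "__main__":
    for p in (2, 4, 6, 8, 10):
        print(p, gbs_stages(p), euler_stages(p))
```

For order 8, for example, the count gives 17 stages for the GBS-based method against 29 for extrapolated Euler.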


Order and Step Size Control 


Extrapolation methods have the advantage that in addition to the step size also the 
order (i.e., number of columns) can be changed at each step. Because of this double 
freedom, the practical implementation in an optimal way is more complicated than 
for fixed-order RK-methods. The first codes were developed by Bulirsch & Stoer 
(1966) and their students. Very successful extrapolation codes due to P. Deuflhard
and his collaborators are described in Deuflhard (code DIFEX1, 1983).

The choice of the step size can be performed in exactly the same way as for
fixed-order embedded methods (see Section II.4). If the first $k$ lines of the
extrapolation tableau are computed, we have $T_{k,k}$ as the highest-order approximation (of
order $2k$ by (9.22)) and in addition $T_{k,k-1}$ of order $2k-2$. It is therefore natural
to use the expression

$$\mathit{err}_k = \| T_{k,k-1} - T_{k,k} \| \tag{9.23}$$

for step size control. The norm is the same as in (4.11). As in (4.12) we get for the
optimal step size the formula

$$H_k = H \cdot 0.94 \cdot (0.65/\mathit{err}_k)^{1/(2k-1)} \tag{9.24}$$

where this time we have chosen a safety factor depending partly on the order.

For the choice of an optimal order we need a measure of work, which allows
us to compare different methods. The work for computing $T_{k,k}$ can be measured
by the number $A_k$ of function evaluations. For the GBS-algorithm it is given
recursively by

$$A_1 = n_1 + 1, \qquad A_k = A_{k-1} + n_k . \tag{9.25}$$

However, a large number of function evaluations can be compensated by a large
step size $H_k$, given by (9.24). We therefore consider

$$W_k = \frac{A_k}{H_k}, \tag{9.26}$$

the work per unit step, as a measure of work. The idea is now to choose the order
(i.e., the index $k$) in such a way that $W_k$ is minimized.
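Formulas (9.24)-(9.26) translate almost literally into code. The following sketch uses hypothetical function names and a plain list for the step number sequence; it is meant only to make the bookkeeping concrete, not to reproduce any particular implementation.

```python
def optimal_step(H, err_k, k):
    """Step size H_k of (9.24) from the current step H and the estimate err_k."""
    return H * 0.94 * (0.65 / err_k) ** (1.0 / (2 * k - 1))

def evaluations(seq):
    """A_1, A_2, ... of (9.25) for the step number sequence seq = [n_1, n_2, ...]."""
    A = [seq[0] + 1]                  # A_1 = n_1 + 1
    for n in seq[1:]:
        A.append(A[-1] + n)           # A_k = A_{k-1} + n_k
    return A

def work_per_unit_step(A_k, H_k):
    """W_k of (9.26)."""
    return A_k / H_k

if __name__ == "__main__":
    A = evaluations([2, 4, 6, 8])
    H3 = optimal_step(0.1, 1e-3, 3)
    print(A, H3, work_per_unit_step(A[2], H3))
```

The order is then chosen by comparing the values $W_k$ of neighbouring lines of the tableau.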

Let us describe the combined order and step size control in some more detail.
We assume that at some point of integration the step size $H$ and the index $k$ ($k > 2$)
are proposed. The step is then realized in the following way: we first compute
$k-1$ lines of the extrapolation tableau and also the values $H_{k-2}$, $W_{k-2}$, $\mathit{err}_{k-1}$,
$H_{k-1}$, $W_{k-1}$.

a) Convergence in line $k-1$. If $\mathit{err}_{k-1} \le 1$, we accept $T_{k-1,k-1}$ as numerical
solution and continue the integration with the new proposed quantities

$$k_{\rm new} = \begin{cases} k & \text{if } W_{k-1} < 0.9 \cdot W_{k-2} \\ k-1 & \text{else} \end{cases}
\qquad
H_{\rm new} = \begin{cases} H_{k_{\rm new}} & \text{if } k_{\rm new} \le k-1 \\ H_{k-1}\,(A_k/A_{k-1}) & \text{if } k_{\rm new} = k . \end{cases} \tag{9.27}$$

In (9.27), the only non-trivial formula is the choice of the step size $H_{\rm new}$ in the
case of an order-increase $k_{\rm new} = k$. In this case we want to avoid the computation
of $\mathit{err}_k$, so that $H_k$ and $W_k$ are unknown. However, since our $k$ is assumed to be
close to the optimal value, we have $W_k \approx W_{k-1}$, i.e., $A_k/H_k \approx A_{k-1}/H_{k-1}$, which
leads to the proposed step size increase.

b) Convergence monitor. If $\mathit{err}_{k-1} > 1$, we first decide whether we may expect
convergence at least in line $k+1$. It follows from (9.22) that, asymptotically,

$$\| T_{k,k-2} - T_{k,k-1} \| \approx \Bigl( \frac{n_2}{n_k} \Bigr)^2 \mathit{err}_{k-1} \tag{9.28}$$

with $\mathit{err}_{k-1}$ given by (9.23). Unfortunately, $\mathit{err}_k$ cannot be compared with (9.28),
since different factors (depending on the differential equation to be solved) are
involved in the asymptotic formula (cf. (9.22)). If we nevertheless assume that
$\mathit{err}_k$ is $(n_2/n_1)^2$ times smaller than (9.28) we obtain $\mathit{err}_k \approx (n_1/n_k)^2\, \mathit{err}_{k-1}$. We
therefore already reject the step at this point, if

$$\mathit{err}_{k-1} > \Bigl( \frac{n_{k+1}\, n_k}{n_1\, n_1} \Bigr)^2 \tag{9.29}$$

and restart with $k_{\rm new} \le k-1$ and $H_{\rm new}$ according to (9.27). If the contrary of
(9.29) holds, we compute the next line of the extrapolation tableau, i.e., $T_{k,k}$, $\mathit{err}_k$,
$H_k$ and $W_k$.

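c) Convergence in line $k$. If $\mathit{err}_k \le 1$, we accept $T_{k,k}$ as numerical solution and
continue the integration with the new proposed values

$$k_{\rm new} = \begin{cases} k-1 & \text{if } W_{k-1} < 0.9 \cdot W_k \\ k+1 & \text{if } W_k < 0.9 \cdot W_{k-1} \\ k & \text{in all other cases} \end{cases}
\qquad
H_{\rm new} = \begin{cases} H_{k_{\rm new}} & \text{if } k_{\rm new} \le k \\ H_k\,(A_{k+1}/A_k) & \text{if } k_{\rm new} = k+1 . \end{cases} \tag{9.30}$$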

d) Second convergence monitor. If $\mathit{err}_k > 1$, we check, as in (b), the relation

$$\mathit{err}_k > \Bigl( \frac{n_{k+1}}{n_1} \Bigr)^2 . \tag{9.31}$$

If (9.31) is satisfied, the step is rejected and we restart with $k_{\rm new} \le k$ and $H_{\rm new}$ of
(9.30). Otherwise we continue.

e) Hope for convergence in line $k+1$. We compute $\mathit{err}_{k+1}$, $H_{k+1}$ and $W_{k+1}$.
If $\mathit{err}_{k+1} \le 1$, we accept $T_{k+1,k+1}$ as numerical solution and continue the integration
with the new proposed order

$$\begin{aligned}
& k_{\rm new} := k \\
& \text{if } (W_{k-1} < 0.9 \cdot W_k) \quad k_{\rm new} := k-1 \\
& \text{if } (W_{k+1} < 0.9 \cdot W_{k_{\rm new}}) \quad k_{\rm new} := k+1 .
\end{aligned} \tag{9.32}$$

If $\mathit{err}_{k+1} > 1$ the step is rejected and we restart with $k_{\rm new} \le k$ and $H_{\rm new}$ of (9.24).

The following slight modifications of the above algorithm are recommended:

i) Storage considerations lead to a limitation of the number of columns of the
extrapolation tableau, say by $k_{\max}$ (e.g., $k_{\max} = 9$). For the proposed index $k_{\rm new}$
we require $2 \le k_{\rm new} \le k_{\max} - 1$. This allows us to activate (e) at each step.

ii) After a step-rejection the step size and the order may not be increased.

Fig. 9.6. Solution, step size and order variation
obtained by ODEX at the discontinuous example (9.33)
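The convergence monitors of steps (b) and (d) and the order selection of step (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the monitor for line $k$ is taken as $\mathit{err}_k > (n_{k+1}/n_1)^2$, obtained by the same reasoning as in (b); the function names and the dictionary-based storage of $W_k$, $H_k$, $A_k$ (keyed by the 1-based index) are hypothetical and do not describe the ODEX source.

```python
def monitor_rejects(err, n, k, line):
    """True if the step should be rejected early.

    n[j-1] = n_j; line == k-1 tests err_{k-1} as in (9.29),
    any other value tests err_k (assumed form of the second monitor).
    """
    n1 = n[0]
    if line == k - 1:
        return err > (n[k] * n[k - 1] / (n1 * n1)) ** 2
    return err > (n[k] / n1) ** 2

def propose_after_line_k(k, W, H, A):
    """New order and step size after convergence in line k, cf. (9.30)."""
    if W[k - 1] < 0.9 * W[k]:
        k_new = k - 1
    elif W[k] < 0.9 * W[k - 1]:
        k_new = k + 1
    else:
        k_new = k
    if k_new <= k:
        H_new = H[k_new]
    else:
        H_new = H[k] * A[k + 1] / A[k]     # H_{k+1} is not available yet
    return k_new, H_new

if __name__ == "__main__":
    n = [2, 4, 6, 8, 10]
    print(monitor_rejects(200.0, n, 3, line=2))
    print(propose_after_line_k(3, {2: 10.0, 3: 8.0}, {2: 0.1, 3: 0.2}, {3: 13, 4: 21}))
```

A complete controller would additionally clamp the proposed index to $2 \le k_{\rm new} \le k_{\max} - 1$ and forbid increases after a rejected step.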


Numerical study of the combined step size and order control. We show in 
the following examples how the step size and the order vary for the above algo¬ 
rithm. For this purpose we have written the FORTRAN-subroutine ODEX (see 
Appendix). 

As a first example we again take the Brusselator (cf. Section II.4). As in
Fig. 4.1, the first picture of Fig. 9.5 shows the two components of the solution (obtained
with Atol = Rtol = $10^{-9}$). In the remaining two pictures we have plotted the
step sizes and orders for the three tolerances $10^{-3}$ (broken line), $10^{-6}$ (dashes and
dots) and $10^{-9}$ (solid line). One can easily observe that the extrapolation code
automatically chooses a suitable order (depending essentially on Tol). Step-rejections
are indicated by larger symbols.

We next study the behaviour of the order control near discontinuities. In the
example

$$y' = -\operatorname{sign}(x)\, \bigl| 1 - |x| \bigr|\, y^2, \qquad y(-2) = 2/3, \qquad -2 \le x \le 2 \tag{9.33}$$

we have a discontinuity in the first derivative of $y(x)$ at $x = 0$ and two discontinuities
in the second derivative (at $x = \pm 1$). The numerical results are shown
in Fig. 9.6 for three tolerances. In all cases the error at the endpoint is about
$10 \cdot$ Tol. The discontinuities at $x = \pm 1$ are not recognized in the computations
with Tol = $10^{-3}$ and Tol = $10^{-6}$. Whenever a discontinuity is detected, the order
drops to 4 (lowest possible) in its neighbourhood, so that these points are passed
rather efficiently.


Dense Output for the GBS Method 

Extrapolation methods are methods best suited for high precision which typically 
take very large (basic) step sizes during integration. The reasons for the need of a 
dense output formula (discussed in Section II.6) are therefore particularly important 
here. First attempts to provide extrapolation methods with a dense output are due to 
Lindberg (1972) for the implicit trapezoidal rule, and to Shampine, Baca & Bauer 
(1983) who constructed a 3rd order dense output for the GBS method. We present 
here the approach of Hairer & Ostermann (1990) (see also Simonsen 1990). 

It turned out that a high order dense output can exist only if the
step number sequence satisfies some restrictions such as

$$n_{j+1} - n_j \equiv 0 \pmod 4 \qquad \text{for } j = 1, 2, 3, \ldots \tag{9.34}$$

which, for example, is fulfilled by the sequence

$$\{2, 6, 10, 14, 18, 22, 26, 30, 34, \ldots\} . \tag{9.35}$$
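Condition (9.34) is easy to test for a candidate step number sequence. The helper below (its name is made up for this sketch) checks consecutive differences; it says nothing about other restrictions the dense output may require.

```python
def satisfies_9_34(seq):
    """True if consecutive step numbers differ by a multiple of 4."""
    return all((b - a) % 4 == 0 for a, b in zip(seq, seq[1:]))

if __name__ == "__main__":
    print(satisfies_9_34([2, 6, 10, 14, 18, 22]))   # sequence (9.35)
    print(satisfies_9_34([2, 4, 6, 8]))             # harmonic sequence 2, 4, 6, ...
```

The harmonic sequence 2, 4, 6, ... fails the test because its differences are 2.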

The idea is, once again, to do Hermite interpolation. To begin with, high order
approximations are as usual at our disposal for the values $y_0$, $y_0'$, $y_1$, $y_1'$ by using
$y_0$, $f(x_0, y_0)$, $T_{kk}$, $f(x_0 + H, T_{kk})$, where $T_{kk}$ is supposed to be the highest
order approximation computed and used for continuation of the solution.


Fig. 9.7. Evaluation points for a GBS step (rows $j = 1, \ldots, 6$ with $n_j = 2, 6, 10, 14, 18, 22$; abscissa from $x_0$ over $x_0 + H/2$ to $x_0 + H$)

For more inspiration, we represent in Fig. 9.7 the steps taken by Gragg’s midpoint
rule for the step number sequence (9.35). The symbols ∘ and × indicate that the
even steps and the odd steps possess a different asymptotic expansion (see Theorem
9.3) and must not be blended. We see that, owing to condition (9.34), the midpoint
values $y^{(j)}_{n_j/2}$, obtained during the computation of $T_{j,1}$, all have the same parity and
can therefore also be extrapolated to yield an approximation for $y(x_0 + H/2)$ of
order $2k-1$ (remember that in Theorem 9.3, $b_{2j}(x_0) \ne 0$).

We next insert (9.20) for $x = x_0 + H/2$ into $f(x, y)$

$$f^{(j)}_{n_j/2} := f\bigl( x, y^{(j)}_{n_j/2} \bigr) = f\bigl( x, y(x) - h_j^2 b_2(x) - h_j^4 b_4(x) - \ldots \bigr)$$

and develop in powers of $h_j$ to obtain

$$y'(x) - f^{(j)}_{n_j/2} = h_j^2\, a_{2,1}(x) + h_j^4\, a_{4,1}(x) + \ldots . \tag{9.36}$$

This shows that the $f$-values at the midpoint $x_0 + H/2$ (for $j = 1, 2, \ldots, k$) possess
an asymptotic expansion and can be extrapolated $k-1$ times to yield an approximation
to $y'(x_0 + H/2)$ of order $2k-1$.

But this is not enough. We now consider, similar to an idea which goes back
to the papers of Deuflhard & Nowak (1987) and Lubich (1989), the central differences
$\delta f_i = f_{i+1} - f_{i-1}$ at the midpoint which, by Fig. 9.7, are available for
$j = 1, 2, \ldots, k$ and are based on even parity. By using (9.18) and by developing
into powers of $h_j$ we obtain

$$\begin{aligned}
\frac{\delta f^{(j)}_{n_j/2}}{2h_j}
&= \frac{f\bigl( x + h_j, y^{(j)}_{n_j/2+1} \bigr) - f\bigl( x - h_j, y^{(j)}_{n_j/2-1} \bigr)}{2h_j} \\
&= \Bigl( f\bigl( x + h_j, y(x + h_j) - h_j^2 a_2(x + h_j) - h_j^4 a_4(x + h_j) - \ldots \bigr) \\
&\qquad - f\bigl( x - h_j, y(x - h_j) - h_j^2 a_2(x - h_j) - h_j^4 a_4(x - h_j) - \ldots \bigr) \Bigr) \Big/ 2h_j \\
&= \frac{y'(x + h_j) - y'(x - h_j)}{2h_j} - h_j^2\, c_2(x) - h_j^4\, c_4(x) - \ldots .
\end{aligned} \tag{9.37}$$

Finally we insert the Taylor series for $y'(x + h_j)$ and $y'(x - h_j)$ to obtain an expansion

$$y''(x) - \frac{\delta f^{(j)}_{n_j/2}}{2h_j} = h_j^2\, a_{2,2}(x) + h_j^4\, a_{4,2}(x) + \ldots . \tag{9.38}$$

Therefore, $k-1$ extrapolations of the expressions (9.37) yield an approximation
to $y''(x_0 + H/2)$ of order $2k-1$.

In order to get approximations to the third and fourth derivatives of the solution
at $x_0 + H/2$, we use the second and third central differences of $f_i^{(j)}$, which exist
for $j \ge 2$ (Fig. 9.7). These can be extrapolated $k-2$ times to give approximations
of order $2k-3$.

The continuation of this process yields the following algorithm:

Step 1. For each $j \in \{1, \ldots, k\}$, compute approximations to the derivatives of $y(x)$
at $x_0 + H/2$ by:

$$d_j^{(0)} = y^{(j)}_{n_j/2}, \qquad d_j^{(\kappa)} = \frac{\delta^{\kappa-1} f^{(j)}_{n_j/2}}{(2h_j)^{\kappa-1}} \quad \text{for } \kappa = 1, \ldots, 2j . \tag{9.39}$$

Step 2. Extrapolate $d_j^{(0)}$ $(k-1)$ times and $d_j^{(2\ell-1)}$, $d_j^{(2\ell)}$ $(k-\ell)$ times to obtain
improved approximations $d^{(\kappa)}$ to $y^{(\kappa)}(x_0 + H/2)$.

Step 3. For given $\mu$ ($-1 \le \mu \le 2k$) define the polynomial $P_\mu(\theta)$ of degree $\mu + 4$
by

$$\begin{aligned}
& P_\mu(0) = y_0, \qquad P_\mu'(0) = H f(x_0, y_0), \\
& P_\mu(1) = T_{kk}, \qquad P_\mu'(1) = H f(x_0 + H, T_{kk}), \\
& P_\mu^{(\kappa)}(1/2) = H^\kappa d^{(\kappa)} \quad \text{for } \kappa = 0, \ldots, \mu .
\end{aligned} \tag{9.40}$$

This computation of $P_\mu(\theta)$ does not need any further function evaluation since
$f(x_0 + H, T_{kk})$ has to be computed anyway for the next step. Further, $P_\mu(\theta)$
gives a global $C^1$ approximation to the solution.


Theorem 9.5 (Hairer & Ostermann 1990). If the step number sequence satisfies
(9.34), then the error of the dense output polynomial $P_\mu(\theta)$ satisfies

$$y(x_0 + \theta H) - P_\mu(\theta) = \begin{cases} O(H^{2k+1}) & \text{if } n_1 = 4 \text{ and } \mu \ge 2k-4 \\ O(H^{2k}) & \text{if } n_1 = 2 \text{ and } \mu \ge 2k-5 . \end{cases}$$

Proof. Since $P_\mu(\theta)$ is a polynomial of degree $\mu + 4$ the error due to interpolation
is of size $O(H^{\mu+5})$. This explains the restriction on $\mu$ in (9.40). As explained
above, the function value and derivative data used for Hermite interpolation have
the required precision

$$H^\kappa y^{(\kappa)}(x_0 + H/2) - H^\kappa d^{(\kappa)} = \begin{cases} O(H^{2k}) & \text{if } \kappa = 0, \\ O(H^{2k+1}) & \text{if } \kappa \text{ is odd}, \\ O(H^{2k+2}) & \text{if } \kappa \ge 2 \text{ is even}. \end{cases}$$

In the case $n_1 = 4$ the parity of the central point $x_0 + H/2$ is even (contrary to
Fig. 9.7); we therefore apply (9.18) and gain one order because then the functions
$a_{2j}(x)$ vanish at $x_0$. □


Control of the Interpolation Error

At one time ... every young mathematician was familiar with sn u, 
cn u, and dn u, and algebraic identities between these functions 
figured in every examination. 

(E.H. Neville, Jacobian Elliptic Functions, 1944) 

Numerical example. We apply the above dense output formula with $\mu = 2k-3$ (as
is standard in ODEX) to the differential equations of the Jacobian elliptic functions
sn, cn, dn (see Abramowitz & Stegun 1964, 16.16):

$$\begin{aligned}
y_1' &= y_2 y_3, & y_1(0) &= 0 \\
y_2' &= -y_1 y_3, & y_2(0) &= 1 \\
y_3' &= -0.51 \cdot y_1 y_2, & y_3(0) &= 1
\end{aligned} \tag{9.41}$$

with integration interval $0 \le x \le 10$ and error tolerance Atol = Rtol = $10^{-9}$. The
error for the three components of the obtained continuous solution is displayed
in Fig. 9.8 (upper picture; the ghosts are the solution curves) and gives a quite
disappointing impression when compared with the precision at the grid points. We
shall now see that these horrible bumps are nothing else than interpolation errors.

Fig. 9.8. Error of dense output without/with interpolation control


Assume that in the definition of $P_\mu(\theta)$ the basic function and derivative values
are replaced by the exact values $y(x_0 + H)$, $y'(x_0 + H)$, and $y^{(\kappa)}(x_0 + H/2)$.
Then the error of $P_\mu(\theta)$ is given by

$$\theta^2 (1-\theta)^2 \Bigl( \theta - \frac12 \Bigr)^{\mu+1} \frac{y^{(\mu+5)}(\xi)}{(\mu+5)!}\, H^{\mu+5} \tag{9.42}$$

where $\xi \in (x_0, x_0 + H)$ (possibly different for each component). The function
$\theta^2 (1-\theta)^2 (\theta - 1/2)^{\mu+1}$ has its maximum (in modulus) at

$$\theta_{\mu+1} = \frac12 \pm \frac12 \sqrt{\frac{\mu+1}{\mu+5}} \tag{9.43}$$

which, for large $\mu$, are close to the ends of the integration intervals and indicate
precisely the locations of the large bumps in Fig. 9.8. This demonstrates the need
for a code which not only controls the error at the grid points, but also takes care
of the interpolation error. To this end we denote by $a_\mu$ the coefficient of $\theta^{\mu+4}$ in
the polynomial $P_\mu(\theta)$ and consider (Hairer & Ostermann 1992)

$$P_\mu(\theta) - P_{\mu-1}(\theta) = \theta^2 (1-\theta)^2 \Bigl( \theta - \frac12 \Bigr)^{\mu} a_\mu \tag{9.44}$$

as an approximation for the interpolation error for $P_{\mu-1}(\theta)$ and use

$$\mathit{errint} = \| P_\mu(\theta_\mu) - P_{\mu-1}(\theta_\mu) \| \tag{9.45}$$

as error estimator (the norm is again that of (4.11)). Then, if $\mathit{errint} > 10$ the step
is rejected and recomputed with

$$H_{\rm int} = H\, (1/\mathit{errint})^{1/(\mu+4)}$$

because $\mathit{errint} = O(H^{\mu+4})$. Otherwise the subsequent step is computed subject to
the restriction $H \le H_{\rm int}$.
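The location of the bumps and the step size restriction can be made concrete with a small sketch. It assumes that the maximisers of $|\theta^2(1-\theta)^2(\theta - 1/2)^m|$ are $1/2 \pm \frac12\sqrt{m/(m+4)}$, which for $m = \mu + 1$ is (9.43) and for $m = \mu$ gives the points $\theta_\mu$ used in (9.45); the function names are illustrative.

```python
import math

def theta_max(m):
    """Maximisers of |theta^2 (1 - theta)^2 (theta - 1/2)^m| on [0, 1]."""
    s = 0.5 * math.sqrt(m / (m + 4.0))
    return 0.5 - s, 0.5 + s

def interpolation_step(H, errint, mu):
    """Step size H_int = H * (1/errint)^(1/(mu+4)) limiting the interpolation error."""
    return H * (1.0 / errint) ** (1.0 / (mu + 4))

if __name__ == "__main__":
    for m in (1, 5, 15):
        print(m, theta_max(m))
    print(interpolation_step(0.5, 20.0, 9))
```

With growing $m$ the two maximisers move towards 0 and 1, in line with the bumps appearing near the ends of each step.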

This modified step size strategy makes the code, together with its dense out¬ 
put, more robust. The corresponding numerical results for the problem (9.41) are 
presented in the lower graph of Fig. 9.8. 


Exercises 

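1. Show that the extrapolated Euler methods $T_{31}$, $T_{32}$, $T_{33}$ (with step-number
sequence (9.8)) are equivalent to the Runge-Kutta methods of Table 9.1. Compute
also the Runge-Kutta schemes corresponding to the first elements of the
GBS algorithm.

Table 9.1. Extrapolation methods as Runge-Kutta methods

$$\begin{array}{c|ccc}
0 \\
1/3 & 1/3 \\
2/3 & 1/3 & 1/3 \\
\hline
 & 1/3 & 1/3 & 1/3
\end{array}
\qquad
\begin{array}{c|cccc}
0 \\
1/2 & 1/2 \\
1/3 & 1/3 & 0 \\
2/3 & 1/3 & 0 & 1/3 \\
\hline
 & 0 & -1 & 1 & 1
\end{array}
\qquad
\begin{array}{c|cccc}
0 \\
1/2 & 1/2 \\
1/3 & 1/3 & 0 \\
2/3 & 1/3 & 0 & 1/3 \\
\hline
 & 0 & -2 & 3/2 & 3/2
\end{array}$$

$T_{31}$, order 1 (left); $T_{32}$, order 2 (middle); $T_{33}$, order 3 (right)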
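A numerical cross-check of the third tableau can be sketched as follows. It assumes that (9.8) is the harmonic sequence 1, 2, 3 and that the Aitken-Neville recursion for an $h$-expansion is used; the function names and the test equation are chosen for illustration. It does not replace the algebraic proof asked for in the exercise.

```python
def euler(f, x0, y0, H, n):
    """n explicit Euler steps of size H/n."""
    h = H / n
    y = y0
    for i in range(n):
        y = y + h * f(x0 + i * h, y)
    return y

def extrapolated_euler_T33(f, x0, y0, H):
    """T_33 from the extrapolation tableau with step numbers 1, 2, 3."""
    seq = [1, 2, 3]
    T = []
    for j, nj in enumerate(seq):
        row = [euler(f, x0, y0, H, nj)]
        for k in range(1, j + 1):
            row.append(row[k - 1] + (row[k - 1] - T[j - 1][k - 1]) / (nj / seq[j - k] - 1))
        T.append(row)
    return T[2][2]

def rk_T33(f, x0, y0, H):
    """The same value written as a 4-stage Runge-Kutta method."""
    k1 = f(x0, y0)
    k2 = f(x0 + H / 2, y0 + H / 2 * k1)
    k3 = f(x0 + H / 3, y0 + H / 3 * k1)
    k4 = f(x0 + 2 * H / 3, y0 + H / 3 * k1 + H / 3 * k3)
    return y0 + H * (-2 * k2 + 1.5 * k3 + 1.5 * k4)

if __name__ == "__main__":
    f = lambda x, y: x - y * y
    print(extrapolated_euler_T33(f, 0.0, 1.0, 0.3), rk_T33(f, 0.0, 1.0, 0.3))
```

Both functions should return the same number up to rounding errors.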





2. Combine (9.18) and (9.19) into the formula ($x = x_0 + kh$)

$$y(x) - y_k = \sum_{j=1}^{\ell} \bigl( \alpha_{2j}(x) + (-1)^k \beta_{2j}(x) \bigr) h^{2j} + h^{2\ell+2} E(x, h)$$

for the asymptotic expansion of the Gragg method defined by (9.13a,b).

3. (Stetter 1970). Prove that for every real $b$ (generally between 0 and 1) the
method

$$\begin{aligned}
y_1 &= y_0 + h \bigl( b f(x_0, y_0) + (1-b) f(x_1, y_1) \bigr) \\
y_{i+1} &= y_{i-1} + h \bigl( (1-b) f(x_{i-1}, y_{i-1}) + 2b f(x_i, y_i) + (1-b) f(x_{i+1}, y_{i+1}) \bigr)
\end{aligned}$$

possesses an expansion in powers of $h^2$. Prove the same property for the
smoothing step

$$S_h(x) = \frac12 \bigl( y_{2n} + y_{2n-1} + h(1-b) f(x_{2n-1}, y_{2n-1}) + h b f(x_{2n}, y_{2n}) \bigr).$$

4. (Stetter 1970). Is the Euler step (9.13a) essential for an $h^2$-expansion? Prove
that a first order starting procedure

$$y_1 = y_0 + h \Phi(x_0, y_0, h)$$

for (9.13a) produces an $h^2$-expansion if the quantities
$y_{-1} = y_0 - h \Phi(x_0, y_0, -h)$, $y_0$, and $y_1$ satisfy (9.13b) for $i = 0$.

5. Study the numerical instability of the extrapolation scheme for the harmonic
sequence, i.e., suppose that the entries $T_{11}, T_{21}, T_{31}, \ldots$ are disturbed with
rounding errors $\varepsilon, -\varepsilon, \varepsilon, \ldots$ and compute the propagation of these errors into
the extrapolation tableau (9.5).

Result. Due to the linearity of the extrapolation scheme, we suppose the $T_{ik}$
equal zero and $\varepsilon = 1$. Then the results for sequence (9.8’) are

     1.
    -1.    -1.67
     1.     2.60     3.13
    -1.    -3.57    -5.63    -6.21
     1.     4.56     9.13    11.94    12.69
    -1.    -5.55   -13.63   -21.21   -25.35   -26.44
     1.     6.54    19.13    35.01    47.65    54.14    55.82
    -1.    -7.53   -25.63   -54.31   -84.09  -105.64  -116.30  -119.03
     1.     8.53    33.13    80.13   140.14   195.34   232.96   251.10

hence, for order 18, we lose approximately two digits due to roundoff errors.
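The propagation of the alternating perturbations can be reproduced with a few lines of Python. The sketch assumes the harmonic sequence 2, 4, 6, ... and the Aitken-Neville recursion for an $h^2$-expansion (only the ratios $n_j/n_{j-k}$ enter); the function name is hypothetical.

```python
def propagate_alternating_errors(rows=9):
    """Extrapolation tableau whose first column is 1, -1, 1, ... (all else zero)."""
    seq = [2 * j for j in range(1, rows + 1)]       # harmonic sequence 2, 4, 6, ...
    T = []
    for j, nj in enumerate(seq):
        row = [(-1.0) ** j]                          # perturbation of T_{j+1,1}
        for k in range(1, j + 1):
            ratio = (nj / seq[j - k]) ** 2
            row.append(row[k - 1] + (row[k - 1] - T[j - 1][k - 1]) / (ratio - 1))
        T.append(row)
    return T

if __name__ == "__main__":
    for row in propagate_alternating_errors():
        print(" ".join("%8.2f" % v for v in row))
```

The printed triangle can be compared entry by entry with the result quoted above.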






6. (Laguerre 1883*). If $a_1, a_2, \ldots, a_n$ are distinct positive real numbers and
$r_1, r_2, \ldots, r_n$ are distinct reals, then

$$\begin{pmatrix}
a_1^{r_1} & a_1^{r_2} & \cdots & a_1^{r_n} \\
a_2^{r_1} & a_2^{r_2} & \cdots & a_2^{r_n} \\
\vdots & \vdots & & \vdots \\
a_n^{r_1} & a_n^{r_2} & \cdots & a_n^{r_n}
\end{pmatrix}$$

is invertible.

Hint (Pólya & Szegő 1925, Vol. II, Abschn. V, Problems 76-77*). Show by
induction on $n$ that, if the function $g(t) = \sum_{i=1}^{n} \alpha_i t^{r_i}$ has $n$ distinct positive
zeros, then $g(t) \equiv 0$. By Rolle’s theorem the function

$$\frac{d}{dt} \bigl( t^{-r_1} g(t) \bigr) = \sum_{i=2}^{n} \alpha_i (r_i - r_1)\, t^{r_i - r_1 - 1}$$

has $n-1$ positive distinct zeros and the induction hypothesis can be applied.


* We are grateful to our colleague J. Steinig for these references.


II.10 Numerical Comparisons

The Pleiades seem to be among the first stars mentioned in astronomical literature,
appearing in Chinese annals of 2357 B.C. ...

(R.H. Allen, Star names, their lore and meaning, 1899, Dover 1963)

If you enjoy fooling around making pictures, instead of typesetting ordinary
text, TeX will be a source of endless frustration/amusement for you, ...

(D. Knuth, The TeXbook, p. 389)


Problems 


EULR — Euler’s equation of rotation of a rigid body (“Diese merkwürdig symmetrischen
und eleganten Formeln ...”, A. Sommerfeld 1942, vol. I, § 26.1, Euler
1758)

$$\begin{aligned}
I_1 y_1' &= (I_2 - I_3)\, y_2 y_3 \\
I_2 y_2' &= (I_3 - I_1)\, y_3 y_1 \\
I_3 y_3' &= (I_1 - I_2)\, y_1 y_2 + f(x)
\end{aligned} \tag{10.1}$$

where $y_1, y_2, y_3$ are the coordinates of $\omega$, the rotation vector, and $I_1, I_2, I_3$ are the
principal moments of inertia. The third coordinate has an additional exterior force

$$f(x) = \begin{cases} 0.25 \cdot \sin^2 x & \text{if } 3\pi \le x \le 4\pi \\ 0 & \text{otherwise} \end{cases} \tag{10.1'}$$

which is discontinuous in its second derivative. We choose the constants and initial
values as

$$I_1 = 0.5, \quad I_2 = 2, \quad I_3 = 3, \quad y_1(0) = 1, \quad y_2(0) = 0, \quad y_3(0) = 0.9$$

(see Fig. 10.1) and check the numerical precision at the output points
$x_{\rm end} = \ldots$ and $x_{\rm end} = \ldots$.
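For readers who want to experiment with this test problem, the right-hand side of (10.1) with the exterior force (10.1') might be coded as follows. The function names are illustrative, and the constants are the ones quoted in the text.

```python
import math

I1, I2, I3 = 0.5, 2.0, 3.0        # principal moments of inertia

def force(x):
    """Exterior force (10.1'), active only for 3*pi <= x <= 4*pi."""
    return 0.25 * math.sin(x) ** 2 if 3 * math.pi <= x <= 4 * math.pi else 0.0

def eulr(x, y):
    """Right-hand side of Euler's equations (10.1), solved for y'."""
    y1, y2, y3 = y
    return [(I2 - I3) * y2 * y3 / I1,
            (I3 - I1) * y3 * y1 / I2,
            ((I1 - I2) * y1 * y2 + force(x)) / I3]

if __name__ == "__main__":
    print(eulr(0.0, [1.0, 0.0, 0.9]))
```

The function can be passed to any explicit integrator that expects the signature `f(x, y)`.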


AREN — the Arenstorf orbit (0.1) for the restricted three body problem with initial 
values (0.2) integrated over one period $0 \le x \le x_{\rm end}$ (see Fig. 0.1). The precision
is checked at the endpoint; here the solution is most sensitive to errors of the initial
phase.

LRNZ — the solution of the Saltzman-Lorenz equations (I.16.17) displayed in
Fig. I.16.8, i.e., with constants and initial values

$$\sigma = 10, \quad r = 28, \quad b = \tfrac{8}{3}, \quad y_1(0) = -8, \quad y_2(0) = 8, \quad y_3(0) = 27 . \tag{10.2}$$

The solution is, for large values of $x$, extremely sensitive to the errors of the first
integration steps (see Fig. I.16.10 and its discussion). For example, at $x = 50$ the
numerical solution becomes totally wrong, even if the computations are performed
in quadruple precision with Tol = $10^{-20}$. Hence the numerical results of all methods
would be equally useless and no comparison makes any sense. Therefore we
choose

$$x_{\rm end} = \ldots$$

and check the numerical solution at this point. Even here, all computations with
Tol $> 10^{-7}$, say, fall into a chaotic cloud of meaningless results (see Fig. 10.5).


PLEI — a celestial mechanics problem (which we call “the Pleiades”): seven stars 
in the plane with coordinates $x_i$, $y_i$ and masses $m_i = i$ ($i = 1, \ldots, 7$):

$$x_i'' = \sum_{j \ne i} m_j (x_j - x_i)/r_{ij}, \qquad y_i'' = \sum_{j \ne i} m_j (y_j - y_i)/r_{ij} . \tag{10.3}$$

The initial values are

$$\begin{aligned}
& x_1(0) = 3, \quad x_2(0) = 3, \quad x_3(0) = -1, \quad x_4(0) = -3, \quad x_5(0) = 2, \quad x_6(0) = -2, \quad x_7(0) = 2, \\
& y_1(0) = 3, \quad y_2(0) = -3, \quad y_3(0) = 2, \quad y_4(0) = 0, \quad y_5(0) = 0, \quad y_6(0) = -4, \quad y_7(0) = 4, \\
& x_i'(0) = y_i'(0) = 0 \ \text{ for all } i \text{ with the exception of} \\
& x_6'(0) = 1.75, \quad x_7'(0) = -1.5, \quad y_4'(0) = -1.25, \quad y_5'(0) = 1,
\end{aligned} \tag{10.4}$$

and we integrate for $0 \le t \le t_{\rm end} = 3$. Fig. 10.2a represents the movement of these
7 bodies in phase coordinates. The initial value is marked by an “i”, the final value
at $t = t_{\rm end}$ is marked by an “f”. Between these points, 19 time-equidistant output
points are plotted and connected by a dense output formula. There occur several
quasi-collisions which are displayed in Table 10.1.


Table 10.1. Quasi-collisions in the PLEI problem

    Body 1        1        1        3        1        2        5
    Body 2        7        3        5        7        6        7
    r_ij^2        0.0129   0.0193   0.0031   0.0011   0.1005   0.0700
    time          1.23     1.46     1.63     1.68     1.94     2.14

The resulting violent shapes of the derivatives $x_i'(t)$, $y_i'(t)$ are displayed in
Fig. 10.2b and show that automatic step size control is essential for this example.
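A possible coding of the PLEI problem as a first order system of dimension 28 is sketched below. Two things are assumptions of this sketch and not statements of the text: $r_{ij}$ is taken to be the cube of the Euclidean distance between bodies $i$ and $j$ (as Newtonian gravitation suggests), and the initial vector `U0` follows (10.4) in the form in which this test problem is commonly quoted. The ordering of the unknowns and all names are illustrative.

```python
def plei_rhs(t, u):
    """u = [x_1..x_7, y_1..y_7, x'_1..x'_7, y'_1..y'_7]; returns du/dt."""
    n = 7
    x, y = u[0:n], u[n:2 * n]
    ax, ay = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                m_j = j + 1                                   # masses m_j = j
                r = ((x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2) ** 1.5
                ax[i] += m_j * (x[j] - x[i]) / r
                ay[i] += m_j * (y[j] - y[i]) / r
    return list(u[2 * n:4 * n]) + ax + ay

# positions x, positions y, velocities x', velocities y' (assumed reading of (10.4))
U0 = [3.0, 3.0, -1.0, -3.0, 2.0, -2.0, 2.0,
      3.0, -3.0, 2.0, 0.0, 0.0, -4.0, 4.0,
      0.0, 0.0, 0.0, 0.0, 0.0, 1.75, -1.5,
      0.0, 0.0, 0.0, -1.25, 1.0, 0.0, 0.0]

if __name__ == "__main__":
    print(plei_rhs(0.0, U0)[14:21])
```

Because the pairwise forces cancel, the mass-weighted sum of the accelerations vanishes, which gives a cheap consistency check of any implementation.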



Fig. 10.2a. Solutions of (10.3)      Fig. 10.2b. Speeds

ROPE — the movement of a hanging rope (see Fig. 10.3a) of length 1 under gravitation
and under the influence of a horizontal force

$$F_y(t) = \Bigl( \frac{1}{\cosh(4t - 2.5)} \Bigr)^4 \tag{10.5a}$$

acting at the point $s = 0.75$ as well as a vertical force

$$F_x(t) = 0.4 \tag{10.5b}$$

acting at the endpoint $s = 1$.



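Fig. 10.3a. Hanging rope      Fig. 10.3b. Solution for $0 \le t \le 3.723$.


If this problem is discretized, then Lagrange theory (see (I.6.18); see also Exercises
IV.1.2 and IV.1.4 of Volume II) leads to the following equations for the
unknown angles $\theta_k$:

$$\sum_{k=1}^{n} a_{lk} \ddot\theta_k = -\sum_{k=1}^{n} b_{lk} \dot\theta_k^2 - n \bigl( n + \tfrac12 - l \bigr) \sin\theta_l - n^2 \sin\theta_l \cdot F_x(t) + \begin{cases} n^2 \cos\theta_l \cdot F_y(t) & \text{if } l \le 3n/4 \\ 0 & \text{if } l > 3n/4, \end{cases} \qquad l = 1, \ldots, n \tag{10.6}$$

where

$$a_{lk} = g_{lk} \cos(\theta_l - \theta_k), \qquad b_{lk} = g_{lk} \sin(\theta_l - \theta_k), \qquad g_{lk} = n + \tfrac12 - \max(l, k). \tag{10.7}$$

We choose

$$n = 40, \qquad \theta_l(0) = \dot\theta_l(0) = 0, \qquad 0 \le t \le 3.723 . \tag{10.8}$$

The resulting system is of dimension 80. The special structure of $G^{-1}$ (see
(IV.1.16-18) of Volume II) allows one to evaluate $\ddot\theta_l$ with the following algorithm:

a) Let $v_l = -n \bigl( n + \tfrac12 - l \bigr) \sin\theta_l - n^2 \sin\theta_l \cdot F_x + \begin{cases} n^2 \cos\theta_l \cdot F_y & \text{if } l \le 3n/4 \\ 0 & \text{if } l > 3n/4, \end{cases}$

b) Compute $w = Dv + \dot\theta^2$ (the square taken componentwise),

c) Solve the tridiagonal system $Cu = w$,

d) Compute $\ddot\theta = Cv + Du$,

where

$$C = \begin{pmatrix}
1 & -\cos(\theta_1 - \theta_2) & & & \\
-\cos(\theta_2 - \theta_1) & 2 & -\cos(\theta_2 - \theta_3) & & \\
 & -\cos(\theta_3 - \theta_2) & \ddots & \ddots & \\
 & & \ddots & 2 & -\cos(\theta_{n-1} - \theta_n) \\
 & & & -\cos(\theta_n - \theta_{n-1}) & 3
\end{pmatrix} \tag{10.9}$$

$$D = \begin{pmatrix}
0 & -\sin(\theta_1 - \theta_2) & & & \\
-\sin(\theta_2 - \theta_1) & 0 & -\sin(\theta_2 - \theta_3) & & \\
 & -\sin(\theta_3 - \theta_2) & \ddots & \ddots & \\
 & & \ddots & 0 & -\sin(\theta_{n-1} - \theta_n) \\
 & & & -\sin(\theta_n - \theta_{n-1}) & 0
\end{pmatrix}$$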

BRUS — the reaction-diffusion equation (Brusselator with diffusion) 


$$\begin{aligned}
\frac{\partial u}{\partial t} &= 1 + u^2 v - 4.4u + \alpha \Bigl( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \Bigr) \\
\frac{\partial v}{\partial t} &= 3.4u - u^2 v + \alpha \Bigl( \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} \Bigr)
\end{aligned} \tag{10.10}$$

for $0 \le x \le 1$, $0 \le y \le 1$, $t \ge 0$, $\alpha = 2 \cdot 10^{-3}$ together with the Neumann boundary
conditions

$$\frac{\partial u}{\partial n} = 0, \qquad \frac{\partial v}{\partial n} = 0, \tag{10.11}$$

and the initial conditions

$$u(x, y, 0) = 0.5 + y, \qquad v(x, y, 0) = 1 + 5x . \tag{10.12}$$

By the method of lines (cf. Section I.6) this problem becomes a system of ordinary
differential equations. We put

$$x_i = \frac{i-1}{N-1}, \qquad y_j = \frac{j-1}{N-1}, \qquad i, j = 1, \ldots, N$$

and define

$$U_{ij}(t) = u(x_i, y_j, t), \qquad V_{ij}(t) = v(x_i, y_j, t) . \tag{10.13}$$

Discretizing the derivatives in (10.10) with respect to the space variables we obtain
for $i, j = 1, \ldots, N$

$$\begin{aligned}
U_{ij}' &= 1 + U_{ij}^2 V_{ij} - 4.4\, U_{ij} + \alpha (N-1)^2 \bigl( U_{i+1,j} + U_{i-1,j} + U_{i,j+1} + U_{i,j-1} - 4 U_{ij} \bigr) \\
V_{ij}' &= 3.4\, U_{ij} - U_{ij}^2 V_{ij} + \alpha (N-1)^2 \bigl( V_{i+1,j} + V_{i-1,j} + V_{i,j+1} + V_{i,j-1} - 4 V_{ij} \bigr),
\end{aligned} \tag{10.14}$$

an ODE of dimension $2N^2$. Because of the boundary condition (10.11) we have

$$U_{0,j} = U_{2,j}, \qquad U_{N+1,j} = U_{N-1,j}, \qquad U_{i,0} = U_{i,2}, \qquad U_{i,N+1} = U_{i,N-1}$$

and similarly for the $V_{ij}$-quantities. We choose $N = 21$ so that the system is
of dimension 882 and check the numerical solutions at the output point $t_{\rm end} = 7.5$.
The solution of (10.14) (in the $(x, y)$-space) is represented in Fig. 10.4a and
Fig. 10.4b for $u$ and $v$ respectively.
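The semi-discretized system (10.14) with mirrored boundary values might be coded as below. This is a plain-Python sketch with nested lists (indices shifted to start at 0) and illustrative function names; it assumes the reaction terms $1 + u^2 v - 4.4u$ and $3.4u - u^2 v$ of (10.10).

```python
def brus_rhs(U, V, alpha=2e-3):
    """Right-hand side of (10.14); U and V are N x N lists of lists."""
    N = len(U)
    c = alpha * (N - 1) ** 2

    def laplace(W, i, j):
        # mirror the index at the boundary (Neumann condition (10.11))
        ip = i + 1 if i + 1 < N else i - 1
        im = i - 1 if i >= 1 else i + 1
        jp = j + 1 if j + 1 < N else j - 1
        jm = j - 1 if j >= 1 else j + 1
        return W[ip][j] + W[im][j] + W[i][jp] + W[i][jm] - 4 * W[i][j]

    dU = [[1 + U[i][j] ** 2 * V[i][j] - 4.4 * U[i][j] + c * laplace(U, i, j)
           for j in range(N)] for i in range(N)]
    dV = [[3.4 * U[i][j] - U[i][j] ** 2 * V[i][j] + c * laplace(V, i, j)
           for j in range(N)] for i in range(N)]
    return dU, dV

def brus_initial(N=21):
    """Initial values (10.12) on the grid x_i = i/(N-1), y_j = j/(N-1)."""
    U = [[0.5 + j / (N - 1) for j in range(N)] for i in range(N)]   # u = 0.5 + y
    V = [[1 + 5 * i / (N - 1) for j in range(N)] for i in range(N)] # v = 1 + 5x
    return U, V

if __name__ == "__main__":
    U, V = brus_initial()
    dU, dV = brus_rhs(U, V)
    print(len(dU) * len(dU[0]) + len(dV) * len(dV[0]))   # dimension of the ODE
```

The spatially constant state $u = 1$, $v = 3.4$ is stationary for these reaction terms, which gives a simple sanity check of the implementation.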


Performance of the Codes 

Several codes were applied to each of the test problems with Tol = $10^{-3}$, Tol =
$10^{-3-1/8}$, Tol = $10^{-3-2/8}$, Tol = $10^{-3-3/8}$, ... (for the large problems with Tol =
$10^{-3}$, Tol = $10^{-3-1/4}$, Tol = $10^{-3-2/4}$, ...) up to, in general, Tol = $10^{-14}$; then
the numerical results at the output points were compared with an “exact solution”
(computed very precisely in quadruple precision). Each of these results then corresponds
to one point of Fig. 10.5, where this precision is compared (in double
logarithmic scale) to the number of function evaluations. The “integer” tolerances
$10^{-3}$, $10^{-4}$, $10^{-5}$, ... are distinguishable as enlarged symbols. All codes were
applied with complete “standard” parameter settings and were not at all “tuned” to
these particular problems.

A comparison of the computing time (instead of the number of function evaluations)
gave no significant difference. Therefore, only one representative of the
small problems (LRNZ) and one large problem (BRUS) are displayed in Fig. 10.6.
All computations have been performed in REAL*8 (Uround = $1.11 \cdot 10^{-16}$) on a
Sun Workstation (SunBlade 100).

The codes used are the following: 

RKF45 — symbol — a product of Shampine and Watts’ programming art 
based on Fehlberg’s pair of orders 4 and 5 (Table 5.1). The method is used in the
“local extrapolation mode”, i.e., the numerical solution is advanced with the 5th
order result. The code is usually, except for low precision, the slowest of all, which
is explained by its low order. The results of the “time”-picture Fig. 10.6 for this
code are relatively better than those on the “function calls” front (Fig. 10.5). This
indicates that the code has particularly small overhead.

Fig. 10.5. Precision versus function calls

Fig. 10.6. Precision versus computing time

DOPRI5 — symbol O — the method of Dormand & Prince of order 5 with 
embedded error estimator of order 4 (see Table 5.2). The code is explained in the 
Appendix. The method has precisely the same order as that used in RKF45, but the 
error constants are much more optimized. Therefore the “error curves” in Fig. 10.5 
are nicely parallel to those of RKF45, but appear translated to the side of higher 
precision. One usually gains between a half and one digit of numerical precision for 
comparable numerical work. The code performs especially well between Tol = $10^{-3}$
and Tol = $10^{-8}$ in the AREN problem. This is simply due to an accidental sign
change of the error for the most sensitive solution component.

DVERK — symbol 12 — this widely known code implements Verner’s 6th order
method of Table 5.4 and was written by Hull, Enright & Jackson. It has been included
in the IMSL library for many years and the source code is available through
na-net. The corresponding error curves in Fig. 10.5 appear to be less steep than
those of DOPRI5, which illustrates the higher order of the method. However, the
error constants seem to be less optimal so that this code surpasses the performance
of DOPRI5 only for very stringent tolerances. It is significantly better than DOPRI5
solely in problems EULR and ROPE. The code, as it was, failed at the BRUS problem
for Tol = $10^{-3}$ and Tol = $10^{-4}$. Therefore these computations were started
with Tol = $10^{-5}$.

DOP853 — symbol O — is the method of Dormand & Prince of order 8 explained
in Section II.5 (formulas (5.20) - (5.30), see Appendix). The 6th order
error estimator (5.29), (5.30) has been replaced by a 5th order estimator with 3rd
order correction (see below). This was necessary to make the code robust for the
EULR problem. The code works perfectly for all problems and nearly all tolerances.
Whenever more than 3 or 4 digits are desired, this method seems to be
highly recommendable. The most astonishing fact is that its use was never disastrous,
not even for Tol = $10^{-3}$.

ODEX — symbol A — is an extrapolation code based on the Gragg-Bulirsch-Stoer
algorithm with harmonic step number sequence (see Appendix). This
method, which allows arbitrarily high orders (in the standard version of the code
limited to $p \le 18$), is of course predestined for computations with high precision.
The more stringent Tol is, the higher the order used becomes and the less steep the
error curve is. This can best be observed in the picture for the ROPE problem.
Finally, for Tol $\approx 10^{-12}$, the code surpasses the values of DOP853. As can be seen
in Fig. 10.6, the code loses slightly on the “time”-front. This is due to the increased
overhead of the extrapolation scheme.

The numerical results of ODEX behave very similarly to those of DIFEX1 
(Deuflhard 1983). 


A “Stretched” Error Estimator for DOP853 


In preliminary stages of our numerical tests we had written a code “DOPR86” 
based on the method of order 8 of Dormand & Prince with the 6th order error 
estimator described in Section II.5. For most problems the results were excellent. 
However, there are some situations in which the error control of DOPR86 did not 
work safely: 

When applied to the BRUS problem with Tol = $10^{-3}$ or Tol = $10^{-4}$ the code
stopped with an overflow message. The reason was the following: when the step
size is too large, the internal stages are too far away from the solution and their
modulus increases at each stage (e.g., by a factor $10^5$ between stage 11 and stage
12). Due to the fact that $\hat b_{12} = b_{12}$ (see (5.30), (5.26) and (5.25b)) the difference
$\hat y_1 - y_1$ is not influenced by the last stage and is smaller (by a factor of $10^5$) than
the modulus of $y_1$. Hence, the error estimator scaled by (4.10) is $< 10^{-5}$ and a
completely wrong step will be accepted.

The code DOPR86 also had severe difficulties when applied to problems with 
discontinuities such as EULR. The worst results were obtained for the problem 

$$\begin{aligned} y_1' &= y_2 y_3, & y_1(0) &= 0 \\ y_2' &= -y_3 y_1, & y_2(0) &= 1 \\ y_3' &= -0.51 \cdot y_1 y_2 + f(x), & y_3(0) &= 1 \end{aligned} \tag{10.15}$$

where $f(x)$, given in (10.1’), has a discontinuous second derivative. The
results for this problem and the code DOPR86 for very many different Tol values
(Tol = $10^{-3}, 10^{-3-1/24}, 10^{-3-2/24}, \ldots, 10^{-14}$) are displayed in Fig. 10.7. There,
the (dotted) diagonal is of exact slope 1/8 and represents the theoretical
convergence speed of the method of order 8. It can be observed that this convergence is
well attained by some results, but others lose precision of up to 8 digits from the
desired tolerance. We explain this disappointing behaviour by the fact that $\hat b_{12} = b_{12}$
and that the 12th stage is the only one where the function is evaluated at the
endpoint of the step. Whenever the discontinuity of $f''$ is by accident slightly to the
left of a grid point, the error estimator ignores it and the code reports a wrong value.

Fig. 10.7. Performances of DOPR86 and DOP853 at (10.15)

Unfortunately, the basic 8th order method does not possess a 6th order
embedding with $\hat b_{12} \ne b_{12}$ (unless additional function evaluations are used). Therefore,
we decided to construct a 5th order approximation $\hat y_1$. It can be obtained by taking
$\hat b_6$, $\hat b_7$, $\hat b_{12}$ as free parameters, e.g.,

$$\hat b_6 = b_6/2 + 1, \qquad \hat b_7 = b_7/2 + 0.45, \qquad \hat b_{12} = b_{12}/2,$$

by putting $\hat b_2 = \hat b_3 = \hat b_4 = \hat b_5 = 0$ and by determining the remaining coefficients
such that this quadrature formula has order 5. Due to the simplifying assumptions
(5.20) all conditions for order 5 are then satisfied. In order to prevent a serious
over-estimation of the error, we consider a second embedded method $\tilde y_1$ of order
3 based on the nodes $c_1 = 0$, $c_9$ and $c_{12} = 1$ so that two error estimators

$$err_5 = \|\hat y_1 - y_1\| = O(h^6), \qquad err_3 = \|\tilde y_1 - y_1\| = O(h^4) \tag{10.16}$$




are available. Similarly to a procedure which is common for quadrature formulas
(R. Piessens, E. de Doncker-Kapenga, C.W. Überhuber & D.K. Kahaner 1983,
Berntsen & Espelid 1991) we consider

$$err = err_5 \cdot \frac{err_5}{\sqrt{err_5^2 + 0.01 \cdot err_3^2}} = O(h^8) \tag{10.17}$$

as error estimator. It behaves asymptotically like the global error of the method.
The corresponding code DOP853 gives satisfactory results for all the above
problems (see right picture in Fig. 10.7).
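
As a small illustration of how the two estimators are combined, the following Python sketch evaluates a quotient of the form (10.17). It is not taken from the DOP853 source; the function name `stretched_error` and the guard for a vanishing denominator are our own assumptions.

```python
import math

def stretched_error(err5: float, err3: float) -> float:
    """Combine a 5th and a 3rd order estimate as err5 * err5 / sqrt(err5^2 + 0.01*err3^2).

    Hypothetical helper: err5 = O(h^6) and err3 = O(h^4) give a result of size O(h^8).
    """
    denom = math.sqrt(err5 * err5 + 0.01 * err3 * err3)
    if denom == 0.0:
        return 0.0  # both estimates vanish, so the combined estimate is zero as well
    return err5 * err5 / denom

# Halving h divides err5 by 2^6 and err3 by 2^4; the combined value then drops
# by a factor close to 2^8 as long as the err3 term dominates the denominator.
coarse = stretched_error(1e-6, 1e-2)
fine = stretched_error(1e-6 / 64, 1e-2 / 16)
ratio = coarse / fine
```

When `err3` is negligible the expression reduces to `err5` itself, so the third order estimate only intervenes where it signals trouble.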
Effect of Step-Number Sequence in ODEX

We also study the influence of the different step-number sequences on the
performance of the extrapolation code ODEX. Fig. 10.8 presents two examples of this
study, a small problem (AREN) and a large problem (ROPE). The sequences used
are

HARMONIC — symbol O — the harmonic sequence (9.8’) which is the standard 
choice in ODEX; 

MOD4 — symbol A — the sequence {2, 6,10,14,18,...} (see (9.35)) which 
allowed the construction of high-order dense output; 

BULIRSCH — symbol □ — the Bulirsch sequence (9.7’); 

ROMBERG — symbol O — the Romberg sequence (9.6’); 

DNSECTRL — symbol ^— the error control for the MOD4 sequence taking into 
account the interpolation error of the dense output solution (9.42). This is included 
only in the small problem, since (complete) dense output on large problems would 
need too much memory. 



Fig. 10.8. Effect of step-number sequences in ODEX 


Discussion. With the exception of the clear inferiority of the Romberg sequence, 
especially for high precision, and a certain price to be paid for the dense output error 
control, there is not much difference between the first three sequences. Although 
the harmonic sequence appears to be slightly superior, the difference is statistically 
not very significant. 
We suppose that we have a computer with a number of arithmetic
processors capable of simultaneous operation and seek to devise 
parallel integration algorithms for execution on such a computer. 

(W.L. Miranker & W. Liniger 1967) 

“PARALYSING ODES” (K. Burrage, talk in Helsinki 1990) 


Parallel machines are computers with more than one processor and this facility 
might help us to speed up the computations in ordinary differential equations. This 
is particularly interesting for very large problems, for very costly function evalua¬ 
tion, or for fast real-time simulations. A second motivation is the desire to make a 
code, with the help of parallel computations, not necessarily faster but more robust 
and reliable. 

Early attempts at finding parallel methods are Nievergelt (1964) and Miranker
& Liniger (1967). See also the survey papers Miranker (1971) and Jackson (1991).

We distinguish today essentially between two types of parallel architectures: 

SIMD (single instruction multiple data): all processors execute the same in¬ 
structions with possibly different input data. 

MIMD (multiple instruction multiple data): the different processors can act 
independently. 

The exploitation of parallelism for an ordinary differential equation 

y' = f(x,y), y(x 0 ) = y 0 (11.1) 

can be classified into two main categories (Gear 1987, 1988): 

Parallelism across the system. Often the problem itself offers more or less trivial 
applications for parallelism, e.g., 

> if several solutions are required for various initial or parameter values; 

> if the right-hand side of (11.1) is very costly, but structured in such a way that 
the computation of one function evaluation can be split efficiently across the 
various processors; 

> space discretizations of partial differential equations (such as the Brusselator 
problem (10.14)) whose function evaluation can be done simultaneously for all 
components on an SIMD machine with thousands of processors; 

> the solution of boundary value problems with the multiple shooting method 
(see Section I.15) where all computations on the various sub-intervals can be
done in parallel;





> doing all the high-dimensional linear algebra in the Runge-Kutta method (11.2) 
in parallel; 

> parallelism in the linear algebra for Newton’s method for implicit Runge-Kutta 
methods (see Section IV.8). 

These types of parallelism, of course, depend strongly on the problem and on the 
type of the computer. 

Parallelism across the method. This is problem-independent and means that, due 
to a special structure of the method, several function values can be evaluated in 
parallel within one integration step. This will be discussed in this section in more 
detail. 


Parallel Runge-Kutta Methods 


... it seems that explicit Runge-Kutta methods are not facilitated 
much by parallelism at the method level. 

(Iserles & Nørsett 1990)


Consider an explicit Runge-Kutta method 


$$\begin{aligned} k_i &= f\Bigl(x_0 + c_i h,\; y_0 + h \sum_{j=1}^{i-1} a_{ij} k_j\Bigr), \qquad i = 1, \ldots, s \\ y_1 &= y_0 + h \sum_{i=1}^{s} b_i k_i. \end{aligned} \tag{11.2}$$


Suppose, for example, that the coefficients have the zero-pattern indicated in 
Fig. 11.1. 


    0 |
    × | ×
    × | ×  0
    × | ×  ×  ×
   ---+------------
      | ×  ×  ×  ×

Fig. 11.1. Parallel method



Each arrow in the corresponding “production graph” G (Fig. 11.2), pointing from
vertex “i” to vertex “j”, stands for a non-zero $a_{ji}$. Here the vertices 2 and 3 are
independent and can be evaluated in parallel. We call the number of vertices in
the longest chain of successive arrows (here 3) the number of sequential function
evaluations $\sigma$.
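
The number of sequential function evaluations can be read off mechanically from the zero pattern of the Runge-Kutta matrix. The following Python sketch does this for an explicit method; the function name `sequential_stages` is hypothetical, and the code assumes that the stages are already ordered so that the matrix is strictly lower triangular.

```python
def sequential_stages(A, tol=0.0):
    """Number of vertices in the longest chain of the production graph of A.

    A is the (strictly lower triangular) coefficient matrix of an explicit
    method; an entry with |a_ij| > tol is an arrow from stage j to stage i.
    """
    s = len(A)
    depth = [1] * s                    # a stage without predecessors has depth 1
    for i in range(s):
        for j in range(i):             # explicit method: only stages j < i are used
            if abs(A[i][j]) > tol:
                depth[i] = max(depth[i], depth[j] + 1)
    return max(depth)

# Zero pattern of Fig. 11.1: stages 2 and 3 only use stage 1, stage 4 uses all.
fig_11_1 = [[0, 0, 0, 0],
            [1, 0, 0, 0],
            [1, 0, 0, 0],
            [1, 1, 1, 0]]
sigma = sequential_stages(fig_11_1)
```

Stages with the same depth are mutually independent, so the largest number of stages sharing one depth is the number of processors needed.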
In general, if the Runge-Kutta matrix A can be partitioned (possibly after a
permutation of the stages) as

$$A = \begin{pmatrix} 0 \\ A_{21} & 0 \\ A_{31} & A_{32} & 0 \\ \vdots & \vdots & \ddots & \ddots \\ A_{\sigma 1} & A_{\sigma 2} & \cdots & A_{\sigma,\sigma-1} & 0 \end{pmatrix}, \tag{11.3}$$

where $A_{ij}$ is a matrix of size $\mu_i \times \mu_j$, then the derivatives $k_1, \ldots, k_{\mu_1}$ as well
as $k_{\mu_1+1}, \ldots, k_{\mu_1+\mu_2}$, and so on, can be computed in parallel and one step of the
method is executed in $\sigma$ sequential function evaluations (if $\mu = \max_i \mu_i$ processors
are at disposal). The following theorem is a severe restriction on parallel methods.
It appeared in hand-written notes by K. Jackson & S. Nørsett around 1986. For a
publication see Jackson & Nørsett (1992) and Iserles & Nørsett (1990).


Theorem 11.1. For an explicit Runge-Kutta method with $\sigma$ sequential stages the
order p satisfies

$$p \le \sigma, \tag{11.4}$$

for any number $\mu$ of available processors.

Proof. Each non-zero term of the expressions $\Phi_j(t)$ for the “tall” trees $t_{21}, t_{32},
t_{44}, t_{59}, \ldots$ (see Table 2.2 and Definition 2.9), $\sum a_{ij} a_{jk} a_{k\ell} a_{\ell m} \cdots$, corresponds to
a connected chain of arrows in the production graph. Since their length is limited
by $\sigma$, these terms are all zero for $\varrho(t) > \sigma$. □


Methods with $p = \sigma$ will be called P-optimal methods. The Runge-Kutta
methods of Section II.1 for $p \le 4$ are all P-optimal. Only for $p > 4$ does the subsequent
construction of P-optimal methods allow one to increase the order with the help of
parallelism.

Remark. The fact that the “stability function” (see Section IV.2) of an explicit
parallel Runge-Kutta method is a polynomial of degree $\le \sigma$ allows a second proof
of Theorem 11.1. Further, P-optimal methods all have the same stability function

$$1 + z + z^2/2! + \ldots + z^\sigma/\sigma!.$$


Parallel Iterated Runge-Kutta Methods 

One possibility of constructing P-optimal methods is by fixed point iteration.
Consider an arbitrary (explicit or implicit) Runge-Kutta method with coefficients

$$c = (c_1, \ldots, c_s)^T, \qquad A = (a_{ij})_{i,j=1}^{s}, \qquad b^T = (b_1, \ldots, b_s)$$

and define $y_1$ by

$$\begin{aligned} k_i^{(0)} &= 0 \\ k_i^{(\ell)} &= f\Bigl(x_0 + c_i h,\; y_0 + h \sum_{j=1}^{s} a_{ij} k_j^{(\ell-1)}\Bigr), \qquad \ell = 1, \ldots, \sigma \\ y_1 &= y_0 + h \sum_{i=1}^{s} b_i k_i^{(\sigma)}. \end{aligned} \tag{11.5}$$

This algorithm can be interpreted as an explicit Runge-Kutta method with scheme 


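$$\begin{array}{c|ccccc} 0 & 0 \\ c & A & 0 \\ c & 0 & A & 0 \\ \vdots & \vdots & & \ddots & \ddots \\ c & 0 & \cdots & 0 & A & 0 \\ \hline & 0 & \cdots & 0 & 0 & b^T \end{array}$$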


It has $\sigma$ sequential stages if s processors are available. To compute its order we
use a Lipschitz condition for $f(x,y)$ and obtain

$$\max_i \|k_i^{(\ell)} - k_i\| \le C h \cdot \max_i \|k_i^{(\ell-1)} - k_i\|$$

where $k_i$ are the stage-vectors of the basic method. Since $k_i^{(0)} - k_i = O(1)$ this
implies $k_i^{(\sigma)} - k_i = O(h^\sigma)$ and consequently the difference to the solution $\tilde y_1$ of the
basic method satisfies $y_1 - \tilde y_1 = O(h^{\sigma+1})$.

Theorem 11.2. The parallel iterated Runge-Kutta method (11.5) is of order

$$p = \min(p_0, \sigma), \tag{11.7}$$

if $p_0$ denotes the order of the basic method.

Proof. The statement follows from

$$y_1 - y(x_0 + h) = y_1 - \tilde y_1 + \tilde y_1 - y(x_0 + h) = O(h^{\sigma+1}) + O(h^{p_0+1}). \qquad \square$$


This theorem shows that the choice $\sigma = p_0$ in (11.5) yields P-optimal explicit
Runge-Kutta methods (i.e., $\sigma = p$). If we take as basic method the s-stage
collocation method based on the Gaussian quadrature ($p_0 = 2s$) then we obtain a
method of order $p = 2s$ which is P-optimal on s processors. P.J. van der Houwen
& B.P. Sommeijer (1990) have done extensive numerical experiments with this
method.
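
To make the iteration concrete, here is a minimal Python sketch of one step of the parallel iterated scheme for a scalar problem, using the 2-stage Gauss method (order 4) as basic method. The function name `iterated_rk_step` is our own, and the sweeps are executed serially here; on a parallel machine the s evaluations inside one sweep would be distributed over s processors.

```python
import math

SQRT3 = math.sqrt(3.0)
# Coefficients of the 2-stage Gauss collocation method (order p0 = 4).
C = [0.5 - SQRT3 / 6.0, 0.5 + SQRT3 / 6.0]
A = [[0.25, 0.25 - SQRT3 / 6.0],
     [0.25 + SQRT3 / 6.0, 0.25]]
B = [0.5, 0.5]

def iterated_rk_step(f, x0, y0, h, sigma):
    """One step of the fixed point iteration with sigma sweeps (scalar y)."""
    s = len(C)
    k = [0.0] * s                               # starting values k_i^(0) = 0
    for _ in range(sigma):
        # the s evaluations of one sweep do not depend on each other
        k = [f(x0 + C[i] * h, y0 + h * sum(A[i][j] * k[j] for j in range(s)))
             for i in range(s)]
    return y0 + h * sum(B[i] * k[i] for i in range(s))

# For y' = y the result with sigma = 4 is the Taylor polynomial of degree 4.
y1 = iterated_rk_step(lambda x, y: y, 0.0, 1.0, 0.1, 4)
taylor4 = 1 + 0.1 + 0.1**2 / 2 + 0.1**3 / 6 + 0.1**4 / 24
```

With $\sigma = 4$ sweeps the sketch reproduces the stability polynomial of degree 4, in line with the order $\min(p_0, \sigma) = 4$.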

Extrapolation Methods 

It turns out that the GBS-algorithm (Section II.9) without smoothing step is also
P-optimal. Indeed, all the values $T_{j1}$ can be computed independently of each other. If
we choose the step number sequence {2, 4, 6, 8, 10, 12, ...} then the computation
of $T_{k1}$ requires 2k sequential function evaluations. Hence, if k processors are
available (one for each $T_{j1}$), the numerical approximation $T_{kk}$, which is of order
$p = 2k$, can be computed with $\sigma = 2k$ sequential stages. When the processors are of
type MIMD we can compute $T_{11}$ and $T_{k-1,1}$ on one processor ($2 + 2(k-1) = 2k$
function evaluations). Similarly, $T_{21}$ and $T_{k-2,1}$ occupy another processor, etc.
In this way, the number of necessary processors is reduced by a factor close to 2
without increasing the number of sequential stages.
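
The pairing of columns on MIMD processors can be written down in a few lines. The Python sketch below is only an illustration of the counting argument; the names `pair_columns` and `load` are hypothetical, and the cost of $T_{j1}$ is taken as 2j function evaluations, as for the sequence {2, 4, 6, ...}.

```python
def pair_columns(k):
    """Distribute T_{j1}, j = 1..k, so that no processor exceeds 2k evaluations."""
    groups = [[k]]                 # T_{k1} alone already costs 2k evaluations
    lo, hi = 1, k - 1
    while lo < hi:
        groups.append([lo, hi])    # 2*lo + 2*hi = 2k evaluations
        lo, hi = lo + 1, hi - 1
    if lo == hi:
        groups.append([lo])        # unpaired middle column
    return groups

def load(group):
    """Number of function evaluations performed by one processor."""
    return sum(2 * j for j in group)

groups6 = pair_columns(6)          # [[6], [1, 5], [2, 4], [3]]
```

For k = 6 this uses 4 processors instead of 6, each with at most 12 evaluations.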

The order and step size strategy, discussed in Section II.9, should, of course, be
adapted for an implementation on parallel computers. The “hope for convergence
in line k + 1” no longer makes sense because this part of the algorithm is now as
costly as the whole step. Similarly, there is no reason to accept already $T_{k-1,k-1}$
as numerical approximation, because $T_{kk}$ is computed on the same time level as
$T_{k-1,k-1}$. Moreover, the numbers $A_k$ of (9.25) should be replaced by $A_k = n_k$,
which will in general increase the order used by the code.


Increasing Reliability 


... using parallelism to improve reliability and functionality 
rather than efficiency. (W.H. Enright & D.J. Higham 1991) 


For a given Runge-Kutta method parallel computation can be used to give a reliable 
error estimate or an accurate dense output. This has been advocated by Enright & 
Higham (1991) and will be the subject of this subsection. 

Consider a Runge-Kutta method of order p, choose distinct numbers
$0 = \sigma_0 < \sigma_1 < \ldots < \sigma_k = 1$ and apply the Runge-Kutta method in parallel with step sizes
$\sigma_1 h, \ldots, \sigma_{k-1} h, \sigma_k h = h$. This gives approximations

$$y_{\sigma_i} \approx y(x_0 + \sigma_i h). \tag{11.8}$$

Then compute $f(x_0 + \sigma_i h, y_{\sigma_i})$ and do Hermite interpolation with the values

$$y_{\sigma_i}, \quad h f(x_0 + \sigma_i h, y_{\sigma_i}), \qquad i = 0, 1, \ldots, k, \tag{11.9}$$

i.e., compute

$$u(\theta) = \sum_{i=0}^{k} v_i(\theta)\, y_{\sigma_i} + h \sum_{i=0}^{k} w_i(\theta)\, f(x_0 + \sigma_i h, y_{\sigma_i}) \tag{11.10}$$
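where $v_i(\theta)$ and $w_i(\theta)$ are the scalar polynomials

$$\begin{aligned} v_i(\theta) &= \ell_i^2(\theta) \cdot \bigl(1 - 2\ell_i'(\sigma_i)(\theta - \sigma_i)\bigr) \\ w_i(\theta) &= \ell_i^2(\theta) \cdot (\theta - \sigma_i) \end{aligned} \qquad \text{with} \qquad \ell_i(\theta) = \prod_{j=0,\, j \ne i}^{k} \frac{\theta - \sigma_j}{\sigma_i - \sigma_j}. \tag{11.11}$$

The interpolation error, which is $O(h^{2k+2})$, may be neglected if $2k + 2 > p + 1$.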
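
The Hermite interpolant built from the basis polynomials $v_i$ and $w_i$ is easy to evaluate directly. The following Python sketch does so for scalar data; all function names are our own, and the data are assumed to be given as lists `y_vals[i]` (values at the nodes) and `hf_vals[i]` (step size times derivative at the nodes).

```python
def lagrange(sig, i, theta):
    """Lagrange polynomial l_i(theta) for the nodes sig."""
    val = 1.0
    for j, sj in enumerate(sig):
        if j != i:
            val *= (theta - sj) / (sig[i] - sj)
    return val

def dlagrange_at_node(sig, i):
    """l_i'(sigma_i) = sum over j != i of 1 / (sigma_i - sigma_j)."""
    return sum(1.0 / (sig[i] - sj) for j, sj in enumerate(sig) if j != i)

def v(sig, i, theta):
    l = lagrange(sig, i, theta)
    return l * l * (1.0 - 2.0 * dlagrange_at_node(sig, i) * (theta - sig[i]))

def w(sig, i, theta):
    l = lagrange(sig, i, theta)
    return l * l * (theta - sig[i])

def dense_output(sig, y_vals, hf_vals, theta):
    """u(theta) = sum_i v_i(theta) * y_i + sum_i w_i(theta) * (h f_i)."""
    return sum(v(sig, i, theta) * y_vals[i] + w(sig, i, theta) * hf_vals[i]
               for i in range(len(sig)))

# Check on y(theta) = theta^3 with h = 1: the interpolant of degree 2k+1 = 5
# must reproduce a cubic exactly.
sig = [0.0, 0.5, 1.0]
u03 = dense_output(sig, [s**3 for s in sig], [3 * s**2 for s in sig], 0.3)
```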

As to the choice of $\sigma_i$ we denote the local error of the method by
$le = y_1 - y(x_0 + h)$. It follows from Taylor expansion (see Theorem 3.2) that

$$y_{\sigma_i} - y(x_0 + \sigma_i h) = \sigma_i^{p+1} \cdot le + O(h^{p+2})$$

and consequently the error of (11.10) satisfies (for $2k + 2 > p + 1$)

$$u(\theta) - y(x_0 + \theta h) = \Bigl(\sum_{i=1}^{k} \sigma_i^{p+1} v_i(\theta)\Bigr) \cdot le + O(h^{p+2}). \tag{11.12}$$

The coefficient of le is equal to 1 for $\theta = 1$ and it is natural to search for suitable
$\sigma_i$ such that

$$\Bigl|\sum_{i=1}^{k} \sigma_i^{p+1} v_i(\theta)\Bigr| \le 1 \qquad \text{for all } \theta \in [0,1]. \tag{11.13}$$


Indeed, under the assumption $2k - 1 \le p \le 2k + 1$, it can be shown that
numbers $0 = \sigma_0 < \sigma_1 < \ldots < \sigma_{k-1} < \sigma_k = 1$ exist satisfying (11.13) (see Exercise 1).
Selected values of $\sigma_i$ proposed by Enright & Higham (1991), which satisfy this
condition, are given in Table 11.1. For such a choice of $\sigma_i$ the error (11.12) of
the dense output is bounded (at least asymptotically) by the local error le at the
endpoint of integration. This implementation of a dense output provides a simple
way to estimate le. Since $u(\theta)$ is an $O(h^{p+1})$-approximation of $y(x_0 + \theta h)$, the
defect of $u(\theta)$ satisfies

$$u'(\theta) - h f(x_0 + \theta h, u(\theta)) = \Bigl(\sum_{i=1}^{k} \sigma_i^{p+1} v_i'(\theta)\Bigr) \cdot le + O(h^{p+2}). \tag{11.14}$$

If we take a $\sigma^*$ different from the $\sigma_i$ such that $\sum_{i=1}^{k} \sigma_i^{p+1} v_i'(\sigma^*) \ne 0$ (see Table 11.1)
then only one function evaluation, namely $f(x_0 + \sigma^* h, u(\sigma^*))$, allows the
computation of an asymptotically correct approximation of le from (11.14). This error
estimate can be used for step size selection and for improving the numerical
result (local extrapolation). In the local extrapolation mode one then loses the $C^1$
continuity of the dense output.

With the use of an additional processor the quantities $y_{\sigma^*}$ and $f(x_0 + \sigma^* h, y_{\sigma^*})$
can be computed simultaneously with $y_{\sigma_i}$ and $f(x_0 + \sigma_i h, y_{\sigma_i})$. If the polynomial
$u(\theta)$ is required to satisfy $u(\sigma^*) = y_{\sigma^*}$, but not $u'(\sigma^*) = h f(x_0 + \sigma^* h, y_{\sigma^*})$, then
the estimate (11.14) of the local error le does not need any further evaluation of f.
Table 11.1. Good values for $\sigma_i$

    p    k    $\sigma_1, \ldots, \sigma_{k-1}$    $\sigma^*$
    5    3    0.2, 0.4                            0.88
    6    3    0.2, 0.4                            0.88
    7    4    0.2, 0.4, 0.7                       0.94
    8    4    0.2, 0.4, 0.6                       0.93
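
Condition (11.13) can be checked numerically for given nodes. The following Python sketch samples the coefficient of le on a grid of $\theta$-values; the function names and the grid size are our own choices, and a grid check is of course only an indication, not a proof.

```python
def v(sig, i, theta):
    """Hermite basis polynomial v_i of (11.11) for the nodes sig."""
    l, dl = 1.0, 0.0
    for j, sj in enumerate(sig):
        if j != i:
            l *= (theta - sj) / (sig[i] - sj)
            dl += 1.0 / (sig[i] - sj)          # accumulates l_i'(sigma_i)
    return l * l * (1.0 - 2.0 * dl * (theta - sig[i]))

def max_error_factor(p, inner, n=2000):
    """Largest |sum_{i>=1} sigma_i^(p+1) v_i(theta)| over theta = 0, 1/n, ..., 1."""
    sig = [0.0] + list(inner) + [1.0]
    worst = 0.0
    for m in range(n + 1):
        theta = m / n
        s = sum(sig[i] ** (p + 1) * v(sig, i, theta) for i in range(1, len(sig)))
        worst = max(worst, abs(s))
    return worst

# Three rows of Table 11.1: (p, inner nodes sigma_1..sigma_{k-1}).
rows = [(5, [0.2, 0.4]), (7, [0.2, 0.4, 0.7]), (8, [0.2, 0.4, 0.6])]
factors = [max_error_factor(p, inner) for p, inner in rows]
```

For these rows the sampled maximum is attained at $\theta = 1$, where the coefficient equals 1.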


Exercises 

1. Let the positive integers k and p satisfy $2k - 1 \le p \le 2k + 1$. Then show that
there exist numbers $0 = \sigma_0 < \sigma_1 < \ldots < \sigma_{k-1} < \sigma_k = 1$ such that (11.13) is
true for all $\theta \in [0,1]$.

Hint. Put $\sigma_j = j\varepsilon$ for $j = 1, \ldots, k-1$ and show that (11.13) is verified for
sufficiently small $\varepsilon > 0$. Of course, in a computer program, one should use $\sigma_j$
which satisfy (11.13) and are well separated in order to avoid roundoff errors.



II.12 Composition of B-Series


At the Dundee Conference in 1969, a paper by J. Butcher was read 
which contained a surprising result. (H.J. Stetter 1971) 


We shall now derive a theorem on the composition of what we call B-series (in
honour of J. Butcher). This will have many applications and will lead to a better
understanding of order conditions for all general classes of methods (composition
of methods, multiderivative methods of Section II.13, general linear methods of
Section III.8, Rosenbrock methods in Exercise 2 of Section IV.7).


Composition of Runge-Kutta Methods 


There is no five-stage explicit Runge-Kutta method of order 5 (Section II.5). This 
led Butcher (1969) to the idea of searching for different five-stage methods such 
that a certain composition of these methods produces a fifth-order result (“effective 
order”). Although not of much practical interest (mainly due to the problem of 
changing step size), this was the starting point of a fascinating algebraic theory of 
numerical methods. 

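Suppose we have two methods, say of three stages,

$$\begin{array}{c|ccc} 0 \\ c_2 & a_{21} \\ c_3 & a_{31} & a_{32} \\ \hline & b_1 & b_2 & b_3 \end{array} \qquad\qquad \begin{array}{c|ccc} 0 \\ \hat c_2 & \hat a_{21} \\ \hat c_3 & \hat a_{31} & \hat a_{32} \\ \hline & \hat b_1 & \hat b_2 & \hat b_3 \end{array} \tag{12.1}$$

which are applied one after the other to a starting value $y_0$ with the same step size:

$$g_i = y_0 + h \sum_j a_{ij} f(g_j), \qquad y_1 = y_0 + h \sum_j b_j f(g_j) \tag{12.2}$$

$$\ell_i = y_1 + h \sum_j \hat a_{ij} f(\ell_j), \qquad y_2 = y_1 + h \sum_j \hat b_j f(\ell_j). \tag{12.3}$$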
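If we insert $y_1$ from (12.2) into (12.3) and group all $g_i$, $\ell_i$ together, we see that the
composition can be interpreted as a large Runge-Kutta method with coefficients

$$\begin{array}{c|cccccc} 0 \\ c_2 & a_{21} \\ c_3 & a_{31} & a_{32} \\ \sum b_i & b_1 & b_2 & b_3 \\ \sum b_i + \hat c_2 & b_1 & b_2 & b_3 & \hat a_{21} \\ \sum b_i + \hat c_3 & b_1 & b_2 & b_3 & \hat a_{31} & \hat a_{32} \\ \hline & b_1 & b_2 & b_3 & \hat b_1 & \hat b_2 & \hat b_3 \end{array} \;=\; \begin{array}{c|cccccc} 0 \\ \tilde c_2 & \tilde a_{21} \\ \tilde c_3 & \tilde a_{31} & \tilde a_{32} \\ \tilde c_4 & \tilde a_{41} & \tilde a_{42} & \tilde a_{43} \\ \tilde c_5 & \tilde a_{51} & \tilde a_{52} & \tilde a_{53} & \tilde a_{54} \\ \tilde c_6 & \tilde a_{61} & \tilde a_{62} & \tilde a_{63} & \tilde a_{64} & \tilde a_{65} \\ \hline & \tilde b_1 & \tilde b_2 & \tilde b_3 & \tilde b_4 & \tilde b_5 & \tilde b_6 \end{array} \tag{12.4}$$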
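
The block structure of the composite tableau can be checked with a few lines of Python. The sketch below builds the composite coefficients from two explicit methods and compares one composite step with two successive steps; the function names `compose` and `rk_step` are hypothetical, the problem is assumed autonomous, and exact rational arithmetic is used so that the comparison is free of roundoff.

```python
from fractions import Fraction as F

def compose(A1, b1, A2, b2):
    """Tableau (c, A, b) of 'method 1 followed by method 2' with equal step size."""
    s2 = len(b2)
    A = [list(row) + [F(0)] * s2 for row in A1]      # upper block rows: (A1 | 0)
    A += [list(b1) + list(row) for row in A2]        # lower block rows: (b1 | A2)
    b = list(b1) + list(b2)
    c = [sum(row) for row in A]                      # c_i = row sums
    return c, A, b

def rk_step(A, b, f, y0, h):
    """One step of the explicit Runge-Kutta method (A, b) for y' = f(y)."""
    k = []
    for row in A:
        # zip stops at the stages computed so far (strictly lower triangle)
        k.append(f(y0 + h * sum(a * kj for a, kj in zip(row, k))))
    return y0 + h * sum(bi * ki for bi, ki in zip(b, k))

# Heun's method followed by the explicit midpoint rule, applied to y' = y^2.
A1, b1 = [[F(0), F(0)], [F(1), F(0)]], [F(1, 2), F(1, 2)]
A2, b2 = [[F(0), F(0)], [F(1, 2), F(0)]], [F(0), F(1)]
c, A, b = compose(A1, b1, A2, b2)
f = lambda y: y * y
y0, h = F(1), F(1, 10)
two_steps = rk_step(A2, b2, f, rk_step(A1, b1, f, y0, h), h)
one_big_step = rk_step(A, b, f, y0, h)
```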


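It is now of interest to study the order conditions of the new method. For this, we
have to compute the expressions (see Table 2.2)

$$\sum \tilde b_i, \qquad 2 \sum \tilde b_i \tilde c_i, \qquad 3 \sum \tilde b_i \tilde c_i^2, \qquad 6 \sum \tilde b_i \tilde a_{ij} \tilde c_j, \qquad \text{etc.}$$

If we insert the values from the left tableau of (12.4), a computation, which for low
orders is still not too difficult, shows that these expressions can be written in terms
of the corresponding expressions for the two methods (12.1). We shall denote these
expressions for the first method by $\mathbf{a}(t)$, for the second method by $\mathbf{b}(t)$, and for
the composite method by $\mathbf{ab}(t)$. Writing $\tau$ for the tree with a single vertex and
$[t_1, \ldots, t_m]$ for the tree whose root carries the subtrees $t_1, \ldots, t_m$ (Definition 2.12),
these are

$$\mathbf{a}(\tau) = \sum b_i, \qquad \mathbf{a}([\tau]) = 2 \sum b_i c_i, \qquad \mathbf{a}([\tau,\tau]) = 3 \sum b_i c_i^2, \quad \ldots \tag{12.5a}$$

$$\mathbf{b}(\tau) = \sum \hat b_i, \qquad \mathbf{b}([\tau]) = 2 \sum \hat b_i \hat c_i, \qquad \mathbf{b}([\tau,\tau]) = 3 \sum \hat b_i \hat c_i^2, \quad \ldots \tag{12.5b}$$

$$\mathbf{ab}(\tau) = \sum \tilde b_i, \qquad \mathbf{ab}([\tau]) = 2 \sum \tilde b_i \tilde c_i, \qquad \mathbf{ab}([\tau,\tau]) = 3 \sum \tilde b_i \tilde c_i^2, \quad \ldots \tag{12.5c}$$

The above mentioned formulas are then

$$\begin{aligned} \mathbf{ab}(\tau) &= \mathbf{a}(\tau) + \mathbf{b}(\tau) \\ \mathbf{ab}([\tau]) &= \mathbf{a}([\tau]) + 2\,\mathbf{b}(\tau)\mathbf{a}(\tau) + \mathbf{b}([\tau]) \\ \mathbf{ab}([\tau,\tau]) &= \mathbf{a}([\tau,\tau]) + 3\,\mathbf{b}(\tau)\mathbf{a}(\tau)^2 + 3\,\mathbf{b}([\tau])\mathbf{a}(\tau) + \mathbf{b}([\tau,\tau]) \\ \mathbf{ab}([[\tau]]) &= \mathbf{a}([[\tau]]) + 3\,\mathbf{b}(\tau)\mathbf{a}([\tau]) + 3\,\mathbf{b}([\tau])\mathbf{a}(\tau) + \mathbf{b}([[\tau]]) \\ &\text{etc.} \end{aligned} \tag{12.6}$$

It is now, of course, of interest to have a general understanding of these
formulas for arbitrary trees. This, however, is not easy in the above framework (“...
a tedious calculation shows that ...”). Further, there are problems of identifying
different methods with identical numerical results (see Exercise 1 below). Also, we
want the theory to include more general processes than Runge-Kutta methods, for
example the exact solution or multi-derivative methods.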
B-Series


All these difficulties can be avoided if we consider directly the composition of the
series appearing in Section II.2. We define by

$$T = \{\emptyset\} \cup T_1 \cup T_2 \cup \ldots, \qquad LT = \{\emptyset\} \cup LT_1 \cup LT_2 \cup \ldots$$

the sets of all trees and labelled trees, respectively.


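Definition 12.1 (Hairer & Wanner 1974). Let $\mathbf{a}(\emptyset), \mathbf{a}(\tau), \mathbf{a}([\tau]), \mathbf{a}([\tau,\tau]), \ldots$ be
a sequence of real coefficients defined for all trees, $\mathbf{a} : T \to \mathbb{R}$. Then we call the
series (see Theorem 2.11, Definitions 2.2, 2.3)

$$\begin{aligned} B(\mathbf{a}, y) &= \mathbf{a}(\emptyset)\, y + h\, \mathbf{a}(\tau) f(y) + \frac{h^2}{2!}\, \mathbf{a}([\tau])\, F([\tau])(y) + \ldots \\ &= \sum_{t \in LT} \frac{h^{\varrho(t)}}{\varrho(t)!}\, \mathbf{a}(t)\, F(t)(y) = \sum_{t \in T} \frac{h^{\varrho(t)}}{\varrho(t)!}\, \alpha(t)\, \mathbf{a}(t)\, F(t)(y) \end{aligned} \tag{12.7}$$

a B-series.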


We have seen in Theorems 2.11 and 2.6 that the numerical solution of a Runge-Kutta
method as well as the exact solution are B-series. The coefficients of the latter
are all equal to 1.

Usually we are only interested in a finite number of terms of these series (only
as high as the orders of the methods under consideration, or as far as f is
differentiable) and all subsequent results are valid modulo error terms $O(h^{k+1})$.


Definition 12.2. Let $t \in LT$ be a labelled tree of order $q = \varrho(t)$ and $0 \le i \le q$
be a fixed integer. Then we denote by $s_i(t) = s$ the subtree formed by the first i
indices and by $d_i(t)$ (the difference set) the set of subtrees formed by the remaining
indices. In the graphical representation we distinguish the subtree s by fat nodes
and doubled lines.


Example 12.3. For the labelled tree $t = [\tau, \tau, [\tau]]$ of order 5 we have:

    i = 0:   $s_0(t) = \emptyset$,                          $d_0(t) = \{t\}$
    i = 1:   $s_1(t) = \tau$,                               $d_1(t) = \{\tau, \tau, [\tau]\}$
    i = 2:   $s_2(t) = [\tau]$,                             $d_2(t) = \{\tau, \tau, \tau\}$
    i = 3:   $s_3(t) = [\tau, \tau]$,                       $d_3(t) = \{\tau, \tau\}$
    i = 4:   $s_4(t)$ = t without its last vertex,          $d_4(t) = \{\tau\}$
    i = 5:   $s_5(t) = t$,                                  $d_5(t) = \emptyset$
Definition 12.4. Let $\mathbf{a} : T \to \mathbb{R}$ and $\mathbf{b} : T \to \mathbb{R}$ be two sequences of coefficients
such that $\mathbf{a}(\emptyset) = 1$. Then for a tree t of order $q = \varrho(t)$ we define the composition

$$\mathbf{ab}(t) = \frac{1}{\alpha(t)} \sum \Bigl( \sum_{i=0}^{q} \binom{q}{i}\, \mathbf{b}(s_i(t)) \prod_{z \in d_i(t)} \mathbf{a}(z) \Bigr) \tag{12.8}$$

where the first summation is over all $\alpha(t)$ different labellings of t (see Definition
2.5).

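Example 12.5. It is easily seen that the formulas of (12.6) are special cases of
(12.8). The tree $t = [\tau, \tau, [\tau]]$ of Example 12.3 possesses 6 different labellings.
These lead to

$$\begin{aligned} \mathbf{ab}(t) = {} & \mathbf{b}(\emptyset)\mathbf{a}(t) + 5\,\mathbf{b}(\tau)\mathbf{a}(\tau)^2\mathbf{a}([\tau]) \\ & + 10\bigl(\tfrac12\,\mathbf{b}([\tau])\mathbf{a}(\tau)\mathbf{a}([\tau]) + \tfrac12\,\mathbf{b}([\tau])\mathbf{a}(\tau)^3\bigr) \\ & + 10\bigl(\tfrac16\,\mathbf{b}([\tau,\tau])\mathbf{a}([\tau]) + \tfrac23\,\mathbf{b}([\tau,\tau])\mathbf{a}(\tau)^2 + \tfrac16\,\mathbf{b}([[\tau]])\mathbf{a}(\tau)^2\bigr) \\ & + 5\bigl(\tfrac12\,\mathbf{b}([\tau,\tau,\tau])\mathbf{a}(\tau) + \tfrac12\,\mathbf{b}([\tau,[\tau]])\mathbf{a}(\tau)\bigr) + \mathbf{b}(t). \end{aligned}$$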
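
The composition rule of Definition 12.4 can be evaluated by brute force for small trees. The Python sketch below is only an illustration under our own conventions: a tree is a nested tuple of its subtrees (so `()` is $\tau$ and `((),)` is $[\tau]$), the empty tree is represented by the hypothetical key `EMPTY`, and the average over the different labellings is taken by enumerating all monotonic orderings of the vertices.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

EMPTY = "empty"      # dictionary key standing for the empty tree

def parents(t):
    """Parent index of every vertex of t; vertex 0 is the root."""
    par = [-1]
    def walk(sub, p):
        for child in sub:
            par.append(p)
            walk(child, len(par) - 1)
    walk(t, 0)
    return par

def spanned(par, nodes, root):
    """Canonical (sorted) tree formed by the vertices of `nodes` below `root`."""
    return tuple(sorted(spanned(par, nodes, v) for v in nodes if par[v] == root))

def compose(a, b, t):
    """ab(t): average over all monotonic labellings of the inner sum of (12.8)."""
    par = parents(t)
    q = len(par)
    total, count = Fraction(0), 0
    for perm in permutations(range(1, q)):
        order = (0,) + perm
        pos = {v: i for i, v in enumerate(order)}
        if any(pos[par[v]] > pos[v] for v in range(1, q)):
            continue                       # a vertex would precede its parent
        count += 1
        for i in range(q + 1):
            first, rest = set(order[:i]), set(order[i:])
            term = comb(q, i) * b[spanned(par, first, 0) if i else EMPTY]
            for v in rest:                 # roots of the trees in d_i(t)
                if par[v] == -1 or par[v] in first:
                    term *= a[spanned(par, rest, v)]
            total += term
    return total / count

tau, t21, t31, t32 = (), ((),), ((), ()), (((),),)
a = {tau: Fraction(2), t21: Fraction(3), t31: Fraction(5), t32: Fraction(7)}
b = {EMPTY: Fraction(1), tau: Fraction(11), t21: Fraction(13),
     t31: Fraction(17), t32: Fraction(19)}
```

With arbitrary test values for the coefficients, the results can be compared with the low order formulas (12.6); the enumeration grows factorially and is meant for small trees only.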


Here is the main theorem of this section: 

Theorem 12.6 (Hairer & Wanner 1974). As above, let $\mathbf{a} : T \to \mathbb{R}$ and $\mathbf{b} : T \to \mathbb{R}$
be two sequences of coefficients such that $\mathbf{a}(\emptyset) = 1$. Then the composition of the
two corresponding B-series is again a B-series

$$B(\mathbf{b}, B(\mathbf{a}, y)) = B(\mathbf{ab}, y) \tag{12.10}$$

where the “product” $\mathbf{ab} : T \to \mathbb{R}$ is that of Definition 12.4.

Proof. We denote the inner series by

$$B(\mathbf{a}, y) = g(h). \tag{12.11}$$

Then the proof is similar to the development of Section II.2 (see Fig. 2.2), with the
difference that, instead of $f(g)$, we now start from

$$B(\mathbf{b}, g) = \sum_{s \in LT} \frac{h^{\varrho(s)}}{\varrho(s)!}\, \mathbf{b}(s)\, F(s)(g) \tag{12.12}$$

and have to compute the derivatives of this function: let us select the term $s = [\tau, \tau]$
of this series,

$$\frac{h^3}{3!}\, \mathbf{b}(s) \sum_{L,M} f^K_{LM}(g)\, f^L(g)\, f^M(g). \tag{12.13}$$

The q-th derivative of this expression, for $h = 0$, is by Leibniz’ formula

$$\binom{q}{3}\, \mathbf{b}(s) \sum_{L,M} \bigl(f^K_{LM}(g)\, f^L(g)\, f^M(g)\bigr)^{(q-3)} \Big|_{h=0}. \tag{12.14}$$

We now compute, as we did in Lemma 2.8, the derivatives of

$$f^K_{LM}(g)\, f^L(g)\, f^M(g) \tag{12.15}$$

using the classical rules of differential calculus; this gives for the first derivative

$$\sum_N f^K_{LMN} \cdot (g^N)' f^L f^M + \sum_N f^K_{LM} f^L_N \cdot (g^N)' f^M + \sum_N f^K_{LM} f^L f^M_N \cdot (g^N)'$$

and so on. We again represent this in graphical form in Fig. 12.1.

Fig. 12.1. Derivatives of (12.15)


We see that we arrive at trees u of order q such that $s_3(u) = s$ (where $3 = \varrho(s)$)
and the elements of $d_3(u)$ have no ramifications. The corresponding expressions
are similar to (2.6;q-1) in Lemma 2.8. We finally have to insert the derivatives of
g (see (12.11)) and rearrange the terms. Then, as in Fig. 2.4, the tall branches of
$d_3(u)$ are replaced by trees z of order $\delta$, multiplied by $\mathbf{a}(z)$. Thus the coefficient
which we obtain for a given tree t is just given by (12.8).

The factor $1/\alpha(t)$ is due to the fact that in $B(\mathbf{ab}, y)$ the term with $\mathbf{ab}(t)F(t)$
appears $\alpha(t)$ times. □
Since $hf(y) = B(\mathbf{b}, y)$ is a special B-series with $\mathbf{b}(\tau) = 1$ and all other
$\mathbf{b}(t) = 0$, we have the following

Corollary 12.7. If $\mathbf{a} : T \to \mathbb{R}$ with $\mathbf{a}(\emptyset) = 1$, then

$$h f(B(\mathbf{a}, y)) = B(\mathbf{a}', y)$$

with

$$\mathbf{a}'(\emptyset) = 0, \qquad \mathbf{a}'(\tau) = 1, \qquad \mathbf{a}'([t_1, \ldots, t_m]) = \varrho(t)\, \mathbf{a}(t_1) \cdot \ldots \cdot \mathbf{a}(t_m) \tag{12.16}$$

where $t = [t_1, \ldots, t_m]$ means that $d_1(t) = \{t_1, t_2, \ldots, t_m\}$ (Definition 2.12).

Proof. We obtain (12.16) from (12.8) with $i = 1$, $q = \varrho(t)$ and the fact that the
expression in brackets is independent of the labelling of t. □


Order Conditions for Runge-Kutta Methods 

As an application of Corollary 12.7, we demonstrate the derivation of order
conditions for Runge-Kutta methods: we write method (2.3) as

$$g_i = y_0 + \sum_{j=1}^{s} a_{ij} k_j, \qquad k_i = h f(g_i), \qquad y_1 = y_0 + \sum_{j=1}^{s} b_j k_j. \tag{12.17}$$

If we assume $g_i$, $k_i$ and $y_1$ to be B-series, whose coefficients we denote by
$\mathbf{g}_i$, $\mathbf{k}_i$, $\mathbf{y}_1$,

$$g_i = B(\mathbf{g}_i, y_0), \qquad k_i = B(\mathbf{k}_i, y_0), \qquad y_1 = B(\mathbf{y}_1, y_0),$$

then Corollary 12.7 immediately allows us to transcribe formulas (12.17) as

$$\mathbf{g}_i(\emptyset) = 1, \qquad \mathbf{k}_i(\tau) = 1, \qquad \mathbf{y}_1(\emptyset) = 1,$$

$$\mathbf{g}_i(t) = \sum_{j=1}^{s} a_{ij}\, \mathbf{k}_j(t), \qquad \mathbf{k}_i(t) = \varrho(t)\, \mathbf{g}_i(t_1) \cdot \ldots \cdot \mathbf{g}_i(t_m), \qquad \mathbf{y}_1(t) = \sum_{j=1}^{s} b_j\, \mathbf{k}_j(t),$$

which leads easily to formulas (2.17), (2.19) and Theorem 2.11.

Also, if we put $y(h) = B(\mathbf{y}, y_0)$ for the true solution, and compare the
derivative $hy'(h)$ of the series (12.7) with $hf(y(h))$ from Corollary 12.7, we
immediately obtain $\mathbf{y}(t) = 1$ for all t, so that Theorem 2.6 drops out. The order conditions
are then obtained as in Theorem 2.13 by comparing the coefficients of the B-series
$B(\mathbf{y}, y_0)$ and $B(\mathbf{y}_1, y_0)$.
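
The recursion for the coefficients translates directly into a short program. The following Python sketch computes the B-series coefficient of the numerical solution for a given tree and checks the order conditions of the classical 4-stage method; the representation of a tree as a nested tuple of its subtrees (with `()` for $\tau$) and the function names are our own assumptions.

```python
from fractions import Fraction as F

def order(t):
    """Number of vertices of the tree t."""
    return 1 + sum(order(child) for child in t)

def k_coeff(A, i, t):
    """k_i(t) = order(t) * prod_l g_i(t_l), where g_i(t) = sum_j a_ij k_j(t)."""
    prod = F(1)
    for child in t:
        prod *= sum(A[i][j] * k_coeff(A, j, child) for j in range(len(A)))
    return order(t) * prod

def y1_coeff(A, b, t):
    """y_1(t) = sum_j b_j k_j(t); the method has order p if this is 1 for order(t) <= p."""
    return sum(b[j] * k_coeff(A, j, t) for j in range(len(b)))

# Classical Runge-Kutta method of order 4.
half = F(1, 2)
A = [[0, 0, 0, 0], [half, 0, 0, 0], [0, half, 0, 0], [0, 0, 1, 0]]
b = [F(1, 6), F(1, 3), F(1, 3), F(1, 6)]

# All trees of order <= 4, and the tall tree of order 5.
trees4 = [(), ((),), ((), ()), (((),),),
          ((), (), ()), ((), ((),)), (((), ()),), ((((),),),)]
tall5 = (((((),),),),)
```

For the tall tree of order 5 the coefficient vanishes, because a chain of five stages does not exist in a 4-stage explicit method.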
Butcher’s “Effective Order”


We search for a 5-stage Runge-Kutta method $\mathbf{a}$ and for a method $\mathbf{d}$, such that
$\mathbf{d}\mathbf{a}\mathbf{d}^{-1}$ represents a fifth order method u. This means that we have to satisfy

$$\mathbf{da}(t) = \mathbf{yd}(t) \qquad \text{for } \varrho(t) \le 5 \tag{12.18}$$

where $\mathbf{y}(t) = 1$ represents the B-series of the exact solution. Then

$$(\mathbf{d}\mathbf{a}\mathbf{d}^{-1})^k = \mathbf{d}\mathbf{a}^k\mathbf{d}^{-1} = (\mathbf{d}\mathbf{a})\, \mathbf{a}^{k-2}\, (\mathbf{a}\mathbf{d}^{-1}). \tag{12.19}$$

If now two Runge-Kutta methods $\mathbf{b}$ and $\mathbf{c}$ are constructed such that $\mathbf{b} = \mathbf{d}\mathbf{a}$ and
$\mathbf{c} = \mathbf{a}\mathbf{d}^{-1}$ up to order 5, then applying one step of $\mathbf{b}$ followed by $k - 2$ steps of $\mathbf{a}$
and a final step of $\mathbf{c}$ is equivalent (up to order 5) to k steps of the 5th order method
$\mathbf{d}\mathbf{a}\mathbf{d}^{-1}$ (see Fig. 12.2). A possible set of coefficients, computed by Butcher (1969),
is given in Table 12.1 (method $\mathbf{a}$ has classical order 4).

Fig. 12.2. Effective increase of order


Stetter’s approach. Soon after the appearance of Butcher’s purely algebraic proof,
Stetter (1971) gave an elegant analytic explanation. Consider the principal global
error term $e_p(x)$ which satisfies the variational equation (8.8). The question is,
under which conditions on the local error $d_{p+1}(x)$ (see (8.8)) this equation can be
solved, for special initial values, without effort. We write equation (8.8) as

$$e'(x) - \frac{\partial f}{\partial y}(y(x)) \cdot e(x) = d(x) \tag{12.20}$$

and want $e(x)$ to possess an expansion of the form

$$e(x) = \sum_{t \in T_p} \alpha(t)\, \mathbf{e}(t)\, F(t)(y(x)) \tag{12.21}$$

with constant coefficients $\mathbf{e}(t)$. Simply inserting (12.21) into (12.20) yields

$$d(x) = \sum_{t \in T_p} \alpha(t)\, \mathbf{e}(t) \Bigl( \frac{d}{dx}\bigl(F(t)(y(x))\bigr) - f'(y(x)) \cdot F(t)(y(x)) \Bigr). \tag{12.22}$$

Thus, (12.21) is the exact solution of the variational equation, if the local error
$d(x)$ has the symmetric form (12.22). Then, if we replace the initial value $y_0$ by
the “starting procedure”

$$\hat y_0 := y_0 - h^p e(x_0) = y_0 - h^p \sum_{t \in T_p} \alpha(t)\, \mathbf{e}(t)\, F(t)(y_0) \tag{12.23}$$
Table 12.1. Butcher’s method of effective order 5

Method a

    0   |
    1/5 |  1/5
    2/5 |  0        2/5
    1/2 |  3/16     0        5/16
    1   |  1/4      0       -5/4      2
   -----+------------------------------------------
        |  1/6      0        0        2/3     1/6

Method b

    0   |
    1/5 |  1/5
    2/5 |  0        2/5
    3/4 |  75/64   -9/4      117/64
    1   | -37/36    7/3     -3/4      4/9
   -----+------------------------------------------
        |  19/144   0        25/48    2/9     1/8

Method c

    0   |
    1/5 |  1/5
    2/5 |  0        2/5
    3/4 |  161/192 -19/12    287/192
    1   | -27/28    19/7    -291/196  36/49
   -----+------------------------------------------
        |  7/48     0        475/1008 2/7     7/72

(or by a Runge-Kutta method equivalent to this up to order $p + 1$; this would
represent “method d” in Fig. 12.2), its error satisfies $y(x_0) - \hat y_0 = h^p e(x_0) + O(h^{p+1})$.
By Theorem 8.1 the numerical solution $\hat y_n$ of the Runge-Kutta method applied to
$\hat y_0$ satisfies $y(x_n) - \hat y_n = h^p e(x_n) + O(h^{p+1})$. Therefore the “finishing procedure”

$$\tilde y_n := \hat y_n + h^p e(x_n) = \hat y_n + h^p \sum_{t \in T_p} \alpha(t)\, \mathbf{e}(t)\, F(t)(\hat y_n) + O(h^{p+1}) \tag{12.24}$$

(or some equivalent Runge-Kutta method) gives a $(p+1)$th order approximation
to the solution.

Example. Butcher’s method a of Table 12.1 has the local error 

dei*) = ~ 1^) - ^(O) + + ,^/'(?)). (12.25) 

The right-hand side of (12.22) would be (the derivation $\frac{d}{dx}F$ attaches a new twig
to each of the nodes, the product $f'(y) \cdot F$ lifts the tree on a stilt)
e(-v) (f(T) + 3 Fty) - F(T)) 

+ 3 e(\ > ) (F(v>) + F(V) + F(-y) + F( 4 ) - F(<t)) 

; . v \ (12 - 26) 

+e(Y) (F(-y) + F( T) + 2F(t) - F(^)) 

+e(^)(F(^) + F(t) + F(^) + F(^)-F(^)). 

Comparison of (12.25) and (12.26) shows that this method does indeed have the 
desired symmetry if 

e(vj/) = e(\>) = — -7 • —, e(Y) = e(^) =-^7 • 

v J vJ 6! 24’ vw 6! 8 

This allows one to construct a Runge-Kutta method as starting procedure corre¬ 
sponding to (12.23) up to the desired order. 

Exercises 

1. Show that the pairs of methods given in Tables 12.2 - 12.4 produce, at least for 
h sufficiently small, identical numerical results. 

Result, a) is seen by permutation of the stages, b) by neglecting superfluous 
stages (Dahlquist & Jeltsch 1979), c) by identifying equal stages (Stetter 1973, 
Hundsdorfer & Spijker 1981). See also the survey on “The Runge-Kutta space” 
by Butcher (1984). 

2. Extend formulas (12.6) by computing the composition ab(t) for all trees of 
order 4 and 5. 

3. Verify that the methods given in Table 12.1 satisfy the stated order properties. 

4. Prove, using Theorem 12.6, that the set

$$G = \{\mathbf{a} : T \to \mathbb{R} \mid \mathbf{a}(\emptyset) = 1\}$$

together with the composition law of Definition 12.4 is a (non-commutative)
group.

5. (Equivalence of Butcher’s and Stetter’s approach). Let $\mathbf{a} : T \to \mathbb{R}$ represent a
Runge-Kutta method of classical order p and effective order $p + 1$, i.e., $\mathbf{a}(t) = 1$
for $\varrho(t) \le p$ and

$$\mathbf{da}(t) = \mathbf{yd}(t) \qquad \text{for } \varrho(t) \le p + 1 \tag{12.27}$$

for some $\mathbf{d} : T \to \mathbb{R}$ and with $\mathbf{y}(t)$ as in (12.18). Prove that then the local error
$h^{p+1} d(x) + O(h^{p+2})$ of the method $\mathbf{a}$ has the symmetric form (12.22). This
Table 12.2. Equivalent methods a)

    0 |                      1 | 0    1
    1 | 1    0               0 | 0    0
   ---+-----------          ---+-----------
      | 1/4  3/4               | 3/4  1/4


Table 12.3. Equivalent methods b) 


1 

2 

0 

0 

-1 




3 

0 

1 

2 

0 




7 

0 

3 

4 

0 

1 

2 

-1 

2 

1 

0 

0 

1 

2 

1 

1 


1/2 

0 

0 

1/2 


1/2 

1/2 


Table 12.4. Equivalent methods c) 


1 

1 

1 

1 

-2 




1 

2 

2 

-1 

-2 




1 

-1 

-1 

5 

-2 

1 

3 

-2 

-1 

-1 

2 

1 

-3 

-1 

2 

-3 


1/4 

1/4 

1/4 

1/4 


3/4 

1/4 


means that, in this situation, Butcher’s effective order is equivalent to Stetter’s
approach.

Hint. Start by expanding condition (12.27) (using (12.8)) for the first trees.
Possible simplifications are then best seen if the second sum $\sum_{i=0}^{q}$ (for $\mathbf{yd}$) is
arranged downwards ($i = q, q-1, \ldots, 0$). One then arrives recursively at the
result

$$\mathbf{d}(t) = \mathbf{d}(\tau)^{\varrho(t)} \qquad \text{for } \varrho(t) \le p - 1.$$

Then express the error coefficients $\mathbf{a}(t) - 1$ for $\varrho(t) = p + 1$ in terms of
$\mathbf{d}(s) - \mathbf{d}(\tau)^{\varrho(s)}$ where $\varrho(s) = p$. Formula (12.22) then becomes visible.


6. Prove that for $t = [t_1, \ldots, t_m]$ the coefficient $\alpha(t)$ of Definition 2.5 satisfies
the recurrence relation

$$ \alpha(t) = \binom{\varrho(t) - 1}{\varrho(t_1), \ldots, \varrho(t_m)} \, \alpha(t_1) \cdots \alpha(t_m) \, \frac{1}{\mu_1! \, \mu_2! \cdots}. \qquad (12.28) $$

The integers $\mu_1, \mu_2, \ldots$ count the equal trees among $t_1, \ldots, t_m$.

Hint. The multinomial coefficient in (12.28) counts the possible partitionings
of the labels $2, \ldots, \varrho(t)$ to the m subtrees $t_1, \ldots, t_m$. Equal subtrees lead to
equal labellings. Hence the division by $\mu_1! \, \mu_2! \cdots$.
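
The recurrence (12.28) of Exercise 6 can be tried out numerically. The following is a minimal sketch, not taken from the text: it assumes that a tree is stored as a tuple of its subtrees (the one-vertex tree is the empty tuple), and the helper names `canon`, `order` and `alpha` are chosen here for illustration only.

```python
from collections import Counter
from math import factorial


def canon(t):
    """Canonical form of a tree: children sorted, so that equal trees compare equal."""
    return tuple(sorted(canon(s) for s in t))


def order(t):
    """Number of vertices of the tree t."""
    return 1 + sum(order(s) for s in t)


def alpha(t):
    """Number of monotonic labellings of t, computed by recurrence (12.28)."""
    t = canon(t)
    # multinomial coefficient (rho(t)-1)! / (rho(t_1)! ... rho(t_m)!)
    denominator = 1
    for s in t:
        denominator *= factorial(order(s))
    value = factorial(order(t) - 1) // denominator
    # product of alpha over the subtrees
    for s in t:
        value *= alpha(s)
    # division by mu_1! mu_2! ... for groups of equal subtrees
    for mu in Counter(t).values():
        value //= factorial(mu)
    return value


tau = ()
bushy4 = (tau, tau, tau)      # [tau, tau, tau]
mixed4 = (tau, (tau,))        # [tau, [tau]]
fork4 = ((tau, tau),)         # [[tau, tau]]
tall4 = (((tau,),),)          # [[[tau]]]
order4_alphas = [alpha(t) for t in (bushy4, mixed4, fork4, tall4)]
```

For the four trees of order 4 the values should add up to 3! = 6, the number of monotonically labelled trees of that order.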






II.13 Higher Derivative Methods


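In Section I.8 we studied the computation of higher derivatives of solutions of

$$ (y^J)' = f^J(x, y^1, \ldots, y^n), \qquad J = 1, \ldots, n. \qquad (13.1) $$

The chain rule

$$ (y^J)'' = \frac{\partial f^J}{\partial x}(x, y) + \frac{\partial f^J}{\partial y^1}(x, y) \cdot f^1(x, y) + \ldots + \frac{\partial f^J}{\partial y^n}(x, y) \cdot f^n(x, y) \qquad (13.2) $$

leads to the differential operator D which, when applied to a function $\Psi(x, y)$, is
given by

$$ (D\Psi)(x, y) = \frac{\partial \Psi}{\partial x}(x, y) + \frac{\partial \Psi}{\partial y^1}(x, y) \cdot f^1(x, y) + \ldots + \frac{\partial \Psi}{\partial y^n}(x, y) \cdot f^n(x, y). \qquad (13.2') $$

Since $Dy^J = f^J$, we see by extending (13.2) that

$$ (y^J)^{(\ell)} = (D^\ell y^J)(x, y), \qquad \ell = 0, 1, 2, \ldots. \qquad (13.3) $$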

This notation allows us to define a new class of methods which combine features 
of Runge-Kutta methods as well as Taylor series methods: 


Definition 13.1. Let $a_{ij}^{(r)}$, $b_j^{(r)}$ ($i, j = 1, \ldots, s$, $r = 1, \ldots, q$) be real coefficients.
Then the method

$$ k_i^{(\ell)} = \frac{h^\ell}{\ell!} (D^\ell y)\Bigl(x_0 + c_i h,\; y_0 + \sum_{r=1}^{q} \sum_{j=1}^{s} a_{ij}^{(r)} k_j^{(r)}\Bigr) $$

$$ y_1 = y_0 + \sum_{r=1}^{q} \sum_{j=1}^{s} b_j^{(r)} k_j^{(r)} \qquad (13.4) $$

is called an s-stage q-derivative Runge-Kutta method. If $a_{ij}^{(r)} = 0$ for $i \le j$, the
method is explicit, otherwise implicit.


A natural extension of (1.9) is here, because of $Dx = 1$, $D^\ell x = 0$ ($\ell \ge 2$),

$$ c_i = \sum_{j=1}^{s} a_{ij}^{(1)}. \qquad (13.5) $$

Definition 13.1 is from Kastlunger & Wanner (1972), but special methods of this
type have been considered earlier in the literature. In particular, the very successful
methods of Fehlberg (1958, 1964) have this structure.
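
As a small illustration of Definition 13.1, the simplest explicit case is s = 1, q = 2 with $c_1 = 0$, $a_{11}^{(r)} = 0$ and $b_1^{(1)} = b_1^{(2)} = 1$, which is the Taylor series method of order 2. The sketch below is for a scalar problem $y' = f(x, y)$; it assumes that the user supplies the partial derivatives `fx` and `fy` by hand, and the function name `taylor2_step` is hypothetical.

```python
def taylor2_step(f, fx, fy, x0, y0, h):
    """One step of the explicit 1-stage, 2-derivative method (Taylor, order 2).

    f, fx, fy are callables (x, y) -> float giving f and its partial
    derivatives; (D^2 y)(x, y) = f_x + f_y * f for a scalar equation.
    """
    k1 = h * f(x0, y0)                                       # k^(1) = h (Dy)(x0, y0)
    k2 = h**2 / 2 * (fx(x0, y0) + fy(x0, y0) * f(x0, y0))    # k^(2) = h^2/2! (D^2 y)(x0, y0)
    return y0 + k1 + k2


# Example: y' = y, y(0) = 1, one step of size h = 0.1.
y1 = taylor2_step(lambda x, y: y, lambda x, y: 0.0, lambda x, y: 1.0, 0.0, 1.0, 0.1)
```

For this linear example the step reproduces the first three terms of the exponential series, $1 + h + h^2/2$.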


Collocation Methods 


A natural way of obtaining s-stage q-derivative methods is to use the collocation
idea with multiple nodes, i.e., to replace (7.15b) by

$$ u^{(\ell)}(x_0 + c_i h) = (D^\ell y)\bigl(x_0 + c_i h,\, u(x_0 + c_i h)\bigr), \qquad \ell = 1, \ldots, q_i \qquad (13.6) $$

where $u(x)$ is a polynomial of degree $q_1 + q_2 + \cdots + q_s$ and $q_1, \ldots, q_s$, the "multiplicities"
of the nodes $c_1, \ldots, c_s$, are given integers. For example $q_1 = m$, $q_2 = \ldots = q_s = 1$
leads to Fehlberg-type methods.

In order to generalize the results and ideas of Section II.7, we have to replace
the Lagrange interpolation of Theorem 7.7 by Hermite interpolation (Hermite
1878: "Je me suis proposé de trouver un polynôme ..."). The reason is that (13.6)
can be interpreted as an ordinary collocation condition with clusters of $q_i$ nodes
"infinitely" close together (Rolle's theorem). We write Hermite's formula as

$$ p(t) = \sum_{j=1}^{s} \sum_{r=1}^{q_j} \frac{1}{r!}\, \ell_{jr}(t)\, p^{(r-1)}(c_j) \qquad (13.7) $$

for polynomials $p(t)$ of degree $\sum_j q_j - 1$. Here the "basis" polynomials $\ell_{jr}(t)$ of
degree $\sum_j q_j - 1$ must satisfy

$$ \ell_{jr}^{(k)}(c_i) = \begin{cases} r! & \text{if } i = j \text{ and } k = r - 1 \\ 0 & \text{else} \end{cases} \qquad (13.8) $$

and are best obtained from Newton's interpolation formula (with multiple nodes).
We now use this formula, as we did in Section II.7, for $p(t) = h u'(x_0 + th)$:

$$ h u'(x_0 + th) = \sum_{j=1}^{s} \sum_{r=1}^{q_j} \ell_{jr}(t)\, k_j^{(r)} \qquad (13.9) $$

with

$$ k_j^{(r)} = \frac{h^r}{r!}\, u^{(r)}(x_0 + c_j h). $$

If we insert

$$ u(x_0 + c_i h) = y_0 + \int_0^{c_i} h u'(x_0 + th)\, dt \qquad (13.10) $$

together with (13.9) into (13.6), we get:

Theorem 13.2. The collocation method (13.6) is equivalent to an s-stage q-derivative
implicit Runge-Kutta method (13.4) with

$$ a_{ij}^{(r)} = \int_0^{c_i} \ell_{jr}(t)\, dt, \qquad b_j^{(r)} = \int_0^1 \ell_{jr}(t)\, dt. \qquad (13.11) $$

□


Theorems 7.8, 7.9, and 7.10 now generalize immediately to the case of "confluent"
quadrature formulas; i.e., the q-derivative Runge-Kutta method possesses
the same order as the underlying quadrature formula

$$ \int_0^1 p(t)\, dt \approx \sum_{j=1}^{s} \sum_{r=1}^{q_j} \frac{b_j^{(r)}}{r!}\, p^{(r-1)}(c_j). $$

The "algebraic" proof of this result (extending Exercise 7 of Section II.7) is more
complicated and is given, for the case $q_j = q$, in Kastlunger & Wanner (1972b).
The formulas corresponding to condition $C(\eta)$ are given by

$$ \sum_{j=1}^{s} \sum_{r=1}^{q_j} a_{ij}^{(r)} \binom{\varrho}{r} c_j^{\varrho - r} = c_i^{\varrho}, \qquad \varrho = 1, 2, \ldots, \sum_{j=1}^{s} q_j. \qquad (13.12) $$

These equations uniquely determine the $a_{ij}^{(r)}$, once the $c_i$ have been chosen, by
a linear system with a "confluent" Vandermonde matrix (see e.g., Gautschi 1962).
Formula (13.12) is obtained by setting $p(t) = t^{\varrho - 1}$ in (13.7) and then integrating
from 0 to $c_i$.

Examples of methods. "Gaussian" quadrature formulas with multiple nodes exist
for odd q (Stroud & Stancu 1965) and extend to q-derivative implicit Runge-Kutta
methods (Kastlunger & Wanner 1972b): for s = 1 we have, of course, $c_1 = 1/2$
which yields

$$ b_1^{(2k)} = 0, \qquad b_1^{(2k+1)} = 2^{-2k}, \qquad a_{11}^{(k)} = (-1)^{k+1}\, 2^{-k}. $$

We give also the coefficients for the case s = 2 and $q_1 = q_2 = 3$. The nodes $c_i$ and
the weights $b_i^{(k)}$ are those of Stroud & Stancu. The method has order 8:

    c_1 = 0.185394435825045            c_2 = 1 − c_1
    b_1^{(1)} = 0.5                    b_2^{(1)} = b_1^{(1)}
    b_1^{(2)}/2! = 0.0240729420844974  b_2^{(2)} = −b_1^{(2)}
    b_1^{(3)}/3! = 0.00366264960671727 b_2^{(3)} = b_1^{(3)}

    (a_ij^{(1)}) = (  0.201854115831005    −0.0164596800059598 )
                   (  0.516459680005959     0.298145884168994  )

    (a_ij^{(2)}) = ( −0.0223466569080541    0.00868878773082417 )
                   (  0.0568346718998190   −0.0704925410770490  )

    (a_ij^{(3)}) = (  0.0116739668400997   −0.00215351251065784 )
                   (  0.0241294101509615    0.0103019308002039  )


Hermite-Obreschkoff Methods 

We now consider the special case of collocation methods with s = 2, $c_1 = 0$, $c_2 = 1$.
These methods can be obtained in closed form by repeated partial integration as
follows (Darboux 1876, Hermite 1878):

Lemma 13.3. Let m be a given positive integer and $P(t)$ a polynomial of exact
degree m. Then

$$ \sum_{j=0}^{m} h^j (D^j y)(x_1, y_1)\, P^{(m-j)}(0) = \sum_{j=0}^{m} h^j (D^j y)(x_0, y_0)\, P^{(m-j)}(1) \qquad (13.13) $$

defines a multiderivative method (13.4) of order m.

Proof. We let $y(x)$ be the exact solution and start from

$$ h^{m+1} \int_0^1 y^{(m+1)}(x_0 + ht)\, P(1 - t)\, dt = O(h^{m+1}). $$

This integral is now transformed by repeated partial integration until all derivatives
of the polynomial $P(1 - t)$ are used up. This leads to

$$ \sum_{j=0}^{m} h^j y^{(j)}(x_1)\, P^{(m-j)}(0) = \sum_{j=0}^{m} h^j y^{(j)}(x_0)\, P^{(m-j)}(1) + O(h^{m+1}). $$

If this is subtracted from (13.13) we find the difference of the left-hand sides to be
$O(h^{m+1})$, which shows by the implicit function theorem that (13.13) determines
$y_1$ to this order if $P^{(m)}$, which is a constant, is $\ne 0$. □


The argument $1 - t$ in P (instead of the more natural t) avoids the sign
changes in the partial integrations.

A good choice for $P(t)$ is, of course, a polynomial for which most derivatives
disappear at t = 0 and t = 1. Then the method (13.13) is, by keeping the same
order m, most economical. We write

$$ P(t) = \frac{t^k (t - 1)^\ell}{(k + \ell)!} $$

and obtain

$$ y_1 - \frac{\ell}{(k+\ell)} \frac{h}{1!} (Dy)(x_1, y_1) + \frac{\ell(\ell-1)}{(k+\ell)(k+\ell-1)} \frac{h^2}{2!} (D^2 y)(x_1, y_1) \mp \ldots $$
$$ \qquad = y_0 + \frac{k}{(k+\ell)} \frac{h}{1!} (Dy)(x_0, y_0) + \frac{k(k-1)}{(k+\ell)(k+\ell-1)} \frac{h^2}{2!} (D^2 y)(x_0, y_0) + \ldots \qquad (13.14) $$

which is a method of order $m = k + \ell$. After the $\ell$th term in the first line and the
kth term in the second line, the coefficients automatically become zero. Special
cases of this method are:

    k = 1,  ℓ = 0 :  explicit Euler
    k > 1,  ℓ = 0 :  Taylor series
    k = 0,  ℓ = 1 :  implicit Euler
    k = 1,  ℓ = 1 :  trapezoidal rule.

Darboux and Hermite advocated the use of this formula for the approximations
of functions, Obreschkoff (1940) for the computation of integrals, Loscalzo &
Schoenberg (1967), Loscalzo (1969) as well as Nørsett (1974a) for the solution
of differential equations.
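
For the linear test equation $y' = \lambda y$ one has $D^j y = \lambda^j y$, so (13.14) reduces to $y_1 = R(h\lambda)\, y_0$ with a rational function R whose numerator and denominator coefficients can be read off from (13.14). The following sketch (an illustration added here, with hypothetical function names) computes these coefficients exactly and evaluates R; the resulting function should agree with $e^z$ up to $O(z^{k+\ell+1})$.

```python
from fractions import Fraction
from math import exp


def obreschkoff_coefficients(k, l):
    """Coefficients of (13.14) for k + l >= 1.

    left[j]  multiplies h^j (D^j y)(x1, y1), j = 0..l (alternating signs),
    right[j] multiplies h^j (D^j y)(x0, y0), j = 0..k.
    """
    m = k + l

    def coeff(n, j):
        # n(n-1)...(n-j+1) / (m(m-1)...(m-j+1)) / j!
        c = Fraction(1)
        for i in range(j):
            c *= Fraction(n - i, m - i)
        for i in range(1, j + 1):
            c /= i
        return c

    left = [(-1) ** j * coeff(l, j) for j in range(l + 1)]
    right = [coeff(k, j) for j in range(k + 1)]
    return left, right


def stability_function(k, l, z):
    """R(z) such that y1 = R(h*lambda) * y0 for y' = lambda * y."""
    left, right = obreschkoff_coefficients(k, l)
    numerator = sum(float(c) * z ** j for j, c in enumerate(right))
    denominator = sum(float(c) * z ** j for j, c in enumerate(left))
    return numerator / denominator


left22, right22 = obreschkoff_coefficients(2, 2)
error22 = abs(stability_function(2, 2, 0.1) - exp(0.1))
```

With k = ℓ = 1 the coefficients are those of the trapezoidal rule, in agreement with the list of special cases.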


Fehlberg Methods 

Another class of multiderivative methods is due to Fehlberg (1958, 1964): the idea
is to subtract from the solution of $y' = f(x, y)$, $y(x_0) = y_0$, m terms of the Taylor
series (see Section I.8)

$$ \hat y(x) := y(x) - \sum_{i=0}^{m} Y_i (x - x_0)^i, \qquad (13.15) $$

and to solve the resulting differential equation $\hat y'(x) = \hat f(x, \hat y(x))$, where

$$ \hat f(x, \hat y(x)) = f\Bigl(x,\, \hat y + \sum_{i=0}^{m} Y_i (x - x_0)^i\Bigr) - \sum_{i=1}^{m} Y_i\, i\, (x - x_0)^{i-1}, \qquad (13.16) $$

by a Runge-Kutta method. Thus, knowing that the solution of (13.16) and its first
m derivatives are zero at the initial value, we can achieve much higher orders.

In order to understand this, we develop the Taylor series of the solution for the
non-autonomous case, as we did at the beginning of Section II.1. We thereby omit
the hats and suppose the transformation (13.15) already carried out. We then have
from (1.6) (see also Exercise 3 of Section II.2)

$$ f = 0, $$
$$ f_x + f_y f = 0, $$
$$ f_{xx} + 2 f_{xy} f + f_{yy} f^2 + f_y (f_x + f_y f) = 0, \quad \text{etc.} $$

These formulas recursively imply that $f = 0$, $f_x = 0$, ..., $\partial^{m-1} f / \partial x^{m-1} = 0$.
All elementary differentials of order $\le m$ and most of those of higher orders then
become zero and the corresponding order conditions can be omitted. The first non-zero
terms are

$$ \frac{\partial^m f}{\partial x^m} \quad \text{for order } m + 1, $$

$$ \frac{\partial^{m+1} f}{\partial x^{m+1}} \quad \text{and} \quad \frac{\partial f}{\partial y} \frac{\partial^m f}{\partial x^m} \quad \text{for order } m + 2, $$

and so on. The corresponding order conditions are then

$$ \sum_{i=1}^{s} b_i c_i^{m} = \frac{1}{m + 1} $$

for order m + 1,

$$ \sum_{i=1}^{s} b_i c_i^{m+1} = \frac{1}{m + 2} \quad \text{and} \quad \sum_{i,j=1}^{s} b_i a_{ij} c_j^{m} = \frac{1}{(m + 1)(m + 2)} $$

for order m + 2, and so on.

The condition $\sum_j a_{ij} = c_i$, which usually allows several terms of (1.6) to be
grouped together, is not necessary, because all these other terms are zero.

A complete insight is obtained by considering the method as a partitioned method
applied to the partitioned system $y' = f(x, y)$, $x' = 1$. This will be explained in
Section II.15 (see Fig. 15.3).


Example 13.4. A solution with s = 3 stages of the (seven) conditions for order
m + 3 is given by Fehlberg (1964). The choice $c_1 = c_3 = 1$ minimizes the numerical
work for the evaluation of (13.16) and the other coefficients are then uniquely
determined (see Table 13.1).

Fehlberg (1964) also derived an embedded method with two additional stages
of orders m + 3 (m + 4). These methods were widely used in the sixties for scientific
computations.


Table 13.1. Fehlberg, order m + 3

    1   |
    θ   |  θ^m/(m+3)
    1   |  −1/(m+1)     2/((m+1)θ^m)
    ----+-----------------------------------------------------
        |  0            (m+3)/(2(m+1)(m+2)θ^m)     1/(2(m+2))

    with θ = (m+1)/(m+3)


General Theory of Order Conditions

For the same reason as in Section II.2 we assume that (13.1) is autonomous. The
general form of the order conditions for method (13.4) was derived in the thesis of
Kastlunger (see Kastlunger & Wanner 1972). It later became a simple application
of the composition theorem for B-series (Hairer & Wanner 1974). The point is that
from Theorem 2.6,

$$ \frac{h^i}{i!} (D^i y)(y_0) = \frac{h^i}{i!} \sum_{t \in LT,\ \varrho(t) = i} F(t)(y_0) = B(p^{(i)}, y_0) \qquad (13.17) $$

is a B-series with coefficients

$$ p^{(i)}(t) = \begin{cases} 1 & \text{if } \varrho(t) = i \\ 0 & \text{otherwise.} \end{cases} \qquad (13.18) $$

Thus, in extension of Corollary 12.7, we have

$$ \frac{h^i}{i!} (D^i y)\bigl(B(a, y_0)\bigr) = B(a^{(i)}, y_0) \qquad (13.19) $$

where, from formula (12.8) with $q = \varrho(t)$,

$$ a^{(i)}(t) = (a p^{(i)})(t) = \frac{1}{\alpha(t)} \binom{q}{i} \sum_{\text{labellings of } t} \ \prod_{z \in d_i(t)} a(z), \qquad (13.20) $$

and the sum is over all $\alpha(t)$ different labellings of t. This allows us to compute
recursively the coefficients of the B-series which appear in (13.4).


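Example 13.5. The tree $t = [\tau, [\tau]]$ sketched in Fig. 13.1 possesses three different
labellings, two of which produce the same difference set $d_2(t)$, so that formula
(13.20) becomes

$$ a''([\tau, [\tau]]) = 2\bigl(2 (a(\tau))^2 + a([\tau])\bigr). \qquad (13.21) $$

Fig. 13.1. Different labellings of $[\tau, [\tau]]$


For all other trees of order $\le 4$ we have $\alpha(t) = 1$ and (13.20) leads to the
following table of second derivatives

$$ \begin{aligned}
a''(\tau) &= 0 & a''([\tau]) &= 1 \\
a''([\tau, \tau]) &= 3 a(\tau) & a''([[\tau]]) &= 3 a(\tau) \\
a''([\tau, \tau, \tau]) &= 6 (a(\tau))^2 & a''([\tau, [\tau]]) &= 4 (a(\tau))^2 + 2 a([\tau]) \\
a''([[\tau, \tau]]) &= 6 (a(\tau))^2 & a''([[[\tau]]]) &= 6 a([\tau]).
\end{aligned} \qquad (13.22) $$

Once these expressions have been established, we write formulas (13.4) in the form

$$ k_i^{(\ell)} = \frac{h^\ell}{\ell!} (D^\ell y)(g_i) $$
$$ g_i = y_0 + \sum_{r=1}^{q} \sum_{j=1}^{s} a_{ij}^{(r)} k_j^{(r)}, \qquad y_1 = y_0 + \sum_{r=1}^{q} \sum_{j=1}^{s} b_j^{(r)} k_j^{(r)} \qquad (13.23) $$

and suppose the expressions $k_i^{(\ell)}$, $g_i$, $y_1$ to be B-series

$$ k_i^{(\ell)} = B(\mathbf{k}_i^{(\ell)}, y_0), \qquad g_i = B(\mathbf{g}_i, y_0), \qquad y_1 = B(\mathbf{y}_1, y_0). $$

Then equations (13.23) can be translated into

$$ \mathbf{k}_i^{(1)}(t) = \varrho(t)\, \mathbf{g}_i(t_1) \cdot \ldots \cdot \mathbf{g}_i(t_m), \qquad \mathbf{k}_i^{(1)}(\tau) = 1 \qquad \text{(see (12.16))} $$
$$ \mathbf{k}_i^{(2)}(t) = \mathbf{g}_i''(t) \qquad \text{from (13.22)} $$
$$ \mathbf{k}_i^{(3)}(t) = \mathbf{g}_i'''(t) \qquad \text{from Exercise 1 or Exercise 2, etc.} $$
$$ \mathbf{g}_i(t) = \sum_{r=1}^{q} \sum_{j=1}^{s} a_{ij}^{(r)} \mathbf{k}_j^{(r)}(t), \qquad \mathbf{y}_1(t) = \sum_{r=1}^{q} \sum_{j=1}^{s} b_j^{(r)} \mathbf{k}_j^{(r)}(t). $$

These formulas recursively determine all the coefficients. Method (13.4) (together
with (13.5)) is then of order p if, as usual,

$$ \mathbf{y}_1(t) = 1 \quad \text{for all } t \text{ with } \varrho(t) \le p. \qquad (13.24) $$

More details and special methods are given in Kastlunger & Wanner (1972); see
also Exercise 3.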


Exercises 

1. Extend Example 13.5 and obtain formulas for $a^{(3)}(t)$ for all trees of order $\le 4$.

2. (Kastlunger). Prove the following variant form of formula (13.20) which extends
(12.16) more directly and can also be used to obtain the formulas of
Example 13.5. If $t = [t_1, \ldots, t_m]$ then

$$ a^{(i)}(t) = \frac{\varrho(t)}{i} \sum_{\substack{\lambda_1 + \ldots + \lambda_m = i - 1 \\ \lambda_1, \ldots, \lambda_m \ge 0}} a^{(\lambda_1)}(t_1) \cdots a^{(\lambda_m)}(t_m). $$

Hint. See Kastlunger & Wanner (1972); Hairer & Wanner (1973), Section 5.

3. Show that the conditions for order 3 of method (13.4) are given by

$$ \sum_i b_i^{(1)} = 1 $$
$$ 2 \sum_i b_i^{(1)} c_i + \sum_i b_i^{(2)} = 1 $$
$$ 3 \sum_i b_i^{(1)} c_i^2 + 3 \sum_i b_i^{(2)} c_i + \sum_i b_i^{(3)} = 1 $$
$$ 6 \sum_{i,j} b_i^{(1)} a_{ij}^{(1)} c_j + 3 \sum_i b_i^{(1)} e_i + 3 \sum_i b_i^{(2)} c_i + \sum_i b_i^{(3)} = 1 $$

where $c_i = \sum_j a_{ij}^{(1)}$, $e_i = \sum_j a_{ij}^{(2)}$.

4. (Zurmühl 1952, Albrecht 1955). Differentiate a given first order system of
differential equations $y' = f(x, y)$ to obtain

$$ y'' = (D^2 y)(x, y), \qquad y(x_0) = y_0, \quad y'(x_0) = f_0. $$

Apply to this equation a special method for higher order systems (see the following
Section II.14) to obtain higher-derivative methods. Show that the following
method is of order six

$$ k_1 = h^2 g(x_0, y_0) $$
$$ k_2 = h^2 g\Bigl(x_0 + \frac{h}{4},\; y_0 + \frac{h}{4} f_0 + \frac{1}{32} k_1\Bigr) $$
$$ k_3 = h^2 g\Bigl(x_0 + \frac{h}{2},\; y_0 + \frac{h}{2} f_0 + \frac{1}{24} (-k_1 + 4 k_2)\Bigr) $$
$$ k_4 = h^2 g\Bigl(x_0 + \frac{3h}{4},\; y_0 + \frac{3h}{4} f_0 + \frac{1}{32} (3 k_1 + 4 k_2 + 2 k_3)\Bigr) $$
$$ y_1 = y_0 + h f_0 + \frac{1}{90} (7 k_1 + 24 k_2 + 6 k_3 + 8 k_4) $$

where $g(x, y) = (D^2 y)(x, y) = Df(x, y) = f_x(x, y) + f_y(x, y) \cdot f(x, y)$.
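
The recursion of Exercise 2, read as $a^{(i)}(t) = \frac{\varrho(t)}{i} \sum a^{(\lambda_1)}(t_1) \cdots a^{(\lambda_m)}(t_m)$ with $a^{(0)} = a$, lends itself to a direct numerical check against the second derivatives of Example 13.5. The sketch below is an illustration under assumptions made here: trees are tuples of subtrees (the one-vertex tree is the empty tuple), the map a is given by arbitrary sample values, and the names `order` and `deriv` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def order(t):
    """Number of vertices of the tree t (a tuple of subtrees)."""
    return 1 + sum(order(s) for s in t)


def deriv(a, t, i):
    """a^{(i)}(t) via the recursion of Exercise 2, with a^{(0)} = a."""
    if i == 0:
        return a(t)
    total = Fraction(0)
    # all (lambda_1, ..., lambda_m) with lambda_j >= 0 and sum equal to i - 1
    for lam in product(range(i), repeat=len(t)):
        if sum(lam) == i - 1:
            term = Fraction(1)
            for l, s in zip(lam, t):
                term *= deriv(a, s, l)
            total += term
    return Fraction(order(t), i) * total


tau = ()
sample = {tau: Fraction(2), (tau,): Fraction(5)}   # arbitrary test values a(tau), a([tau])


def a(t):
    return sample[t]


second_mixed = deriv(a, (tau, (tau,)), 2)    # a''([tau, [tau]])
second_tall = deriv(a, (((tau,),),), 2)      # a''([[[tau]]])
```

With the sample values $a(\tau) = 2$, $a([\tau]) = 5$, the first quantity should equal $4 \cdot 2^2 + 2 \cdot 5$ and the second $6 \cdot 5$.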



II.14 Numerical Methods

for Second Order Differential Equations 

Mutationem motus proportionalem esse vi motrici impressae 

(Newton’s Lex II, 1687) 


Many differential equations which appear in practice are systems of the second 
order 


y" = f(x,y,y'). 


(14.1) 


This is mainly due to the fact that the forces are proportional to acceleration, i.e., to
second derivatives. As mentioned in Section I.1, such a system can be transformed
into a first order differential equation of doubled dimension by considering the
vector $(y, y')$ as the new variable:

$$ \begin{pmatrix} y \\ y' \end{pmatrix}' = \begin{pmatrix} y' \\ f(x, y, y') \end{pmatrix}, \qquad y(x_0) = y_0, \quad y'(x_0) = y_0'. \qquad (14.2) $$

In order to solve (14.1) numerically, one can for instance apply a Runge-Kutta
method (explicit or implicit) to (14.2). This yields

$$ k_i = y_0' + h \sum_{j=1}^{s} a_{ij} k_j' $$
$$ k_i' = f\Bigl(x_0 + c_i h,\; y_0 + h \sum_{j=1}^{s} a_{ij} k_j,\; y_0' + h \sum_{j=1}^{s} a_{ij} k_j'\Bigr) \qquad (14.3) $$
$$ y_1 = y_0 + h \sum_{i=1}^{s} b_i k_i, \qquad y_1' = y_0' + h \sum_{i=1}^{s} b_i k_i'. $$

If we insert the first formula of (14.3) into the others we obtain (assuming (1.9) and
an order $\ge 1$)

$$ k_i' = f\Bigl(x_0 + c_i h,\; y_0 + c_i h y_0' + h^2 \sum_{j=1}^{s} \bar a_{ij} k_j',\; y_0' + h \sum_{j=1}^{s} a_{ij} k_j'\Bigr) $$
$$ y_1 = y_0 + h y_0' + h^2 \sum_{i=1}^{s} \bar b_i k_i', \qquad y_1' = y_0' + h \sum_{i=1}^{s} b_i k_i' \qquad (14.4) $$

where

$$ \bar a_{ij} = \sum_{k=1}^{s} a_{ik} a_{kj}, \qquad \bar b_i = \sum_{j=1}^{s} b_j a_{ji}. \qquad (14.5) $$

For an implementation the representation (14.4) is preferable to (14.3), since about
half of the storage can be saved. This may be important, in particular if the dimension
of equation (14.1) is large.
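
Relation (14.5) is a pair of matrix products, $\bar A = A^2$ and $\bar b^T = b^T A$. As a small illustration (added here, with a hypothetical function name), the following sketch evaluates it in exact arithmetic for the classical four-stage Runge-Kutta method, whose coefficients are written out in the code as an assumption of the example.

```python
from fractions import Fraction as F


def induced_nystrom_coefficients(A, b):
    """Return (A_bar, b_bar) of (14.5): A_bar = A*A, b_bar_i = sum_j b_j a_ji."""
    s = len(b)
    A_bar = [[sum(A[i][k] * A[k][j] for k in range(s)) for j in range(s)]
             for i in range(s)]
    b_bar = [sum(b[j] * A[j][i] for j in range(s)) for i in range(s)]
    return A_bar, b_bar


# Classical Runge-Kutta method of order 4 (assumed here as the example tableau).
A = [[0, 0, 0, 0],
     [F(1, 2), 0, 0, 0],
     [0, F(1, 2), 0, 0],
     [0, 0, 1, 0]]
b = [F(1, 6), F(1, 3), F(1, 3), F(1, 6)]

A_bar, b_bar = induced_nystrom_coefficients(A, b)
```

Only the arrays $\bar A$, A, $\bar b$, b and the stage values $k_i'$ are then needed in (14.4); the intermediate quantities $k_i$ of (14.3) no longer have to be stored.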


Nyström Methods


R.H. Merson: "... I have not seen the paper by Nyström. Was it
in English?"

J.M. Bennett: "In German actually, not Finnish."

(From the discussion following a talk of Merson 1957)

E.J. Nyström (1925) was the first to consider methods of the form (14.4) in which
the coefficients do not necessarily satisfy (14.5) ("Da bis jetzt die direkte Anwendung
der Rungeschen Methode auf den wichtigen Fall von Differentialgleichungen
zweiter Ordnung nicht behandelt war ..." Nyström, 1925). Such direct methods
are called Nyström methods.

Definition 14.1. A Nyström method (14.4) has order p if for sufficiently smooth
problems (14.1)

$$ y(x_0 + h) - y_1 = O(h^{p+1}), \qquad y'(x_0 + h) - y_1' = O(h^{p+1}). \qquad (14.6) $$

An example of an explicit Nyström method where condition (14.5) is violated
is given in Table 14.1. Nyström claimed that this method would be simpler to apply
than "Runge-Kutta's" and reduce the work by about 25%. This is, of course, not
true if the Runge-Kutta method is applied as in (14.4) (see also Exercise 2).


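Table 14.1. Nyström, order 4

    c_i  |  ā_ij                  |  a_ij
    -----+------------------------+------------------------
    0    |                        |
    1/2  |  1/8                   |  1/2
    1/2  |  1/8   0               |  0     1/2
    1    |  0     0     1/2       |  0     0     1
    -----+------------------------+------------------------
         |  1/6   1/6   1/6   0   |  1/6   2/6   2/6   1/6
         |  b̄_i                   |  b_i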


A real improvement can be achieved in the case where the right-hand side of
(14.1) does not depend on $y'$, i.e.,

$$ y'' = f(x, y). \qquad (14.7) $$

Here the Nyström method becomes

$$ k_i' = f\Bigl(x_0 + c_i h,\; y_0 + c_i h y_0' + h^2 \sum_{j=1}^{s} \bar a_{ij} k_j'\Bigr) $$
$$ y_1 = y_0 + h y_0' + h^2 \sum_{i=1}^{s} \bar b_i k_i', \qquad y_1' = y_0' + h \sum_{i=1}^{s} b_i k_i' \qquad (14.8) $$

and the coefficients $a_{ij}$ are no longer needed. Some examples are given in Table
14.2. The fifth-order method of Table 14.2 needs only four evaluations of f. This
is a considerable improvement compared to Runge-Kutta methods where at least
six evaluations are necessary (cf. Theorem 5.1).


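Table 14.2. Methods for y'' = f(x, y)

Nyström, order 4

    c_i  |  ā_ij
    -----+-------------------
    0    |
    1/2  |  1/8
    1    |  0     1/2
    -----+-------------------
    b̄_i  |  1/6   1/3   0
    b_i  |  1/6   4/6   1/6

Nyström, order 5

    c_i  |  ā_ij
    -----+------------------------------------
    0    |
    1/5  |  1/50
    2/3  |  −1/27     7/27
    1    |  3/10      −2/35     9/35
    -----+------------------------------------
    b̄_i  |  14/336    100/336   54/336    0
    b_i  |  14/336    125/336   162/336   35/336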
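
To see formula (14.8) at work, the following sketch applies the three-stage method of order 4 for $y'' = f(x, y)$, with $c = (0, 1/2, 1)$, $\bar a_{21} = 1/8$, $\bar a_{31} = 0$, $\bar a_{32} = 1/2$, $\bar b = (1/6, 1/3, 0)$ and $b = (1/6, 4/6, 1/6)$, to the test problem $y'' = -y$, $y(0) = 1$, $y'(0) = 0$. The function names and the choice of test problem are assumptions made for this illustration.

```python
import math

# Coefficients of the fourth-order Nystrom method for y'' = f(x, y).
C = [0.0, 0.5, 1.0]
A_BAR = [[0.0, 0.0, 0.0],
         [1 / 8, 0.0, 0.0],
         [0.0, 1 / 2, 0.0]]
B_BAR = [1 / 6, 1 / 3, 0.0]
B = [1 / 6, 4 / 6, 1 / 6]


def nystrom_step(f, x0, y0, yp0, h):
    """One step of (14.8); returns the new values (y1, y1')."""
    k = []
    for i in range(3):
        yi = y0 + C[i] * h * yp0 + h * h * sum(A_BAR[i][j] * k[j] for j in range(i))
        k.append(f(x0 + C[i] * h, yi))
    y1 = y0 + h * yp0 + h * h * sum(bb * ki for bb, ki in zip(B_BAR, k))
    yp1 = yp0 + h * sum(bi * ki for bi, ki in zip(B, k))
    return y1, yp1


def integrate(f, x0, y0, yp0, x_end, n):
    """Apply n equal steps from x0 to x_end."""
    h = (x_end - x0) / n
    x, y, yp = x0, y0, yp0
    for _ in range(n):
        y, yp = nystrom_step(f, x, y, yp, h)
        x += h
    return y, yp


def oscillator(x, y):
    return -y


err10 = abs(integrate(oscillator, 0.0, 1.0, 0.0, 1.0, 10)[0] - math.cos(1.0))
err20 = abs(integrate(oscillator, 0.0, 1.0, 0.0, 1.0, 20)[0] - math.cos(1.0))
```

Halving the step size should reduce the error at x = 1 by a factor of roughly 16, as expected for a method of order 4; only three evaluations of f are used per step.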


Global convergence. Introducing the variable $z_n = (y_n, y_n')^T$, a Nyström method
(14.4) can be written in the form

$$ z_1 = z_0 + h \Phi(x_0, z_0, h) $$

where

$$ \Phi(x_0, z_0, h) = \begin{pmatrix} y_0' + h \sum_i \bar b_i k_i' \\ \sum_i b_i k_i' \end{pmatrix}. \qquad (14.9) $$

(14.9) is just a special one-step method for the differential equation (14.2). For
a pth order Nyström method the local error $(y(x_0 + h) - y_1,\; y'(x_0 + h) - y_1')^T$
can be bounded by $C h^{p+1}$ (Definition 14.1), which is in agreement with formula
(3.27). The convergence theorems of Section II.3 and the results on asymptotic
expansions of the global error (Section II.8) are also valid here.


Our next aim is to derive the order conditions for Nyström methods. For this
purpose we extend the theory of Section II.2 to second order differential equations
(Hairer & Wanner 1976).

The Derivatives of the Exact Solution

As for first order equations we may restrict ourselves to systems of autonomous
differential equations

$$ (y^J)'' = f^J(y^1, \ldots, y^n, y'^1, \ldots, y'^n) \qquad (14.10) $$

(if necessary, add $x'' = 0$). The superscript index J denotes the Jth component of
the corresponding vector. We now calculate the derivatives of the exact solution of
(14.10). The second derivative is given by (14.10):

$$ (y^J)^{(2)} = f^J(y, y'). \qquad (14.11;2) $$

A repeated differentiation of this equation, using (14.10), leads to

$$ (y^J)^{(3)} = \sum_K \frac{\partial f^J}{\partial y^K}(y, y') \cdot y'^K + \sum_K \frac{\partial f^J}{\partial y'^K}(y, y') \cdot f^K(y, y') \qquad (14.11;3) $$

$$ \begin{aligned}
(y^J)^{(4)} &= \sum_{K,L} \frac{\partial^2 f^J}{\partial y^K \partial y^L}(y, y') \cdot y'^K \cdot y'^L \\
&\quad + \sum_{K,L} \frac{\partial^2 f^J}{\partial y^K \partial y'^L}(y, y') \cdot y'^K \cdot f^L(y, y') + \sum_K \frac{\partial f^J}{\partial y^K}(y, y') \cdot f^K(y, y') \\
&\quad + \sum_{K,L} \frac{\partial^2 f^J}{\partial y'^K \partial y^L}(y, y') \cdot f^K(y, y') \cdot y'^L \\
&\quad + \sum_{K,L} \frac{\partial^2 f^J}{\partial y'^K \partial y'^L}(y, y') \cdot f^K(y, y') \cdot f^L(y, y') \\
&\quad + \sum_{K,L} \frac{\partial f^J}{\partial y'^K}(y, y') \cdot \frac{\partial f^K}{\partial y^L}(y, y') \cdot y'^L \\
&\quad + \sum_{K,L} \frac{\partial f^J}{\partial y'^K}(y, y') \cdot \frac{\partial f^K}{\partial y'^L}(y, y') \cdot f^L(y, y').
\end{aligned} \qquad (14.11;4) $$

The continuation of this process becomes even more complex than for first order
differential equations. A graphical representation of the above formulas will therefore
be very helpful. In order to distinguish the derivatives with respect to y and
$y'$ we need two kinds of vertices: "meagre" and "fat". Fig. 14.1 shows the graphs
that correspond to the above formulas.

Fig. 14.1. The derivatives of the exact solution (graphs for (14.11;2), (14.11;3), (14.11;4), (14.11;5))

Definition 14.2. A labelled N-tree of order q is a labelled tree (see Definition 2.2)

$$ t : A_q \setminus \{j\} \to A_q $$

together with a mapping

$$ t' : A_q \to \{\text{"meagre"}, \text{"fat"}\} $$

which satisfies:

a) the root of t is always fat; i.e., $t'(j)$ = "fat";

b) a meagre vertex has at most one son and this son has to be fat.

We denote by $LNT_q$ the set of all labelled N-trees of order q.

The reason for condition (b) is that all derivatives of $g(y, y') = y'$ vanish identically
with the exception of the first derivative with respect to $y'$.

In the sequel we use the notation end-vertex for a vertex which has no son. If
no confusion is possible, we write t instead of $(t, t')$ for a labelled N-tree.

Definition 14.3. For a labelled N-tree t we denote by

$$ F^J(t)(y, y') $$

the expression which is a sum over the indices of all fat vertices of t (without "j",
the index of the root) and over the indices of all meagre end-vertices. The general
term of this sum is a product of expressions

$$ \frac{\partial^r f^K}{\partial y^L \cdots \partial y'^M \cdots}(y, y') \quad \text{and} \quad y'^K. \qquad (14.12) $$

A factor of the first type appears if the fat vertex k is connected via a meagre son
with l, ... and directly with a fat son m, ...; a factor $y'^K$ appears if "k" is the
index of a meagre end-vertex. The vector $F(t)(y, y')$ is again called an elementary
differential.

For some examples see Table 14.3 below. Observe that the indices of the meagre
vertices, which are not end-vertices, play no role in the above definition. In
analogy to Definition 2.4 we have

Definition 14.4. Two labelled N-trees $(t, t')$ and $(u, u')$ are equivalent, if they
differ only by a permutation of their indices; i.e., if they have the same order, say
q, and if there exists a bijection $\sigma : A_q \to A_q$ with $\sigma(j) = j$, such that $t\sigma = \sigma u$ on
$A_q \setminus \{j\}$ and $t'\sigma = u'$.

For example, the second and fourth labelled N-trees of formula (14.11;4) in
Fig. 14.1 are equivalent; and also the second and fifth of formula (14.11;5).

Definition 14.5. An equivalence class of qth order labelled N-trees is called an
N-tree of order q. The set of all N-trees of order q is denoted by $NT_q$. We further
denote by $\alpha(t)$ the number of elements in the equivalence class t, i.e., the number
of possible different monotonic labellings of t.

Representatives of N-trees up to order 5 are shown in Table 14.3. We are now
able to give a closed formula for the derivatives of the exact solution of (14.10).

Theorem 14.6. The exact solution of (14.10) satisfies

$$ y^{(q)} = \sum_{t \in LNT_{q-1}} F(t)(y, y') = \sum_{t \in NT_{q-1}} \alpha(t) F(t)(y, y'). \qquad (14.11;q) $$

Proof. The general formula is obtained by continuing the computation for (14.11;2-4)
as in Section II.2. □


The Derivatives of the Numerical Solution 

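We first rewrite (14.4) as

$$ g_i = y_0 + c_i h y_0' + \sum_{j=1}^{s} \bar a_{ij} h^2 f(g_j, g_j'), \qquad g_i' = y_0' + \sum_{j=1}^{s} a_{ij} h f(g_j, g_j') $$
$$ y_1 = y_0 + h y_0' + \sum_{i=1}^{s} \bar b_i h^2 f(g_i, g_i'), \qquad y_1' = y_0' + \sum_{i=1}^{s} b_i h f(g_i, g_i') \qquad (14.13) $$

so that the intermediate values $g_i, g_i'$ are treated in the same way as $y_1, y_1'$. In
(14.13) there appear expressions of the form $h^2 \varphi(h)$ and $h \varphi(h)$. Therefore we
have to use in addition to (2.4) the formula

$$ (h^2 \varphi(h))^{(q)} \big|_{h=0} = q \cdot (q - 1) \cdot (\varphi(h))^{(q-2)} \big|_{h=0}. \qquad (14.14) $$

We now compute successively the derivatives of $g_i^J$ and $g_i'^J$ at h = 0:

$$ (g_i^J)^{(1)} \big|_{h=0} = c_i\, y'^J \big|_{y_0, y_0'} \qquad (14.15;1) $$
$$ (g_i'^J)^{(1)} \big|_{h=0} = \sum_j a_{ij} f^J \big|_{y_0, y_0'} \qquad (14.16;1) $$
$$ (g_i^J)^{(2)} \big|_{h=0} = 2 \sum_j \bar a_{ij} f^J \big|_{y_0, y_0'} \qquad (14.15;2) $$

For a further differentiation we need

$$ \bigl(f^J(g_j, g_j')\bigr)^{(1)} = \sum_K \frac{\partial f^J}{\partial y^K}(g_j, g_j') \cdot (g_j^K)^{(1)} + \sum_K \frac{\partial f^J}{\partial y'^K}(g_j, g_j') \cdot (g_j'^K)^{(1)}. \qquad (14.17) $$

With this formula we then obtain

$$ (g_i'^J)^{(2)} \big|_{h=0} = 2 \sum_j a_{ij} c_j \sum_K \frac{\partial f^J}{\partial y^K} \cdot y'^K \Big|_{y_0, y_0'} + 2 \sum_{j,k} a_{ij} a_{jk} \sum_K \frac{\partial f^J}{\partial y'^K} \cdot f^K \Big|_{y_0, y_0'} \qquad (14.16;2) $$

$$ (g_i^J)^{(3)} \big|_{h=0} = 3 \cdot 2 \sum_j \bar a_{ij} c_j \sum_K \frac{\partial f^J}{\partial y^K} \cdot y'^K \Big|_{y_0, y_0'} + 3 \cdot 2 \sum_{j,k} \bar a_{ij} a_{jk} \sum_K \frac{\partial f^J}{\partial y'^K} \cdot f^K \Big|_{y_0, y_0'} \qquad (14.15;3) $$

To write down a general formula we need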

To write down a general formula we need 

Definition 14.7. For a labelled N-tree t we denote by $\Phi_j(t)$ the expression which is
a sum over the indices of all fat vertices of t (without "j", the index of the root).
The general term of the sum is a product of

    $a_{kl}$        if the fat vertex "k" has a fat son "l";
    $\bar a_{kl}$   if the fat vertex "k" is connected via a meagre son with "l"; and
    $c_k^m$         if the fat vertex "k" is connected with m meagre end-vertices.

Theorem 14.8. The $g_i, g_i'$ of (14.13) satisfy

$$ (g_i)^{(q+1)} \big|_{h=0} = (q + 1) \sum_{t \in LNT_q} \gamma(t) \sum_{j=1}^{s} \bar a_{ij} \Phi_j(t)\, F(t)(y_0, y_0') \qquad (14.15;q+1) $$

$$ (g_i')^{(q)} \big|_{h=0} = \sum_{t \in LNT_q} \gamma(t) \sum_{j=1}^{s} a_{ij} \Phi_j(t)\, F(t)(y_0, y_0') \qquad (14.16;q) $$

where $\gamma(t)$ is given in Definition 2.10.

Proof. For small values of q these formulas were obtained above; for general values
of q they are proved like Theorem 2.11. System (14.2) is a special case of what
will later be treated as a partitioned system (see Section II.15). Theorem 14.8 will
then appear again in a new light. □





Because of the similarity of the formulas for $g_i$ and $y_1$, $g_i'$ and $y_1'$ we have

Theorem 14.9. The numerical solution $y_1, y_1'$ of (14.13) satisfies

$$ (y_1)^{(q)} \big|_{h=0} = q \sum_{t \in LNT_{q-1}} \gamma(t) \sum_{i=1}^{s} \bar b_i \Phi_i(t)\, F(t)(y_0, y_0') \qquad (14.18;q) $$

$$ (y_1')^{(q-1)} \big|_{h=0} = \sum_{t \in LNT_{q-1}} \gamma(t) \sum_{i=1}^{s} b_i \Phi_i(t)\, F(t)(y_0, y_0'). \qquad (14.19;q-1) $$


The Order Conditions 


For the study of the order of a Nyström method (Definition 14.1) one has to compare
the Taylor series of $y_1, y_1'$ with that of the true solution $y(x_0 + h), y'(x_0 + h)$.

Theorem 14.10. A Nyström method (14.4) is of order p iff

$$ \sum_{i=1}^{s} \bar b_i \Phi_i(t) = \frac{1}{(\varrho(t) + 1) \cdot \gamma(t)} \quad \text{for N-trees } t \text{ with } \varrho(t) \le p - 1, \qquad (14.20) $$

$$ \sum_{i=1}^{s} b_i \Phi_i(t) = \frac{1}{\gamma(t)} \quad \text{for N-trees } t \text{ with } \varrho(t) \le p. \qquad (14.21) $$

Here $\varrho(t)$ denotes the order of the N-tree t, $\Phi_i(t)$ and $\gamma(t)$ are given by Definition
14.7 and formula (2.17).


Proof. The "if" part is an immediate consequence of Theorems 14.6 and 14.9.
The "only if" part can be shown in the same way as for first order equations (cf.
Exercise 4 of Section II.2). □


Let us briefly discuss whether the extra freedom in the choice of the parameters
of (14.4) (by discarding the assumption (14.5)) can lead to a considerable improvement.
Since the order conditions for Runge-Kutta methods (Theorem 2.13) are
a subset of (14.21) (see Exercise 3 below), it is impossible to gain order with this
extra freedom. Only some (never all) error coefficients can be made smaller. Therefore
we shall turn to Nyström methods (14.8) for special second order differential
equations (14.7).

For the study of the order conditions for (14.8) we write (14.7) in autonomous
form

$$ y'' = f(y). \qquad (14.22) $$

This special form implies that those elementary differentials which contain derivatives
with respect to $y'$ vanish identically. Consequently, only the following subset
of N-trees has to be considered:


Definition 14.11. An N-tree t is called a special N-tree or SN-tree, if the fat vertices 
have only meagre sons. 


Theorem 14.12. A Nyström method (14.8) for the special differential equation
(14.7) is of order p, iff

$$ \sum_{i=1}^{s} \bar b_i \Phi_i(t) = \frac{1}{(\varrho(t) + 1) \cdot \gamma(t)} \quad \text{for SN-trees } t \text{ with } \varrho(t) \le p - 1, \qquad (14.23) $$

$$ \sum_{i=1}^{s} b_i \Phi_i(t) = \frac{1}{\gamma(t)} \quad \text{for SN-trees } t \text{ with } \varrho(t) \le p. \qquad (14.24) $$


All SN-trees up to order 5, together with the elementary differentials and the
expressions $\Phi_j$, $\varrho$, $\alpha$, and $\gamma$, which are needed for the order conditions, are given
in Table 14.3.
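
The conditions (14.23) and (14.24) can be checked mechanically for a given method. The sketch below does this in exact rational arithmetic for the four-stage method with $c = (0, 1/5, 2/3, 1)$; the pairs $(\varrho, \gamma)$ and the expressions $\Phi_i$ for the thirteen SN-trees of order $\le 5$ are written out in the code (from the single fat vertex with $\Phi_i = 1$, $\gamma = 1$ up to the chain with $\Phi_i = \sum_{l,p} \bar a_{il} \bar a_{lp}$, $\gamma = 120$). The coefficient arrays and all names in the code are assumptions of this illustration.

```python
from fractions import Fraction as F

# Four-stage Nystrom method of order 5 for y'' = f(x, y).
c = [F(0), F(1, 5), F(2, 3), F(1)]
A_bar = [[0, 0, 0, 0],
         [F(1, 50), 0, 0, 0],
         [F(-1, 27), F(7, 27), 0, 0],
         [F(3, 10), F(-2, 35), F(9, 35), 0]]
b = [F(14, 336), F(125, 336), F(162, 336), F(35, 336)]
b_bar = [F(14, 336), F(100, 336), F(54, 336), F(0)]
s = 4


def Ac(i, p):
    """sum_l abar_il * c_l^p"""
    return sum(A_bar[i][l] * c[l] ** p for l in range(s))


def AA(i):
    """sum_{l,p} abar_il * abar_lp"""
    return sum(A_bar[i][l] * Ac(l, 0) for l in range(s))


# (rho(t), gamma(t), Phi_i(t)) for the SN-trees t_1, ..., t_13
SN_TREES = [
    (1, 1, lambda i: F(1)),
    (2, 2, lambda i: c[i]),
    (3, 3, lambda i: c[i] ** 2),
    (3, 6, lambda i: Ac(i, 0)),
    (4, 4, lambda i: c[i] ** 3),
    (4, 8, lambda i: c[i] * Ac(i, 0)),
    (4, 24, lambda i: Ac(i, 1)),
    (5, 5, lambda i: c[i] ** 4),
    (5, 10, lambda i: c[i] ** 2 * Ac(i, 0)),
    (5, 20, lambda i: Ac(i, 0) ** 2),
    (5, 30, lambda i: c[i] * Ac(i, 1)),
    (5, 60, lambda i: Ac(i, 2)),
    (5, 120, lambda i: AA(i)),
]


def order_conditions_hold(p):
    """Check (14.23) and (14.24) for all listed SN-trees (valid for p <= 5)."""
    for rho, gamma, phi in SN_TREES:
        if rho <= p and sum(b[i] * phi(i) for i in range(s)) != F(1, gamma):
            return False
        if rho <= p - 1 and sum(b_bar[i] * phi(i) for i in range(s)) != F(1, (rho + 1) * gamma):
            return False
    return True


is_order_5 = order_conditions_hold(5)
```

Because the arithmetic is exact, a single violated condition is detected without any tolerance; the list of trees only reaches order 5, so the check says nothing about higher orders.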

Higher order systems. The extension of the ideas of this section to higher order
systems

$$ y^{(n)} = f(x, y, y', \ldots, y^{(n-1)}) \qquad (14.25) $$

is now more or less straightforward. Again, a real improvement is only possible in
the case when the right-hand side of (14.25) depends only on x and y. A famous
paper on this subject is the work of Zurmühl (1948). Tables of order conditions and
methods are given in Hebsacker (1982).


On the Construction of Nyström Methods


The following simplifying assumptions are useful for the construction of Nyström
methods.

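Lemma 14.13. Under the assumption

$$ \bar b_i = b_i (1 - c_i), \qquad i = 1, \ldots, s \qquad (14.26) $$

the condition (14.24) implies (14.23).

Proof. Let t be an SN-tree of order $\le p - 1$ and denote by u the SN-tree of order
$\varrho(t) + 1$ obtained from t by attaching a new branch with a meagre vertex to the
root of t. By Definition 14.7 we have $\Phi_i(u) = c_i \Phi_i(t)$ and from formula (2.17) it
follows that $\gamma(u) = (\varrho(t) + 1)\gamma(t)/\varrho(t)$. The conclusion now follows since

$$ \sum_{i=1}^{s} \bar b_i \Phi_i(t) = \sum_{i=1}^{s} b_i \Phi_i(t) - \sum_{i=1}^{s} b_i \Phi_i(u) = \frac{1}{\gamma(t)} - \frac{1}{\gamma(u)} = \frac{1}{(\varrho(t) + 1)\gamma(t)}. $$

□


Table 14.3. SN-trees, elementary differentials and order conditions

    t      graph (root j is fat)                                        ϱ(t)  α(t)  γ(t)   F^J(t)(y, y')                                  Φ_j(t)
    t_1    root only                                                    1     1     1      f^J                                            1
    t_2    root, one meagre end-vertex                                  2     1     2      \sum_K f^J_K y'^K                              c_j
    t_3    root, two meagre end-vertices                                3     1     3      \sum_{K,L} f^J_{KL} y'^K y'^L                  c_j^2
    t_4    root, meagre son, fat end-vertex                             3     1     6      \sum_L f^J_L f^L                               \sum_l \bar a_{jl}
    t_5    root, three meagre end-vertices                              4     1     4      \sum_{K,L,M} f^J_{KLM} y'^K y'^L y'^M          c_j^3
    t_6    root, meagre end-vertex, meagre son with fat end-vertex      4     3     8      \sum_{L,M} f^J_{LM} y'^L f^M                   \sum_m c_j \bar a_{jm}
    t_7    chain: root, meagre, fat, meagre end-vertex                  4     1     24     \sum_{L,M} f^J_L f^L_M y'^M                    \sum_l \bar a_{jl} c_l
    t_8    root, four meagre end-vertices                               5     1     5      \sum_{K,L,M,P} f^J_{KLMP} y'^K y'^L y'^M y'^P  c_j^4
    t_9    root, two meagre end-vertices, meagre son with fat end-vertex 5    6     10     \sum_{L,M,P} f^J_{LMP} y'^L y'^M f^P           \sum_p c_j^2 \bar a_{jp}
    t_10   root, two meagre sons, each with a fat end-vertex            5     3     20     \sum_{M,P} f^J_{MP} f^M f^P                    \sum_{m,p} \bar a_{jm} \bar a_{jp}
    t_11   root, meagre end-vertex, chain meagre, fat, meagre end-vertex 5    4     30     \sum_{L,M,P} f^J_{LP} f^L_M y'^M y'^P          \sum_l c_j \bar a_{jl} c_l
    t_12   root, meagre son, fat vertex with two meagre end-vertices    5     1     60     \sum_{L,M,P} f^J_L f^L_{MP} y'^M y'^P          \sum_l \bar a_{jl} c_l^2
    t_13   chain: root, meagre, fat, meagre, fat end-vertex             5     1     120    \sum_{L,P} f^J_L f^L_P f^P                     \sum_{l,p} \bar a_{jl} \bar a_{lp}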


Lemma 14.14. Let t and u be two SN-trees as sketched in Fig. 14.2, where the
encircled parts are assumed to be identical. Then under the assumption

$$ \sum_{j=1}^{s} \bar a_{ij} = \frac{c_i^2}{2}, \qquad i = 1, \ldots, s \qquad (14.27) $$

the order conditions for t and u are the same.

Proof. It follows from Definition 14.7 and (14.27) that $\Phi_i(t) = \Phi_i(u)/2$ and from
formula (2.17) that $\gamma(t) = 2\gamma(u)$. Both order conditions are thus identical. □

Fig. 14.2. Trees of Lemma 14.14

Condition (14.26) allows us to neglect the equations (14.23), while condition
(14.27) plays a similar role to that of (1.9) for Runge-Kutta methods. It expresses
the fact that the $g_i$ of (14.13) approximate $y(x_0 + c_i h)$ up to $O(h^3)$. As a consequence
of Lemma 14.14, SN-trees which have at least one fat end-vertex can be
left out (i.e., $t_4, t_6, t_9, t_{10}, t_{13}$ of Table 14.3).

With the help of (14.26) and (14.27) explicit Nyström methods (14.8) of order 5
with s = 4 can now easily be constructed: the order conditions for the
trees $t_1, t_2, t_3, t_5$ and $t_8$ just indicate that the quadrature formula with nodes $c_1 = 0$,
$c_2, c_3, c_4$ and weights $b_1, b_2, b_3, b_4$ is of order 5. Thus the nodes $c_i$ have to
satisfy the orthogonality relation

$$ \int_0^1 x (x - c_2)(x - c_3)(x - c_4)\, dx = 0 $$

and we see that two degrees of freedom are still left in the choice of the quadrature
formula. The $\bar a_{ij}$ are now uniquely determined and can be computed as follows:
$\bar a_{21}$ is given by (14.27) for i = 2. The order conditions for $t_7$ and $t_{11}$ constitute
two linear equations for the unknowns

$$ \sum_{j=1}^{2} \bar a_{3j} c_j \quad \text{and} \quad \sum_{j=1}^{3} \bar a_{4j} c_j. $$

Together with (14.27, i = 3) one now obtains $\bar a_{31}$ and $\bar a_{32}$. Finally, the order
condition for $t_{12}$ leads to $\sum_{j=1}^{3} \bar a_{4j} c_j^2$ and the remaining coefficients $\bar a_{41}, \bar a_{42}, \bar a_{43}$
can be computed from a Vandermonde-type linear system. The method of Table
14.2 is obtained in this way.

For still higher order methods it is helpful to use further simplifying assumptions; for example

$$\sum_{j=1}^{s}\bar a_{ij}c_j^{q} = \frac{c_i^{q+2}}{(q+2)(q+1)} \tag{14.28}$$

which, for $q = 0$, reduces to (14.27), and

$$\sum_{i=1}^{s} b_i c_i^{q}\,\bar a_{ij} = b_j\Bigl(\frac{c_j^{q+2}}{(q+2)(q+1)} - \frac{c_j}{q+1} + \frac{1}{q+2}\Bigr) \tag{14.29}$$

which can be considered a generalization of condition $D(\zeta)$ of Section II.7. For more details we refer to Hairer & Wanner (1976) and also to Albrecht (1955), Battin (1976), Beentjes & Gerritsen (1976), Hairer (1977, 1982), where Nyström methods of higher order are presented. 

Embedded Nyström methods. For an efficient implementation we need a step size control mechanism. This can be performed in the same manner as for Runge-Kutta methods (see Section II.4). One can either apply Richardson extrapolation in order to estimate the local error, or construct embedded Nyström methods. 

A series of embedded Nyström methods has been constructed by Fehlberg (1972). These methods use a $(p+1)$-st order approximation to $y(x_0+h)$ for step size control. A $(p+1)$-st order approximation to $y'(x_0+h)$ is not needed, since the lower order approximations are used for step continuation. 

As for first order differential equations, local extrapolation — to use the higher 
order approximations for step continuation — turns out to be superior. Bettis 
(1973) was apparently the first to use this technique. His proposed method is of 
order 5(4). A method of order 7(6) has been constructed by Dormand & Prince 
(1978), methods of order 8(7), 9(8), 10(9) and 11(10) are given by Filippi & Graf 
(1986) and further methods of order 8(6) and 12(10) are presented by Dormand, 
El-Mikkawy & Prince (1987). 

In certain situations (see Section II.6) it is important that a Nyström method be equipped with a dense output formula. Such procedures are given by Dormand & Prince (1987) and, for general initial value problems $y'' = f(x, y, y')$, by Fine (1987). 


An Extrapolation Method for $y'' = f(x, y)$ 

The original calculations, comprising about 3,000 folio pages 
with 358 large plates, together with a further 3,800 pages of 
corresponding mathematical developments, now belong to the 
manuscript collection of the University Library, 
Christiania. (Störmer 1921) 


If we rewrite the differential equation (14.7) as a first order system 

$$\begin{pmatrix} y \\ y' \end{pmatrix}' = \begin{pmatrix} y' \\ f(x,y) \end{pmatrix}, \qquad \begin{pmatrix} y(x_0) \\ y'(x_0) \end{pmatrix} = \begin{pmatrix} y_0 \\ y_0' \end{pmatrix} \tag{14.30}$$

we can apply the GBS-algorithm (9.13) directly to (14.30); this yields 

$$y_1 = y_0 + h y_0', \qquad y_1' = y_0' + h f(x_0, y_0) \tag{14.31a}$$

$$y_{i+1} = y_{i-1} + 2h y_i', \qquad y_{i+1}' = y_{i-1}' + 2h f(x_i, y_i), \qquad i = 1, 2, \ldots, 2n \tag{14.31b}$$

$$S_h(x) = (y_{2n-1} + 2y_{2n} + y_{2n+1})/4, \qquad S_h'(x) = (y_{2n-1}' + 2y_{2n}' + y_{2n+1}')/4. \tag{14.31c}$$

Here, $S_h(x)$ and $S_h'(x)$ are the numerical approximations to $y(x)$ and $y'(x)$ at $x = x_0 + H$, where $H = 2nh$ and $x_i = x_0 + ih$. We now make the following important observation: for the computation of $y_2, y_4, \ldots, y_{2n}$ (even indices) and of $y_1', y_3', \ldots, y_{2n+1}'$ (odd indices) only the function values $f(x_0, y_0), f(x_2, y_2), \ldots, f(x_{2n}, y_{2n})$ have to be calculated. Furthermore, we know from (9.17) that $y_{2n}$ and $(y_{2n-1}' + y_{2n+1}')/2$ each possess an asymptotic expansion in even powers of $h$. It is therefore obvious that (14.31c) should be replaced by (Gragg 1965) 

$$S_h(x) = y_{2n}, \qquad S_h'(x) = (y_{2n-1}' + y_{2n+1}')/2. \tag{14.31c'}$$

Using this final step, the number of function evaluations is reduced by a factor of two. These numerical approximations can now be used for extrapolation. We take the harmonic sequence (9.8'), put 

$$T_{i1} = S_{h_i}(x_0 + H), \qquad T_{i1}' = S_{h_i}'(x_0 + H)$$

and compute the extrapolated expressions $T_{i,j}$ and $T_{i,j}'$ by the Aitken & Neville formula (9.10). 
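
The scheme just described can be sketched in a few lines of Python. This is an illustration only, not the code ODEX2: it treats a scalar equation, uses a fixed number $k$ of extrapolation columns without any order or step size control, and assumes that the harmonic sequence is $n_j = 2, 4, 6, \ldots$ and that the Aitken & Neville formula has the usual form for expansions in $h^2$. The function names are hypothetical.

```python
import math

def stoermer_pass(f, x0, y0, yp0, H, n):
    """One pass over [x0, x0+H] with 2n substeps h = H/(2n).

    Only y at even and y' at odd grid points are formed, so that
    n + 1 evaluations of f are needed.
    """
    h = H / (2 * n)
    y_even = y0
    yp_odd = yp0 + h * f(x0, y0)                 # y'_1
    yp_prev = yp_odd
    for m in range(1, n + 1):
        y_even = y_even + 2 * h * yp_odd         # y_{2m}
        yp_prev = yp_odd                         # y'_{2m-1}
        yp_odd = yp_odd + 2 * h * f(x0 + 2 * m * h, y_even)  # y'_{2m+1}
    # final step: S_h = y_{2n}, S'_h = (y'_{2n-1} + y'_{2n+1}) / 2
    return y_even, 0.5 * (yp_prev + yp_odd)

def stoermer_extrapolation(f, x0, y0, yp0, H, k):
    """Return the extrapolated values T_kk and T'_kk for one basic step H."""
    ns = [2 * j for j in range(1, k + 1)]        # assumed step number sequence
    T, Tp = [], []
    for j in range(k):
        y, yp = stoermer_pass(f, x0, y0, yp0, H, ns[j] // 2)
        row, rowp = [y], [yp]
        for m in range(1, j + 1):
            factor = (ns[j] / ns[j - m]) ** 2 - 1.0
            row.append(row[m - 1] + (row[m - 1] - T[j - 1][m - 1]) / factor)
            rowp.append(rowp[m - 1] + (rowp[m - 1] - Tp[j - 1][m - 1]) / factor)
        T.append(row)
        Tp.append(rowp)
    return T[-1][-1], Tp[-1][-1]

# Test equation y'' = -y, y(0) = 1, y'(0) = 0, with solution cos(x).
y, yp = stoermer_extrapolation(lambda x, y: -y, 0.0, 1.0, 0.0, 1.0, 6)
print(abs(y - math.cos(1.0)), abs(yp + math.sin(1.0)))
```

For systems the same loop applies componentwise; a production code would in addition choose $k$ and $H$ adaptively.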


Remark. Eliminating the $y_i'$-values in (14.31b) we obtain the equivalent formula 

$$y_{i+2} - 2y_i + y_{i-2} = (2h)^2 f(x_i, y_i), \tag{14.32}$$

which is often called Störmer's rule. For the implementation the formulation (14.31b) is to be preferred, since it is more stable with respect to round-off errors (see Section III.10). 


Dense output. As in the derivation of Section II.9 for the GBS algorithm, we shall do Hermite interpolation based on derivatives of the solution at $x_0$, $x_0 + H$ and $x_0 + H/2$. At the endpoints of the considered interval we have $y_0$, $y_0'$, $y_0'' = f(x_0, y_0)$ and $y_1$, $y_1'$, $y_1''$ at our disposal. The derivatives at the midpoint can be obtained by extrapolation of suitable differences of function values. However, one has to take care of the fact that $y_i$ and $f(x_i, y_i)$ are available only for even indices, whereas $y_i'$ is available for odd indices only. For the same reason as for the GBS method, the step number sequence has to satisfy (9.34). For notational convenience, the following description is restricted to the sequence (9.35). 

We suppose that $T_{kk}$ and $T_{kk}'$ are accepted approximations to the solution. Then the construction of a dense output formula can be summarized as follows: 


Step 1. For each $j \in \{1, \ldots, k\}$ compute the approximations to the derivatives of $y(x)$ at $x_0 + H/2$ by ($\delta$ is the central difference operator): 

$$d_j^{(0)} = \tfrac12\bigl(y^{(j)}_{n_j/2-1} + y^{(j)}_{n_j/2+1}\bigr), \qquad d_j^{(1)} = y'^{(j)}_{n_j/2},$$

$$d_j^{(\kappa)} = \frac12\,\frac{1}{(2h_j)^{\kappa-2}}\Bigl(\delta^{\kappa-2} f^{(j)}_{n_j/2-1} + \delta^{\kappa-2} f^{(j)}_{n_j/2+1}\Bigr), \qquad \kappa = 2, 4, \ldots, 2j, \tag{14.33}$$

$$d_j^{(\kappa)} = \frac{\delta^{\kappa-2} f^{(j)}_{n_j/2}}{(2h_j)^{\kappa-2}}, \qquad \kappa = 3, 5, \ldots, 2j+1.$$

Step 2. Extrapolate $d_j^{(0)}$, $d_j^{(1)}$ $(k-1)$ times and $d_j^{(2\ell)}$, $d_j^{(2\ell+1)}$ $(k-\ell)$ times to obtain improved approximations $d^{(\kappa)}$ to $y^{(\kappa)}(x_0 + H/2)$. 

Step 3. For given $\mu$ ($-1 \le \mu \le 2k+1$) define the polynomial $P_\mu(\theta)$ of degree $\mu + 6$ by 

$$P_\mu(0) = y_0, \qquad P_\mu'(0) = H y_0', \qquad P_\mu''(0) = H^2 f(x_0, y_0)$$
$$P_\mu(1) = T_{kk}, \qquad P_\mu'(1) = H T_{kk}', \qquad P_\mu''(1) = H^2 f(x_0 + H, T_{kk}) \tag{14.34}$$
$$P_\mu^{(\kappa)}(1/2) = H^\kappa d^{(\kappa)} \qquad \text{for } \kappa = 0, 1, \ldots, \mu.$$

Since $T_{kk}$, $T_{kk}'$ are the initial values for the next step, the dense output obtained by the above algorithm is a global $C^2$ approximation to the solution. It satisfies 

$$y(x_0 + \theta H) - P_\mu(\theta) = O(H^{2k}) \qquad \text{if } \mu \ge 2k - 7 \tag{14.35}$$

(compare Theorem 9.5). In the code ODEX2 of the Appendix the value $\mu = 2k - 5$ is suggested as standard choice. 


Problems for Numerical Comparisons 


PLEI — the celestial mechanics problem (10.3) which is the only problem of Section II.10 already in the special form (14.7). 


ARES — the AREnstorf orbit in Second order form (14.7). This is the restricted 
three body problem (0.1) with initial values (0.2) integrated over one period $0 \le x \le x_{\mathrm{end}}$ (see Fig. 0.1) in a fixed coordinate system. Then the equations of motion become 

$$y_1'' = \mu'\,\frac{a_1(x) - y_1}{D_1} + \mu\,\frac{b_1(x) - y_1}{D_2}, \qquad y_2'' = \mu'\,\frac{a_2(x) - y_2}{D_1} + \mu\,\frac{b_2(x) - y_2}{D_2} \tag{14.36}$$

where 

$$D_1 = \bigl((y_1 - a_1(x))^2 + (y_2 - a_2(x))^2\bigr)^{3/2}, \qquad D_2 = \bigl((y_1 - b_1(x))^2 + (y_2 - b_2(x))^2\bigr)^{3/2}$$

and the movements of sun and moon are described by 

$$a_1(x) = -\mu\cos x, \qquad a_2(x) = -\mu\sin x, \qquad b_1(x) = \mu'\cos x, \qquad b_2(x) = \mu'\sin x.$$

The initial values 

$$y_1(0) = 0.994, \qquad y_1'(0) = 0, \qquad y_2(0) = 0,$$
$$y_2'(0) = -2.00158510637908252240537862224 + 0.994,$$
$$x_{\mathrm{end}} = 17.0652165601579625588917206249,$$

are those of (0.2) enlarged by the speed of the rotation. The exact solution values are the initial values transformed by the rotation of the coordinate system. 
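
For readers who want to experiment with ARES, the right-hand side of (14.36) might be coded as follows. This is a sketch under assumptions: the numerical value of $\mu$ is not given in this section and is taken here as 0.012277471 with $\mu' = 1 - \mu$, and the names `ares_rhs`, `MU`, `MU_P` are ours.

```python
import math

MU = 0.012277471      # assumed value of mu (not stated in this section)
MU_P = 1.0 - MU       # assumed relation mu' = 1 - mu

def ares_rhs(x, y1, y2):
    """Right-hand side of (14.36): returns (y1'', y2'') in the fixed frame."""
    a1, a2 = -MU * math.cos(x), -MU * math.sin(x)      # position a(x)
    b1, b2 = MU_P * math.cos(x), MU_P * math.sin(x)    # position b(x)
    d1 = ((y1 - a1) ** 2 + (y2 - a2) ** 2) ** 1.5
    d2 = ((y1 - b1) ** 2 + (y2 - b2) ** 2) ** 1.5
    return (MU_P * (a1 - y1) / d1 + MU * (b1 - y1) / d2,
            MU_P * (a2 - y2) / d1 + MU * (b2 - y2) / d2)

# Initial data as listed above.
Y0 = (0.994, 0.0)
YP0 = (0.0, -2.00158510637908252240537862224 + 0.994)
X_END = 17.0652165601579625588917206249

print(ares_rhs(0.0, *Y0))
```

Since the right-hand side does not depend on $y'$, the function has exactly the form $f(x, y)$ required by the special methods of this section.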

CPEN — the nonlinear Coupled PENdulum (see Fig. 14.3). 



Fig. 14.3. Coupled pendulum 


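The kinetic as well as potential energies 

$$T = \frac{m_1 l_1^2\dot\varphi_1^2}{2} + \frac{m_2 l_2^2\dot\varphi_2^2}{2}$$

$$V = -m_1 l_1\cos\varphi_1 - m_2 l_2\cos\varphi_2 + \frac{c_0 r^2(\sin\varphi_1 - \sin\varphi_2)^2}{2}$$

lead by Lagrange theory (equations (I.6.21)) to 

$$\ddot\varphi_1 = -\frac{\sin\varphi_1}{l_1} - \frac{c_0 r^2}{m_1 l_1^2}(\sin\varphi_1 - \sin\varphi_2)\cos\varphi_1 + f(t)$$
$$\ddot\varphi_2 = -\frac{\sin\varphi_2}{l_2} - \frac{c_0 r^2}{m_2 l_2^2}(\sin\varphi_2 - \sin\varphi_1)\cos\varphi_2. \tag{14.37}$$

We choose the parameters 

$$l_1 = l_2 = 1, \quad m_1 = 1, \quad m_2 = 0.99, \quad r = 0.1, \quad c_0 = 0.01, \quad t_{\mathrm{end}} = 496$$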


and all initial values and speeds for $t = 0$ equal to zero. The first pendulum is then pushed into movement by a (somewhat idealized) hammer as 

$$f(t) = \begin{cases} \sqrt{1 - (1-t)^2} & \text{if } |t - 1| \le 1; \\ 0 & \text{otherwise.} \end{cases}$$


The resulting solutions are displayed in Fig. 14.4. The nonlinearities in this problem produce quite different sausages (cf. “Mon Oncle” de Jacques Tati 1958) from those people are accustomed to from linear problems (cf. Sommerfeld 1942, §20). 

Fig. 14.4. Movement of the coupled pendulum (14.37) 


WPLT — the Weak PLaTe, i.e., the PLATE problem of Section IV. 10 (see Volume 
II) with weakened stiffness. We use precisely the same equations as (IV.10.6) and reduce the stiffness parameter $\sigma$ from $\sigma = 100$ to $\sigma = 1/16$. We also remove the friction ($\omega = 0$ instead of $\omega = 1000$) so that the problem becomes purely of second order. It is linear, nonautonomous, and of dimension 40. 

Performance of the Codes 

Several codes were applied to each of the above four problems with 89 different tolerances between $Tol = 10^{-3}$ and $Tol = 10^{-14}$ (exactly as in Section II.10). The number of function evaluations (Fig. 14.5) and the computer time (Fig. 14.6) on a Sun Workstation (SunBlade 100) are plotted as a function of the global error at the endpoint of the integration interval. The codes used are the following: 

RKN6 — symbol — is the low order option of the Runge-Kutta-Nystrom code 
presented in Brankin, Gladwell, Dormand, Prince & Seward (1989). It is based 
on a fixed-order embedded Nystrom method of order 6(4), whose coefficients are 
given in Dormand & Prince (1987). This code is provided with a dense output. 

RKN12 — symbol E — is the high order option of the Runge-Kutta-Nystrom 
code presented in Brankin & al. (1989). It is based on the method of order 12(10), 
whose coefficients are given in Dormand, El-Mikkawy & Prince (1987). This code 
is not equipped with a dense output. 

ODEX2 — symbol O — is the extrapolation code based on formula (14.31a,b,c’) 
and uses the harmonic step number sequence (see Appendix). It is implemented 
in the same way as ODEX (the extrapolation code for first order differential equations). 
In particular, the order and step size strategy is that of Section II.9. A dense 
output is available. Similar results are obtained by the code DIFEX2 of Deuflhard 
& Bauer (see Deuflhard 1985). 

In order to demonstrate the superiority of the special methods for $y'' = f(x, y)$, we have included the results obtained by DOP853 (symbol O ) and ODEX (symbol A) which were already described in Section II.10. For their application we had to rewrite the four problems as a first order system by introducing the first derivatives as new variables. The code ODEX2 is nearly twice as efficient as ODEX, which is in agreement with the theoretical considerations. Similarly, the Runge-Kutta-Nyström codes RKN6 and RKN12 are a real improvement over DOP853. 

A comparison of Fig. 14.5 and 14.6 shows a significant difference. The extrapolation codes ODEX and ODEX2 are relatively better on the “time”-pictures than for the function evaluation counts. With the exception of problem WPLT the performance of the code ODEX2 then becomes comparable to that of RKN12. As can be observed especially for the WPLT problem, the code RKN12 significantly overshoots the desired precision for stringent tolerances. It becomes less efficient if Tol is chosen too close to Uround. 

Exercises 

1. Verify that the methods of Table 14.2 are of order 4 and 5, respectively. 

2. The error coefficients of a $p$th order Nyström method are defined by 

$$e(t) = 1 - (\varrho(t)+1)\,\gamma(t)\sum_i \bar b_i\Phi_i(t) \qquad \text{for } \varrho(t) = p,$$
$$e'(t) = 1 - \gamma(t)\sum_i b_i\Phi_i(t) \qquad \text{for } \varrho(t) = p + 1. \tag{14.38}$$

a) The assumption (14.26) implies that 

$$e(t) = -\varrho(t)\,e'(u) \qquad \text{for } \varrho(t) = p,$$

where $u$ is the N-tree obtained from $t$ by adding a branch with a meagre vertex to the root of $t$. 

b) Compute the error coefficients of Nyström's method (Table 14.1) and compare them to those of the classical Runge-Kutta method. 

3. Show that the order conditions for Runge-Kutta methods (Theorem 2.13) are a 
subset of the conditions (14.21). They correspond to the N-trees, all of whose 
vertices are fat. 


4. Sometimes the definition of order of Nyström methods (14.8) is relaxed to 

$$y(x_0 + h) - y_1 = O(h^{p+1}), \qquad y'(x_0 + h) - y_1' = O(h^{p}) \tag{14.39}$$

(see Nyström 1925). Show that the conditions (14.39) are not sufficient to obtain global convergence of order $p$. 

Hint. Investigate the asymptotic expansion of the global error with the help of Theorem 8.1 and formula (8.8). 

5. The numerical solutions $T_{kk}$ and $T_{kk}'$ of the extrapolation method of this section are equivalent to a Nyström method of order $p = 2k$ with $s = p^2/8 + p/4 + 1$ stages. 

6. A collocation method for $y'' = f(x, y, y')$ (or $y'' = f(x, y)$) can be defined as follows: let $u(x)$ be a polynomial of degree $s + 1$ defined by 

$$u(x_0) = y_0, \qquad u'(x_0) = y_0' \tag{14.40}$$
$$u''(x_0 + c_i h) = f\bigl(x_0 + c_i h,\, u(x_0 + c_i h),\, u'(x_0 + c_i h)\bigr), \qquad i = 1, \ldots, s,$$

then the numerical solution is given by $y_1 = u(x_0 + h)$, $y_1' = u'(x_0 + h)$. 

a) Prove that this collocation method is equivalent to the Nyström method (14.4) where 

$$a_{ij} = \int_0^{c_i}\ell_j(t)\,dt, \qquad \bar a_{ij} = \int_0^{c_i}(c_i - t)\,\ell_j(t)\,dt,$$
$$b_i = \int_0^1 \ell_i(t)\,dt, \qquad \bar b_i = \int_0^1 (1 - t)\,\ell_i(t)\,dt, \tag{14.41}$$

and $\ell_j(t)$ are the Lagrange polynomials of (7.17). 

b) The $a_{ij}$ satisfy $C(s)$ (see Theorem 7.8) and the $\bar a_{ij}$ satisfy (14.28) for $q = 0, 1, \ldots, s - 1$. These equations uniquely define $a_{ij}$ and $\bar a_{ij}$. 

c) In general, $a_{ij}$ and $\bar a_{ij}$ do not satisfy (14.5). 

d) If $M(t) = \prod_{i=1}^{s}(t - c_i)$ is orthogonal to all polynomials of degree $r - 1$, 

$$\int_0^1 M(t)\,t^{q-1}\,dt = 0, \qquad q = 1, \ldots, r,$$

then the collocation method (14.40) has order $p = s + r$. 

e) The polynomial $u(x)$ yields an approximation to the solution $y(x)$ on the whole interval $[x_0, x_0 + h]$. The following estimates hold: 

$$y(x) - u(x) = O(h^{s+2}), \qquad y'(x) - u'(x) = O(h^{s+1}).$$

Divide ut regnes (N. Machiavelli 1469-1527) 


In the previous section we considered direct methods for second order differential equations $y'' = f(y, y')$. The idea was to write the equation as a partitioned differential system 

$$\begin{pmatrix} y \\ y' \end{pmatrix}' = \begin{pmatrix} y' \\ f(y, y') \end{pmatrix} \tag{15.1}$$

and to discretize the two components, $y$ and $y'$, by different formulas. There are many other situations where the problem possesses a natural partitioning. Typical examples are the Hamiltonian equations (I.6.26, I.14.26) and singular perturbation problems (see Chapter VI of Volume II). It may also be of interest to separate linear and nonlinear parts or the “non-stiff” and “stiff” components of a differential equation. 

We suppose that the differential system is partitioned as 

$$\begin{pmatrix} y_a' \\ y_b' \end{pmatrix} = \begin{pmatrix} f_a(y_a, y_b) \\ f_b(y_a, y_b) \end{pmatrix} \tag{15.2}$$

where the solution vector is separated into two components $y_a$, $y_b$, each of which may itself be a vector. An extension to more components is straightforward. 

For the numerical solution of (15.2) we consider the partitioned method 

$$k_i = f_a\Bigl(y_{a0} + h\sum_{j=1}^{s} a_{ij}k_j,\; y_{b0} + h\sum_{j=1}^{s}\hat a_{ij}\ell_j\Bigr)$$
$$\ell_i = f_b\Bigl(y_{a0} + h\sum_{j=1}^{s} a_{ij}k_j,\; y_{b0} + h\sum_{j=1}^{s}\hat a_{ij}\ell_j\Bigr) \tag{15.3}$$
$$y_{a1} = y_{a0} + h\sum_{i=1}^{s} b_i k_i, \qquad y_{b1} = y_{b0} + h\sum_{i=1}^{s}\hat b_i\ell_i$$

where the coefficients $a_{ij}$, $b_i$ and $\hat a_{ij}$, $\hat b_i$ represent two different Runge-Kutta schemes. The first methods of this type are due to Hofer (1976) and Griepentrog (1978) who apply an explicit method to the nonstiff part and an implicit method to the stiff part of a differential equation. Later Rentrop (1985) modified this idea by combining explicit Runge-Kutta methods with Rosenbrock-type methods (Section IV.7). Recent interest in partitioned methods arose in connection with Hamiltonian systems (see Section II.16 below). 
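
To fix ideas, one step of a partitioned method can be sketched as follows. The sketch is restricted to scalar components and to the case where both tableaux are explicit (strictly lower triangular), so that the stages can be computed one after the other; the function name `prk_step` and the argument layout are our own choices, not part of the text.

```python
import math

def prk_step(fa, fb, ya, yb, h, A, b, A_hat, b_hat):
    """One step of a partitioned Runge-Kutta method for scalar y_a, y_b.

    (A, b) is applied to the y_a component, (A_hat, b_hat) to the y_b
    component. Both tableaux are assumed to be explicit.
    """
    s = len(b)
    k, l = [], []
    for i in range(s):
        ga = ya + h * sum(A[i][j] * k[j] for j in range(i))
        gb = yb + h * sum(A_hat[i][j] * l[j] for j in range(i))
        k.append(fa(ga, gb))   # stage derivative of y_a
        l.append(fb(ga, gb))   # stage derivative of y_b
    ya1 = ya + h * sum(b[i] * k[i] for i in range(s))
    yb1 = yb + h * sum(b_hat[i] * l[i] for i in range(s))
    return ya1, yb1

# Sanity check: with the classical Runge-Kutta method used for both
# components, the step coincides with an ordinary Runge-Kutta step.
RK4_A = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
RK4_B = [1 / 6, 1 / 3, 1 / 3, 1 / 6]

ya1, yb1 = prk_step(lambda ya, yb: yb, lambda ya, yb: -ya,
                    1.0, 0.0, 0.1, RK4_A, RK4_B, RK4_A, RK4_B)
print(ya1 - math.cos(0.1), yb1 + math.sin(0.1))
```

Implicit tableaux, as used by Hofer and Griepentrog for the stiff part, would require the solution of a nonlinear system for the stages and are not covered by this sketch.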

The subject of this section is the derivation of the order conditions for method (15.3). For order $p$ it is necessary that each of the two Runge-Kutta schemes under consideration be of order $p$. This can be seen by applying the method to $y_a' = f_a(y_a)$, $y_b' = f_b(y_b)$. But this is not sufficient: the coefficients also have to satisfy certain coupling conditions. In order to understand this, we first look at the derivatives of the exact solution of (15.2). Then we generalize the theory of B-series (see Section II.12) to the new situation (Hairer 1981) and derive the order conditions in the same way as in II.12 for Runge-Kutta methods. 


Derivatives of the Exact Solution, P-Trees 


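In order to avoid sums and unnecessary indices we assume that $y_a$ and $y_b$ in (15.2) are scalar quantities. All subsequent formulas remain valid for vectors if the derivatives are interpreted as multi-linear mappings. Differentiating (15.2) and inserting (15.2) again for the derivatives we obtain for the first component $y_a$ 

$$y_a^{(1)} = f_a \tag{15.4;1}$$

$$y_a^{(2)} = \frac{\partial f_a}{\partial y_a}f_a + \frac{\partial f_a}{\partial y_b}f_b \tag{15.4;2}$$

$$\begin{aligned} y_a^{(3)} ={}& \frac{\partial^2 f_a}{\partial y_a^2}(f_a, f_a) + \frac{\partial^2 f_a}{\partial y_a\partial y_b}(f_a, f_b) + \frac{\partial f_a}{\partial y_a}\frac{\partial f_a}{\partial y_a}f_a + \frac{\partial f_a}{\partial y_a}\frac{\partial f_a}{\partial y_b}f_b \\ &+ \frac{\partial^2 f_a}{\partial y_b\partial y_a}(f_b, f_a) + \frac{\partial^2 f_a}{\partial y_b^2}(f_b, f_b) + \frac{\partial f_a}{\partial y_b}\frac{\partial f_b}{\partial y_a}f_a + \frac{\partial f_a}{\partial y_b}\frac{\partial f_b}{\partial y_b}f_b. \end{aligned} \tag{15.4;3}$$

Similar formulas hold for the derivatives of $y_b$. 

For a graphical representation of these formulas we need two different kinds of vertices. As in Section II.14 we use “meagre” and “fat” vertices, which will correspond to $f_a$ and $f_b$, respectively. Formulas (15.4) can then be represented as shown in Fig. 15.1. 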

Fig. 15.1. The derivatives of the exact solution $y_a$: the labelled P-trees corresponding to (15.4;1), (15.4;2) and (15.4;3) 

Definition 15.1. A labelled P-tree of order $q$ is a labelled tree (see Definition 2.2) 

$$t : A_q \setminus \{j\} \to A_q$$

together with a mapping 

$$t' : A_q \to \{\text{“meagre”}, \text{“fat”}\}.$$

We denote by $LTP_q^a$ the set of those labelled P-trees of order $q$, whose root is meagre (i.e., $t'(j) = $ “meagre”). Similarly, $LTP_q^b$ is the set of $q$th order labelled P-trees with a “fat” root. 

Due to the symmetry of the second derivative the 2nd and 5th expressions in (15.4;3) are equal. We therefore define: 

Definition 15.2. Two labelled P-trees $(t, t')$ and $(u, u')$ are equivalent, if they have the same order, say $q$, and if there exists a bijection $\sigma : A_q \to A_q$ such that $\sigma(j) = j$ and the following diagram commutes: 

$$\begin{array}{ccccc} A_q\setminus\{j\} & \xrightarrow{\;t\;} & A_q & \xrightarrow{\;t'\;} & \{\text{“meagre”}, \text{“fat”}\} \\ \downarrow\sigma & & \downarrow\sigma & & \Vert \\ A_q\setminus\{j\} & \xrightarrow{\;u\;} & A_q & \xrightarrow{\;u'\;} & \{\text{“meagre”}, \text{“fat”}\} \end{array}$$

Definition 15.3. An equivalence class of $q$th order labelled P-trees is called a P-tree of order $q$. The set of all P-trees of order $q$ with a meagre root is denoted by $TP_q^a$, that with a fat root by $TP_q^b$. For a P-tree $t$ we denote by $\varrho(t)$ the order of $t$, and by $\alpha(t)$ the number of elements in the equivalence class $t$. 

Examples of P-trees together with the numbers $\varrho(t)$ and $\alpha(t)$ are given in Table 15.1 below. We first discuss a recursive representation of P-trees (extension of Definition 2.12), which is fundamental for the following theory. 

Definition 15.4. Let $t_1, \ldots, t_m$ be P-trees. We then denote by 

$$t = {}_a[t_1, \ldots, t_m] \tag{15.5}$$

the unique P-tree $t$ such that the root is “meagre” and the P-trees $t_1, \ldots, t_m$ remain if the root and the adjacent branches are chopped off. Similarly, we denote by ${}_b[t_1, \ldots, t_m]$ the P-tree whose new root is “fat” (see Fig. 15.2). We further denote by $\tau_a$ and $\tau_b$ the meagre and fat P-trees of order one. 

Fig. 15.2. Recursive definition of P-trees: $t = {}_a[t_1, t_2, t_3]$ and $t = {}_b[t_1, t_2, t_3]$ 

Our next aim is to make precise the connection between P-trees and the expressions of the formulas (15.4). For this we use the notation 

$$w(t) = \begin{cases} a & \text{if the root of } t \text{ is meagre,} \\ b & \text{if the root of } t \text{ is fat.} \end{cases} \tag{15.6}$$


Definition 15.5. The elementary differentials, corresponding to (15.2), are defined recursively by ($y = (y_a, y_b)$) 

$$F(\tau_a)(y) = f_a(y), \qquad F(\tau_b)(y) = f_b(y)$$

and 

$$F(t)(y) = \frac{\partial^m f_{w(t)}(y)}{\partial y_{w(t_1)}\cdots\partial y_{w(t_m)}}\bigl(F(t_1)(y), \ldots, F(t_m)(y)\bigr)$$

for $t = {}_a[t_1, \ldots, t_m]$ or $t = {}_b[t_1, \ldots, t_m]$. 

Elementary differentials for P-trees up to order 3 are given explicitly in Table 15.1. 

We now return to the starting-point of this section and continue the differentiation of formulas (15.4). Using the notation of labelled P-trees, one sees that a differentiation of $F(t)(y_a, y_b)$ can be interpreted as an addition of a new branch with a meagre or fat vertex and a new summation letter to each vertex of the labelled P-tree $t$. In the same way as we proved Theorem 2.6 for non-partitioned differential equations, we arrive at 

Theorem 15.6. The derivatives of the exact solution of (15.2) satisfy 

$$y_a^{(q)} = \sum_{t\in LTP_q^a} F(t)(y_a, y_b) = \sum_{t\in TP_q^a}\alpha(t)\,F(t)(y_a, y_b) \tag{15.4;q}$$

$$y_b^{(q)} = \sum_{t\in LTP_q^b} F(t)(y_a, y_b) = \sum_{t\in TP_q^b}\alpha(t)\,F(t)(y_a, y_b).$$

□ 

Table 15.1. P-trees and their elementary differentials 


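repr. (15.5)                 $\varrho(t)$   $\alpha(t)$   elem. differential                                                              $\Phi_j(t)$ 

$\tau_a$                     1   1   $f_a$                                                                           $1$ 
${}_a[\tau_a]$               2   1   $\frac{\partial f_a}{\partial y_a}f_a$                                          $\sum_k a_{jk}$ 
${}_a[\tau_b]$               2   1   $\frac{\partial f_a}{\partial y_b}f_b$                                          $\sum_k \hat a_{jk}$ 
${}_a[\tau_a, \tau_a]$       3   1   $\frac{\partial^2 f_a}{\partial y_a^2}(f_a, f_a)$                               $\sum_{k,l} a_{jk}a_{jl}$ 
${}_a[\tau_a, \tau_b]$       3   2   $\frac{\partial^2 f_a}{\partial y_a\partial y_b}(f_a, f_b)$                     $\sum_{k,l} a_{jk}\hat a_{jl}$ 
${}_a[\tau_b, \tau_b]$       3   1   $\frac{\partial^2 f_a}{\partial y_b^2}(f_b, f_b)$                               $\sum_{k,l} \hat a_{jk}\hat a_{jl}$ 
${}_a[{}_a[\tau_a]]$         3   1   $\frac{\partial f_a}{\partial y_a}\frac{\partial f_a}{\partial y_a}f_a$         $\sum_{k,l} a_{jk}a_{kl}$ 
${}_a[{}_a[\tau_b]]$         3   1   $\frac{\partial f_a}{\partial y_a}\frac{\partial f_a}{\partial y_b}f_b$         $\sum_{k,l} a_{jk}\hat a_{kl}$ 
${}_a[{}_b[\tau_a]]$         3   1   $\frac{\partial f_a}{\partial y_b}\frac{\partial f_b}{\partial y_a}f_a$         $\sum_{k,l} \hat a_{jk}a_{kl}$ 
${}_a[{}_b[\tau_b]]$         3   1   $\frac{\partial f_a}{\partial y_b}\frac{\partial f_b}{\partial y_b}f_b$         $\sum_{k,l} \hat a_{jk}\hat a_{kl}$ 
$\tau_b$                     1   1   $f_b$                                                                           $1$ 
${}_b[\tau_a]$               2   1   $\frac{\partial f_b}{\partial y_a}f_a$                                          $\sum_k a_{jk}$ 
${}_b[\tau_b]$               2   1   $\frac{\partial f_b}{\partial y_b}f_b$                                          $\sum_k \hat a_{jk}$ 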

P-Series 


In Section II.12 we saw the importance of the key-lemma Corollary 12.7 for the derivation of the order conditions for Runge-Kutta methods. Therefore we extend this result also to partitioned ordinary differential equations. 

It is convenient to introduce two new P-trees of order 0, namely $\emptyset_a$ and $\emptyset_b$. The corresponding elementary differentials are $F(\emptyset_a)(y) = y_a$ and $F(\emptyset_b)(y) = y_b$. We further set 

$$LTP^a = \{\emptyset_a\}\cup LTP_1^a\cup LTP_2^a\cup\ldots, \qquad LTP^b = \{\emptyset_b\}\cup LTP_1^b\cup LTP_2^b\cup\ldots$$
$$TP^a = \{\emptyset_a\}\cup TP_1^a\cup TP_2^a\cup\ldots, \qquad TP^b = \{\emptyset_b\}\cup TP_1^b\cup TP_2^b\cup\ldots \tag{15.7}$$

Definition 15.7. Let $c(\emptyset_a)$, $c(\emptyset_b)$, $c(\tau_a)$, $c(\tau_b), \ldots$ be real coefficients defined for all P-trees, i.e., $c : TP^a\cup TP^b \to \mathbb{R}$. The series 

$$P(c, y) = \bigl(P_a(c, y), P_b(c, y)\bigr)^T$$

where 

$$P_a(c, y) = \sum_{t\in LTP^a}\frac{h^{\varrho(t)}}{\varrho(t)!}\,c(t)\,F(t)(y), \qquad P_b(c, y) = \sum_{t\in LTP^b}\frac{h^{\varrho(t)}}{\varrho(t)!}\,c(t)\,F(t)(y)$$

is then called a P-series. 

Theorem 15.6 simply states that the exact solution of (15.2) is a P-series 

$$\bigl(y_a(x_0 + h), y_b(x_0 + h)\bigr)^T = P\bigl(y, (y_a(x_0), y_b(x_0))\bigr) \tag{15.8}$$

with $y(t) = 1$ for all P-trees $t$. 

Theorem 15.8. Let $c : TP^a\cup TP^b \to \mathbb{R}$ be a sequence of coefficients such that $c(\emptyset_a) = c(\emptyset_b) = 1$. Then 

$$h\begin{pmatrix} f_a\bigl(P(c, (y_a, y_b))\bigr) \\ f_b\bigl(P(c, (y_a, y_b))\bigr) \end{pmatrix} = P\bigl(c', (y_a, y_b)\bigr) \tag{15.9}$$

with 

$$c'(\emptyset_a) = c'(\emptyset_b) = 0, \qquad c'(\tau_a) = c'(\tau_b) = 1 \tag{15.10}$$
$$c'(t) = \varrho(t)\,c(t_1)\cdots c(t_m) \qquad \text{if } t = {}_a[t_1, \ldots, t_m] \text{ or } t = {}_b[t_1, \ldots, t_m].$$

The proof is related to that of Theorem 12.6. It is given with more details in Hairer (1981). □ 


Order Conditions for Partitioned Runge-Kutta Methods 

With the help of Theorem 15.8 the order conditions for method (15.3) can readily be obtained. For this we denote the arguments in (15.3) by 

$$g_i = y_{a0} + h\sum_{j=1}^{s} a_{ij}k_j, \qquad \hat g_i = y_{b0} + h\sum_{j=1}^{s}\hat a_{ij}\ell_j, \tag{15.11}$$

and we assume that $G_i = (g_i, \hat g_i)^T$ and $K_i = h(k_i, \ell_i)^T$ are P-series with coefficients $G_i(t)$ and $K_i(t)$, respectively. The formulas (15.11) then yield $G_i(\emptyset_a) = 1$, $G_i(\emptyset_b) = 1$ and 

$$G_i(t) = \begin{cases} \sum_{j=1}^{s} a_{ij}K_j(t) & \text{if the root of } t \text{ is meagre,} \\ \sum_{j=1}^{s}\hat a_{ij}K_j(t) & \text{if the root of } t \text{ is fat.} \end{cases} \tag{15.12}$$

Application of Theorem 15.8 to the relations $k_j = f_a(G_j)$, $\ell_j = f_b(G_j)$ shows that $K_j(t) = G_j'(t)$ which, together with (15.10) and (15.12), recursively defines the values $K_j(t)$. 

It is usual to write $K_j(t) = \gamma(t)\Phi_j(t)$ where $\gamma(t)$ is the integer given in Definition 2.10 (see also (2.17)). The coefficient $\Phi_j(t)$ is then obtained in the same way as the corresponding value of standard Runge-Kutta methods (see Definition 2.9) with the exception that a factor $a_{ik}$ has to be replaced by $\hat a_{ik}$, if the vertex with label “$k$” is fat. A comparison of the P-series for the numerical solution $(y_{a1}, y_{b1})^T$ with that for the exact solution (15.8) yields the desired order conditions. 

Theorem 15.9. A partitioned Runge-Kutta method (15.3) is of order $p$ iff 

$$\sum_{j=1}^{s} b_j\Phi_j(t) = \frac{1}{\gamma(t)} \quad \text{for } t\in TP^a \qquad \text{and} \qquad \sum_{j=1}^{s}\hat b_j\Phi_j(t) = \frac{1}{\gamma(t)} \quad \text{for } t\in TP^b \tag{15.13}$$

for all P-trees of order $\le p$. 

□ 

□ 


Example. A partitioned method (15.3) is of order 2, if and only if each of the two Runge-Kutta schemes has order 2 and if the coupling conditions 

$$\sum_{i,j} b_i\hat a_{ij} = \frac12, \qquad \sum_{i,j}\hat b_i a_{ij} = \frac12,$$

which correspond to trees ${}_a[\tau_b]$ and ${}_b[\tau_a]$ of Table 15.1 respectively, are satisfied. This happens if 

$$c_i = \hat c_i \qquad \text{for all } i.$$

This last assumption simplifies the order conditions considerably (the “thickness” of terminating vertices then has no influence). The resulting conditions for order up to 4 have been tabulated by Griepentrog (1978). 
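
The effect of the two coupling conditions is easily checked for concrete tableaux. In the following sketch (exact rational arithmetic; the function name and the choice of the two sample schemes are ours) Heun's second order method is combined once with itself and once with the explicit midpoint rule. Both schemes have order 2 on their own, but their nodes differ ($c = (0, 1)$ versus $\hat c = (0, 1/2)$), so the mixed pair violates both coupling conditions.

```python
from fractions import Fraction as Fr

def coupling_residuals(A, b, A_hat, b_hat):
    """Residuals of the two order-2 coupling conditions of a partitioned pair."""
    s = len(b)
    # sum_{i,j} b_i * a_hat_ij - 1/2
    r_a = sum(b[i] * A_hat[i][j] for i in range(s) for j in range(s)) - Fr(1, 2)
    # sum_{i,j} b_hat_i * a_ij - 1/2
    r_b = sum(b_hat[i] * A[i][j] for i in range(s) for j in range(s)) - Fr(1, 2)
    return r_a, r_b

# Two explicit 2-stage schemes of order 2, each given as (A, b).
heun = ([[Fr(0), Fr(0)], [Fr(1), Fr(0)]], [Fr(1, 2), Fr(1, 2)])
midpoint = ([[Fr(0), Fr(0)], [Fr(1, 2), Fr(0)]], [Fr(0), Fr(1)])

print(coupling_residuals(*heun, *heun))      # both residuals vanish
print(coupling_residuals(*heun, *midpoint))  # both residuals are nonzero
```

Vanishing residuals are of course only the order 2 part of Theorem 15.9; for higher order all P-trees up to that order have to be checked.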


Further Applications of P-Series 


Runge-Kutta methods violating (1.9). For the non-autonomous differential equation $y' = f(x, y)$ we consider, as in Exercise 6 of Section II.1, the Runge-Kutta method 

$$k_i = f\Bigl(x_0 + \hat c_i h,\; y_0 + h\sum_{j=1}^{s} a_{ij}k_j\Bigr), \qquad y_1 = y_0 + h\sum_{i=1}^{s} b_i k_i, \tag{15.14}$$

where $\hat c_i$ is not necessarily equal to $c_i = \sum_j a_{ij}$. Therefore, the $x$ and $y$ components in 

$$y' = f(x, y), \qquad x' = 1 \tag{15.15}$$

are integrated differently. This system is of the form (15.2), if we put $y_a = y$, $y_b = x$, $f_a(y_a, y_b) = f(x, y)$ and $f_b(y_a, y_b) = 1$. Since $f_b$ is constant, all elementary differentials that involve derivatives of $f_b$ vanish identically. Thus, P-trees where at least one fat vertex is not an end-vertex need not be considered. It remains to treat the set 

$$T_x = \{t\in TP^a\,;\ \text{all fat vertices are end-vertices}\}. \tag{15.16}$$

Each tree of $T_x$ gives rise to an order condition which is exactly that of Theorem 15.9. It is obtained in the usual way (Section II.2) with the exception that $c_k$ has to be replaced by $\hat c_k$, if the corresponding vertex is a fat one. 

Fehlberg methods. The methods of Fehlberg, introduced in Section II.13, are equivalent to (15.14). However, it is known that the exact solution of the differential equation $y' = f(x, y)$ satisfies $y(x_0) = 0$, $y'(x_0) = 0, \ldots, y^{(m)}(x_0) = 0$ at the initial value $x = x_0$. As explained in II.13, this implies that the expressions $f$, $\partial f/\partial x, \ldots, \partial^{m-1}f/\partial x^{m-1}$ vanish at $(x_0, y_0)$ and consequently also many of the elementary differentials disappear. The elements of $T_x$ which remain to be considered are given in Fig. 15.3. 


Fig. 15.3. P-trees for the methods of Fehlberg 


Nyström methods. As a last application of Theorem 15.8 we present a new derivation of the order conditions for Nyström methods (Section II.14). The second order differential equation $y'' = f(y, y')$ can be written in partitioned form as 

$$\begin{pmatrix} y \\ y' \end{pmatrix}' = \begin{pmatrix} y' \\ f(y, y') \end{pmatrix}. \tag{15.17}$$

In the notation of (15.2) we have $y_a = y$, $y_b = y'$, $f_a(y_a, y_b) = y_b$, $f_b(y_a, y_b) = f(y_a, y_b)$. The special structure of $f_a$ implies that only P-trees which satisfy the condition (see Definition 14.2) 

“meagre vertices have at most one son and this son has to be fat” (15.18) 

have to be considered. The essential P-trees are thus 

$$TN_q^a = \{t\in TP_q^a\,;\ t \text{ satisfies (15.18)}\}$$
$$TN_q^b = \{t\in TP_q^b\,;\ t \text{ satisfies (15.18)}\}.$$

It follows that each element of $TN_{q+1}^a$ can be written as $t = {}_a[u]$ with $u\in TN_q^b$. This implies a one-to-one correspondence between $TN_{q+1}^a$ and $TN_q^b$, leaving the elementary differentials invariant: 

$$F({}_a[u])(y_a, y_b) = \frac{\partial f_a}{\partial y_b}\cdot F(u)(y_a, y_b) = F(u)(y_a, y_b).$$

From this property it follows that 

$$h\,P_b\bigl(c, (y_a, y_b)\bigr) = P_a\bigl(c', (y_a, y_b)\bigr) \tag{15.19}$$

where $c'(\emptyset_a) = 0$, $c'(\tau_a) = c(\emptyset_b)$ and 

$$c'(t) = \varrho(t)\,c(u) \qquad \text{if } t = {}_a[u]. \tag{15.20}$$

This notation is in agreement with (15.10). 

The order conditions of method (14.13) can now be derived as follows: assume $g_i$, $g_i'$ to be P-series 

$$g_i = P_a\bigl(c_i, (y_0, y_0')\bigr), \qquad g_i' = P_b\bigl(c_i, (y_0, y_0')\bigr).$$

Theorem 15.8 then implies that 

$$h f(g_i, g_i') = P_b\bigl(c_i', (y_0, y_0')\bigr). \tag{15.21}$$

Multiplying this relation by $h$ it follows from (15.19) that 

$$h^2 f(g_i, g_i') = P_a\bigl(c_i'', (y_0, y_0')\bigr). \tag{15.22}$$

Here $c_i'' = (c_i')'$, i.e., 

$$c_i''(t) = 0 \quad \text{for } t = \emptyset_a \text{ and } t = \tau_a, \qquad c_i''({}_a[\tau_b]) = 2,$$
$$c_i''(t) = \varrho(t)(\varrho(t) - 1)\,c_i(t_1)\cdots c_i(t_m) \qquad \text{if } t = {}_a[{}_b[t_1, \ldots, t_m]].$$

The relations (15.21) and (15.22), when inserted into (14.13), yield 

$$c_i(\tau_a) = c_i,$$
$$c_i(t) = \begin{cases} \sum_j \bar a_{ij}\,c_j''(t) & \text{if the root of } t \text{ is meagre,} \\ \sum_j a_{ij}\,c_j'(t) & \text{if the root of } t \text{ is fat.} \end{cases}$$

Finally, a comparison of the P-series for the exact and numerical solutions gives the order conditions (for order $p$) 

$$\sum_i \bar b_i\,c_i''(t) = 1 \qquad \text{for } t\in TN_q^a,\ q = 2, \ldots, p$$
$$\sum_i b_i\,c_i'(t) = 1 \qquad \text{for } t\in TN_q^b,\ q = 1, \ldots, p. \tag{15.23}$$

Exercises 

1. Denote the number of elements of $TP_q^a$ (P-trees with meagre root of order $q$) by $\alpha_q$ (see Table 15.2). Prove that 

$$\alpha_1 + \alpha_2 x + \alpha_3 x^2 + \ldots = (1 - x)^{-2\alpha_1}(1 - x^2)^{-2\alpha_2}(1 - x^3)^{-2\alpha_3}\cdots$$

Compute the first $\alpha_q$ and compare them with the $a_q$ of Table 2.1. 

Table 15.2. Number of elements of $TP_q^a$ 

$q$          1   2   3   4    5     6     7      8      9       10 
$\alpha_q$   1   2   7   26   107   458   2058   9498   44947   216598 
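
The numerical part of this exercise can be checked with a few lines of Python. The sketch below (the function name is ours) expands the product formula term by term: the coefficient of $x^{q-1}$ is already final once the factors with index less than $q$ have been multiplied in, which yields $\alpha_q$; then the factor $(1 - x^q)^{-2\alpha_q}$ is applied by $2\alpha_q$ successive divisions by $(1 - x^q)$.

```python
def count_p_trees(n_max):
    """Return [alpha_1, ..., alpha_{n_max}] from the product formula."""
    series = [0] * n_max          # coefficients of x^0, ..., x^(n_max - 1)
    series[0] = 1
    alpha = []
    for q in range(1, n_max + 1):
        a_q = series[q - 1]       # factors with index >= q cannot change x^(q-1)
        alpha.append(a_q)
        if q < n_max:
            # multiply the truncated series by (1 - x^q)^(-2 * a_q)
            for _ in range(2 * a_q):
                for n in range(q, n_max):
                    series[n] += series[n - q]
    return alpha

print(count_p_trees(10))
```

This only reproduces the numbers of Table 15.2; the proof of the product formula itself remains the actual exercise.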


2. There is no explicit, 4-stage Runge-Kutta method of order 4, which does not 
satisfy condition (1.9). 

Hint. Use the techniques of the proof of Lemma 1.4. 

3. Show that the order conditions (15.23) are the same as those given in Theorem 
14.10. 

4. Show that the partitioned method of Griepentrog (1978) 

          $a_{ij}$                              $\hat a_{ij}$ 
   0  |                           0  |  0 
  1/2 |  1/2                     1/2 |  -β/2        (1+β)/2 
   1  |  -1    2                  1  |  (3+5β)/2    -(1+3β)    (1+β)/2 
  ----+----------------          ----+---------------------------------- 
      |  1/6   2/3   1/6             |  1/6         2/3        1/6 

with $\beta = \sqrt{3}/3$ is of order 3 (the implicit method to the right is A-stable and is provided for the stiff part of the problem). 

It is natural to look forward to those discrete systems which 
preserve as much as possible the intrinsic properties of the 
continuous system. (Feng Kang 1985) 

Y.V. Rakitskii proposed ... a requirement of the most complete 
conformity between two dynamical systems: one resulting from 
the original differential equations and the other resulting from the 
difference equations of the computational method. 

(Y.B. Suris 1989) 


Hamiltonian systems, given by 

$$\dot p_i = -\frac{\partial H}{\partial q_i}(p, q), \qquad \dot q_i = \frac{\partial H}{\partial p_i}(p, q), \tag{16.1}$$

have been seen to possess two remarkable properties: 

a) the solutions preserve the Hamiltonian $H(p, q)$ (Ex. 5 of Section I.6); 

b) the corresponding flow is symplectic, i.e., preserves the differential 2-form 

$$\omega^2 = \sum_{i=1}^{n} dp_i\wedge dq_i \tag{16.2}$$

(see Theorem I.14.12). In particular, the flow is volume preserving. 

Both properties are usually destroyed by a numerical method applied to (16.1). 

After some pioneering papers (de Vogelaere 1956, Ruth 1983, and Feng Kang 1985) an enormous avalanche of research started around 1988 on the characterization of existing numerical methods which preserve symplecticity or on the construction of new classes of symplectic methods. An excellent overview is presented by Sanz-Serna (1992). 

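Example 16.1. We consider the harmonic oscillator 

$$H(p, q) = \tfrac12\bigl(p^2 + k^2 q^2\bigr). \tag{16.3}$$

Here (16.1) becomes 

$$\dot p = -k^2 q, \qquad \dot q = p \tag{16.4}$$

and we study the action of several steps of a numerical method on a well-known set of initial data $(p_0, q_0)$ (see Fig. 16.1): 

a) The explicit Euler method (I.7.3) 

$$\begin{pmatrix} p_m \\ q_m \end{pmatrix} = \begin{pmatrix} 1 & -hk^2 \\ h & 1 \end{pmatrix}\begin{pmatrix} p_{m-1} \\ q_{m-1} \end{pmatrix}, \qquad h = \frac{\pi}{8k}, \quad m = 1, \ldots, 16; \tag{16.5a}$$

b) the implicit (or backward) Euler method (7.3) 

$$\begin{pmatrix} p_m \\ q_m \end{pmatrix} = \frac{1}{1 + h^2k^2}\begin{pmatrix} 1 & -hk^2 \\ h & 1 \end{pmatrix}\begin{pmatrix} p_{m-1} \\ q_{m-1} \end{pmatrix}, \qquad h = \frac{\pi}{8k}, \quad m = 1, \ldots, 16; \tag{16.5b}$$

c) Runge's method (1.4) of order 2 

$$\begin{pmatrix} p_m \\ q_m \end{pmatrix} = \begin{pmatrix} 1 - \frac{h^2k^2}{2} & -hk^2 \\ h & 1 - \frac{h^2k^2}{2} \end{pmatrix}\begin{pmatrix} p_{m-1} \\ q_{m-1} \end{pmatrix}, \qquad h = \frac{\pi}{4k}, \quad m = 1, \ldots, 8; \tag{16.5c}$$

d) the implicit midpoint rule (7.4) of order 2 

$$\begin{pmatrix} p_m \\ q_m \end{pmatrix} = \frac{1}{1 + \frac{h^2k^2}{4}}\begin{pmatrix} 1 - \frac{h^2k^2}{4} & -hk^2 \\ h & 1 - \frac{h^2k^2}{4} \end{pmatrix}\begin{pmatrix} p_{m-1} \\ q_{m-1} \end{pmatrix}, \qquad h = \frac{\pi}{4k}, \quad m = 1, \ldots, 8. \tag{16.5d}$$

For the exact flow, the last of all these cats would precisely coincide with the first one and all cats would have the same area. Only the last method appears to be area preserving. It also preserves the Hamiltonian in this example. 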
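
Since the four methods act on the oscillator (16.4) as linear maps, the area statement can be checked through the determinants of their $2\times 2$ matrices. The following sketch builds these matrices from the definitions of the methods applied to $\dot p = -k^2 q$, $\dot q = p$ (our own derivation, so the entries should be regarded as assumptions of the sketch) and prints the determinant of each one-step map.

```python
import math

def one_step_matrices(h, k):
    """One-step matrices acting on (p, q) for p' = -k^2 q, q' = p."""
    z = h * h * k * k
    d = 1 + z / 4
    return {
        "explicit Euler": [[1.0, -h * k * k], [h, 1.0]],
        "implicit Euler": [[1 / (1 + z), -h * k * k / (1 + z)],
                           [h / (1 + z), 1 / (1 + z)]],
        "Runge": [[1 - z / 2, -h * k * k], [h, 1 - z / 2]],
        "implicit midpoint": [[(1 - z / 4) / d, -h * k * k / d],
                              [h / d, (1 - z / 4) / d]],
    }

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def hamiltonian(p, q, k):
    return 0.5 * (p * p + k * k * q * q)

k = (math.sqrt(5) + 1) / 2
for name, m in one_step_matrices(math.pi / (8 * k), k).items():
    # a determinant of 1 means that areas are preserved by one step
    print(name, det(m))
```

Algebraically the determinants are $1 + h^2k^2$, $1/(1 + h^2k^2)$, $1 + h^4k^4/4$ and $1$, respectively, so that only the implicit midpoint rule leaves areas unchanged.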



Fig. 16.1. Destruction of symplecticity of a Hamiltonian flow, $k = (\sqrt{5} + 1)/2$ 


Example 16.2. For a nonlinear problem we choose 

rf(p,«) = y- cos(g)(l-|) (16.6) 

which is similar to the Hamiltonian of the pendulum (1.14.25), but with some of 
the pendulum’s symmetry destroyed. Fig. 16.2 presents 12000 consecutive solution 
values for 
a) Runge's method of order 2 (see (1.4));

b) the implicit Radau method with $s = 2$ and order 3 (see Exercise 6 of Section II.7);

c) the implicit midpoint rule (7.4) of order 2.

The initial values are

$$p_0 = 0, \qquad q_0 = \begin{cases} \arccos(0.5) = \pi/3 & \text{for case (a)} \\ \arccos(-0.8) & \text{for cases (b) and (c).} \end{cases}$$

The computation is done with fixed step sizes

$$h = \begin{cases} 0.15 & \text{for case (a)} \\ 0.3 & \text{for cases (b) and (c).} \end{cases}$$

The solution of method (a) spirals out, that of method (b) spirals in and both by 
no means preserve the Hamiltonian. Method (c) behaves differently. Although the 
Hamiltonian is not precisely preserved (see picture (d)), its error remains bounded 
for long-scale computations. 




Fig. 16.2. A nonlinear pendulum and behaviour of H 
(• ... indicates the initial position) 
Symplectic Runge-Kutta Methods


For a given Hamiltonian system (16.1), for a chosen one-step method (in particular 
a Runge-Kutta method) and a chosen step size h we denote by 


$$\psi_h :\ \mathbb{R}^{2n} \to \mathbb{R}^{2n}, \qquad (p_0, q_0) \mapsto (p_1, q_1) \tag{16.7}$$

the transformation defined by the method.


Remark. For implicit methods the numerical solution $(p_1, q_1)$ need not exist for all
$h$ and all initial values $(p_0, q_0)$, nor need it be uniquely determined (see Exercise 2).
Therefore we usually will have to restrict the domain where $\psi_h$ is defined and we
will have to select a solution of the nonlinear system such that $\psi_h$ is differentiable
on this domain. The subsequent results hold for all possible choices of $h$.

Definition 16.4. A one-step method is called symplectic if for every smooth
Hamiltonian $H$ and for every step size $h$ the mapping $\psi_h$ is symplectic (see Definition
I.14.11), i.e., preserves the differential 2-form $\omega^2$ of (16.2).


We start with the easiest result. 


Theorem 16.5. The implicit s -stage Gauss methods of order 2s (Kuntzmann & 
Butcher methods of Section II.7) are symplectic for all s. 

Proof. We simplify the notation by putting $h = 1$ and $t_0 = 0$ and use the fact that the
methods under consideration are collocation methods, i.e., the numerical solution
after one step is defined by $(u(1), v(1))$ where $(u(t), v(t))$ are polynomials of
degree $s$ such that

$$\begin{aligned} u(0) &= p_0, & u'(c_i) &= -\frac{\partial H}{\partial q}\bigl(u(c_i), v(c_i)\bigr) \\ v(0) &= q_0, & v'(c_i) &= \frac{\partial H}{\partial p}\bigl(u(c_i), v(c_i)\bigr) \end{aligned} \qquad i = 1,\ldots,s. \tag{16.8}$$

The polynomials $u(t)$ and $v(t)$ are now considered as functions of the initial
values. For arbitrary variations $\xi_1^0$ and $\xi_2^0$ of the initial point we denote the
corresponding variations of $u$ and $v$ as

$$\xi_1^t = \frac{\partial(u(t), v(t))}{\partial(p_0, q_0)}\,\xi_1^0, \qquad \xi_2^t = \frac{\partial(u(t), v(t))}{\partial(p_0, q_0)}\,\xi_2^0.$$

Symplecticity of the method means that the expression

$$\omega^2(\xi_1^1, \xi_2^1) - \omega^2(\xi_1^0, \xi_2^0) = \int_0^1 \frac{d}{dt}\,\omega^2(\xi_1^t, \xi_2^t)\,dt \tag{16.9}$$

should vanish. Since $\xi_1^t$ and $\xi_2^t$ are polynomials in $t$ of degree $s$, the expression
$\frac{d}{dt}\,\omega^2(\xi_1^t, \xi_2^t)$ is a polynomial of degree $2s - 1$. We can thus exactly integrate (16.9)
by the Gaussian quadrature formula and so obtain

$$\omega^2(\xi_1^1, \xi_2^1) - \omega^2(\xi_1^0, \xi_2^0) = \sum_{i=1}^{s} b_i\,\frac{d}{dt}\,\omega^2(\xi_1^t, \xi_2^t)\Big|_{t=c_i}. \tag{16.9'}$$

Differentiation of (16.8) with respect to $(p_0, q_0)$ shows that $(\xi_1^t, \xi_2^t)$ satisfies the
variational equation (I.14.27) at the collocation points $t = c_i$, $i = 1,\ldots,s$.
Therefore, the computations of the proof of Theorem I.14.12 imply that

$$\frac{d}{dt}\,\omega^2(\xi_1^t, \xi_2^t)\Big|_{t=c_i} = 0 \qquad \text{for } i = 1,\ldots,s. \tag{16.10}$$

This, introduced into (16.9'), completes the proof of symplecticity. □


The following theorem, discovered independently by at least three authors 
(F. Lasagni 1988, J.M. Sanz-Serna 1988, Y.B. Suris 1989) characterizes the class 
of all symplectic Runge-Kutta methods: 

Theorem 16.6. If the $s \times s$ matrix $M$ with elements

$$m_{ij} = b_i a_{ij} + b_j a_{ji} - b_i b_j, \qquad i, j = 1,\ldots,s \tag{16.11}$$

satisfies $M = 0$, then the Runge-Kutta method (7.7) is symplectic.

Proof. The matrix M has been known from nonlinear stability theory for many 
years (see Theorem IV. 12.4). Both theorems have very similar proofs, the one 
works with the inner product, the other with the exterior product. 

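We write method (7.7) applied to problem (16.1) as

$$P_i = p_0 + h\sum_j a_{ij} k_j, \qquad Q_i = q_0 + h\sum_j a_{ij} \ell_j \tag{16.12a}$$

$$p_1 = p_0 + h\sum_i b_i k_i, \qquad q_1 = q_0 + h\sum_i b_i \ell_i \tag{16.12b}$$

$$k_i = -\frac{\partial H}{\partial q}(P_i, Q_i), \qquad \ell_i = \frac{\partial H}{\partial p}(P_i, Q_i), \tag{16.12c}$$

denote the $J$th component of a vector by an upper index $J$ and introduce the linear
maps (one-forms)

$$dp_1^J :\ \mathbb{R}^{2n} \to \mathbb{R},\ \ \xi \mapsto \frac{\partial p_1^J}{\partial(p_0, q_0)}\,\xi, \qquad dP_i^J :\ \mathbb{R}^{2n} \to \mathbb{R},\ \ \xi \mapsto \frac{\partial P_i^J}{\partial(p_0, q_0)}\,\xi \tag{16.13}$$

and similarly also $dp_0^J$, $dk_i^J$, $dq_0^J$, $dq_1^J$, $dQ_i^J$, $d\ell_i^J$ (the one-forms $dp_0^J$ and
$dq_0^J$ correspond to $dp_J$ and $dq_J$ of Section I.14). Using the notation (16.13),
symplecticity of the method is equivalent to

$$\sum_{J=1}^{n} dp_1^J \wedge dq_1^J = \sum_{J=1}^{n} dp_0^J \wedge dq_0^J. \tag{16.14}$$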

To check this relation we differentiate (16.12) with respect to the initial values and 
obtain 


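$$dP_i^J = dp_0^J + h\sum_j a_{ij}\,dk_j^J, \qquad dQ_i^J = dq_0^J + h\sum_j a_{ij}\,d\ell_j^J \tag{16.15a}$$

$$dp_1^J = dp_0^J + h\sum_i b_i\,dk_i^J, \qquad dq_1^J = dq_0^J + h\sum_i b_i\,d\ell_i^J \tag{16.15b}$$

$$dk_i^J = -\sum_{L=1}^{n} \frac{\partial^2 H}{\partial q^J \partial p^L}(P_i, Q_i)\,dP_i^L - \sum_{L=1}^{n} \frac{\partial^2 H}{\partial q^J \partial q^L}(P_i, Q_i)\,dQ_i^L \tag{16.15c}$$

$$d\ell_i^J = \sum_{L=1}^{n} \frac{\partial^2 H}{\partial p^J \partial p^L}(P_i, Q_i)\,dP_i^L + \sum_{L=1}^{n} \frac{\partial^2 H}{\partial p^J \partial q^L}(P_i, Q_i)\,dQ_i^L. \tag{16.15d}$$

We now compute

$$dp_1^J \wedge dq_1^J - dp_0^J \wedge dq_0^J = h\sum_i b_i\,dp_0^J \wedge d\ell_i^J + h\sum_i b_i\,dk_i^J \wedge dq_0^J + h^2\sum_{i,j} b_i b_j\,dk_i^J \wedge d\ell_j^J \tag{16.16}$$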

by using (16.15b) and the multilinearity of the wedge product. This formula
corresponds precisely to (IV.12.6). Exactly as in the proof of Theorem IV.12.5, we now
eliminate in (16.16) the quantities $dp_0^J$ and $dq_0^J$ with the help of (16.15a) to obtain

$$dp_1^J \wedge dq_1^J - dp_0^J \wedge dq_0^J = h\sum_i b_i\,dP_i^J \wedge d\ell_i^J + h\sum_i b_i\,dk_i^J \wedge dQ_i^J - h^2\sum_{i,j} m_{ij}\,dk_i^J \wedge d\ell_j^J, \tag{16.17}$$

the formula analogous to (IV.12.7). Equations (16.15c,d) are perfect analogues of
the variational equation (I.14.27). Therefore the same computations as in (I.14.39)
give

$$\sum_{J=1}^{n} dP_i^J \wedge d\ell_i^J + \sum_{J=1}^{n} dk_i^J \wedge dQ_i^J = 0 \tag{16.18}$$

and the first two terms in (16.17) disappear. The last term vanishes by hypothesis
(16.11) and we obtain (16.14). □


Remark. F. Lasagni (1990) has proved in an unpublished manuscript that for irre¬ 
ducible methods (see Definitions IV. 12.15 and IV. 12.17) the condition M = 0 is 
also necessary for symplecticity. For a publication see Abia & Sanz-Serna (1993, 
Theorem 5.1), where this proof has been elaborated and adapted to a more general 
setting. 
Remarks. a) Explicit Runge-Kutta methods are never symplectic (Ex. 1).

b) Equations (16.11) imply a substantial simplification of the order conditions 
(Sanz-Serna & Abia 1991). We shall return to this when treating partitioned meth¬ 
ods (see (16.40)). 

c) An important tool for the construction of symplectic methods is the
W-transformation (see Section IV.5, especially Theorem IV.5.6). As can be seen from
formula (IV.12.10), the method under consideration is symplectic if and only if the
matrix $X$ is skew-symmetric (with the exception of $x_{11} = 1/2$). Sun Geng
(1992) constructed several new classes of symplectic Runge-Kutta methods. One of
his methods, based on Radau quadrature, is given in Table 16.1.

d) An inspection of Table IV.5.14 shows that all Radau IA, Radau IIA,
Lobatto IIIA (in particular the trapezoidal rule), and Lobatto IIIC methods are not
symplectic.


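Table 16.1. Sun Geng's symplectic Radau method of order 5

$$\begin{array}{c|ccc}
\dfrac{4-\sqrt{6}}{10} & \dfrac{16-\sqrt{6}}{72} & \dfrac{328-167\sqrt{6}}{1800} & \dfrac{-2+3\sqrt{6}}{450} \\[2mm]
\dfrac{4+\sqrt{6}}{10} & \dfrac{328+167\sqrt{6}}{1800} & \dfrac{16+\sqrt{6}}{72} & \dfrac{-2-3\sqrt{6}}{450} \\[2mm]
1 & \dfrac{85-10\sqrt{6}}{180} & \dfrac{85+10\sqrt{6}}{180} & \dfrac{1}{18} \\[2mm]
\hline
 & \dfrac{16-\sqrt{6}}{36} & \dfrac{16+\sqrt{6}}{36} & \dfrac{1}{9}
\end{array}$$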
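The symplecticity condition of Theorem 16.6 is easy to test numerically for a given tableau. The sketch below evaluates the matrix M of (16.11) for the coefficients as read from Table 16.1 and, for contrast, for the two-stage Radau IIA method (coefficients quoted from the standard literature, not from this section). The helper names `m_matrix` and `max_abs` are illustrative assumptions.

```python
import math

def m_matrix(A, b):
    """Matrix M with m_ij = b_i a_ij + b_j a_ji - b_i b_j, cf. (16.11)."""
    s = len(b)
    return [[b[i] * A[i][j] + b[j] * A[j][i] - b[i] * b[j] for j in range(s)]
            for i in range(s)]

def max_abs(M):
    """Largest entry of M in absolute value."""
    return max(abs(x) for row in M for x in row)

r6 = math.sqrt(6.0)

# Coefficients as read from Table 16.1 (symplectic Radau method of order 5)
A_sun = [[(16 - r6) / 72, (328 - 167 * r6) / 1800, (-2 + 3 * r6) / 450],
         [(328 + 167 * r6) / 1800, (16 + r6) / 72, (-2 - 3 * r6) / 450],
         [(85 - 10 * r6) / 180, (85 + 10 * r6) / 180, 1 / 18]]
b_sun = [(16 - r6) / 36, (16 + r6) / 36, 1 / 9]

# Two-stage Radau IIA method of order 3 (not symplectic, see remark d)
A_radau = [[5 / 12, -1 / 12], [3 / 4, 1 / 4]]
b_radau = [3 / 4, 1 / 4]

residual_sun = max_abs(m_matrix(A_sun, b_sun))
residual_radau = max_abs(m_matrix(A_radau, b_radau))
print("max |m_ij|, Table 16.1:", residual_sun)
print("max |m_ij|, Radau IIA :", residual_radau)
```

For Table 16.1 the residual is at rounding level, whereas for Radau IIA the entry m_11 = 1/16 is clearly nonzero.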


Preservation of the Hamiltonian and of first integrals. In Exercise 5 of
Section I.6 we have seen that the Hamiltonian $H(p, q)$ is a first integral of the system
(16.1). This means that every solution $p(t), q(t)$ of (16.1) satisfies $H(p(t), q(t)) = \text{Const}$.
The numerical solution of a symplectic integrator does not share this
property in general (see Fig. 16.2). However, we will show that every quadratic first
integral will be preserved.

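Denote $y = (p, q)$ and let $G$ be a symmetric $2n \times 2n$ matrix. We suppose that
the quadratic functional

$$\langle y, y \rangle_G := y^T G y$$

is a first integral of the system (16.1). This means that

$$\langle y, J^{-1}\operatorname{grad} H(y) \rangle_G = 0 \qquad \text{with} \qquad J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \tag{16.19}$$

for all $y \in \mathbb{R}^{2n}$.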
Theorem 16.7 (Sanz-Serna 1988). A symplectic Runge-Kutta method (i.e., a
method satisfying (16.11)) leaves all quadratic first integrals of the system (16.1)
invariant, i.e., the numerical solution $y_n = (p_n, q_n)$ satisfies

$$\langle y_1, y_1 \rangle_G = \langle y_0, y_0 \rangle_G \tag{16.20}$$

for all symmetric matrices $G$ satisfying (16.19).


Proof (Cooper 1987). The Runge-Kutta method (7.7) applied to problem (16.1) is
given by

$$y_1 = y_0 + h\sum_i b_i k_i, \qquad Y_i = y_0 + h\sum_j a_{ij} k_j, \qquad k_i = J^{-1}\operatorname{grad} H(Y_i). \tag{16.21}$$

As in the proof of Theorem 16.6 (see also Theorem IV.12.4) we obtain

$$\langle y_1, y_1 \rangle_G - \langle y_0, y_0 \rangle_G = 2h\sum_i b_i \langle Y_i, k_i \rangle_G - h^2\sum_{i,j} m_{ij} \langle k_i, k_j \rangle_G.$$

The first term on the right-hand side vanishes by (16.19) and the second one by
(16.11). □
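A small experiment illustrates Theorem 16.7. For the harmonic oscillator (16.3) the Hamiltonian itself is a quadratic first integral, so the implicit midpoint rule (the one-stage Gauss method, which satisfies (16.11)) should conserve it up to rounding, while the explicit Euler method does not. This is a minimal sketch; the parameter values and function names are illustrative assumptions.

```python
def midpoint_step(p, q, h, k):
    """Implicit midpoint rule for p' = -k^2 q, q' = p, written as in (16.5d)."""
    w = h * h * k * k / 4.0
    s = 1.0 / (1.0 + w)
    return (s * ((1.0 - w) * p - h * k * k * q),
            s * (h * p + (1.0 - w) * q))

def euler_step(p, q, h, k):
    """Explicit Euler method (16.5a), for comparison."""
    return p - h * k * k * q, q + h * p

def energy(p, q, k):
    """Quadratic first integral H = (p^2 + k^2 q^2)/2 of (16.3)."""
    return 0.5 * (p * p + k * k * q * q)

k, h = 1.5, 0.2              # illustrative values
pm, qm = 1.0, 0.3            # midpoint solution
pe, qe = pm, qm              # Euler solution
e0 = energy(pm, qm, k)
for _ in range(1000):
    pm, qm = midpoint_step(pm, qm, h, k)
    pe, qe = euler_step(pe, qe, h, k)

drift_midpoint = abs(energy(pm, qm, k) - e0)
drift_euler = abs(energy(pe, qe, k) - e0)
print("energy drift, implicit midpoint:", drift_midpoint)
print("energy drift, explicit Euler   :", drift_euler)
```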


An Example from Galactic Dynamics 


Always majestic, usually spectacularly beautiful, galaxies 
are ... (Binney & Tremaine 1987) 

While the theoretical meaning of symplecticity of numerical methods is clear, its 
importance for practical computations is less easy to understand. Numerous numer¬ 
ical experiments have shown that symplectic methods, in a fixed step size mode, 
show an excellent behaviour for long-scale scientific computations of Hamiltonian 
systems. We shall demonstrate this on the following example chosen from galactic 
dynamics and give a theoretical justification later in this section. However, Calvo & 
Sanz-Serna (1992c) have made the interesting discovery that a variable step size
implementation can destroy the advantages of symplectic methods. In order to
illustrate this phenomenon we shall include in our computations violent step changes:
one with a random number generator and one with the step size changing as a
function of the solution position.

A galaxy is a set of N stars which are mutually attracted by Newton’s law. 
A relatively easy way to study them is to perform a long-scale computation of the 
orbit of one of its stars in the potential formed by the N — 1 remaining ones (see 
Binney & Tremaine 1987, Chapter 3); this potential is assumed to perform a
uniform rotation with time, but not to change otherwise. The potential is determined
by Poisson's differential equation $\Delta V = 4\pi G\varrho$, where $\varrho$ is the density distribution
of the galaxy, and real-life potential-density pairs are difficult to obtain (e.g., de
Zeeuw & Pfenniger 1988). A popular issue is to choose a simple formula for $V$
in such a way that the resulting $\varrho$ corresponds to a reasonable galaxy, for example
(Binney 1981, Binney & Tremaine 1987, p. 45f, Pfenniger 1990)

$$V = A \ln\Bigl(C + \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\Bigr). \tag{16.22}$$

Fig. 16.3. Galactic orbit

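The Lagrangian for a coordinate system rotating with angular velocity $\Omega$ becomes

$$\mathcal{L} = \tfrac{1}{2}\bigl((\dot x - \Omega y)^2 + (\dot y + \Omega x)^2 + \dot z^2\bigr) - V(x, y, z). \tag{16.23}$$

This gives with the coordinates (see (I.6.23))

$$p_1 = \frac{\partial \mathcal{L}}{\partial \dot x} = \dot x - \Omega y, \qquad p_2 = \frac{\partial \mathcal{L}}{\partial \dot y} = \dot y + \Omega x, \qquad p_3 = \frac{\partial \mathcal{L}}{\partial \dot z} = \dot z,$$

$$q_1 = x, \qquad q_2 = y, \qquad q_3 = z,$$

the Hamiltonian

$$\begin{aligned} H &= p_1 \dot q_1 + p_2 \dot q_2 + p_3 \dot q_3 - \mathcal{L} \\ &= \tfrac{1}{2}\bigl(p_1^2 + p_2^2 + p_3^2\bigr) + \Omega\,(p_1 q_2 - p_2 q_1) + A \ln\Bigl(C + \frac{q_1^2}{a^2} + \frac{q_2^2}{b^2} + \frac{q_3^2}{c^2}\Bigr). \end{aligned} \tag{16.24}$$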

We choose the parameters and initial values as 

$$\begin{gathered} a = 1.25, \quad b = 1, \quad c = 0.75, \quad A = 1, \quad C = 1, \quad \Omega = 0.25, \\ q_1(0) = 2.5, \quad q_2(0) = 0, \quad q_3(0) = 0, \quad p_1(0) = 0, \quad p_3(0) = 0.2, \end{gathered} \tag{16.25}$$

and take for $p_2(0)$ the larger of the roots for which $H = 2$. Our star then sets
out for its voyage through the galaxy; the orbit is represented in Fig. 16.3 for $0 < t < 15000$.
We are interested in its Poincaré sections with the half-plane $q_2 = 0$,
$q_1 > 0$, $\dot q_2 > 0$ for $0 < t < 1000000$. These consist, for the exact solution, of 47101
cut points which are presented in Fig. 16.6(l). These points were computed with the
(non-symplectic) code DOP853 with $Tol = 10^{-17}$ in quadruple precision on a VAX
8700 computer.
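The missing initial value can be obtained in closed form, because for fixed remaining data the condition H = 2 is a quadratic equation in $p_2(0)$. The following sketch evaluates the Hamiltonian (16.24) with the data (16.25) and picks the larger root; the function and variable names are illustrative assumptions.

```python
import math

a, b, c, A, C, Omega = 1.25, 1.0, 0.75, 1.0, 1.0, 0.25   # parameters of (16.25)

def hamiltonian(p, q):
    """H of (16.24) for p = (p1, p2, p3) and q = (q1, q2, q3)."""
    kinetic = 0.5 * (p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    rotation = Omega * (p[0] * q[1] - p[1] * q[0])
    potential = A * math.log(C + (q[0] / a) ** 2 + (q[1] / b) ** 2 + (q[2] / c) ** 2)
    return kinetic + rotation + potential

q0 = (2.5, 0.0, 0.0)
p1, p3 = 0.0, 0.2

# With p1, p3 and q fixed, H = 2 reads  p2^2/2 + beta*p2 + gamma = 0.
beta = -Omega * q0[0]
gamma = hamiltonian((p1, 0.0, p3), q0) - 2.0
p2 = -beta + math.sqrt(beta * beta - 2.0 * gamma)   # larger of the two roots
p0 = (p1, p2, p3)

print("p2(0) =", p2)
print("H(p0, q0) =", hamiltonian(p0, q0))
```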

Fig. 16.4, Fig. 16.5, and Fig. 16.6 present the obtained numerical results for the 
methods and step sizes summarized in Table 16.2. 


Table 16.2. Methods for numerical experiments

item  method    order  h                 points (t < 1000000)  impl.  symplec.  symmet.
a)    Gauss     6      1/5               47093                 yes    yes       yes
b)    ”         ”      2/5               46852                 ”      ”         ”
c)    Gauss     6      random            46717                 yes    yes       yes
d)    Gauss     6      partially halved  46576                 yes    yes       yes
e)    Radau     5      1/10              46597                 yes    no        no
f)    ”         ”      1/5               46266                 ”      ”         ”
g)    RK44      4      1/40              47004                 no     no        no
h)    ”         ”      1/10              46192                 ”      ”         ”
i)    Lobatto   6      1/5               47091                 yes    no        yes
j)    ”         ”      2/5               46839                 ”      ”         ”
k)    Sun Geng  5      1/5               47092                 yes    yes       no
l)    exact     -      -                 47101                 -      -         -


Remarks. 


ad a): the Gauss6 method (Kuntzmann & Butcher method based on Gaussian
quadrature with $s = 3$ and $p = 6$, see Table 7.4) for $h = 1/5$ is nearly
identical to the exact solution;

ad b): Gauss6 for $h = 2/5$ is much better than Gauss6 with random or partially
halved step sizes (see items (c) and (d)) where $h \le 2/5$;

ad c): $h$ was chosen at random, uniformly distributed on $(0, 2/5)$;

ad d): $h$ was chosen "partially halved" in the sense that

$$h = \begin{cases} 2/5 & \text{if } q_1 > 0, \\ 1/5 & \text{if } q_1 < 0. \end{cases}$$

This produced the worst result for the 6th order Gauss method. We thus
see that symplectic and symmetric methods compensate on the way back
the errors committed on the outward journey.

Fig. 16.5. Poincaré cuts for 0 < t < 1000000; methods (e)-(h)

Fig. 16.6. Poincaré cuts for 0 < t < 1000000; methods (i)-(l)

ad e), f): Radau5 (method of Ehle based on Radau quadrature with $s = 3$ and
$p = 5$, see Table 7.7) is here not at all satisfactory;

ad g): the explicit method RK44 (Runge-Kutta method with $s = p = 4$, see Table
1.2, left) is evidently much faster than the implicit methods, even with a
smaller step size;

ad h): with increasing step size RK44 deteriorates drastically;

ad i): this is a non-symplectic but symmetric collocation method based on
Lobatto quadrature with $s = 4$ of order 6 (see Table IV.5.8); its good
performance on this nonlinear Hamiltonian problem is astonishing;

ad j): with increasing $h$ Lobatto6 is less satisfactory (see also Fig. 16.7);

ad k): this is the symplectic non-symmetric method based on Radau quadrature
of order 5 due to Sun Geng (Table 16.1).

The preservation of the Hamiltonian (correct value H = 2) during the compu¬ 
tation for 0 < t < 1000000 is shown in Fig. 16.7. While the errors for the symplec¬ 
tic and symmetric methods in constant step size mode remain bounded, random h 
(case c) results in a sort of Brownian motion, and the nonsymplectic methods as 
well as Gauss6 with partially halved step size result in permanent deterioration. 



Fig. 16.7. Evolution of the Hamiltonian 
Partitioned Runge-Kutta Methods

The fact that the system (16.1) possesses a natural partitioning suggests the use of
partitioned Runge-Kutta methods as discussed in Section II.15. The main interest
of such methods is for separable Hamiltonians where it is possible to obtain explicit 
symplectic methods. 

A partitioned Runge-Kutta method for system (16.1) is defined by 


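$$P_i = p_0 + h\sum_j a_{ij} k_j, \qquad Q_i = q_0 + h\sum_j \hat a_{ij} \ell_j \tag{16.26a}$$

$$p_1 = p_0 + h\sum_i b_i k_i, \qquad q_1 = q_0 + h\sum_i \hat b_i \ell_i \tag{16.26b}$$

$$k_i = -\frac{\partial H}{\partial q}(P_i, Q_i), \qquad \ell_i = \frac{\partial H}{\partial p}(P_i, Q_i) \tag{16.26c}$$

where $b_i, a_{ij}$ and $\hat b_i, \hat a_{ij}$ represent two different Runge-Kutta schemes.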

Theorem 16.10 (Sanz-Serna 1992b, Suris 1990). a) If the coefficients of (16.26) 
satisfy 

$$b_i = \hat b_i, \qquad i = 1,\ldots,s \tag{16.27}$$

$$b_i \hat a_{ij} + \hat b_j a_{ji} - b_i \hat b_j = 0, \qquad i, j = 1,\ldots,s \tag{16.28}$$

then the method (16.26) is symplectic.

b) If the Hamiltonian is separable (i.e., $H(p, q) = T(p) + U(q)$) then the
condition (16.28) alone implies symplecticity of the method.

Proof. Following the lines of the proof of Theorem 16.6 we obtain

$$\begin{aligned} dp_1^J \wedge dq_1^J - dp_0^J \wedge dq_0^J = {} & h\sum_i \hat b_i\,dP_i^J \wedge d\ell_i^J + h\sum_i b_i\,dk_i^J \wedge dQ_i^J \\ & - h^2\sum_{i,j} \bigl(b_i \hat a_{ij} + \hat b_j a_{ji} - b_i \hat b_j\bigr)\,dk_i^J \wedge d\ell_j^J \end{aligned} \tag{16.29}$$

instead of (16.17). The last term vanishes by (16.28). If $b_i = \hat b_i$ for all $i$,
symplecticity of the method follows from (16.18). If the Hamiltonian is separable (the
mixed derivatives $\partial^2 H/\partial q^J \partial p^L$ and $\partial^2 H/\partial p^J \partial q^L$ are not present in (16.15c,d))
then each of the two terms in (16.18) vanishes separately and the method is
symplectic without imposing (16.27). □

Remark. If (16.28) is satisfied and if the Hamiltonian is separable, it can be
assumed without loss of generality that

$$b_i \ne 0, \quad \hat b_i \ne 0 \qquad \text{for all } i. \tag{16.30}$$

Indeed, the stage values $P_i$ (for $i$ with $\hat b_i = 0$) and $Q_j$ (for $j$ with $b_j = 0$) don't
influence the numerical solution $(p_1, q_1)$ and can be removed from the scheme.
Notice however that in the resulting scheme the number of stages $P_i$ may be
different from that of $Q_j$.

Explicit methods for separable Hamiltonians. Let the Hamiltonian be of the 
form H(p , q) = T(p) + U(q) and consider a partitioned Runge-Kutta method sat¬ 
isfying 

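$$a_{ij} = 0 \ \text{ for } i < j \ \text{ (diagonally implicit)}, \qquad \hat a_{ij} = 0 \ \text{ for } i \le j \ \text{ (explicit)}. \tag{16.31}$$

Since $\partial H/\partial q$ depends only on $q$, the method (16.26) is explicit for such a choice
of coefficients. Under the assumption (16.30), the symplecticity condition (16.28)
then becomes

$$a_{ij} = b_j \ \text{ for } i \ge j, \qquad \hat a_{ij} = \hat b_j \ \text{ for } i > j, \tag{16.32}$$

so that the method (16.26) is characterized by the two schemes

$$\begin{array}{|ccccc} b_1 & & & & \\ b_1 & b_2 & & & \\ b_1 & b_2 & b_3 & & \\ \vdots & \vdots & & \ddots & \\ b_1 & b_2 & \cdots & b_{s-1} & b_s \\ \hline b_1 & b_2 & \cdots & b_{s-1} & b_s \end{array} \qquad\qquad \begin{array}{|ccccc} 0 & & & & \\ \hat b_1 & 0 & & & \\ \hat b_1 & \hat b_2 & 0 & & \\ \vdots & \vdots & & \ddots & \\ \hat b_1 & \hat b_2 & \cdots & \hat b_{s-1} & 0 \\ \hline \hat b_1 & \hat b_2 & \cdots & \hat b_{s-1} & \hat b_s \end{array} \tag{16.33}$$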


If we admit the cases $b_1 = 0$ and/or $\hat b_s = 0$, it can be shown (Exercise 6) that this
scheme already represents the most general method (16.26) which is symplectic
and explicit. We denote this scheme by

$$\begin{array}{ccccc} b: & b_1 & b_2 & \ldots & b_s \\ \hat b: & \hat b_1 & \hat b_2 & \ldots & \hat b_s. \end{array} \tag{16.34}$$

This method is particularly easy to implement:

    $P_0 = p_0$, $Q_1 = q_0$
    for $i := 1$ to $s$ do
        $P_i = P_{i-1} - h\,b_i\,\partial U/\partial q\,(Q_i)$
        $Q_{i+1} = Q_i + h\,\hat b_i\,\partial T/\partial p\,(P_i)$                    (16.35)
    $p_1 = P_s$, $q_1 = Q_{s+1}$

Special case $s = 1$. The combination of the implicit Euler method ($b_1 = 1$) with the
explicit Euler method ($\hat b_1 = 1$) gives the following symplectic method of order 1:

$$p_1 = p_0 - h\,\frac{\partial U}{\partial q}(q_0), \qquad q_1 = q_0 + h\,\frac{\partial T}{\partial p}(p_1). \tag{16.36a}$$
By interchanging the roles of $p$ and $q$ we obtain the method

$$q_1 = q_0 + h\,\frac{\partial T}{\partial p}(p_0), \qquad p_1 = p_0 - h\,\frac{\partial U}{\partial q}(q_1) \tag{16.36b}$$

which is also symplectic. Methods (16.36a) and (16.36b) are mutually adjoint (see
Section II.8).
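The loop (16.35) translates almost literally into code. The sketch below treats one degree of freedom, where symplecticity is the same as area preservation, and checks by finite differences that the Jacobian determinant of one step equals 1. The function names, the pendulum test problem and the step size are illustrative assumptions.

```python
import math

def symplectic_prk_step(p, q, h, b, b_hat, dU, dT):
    """One step of the explicit scheme (16.35) for H = T(p) + U(q), scalar p and q."""
    for bi, bhi in zip(b, b_hat):
        p = p - h * bi * dU(q)      # P_i = P_{i-1} - h b_i U_q(Q_i)
        q = q + h * bhi * dT(p)     # Q_{i+1} = Q_i + h b^_i T_p(P_i)
    return p, q

def jacobian_det(step, p, q, eps=1e-6):
    """Determinant of d(p1, q1)/d(p0, q0), approximated by central differences."""
    p_pp, q_pp = step(p + eps, q)
    p_pm, q_pm = step(p - eps, q)
    p_qp, q_qp = step(p, q + eps)
    p_qm, q_qm = step(p, q - eps)
    dp_dp, dq_dp = (p_pp - p_pm) / (2 * eps), (q_pp - q_pm) / (2 * eps)
    dp_dq, dq_dq = (p_qp - p_qm) / (2 * eps), (q_qp - q_qm) / (2 * eps)
    return dp_dp * dq_dq - dp_dq * dq_dp

def euler_step(p, q):
    """Case s = 1, b_1 = b^_1 = 1 (method (16.36a)) for T = p^2/2, U = -cos q."""
    return symplectic_prk_step(p, q, 0.3, [1.0], [1.0], math.sin, lambda x: x)

det = jacobian_det(euler_step, 0.7, -1.2)
print("Jacobian determinant of one step:", det)
```

Each pass through the loop is a composition of two shear maps, which is why the determinant is 1 for any choice of the coefficients.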

Construction of higher order methods. The order conditions for general
partitioned Runge-Kutta methods applied to general problems (15.2) are derived in
Section II.15 (Theorem 15.9). Let us here discuss how these conditions simplify in
our special situation.

A) We consider the system (16.1) with separable Hamiltonian. In the notation
of Section II.15 this means that $f_a(y_a, y_b)$ depends only on $y_b$ and $f_b(y_a, y_b)$
depends only on $y_a$. Therefore, many elementary differentials vanish and only
P-trees whose meagre and fat vertices alternate in each branch have to be considered.
This is a considerable reduction of the order conditions.



B) As observed by Abia & Sanz-Serna (1993) the condition (16.28) acts as a
simplifying assumption. Indeed, multiplying (16.28) by $\Phi_i(t) \cdot \Phi_j(u)$ (where
$t = {}_a[t_1, \ldots, t_m] \in TP_a$, $u = {}_b[u_1, \ldots, u_l] \in TP_b$) and summing up over all $i$ and
$j$ yields

$$\sum_i b_i \Phi_i(t \cdot u) + \sum_j \hat b_j \Phi_j(u \cdot t) - \Bigl(\sum_i b_i \Phi_i(t)\Bigr)\Bigl(\sum_j \hat b_j \Phi_j(u)\Bigr) = 0. \tag{16.37}$$

Here we have used the notation of Butcher (1987)

$$t \cdot u = {}_a[t_1, \ldots, t_m, u], \qquad u \cdot t = {}_b[u_1, \ldots, u_l, t], \tag{16.38}$$

illustrated in Fig. 16.8. Since

$$\frac{1}{\gamma(t \cdot u)} + \frac{1}{\gamma(u \cdot t)} - \frac{1}{\gamma(t)} \cdot \frac{1}{\gamma(u)} = 0 \tag{16.39}$$

(this relation follows from (16.37) by inserting the coefficients of a symplectic
Runge-Kutta method of sufficiently high order, e.g., a Gauss method) we obtain
the following fact:
let $\varrho(t) + \varrho(u) = p$ and assume that all order conditions for P-trees of order
$< p$ are satisfied, then

$$\sum_i b_i \Phi_i(t \cdot u) = \frac{1}{\gamma(t \cdot u)} \qquad \text{iff} \qquad \sum_j \hat b_j \Phi_j(u \cdot t) = \frac{1}{\gamma(u \cdot t)}. \tag{16.40}$$

From Fig. 16.8 we see that the P-trees $t \cdot u$ and $u \cdot t$ have the same geometrical
structure. They differ only in the position of the root. Repeated application of this
property implies that of all P-trees with identical geometrical structure only one has
to be considered.


A method of order 3 (Ruth 1983). The above reductions leave five order conditions
for a method of order 3 which, for $s = 3$, are the following:

$$b_1 + b_2 + b_3 = 1, \qquad \hat b_1 + \hat b_2 + \hat b_3 = 1, \qquad b_2 \hat b_1 + b_3(\hat b_1 + \hat b_2) = 1/2,$$

$$b_2 \hat b_1^2 + b_3(\hat b_1 + \hat b_2)^2 = 1/3, \qquad \hat b_1 b_1^2 + \hat b_2(b_1 + b_2)^2 + \hat b_3(b_1 + b_2 + b_3)^2 = 1/3.$$

This nonlinear system possesses many solutions. A particularly simple solution,
proposed by Ruth (1983), is

$$\begin{array}{cccc} b: & 7/24 & 3/4 & -1/24 \\ \hat b: & 2/3 & -2/3 & 1. \end{array} \tag{16.41}$$
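The five conditions can be verified for the coefficients (16.41) in exact rational arithmetic. In the sketch below the partial sums `c` and `c_hat` are the abscissae of the stages $P_i$ and $Q_i$ in the scheme (16.35); these variable names are illustrative assumptions.

```python
from fractions import Fraction as F

b = [F(7, 24), F(3, 4), F(-1, 24)]       # coefficients b_i of (16.41)
b_hat = [F(2, 3), F(-2, 3), F(1)]        # coefficients b^_i of (16.41)

c = [sum(b[:i + 1]) for i in range(3)]       # b_1, b_1+b_2, b_1+b_2+b_3
c_hat = [sum(b_hat[:i]) for i in range(3)]   # 0, b^_1, b^_1+b^_2

conditions = [
    sum(b) == 1,
    sum(b_hat) == 1,
    sum(bi * ci for bi, ci in zip(b, c_hat)) == F(1, 2),
    sum(bi * ci ** 2 for bi, ci in zip(b, c_hat)) == F(1, 3),
    sum(bhi * ci ** 2 for bhi, ci in zip(b_hat, c)) == F(1, 3),
]
print(conditions)
```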


Concatenation of a method with its adjoint. The adjoint method of (16.26) is
obtained by replacing $h$ by $-h$ and by exchanging the roles of $p_0, q_0$ and $p_1, q_1$ (see
Section II.8). This results in a partitioned Runge-Kutta method with coefficients
(compare Theorem 8.3)

$$a_{ij}^* = b_{s+1-j} - a_{s+1-i,\,s+1-j}, \qquad b_i^* = b_{s+1-i},$$

$$\hat a_{ij}^* = \hat b_{s+1-j} - \hat a_{s+1-i,\,s+1-j}, \qquad \hat b_i^* = \hat b_{s+1-i}.$$

For the adjoint of (16.33) the first method is explicit and the second one is
diagonally implicit, but otherwise it has the same structure. Adding dummy stages, it
becomes of the form (16.33) with coefficients

$$\begin{array}{cccccc} b^*: & 0 & b_s & b_{s-1} & \ldots & b_1 \\ \hat b^*: & \hat b_s & \hat b_{s-1} & \ldots & \hat b_1 & 0. \end{array} \tag{16.42}$$

The following idea of Sanz-Serna (1992b) allows one to improve a method of odd
order $p$: one considers the composition of method (16.33) (step size $h/2$) with its
adjoint (again with step size $h/2$). The resulting method, which is represented by
the coefficients

$$\begin{array}{ccccccccc} b: & b_1/2 & b_2/2 & \ldots & b_{s-1}/2 & b_s/2 & b_s/2 & b_{s-1}/2 & \ldots \ \ b_1/2 \\ \hat b: & \hat b_1/2 & \hat b_2/2 & \ldots & \hat b_{s-1}/2 & \hat b_s & \hat b_{s-1}/2 & \ldots & \hat b_1/2 \ \ \ 0, \end{array}$$

is symmetric and therefore has an even order which is $\ge p + 1$. Concatenating
Ruth's method (16.41) with its adjoint yields the fourth order method

$$\begin{array}{ccccccc} b: & 7/48 & 3/8 & -1/48 & -1/48 & 3/8 & 7/48 \\ \hat b: & 1/3 & -1/3 & 1 & -1/3 & 1/3 & 0. \end{array} \tag{16.43}$$
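The concatenation is a purely mechanical operation on the two coefficient rows, which the following sketch carries out with exact fractions. The helper name `concatenate_with_adjoint` is an illustrative assumption; it merges the two neighbouring half-steps with weight $\hat b_s/2$ into one stage, as described above.

```python
from fractions import Fraction as F

def concatenate_with_adjoint(b, b_hat):
    """Rows b, b^ of method (16.33) with step h/2 followed by its adjoint with step h/2."""
    half_b = [x / 2 for x in b]
    half_bh = [x / 2 for x in b_hat]
    new_b = half_b + half_b[::-1]
    # b^_1/2, ..., b^_{s-1}/2, b^_s, b^_{s-1}/2, ..., b^_1/2, 0
    new_b_hat = half_bh[:-1] + [b_hat[-1]] + half_bh[-2::-1] + [F(0)]
    return new_b, new_b_hat

ruth_b = [F(7, 24), F(3, 4), F(-1, 24)]
ruth_b_hat = [F(2, 3), F(-2, 3), F(1)]
b4, b4_hat = concatenate_with_adjoint(ruth_b, ruth_b_hat)
print([str(x) for x in b4])
print([str(x) for x in b4_hat])
```

Applied to Ruth's coefficients (16.41), this reproduces the two rows of (16.43).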


Symplectic Nyström Methods


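A frequent special case of a separable Hamiltonian $H(p, q) = T(p) + U(q)$ is when
$T(p)$ is a quadratic functional $T(p) = p^T M p/2$ (with $M$ a constant symmetric
matrix). In this situation the Hamiltonian system becomes

$$\dot p = -\frac{\partial U}{\partial q}(q), \qquad \dot q = M p,$$

which is equivalent to the second order equation

$$\ddot q = -M\,\frac{\partial U}{\partial q}(q). \tag{16.44}$$

It is therefore natural to consider Nyström methods (Section II.14) which for the
system (16.44) are given by

$$Q_i = q_0 + c_i h \dot q_0 + h^2 \sum_{j=1}^{s} \bar a_{ij} k_j', \qquad k_j' = -M\,\frac{\partial U}{\partial q}(Q_j),$$

$$q_1 = q_0 + h \dot q_0 + h^2 \sum_{i=1}^{s} \bar b_i k_i', \qquad \dot q_1 = \dot q_0 + h \sum_{i=1}^{s} b_i k_i'.$$

Replacing the variable $\dot q$ by $Mp$ and $k_i'$ by $M\ell_i$, this method reads

$$Q_i = q_0 + c_i h M p_0 + h^2 \sum_{j=1}^{s} \bar a_{ij} M \ell_j, \qquad \ell_j = -\frac{\partial U}{\partial q}(Q_j),$$

$$q_1 = q_0 + h M p_0 + h^2 \sum_{i=1}^{s} \bar b_i M \ell_i, \qquad p_1 = p_0 + h \sum_{i=1}^{s} b_i \ell_i. \tag{16.45}$$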


Theorem 16.11 (Suris 1989). Consider the system (16.44) where $M$ is a symmetric
matrix. Then, the $s$-stage Nyström method (16.45) is symplectic if the following
two conditions are satisfied:

$$\bar b_i = b_i(1 - c_i), \qquad i = 1,\ldots,s \tag{16.46a}$$

$$b_i(\bar b_j - \bar a_{ij}) = b_j(\bar b_i - \bar a_{ji}), \qquad i, j = 1,\ldots,s. \tag{16.46b}$$

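Proof (Okunbor & Skeel 1992). As in the proof of Theorem 16.6 we differentiate
the formulas (16.45) and compute

$$\begin{aligned} dp_1^J \wedge dq_1^J - dp_0^J \wedge dq_0^J = {} & h\sum_i b_i\,d\ell_i^J \wedge dq_0^J + h\sum_K M_{JK}\,dp_0^J \wedge dp_0^K \\ & + h^2\sum_i b_i \sum_K M_{JK}\,d\ell_i^J \wedge dp_0^K + h^2\sum_i \bar b_i \sum_K M_{JK}\,dp_0^J \wedge d\ell_i^K \\ & + h^3\sum_{i,j} b_i \bar b_j \sum_K M_{JK}\,d\ell_i^J \wedge d\ell_j^K. \end{aligned} \tag{16.47}$$

Next we eliminate $dq_0^J$ with the help of the differentiated equation of $Q_i$, sum over
all $J$ and so obtain

$$\begin{aligned} \sum_{J=1}^{n} dp_1^J \wedge dq_1^J - \sum_{J=1}^{n} dp_0^J \wedge dq_0^J = {} & h\sum_i b_i \sum_J d\ell_i^J \wedge dQ_i^J + h\sum_{J,K} M_{JK}\,dp_0^J \wedge dp_0^K \\ & + h^2\sum_i \bigl(b_i - \bar b_i - b_i c_i\bigr) \sum_{J,K} M_{JK}\,d\ell_i^J \wedge dp_0^K \\ & + h^3\sum_{i<j} \bigl(b_i \bar b_j - b_j \bar b_i - b_i \bar a_{ij} + b_j \bar a_{ji}\bigr) \sum_{J,K} M_{JK}\,d\ell_i^J \wedge d\ell_j^K. \end{aligned}$$

The last two terms disappear by (16.46) whereas the first two terms vanish due to
the symmetry of $M$ and of the second derivatives of $U(q)$. □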


We have already encountered condition (16.46a) in Lemma 14.13. There, it
was used as a simplifying assumption. It implies that only the order conditions for
$\dot q_1$ have to be considered.

For Nyström methods satisfying both conditions of (16.46), one can assume
without loss of generality that

$$b_i \ne 0 \qquad \text{for } i = 1,\ldots,s. \tag{16.48}$$

Let $I = \{i \mid b_i = 0\}$, then $\bar b_i = 0$ for $i \in I$ and $\bar a_{ij} = 0$ for $i \notin I$, $j \in I$. Hence,
the stage values $Q_i$ ($i \in I$) don't influence the numerical result $(p_1, q_1)$ and can
be removed from the scheme.

Explicit methods. Our main interest is in methods which satisfy

$$\bar a_{ij} = 0 \qquad \text{for } i \le j. \tag{16.49}$$

Under the assumption (16.48) the condition (16.46) then implies that the remaining
coefficients are given by

$$\bar a_{ij} = b_j(c_i - c_j) \qquad \text{for } i > j. \tag{16.50}$$

In this situation we may also suppose that

$$c_i \ne c_{i-1} \qquad \text{for } i = 2, 3, \ldots, s,$$

because equal consecutive $c_i$ lead (via condition (16.50)) to equal stage values $Q_i$.
Therefore the method is equivalent to one with a smaller number of stages.

The particular form of the coefficients $\bar a_{ij}$ allows the following simple
implementation (Okunbor & Skeel 1992b)

    $Q_0 = q_0$, $P_0 = p_0$
    for $i := 1$ to $s$ do
        $Q_i = Q_{i-1} + h\,(c_i - c_{i-1})\,M P_{i-1}$    (with $c_0 = 0$)            (16.51)
        $P_i = P_{i-1} - h\,b_i\,\partial U/\partial q\,(Q_i)$
    $q_1 = Q_s + h\,(1 - c_s)\,M P_s$, $p_1 = P_s$.
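A minimal sketch of the implementation (16.51) for scalar $p$, $q$ and $M$ follows. As a plausibility check it applies the one-stage method with $b_1 = 1$, $c_1 = 1/2$ to $\ddot q = -q$ and compares the global errors for two step sizes; halving $h$ should reduce the error by a factor of about 4 for a method of order 2. The function names and the test problem are illustrative assumptions.

```python
import math

def nystrom_step(p, q, h, b, c, dU, M=1.0):
    """One step of (16.51) for q'' = -M U_q(q); p, q and M are scalars here."""
    c_prev = 0.0
    for bi, ci in zip(b, c):
        q = q + h * (ci - c_prev) * M * p   # Q_i = Q_{i-1} + h (c_i - c_{i-1}) M P_{i-1}
        p = p - h * bi * dU(q)              # P_i = P_{i-1} - h b_i U_q(Q_i)
        c_prev = ci
    return p, q + h * (1.0 - c_prev) * M * p   # p_1 = P_s, q_1 = Q_s + h (1 - c_s) M P_s

def global_error(h, steps):
    """Error at t = h*steps for U = q^2/2 with q(0) = 1, p(0) = 0 (exact: q = cos t, p = -sin t)."""
    p, q = 0.0, 1.0
    for _ in range(steps):
        p, q = nystrom_step(p, q, h, [1.0], [0.5], lambda x: x)
    t = h * steps
    return math.hypot(p + math.sin(t), q - math.cos(t))

ratio = global_error(0.1, 10) / global_error(0.05, 20)
print("error ratio when halving h:", ratio)
```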

Special case $s = 1$. Putting $b_1 = 1$ ($c_1$ is a free parameter) yields a symplectic,
explicit Nyström method of order 1. For the choice $c_1 = 1/2$ it has order 2.

Special case $s = 3$. To obtain order 3, four order conditions have to be satisfied
(see Table 14.3). The first three mean that $(b_i, c_i)$ is a quadrature formula of order
3. They allow us to express $b_1, b_2, b_3$ in terms of $c_1, c_2, c_3$. The last condition then
becomes (Okunbor & Skeel 1992b)

$$\begin{aligned} 1 &+ 24\bigl(c_1 - \tfrac12\bigr)\bigl(c_3 - \tfrac12\bigr) + 24\,(c_2 - c_1)(c_3 - c_1)(c_3 - c_2) \\ &+ 144\bigl(c_1 - \tfrac12\bigr)\bigl(c_2 - \tfrac12\bigr)\bigl(c_3 - \tfrac12\bigr)\bigl(c_1 + c_3 - c_2 - \tfrac12\bigr) = 0. \end{aligned} \tag{16.52}$$

We thus get a two-parameter family of third order methods. Okunbor & Skeel
(1992b) suggest taking

$$c_2 = \frac{1}{2}, \qquad c_1 = 1 - c_3 = \frac{1}{6}\bigl(2 + \sqrt[3]{2} + 1/\sqrt[3]{2}\bigr) \tag{16.53}$$

(the real root of $12 c_1 (2c_1 - 1)^2 = 1$). This method is symmetric and thus of order
4. Another 3-stage method of order 4 has been found by Qin Meng-Zhao & Zhu
Wen-jie (1991).
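The suggested nodes can be checked numerically. The sketch below computes the weights $b_i$ of the interpolatory quadrature formula on the nodes $c_1, c_2, c_3$, forms $\bar a_{ij}$ from (16.50), and evaluates the four order conditions for order 3 in the form $\sum b_i = 1$, $\sum b_i c_i = 1/2$, $\sum b_i c_i^2 = 1/3$, $\sum b_i \bar a_{ij} = 1/6$ (the standard Nyström conditions, stated here as an assumption about the content of Table 14.3). All names are illustrative.

```python
c1 = (2.0 + 2.0 ** (1.0 / 3.0) + 2.0 ** (-1.0 / 3.0)) / 6.0
c = [c1, 0.5, 1.0 - c1]

def weight(i, nodes):
    """Integral over [0, 1] of the Lagrange polynomial belonging to nodes[i]."""
    j, k = [m for m in range(3) if m != i]
    numerator = 1.0 / 3.0 - (nodes[j] + nodes[k]) / 2.0 + nodes[j] * nodes[k]
    return numerator / ((nodes[i] - nodes[j]) * (nodes[i] - nodes[k]))

b = [weight(i, c) for i in range(3)]

def a_bar(i, j):
    """Coefficient (16.50), valid for i > j."""
    return b[j] * (c[i] - c[j])

cond1 = sum(b)
cond2 = sum(bi * ci for bi, ci in zip(b, c))
cond3 = sum(bi * ci ** 2 for bi, ci in zip(b, c))
cond4 = sum(b[i] * a_bar(i, j) for i in range(3) for j in range(i))
root_residual = 12.0 * c1 * (2.0 * c1 - 1.0) ** 2 - 1.0

print("b =", b)
print("conditions:", cond1, cond2, cond3, cond4)
print("residual of 12 c1 (2 c1 - 1)^2 = 1:", root_residual)
```

The weights come out symmetric ($b_1 = b_3$), in agreement with the symmetry of the method.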


Higher order methods. For the construction of methods of order $> 4$ it is
worthwhile to investigate the effect of the condition (16.46b) on the order conditions.
As for partitioned Runge-Kutta methods one can show that SN-trees with the same
geometrical structure lead to equivalent order conditions. For details we refer to
Calvo & Sanz-Serna (1992). With the notation of Table 14.3, the SN-trees $t_6$ and
$t_7$ as well as the pairs $t_9, t_{12}$ and $t_{10}, t_{13}$ give rise to equivalent order conditions.
Consequently, for order 5, one has to consider 10 conditions. Okunbor & Skeel
(1992c) present explicit, symplectic Nyström methods of orders 5 and 6 with 5
and 7 stages, respectively. A 7th order method is given by Calvo & Sanz-Serna
(1992b).
Conservation of the Hamiltonian; Backward Analysis

The differential equation actually solved by the difference scheme 
will be called the modified equation. 

(Warming & Hyett 1974, p. 161) 

The wrong solution of the right equation; the right solution of the 
wrong equation. (Feng Kang, Beijing Sept. 1, 1992) 


We have observed above (Example 16.2 and Fig. 16.6) that for the numerical so¬ 
lution of symplectic methods the Hamiltonian H remained between fixed bounds 
over any long-term integration, i.e., so-called secular changes of H were absent. 
Following several authors (Yoshida 1993, Sanz-Serna 1992, Feng Kang 1991b) this 
phenomenon is explained by interpreting the numerical solution as the exact solu¬ 
tion of a perturbed Hamiltonian system , which is obtained as the formal expansion 
(16.56) in powers of h. The exact conservation of the perturbed Hamiltonian $\tilde H$
then involves the quasi-periodic behaviour of H along the computed points. This 
resembles Wilkinson’s famous idea of backward error analysis in linear algebra 
and, in the case of differential equations, seems to go back to Warming & Hyett 
(1974). We demonstrate this idea for the symplectic Euler method (see (16.36b)) 


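$$\begin{aligned} p_1 &= p_0 - h H_q(p_0, q_1) \\ q_1 &= q_0 + h H_p(p_0, q_1) \end{aligned} \tag{16.54}$$

which, when expanded around the point $(p_0, q_0)$, gives

$$\begin{aligned} p_1 &= p_0 - h H_q - h^2 H_{qq} H_p - \frac{h^3}{2} H_{qqq} H_p H_p - h^3 H_{qq} H_{pq} H_p - \ldots \Big|_{p_0, q_0} \\ q_1 &= q_0 + h H_p + h^2 H_{pq} H_p + \frac{h^3}{2} H_{pqq} H_p H_p + h^3 H_{pq} H_{pq} H_p + \ldots \Big|_{p_0, q_0}. \end{aligned} \tag{16.54'}$$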

In the case of non-scalar equations the p ’s and q ’s must here be equipped with var¬ 
ious summation indices. We suppress these in the sequel for the sake of simplicity 
and think of scalar systems only. The exact solution of a perturbed Hamiltonian 


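$$\dot p = -\tilde H_q(p, q), \qquad \dot q = \tilde H_p(p, q)$$

has a Taylor expansion analogous to Theorem 2.6 as follows

$$\begin{aligned} p_1 &= p_0 - h \tilde H_q + \frac{h^2}{2}\bigl(\tilde H_{qp} \tilde H_q - \tilde H_{qq} \tilde H_p\bigr) + \ldots \\ q_1 &= q_0 + h \tilde H_p + \frac{h^2}{2}\bigl(-\tilde H_{pp} \tilde H_q + \tilde H_{pq} \tilde H_p\bigr) + \ldots . \end{aligned} \tag{16.55}$$

We now set

$$\tilde H = H + h H^{(1)} + h^2 H^{(2)} + h^3 H^{(3)} + \ldots \tag{16.56}$$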


with unknown functions $H^{(1)}, H^{(2)}, \ldots$, insert this into (16.55) and compare the
resulting formulas with (16.54'). Then the comparison of the $h^2$ terms gives

$$H^{(1)}_q = \tfrac12 H_{qq} H_p + \tfrac12 H_{qp} H_q, \qquad H^{(1)}_p = \tfrac12 H_{pp} H_q + \tfrac12 H_{pq} H_p$$

which by miracle (the "miracle" is in fact a consequence of the symplecticity of
method (16.54)) allow the common primitive

$$H^{(1)} = \tfrac12 H_p H_q. \tag{16.56;1}$$

The $h^3$ terms lead to

$$H^{(2)} = \tfrac{1}{12}\bigl(H_{pp} H_q^2 + H_{qq} H_p^2 + 4 H_{pq} H_p H_q\bigr) \tag{16.56;2}$$

and so on.


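Connection with the Campbell-Baker-Hausdorff formula. An elegant access to
the expansion (16.56), which works for separable Hamiltonians $H(p, q) = T(p) + U(q)$,
has been given by Yoshida (1993). We interpret method (16.54) as
composition of the two symplectic maps

$$z_0 = \begin{pmatrix} p_0 \\ q_0 \end{pmatrix} \ \xrightarrow{\ S_T\ } \ z = \begin{pmatrix} p_0 \\ q_1 \end{pmatrix} \ \xrightarrow{\ S_U\ } \ z_1 = \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} \tag{16.57}$$

which consist, respectively, in solving exactly the Hamiltonian systems

$$\begin{aligned} \dot p &= 0 \\ \dot q &= T_p(p) \end{aligned} \qquad \text{and} \qquad \begin{aligned} \dot p &= -U_q(q) \\ \dot q &= 0 \end{aligned} \tag{16.58}$$

and apply some Lie theory. If we introduce for these equations the differential
operators given by (13.2')

$$D_T \Psi = \frac{\partial \Psi}{\partial q}\,T_p(p), \qquad D_U \Psi = -\frac{\partial \Psi}{\partial p}\,U_q(q), \tag{16.59}$$

the formulas (13.3) allow us to write the Taylor series of the map $S_T$ as

$$z = \sum_{i=0}^{\infty} \frac{h^i}{i!}\,D_T^i z \Big|_{z=z_0}. \tag{16.60}$$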


If now $F(z)$ is an arbitrary function of the solution $z(t) = (p(t), q(t))$ (left
equation of (16.58)), we find, as in (13.2), that

$$F(z)' = D_T F, \qquad F(z)'' = D_T^2 F, \ \ldots$$

and (16.60) extends to (Gröbner 1960)

$$F(z) = \sum_{i=0}^{\infty} \frac{h^i}{i!}\,D_T^i F(z) \Big|_{z=z_0}. \tag{16.60'}$$

We now insert $S_U$ for $F$ and insert for $S_U$ the formula analogous to (16.60) to
obtain for the composition (16.57)

$$z_1 = (p_1, q_1) = \sum_{i=0}^{\infty} \frac{h^i}{i!}\,D_T^i \sum_{j=0}^{\infty} \frac{h^j}{j!}\,D_U^j\,(p, q) \Big|_{p=p_0,\,q=q_0} = \exp(hD_T)\exp(hD_U)\,(p, q) \Big|_{p=p_0,\,q=q_0}. \tag{16.61}$$

But the product $\exp(hD_T)\exp(hD_U)$ is not $\exp(hD_T + hD_U)$, as we have all
learned in school, because the operators $D_T$ and $D_U$ do not commute. This is
precisely the content of the famous Campbell-Baker-Hausdorff Formula (claimed
in 1898 by J.E. Campbell and proved independently by Baker (1905) and in the
"kleine Untersuchung" of Hausdorff (1906)) which states, for our problem, that

$$\exp(hD_T)\exp(hD_U) = \exp(h\tilde D) \tag{16.62}$$

where

$$\tilde D = D_T + D_U + \frac{h}{2}[D_T, D_U] + \frac{h^2}{12}\Bigl([D_T, [D_T, D_U]] + [D_U, [D_U, D_T]]\Bigr) + \frac{h^3}{24}[D_T, [D_U, [D_U, D_T]]] + \ldots \tag{16.63}$$

and $[D_A, D_B] = D_A D_B - D_B D_A$ is the commutator. Equation (16.62) shows that
the map (16.57) is the exact solution of the differential equation corresponding to
the differential operator $\tilde D$. A straightforward calculation now shows: If


8^ 

D 4 4/ = — A H - 


< 9 ^ < 9 ^ 

and D B ^ = -—B g + — B v (16.64) 


dp q dq p dp q dq p 

are differential operators corresponding to Hamiltonians A and B respectively, 
then 

d^ d^ 

[D a , D b }\ E- = D c 9 = C n + — C, 


dp 


dq 


where 


C — A p B q - A q B p . 


(16.65) 


A repeated application of (16.65) now allows us to obtain for all brackets in (16.63) a corresponding Hamiltonian which finally leads to

$$ \widetilde H = T + U + \frac{h}{2} T_p U_q + \frac{h^2}{12}\bigl(T_{pp} U_q^2 + U_{qq} T_p^2\bigr) + \frac{h^3}{12} T_{pp} U_{qq} T_p U_q + \ldots \tag{16.66} $$

which is the specialization of (16.56) to separable Hamiltonians.


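Example 16.12 (Yoshida 1993). For the mathematical pendulum

$$ H(p, q) = \frac{p^2}{2} - \cos q \tag{16.67} $$

series (16.66) becomes

$$ \widetilde H = \frac{p^2}{2} - \cos q + \frac{h}{2}\, p \sin q + \frac{h^2}{12}\bigl(\sin^2 q + p^2 \cos q\bigr) + \frac{h^3}{12}\, p \cos q \sin q + O(h^4). \tag{16.68} $$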
Fig. 16.9 presents for various step sizes $h$ and for various initial points ($p_0 = 0$, $q_0 = -1.5$; $p_0 = 0$, $q_0 = -2.5$; $p_0 = 1.5$, $q_0 = -\pi$; $p_0 = 2.5$, $q_0 = -\pi$) the numerically computed points for method (16.54) compared to the contour lines of $\widetilde H = \mathrm{Const}$ given by the terms up to order $h^3$ in (16.68). The excellent agreement of the results with theory for $h < 0.6$ leaves nothing to be desired, while for $h$ beyond 0.9 the dynamics of the numerical method turns rapidly into chaotic behaviour.



Fig. 16.9. Symplectic method compared to perturbed Hamiltonian 
(• ... indicate the initial positions) 
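
The agreement described in Example 16.12 can be checked with a few lines of code. The following Python sketch assumes that, for the pendulum, method (16.54) is the composition (16.57): first the drift $q \leftarrow q + h\,p$, then the kick $p \leftarrow p - h \sin q$. The function names are illustrative and not taken from the text. It compares how much $H$ of (16.67) and the truncated series (16.68) vary along the numerical trajectory.

```python
import math

def symplectic_euler_step(p, q, h):
    """One step of the assumed composition S_T then S_U for H = p^2/2 - cos q."""
    q = q + h * p               # S_T: exact flow of p' = 0, q' = T_p(p) = p
    p = p - h * math.sin(q)     # S_U: exact flow of p' = -U_q(q) = -sin q, q' = 0
    return p, q

def hamiltonian(p, q):
    """Pendulum Hamiltonian (16.67)."""
    return 0.5 * p * p - math.cos(q)

def modified_hamiltonian(p, q, h):
    """Series (16.68) truncated after the h^3 term."""
    s, c = math.sin(q), math.cos(q)
    return (hamiltonian(p, q)
            + h / 2 * p * s
            + h ** 2 / 12 * (s * s + p * p * c)
            + h ** 3 / 12 * p * c * s)

def energy_spreads(p0, q0, h, n_steps):
    """Return (max - min) of H and of the truncated series along the trajectory."""
    p, q = p0, q0
    h_vals = [hamiltonian(p, q)]
    hm_vals = [modified_hamiltonian(p, q, h)]
    for _ in range(n_steps):
        p, q = symplectic_euler_step(p, q, h)
        h_vals.append(hamiltonian(p, q))
        hm_vals.append(modified_hamiltonian(p, q, h))
    return max(h_vals) - min(h_vals), max(hm_vals) - min(hm_vals)

if __name__ == "__main__":
    spread_h, spread_mod = energy_spreads(0.0, -1.5, 0.1, 2000)
    print("variation of H        :", spread_h)
    print("variation of series   :", spread_mod)
```

For small $h$ the variation of the truncated series should be far smaller than that of $H$ itself, which is what the contour lines in Fig. 16.9 express graphically.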


Remark. For much research, especially in the beginning of the “symplectic era”, the 
central role for the construction of canonical difference schemes is played by the 
Hamilton-Jacobi theory and generating functions. For this, the reader may consult 
the papers Feng Kang (1986), Feng Kang, Wu Hua-mo, Qin Meng-zhao & Wang 
Dao-liu (1989), Channell & Scovel (1990) and Miesbach & Pesch (1992). Many 
additional numerical experiments can be found in Channell & Scovel (1990), Feng 
Kang (1991), and Pullin & Saffman (1991). 
Exercises

1. Show that explicit Runge-Kutta methods are never symplectic. 

Hint. Compute the diagonal elements of M. 

2. Study the existence and uniqueness of the numerical solution for the implicit 
mid-point rule when applied to the Hamiltonian system 

$$ \dot p = -q^2, \qquad \dot q = p. $$

Show that the method possesses no solution at all for $h^2 q_0 + h^3 p_0/2 < -1$ and two solutions for $h^2 q_0 + h^3 p_0/2 > -1$ ($h \ne 0$). Only one of the solutions tends to $(p_0, q_0)$ for $h \to 0$.

3. A Runge-Kutta method is called linearly symplectic if it is symplectic for all linear Hamiltonian systems

$$ y' = J^{-1} C y $$

($J$ is given in (16.19) and $C$ is a symmetric matrix). Prove (Feng Kang 1985) that a Runge-Kutta method is linearly symplectic if and only if its stability function satisfies

$$ R(-z)R(z) = 1 \quad\text{for all } z \in \mathbb{C}. \tag{16.69} $$

Hint. For the definition of the stability function see Section IV.2 of Volume II. Then by Theorem I.14.14, linear symplecticity is equivalent to

$$ R(hJ^{-1}C)^T J\, R(hJ^{-1}C) = J. $$

Furthermore, the matrix $B := J^{-1}C$ is seen to verify $B^T J = -JB$ and hence also $(B^k)^T J = J(-B)^k$ for $k = 0, 1, 2, \ldots$. This implies that

$$ R(hJ^{-1}C)^T J = J\, R(-hJ^{-1}C). $$

4. Prove that the stability function of a symmetric Runge-Kutta method satisfies 
(16.69). 

5. Compute all quadratic first integrals of the Hamiltonian system (16.4). 

6. For a separable Hamiltonian consider the method (16.26) where $a_{ij} = 0$ for $i < j$, $\widehat a_{ij} = 0$ for $i < j$ and for every $i$ either $a_{ii} = 0$ or $\widehat a_{ii} = 0$. If the method satisfies (16.28) then it is equivalent to one given by scheme (16.33).

Hint. Remove first all stages which don’t influence the numerical result (see 
the remark after Theorem 16.10). Then deduce from (16.28) relations similar 
to (16.32). Finally, remove identical stages and add, if necessary, a dummy 
stage in order that both methods have the same number of stages. 
7. (Lasagni 1990). Characterize symplecticity for multi-derivative Runge-Kutta methods. Show that the $s$-stage $q$-derivative method of Definition 13.1 is symplectic if its coefficients satisfy

$$ b_i^{(r)} b_j^{(m)} - b_i^{(r)} a_{ij}^{(m)} - b_j^{(m)} a_{ji}^{(r)} = \begin{cases} b_i^{(r+m)} & \text{if } i = j \text{ and } r + m \le q, \\ 0 & \text{otherwise.} \end{cases} \tag{16.70} $$

Hint. Denote $k^{(r)} = D_H^r\, p$, $\ell^{(r)} = D_H^r\, q$, where $D_H$ is the differential operator as in (16.59) and (16.64), so that the exact solution of (16.1) is given by

$$ p(x_0 + h) = p_0 + \sum_{r \ge 1} \frac{h^r}{r!}\, k^{(r)}(p_0, q_0), \qquad q(x_0 + h) = q_0 + \sum_{r \ge 1} \frac{h^r}{r!}\, \ell^{(r)}(p_0, q_0). $$

Then deduce from the symplecticity of the exact solution that

$$ \frac{1}{\varrho!}\bigl(dp \wedge d\ell^{(\varrho)} + dk^{(\varrho)} \wedge dq\bigr) + \sum_{r+m=\varrho} \frac{1}{r!\, m!}\, dk^{(r)} \wedge d\ell^{(m)} = 0. \tag{16.71} $$

This, together with a modification of the proof of Theorem 16.6, allows us to obtain the desired result.

8. (Yoshida 1990, Qin Meng-Zhao & Zhu Wen-Jie 1992). Let $y_1 = \psi_h(y_0)$ denote a symmetric numerical scheme of order $p = 2k$. Prove that the composed method

$$ \psi_{c_1 h} \circ \psi_{c_2 h} \circ \psi_{c_1 h} $$

is symmetric and has order $p + 2$ if

$$ 2c_1 + c_2 = 1, \qquad 2c_1^{2k+1} + c_2^{2k+1} = 0. \tag{16.72} $$

Hence there exist, for separable Hamiltonians, explicit symplectic partitioned methods of arbitrarily high order.

Hint. Proceed as for (4.1)-(4.2) and use Theorem 8.10 (the order of a symmetric method is even).

9. The Hamiltonian function (16.24) for the galactic problem is not separable. 
Nevertheless, both methods (16.36a) and (16.36b) can be applied explicitly. 
Explain. 
Detailed studies of the real world impel us, albeit reluctantly, to
take account of the fact that the rate of change of physical systems 
depends not only on their present state, but also on their past 
history. (Bellman & Cooke 1963) 


Delay differential equations are equations with “retarded arguments” or “time lags” such as

$$ y'(x) = f\bigl(x, y(x), y(x - \tau)\bigr) \tag{17.1} $$

$$ y'(x) = f\bigl(x, y(x), y(x - \tau_1), y(x - \tau_2)\bigr) \tag{17.2} $$

or of even more general form. Here the derivative of the solution depends also on its values at previous points.

Time lags are present in many models of applied mathematics. They can also 
be the source of interesting mathematical phenomena such as instabilities, limit 
cycles, periodic behaviour. 


Existence 

For equations of the type (17.1) or (17.2), where the delay values $x - \tau$ are bounded away from $x$ by a positive constant, the question of existence is an easy matter: suppose that the solution is known, say

$$ y(x) = \varphi(x) \quad\text{for } x_0 - \tau \le x \le x_0. $$

Then $y(x - \tau)$ is a known function of $x$ for $x_0 \le x \le x_0 + \tau$ and (17.1) becomes an ordinary differential equation, which can be treated by known existence theories. We then know $y(x)$ for $x_0 \le x \le x_0 + \tau$ and can compute the solution for $x_0 + \tau \le x \le x_0 + 2\tau$ and so on. This “method of steps” then yields existence and uniqueness results for all $x$. For more details we recommend the books of Bellman & Cooke (1963) and Driver (1977, especially Chapter V).

Example 1. We consider the equation

$$ y'(x) = -y(x - 1), \qquad y(x) = 1 \ \text{ for } -1 \le x \le 0. \tag{17.3} $$

Proceeding as described above, we obtain

$$ y(x) = 1 - x \quad\text{for } 0 \le x \le 1, $$

$$ y(x) = 1 - x + \frac{(x-1)^2}{2!} \quad\text{for } 1 \le x \le 2, $$

$$ y(x) = 1 - x + \frac{(x-1)^2}{2!} - \frac{(x-2)^3}{3!} \quad\text{for } 2 \le x \le 3, \ \text{etc.} $$

The solution is displayed in Fig. 17.1. We observe that despite the fact that the differential equation and the initial function are $C^\infty$, the solution has discontinuities in its derivatives. This results from the fact that the initial function does not satisfy the differential equation. With every time step $\tau$, however, these discontinuities are smoothed out more and more.
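
The method of steps for (17.3) can be carried out exactly in rational arithmetic. The sketch below is an illustration under one convention of ours (not from the text): on each interval $[n, n+1]$ the solution is stored as a polynomial in the local variable $s = x - n$, so that $y'(x) = -y(x-1)$ becomes "integrate minus the previous piece and add the value at the junction". All function names are hypothetical.

```python
import math
from fractions import Fraction

def next_piece(coeffs):
    """Given y on [n-1, n] as sum(coeffs[k] * s**k), s = x-(n-1), return y on [n, n+1].

    On the new interval y'(s) = -y_prev(s), hence
    y(s) = y_prev(1) - integral_0^s y_prev(t) dt.
    """
    value_at_junction = sum(coeffs)                      # y_prev evaluated at s = 1
    minus_integral = [-c / (k + 1) for k, c in enumerate(coeffs)]
    return [value_at_junction] + minus_integral

def solve_by_steps(n_intervals):
    """pieces[m] holds the polynomial on [m-1, m]; pieces[0] is the initial function."""
    pieces = [[Fraction(1)]]                             # y(x) = 1 on [-1, 0]
    for _ in range(n_intervals):
        pieces.append(next_piece(pieces[-1]))
    return pieces

def evaluate(pieces, x):
    """Evaluate the piecewise polynomial at a rational x with -1 <= x < len(pieces)-1."""
    m = math.floor(x) + 1
    s = Fraction(x) - (m - 1)
    return sum(c * s ** k for k, c in enumerate(pieces[m]))

if __name__ == "__main__":
    pieces = solve_by_steps(3)
    for m, piece in enumerate(pieces):
        print(f"[{m - 1}, {m}]:", [str(c) for c in piece])
    print("y(5/2) =", evaluate(pieces, Fraction(5, 2)))
```

Each new piece has degree one higher than the previous one, which is another way of seeing why the jump moves to a higher derivative at every breakpoint.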



Example 2. Our next example clearly illustrates the fact that the solutions of a delay equation depend on the entire history between $x_0 - \tau$ and $x_0$, and not only on the initial value:

$$ y'(x) = -1.4 \cdot y(x - 1) \tag{17.4} $$

a) $\varphi(x) = 0.8$ for $-1 \le x \le 0$,

b) $\varphi(x) = 0.8 + x$ for $-1 \le x \le 0$,

c) $\varphi(x) = 0.8 + 2x$ for $-1 \le x \le 0$.

The solutions are displayed in Fig. 17.2. An explanation for the oscillatory behaviour of the solutions will be given below.



Fig. 17.2. Solutions of (17.4) 





II. 17 Delay Differential Equations 341 


Constant Step Size Methods for Constant Delay 

If we apply the Runge-Kutta method (1.8) (or (7.7)) to a delay equation (17.1) we obtain

$$ g_i^{(n)} = y_n + h \sum_j a_{ij}\, f\bigl(x_n + c_j h,\ g_j^{(n)},\ y(x_n + c_j h - \tau)\bigr) $$

$$ y_{n+1} = y_n + h \sum_j b_j\, f\bigl(x_n + c_j h,\ g_j^{(n)},\ y(x_n + c_j h - \tau)\bigr). $$

But which values should we give to $y(x_n + c_j h - \tau)$? If the delay is constant and satisfies $\tau = kh$ for some integer $k$, the most natural idea is to use the back-values of the old solution

$$ g_i^{(n)} = y_n + h \sum_j a_{ij}\, f\bigl(x_n + c_j h,\ g_j^{(n)},\ \gamma_j^{(n)}\bigr) \tag{17.5a} $$

$$ y_{n+1} = y_n + h \sum_j b_j\, f\bigl(x_n + c_j h,\ g_j^{(n)},\ \gamma_j^{(n)}\bigr) \tag{17.5b} $$

where

$$ \gamma_j^{(n)} = \begin{cases} \varphi(x_n + c_j h - \tau) & \text{if } n < k \\ g_j^{(n-k)} & \text{if } n \ge k. \end{cases} \tag{17.5c} $$

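This can be interpreted as solving successively

$$ y'(x) = f\bigl(x, y(x), \varphi(x - \tau)\bigr) \tag{17.1a} $$

for the interval $[x_0, x_0 + \tau]$, then

$$ \begin{aligned} y'(x) &= f\bigl(x, y(x), z(x)\bigr) \\ z'(x) &= f\bigl(x - \tau, z(x), \varphi(x - 2\tau)\bigr) \end{aligned} \tag{17.1b} $$

for the interval $[x_0 + \tau, x_0 + 2\tau]$, then

$$ \begin{aligned} y'(x) &= f\bigl(x, y(x), z(x)\bigr) \\ z'(x) &= f\bigl(x - \tau, z(x), v(x)\bigr) \\ v'(x) &= f\bigl(x - 2\tau, v(x), \varphi(x - 3\tau)\bigr) \end{aligned} \tag{17.1c} $$

for the interval $[x_0 + 2\tau, x_0 + 3\tau]$, and so on. This is the perfect numerical analog of the “method of steps” mentioned above.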

Theorem 17.1. If $c_i$, $a_{ij}$, $b_j$ are the coefficients of a $p$-th order Runge-Kutta method, then (17.5) is convergent of order $p$.

Proof. The sequence (17.1a), (17.1b), ... are ordinary differential equations normally solved by a $p$-th order Runge-Kutta method. Therefore the result follows immediately from Theorem 3.6. □
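
Scheme (17.5) is short enough to write down in full. The following Python sketch uses the classical fourth order Runge-Kutta coefficients as an assumed choice of $c_i$, $a_{ij}$, $b_j$; the function name and argument list are ours. The stage values $g_j^{(n)}$ of every step are stored so that they can serve as $\gamma_j^{(n+k)}$ exactly $k$ steps later.

```python
def rk4_retarded(f, phi, x0, y0, tau, k, n_steps):
    """Integrate y'(x) = f(x, y(x), y(x - tau)) with step h = tau/k by scheme (17.5).

    phi gives the history for x <= x0, y0 is the value at x0.
    Returns the step size and the list [y_0, y_1, ..., y_{n_steps}].
    """
    h = tau / k
    c = (0.0, 0.5, 0.5, 1.0)
    a = ((), (0.5,), (0.0, 0.5), (0.0, 0.0, 1.0))   # strictly lower triangular part
    b = (1 / 6, 1 / 3, 1 / 3, 1 / 6)

    ys = [y0]
    stages = []                                      # stages[n][j] = g_j^{(n)}
    for n in range(n_steps):
        xn, yn = x0 + n * h, ys[-1]
        g, slopes = [], []
        for i in range(4):
            gi = yn + h * sum(a[i][j] * slopes[j] for j in range(i))   # (17.5a)
            if n < k:
                gamma = phi(xn + c[i] * h - tau)     # (17.5c), still inside the history
            else:
                gamma = stages[n - k][i]             # (17.5c), stage value k steps back
            g.append(gi)
            slopes.append(f(xn + c[i] * h, gi, gamma))
        stages.append(g)
        ys.append(yn + h * sum(b[i] * slopes[i] for i in range(4)))    # (17.5b)
    return h, ys

if __name__ == "__main__":
    # Problem (17.3): y'(x) = -y(x-1), y = 1 on [-1, 0]; exact value y(2.5) = -19/48.
    h, ys = rk4_retarded(lambda x, y, y_lag: -y_lag, lambda x: 1.0, 0.0, 1.0, 1.0, 4, 10)
    print("h =", h, " y(2.5) ~", ys[-1], " exact:", -19 / 48)
```

For problem (17.3) the systems (17.1a), (17.1b), (17.1c) are linear with polynomial solutions of degree at most three on $[0, 3]$, so the fourth order method reproduces them up to rounding; for nonlinear problems one only expects the $O(h^4)$ behaviour stated in Theorem 17.1.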
Remark. For the collocation method based on Gaussian quadrature formula, Theorem 17.1 yields superconvergence in spite of the use of the low order approximations $\gamma_j^{(n)}$ of (17.5c). Bellen (1984) generalizes this result to the situation where $\tau = \tau(x)$ and $\gamma_j^{(n)}$ is the value of the collocation polynomial at $x_n + c_j h - \tau(x_n + c_j h)$. He proves superconvergence if the grid-points are chosen such that every interval $[x_{n-1}, x_n]$ is mapped, by $x - \tau(x)$, into $[x_{j-1}, x_j]$ for some $j < n$.

Numerical Example. We have integrated the problem

$$ y'(x) = \bigl(1.4 - y(x - 1)\bigr) \cdot y(x) $$

(see (17.12) below) for $0 \le x \le 10$ with initial values $y(x) = 0$, $-1 \le x < 0$, $y(0) = 0.1$, and step sizes $h = 1, 1/2, 1/4, 1/8, \ldots, 1/128$ using Kutta's methods of order 4 (Table 1.2, left). The absolute value of the global errors (and the solution in grey) are presented in Fig. 17.3. The 4th order convergence can clearly be observed. The downward peaks are provoked by sign changes in the error.

Fig. 17.3. Errors of RK44 with retarded stages (17.5)


Variable Step Size Methods 

Although method (17.5) allows efficient and easy-to-code computations for simple problems with constant delays (such as all the examples of this section), it does not allow the step size to be changed arbitrarily, and an application to variable delay equations is not straightforward. If complete flexibility is desired, we need a global approximation to the solution. Such global approximations are furnished by multistep methods of Adams or BDF type (see Chapter III.1) or the modern Runge-Kutta methods which are constructed together with a dense output. The code RETARD of the appendix is a modification of the code DOPRI5 (method of Dormand & Prince in Table 5.2 with Shampine's dense output; see (6.12), (6.13) and the subsequent discussion) in such a way that after every successful step of integration the coefficients of the continuous solution are written into memory. Back-values of the solution are then available by calling the function YLAG(I,X,PHI). For example, for problem (17.4) the subroutine FCN would read as

      F(1) = -1.4D0 * YLAG(1, X - 1.D0, PHI).

As we have seen, the solutions possess discontinuities in the derivatives at several points, e.g. for (17.1) at $x_0 + \tau$, $x_0 + 2\tau$, $x_0 + 3\tau$, etc. Therefore the code RETARD provides a possibility to match given points of discontinuities exactly (specify IWORK(6) and WORK(11),...) which improves precision and computation time.
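
The idea of keeping a continuous approximation in memory can be illustrated independently of RETARD. The sketch below is not the RETARD algorithm: it assumes a fixed step $h \le \tau$, the classical Runge-Kutta method of order 4 instead of DOPRI5, and cubic Hermite interpolation instead of Shampine's dense output. The closure `ylag` is a hypothetical Python counterpart of the role played by YLAG; all names are ours.

```python
def integrate_with_history(f, phi, x0, tau, h, n_steps):
    """Integrate y'(x) = f(x, y(x), y(x - tau)) and keep a continuous history.

    Returns grid points, solution values and a function ylag(x) that
    evaluates the stored continuous approximation (phi for x <= x0).
    """
    if h > tau:
        raise ValueError("this sketch needs h <= tau so that lagged values are known")
    steps = []          # one tuple (x_left, y_left, f_left, y_right, f_right) per step

    def ylag(x):
        if x <= x0 or not steps:
            return phi(x)
        i = min(int((x - x0) / h), len(steps) - 1)
        xl, yl, fl, yr, fr = steps[i]
        t = (x - xl) / h
        # cubic Hermite interpolation on the step [xl, xl + h]
        return ((1 + 2 * t) * (1 - t) ** 2 * yl + t * (1 - t) ** 2 * h * fl
                + t * t * (3 - 2 * t) * yr + t * t * (t - 1) * h * fr)

    xs, ys = [x0], [phi(x0)]
    for n in range(n_steps):
        x, y = xs[-1], ys[-1]
        k1 = f(x, y, ylag(x - tau))
        k2 = f(x + h / 2, y + h / 2 * k1, ylag(x + h / 2 - tau))
        k3 = f(x + h / 2, y + h / 2 * k2, ylag(x + h / 2 - tau))
        k4 = f(x + h, y + h * k3, ylag(x + h - tau))
        y_new = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x_new = x0 + (n + 1) * h
        f_new = f(x_new, y_new, ylag(x_new - tau))
        steps.append((x, y, k1, y_new, f_new))   # written to memory after each step
        xs.append(x_new)
        ys.append(y_new)
    return xs, ys, ylag

if __name__ == "__main__":
    # Problem (17.3); the right-hand side only needs the lagged value.
    xs, ys, ylag = integrate_with_history(
        lambda x, y, y_lag: -y_lag, lambda x: 1.0, 0.0, 1.0, 0.25, 10)
    print("y(2.5) ~", ys[-1], "  y(1.5) ~", ylag(1.5))
```

Because the history is available at every $x$ and not only at stage points, nothing in this layout ties the step size to the delay, which is the flexibility the paragraph above asks for. The breakpoints $x_0 + \tau, x_0 + 2\tau, \ldots$ should still be grid points if full accuracy is wanted.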

Earlier Runge-Kutta codes for delay equations have been written by Oppelstrup 
(1976), Oberle & Pesch (1981) and Bellen & Zennaro (1985). Bock & Schlöder
(1981) exploited the natural dense output of multistep methods.


Stability 

It can be observed from Fig. 17.1 and Fig. 17.2 that the solutions, after the initial phase, seem to tend to something like $e^{\alpha x} \cos \beta(x - \delta)$. We now try to determine $\alpha$ and $\beta$. We study the equation

$$ y'(x) = \lambda y(x) + \mu y(x - 1). \tag{17.6} $$

There is no loss of generality in supposing the delay $\tau = 1$, since any delay $\tau \ne 1$ can be reduced to $\tau = 1$ by a coordinate change.

We search for a solution of the form

$$ y(x) = e^{\gamma x} \quad\text{where } \gamma = \alpha + i\beta. \tag{17.7} $$

Introducing this into (17.6) we obtain the following “characteristic equation” for $\gamma$

$$ \gamma - \lambda - \mu e^{-\gamma} = 0, \tag{17.8} $$

which, for $\mu \ne 0$, possesses an infinity of solutions: in fact, if $|\gamma|$ becomes large, we obtain from (17.8), since $\lambda$ is fixed, that $\mu e^{-\gamma}$ must be large too and

$$ \gamma \approx \mu e^{-\gamma}. \tag{17.8'} $$

This implies that $\gamma = \alpha + i\beta$ is close to the imaginary axis. Hence $|\gamma| \approx |\beta|$ and from (17.8')

$$ |\beta| \approx |\mu| e^{-\alpha}. $$

Therefore the roots of (17.8) lie asymptotically on the curves $-\alpha = \log|\beta| - \log|\mu|$. Again from (17.8'), we have a root whenever the argument of $\mu e^{-i\beta}$ is close to $\pi/2$ (for $\beta > 0$), i.e. if

$$ \beta \approx \arg \mu - \frac{\pi}{2} + 2k\pi, \qquad k = 1, 2, \ldots $$

There are thus two sequences of characteristic values which tend to infinity on logarithmic curves left of the imaginary axis, with $2\pi$ as asymptotic distance between two consecutive values.

The “general solution” of (17.6) is thus a Fourier-like superposition of solutions of type (17.7) (Wright 1946, see also Bellman & Cooke 1963, Chapter 4). The larger $-\mathrm{Re}\,\gamma$ is, the faster these solutions “die out” as $x \to \infty$. The dominant solutions are thus (provided that the corresponding coefficients are not zero) those which correspond to the largest real part, i.e., those closest to the origin. For equations (17.3) and (17.4) the characteristic equations are $\gamma + e^{-\gamma} = 0$ and $\gamma + 1.4\, e^{-\gamma} = 0$ with solutions $\gamma = -0.31813 \pm 1.33724i$ and $\gamma = -0.08170 \pm 1.51699i$ respectively, which explains nicely the behaviour of the asymptotic solutions of Fig. 17.1 and Fig. 17.2.
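
The two dominant roots quoted above are easy to reproduce. The following Python sketch applies Newton's method to the characteristic equation (17.8); the starting guess $1.5i$ is our choice (close to the expected imaginary part) and the function name is hypothetical. Newton's method only finds the root nearest to the guess, so it says nothing about the remaining infinitely many roots.

```python
import cmath

def characteristic_root(lam, mu, guess, tol=1e-14, max_iter=50):
    """Newton iteration for gamma - lam - mu*exp(-gamma) = 0, equation (17.8)."""
    gamma = complex(guess)
    for _ in range(max_iter):
        e = cmath.exp(-gamma)
        residual = gamma - lam - mu * e
        derivative = 1 + mu * e          # d/dgamma of the left-hand side
        step = residual / derivative
        gamma -= step
        if abs(step) < tol:
            break
    return gamma

if __name__ == "__main__":
    # (17.3): y' = -y(x-1), i.e. lam = 0, mu = -1
    print(characteristic_root(0.0, -1.0, 1.5j))
    # (17.4): y' = -1.4*y(x-1), i.e. lam = 0, mu = -1.4
    print(characteristic_root(0.0, -1.4, 1.5j))
```

The real part is the decay rate $\alpha$ and the imaginary part the angular frequency $\beta$ of the asymptotic oscillation $e^{\alpha x}\cos\beta(x - \delta)$.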

Remark. For the case of matrix equations

$$ y'(x) = A y(x) + B y(x - 1) $$

where $A$ and $B$ are not simultaneously diagonalizable, we set $y(x) = v e^{\gamma x}$ where $v \ne 0$ is a given vector. The equation now leads to

$$ \gamma v = A v + B e^{-\gamma} v, $$

which has a nontrivial solution if

$$ \det(\gamma I - A - B e^{-\gamma}) = 0, \tag{17.8''} $$

the characteristic equation for the more general case. The shape of the solutions of (17.8'') is similar to those of (17.8), there are just $r = \mathrm{rank}(B)$ points in each strip of width $2\pi$ instead of one.

All solutions of (17.6) remain stable for $x \to \infty$ if all characteristic roots of (17.8) remain in the negative half plane. This result follows either from the above expansion theorem or from the theory of Laplace transforms (e.g., Bellman & Cooke (1963), Chapter 1), which, in fact, is closely related.

In order to study the boundary of the stability domain, we search for $(\lambda, \mu)$ values for which the first solution $\gamma$ crosses the imaginary axis, i.e. $\gamma = i\theta$ for $\theta$ real. If we insert this into (17.8), we obtain

$$ \lambda = -\mu \quad\text{for } \theta = 0 \ (\gamma \text{ real}) $$

$$ \lambda = i\theta - \mu e^{-i\theta} \quad\text{for } \theta \ne 0 $$

or, by separating real and imaginary parts,

$$ \lambda = \frac{\theta \cos\theta}{\sin\theta}, \qquad \mu = -\frac{\theta}{\sin\theta} $$

valid for real $\lambda$ and $\mu$. These paths are sketched in Fig. 17.4 and separate in the $(\lambda, \mu)$-plane the domains of stability and instability for the solutions of (17.6) (a result of Hayes 1950).
If we put $\theta = \pi/2$, we find that the solutions of $y'(x) = \mu y(x - 1)$ remain stable for

$$ -\frac{\pi}{2} \le \mu \le 0 \tag{17.9a} $$

and are unstable for

$$ \mu < -\frac{\pi}{2} \quad\text{as well as}\quad \mu > 0. \tag{17.9b} $$

Fig. 17.4. Domain of stability for $y'(x) = \lambda y(x) + \mu y(x - 1)$
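
The parametrization of the stability boundary can be verified directly: for each $\theta$ the pair $(\lambda, \mu)$ must make $\gamma = i\theta$ a root of (17.8). The short Python sketch below does this check and recovers the value $-\pi/2$ of (17.9); the function names are illustrative only.

```python
import cmath
import math

def boundary_point(theta):
    """Point (lam, mu) on the stability boundary for a real theta with sin(theta) != 0."""
    lam = theta * math.cos(theta) / math.sin(theta)
    mu = -theta / math.sin(theta)
    return lam, mu

def characteristic_residual(lam, mu, gamma):
    """Left-hand side of the characteristic equation (17.8)."""
    return gamma - lam - mu * cmath.exp(-gamma)

if __name__ == "__main__":
    for theta in (0.5, 1.0, math.pi / 2, 2.0, 3.0):
        lam, mu = boundary_point(theta)
        res = abs(characteristic_residual(lam, mu, 1j * theta))
        print(f"theta={theta:.4f}  lam={lam:+.5f}  mu={mu:+.5f}  |residual|={res:.1e}")
```

At $\theta = \pi/2$ the boundary point is $(\lambda, \mu) = (0, -\pi/2)$, the left end of the interval in (17.9a).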


An Example from Population Dynamics 


Lord Cherwell drew my attention to an equation, equivalent to (8) 
(here: (17.12)) with a = log 2, which he had encountered in his 
application of probability methods to the problem of distribution 
of primes. My thanks are due to him for thus introducing me to 
an interesting problem. (E.M. Wright 1945) 

We now demonstrate the phenomena discussed above and the power of our pro¬ 
grams on a couple of examples drawn from applications. For supplementary ap¬ 
plications of delay equations to all sorts of sciences, consult the impressive list in 
Driver (1977, p. 239-240). 

Let $y(x)$ represent the population of a certain species, whose development as a function of time is to be studied. The simple model of infinite exponential growth $y' = \lambda y$ was soon replaced by the hypothesis that the growth rate $\lambda$ will decrease with increasing population $y$ due to illness and lack of food and space. One then arrives at the model (Verhulst 1845, Pearl & Reed 1922)

$$ y'(x) = k \cdot \bigl(a - y(x)\bigr) \cdot y(x). \tag{17.10} $$




346 II. Runge-Kutta and Extrapolation Methods 


“Nous donnerons le nom logistique à la courbe caractérisée par l'équation précédente” (Verhulst). It can be solved by elementary functions (Exercise 1). All solutions with initial value $y_0 > 0$ tend asymptotically to $a$ as $x \to \infty$. If we assume the growth rate to depend on the population of the preceding generation, (17.10) becomes a delay equation (Cunningham 1954, Wright 1955, Kakutani & Markus 1958)

$$ y'(x) = k \cdot \bigl(a - y(x - \tau)\bigr) \cdot y(x). \tag{17.11} $$

Introducing the new function $z(x) = k\tau\, y(\tau x)$ into (17.11) and again replacing $z$ by $y$ and $ka\tau$ by $a$ we obtain

$$ y'(x) = \bigl(a - y(x - 1)\bigr) \cdot y(x). \tag{17.12} $$

This equation has an equilibrium point at $y(x) = a$. The substitution $y(x) = a + z(x)$ and linearization leads to the equation $z'(x) = -a z(x - 1)$, and condition (17.9) shows that this equilibrium point is locally stable if $0 < a < \pi/2$. Hence the characteristic equation, here $\gamma + a e^{-\gamma} = 0$, possesses two real solutions iff $a < 1/e = 0.368$, which makes monotonic solutions possible; otherwise they are oscillatory. For $a > \pi/2$ the equilibrium solution is unstable and gives rise to a periodic limit cycle.



The solutions in Fig. 17.5 have been computed by the code RETARD of the appendix with subroutine FCN as

      F(1) = (A - YLAG(1, X - 1.D0, PHI)) * Y(1)

with A = 0.35, 0.5, 1., 1.4, and 1.6.
Infectious Disease Modelling

De tous ceux qui ont traité cette matière, c'est sans contredit M.
de la Condamine qui l'a fait avec plus de succès. Il est déjà venu
à bout de persuader la meilleure partie du monde raisonnable de
la grande utilité de l'inoculation: quant aux autres, il serait inutile
de vouloir employer la raison avec eux: puisqu'ils n'agissent pas
par principes. Il faut les conduire comme des enfants vers leur
mieux ... (Daniel Bernoulli 1760)

Daniel Bernoulli (“Docteur en médecine, Professeur de Physique en l'Université de Bâle, Associé étranger de l'Académie des Sciences”) was the first to use differential calculus to model infectious diseases in his 1760 paper on smallpox vaccination. At the beginning of our century, mathematical modelling of epidemics gained new interest. This finally led to the classical model of Kermack & McKendrick (1927): let $y_1(x)$ measure the susceptible portion of the population, $y_2(x)$ the infected, and $y_3(x)$ the removed (e.g. immunized) one. It is then natural to assume that the number of newly infected people per time unit is proportional to the product $y_1(x) y_2(x)$, just as in bimolecular chemical reactions (see Section I.16). If we finally assume the number of newly removed persons to be proportional to the infected ones, we arrive at the model

$$ y_1' = -y_1 y_2, \qquad y_2' = y_1 y_2 - y_2, \qquad y_3' = y_2 \tag{17.13} $$

where we have taken for simplicity all rate constants equal to one. This system can be integrated by elementary methods (divide the first two equations and solve $dy_2/dy_1 = -1 + 1/y_1$). The numerical solution with initial values $y_1(0) = 5$, $y_2(0) = 0.1$, $y_3(0) = 1$ is painted in gray color in Fig. 17.6: an epidemic breaks out, everybody finally becomes “removed” and nothing further happens.



Fig. 17.6. Periodic outbreak of disease, model (17.14) 

(in gray: Solution of Kermack - McKendrick model (17.13)) 


We arrive at a periodic outbreak of the disease, if we assume that immunized people become susceptible again, say after a fixed time $\tau$ ($\tau = 10$). If we also introduce an incubation period of, say, $\tau_2 = 1$, we arrive at the model

$$ \begin{aligned} y_1'(x) &= -y_1(x)\, y_2(x - 1) + y_2(x - 10) \\ y_2'(x) &= y_1(x)\, y_2(x - 1) - y_2(x) \\ y_3'(x) &= y_2(x) - y_2(x - 10) \end{aligned} \tag{17.14} $$

instead of (17.13). The solutions of (17.14), for the initial phases $y_1(x) = 5$, $y_2(x) = 0.1$, $y_3(x) = 1$ for $x \le 0$, are shown in Fig. 17.6 and illustrate the periodic outbreak of the disease.


An Example from Enzyme Kinetics 


Our next example, more complicated than the preceding ones, is from enzyme kinetics (Okamoto & Hayashi 1984). Consider the following consecutive reactions

$$ I \;\longrightarrow\; Y_1 \;\longrightarrow\; Y_2 \;\longrightarrow\; Y_3 \;\longrightarrow\; Y_4 \;\longrightarrow \tag{17.15} $$

where $I$ is an exogenous substrate supply which is maintained constant and $n$ molecules of the end product $Y_4$ inhibit co-operatively the reaction step of $Y_1 \to Y_2$ as

$$ \frac{k_1}{1 + \alpha\,(y_4(x))^n}. $$

It is generally expected that the inhibitor molecule must be moved to the position of the regulatory enzyme by forces such as diffusion or active transport. Thus, we consider this time consuming process causing time-delay and we arrive at the model

$$ \begin{aligned} y_1'(x) &= I - z\, y_1(x) \\ y_2'(x) &= z\, y_1(x) - y_2(x) \\ y_3'(x) &= y_2(x) - y_3(x) \\ y_4'(x) &= y_3(x) - 0.5\, y_4(x) \end{aligned} \qquad z = \frac{1}{1 + 0.0005\,(y_4(x - 4))^3}. \tag{17.16} $$

This system possesses an equilibrium at $z y_1 = y_2 = y_3 = I$, $y_4 = 2I$, $y_1 = I(1 + 0.004\, I^3) =: c_1^{-1} I$. When it is linearized in the neighbourhood of this equilibrium point, it becomes

$$ \begin{aligned} y_1'(x) &= -c_1 y_1(x) + c_2 y_4(x - 4) \\ y_2'(x) &= c_1 y_1(x) - y_2(x) - c_2 y_4(x - 4) \\ y_3'(x) &= y_2(x) - y_3(x) \\ y_4'(x) &= y_3(x) - 0.5\, y_4(x) \end{aligned} \tag{17.17} $$
where $c_2 = c_1 \cdot I^3 \cdot 0.006$. By setting $y(x) = v \cdot e^{\gamma x}$ we arrive at the characteristic equation (see (17.8'')), which becomes after some simplifications

$$ (c_1 + \gamma)(1 + \gamma)^2 (0.5 + \gamma) + c_2\, \gamma\, e^{-4\gamma} = 0. \tag{17.18} $$

As in the paper of Okamoto & Hayashi, we put $I = 10.5$. Then (17.18) possesses one pair of complex solutions in $\mathbb{C}^+$, namely

$$ \gamma = 0.04246 \pm 0.47666i $$

and the equilibrium solution is unstable (see Fig. 17.7). The period of the solution of the linearized equation is thus $T = 2\pi/0.47666 = 13.18$. The solutions then tend to a limit cycle of approximately the same period.
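
The unstable root of (17.18) can be recomputed with Newton's method. The Python sketch below rests on one assumption that should be stated clearly: $c_1$ is taken as the equilibrium value of the inhibition factor, $c_1 = 1/(1 + 0.004\,I^3)$, which is what linearizing $I - z\,y_1$ with respect to $y_1$ gives; $c_2 = c_1 \cdot I^3 \cdot 0.006$ is as in the text. The starting guess and the function name are ours.

```python
import cmath
import math

def enzyme_characteristic_root(supply=10.5, guess=0.05 + 0.5j, tol=1e-14, max_iter=50):
    """Newton iteration for (c1+g)(1+g)^2(0.5+g) + c2*g*exp(-4g) = 0, equation (17.18)."""
    c1 = 1.0 / (1.0 + 0.004 * supply ** 3)   # assumed: equilibrium value of z
    c2 = c1 * supply ** 3 * 0.006
    g = complex(guess)
    for _ in range(max_iter):
        e = cmath.exp(-4 * g)
        value = (c1 + g) * (1 + g) ** 2 * (0.5 + g) + c2 * g * e
        derivative = ((1 + g) ** 2 * (0.5 + g)
                      + 2 * (c1 + g) * (1 + g) * (0.5 + g)
                      + (c1 + g) * (1 + g) ** 2
                      + c2 * e * (1 - 4 * g))
        step = value / derivative
        g -= step
        if abs(step) < tol:
            break
    return g

if __name__ == "__main__":
    root = enzyme_characteristic_root()
    print("gamma  =", root)
    print("period =", 2 * math.pi / root.imag)
```

A positive real part confirms the instability of the equilibrium, and $2\pi/\mathrm{Im}\,\gamma$ gives the period of the linearized oscillation quoted above.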



Fig. 17.7. Solutions of the enzyme kinetics problem (17.16), $I = 10.5$.
Initial values close to equilibrium position


A Mathematical Model in Immunology 

We conclude our series of examples with Marchuk's model (Marchuk 1975) for the struggle of viruses $V(t)$, antibodies $F(t)$ and plasma cells $C(t)$ in the organism of a person infected by a viral disease. The equations are

$$ \begin{aligned} \frac{dV}{dt} &= (h_1 - h_2 F)\, V \\ \frac{dC}{dt} &= \xi(m)\, h_3\, F(t - \tau)\, V(t - \tau) - h_5 (C - 1) \\ \frac{dF}{dt} &= h_4 (C - F) - h_8 F V. \end{aligned} \tag{17.19} $$

The first is a Volterra-Lotka like predator-prey equation. The second equation describes the creation of new plasma cells with time lag due to infection, in the absence of which the second term creates an equilibrium at $C = 1$. The third equation models the creation of antibodies from plasma cells ($h_4 C$) and their decrease due to aging ($-h_4 F$) and binding with antigens ($-h_8 F V$). The term $\xi(m)$, finally, is defined by

$$ \xi(m) = \begin{cases} 1 & \text{if } m \le 0.1 \\ (1 - m)\,\dfrac{10}{9} & \text{if } 0.1 \le m \le 1 \end{cases} $$

and expresses the fact that the creation of plasma cells slows down when the organism is damaged by the viral infection. The relative characteristic $m(t)$ of damaging is given by a fourth equation

$$ \frac{dm}{dt} = h_6 V - h_7 m $$

where the first term expresses the damaging and the second recuperation.

This model allows us, by changing the coefficients $h_1, h_2, \ldots, h_8$, to model all sorts of behaviour of stable health, unstable health, acute form of a disease, chronic form etc. See Chapter 2 of Marchuk (1983). In Fig. 17.8 we plot the solutions of this model for $\tau = 0.5$, $h_1 = 2$, $h_2 = 0.8$, $h_3 = 10^4$, $h_4 = 0.17$, $h_5 = 0.5$, $h_7 = 0.12$, $h_8 = 8$ and initial values $V(t) = \max(0, 10^{-6} + t)$ if $t \le 0$, $C(0) = 1$, $F(t) = 1$ if $t \le 0$, $m(0) = 0$. Depending on the value of $h_6$ ($h_6 = 10$ or $h_6 = 300$), we then observe either complete recovery (defined by $V(t) < 10^{-16}$), or periodic outbreak of the disease due to damaging ($m(t)$ becomes nearly 1).


Integro-Differential Equations 

Often the hypothesis that a system depends on the time lagged solution at a specified fixed value $x - \tau$ is not very realistic, and one should rather suppose this dependence to be stretched out over a longer period of time. Then, instead of (17.1), we would have for example

$$ y'(x) = f\Bigl(x, y(x), \int_{x-\tau}^{x} K\bigl(x, \xi, y(\xi)\bigr)\, d\xi\Bigr). \tag{17.20} $$

The numerical treatment of these problems becomes much more expensive (see Brunner & van der Houwen (1986) for a study of various discretization methods). If $K(x, \xi, y)$ is zero in the neighbourhood of the diagonal $x = \xi$, one may be able to use RETARD and call a quadrature routine for each function evaluation.

Fortunately, many integro-differential equations can be reduced to ordinary or 
delay differential equations by introducing new variables for the integral function. 

Example (Volterra 1934). Consider the equation

$$ y'(x) = \Bigl(\varepsilon - \alpha y(x) - \int_0^x k(x - \xi)\, y(\xi)\, d\xi\Bigr) \cdot y(x) \tag{17.21} $$

for population dynamics, where the integral term represents a decrease of the reproduction rate due to pollution. If now for example $k(x) = c$, we put

$$ \int_0^x y(\xi)\, d\xi = v(x), \qquad y(x) = v'(x) $$

and obtain

$$ v''(x) = \bigl(\varepsilon - \alpha v'(x) - c\, v(x)\bigr) \cdot v'(x), $$

an ordinary differential equation.

The same method is possible for equations (17.20) with “degenerate kernel”; i.e., where

$$ K(x, \xi, y) = \sum_{i=1}^{m} a_i(x)\, b_i(\xi, y). \tag{17.22} $$

If we insert this into (17.20) and put

$$ v_i(x) = \int_{x-\tau}^{x} b_i\bigl(\xi, y(\xi)\bigr)\, d\xi, \tag{17.23} $$

we obtain

$$ \begin{aligned} y'(x) &= f\Bigl(x, y(x), \sum_{i=1}^{m} a_i(x)\, v_i(x)\Bigr) \\ v_i'(x) &= b_i\bigl(x, y(x)\bigr) - b_i\bigl(x - \tau, y(x - \tau)\bigr), \qquad i = 1, \ldots, m, \end{aligned} \tag{17.20'} $$

a system of delay differential equations.


Exercises 


1. Compute the solution of the Verhulst & Pearl equation (17.10). 

2. Compute the equilibrium points of Marchuk’s equation (17.19) and study their 
stability. 

3. Assume that the kernel $k(x)$ in Volterra's equation (17.21) is given by

$$ k(x) = p(x)\, e^{-\beta x} $$

where $p(x)$ is some polynomial. Show that this problem can be transformed into an ordinary differential equation.


4. Consider the integro-differential equation

$$ y'(x) = f\Bigl(x, y(x), \int_{x_0}^{x} K\bigl(x, \xi, y(\xi)\bigr)\, d\xi\Bigr). \tag{17.24} $$

a) For the degenerate kernel (17.22) problem (17.24) becomes equivalent to the ordinary differential equation

$$ \begin{aligned} y'(x) &= f\Bigl(x, y(x), \sum_{j=1}^{m} a_j(x)\, v_j(x)\Bigr) \\ v_j'(x) &= b_j\bigl(x, y(x)\bigr). \end{aligned} \tag{17.25} $$

b) Show that an application of an explicit ($p$-th order) Runge-Kutta method to (17.25) yields the formulas (Pouzet 1963)

$$ \begin{aligned} y_{n+1} &= y_n + h \sum_{i=1}^{s} b_i\, f\bigl(x_n + c_i h,\ g_i^{(n)},\ u_i^{(n)}\bigr) \\ g_i^{(n)} &= y_n + h \sum_{j=1}^{i-1} a_{ij}\, f\bigl(x_n + c_j h,\ g_j^{(n)},\ u_j^{(n)}\bigr) \\ u_i^{(n)} &= F_n(x_n + c_i h) + h \sum_{j=1}^{i-1} a_{ij}\, K\bigl(x_n + c_i h,\ x_n + c_j h,\ g_j^{(n)}\bigr) \end{aligned} \tag{17.26} $$

where

$$ F_0(x) = 0, \qquad F_{n+1}(x) = F_n(x) + h \sum_{i=1}^{s} b_i\, K\bigl(x,\ x_n + c_i h,\ g_i^{(n)}\bigr). $$

c) If we apply method (17.26) to problem (17.24), where the kernel does not necessarily satisfy (17.22), we nevertheless have convergence of order $p$.

Hint. Approximate the kernel by a degenerate one.


5. (Zennaro 1986). For the delay equation (17.1) consider the method (17.5) where (17.5c) is replaced by

$$ \gamma_j^{(n)} = \begin{cases} \varphi(x_n + c_j h - \tau) & \text{if } n < k \\ q_{n-k}(c_j) & \text{if } n \ge k. \end{cases} \tag{17.5c'} $$

Here $q_n(\theta)$ is the polynomial given by a continuous Runge-Kutta method (Section II.6)

$$ q_n(\theta) = y_n + h \sum_{j=1}^{s} b_j(\theta)\, f\bigl(x_n + c_j h,\ g_j^{(n)},\ \gamma_j^{(n)}\bigr). $$

a) Prove that the orthogonality conditions

$$ \int_0^1 \theta^{q-1} \Bigl(\gamma(t) \sum_{j=1}^{s} b_j(\theta)\, \Phi_j(t) - \theta^{\varrho(t)}\Bigr)\, d\theta = 0 \quad\text{for } q + \varrho(t) \le p \tag{17.27} $$

imply convergence of order $p$, if the underlying Runge-Kutta method is of order $p$ for ordinary differential equations.

Hint. Use the theory of B-series and the Gröbner-Alekseev formula (I.14.18) of Section I.14.

b) If for a given Runge-Kutta method the polynomials $b_j(\theta)$ of degree $\le [(p+1)/2]$ are such that $b_j(0) = 0$, $b_j(1) = b_j$ and

$$ \int_0^1 \theta^{q-1}\, b_j(\theta)\, d\theta = \frac{1}{q}\, b_j \bigl(1 - c_j^q\bigr), \qquad q = 1, \ldots, [(p-1)/2], \tag{17.28} $$

then (17.27) is satisfied. In addition one has the order conditions

$$ \sum_{j=1}^{s} b_j(\theta)\, \Phi_j(t) = \frac{\theta^{\varrho(t)}}{\gamma(t)} \quad\text{for } \varrho(t) \le [(p+1)/2]. $$

c) Show that the conditions (17.28) admit unique polynomials $b_j(\theta)$ of degree $[(p+1)/2]$.


6. Solve Volterra's equation (17.21) with $k(x) = c$ and compare the solution with the “pollution free” problem (17.10). Which population lives better, that with pollution, or that without?



Chapter III. Multistep Methods 
and General Linear Methods 


This chapter is devoted to the study of multistep and general multivalue methods.
After retracing their historical development (Adams, Nystrom, Milne, BDF) we 
study in the subsequent sections the order, stability and convergence properties 
of these methods. Convergence is most elegantly set in the framework of one- 
step methods in higher dimensions. Sections III.5 and III.6 are devoted to variable 
step size and Nordsieck methods. We then discuss the various available codes 
and compare them on the numerical examples of Section II. 10 as well as on some 
equations of high dimension. Before closing the chapter with a section on special 
methods for second order equations, we discuss two highly theoretical subjects: 
one on general linear methods, including Runge-Kutta methods as well as multistep 
methods and many generalizations, and the other on the asymptotic expansion of 
the global error of such methods. 
..., and my undertaking must have ended here, if I had depended
upon my own resources. But at this point Professor J.C. Adams 
furnished me with a perfectly satisfactory method of calculating 
by quadratures the exact theoretical forms of drops of fluids from 
the Differential Equation of Laplace,... (F. Bashforth 1883) 


Another improvement of Euler’s method was considered even earlier than Runge- 
Kutta methods — the methods of Adams. These were devised by John Couch 
Adams in order to solve a problem of F. Bashforth, which occurred in an investiga¬ 
tion of capillary action. Both the problem and the numerical integration schemes 
are published in Bashforth (1883). The actual origin of these methods must date 
back to at least 1855, since in that year F. Bashforth made an application to the 
Royal Society for assistance from the Government grant. There he wrote: “..., but I am indebted to Mr Adams for a method of treating the differential equation

$$ \frac{\dfrac{ddz}{du^2}}{\Bigl(1 + \dfrac{dz^2}{du^2}\Bigr)^{3/2}} + \frac{\dfrac{1}{u}\dfrac{dz}{du}}{\Bigl(1 + \dfrac{dz^2}{du^2}\Bigr)^{1/2}} - 2\alpha z = \frac{2}{b}, $$

when put under the form

$$ \frac{b}{\varrho} + \frac{b}{x} \sin\varphi = 2 + 2\alpha b^2\, \frac{z}{b} = 2 + \beta\, \frac{z}{b}, $$

which gives the theoretical form of the drop with an accuracy exceeding that of the most refined measurements.”

In contrast to one-step methods, where the numerical solution is obtained solely from the differential equation and the initial value, the algorithm of Adams consists of two parts: firstly, a starting procedure which provides $y_1, \ldots, y_{k-1}$ (approximations to the exact solution at the points $x_0 + h, \ldots, x_0 + (k-1)h$) and, secondly, a multistep formula to obtain an approximation to the exact solution $y(x_0 + kh)$. This is then applied recursively, based on the numerical approximations of $k$ successive steps, to compute $y(x_0 + (k+1)h)$, etc.

There are several possibilities for obtaining the missing starting values. J.C.
Adams actually computed them using the Taylor series expansion of the exact
solution (as described in Section I.8, see also Exercise 2). Another possibility
is the use of any one-step method, e.g., a Runge-Kutta method (see Chapter II).
It is also usual to start with low-order Adams methods and very small step sizes.
Explicit Adams Methods

We now derive, following Adams, the first explicit multistep formulas. We
introduce the notation $x_i = x_0 + ih$ for the grid points and suppose we know
the numerical approximations $y_n, y_{n-1}, \ldots, y_{n-k+1}$ to the exact
solution $y(x_n), \ldots, y(x_{n-k+1})$ of the differential equation

$$y' = f(x, y), \qquad y(x_0) = y_0. \tag{1.1}$$

Adams considers (1.1) in integrated form,

$$y(x_{n+1}) = y(x_n) + \int_{x_n}^{x_{n+1}} f(t, y(t))\,dt. \tag{1.2}$$

On the right-hand side of (1.2) there appears the unknown solution $y(x)$. But
since the approximations $y_{n-k+1}, \ldots, y_n$ are known, the values

$$f_i = f(x_i, y_i) \qquad \text{for } i = n-k+1, \ldots, n \tag{1.3}$$

are also available and it is natural to replace the function $f(t, y(t))$ in
(1.2) by the interpolation polynomial through the points
$\{(x_i, f_i) \mid i = n-k+1, \ldots, n\}$ (see Fig. 1.1).



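Fig. 1.1. Explicit Adams methods        Fig. 1.2. Implicit Adams methods

This polynomial can be expressed in terms of the backward differences
$\nabla^0 f_n = f_n$, $\nabla^{j+1} f_n = \nabla^j f_n - \nabla^j f_{n-1}$ as

$$p(t) = p(x_n + sh) = \sum_{j=0}^{k-1} (-1)^j \binom{-s}{j} \nabla^j f_n. \tag{1.4}$$

Replacing $f(t, y(t))$ in (1.2) by $p(t)$ gives the numerical method

$$y_{n+1} = y_n + h \sum_{j=0}^{k-1} \gamma_j \nabla^j f_n, \tag{1.5}$$

where the coefficients $\gamma_j$ satisfy

$$\gamma_j = (-1)^j \int_0^1 \binom{-s}{j}\,ds \tag{1.6}$$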


(see Table 1.1 for their numerical values). A simple recurrence relation for these 
coefficients will be derived below (formula (1.7)). 


Table 1.1. Coefficients for the explicit Adams methods

  j            0    1      2       3      4          5         6              7             8
  $\gamma_j$   1    1/2    5/12    3/8    251/720    95/288    19087/60480    5257/17280    1070017/3628800


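Special cases of (1.5). For k = 1, 2, 3, 4, after expressing the backward
differences in terms of $f_{n-j}$, one obtains the formulas

$$\begin{aligned}
k = 1:\quad & y_{n+1} = y_n + h f_n \quad \text{(explicit Euler method)} \\
k = 2:\quad & y_{n+1} = y_n + h\left(\tfrac{3}{2} f_n - \tfrac{1}{2} f_{n-1}\right) \\
k = 3:\quad & y_{n+1} = y_n + h\left(\tfrac{23}{12} f_n - \tfrac{16}{12} f_{n-1} + \tfrac{5}{12} f_{n-2}\right) \\
k = 4:\quad & y_{n+1} = y_n + h\left(\tfrac{55}{24} f_n - \tfrac{59}{24} f_{n-1} + \tfrac{37}{24} f_{n-2} - \tfrac{9}{24} f_{n-3}\right).
\end{aligned} \tag{1.5'}$$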


Recurrence relation for the coefficients. Using Euler’s method of generating
functions we can deduce a simple recurrence relation for $\gamma_j$ (see e.g.
Henrici 1962). Denote by $G(t)$ the series

$$G(t) = \sum_{j=0}^{\infty} \gamma_j t^j.$$

With the definition of $\gamma_j$ and the binomial theorem one obtains

$$G(t) = \sum_{j=0}^{\infty} (-t)^j \int_0^1 \binom{-s}{j}\,ds
       = \int_0^1 \sum_{j=0}^{\infty} (-t)^j \binom{-s}{j}\,ds
       = \int_0^1 (1-t)^{-s}\,ds
       = -\frac{t}{(1-t)\log(1-t)}.$$

This can be written as

$$-\frac{\log(1-t)}{t}\,G(t) = \frac{1}{1-t}$$

or as

$$\left(1 + \tfrac12 t + \tfrac13 t^2 + \cdots\right)\left(\gamma_0 + \gamma_1 t + \gamma_2 t^2 + \cdots\right) = \left(1 + t + t^2 + \cdots\right).$$

Comparing the coefficients of $t^m$ we get the desired recurrence relation

$$\gamma_m + \tfrac12 \gamma_{m-1} + \tfrac13 \gamma_{m-2} + \cdots + \tfrac{1}{m+1}\gamma_0 = 1. \tag{1.7}$$
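The recurrence relation (1.7) lends itself to a direct computation of the coefficients in exact rational arithmetic. The following is a minimal sketch in Python; the function name `adams_gamma` is chosen here for illustration and does not come from the text.

```python
from fractions import Fraction


def adams_gamma(m_max):
    """Return [gamma_0, ..., gamma_{m_max}] from the recurrence (1.7):
    gamma_m + gamma_{m-1}/2 + ... + gamma_0/(m+1) = 1."""
    gamma = []
    for m in range(m_max + 1):
        # Solve (1.7) for gamma_m using the already known gamma_0..gamma_{m-1}.
        tail = sum(gamma[m - i] / (i + 1) for i in range(1, m + 1))
        gamma.append(Fraction(1) - tail)
    return gamma


if __name__ == "__main__":
    for j, g in enumerate(adams_gamma(8)):
        print(j, g)
```

Run as a script, this prints the index j next to the fraction for each j up to 8, which can be compared entry by entry with Table 1.1.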


Implicit Adams Methods 

The formulas (1.5) are obtained by integrating the interpolation polynomial
(1.4) from $x_n$ to $x_{n+1}$, i.e., outside the interpolation interval
$(x_{n-k+1}, x_n)$. It is well known that an interpolation polynomial is usually
a rather poor approximation outside this interval. Adams therefore also
investigated methods where (1.4) is replaced by the interpolation polynomial
which uses in addition the point $(x_{n+1}, f_{n+1})$, i.e.,

$$p^*(t) = p^*(x_n + sh) = \sum_{j=0}^{k} (-1)^j \binom{-s+1}{j} \nabla^j f_{n+1} \tag{1.8}$$

(see Fig. 1.2). Inserting this into (1.2) we obtain the following implicit method

$$y_{n+1} = y_n + h \sum_{j=0}^{k} \gamma_j^* \nabla^j f_{n+1} \tag{1.9}$$

where the coefficients $\gamma_j^*$ satisfy

$$\gamma_j^* = (-1)^j \int_0^1 \binom{-s+1}{j}\,ds \tag{1.10}$$

and are given in Table 1.2 for $j \le 8$. Again, a simple recurrence relation can
be derived for these coefficients (Exercise 3).


Table 1.2. Coefficients for the implicit Adams methods

  j              0    1       2        3        4          5         6              7             8
  $\gamma_j^*$   1    -1/2    -1/12    -1/24    -19/720    -3/160    -863/60480     -275/24192    -33953/3628800


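The formulas thus obtained are generally of the form

$$y_{n+1} = y_n + h\left(\beta_k f_{n+1} + \cdots + \beta_0 f_{n-k+1}\right). \tag{1.9'}$$

The first examples are as follows

$$\begin{aligned}
k = 0:\quad & y_{n+1} = y_n + h f_{n+1} = y_n + h f(x_{n+1}, y_{n+1}) \\
k = 1:\quad & y_{n+1} = y_n + h\left(\tfrac12 f_{n+1} + \tfrac12 f_n\right) \\
k = 2:\quad & y_{n+1} = y_n + h\left(\tfrac{5}{12} f_{n+1} + \tfrac{8}{12} f_n - \tfrac{1}{12} f_{n-1}\right) \\
k = 3:\quad & y_{n+1} = y_n + h\left(\tfrac{9}{24} f_{n+1} + \tfrac{19}{24} f_n - \tfrac{5}{24} f_{n-1} + \tfrac{1}{24} f_{n-2}\right).
\end{aligned}$$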



The special cases k = 0 and k = 1 are the implicit Euler method and the
trapezoidal rule, respectively. They are actually one-step methods and have
already been considered in Chapter II.7.

The methods (1.9) give in general more accurate approximations to the exact
solution than (1.5). This will be discussed in detail when the concepts of order
and error constant are introduced (Section III.2). The price for this higher
accuracy is that $y_{n+1}$ is only defined implicitly by formula (1.9).
Therefore, in general a nonlinear equation has to be solved at each step.


Predictor-corrector methods. One possibility for solving this nonlinear equation
is to apply fixed point iteration. In practice one proceeds as follows:

P: compute the predictor $\hat y_{n+1} = y_n + h \sum_{j=0}^{k-1} \gamma_j \nabla^j f_n$
   by the explicit Adams method (1.5); this already yields a reasonable
   approximation to $y(x_{n+1})$;

E: evaluate the function at this approximation:
   $\hat f_{n+1} = f(x_{n+1}, \hat y_{n+1})$;

C: apply the corrector formula

   $$y_{n+1} = y_n + h\left(\beta_k \hat f_{n+1} + \beta_{k-1} f_n + \cdots + \beta_0 f_{n-k+1}\right) \tag{1.11}$$

   to obtain $y_{n+1}$;

E: evaluate the function anew, i.e., compute $f_{n+1} = f(x_{n+1}, y_{n+1})$.

This is the most common procedure, denoted by PECE. Other possibilities are:
PECECE (two fixed point iterations per step) or PEC (one uses $\hat f_{n+1}$
instead of $f_{n+1}$ in the subsequent steps).
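The four PECE stages can be sketched as follows for the k = 2 explicit Adams predictor and the k = 2 implicit Adams corrector with a fixed step size. This is only an illustration: the function names, the test problem $y' = -y$, $y(0) = 1$ and the use of an exact starting value are assumptions made here, not taken from the text.

```python
import math


def pece_adams2(f, x0, y0, y1, h, n_steps):
    """Integrate y' = f(x, y) in PECE mode with the k = 2 explicit Adams
    predictor and the k = 2 implicit Adams corrector (fixed step size h).
    y1 is a starting value at x0 + h. Returns (list of x, list of y)."""
    xs = [x0, x0 + h]
    ys = [y0, y1]
    fs = [f(x0, y0), f(x0 + h, y1)]
    for n in range(1, n_steps):
        x_new = x0 + (n + 1) * h
        # P: explicit Adams predictor, k = 2
        y_pred = ys[n] + h * (3 * fs[n] - fs[n - 1]) / 2
        # E: evaluate f at the predicted value
        f_pred = f(x_new, y_pred)
        # C: implicit Adams corrector, k = 2, with f_pred in place of f_{n+1}
        y_new = ys[n] + h * (5 * f_pred + 8 * fs[n] - fs[n - 1]) / 12
        # E: evaluate f anew at the corrected value
        xs.append(x_new)
        ys.append(y_new)
        fs.append(f(x_new, y_new))
    return xs, ys


def final_error(h):
    """Error at x = 1 for the illustrative problem y' = -y, y(0) = 1."""
    n_steps = round(1.0 / h)
    xs, ys = pece_adams2(lambda x, y: -y, 0.0, 1.0, math.exp(-h), h, n_steps)
    return abs(ys[-1] - math.exp(-1.0))


if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        print(h, final_error(h))
```

Each step costs two evaluations of f, one after the predictor and one after the corrector. Dropping the second evaluation and storing `f_pred` instead would give the PEC variant.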

This predictor-corrector technique has been used by F.R. Moulton (1926) as 
well as by W.E. Milne (1926). J.C. Adams actually solved the implicit equation 
(1.9) by Newton’s method, in the same way as is now usual for stiff equations (see 
Volume II). 


Remark. Formula (1.5) is often attributed to Adams-Bashforth. Similarly, the
multistep formula (1.9) is usually attributed to Adams-Moulton (Moulton 1926).
In fact, both formulas are due to Adams.
Numerical Experiment

We consider the Van der Pol equation (I.16.2) with $\varepsilon = 1$, take as
initial values $y_1(0) = A$, $y_2(0) = 0$ on the limit cycle and integrate over
one period $T$ (for the values of $A$ and $T$ see Exercise I.16.1). This is
exactly the same problem as the one used for the comparison of Runge-Kutta
methods (Fig. II.1.1). We have applied the above explicit and implicit Adams
methods with several fixed step sizes. The missing starting values were computed
with high accuracy by an explicit Runge-Kutta method. Fig. 1.3 shows the errors
of both components in dependence of the number of function evaluations. Since we
have implemented the implicit method (1.9) in PECE mode it requires 2 function
evaluations per step, whereas the explicit method (1.5) needs only one.

This experiment shows that, for the same value of $k$, the implicit methods
usually give a better result (the strange behaviour in the error of the
$y_2$-component for $k > 3$ is due to a sign change). Since we have used double
logarithmic scales, it is possible to read the “numerical order” from the slope
of the corresponding lines. We observe that the global error of the explicit
Adams methods behaves like $O(h^k)$ and that of the implicit methods like
$O(h^{k+1})$. This will be proved in the following sections.

We also remark that the scales used in Fig. 1.3 are exactly the same as those of
Fig. II.1.1. This allows a comparison with the Runge-Kutta methods of
Section II.1.
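The observed behaviour $O(h^k)$ of the explicit Adams methods can be reproduced on a much simpler problem than the Van der Pol equation. The sketch below is an assumption-laden illustration: it uses the scalar test problem $y' = -y$, $y(0) = 1$ on $[0, 1]$, exact starting values, and estimates the order from two step sizes. The names `AB_COEFFS`, `explicit_adams_error` and `observed_order` are introduced here and are not part of the text.

```python
import math

# Coefficients of the explicit Adams formulas for k = 1..4,
# ordered as (f_n, f_{n-1}, ...).
AB_COEFFS = {
    1: [1.0],
    2: [3 / 2, -1 / 2],
    3: [23 / 12, -16 / 12, 5 / 12],
    4: [55 / 24, -59 / 24, 37 / 24, -9 / 24],
}


def explicit_adams_error(k, n_steps):
    """Global error at x = 1 for y' = -y, y(0) = 1, using exact starting
    values and the fixed step size h = 1 / n_steps."""
    h = 1.0 / n_steps
    f = lambda x, y: -y
    ys = [math.exp(-i * h) for i in range(k)]  # exact starting values
    fs = [f(i * h, ys[i]) for i in range(k)]
    for n in range(k - 1, n_steps):
        incr = sum(c * fs[n - j] for j, c in enumerate(AB_COEFFS[k]))
        ys.append(ys[n] + h * incr)
        fs.append(f((n + 1) * h, ys[n + 1]))
    return abs(ys[n_steps] - math.exp(-1.0))


def observed_order(k, n_steps=40):
    """Estimate the order from the errors at step sizes h and h/2."""
    e1 = explicit_adams_error(k, n_steps)
    e2 = explicit_adams_error(k, 2 * n_steps)
    return math.log2(e1 / e2)


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, observed_order(k))
```

The base-2 logarithm of the error ratio plays the role of the slope read off the double logarithmic plot.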



explicit Adams, k = 1, 2, 3, 4, 
implicit Adams (PECE), k = 0, 1, 2, 3, 4. 


Fig. 1.3. Global errors versus number of function evaluations 
Explicit Nyström Methods

Approximate integration has, particularly in recent times, found an
extensive field of application within the exact sciences and
engineering. (E.J. Nyström 1925)


In his review article on the numerical integration of differential equations
(which we have already encountered in Section II.14), Nyström (1925) also
presents a new class of multistep methods. He considers instead of (1.2) the
integral equation

$$y(x_{n+1}) = y(x_{n-1}) + \int_{x_{n-1}}^{x_{n+1}} f(t, y(t))\,dt. \tag{1.12}$$

In the same way as above he replaces the unknown function $f(t, y(t))$ by the
polynomial $p(t)$ of (1.4) and so obtains the formula (see Fig. 1.4)

$$y_{n+1} = y_{n-1} + h \sum_{j=0}^{k-1} \kappa_j \nabla^j f_n \tag{1.13}$$

with the coefficients

$$\kappa_j = (-1)^j \int_{-1}^{1} \binom{-s}{j}\,ds. \tag{1.14}$$

The first of these coefficients are given in Table 1.3. E.J. Nyström
recommended the formulas (1.13), because the coefficients $\kappa_j$ were more
convenient for his computations than the coefficients $\gamma_j$ of (1.6). This
recommendation, surely reasonable for a computation by hand, is of little
relevance for computations on a computer.




Fig. 1.4. Explicit Nyström methods        Fig. 1.5. Milne-Simpson methods


Table 1.3. Coefficients for the explicit Nyström methods

  j            0    1    2      3      4        5        6            7         8
  $\kappa_j$   2    0    1/3    1/3    29/90    14/45    1139/3780    41/140    32377/113400


Special cases. For k = 1 the formula

$$y_{n+1} = y_{n-1} + 2h f_n \tag{1.13'}$$

is obtained. It is called the mid-point rule and is the simplest two-step
method. Its symmetry was extremely useful in the extrapolation schemes of
Section II.9. The case k = 2 yields nothing new, because $\kappa_1 = 0$. For
k = 3 one gets

$$y_{n+1} = y_{n-1} + h\left(\tfrac73 f_n - \tfrac23 f_{n-1} + \tfrac13 f_{n-2}\right). \tag{1.13''}$$


Milne-Simpson Methods 


We consider again the integral equation (1.12). But now we replace the integrand
by the polynomial $p^*(t)$ of (1.8), which in addition to
$f_n, \ldots, f_{n-k+1}$ also interpolates the value $f_{n+1}$ (see Fig. 1.5).
Proceeding as usual, we get the implicit formulas

$$y_{n+1} = y_{n-1} + h \sum_{j=0}^{k} \kappa_j^* \nabla^j f_{n+1}. \tag{1.15}$$

The coefficients $\kappa_j^*$ are defined by

$$\kappa_j^* = (-1)^j \int_{-1}^{1} \binom{-s+1}{j}\,ds \tag{1.16}$$

and the first of these are given in Table 1.4.


Table 1.4. Coefficients for the Milne-Simpson methods

  j              0    1     2      3    4        5        6           7         8
  $\kappa_j^*$   2    -2    1/3    0    -1/90    -1/90    -37/3780    -8/945    -119/16200


If the backward differences in (1.15) are expressed in terms of $f_{n-j}$, one
obtains the following methods for special values of $k$:

$$\begin{aligned}
k = 0:\quad & y_{n+1} = y_{n-1} + 2h f_{n+1}, \\
k = 1:\quad & y_{n+1} = y_{n-1} + 2h f_n, \\
k = 2:\quad & y_{n+1} = y_{n-1} + h\left(\tfrac13 f_{n+1} + \tfrac43 f_n + \tfrac13 f_{n-1}\right), \\
k = 4:\quad & y_{n+1} = y_{n-1} + h\left(\tfrac{29}{90} f_{n+1} + \tfrac{124}{90} f_n + \tfrac{24}{90} f_{n-1} + \tfrac{4}{90} f_{n-2} - \tfrac{1}{90} f_{n-3}\right).
\end{aligned} \tag{1.15'}$$

The special case k = 0 is just Euler’s implicit method applied with step size
$2h$. For k = 1 one obtains the previously derived mid-point rule. The
particular case k = 2 is an interesting method, known as the Milne method
(Milne 1926, 1970, p. 66). It is a direct generalization of Simpson’s rule.

Many other similar methods have been investigated. They are all based on an
integral equation of the form

$$y(x_{n+1}) = y(x_{n-\ell}) + \int_{x_{n-\ell}}^{x_{n+1}} f(t, y(t))\,dt, \tag{1.17}$$

where $f(t, y(t))$ is replaced either by the interpolating polynomial $p(t)$
(formula (1.4)) or by $p^*(t)$ (formula (1.8)). E.g., for $\ell = 3$ one obtains

$$y_{n+1} = y_{n-3} + h\left(\tfrac83 f_n - \tfrac43 f_{n-1} + \tfrac83 f_{n-2}\right). \tag{1.18}$$

This particular method has been used by Milne (1926) as a “predictor” for his
method: in order to solve the implicit equation (1.15’), Milne uses one or two
fixed-point iterations with the numerical value of (1.18) as starting point.


Methods Based on Differentiation (BDF) 

“My name is Gear.” — “pardon?” 

“Gear, dshii, ii, ay, are.” — “Mr. Jiea?” 

(In a hotel of Paris) 

The multistep formulas considered until now are all based on numerical
integration, i.e., the integral in (1.17) is approximated numerically using some
quadrature formula. The underlying idea of the following multistep formulas is
totally different as they are based on the numerical differentiation of a given
function.

Assume that the approximations $y_{n-k+1}, \ldots, y_n$ to the exact solution of
(1.1) are known. In order to derive a formula for $y_{n+1}$ we consider the
polynomial $q(x)$ which interpolates the values
$\{(x_i, y_i) \mid i = n-k+1, \ldots, n+1\}$. As in (1.8) this polynomial can be
expressed in terms of backward differences, namely

$$q(x) = q(x_n + sh) = \sum_{j=0}^{k} (-1)^j \binom{-s+1}{j} \nabla^j y_{n+1}. \tag{1.19}$$

The unknown value $y_{n+1}$ will now be determined in such a way that the
polynomial $q(x)$ satisfies the differential equation at at least one
grid-point, i.e.,

$$q'(x_{n+1-r}) = f(x_{n+1-r}, y_{n+1-r}). \tag{1.20}$$

For $r = 1$ we obtain explicit formulas. For k = 1 and k = 2, these are
equivalent to the explicit Euler method and the mid-point rule, respectively.
The case k = 3 yields

$$\tfrac13 y_{n+1} + \tfrac12 y_n - y_{n-1} + \tfrac16 y_{n-2} = h f_n. \tag{1.21}$$

This formula, however, as well as those for $k > 3$, is unstable (see
Section III.3) and therefore useless.
Much more interesting are the formulas one obtains when (1.20) is taken for
$r = 0$ (see Fig. 1.6). In this case one gets the implicit formulas

$$\sum_{j=0}^{k} \delta_j^* \nabla^j y_{n+1} = h f_{n+1} \tag{1.22}$$

with the coefficients

$$\delta_j^* = (-1)^j \frac{d}{ds}\binom{-s+1}{j}\bigg|_{s=1}.$$

Using the definition of the binomial coefficient

$$(-1)^j \binom{-s+1}{j} = \frac{1}{j!}\,(s-1)\,s\,(s+1)\cdots(s+j-2)$$

the coefficients $\delta_j^*$ are obtained by direct differentiation:

$$\delta_0^* = 0, \qquad \delta_j^* = \frac{1}{j} \quad \text{for } j \ge 1. \tag{1.23}$$

Formula (1.22) therefore becomes

$$\sum_{j=1}^{k} \frac{1}{j}\,\nabla^j y_{n+1} = h f_{n+1}. \tag{1.22'}$$

These multistep formulas, known as backward differentiation formulas (or
BDF-methods), are, since the work of Gear (1971), widely used for the
integration of stiff differential equations (see Volume II). They were
introduced by Curtiss & Hirschfelder (1952); Mitchell & Craggs (1953) call them
“standard step-by-step methods”.

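For the sake of completeness we give these formulas also in the form which
expresses the backward differences in terms of the $y_{n-j}$:

$$\begin{aligned}
k = 1:\quad & y_{n+1} - y_n = h f_{n+1}, \\
k = 2:\quad & \tfrac32 y_{n+1} - 2 y_n + \tfrac12 y_{n-1} = h f_{n+1}, \\
k = 3:\quad & \tfrac{11}{6} y_{n+1} - 3 y_n + \tfrac32 y_{n-1} - \tfrac13 y_{n-2} = h f_{n+1}, \\
k = 4:\quad & \tfrac{25}{12} y_{n+1} - 4 y_n + 3 y_{n-1} - \tfrac43 y_{n-2} + \tfrac14 y_{n-3} = h f_{n+1}, \\
k = 5:\quad & \tfrac{137}{60} y_{n+1} - 5 y_n + 5 y_{n-1} - \tfrac{10}{3} y_{n-2} + \tfrac54 y_{n-3} - \tfrac15 y_{n-4} = h f_{n+1}, \\
k = 6:\quad & \tfrac{147}{60} y_{n+1} - 6 y_n + \tfrac{15}{2} y_{n-1} - \tfrac{20}{3} y_{n-2} + \tfrac{15}{4} y_{n-3} - \tfrac65 y_{n-4} + \tfrac16 y_{n-5} = h f_{n+1}.
\end{aligned} \tag{1.22''}$$

For $k > 6$ the BDF-methods are unstable (see Section III.3).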
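The coefficients of the BDF-formulas in terms of $y_{n+1}, y_n, \ldots$ follow mechanically from the backward-difference form $\sum_{j=1}^{k} \frac1j \nabla^j y_{n+1} = h f_{n+1}$ by expanding each $\nabla^j$ with binomial coefficients. A small sketch in exact arithmetic, where the function name `bdf_coefficients` is a label chosen for this illustration:

```python
from fractions import Fraction
from math import comb


def bdf_coefficients(k):
    """Return [c_0, ..., c_k] with
    c_0*y_{n+1} + c_1*y_n + ... + c_k*y_{n+1-k} = h*f_{n+1},
    obtained by expanding sum_{j=1}^{k} (1/j) * nabla^j y_{n+1}."""
    c = [Fraction(0)] * (k + 1)
    for j in range(1, k + 1):
        # nabla^j y_{n+1} = sum_i (-1)^i * binom(j, i) * y_{n+1-i}
        for i in range(j + 1):
            c[i] += Fraction((-1) ** i * comb(j, i), j)
    return c


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, [str(ci) for ci in bdf_coefficients(k)])
```

The fractions are printed in lowest terms, so for k = 6 the leading coefficient 147/60 appears as 49/20.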


Exercises 


1. Let the differential equation $y' = y^2$, $y(0) = 1$ and the exact starting
   values $y_i = 1/(1 - x_i)$ for $i = 0, 1, \ldots, k-1$ be given. Apply the
   methods of Adams and study the expression $y(x_k) - y_k$ for small step sizes.


2. Consider the differential equation at the beginning of this section. It
   describes the form of a drop and can be written as (F. Bashforth 1883,
   page 26; the same problem as Exercise 2 of Section II.1 in a different
   coordinate system)

   $$\frac{dx}{d\varphi} = \rho\cos\varphi, \qquad \frac{dz}{d\varphi} = \rho\sin\varphi \tag{1.24}$$

   where

   $$\frac{1}{\rho} + \frac{\sin\varphi}{x} = 2 + \beta z. \tag{1.25}$$

   $\rho$ may be considered as a function of the coordinates $x$ and $z$. It can
   be interpreted as the radius of curvature and $\varphi$ denotes the angle
   between the normal to the curve and the $z$-axis (see Fig. 1.7 for
   $\beta = 3$). The initial values are given by $x(0) = 0$, $z(0) = 0$,
   $\rho(0) = 1$.

   Solve the above differential equation along the lines of J.C. Adams:

   a) Assuming

      $$\rho = 1 + b_2\varphi^2 + b_4\varphi^4 + \cdots$$

      and inserting this expression into (1.24) we obtain after integration the
      truncated Taylor series of $x(\varphi)$ and $z(\varphi)$ in terms of
      $b_2, b_4, \ldots$. These parameters can then be calculated from (1.25)
      by comparing the coefficients of $\varphi^m$. In this way one obtains the
      solution for small values of $\varphi$ (starting values).

   b) Use one of the proposed multistep formulas and calculate the solution for
      fixed $\beta$ (say $\beta = 3$) over the interval $[0, \pi]$.
Fig. 1.7. Solution of the differential equation (1.24) 
and an illustration from the book of Bashforth 


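3. Prove that the coefficients $\gamma_j^*$, defined by (1.10), satisfy
   $\gamma_0^* = 1$ and

   $$\gamma_m^* + \tfrac12\gamma_{m-1}^* + \tfrac13\gamma_{m-2}^* + \cdots + \tfrac{1}{m+1}\gamma_0^* = 0 \qquad \text{for } m \ge 1.$$

4. Let $\kappa_j$, $\kappa_j^*$, $\gamma_j$, $\gamma_j^*$ be the coefficients
   defined by (1.14), (1.16), (1.6), (1.10), respectively. Show that (with
   $\gamma_{-1} = \gamma_{-1}^* = 0$)

   $$\kappa_j = 2\gamma_j - \gamma_{j-1}, \qquad \kappa_j^* = 2\gamma_j^* - \gamma_{j-1}^* \qquad \text{for } j \ge 0.$$

   Hint. By splitting the integral in (1.14) one gets
   $\kappa_j = \gamma_j + \gamma_j^*$. The relation
   $\gamma_j^* = \gamma_j - \gamma_{j-1}$ is obtained by using a well-known
   identity for binomial coefficients.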
You know, I am a multistep man ... and don’t tell anybody, but the first 
program I wrote for the first Swedish computer was a Runge-Kutta code ... 
(G. Dahlquist, 1982, after some glasses of wine; printed with permission) 


A general theory of multistep methods was started by the work of Dahlquist
(1956, 1959), and became famous through the classical book of Henrici (1962).
All multistep formulas considered in the previous section have this in common
that the numerical approximations $y_i$ as well as the values $f_i$ appear
linearly. We thus consider the general difference equation

$$\alpha_k y_{n+k} + \alpha_{k-1} y_{n+k-1} + \cdots + \alpha_0 y_n = h\left(\beta_k f_{n+k} + \cdots + \beta_0 f_n\right) \tag{2.1}$$

which includes all considered methods as special cases. In this formula the
$\alpha_i$ and $\beta_i$ are real parameters, $h$ denotes the step size and

$$f_i = f(x_i, y_i), \qquad x_i = x_0 + ih.$$

Throughout this chapter we shall assume that

$$\alpha_k \ne 0, \qquad |\alpha_0| + |\beta_0| > 0. \tag{2.2}$$

The first assumption expresses the fact that the implicit equation (2.1) can be
solved with respect to $y_{n+k}$ at least for sufficiently small $h$. The second
relation in (2.2) can always be achieved by reducing the index $k$, if necessary.

Formula (2.1) will be called a linear multistep method or more precisely a
linear k-step method. We also distinguish between explicit ($\beta_k = 0$) and
implicit ($\beta_k \ne 0$) multistep methods.


Local Error of a Multistep Method 

As the numerical solution of a multistep method does not depend only on the initial 
value problem (1.1) but also on the choice of the starting values, the definition of 
the local error is not as straightforward as for one-step methods (compare Sections 
II.2 and II.3). 

Definition 2.1. The local error of the multistep method (2.1) is defined by

$$y(x_k) - y_k$$

where $y(x)$ is the exact solution of $y' = f(x, y)$, $y(x_0) = y_0$, and $y_k$
is the numerical solution obtained from (2.1) by using the exact starting values
$y_i = y(x_i)$ for $i = 0, 1, \ldots, k-1$ (see Fig. 2.1).

Fig. 2.1. Illustration of the local error

In the case k = 1 this definition coincides with the definition of the local
error for one-step methods. In order to show the connection with other possible
definitions of the local error, we associate with (2.1) the linear difference
operator $L$ defined by

$$L(y, x, h) = \sum_{i=0}^{k} \left(\alpha_i\, y(x + ih) - h\beta_i\, y'(x + ih)\right). \tag{2.3}$$

Here $y(x)$ is some differentiable function defined on an interval that contains
the values $x + ih$ for $i = 0, 1, \ldots, k$.

Lemma 2.2. Consider the differential equation (1.1) with $f(x, y)$ continuously
differentiable and let $y(x)$ be its solution. For the local error one has

$$y(x_k) - y_k = \left(\alpha_k I - h\beta_k \frac{\partial f}{\partial y}(x_k, \eta)\right)^{-1} L(y, x_0, h).$$

Here $\eta$ is some value between $y(x_k)$ and $y_k$, if $f$ is a scalar
function. In the case of a vector valued function $f$, the matrix
$\frac{\partial f}{\partial y}(x_k, \eta)$ is the Jacobian whose rows are
evaluated at possibly different values lying on the segment joining $y(x_k)$
and $y_k$.

Proof. By Definition 2.1, $y_k$ is determined implicitly by the equation

$$\sum_{i=0}^{k-1} \left(\alpha_i\, y(x_i) - h\beta_i f(x_i, y(x_i))\right) + \alpha_k y_k - h\beta_k f(x_k, y_k) = 0.$$

Inserting (2.3) we obtain

$$L(y, x_0, h) = \alpha_k\left(y(x_k) - y_k\right) - h\beta_k\left(f(x_k, y(x_k)) - f(x_k, y_k)\right)$$

and the statement follows from the mean value theorem. □
This lemma shows that $\alpha_k^{-1} L(y, x_0, h)$ is essentially equal to the
local error. Sometimes this term is also called the local error (Dahlquist 1956,
1959). For explicit methods both expressions are equal.


Order of a Multistep Method 


Once the local error of a multistep method is defined, one can introduce the concept 
of order in the same way as for one-step methods. 

Definition 2.3. The multistep method (2.1) is said to be of order $p$, if one of
the following two conditions is satisfied:

i) for all sufficiently regular functions $y(x)$ we have
   $L(y, x, h) = O(h^{p+1})$;

ii) the local error of (2.1) is $O(h^{p+1})$ for all sufficiently regular
    differential equations (1.1).

Observe that by Lemma 2.2 the above conditions (i) and (ii) are equivalent. Our
next aim is to characterize the order of a multistep method in terms of the free
parameters $\alpha_i$ and $\beta_i$. Dahlquist (1956) was the first to observe
the fundamental role of the polynomials

$$\begin{aligned}
\varrho(\zeta) &= \alpha_k \zeta^k + \alpha_{k-1}\zeta^{k-1} + \cdots + \alpha_0, \\
\sigma(\zeta) &= \beta_k \zeta^k + \beta_{k-1}\zeta^{k-1} + \cdots + \beta_0.
\end{aligned} \tag{2.4}$$

They will be called the generating polynomials of the multistep method (2.1).


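Theorem 2.4. The multistep method (2.1) is of order $p$, if and only if one of
the following equivalent conditions is satisfied:

i) $\sum_{i=0}^{k} \alpha_i = 0$ and
   $\sum_{i=0}^{k} \alpha_i i^q = q \sum_{i=0}^{k} \beta_i i^{q-1}$ for
   $q = 1, \ldots, p$;

ii) $\varrho(e^h) - h\sigma(e^h) = O(h^{p+1})$ for $h \to 0$;

iii) $\dfrac{\varrho(\zeta)}{\log\zeta} - \sigma(\zeta) = O((\zeta - 1)^p)$ for
     $\zeta \to 1$.

Proof. Expanding $y(x + ih)$ and $y'(x + ih)$ into a Taylor series and inserting
these series (truncated if necessary) into (2.3) yields

$$\begin{aligned}
L(y, x, h) &= \sum_{i=0}^{k}\Bigl(\alpha_i \sum_{q \ge 0} \frac{i^q}{q!}\, h^q y^{(q)}(x) - h\beta_i \sum_{r \ge 0} \frac{i^r}{r!}\, h^r y^{(r+1)}(x)\Bigr) \\
&= y(x) \sum_{i=0}^{k} \alpha_i + \sum_{q \ge 1} \frac{h^q}{q!}\, y^{(q)}(x) \Bigl(\sum_{i=0}^{k} \alpha_i i^q - q \sum_{i=0}^{k} \beta_i i^{q-1}\Bigr).
\end{aligned} \tag{2.5}$$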
This implies the equivalence of condition (i) with $L(y, x, h) = O(h^{p+1})$ for
all sufficiently regular functions $y(x)$.

It remains to prove that the three conditions of Theorem 2.4 are equivalent. The
identity

$$L(\exp, 0, h) = \varrho(e^h) - h\sigma(e^h)$$

where exp denotes the exponential function, together with

$$L(\exp, 0, h) = \sum_{i=0}^{k} \alpha_i + \sum_{q \ge 1} \frac{h^q}{q!} \Bigl(\sum_{i=0}^{k} \alpha_i i^q - q \sum_{i=0}^{k} \beta_i i^{q-1}\Bigr),$$

which follows from (2.5), shows the equivalence of the conditions (i) and (ii).

By use of the transformation $\zeta = e^h$ (or $h = \log\zeta$) condition (ii)
can be written in the form

$$\varrho(\zeta) - \log\zeta \cdot \sigma(\zeta) = O\bigl((\log\zeta)^{p+1}\bigr) \qquad \text{for } \zeta \to 1.$$

But this condition is equivalent to (iii), since

$$\log\zeta = (\zeta - 1) + O\bigl((\zeta - 1)^2\bigr) \qquad \text{for } \zeta \to 1. \qquad \Box$$
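Condition (i) of Theorem 2.4 is easy to check mechanically for a given set of coefficients. The following sketch tests the conditions for $q = 1, 2, \ldots$ in exact arithmetic and returns the largest $p$ for which they all hold. The function name `multistep_order` and the cut-off `q_max` are assumptions of this illustration.

```python
from fractions import Fraction


def multistep_order(alpha, beta, q_max=12):
    """Largest p such that condition (i) of Theorem 2.4 holds for q = 1..p.
    alpha and beta list alpha_0..alpha_k and beta_0..beta_k (integers or
    Fractions). Returns -1 if already sum(alpha_i) = 0 fails."""
    alpha = [Fraction(a) for a in alpha]
    beta = [Fraction(b) for b in beta]
    if sum(alpha) != 0:
        return -1
    p = 0
    for q in range(1, q_max + 1):
        lhs = sum(a * i**q for i, a in enumerate(alpha))
        rhs = q * sum(b * i ** (q - 1) for i, b in enumerate(beta))
        if lhs != rhs:
            break
        p = q
    return p


if __name__ == "__main__":
    F = Fraction
    # Milne method: y_{n+2} - y_n = h (f_{n+2} + 4 f_{n+1} + f_n) / 3
    print(multistep_order([-1, 0, 1], [F(1, 3), F(4, 3), F(1, 3)]))
    # BDF, k = 2: (3/2) y_{n+2} - 2 y_{n+1} + (1/2) y_n = h f_{n+2}
    print(multistep_order([F(1, 2), -2, F(3, 2)], [0, 0, 1]))
```

Coefficients should be passed as integers or `Fraction` values; binary floating point numbers such as 1/3 would make the exact comparison fail.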


Remark. The conditions for a multistep method to be of order 1, which are
usually called consistency conditions, can also be written in the form

$$\varrho(1) = 0, \qquad \varrho'(1) = \sigma(1). \tag{2.6}$$

Once the proofs of the above order conditions have been understood, it is not
difficult to treat the more general situation of non-equidistant grids (see
Section III.5 and the book of Stetter (1973), p. 191).

Example 2.5. Order of the explicit Adams methods. Let us first investigate for
which differential equations the explicit Adams methods give theoretically the
exact solution. This is the case if the polynomial $p(t)$ of (1.4) is equal to
$f(t, y(t))$. Suppose now that $f(t, y) = f(t)$ does not depend on $y$ and is a
polynomial of degree less than $k$. Then the explicit Adams methods integrate
the differential equations

$$y' = q x^{q-1}, \qquad \text{for } q = 0, 1, \ldots, k$$

exactly. This means that the local error is zero and hence, by Lemma 2.2,

$$0 = L(x^q, 0, h) = h^q \Bigl(\sum_{i=0}^{k} \alpha_i i^q - q \sum_{i=0}^{k} \beta_i i^{q-1}\Bigr) \qquad \text{for } q = 0, \ldots, k.$$

This is just condition (i) of Theorem 2.4 with $p = k$ so that the order of the
explicit Adams methods is at least $k$. In fact it will be shown that the order
of these methods is not greater than $k$ (Example 2.7).
Example 2.6. For implicit Adams methods the polynomial $p^*(t)$ of (1.8) has
degree one higher than that of $p(t)$. Thus the same considerations as in
Example 2.5 show that these methods have order at least $k + 1$.

All methods of Section III.1 can be treated analogously (see Exercise 3 and
Table 2.1).


Table 2.1. Order and error constant of multistep methods

  method                  formula    order    error constant
  explicit Adams          (1.5)      k        $\gamma_k$
  implicit Adams          (1.9)      k+1      $\gamma_{k+1}^*$
  midpoint rule           (1.13’)    2        1/6
  Nyström, k > 2          (1.13)     k        $\kappa_k/2$
  Milne, k = 2            (1.15’)    4        -1/180
  Milne-Simpson, k > 3    (1.15)     k+1      $\kappa_{k+1}^*/2$
  BDF                     (1.22’)    k        -1/(k+1)


Error Constant 


The order of a multistep method indicates how fast the error tends to zero if
$h \to 0$. Different methods of the same order, however, can have different
errors; they are distinguished by the error constant. Formula (2.5) shows that
the difference operator $L$, associated with a $p$th order multistep method, is
such that for all sufficiently regular functions $y(x)$

$$L(y, x, h) = C_{p+1} h^{p+1} y^{(p+1)}(x) + O(h^{p+2}) \tag{2.7}$$

where the constant $C_{p+1}$ is given by

$$C_{p+1} = \frac{1}{(p+1)!} \Bigl(\sum_{i=0}^{k} \alpha_i i^{p+1} - (p+1) \sum_{i=0}^{k} \beta_i i^p\Bigr). \tag{2.8}$$

This constant is not suitable as a measure of accuracy, since multiplication of
formula (2.1) by a constant can give any value for $C_{p+1}$, whereas the
numerical solution $\{y_n\}$ remains unchanged. A better choice would be the
constant $\alpha_k^{-1} C_{p+1}$, since the local error of a multistep method is
given by (Lemma 2.2 and formula (2.7))

$$y(x_k) - y_k = \alpha_k^{-1} C_{p+1} h^{p+1} y^{(p+1)}(x_0) + O(h^{p+2}). \tag{2.9}$$
For several reasons, however, this is not yet a satisfactory definition, as we
shall see from the following motivation: let

$$e_n = \frac{y(x_n) - y_n}{h^p}$$

be the global error scaled by $h^p$, and assume for this motivation that
$e_n = O(1)$. Subtracting (2.1) from (2.3) and using (2.7) we have

$$\begin{aligned}
\sum_{i=0}^{k} \alpha_i e_{n+i} = {} & h^{1-p} \sum_{i=0}^{k} \beta_i \bigl(f(x_{n+i}, y(x_{n+i})) - f(x_{n+i}, y_{n+i})\bigr) \\
& + C_{p+1}\, h\, y^{(p+1)}(x_n) + O(h^2).
\end{aligned} \tag{2.10}$$

The point is now to use

$$y^{(p+1)}(x_n) = \frac{1}{\sigma(1)} \sum_{i=0}^{k} \beta_i\, y^{(p+1)}(x_{n+i}) + O(h) \tag{2.11}$$

which brings the error term in (2.10) inside the sum with the $\beta_i$. We
linearize

$$f(x_{n+i}, y(x_{n+i})) - f(x_{n+i}, y_{n+i}) = \frac{\partial f}{\partial y}(x_{n+i}, y(x_{n+i}))\, h^p e_{n+i} + O(h^{2p})$$

and insert this together with (2.11) into (2.10). Neglecting the $O(h^2)$ and
$O(h^{2p})$ terms, we can interpret the obtained formula as the multistep method
applied to

$$e'(x) = \frac{\partial f}{\partial y}(x, y(x))\, e(x) + C\, y^{(p+1)}(x), \qquad e(x_0) = 0, \tag{2.12}$$

where

$$C = \frac{C_{p+1}}{\sigma(1)} \tag{2.13}$$

is seen to be a natural measure for the global error and is therefore called the
error constant.

Another derivation of Definition (2.13) will be given in the section on global
convergence (see Exercise 2 of Section III.4). Further, the solution of (2.12)
gives the first term of the asymptotic expansion of the global error (see
Section III.9).

Example 2.7. Error constant of the explicit Adams methods. Consider the
differential equation $y' = f(x)$ with $f(x) = (k+1)x^k$, the exact solution of
which is $y(x) = x^{k+1}$. As this differential equation is integrated exactly
by the $(k+1)$-step explicit Adams method (see Example 2.5), we have

$$y(x_k) - y(x_{k-1}) = h \sum_{j=0}^{k} \gamma_j \nabla^j f_{k-1}.$$

The local error of the $k$-step explicit Adams method (1.5) is therefore given by

$$y(x_k) - y_k = h \gamma_k \nabla^k f_{k-1} = h^{k+1} \gamma_k f^{(k)}(x_0) = h^{k+1} \gamma_k\, y^{(k+1)}(x_0).$$

As $\gamma_k \ne 0$, this formula shows that the order of the $k$-step method is
not greater than $k$ (compare Example 2.5). Furthermore, since $\alpha_k = 1$, a
comparison with formula (2.9) yields $C_{k+1} = \gamma_k$. Finally, for Adams
methods we have $\varrho(\zeta) = \zeta^k - \zeta^{k-1}$ and $\varrho'(1) = 1$,
so that by the use of (2.6) the error constant is given by

$$C = \gamma_k.$$

The error constants of all other previously considered multistep methods are
summarized in Table 2.1 (observe that $\sigma(1) = 2$ for explicit Nyström and
Milne-Simpson methods).
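The error constant $C = C_{p+1}/\sigma(1)$ can be evaluated directly from the coefficients $\alpha_i$, $\beta_i$, which gives a quick check of individual entries of Table 2.1. The sketch below determines the order from condition (i) of Theorem 2.4 and then applies (2.8) and (2.13). The function name `error_constant` is chosen for this illustration, and the code assumes a genuine multistep method of finite order with $\sigma(1) \ne 0$.

```python
from fractions import Fraction
from math import factorial


def error_constant(alpha, beta):
    """Return (p, C) for the method with coefficients alpha_0..alpha_k and
    beta_0..beta_k: p is the order from condition (i) of Theorem 2.4 and
    C = C_{p+1} / sigma(1), with C_{p+1} taken from (2.8).
    Assumes a method of finite order with sigma(1) != 0."""
    alpha = [Fraction(a) for a in alpha]
    beta = [Fraction(b) for b in beta]

    def defect(q):
        # sum alpha_i i^q - q sum beta_i i^(q-1); this vanishes for q <= p
        return (sum(a * i**q for i, a in enumerate(alpha))
                - q * sum(b * i ** (q - 1) for i, b in enumerate(beta)))

    if sum(alpha) != 0:
        raise ValueError("sum of alpha_i must vanish")
    p = 0
    while defect(p + 1) == 0:
        p += 1
    c_p1 = defect(p + 1) / factorial(p + 1)
    return p, c_p1 / sum(beta)


if __name__ == "__main__":
    F = Fraction
    print(error_constant([-1, 0, 1], [0, 2, 0]))                    # mid-point rule
    print(error_constant([-1, 0, 1], [F(1, 3), F(4, 3), F(1, 3)]))  # Milne method
    print(error_constant([F(1, 2), -2, F(3, 2)], [0, 0, 1]))        # BDF, k = 2
```

Since $C$ is a quotient of $C_{p+1}$ and $\sigma(1)$, multiplying all coefficients by a common nonzero factor leaves the result unchanged, in line with the motivation for (2.13).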


Irreducible Methods 


Let $\varrho(\zeta)$ and $\sigma(\zeta)$ of formula (2.4) be the generating
polynomials of (2.1) and suppose that they have a common factor
$\varphi(\zeta)$. Then the polynomials

$$\varrho^*(\zeta) = \frac{\varrho(\zeta)}{\varphi(\zeta)}, \qquad \sigma^*(\zeta) = \frac{\sigma(\zeta)}{\varphi(\zeta)}$$

are the generating polynomials of a new and simpler multistep method. Using the
shift operator $E$, defined by

$$E y_n = y_{n+1} \qquad \text{or} \qquad E y(x) = y(x + h),$$

this multistep method can be written in compact form as

$$\varrho^*(E)\, y_n = h \sigma^*(E) f_n.$$

Multiplication by $\varphi(E)$ shows that any solution $\{y_n\}$ of this method
is also a solution of $\varrho(E) y_n = h \sigma(E) f_n$. The two methods are
thus essentially equal. Denote by $L^*$ the difference operator associated with
the new reduced method, and by $C_{p+1}^*$ the constant given by (2.7). As

$$\begin{aligned}
L(y, x, h) = \varphi(E) L^*(y, x, h) &= C_{p+1}^* h^{p+1} \varphi(E)\, y^{(p+1)}(x) + O(h^{p+2}) \\
&= C_{p+1}^* \varphi(1)\, h^{p+1} y^{(p+1)}(x) + O(h^{p+2})
\end{aligned}$$

one immediately obtains $C_{p+1} = \varphi(1) C_{p+1}^*$ and therefore also the
relation

$$C_{p+1}/\sigma(1) = C_{p+1}^*/\sigma^*(1)$$

holds. Both methods thus have the same error constant.

The above analysis has shown that multistep methods whose generating polynomials
have a common factor are not interesting. We therefore usually assume that

$$\varrho(\zeta) \text{ and } \sigma(\zeta) \text{ have no common factor.} \tag{2.14}$$

Multistep methods satisfying this property are called irreducible.
The Peano Kernel of a Multistep Method


The order and the error constant above do not yet give a complete description of
the error, since the subsequent terms of the series for the error may be much
larger than $C_{p+1}$. Several attempts have therefore been made, originally for
the error of a quadrature formula, to obtain a complete description of the
error. The following discussion is an extension of the ideas of Peano (1913).


Theorem 2.8. Let the multistep method (2.1) be of order $p$ and let $q$
($1 \le q \le p$) be an integer. For any $(q+1)$-times continuously
differentiable function $y(x)$ we then have

$$L(y, x, h) = h^{q+1} \int_0^k K_q(s)\, y^{(q+1)}(x + sh)\,ds, \tag{2.15}$$

where

$$K_q(s) = \frac{1}{q!} \sum_{i=0}^{k} \alpha_i (i - s)_+^q - \frac{1}{(q-1)!} \sum_{i=0}^{k} \beta_i (i - s)_+^{q-1} \tag{2.16a}$$

with

$$(i - s)_+^r = \begin{cases} (i - s)^r & \text{for } i - s > 0 \\ 0 & \text{for } i - s \le 0. \end{cases}$$

$K_q(s)$ is called the $q$th Peano kernel of the multistep method (2.1).


Remark. We see from (2.16a) that $K_q(s)$ is a piecewise polynomial and satisfies

$$K_q(s) = \frac{1}{q!} \sum_{i=j}^{k} \alpha_i (i - s)^q - \frac{1}{(q-1)!} \sum_{i=j}^{k} \beta_i (i - s)^{q-1} \qquad \text{for } s \in [j-1, j). \tag{2.16b}$$

Proof. Taylor’s theorem with the integral representation of the remainder yields

$$\begin{aligned}
y(x + ih) &= \sum_{r=0}^{q} \frac{i^r}{r!}\, h^r y^{(r)}(x) + h^{q+1} \int_0^i \frac{(i - s)^q}{q!}\, y^{(q+1)}(x + sh)\,ds, \\
h y'(x + ih) &= \sum_{r=1}^{q} \frac{i^{r-1}}{(r-1)!}\, h^r y^{(r)}(x) + h^{q+1} \int_0^i \frac{(i - s)^{q-1}}{(q-1)!}\, y^{(q+1)}(x + sh)\,ds.
\end{aligned}$$

Inserting these two expressions into (2.3), the same considerations as in the
proof of Theorem 2.4 show that for $q \le p$ the polynomials before the integral
cancel. The statement then follows from

$$\int_0^i \frac{(i - s)^q}{q!}\, y^{(q+1)}(x + sh)\,ds = \int_0^k \frac{(i - s)_+^q}{q!}\, y^{(q+1)}(x + sh)\,ds. \qquad \Box$$
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Besides the representation (2.16), the Peano kernel K (s) has the following 
properties: 

K q (s) = 0 for s G (—oo, 0) U [fc, oo) and q m 1,... ,p; (2.17) 

K (s) is (q — 2 ) -times continuously differentiable and 

K' q {s) = —K q -i( s ) for q = 2 ,... (for q = 2 piecewise); (2.18) 


iT^s) is a piecewise linear function with discontinuities at 
0,1,..., k. It has a jump of size /? • at the point j and its 
slope over the interval (j — 1 , j) is given by — (a • + a - +1 + 

• • •+ a fe); (2-19) 


For the constant C p+1 of (2.8) we have C p+1 = K p (s)ds . (2.20) 


The proofs of Statements (2.17) to (2.20) are as follows: it is an immediate
consequence of the definition of the Peano kernel that $K_q(s) = 0$ for $s \ge k$ and
$q \le p$. In order to prove that $K_q(s) = 0$ also for $s < 0$ we consider the polynomial
$y(x) = (x - s)^q$ with $s$ as a parameter. Theorem 2.8 then shows that

    $$ L(y, 0, 1) = \sum_{i=0}^{k} \alpha_i (i-s)^q - q \sum_{i=0}^{k} \beta_i (i-s)^{q-1} = 0 \quad \text{for } q \le p $$

and hence $K_q(s) = 0$ for $s < 0$. This gives (2.17). The relation (2.18) is seen
by partial integration of (2.15). As an example, the Peano kernels for the 3-step
Nyström method (1.13'') are plotted in Fig. 2.2.
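The properties (2.17) and (2.20) can be checked by hand on a small case. The following is a minimal sketch in exact rational arithmetic for the explicit mid-point rule $y_{n+2} - y_n = 2h f_{n+1}$ (order 2). The function names are ours, and the closed expression used for $C_{p+1}$ is the usual one and is an assumption here, since formula (2.8) is not reproduced in this section.

```python
from fractions import Fraction
from math import factorial

def peano_kernel(alpha, beta, q, s):
    """Evaluate K_q(s) of (2.16a) for coefficients alpha[i], beta[i], i = 0..k.
    The argument s must be an int or a Fraction."""
    def plus(t, r):
        # truncated power (t)_+^r: t^r for t > 0, otherwise 0
        return t ** r if t > 0 else 0
    a = sum(al * plus(i - s, q) for i, al in enumerate(alpha))
    b = sum(be * plus(i - s, q - 1) for i, be in enumerate(beta))
    return Fraction(a, factorial(q)) - Fraction(b, factorial(q - 1))

def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

# Explicit mid-point rule: k = 2, order p = 2.
alpha = [-1, 0, 1]
beta = [0, 2, 0]
p = 2

def K(s):
    return peano_kernel(alpha, beta, p, Fraction(s))

# Integrate piece by piece, since K_p is a polynomial on each [j-1, j).
integral = sum(simpson(K, Fraction(j - 1), Fraction(j)) for j in range(1, len(alpha)))

# Assumed standard expression for the error constant (not quoted from the text):
# C_{p+1} = (sum_i alpha_i i^(p+1) - (p+1) sum_i beta_i i^p) / (p+1)!
c_next = Fraction(
    sum(a * i ** (p + 1) for i, a in enumerate(alpha))
    - (p + 1) * sum(b * i ** p for i, b in enumerate(beta)),
    factorial(p + 1),
)
print(integral, c_next)
```

Simpson's rule is used only because the kernel is a quadratic on each unit interval, so the quadrature is exact and the comparison with $C_{p+1}$ involves no rounding.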



Fig. 2.2. Peano kernels of the 3-step Nyström method

Exercises

1. Construction of multistep methods. Let $\varrho(\zeta)$ be a $k$th degree polynomial
   satisfying $\varrho(1) = 0$.

   a) There exists exactly one polynomial $\sigma(\zeta)$ of degree $\le k$, such that the
      order of the corresponding multistep method is at least $k + 1$.

   b) There exists exactly one polynomial $\sigma(\zeta)$ of degree $< k$, such that the
      corresponding multistep method, which is then explicit, has order at least $k$.

   Hint. Use condition (iii) of Theorem 2.4.

2. Find the multistep method of the form

       $$ y_{n+2} + \alpha_1 y_{n+1} + \alpha_0 y_n = h(\beta_1 f_{n+1} + \beta_0 f_n) $$

   of the highest possible order. Apply this formula to the example $y' = y$, $y(0) = 1$,
   $h = 0.1$.

3. Verify that the order and the error constant of the BDF-formulas are those of
   Table 2.1.

4. Show that the Peano kernel $K_p(s)$ does not change sign for the explicit and
   implicit Adams methods, nor for the BDF-formulas. Deduce from this property
   that

       $$ L(y, x, h) = h^{p+1} C_{p+1}\, y^{(p+1)}(\zeta) \quad \text{with } \zeta \in (x, x + kh) $$

   where the constant $C_{p+1}$ is given by (2.8).

5. Let $y(x)$ be an exact solution of $y' = f(x, y)$ and let $y_i = y(x_i)$, $i = 0, 1, \ldots, k-1$.
   Assume that $f$ is continuous and satisfies a Lipschitz condition with
   respect to $y$ ($f$ not necessarily differentiable). Prove that for consistent
   multistep methods (i.e., methods with (2.6)) the local error satisfies

       $$ \| y(x_k) - y_k \| \le h\,\omega(h) $$

   where $\omega(h) \to 0$ for $h \to 0$.

... the author has since then often observed methods for the numerical
integration of differential equations which, although endowed with an
enticingly small truncation error, nevertheless harbour the great
danger of numerical instability.

(H. Rutishauser 1952)


Rutishauser observed in his famous paper that high order and a small local error
are not sufficient for a useful multistep method. The numerical solution can be
"unstable", even though the step size $h$ is taken very small. The same observation
was made by Todd (1950), who applied certain difference methods to second order
differential equations. Our presentation will mainly follow the lines of Dahlquist
(1956), where this effect has been studied systematically. An interesting presentation
of the historical development of numerical stability concepts can be found in
Dahlquist (1985) "33 years of numerical instability, Part I".

Let us start with an example, taken from Dahlquist (1956). Among all explicit
2-step methods we consider the formula with the highest order (see Exercise 2 of
Section III.2). A short calculation using Theorem 2.4 shows that this method of
order 3 is given by

    $$ y_{n+2} + 4 y_{n+1} - 5 y_n = h(4 f_{n+1} + 2 f_n). $$    (3.1)

Application to the differential equation

    $$ y' = y, \qquad y(0) = 1 $$    (3.2)

yields the linear difference relation

    $$ y_{n+2} + 4(1 - h) y_{n+1} - (5 + 2h) y_n = 0. $$    (3.3)

As starting values we take $y_0 = 1$ and $y_1 = \exp(h)$, the values on the exact solution.
The numerical solution together with the exact solution $\exp(x)$ is plotted in Fig. 3.1
for the step sizes $h = 1/10$, $h = 1/20$, $h = 1/40$, etc. In spite of the small local
error, the results are very bad and become even worse as the step size decreases.

Fig. 3.1. Numerical solution of the unstable method (3.1)

An explanation for this effect can easily be given. As usual for linear difference
equations (Dan. Bernoulli 1728, Lagrange 1775), we insert $y_j = \zeta^j$ into (3.3). This
leads to the characteristic equation

    $$ \zeta^2 + 4(1 - h)\zeta - (5 + 2h) = 0. $$    (3.4)

The general solution of (3.3) is then given by

    $$ y_n = A\,\zeta_1^n(h) + B\,\zeta_2^n(h) $$    (3.5)

where

    $$ \zeta_1(h) = 1 + h + O(h^2), \qquad \zeta_2(h) = -5 + O(h) $$

are the roots of (3.4) and the coefficients $A$ and $B$ are determined by the starting
values $y_0$ and $y_1$. Since $\zeta_1(h)$ approximates $\exp(h)$, the first term in (3.5)
approximates the exact solution $\exp(x)$ at the point $x = nh$. The second term
in (3.5), often called a parasitic solution, is the one which causes trouble in our
method: since for $h \to 0$ the absolute value of $\zeta_2(h)$ is larger than one, this
parasitic solution becomes very large and dominates the solution $y_n$ for increasing
$n$.
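The effect is easy to reproduce. The sketch below (function names are ours, floating-point arithmetic) iterates the recurrence (3.3) from the exact starting values and also evaluates the two roots of (3.4) by the quadratic formula; it is meant as an illustration of the argument above, not as a reconstruction of how Fig. 3.1 was produced.

```python
import cmath
import math

def unstable_method_error(h, x_end=1.0):
    """Run recurrence (3.3) up to x_end and return |y_n - exp(x_end)|."""
    n_steps = round(x_end / h)
    y_prev, y_curr = 1.0, math.exp(h)          # exact starting values y_0, y_1
    for _ in range(n_steps - 1):
        # y_{n+2} = -4(1-h) y_{n+1} + (5+2h) y_n
        y_prev, y_curr = y_curr, -4.0 * (1.0 - h) * y_curr + (5.0 + 2.0 * h) * y_prev
    return abs(y_curr - math.exp(x_end))

def characteristic_roots(h):
    """Roots of (3.4): zeta^2 + 4(1-h) zeta - (5+2h) = 0, principal root first."""
    b, c = 4.0 * (1.0 - h), -(5.0 + 2.0 * h)
    d = cmath.sqrt(b * b - 4.0 * c)
    return (-b + d) / 2.0, (-b - d) / 2.0

for h in (1 / 10, 1 / 20, 1 / 40):
    z1, z2 = characteristic_roots(h)
    print(h, unstable_method_error(h), abs(z1), abs(z2))
```

The printed errors at $x = 1$ grow as $h$ is halved, while the first root stays close to $\exp(h)$ and the second stays close to $-5$.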

We now turn to the stability discussion of the general method (2.1). The essential
part is the behaviour of the solution as $n \to \infty$ (or $h \to 0$) with $nh$ fixed. We
see from (3.3) that for $h \to 0$ we obtain

    $$ \alpha_k y_{n+k} + \alpha_{k-1} y_{n+k-1} + \ldots + \alpha_0 y_n = 0. $$    (3.6)

This can be interpreted as the numerical solution of the method (2.1) for the
differential equation

    $$ y' = 0. $$    (3.7)

We put $y_j = \zeta^j$ in (3.6), divide by $\zeta^n$, and find that $\zeta$ must be a root of

    $$ \varrho(\zeta) = \alpha_k \zeta^k + \alpha_{k-1} \zeta^{k-1} + \ldots + \alpha_0 = 0. $$    (3.8)

As in Section I.13, we again have some difficulty when (3.8) possesses a root of
multiplicity $m > 1$. In this case (Lagrange 1792, see Exercise 1 below)
$y_n = n^{j-1} \zeta^n$ ($j = 1, \ldots, m$) are solutions of (3.6) and we obtain by superposition:

Lemma 3.1. Let $\zeta_1, \ldots, \zeta_l$ be the roots of $\varrho(\zeta)$, of respective multiplicity
$m_1, \ldots, m_l$. Then the general solution of (3.6) is given by

    $$ y_n = p_1(n)\,\zeta_1^n + \ldots + p_l(n)\,\zeta_l^n $$    (3.9)

where the $p_j(n)$ are polynomials of degree $m_j - 1$. □

Formula (3.9) shows us that for boundedness of $y_n$, as $n \to \infty$, we need that
the roots of (3.8) lie in the unit disc and that the roots on the unit circle be simple.

Definition 3.2. The multistep method (2.1) is called stable, if the generating
polynomial $\varrho(\zeta)$ (formula (3.8)) satisfies the root condition, i.e.,

i) The roots of $\varrho(\zeta)$ lie on or within the unit circle;

ii) The roots on the unit circle are simple.

Remark. In order to distinguish this stability concept from others, it is sometimes
called zero-stability or, in honour of Dahlquist, also D-stability.

Examples. For the explicit and implicit Adams methods, $\varrho(\zeta) = \zeta^k - \zeta^{k-1}$.
Besides the simple root 1, there is a $(k-1)$-fold root at 0. The Adams methods are
therefore stable.

The same is true for the explicit Nyström and the Milne-Simpson methods,
where $\varrho(\zeta) = \zeta^k - \zeta^{k-2}$. Note that here we have a simple root at $-1$. This root can
be dangerous for certain differential equations (see Section III.9 and Section V.1 of
Volume II).
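The root condition can be tested numerically. The following sketch is one possible way to do it and makes several choices of our own: roots are computed by the Durand-Kerner iteration (a generic root finder, not a method taken from the text), a tolerance decides what counts as "on the unit circle", and simplicity of such a root is judged by $|\varrho'(\zeta)|$ not being tiny. It is a floating-point check, not a proof.

```python
def poly_roots(coeffs, iterations=200):
    """Durand-Kerner iteration; coeffs[0] is the leading coefficient.
    Exact zero roots (trailing zero coefficients) are split off first."""
    coeffs = list(coeffs)
    zeros = 0
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
        zeros += 1
    n = len(coeffs) - 1
    monic = [c / coeffs[0] for c in coeffs]
    roots = [(0.4 + 0.9j) ** i for i in range(n)]     # customary starting values
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            value = 0j
            for c in monic:                            # Horner evaluation
                value = value * r + c
            denom = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            updated.append(r - value / denom if denom != 0 else r)
        roots = updated
    return roots + [0j] * zeros

def satisfies_root_condition(alpha, tol=1e-6):
    """Numerical check of Definition 3.2 for alpha = (alpha_0, ..., alpha_k)."""
    coeffs = [complex(a) for a in reversed(alpha)]     # leading coefficient first
    k = len(coeffs) - 1
    deriv = [c * (k - i) for i, c in enumerate(coeffs[:-1])]
    for z in poly_roots(coeffs):
        if abs(z) > 1 + tol:
            return False                               # root outside the unit disc
        if abs(abs(z) - 1) <= tol:
            d = 0j
            for c in deriv:
                d = d * z + c
            if abs(d) <= tol:
                return False                           # multiple root on the circle
    return True

print(satisfies_root_condition([0, 0, -1, 1]))   # Adams, k = 3
print(satisfies_root_condition([0, -1, 0, 1]))   # Nystroem / Milne-Simpson, k = 3
print(satisfies_root_condition([-5, 4, 1]))      # method (3.1)
```

The tolerance `tol` is a pragmatic choice; a double root on the unit circle is only located to roughly the square root of machine precision, which is why the test on $\varrho'$ is not made stricter.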


Stability of the BDF-Formulas 

The investigation of the stability of the BDF-formulas is more difficult. As the
characteristic polynomial of $\nabla^j y_{n+k} = 0$ is given by $\zeta^{k-j}(\zeta - 1)^j = 0$ it follows
from the representation (1.22') that the generating polynomial $\varrho(\zeta)$ of the
BDF-formulas has the form

    $$ \varrho(\zeta) = \sum_{j=1}^{k} \frac{1}{j}\, \zeta^{k-j} (\zeta - 1)^j. $$    (3.10)

In order to study the zeros of (3.10) it is more convenient to consider the polynomial

    $$ p(z) = (1 - z)^k\, \varrho\Big(\frac{1}{1-z}\Big) = \sum_{j=1}^{k} \frac{z^j}{j} $$    (3.11)

via the transformation $\zeta = 1/(1 - z)$. This polynomial is just the $k$th partial sum
of $-\log(1 - z)$. As the roots of $p(z)$ and $\varrho(\zeta)$ are related by the above
transformation, we have:

Lemma 3.3. The k-step BDF-formula (1.22') is stable iff all roots of the polynomial
(3.11) are outside the disc $\{z;\ |z - 1| \le 1\}$, with simple roots allowed on the
boundary. □

Fig. 3.2. Roots of the polynomial $p(z)$ of (3.11)

The roots of (3.11) are displayed in Fig. 3.2 for different values of $k$.

Theorem 3.4. The k-step BDF-formula (1.22') is stable for $k \le 6$, and unstable
for $k \ge 7$.

Proof. The first assertion can be verified simply by a finite number of numerical
calculations (see Fig. 3.2). This was first observed by Mitchell & Craggs (1953).
The second statement, however, contains an infinity of cases and is more difficult.
The first complete proof was given by Cryer (1971) in a technical report, a
condensed version of which is published in Cryer (1972). A second proof is given in
Creedon & Miller (1975) (see also Grigorieff (1977), p. 135), based on the
Schur-Cohn criterion. This proof is outlined in Exercise 4 below. The following proof,
which is given in Hairer & Wanner (1983), is based on the representation

    $$ p(z) = \int_0^z \sum_{j=1}^{k} \zeta^{j-1}\, d\zeta = \int_0^z \frac{1 - \zeta^k}{1 - \zeta}\, d\zeta = \int_0^r \big(1 - e^{ik\theta} s^k\big)\, \varphi(s)\, ds $$    (3.12)

with

    $$ \zeta = s e^{i\theta}, \qquad z = r e^{i\theta}, \qquad \varphi(s) = \frac{e^{i\theta}}{1 - s e^{i\theta}}. $$

We cut the complex plane into $k$ sectors

    $$ S_j = \Big\{ z;\ \frac{2\pi}{k}\Big(j - \frac{1}{2}\Big) < \arg(z) < \frac{2\pi}{k}\Big(j + \frac{1}{2}\Big) \Big\}, \qquad j = 0, 1, \ldots, k-1. $$

On the rays bounding $S_j$ we have $e^{ik\theta} = -1$, so that from (3.12)

    $$ p(z) = \int_0^r (1 + s^k)\, \varphi(s)\, ds $$

with a positive weight function. Therefore, $p(z)$ always lies in the sector between
$e^{i\theta}$ and $e^{i\pi} = -1$, which contains all values $\varphi(s)$ (see Theorem 1.1 on page 1
of Marden (1966)). So no revolution of $\arg(p(z))$ is possible on these rays, and
due to the one revolution of $\arg(z^k)$ at infinity between $\theta = 2\pi(j - 1/2)/k$ and
$\theta = 2\pi(j + 1/2)/k$ the principle of the argument (e.g., Henrici (1974), p. 278)
implies (see Fig. 3.3) that in each sector $S_j$ ($j = 1, \ldots, k-1$, with the exception
of $j = 0$) there lies exactly one root of $p(z)$.

Fig. 3.3. Argument of $p(z)$ of (3.11)



In order to complete the proof, we still have to bound the zeros of $p(z)$ from
above: we observe that in (3.12) the term $s^k$ becomes large for $s > 1$. We therefore
partition (3.12) into two integrals $p(z) = I_1 - I_2$, where

    $$ I_1 = \int_0^r \varphi(s)\, ds - \int_0^1 e^{ik\theta} s^k \varphi(s)\, ds, \qquad I_2 = e^{ik\theta} \int_1^r s^k \varphi(s)\, ds. $$

Since $|\varphi(s)| \le B(\theta)$ where

    $$ B(\theta) = \begin{cases} |\sin\theta|^{-1} & \text{if } 0 < \theta \le \pi/2 \text{ or } 3\pi/2 \le \theta < 2\pi, \\ 1 & \text{otherwise,} \end{cases} $$

we obtain

    $$ |I_1| \le \Big(r + \frac{1}{k+1}\Big) B(\theta) < r B(\theta)\, \frac{k+2}{k+1}, \qquad (r > 1). $$    (3.13)

Secondly, since $s^k$ is positive,

    $$ I_2 = e^{ik\theta}\, \Phi \int_1^r s^k\, ds \quad \text{with } \Phi \in \text{convex hull of } \{\varphi(s);\ 1 \le s \le r\}. $$

Any element of the above convex hull can be written in the form

    $$ \Phi = \alpha \varphi(s_1) + (1 - \alpha) \varphi(s_2) = \frac{\varphi(s_1)\, \varphi(s_2)}{\varphi(\tilde s)} $$

with $\tilde s = \alpha s_2 + (1 - \alpha) s_1$, $0 \le \alpha \le 1$, $1 \le s_1, s_2 \le r$. Since $|\varphi(s)|$ decreases
monotonically for $s \ge 1$, we have $|\Phi| \ge |\varphi(r)|$. Some elementary geometry then
leads to $|\Phi| \ge 1/(2r)$ and we get

    $$ |I_2| \ge \frac{r^{k+1} - 1}{2r(k+1)} > \frac{r(r^{k-1} - 1)}{2k + 2}, \qquad (r > 1). $$    (3.14)

From (3.13) and (3.14) we see that

    $$ r \ge R(\theta) = \big((2k + 4) B(\theta) + 1\big)^{1/(k-1)} $$    (3.15)

implies $|I_2| > |I_1|$, so that $p(z)$ cannot be zero. The curve $R(\theta)$ is also plotted
in Fig. 3.2 and cuts from the sectors $S_j$ what we call Madame Imhof's cheese pie,
each slice of which (with $j \ne 0$) must contain precisely one zero of $p(z)$. A simple
analysis shows that for $k = 12$ the cheese pie, cut from $S_1$, is small enough to
ensure the presence of zeros of $p(z)$ inside the disc $\{z;\ |z - 1| < 1\}$. As $R(\theta)$, for
fixed $\theta$, as well as $R(\pi/k)$ are monotonically decreasing in $k$, the same is true for
all $k \ge 12$.

For $6 < k < 12$ numerical calculations show that the method is unstable (see
Fig. 3.2 or Exercise 4). □
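The "finite number of numerical calculations" can be imitated with a few lines of code. The sketch below uses the criterion of Lemma 3.3: it computes the nonzero roots of $p(z)$ from (3.11), i.e. the roots of $p(z)/z$, and reports their smallest distance from the point 1. The root finder (Durand-Kerner iteration) and all names are our own choices, and the result is a floating-point indication only.

```python
def poly_roots(coeffs, iterations=500):
    """Durand-Kerner iteration for all roots; coeffs[0] is the leading coefficient."""
    n = len(coeffs) - 1
    monic = [c / coeffs[0] for c in coeffs]
    roots = [(0.4 + 0.9j) ** i for i in range(n)]     # customary starting values
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            value = 0j
            for c in monic:                            # Horner evaluation
                value = value * r + c
            denom = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            updated.append(r - value / denom if denom != 0 else r)
        roots = updated
    return roots

def bdf_min_distance(k):
    """Smallest |z - 1| over the nonzero roots of p(z) = sum_{j=1}^k z^j / j."""
    # p(z)/z = sum_{j=1}^k z^(j-1)/j, written with the leading coefficient first
    coeffs = [1.0 / j for j in range(k, 0, -1)]
    return min((abs(z - 1) for z in poly_roots(coeffs)), default=float("inf"))

for k in range(1, 9):
    d = bdf_min_distance(k)
    print(k, d, "stable" if d > 1 else "unstable")
```

A distance larger than 1 for every nonzero root means, by Lemma 3.3, that the corresponding BDF-formula satisfies the root condition; for $k = 1$ there is no nonzero root at all.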


Highest Attainable Order of Stable Multistep Methods 


It is a natural task to investigate the stability of the multistep methods with
highest possible order. This has been performed by Dahlquist (1956), resulting in the
famous "first Dahlquist-barrier".

Counting the order conditions (Theorem 2.4) shows that for order $p$ the parameters
of a linear multistep method have to satisfy $p + 1$ linear equations. As $2k + 1$
free parameters are involved (without loss of generality one can assume $\alpha_k = 1$),
this suggests that $2k$ is the highest attainable order. Indeed, this can be verified
(see Exercise 5). However, these methods are of no practical significance, because
we shall prove

Theorem 3.5 (The first Dahlquist-barrier). The order $p$ of a stable linear k-step
method satisfies

    $p \le k + 2$ if $k$ is even,
    $p \le k + 1$ if $k$ is odd,
    $p \le k$ if $\beta_k/\alpha_k \le 0$ (in particular if the method is explicit).
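To see the barrier at work on concrete formulas, one can compute the order of a method exactly and compare it with the bound of Theorem 3.5. The sketch below assumes the usual algebraic form of the order conditions, $\sum_i \alpha_i = 0$ and $\sum_i \alpha_i i^q = q \sum_i \beta_i i^{q-1}$ for $q = 1, \ldots, p$ (our paraphrase of the conditions of Theorem 2.4, which is not reproduced in this section); the function names are ours.

```python
from fractions import Fraction

def order(alpha, beta, max_order=20):
    """Largest p such that sum_i alpha_i i^q = q sum_i beta_i i^(q-1) for q = 0..p.
    Returns -1 if already sum_i alpha_i != 0."""
    alpha = [Fraction(a) for a in alpha]
    beta = [Fraction(b) for b in beta]
    p = -1
    for q in range(max_order + 1):
        lhs = sum(a * i ** q for i, a in enumerate(alpha))
        rhs = q * sum(b * i ** (q - 1) for i, b in enumerate(beta)) if q > 0 else 0
        if lhs != rhs:
            break
        p = q
    return p

def dahlquist_bound(alpha, beta):
    """Upper bound of Theorem 3.5 for the order of a stable k-step method."""
    k = len(alpha) - 1
    if Fraction(beta[k]) / Fraction(alpha[k]) <= 0:
        return k
    return k + 2 if k % 2 == 0 else k + 1

# Method (3.1): explicit, k = 2.
print(order([-5, 4, 1], [2, 4, 0]), dahlquist_bound([-5, 4, 1], [2, 4, 0]))

# Milne-Simpson method with k = 2 (Simpson's rule).
third = Fraction(1, 3)
print(order([-1, 0, 1], [third, 4 * third, third]),
      dahlquist_bound([-1, 0, 1], [third, 4 * third, third]))
```

Method (3.1) has order 3, which exceeds the bound $k = 2$ for explicit methods, so by Theorem 3.5 it cannot be stable. The Milne-Simpson formula attains the bound $k + 2 = 4$.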


We postpone the verification of this theorem and give some notations and lemmas,
which will be useful for the proof. First of all we introduce the "Greek-Roman
transformation"

    $$ \zeta = \frac{z + 1}{z - 1} \qquad \text{or} \qquad z = \frac{\zeta + 1}{\zeta - 1}. $$    (3.16)

This transformation maps the disk $|\zeta| < 1$ onto the half-plane $\mathrm{Re}\, z < 0$, the upper
half-plane $\mathrm{Im}\, z > 0$ onto the lower half-plane, the circle $|\zeta| = 1$ to the imaginary
axis, the point $\zeta = 1$ to $z = \infty$ and the point $\zeta = -1$ to $z = 0$. We then consider
the polynomials

    $$ R(z) = \Big(\frac{z - 1}{2}\Big)^k \varrho(\zeta) = \sum_{j=0}^{k} a_j z^j, \qquad S(z) = \Big(\frac{z - 1}{2}\Big)^k \sigma(\zeta) = \sum_{j=0}^{k} b_j z^j. $$    (3.17)

Since the zeros of $R(z)$ and of $\varrho(\zeta)$ are connected via the transformation (3.16),
the stability condition of a multistep method can be formulated in terms of $R(z)$ as
follows: all zeros of $R(z)$ lie in the negative half-plane $\mathrm{Re}\, z \le 0$ and no multiple
zero of $R(z)$ lies on the imaginary axis.
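The coefficients $a_j$ of $R(z)$ are easily obtained in exact arithmetic, because $((z-1)/2)^k \zeta^j = ((z+1)/2)^j ((z-1)/2)^{k-j}$. The following sketch (helper names are ours) expands this product for three of the methods met so far.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, n):
    result = [Fraction(1)]
    for _ in range(n):
        result = poly_mul(result, p)
    return result

def greek_roman(alpha):
    """Coefficients a_0..a_k of R(z) = ((z-1)/2)^k rho((z+1)/(z-1)), see (3.17)."""
    k = len(alpha) - 1
    plus = [Fraction(1, 2), Fraction(1, 2)]      # (z + 1)/2
    minus = [Fraction(-1, 2), Fraction(1, 2)]    # (z - 1)/2
    a = [Fraction(0)] * (k + 1)
    for j, al in enumerate(alpha):
        term = poly_mul(poly_pow(plus, j), poly_pow(minus, k - j))
        for i, c in enumerate(term):
            a[i] += Fraction(al) * c
    return a

print(greek_roman([0, -1, 1]))    # rho = zeta^2 - zeta      (Adams, k = 2)
print(greek_roman([-1, 0, 1]))    # rho = zeta^2 - 1         (Nystroem, Milne-Simpson)
print(greek_roman([-5, 4, 1]))    # rho = zeta^2 + 4 zeta - 5, method (3.1)
```

For the two stable examples one finds $R(z) = (z+1)/2$ and $R(z) = z$, with zeros in $\mathrm{Re}\, z \le 0$. For method (3.1) one finds $R(z) = 3z - 2$, whose zero $z = 2/3$ lies in the right half-plane; it is the image of the parasitic root $\zeta = -5$.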


Lemma 3.6. Suppose the multistep method to be stable and of order at least 0. We
then have

i) $a_k = 0$ and $a_{k-1} = 2^{1-k} \varrho'(1) \ne 0$;

ii) All non-vanishing coefficients of $R(z)$ have the same sign.

Proof. Dividing formula (3.17) by $z^k$ and putting $z \to \infty$, one sees that
$a_k = 2^{-k} \varrho(1)$. This expression must vanish, because the method is of order 0. In the
same way one gets $a_{k-1} = 2^{1-k} \varrho'(1)$, which is different from zero, since by
stability 1 cannot be a multiple root of $\varrho(\zeta)$. The second statement follows from the
factorization

    $$ R(z) = a_{k-1} \prod_j (z + x_j) \prod_j \big((z + u_j)^2 + v_j^2\big), $$

where $-x_j$ are the real roots and $-u_j \pm i v_j$ are the conjugate pairs of complex
roots. By stability $x_j \ge 0$ and $u_j \ge 0$, implying that all coefficients of $R(z)$ have
the same sign. □

We next express the order conditions of Theorem 2.4 in terms of the polynomials
$R(z)$ and $S(z)$.

Lemma 3.7. The multistep method is of order $p$ if and only if

    $$ R(z) \Big(\log \frac{z+1}{z-1}\Big)^{-1} - S(z) = C_{p+1} \Big(\frac{2}{z}\Big)^{p-k} + O\Big(\Big(\frac{2}{z}\Big)^{p-k+1}\Big) \quad \text{for } z \to \infty. $$    (3.18)

Proof. First, observe that the $O((\zeta - 1)^p)$ term in condition (iii) of Theorem
2.4 is equal to $C_{p+1} (\zeta - 1)^p + O((\zeta - 1)^{p+1})$ by formula (2.7). Application of
the transformation (3.16) then yields (3.18), because
$(\zeta - 1) = 2/(z - 1) = 2/z + O((2/z)^2)$ for $z \to \infty$. □


Lemma 3.8. The coefficients of the Laurent series

    $$ \Big(\log \frac{z+1}{z-1}\Big)^{-1} = \frac{z}{2} - \mu_1 z^{-1} - \mu_3 z^{-3} - \mu_5 z^{-5} - \ldots $$    (3.19)

satisfy $\mu_{2j+1} > 0$ for all $j \ge 0$.

Proof. We consider the branch of $\log \zeta$ which is analytic in the complex $\zeta$-plane
cut along the negative real axis and satisfies $\log 1 = 0$. The transformation (3.16)
maps this cut onto the segment from $-1$ to $+1$ on the real axis. The function
$\log((z + 1)/(z - 1))$ is thus analytic on the complex $z$-plane cut along this segment
(see Fig. 3.4). From the formula

    $$ \log \frac{z+1}{z-1} = \frac{2}{z} \Big(1 + \frac{z^{-2}}{3} + \frac{z^{-4}}{5} + \ldots\Big) $$    (3.20)

the existence of (3.19) becomes clear. In order to prove the positivity of the
coefficients, we use Cauchy's formula for the coefficients of the function
$f(z) = \sum_{n \in \mathbb{Z}} a_n (z - z_0)^n$,

    $$ a_n = \frac{1}{2\pi i} \int_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\, dz, $$

i.e., in our situation

    $$ \mu_{2j+1} = -\frac{1}{2\pi i} \int_\gamma z^{2j} \Big(\log \frac{z+1}{z-1}\Big)^{-1} dz $$

(Cauchy 1831; see also Behnke & Sommer 1962). Here $\gamma$ is an arbitrary curve
enclosing the segment $(-1, 1)$, e.g., the curve plotted in Fig. 3.4.

Fig. 3.4. Cut $z$-plane with curve $\gamma$

Observing that $\log((z + 1)/(z - 1)) = \log((1 + x)/(1 - x)) - i\pi$ when $z$ approaches
the real value $x \in (-1, 1)$ from above, and that
$\log((z + 1)/(z - 1)) = \log((1 + x)/(1 - x)) + i\pi$ when $z$ approaches $x$ from below,
we obtain

    $$ \mu_{2j+1} = \int_{-1}^{1} x^{2j} \Big[\Big(\log \frac{1+x}{1-x}\Big)^2 + \pi^2\Big]^{-1} dx > 0. $$

For another proof of this lemma, which avoids complex analysis, see Exercise 10.


Proof of Theorem 3.5. We insert the series (3.19) into (3.18) and obtain

    $$ R(z) \Big(\log \frac{z+1}{z-1}\Big)^{-1} - S(z) = \text{polynomial}(z) + d_1 z^{-1} + d_2 z^{-2} + O(z^{-3}) $$    (3.21)

where

    $$ d_1 = -\mu_1 a_0 - \mu_3 a_2 - \mu_5 a_4 - \ldots, $$
    $$ d_2 = -\mu_3 a_1 - \mu_5 a_3 - \mu_7 a_5 - \ldots. $$

Lemma 3.6 together with the positivity of the $\mu_j$ (Lemma 3.8) implies that all
summands in the above formulas for $d_1$ and $d_2$ have the same sign. Since $a_{k-1} \ne 0$
we therefore have $d_2 \ne 0$ for $k$ even and $d_1 \ne 0$ for $k$ odd. The first two bounds
of Theorem 3.5 are now an immediate consequence of formula (3.18).

Finally, we prove that $p \le k$ for $\beta_k/\alpha_k \le 0$: assume, by contradiction, that the
order is greater than $k$. Then by formula (3.18), $S(z)$ is equal to the principal part
of $R(z) (\log((z + 1)/(z - 1)))^{-1}$, and we may write (putting $\mu_j = 0$ for even $j$)

    $$ S(z) = R(z) \Big(\frac{z}{2} - \sum_{j=1}^{k-1} \mu_j z^{-j}\Big) + \sum_{j=1}^{k-1} \Big(\sum_{s=j}^{k-1} \mu_s a_{s-j}\Big) z^{-j}. $$    (3.22)

Setting $z = 1$ we obtain

    $$ \frac{S(1)}{R(1)} = \Big(\frac{1}{2} - \sum_{j=1}^{k-1} \mu_j\Big) + \sum_{j=1}^{k-1} \Big(\sum_{s=j}^{k-1} \mu_s a_{s-j}\Big) \frac{1}{R(1)}. $$    (3.23)

Since by formula (3.17), $S(1) = \beta_k$ and $R(1) = \alpha_k$, it is sufficient to prove
$S(1)/R(1) > 0$. Formula (3.19), for $z \to 1$, gives

    $$ \frac{1}{2} - \sum_{j=1}^{\infty} \mu_j = 0, $$

so that the first summand in (3.23) is strictly positive. The non-negativeness of the
second summand is seen from Lemmas 3.6 and 3.8. □

The stable multistep methods which attain the highest possible order $k + 2$
have a very special structure.

Theorem 3.9. Stable multistep methods of order $k + 2$ are symmetric, i.e.,

    $$ \alpha_j = -\alpha_{k-j}, \qquad \beta_j = \beta_{k-j} \qquad \text{for all } j. $$    (3.24)

Remark. For symmetric multistep methods we have $\varrho(\zeta) = -\zeta^k \varrho(1/\zeta)$ by
definition. Since with $\zeta_i$ also $1/\zeta_i$ is a zero of $\varrho(\zeta)$, all roots of stable symmetric
multistep methods lie on the unit circle and are simple.

Proof. A comparison of the formulas (3.18) and (3.21) shows that $d_1 = 0$ is necessary
for order $k + 2$. Since the method is assumed to be stable, Lemma 3.6 implies
that all even coefficients of $R(z)$ vanish. Hence, $k$ is even and $R(z)$ satisfies
the relation $R(z) = -R(-z)$. By definition of $R(z)$ this relation is equivalent to
$\varrho(\zeta) = -\zeta^k \varrho(1/\zeta)$, which implies the first condition of (3.24). Using the above
relation for $R(z)$ one obtains from formula (3.18) that $S(z) - S(-z) = O((2/z)^2)$,
implying $S(z) = S(-z)$. If this relation is transformed into an equivalent one for
$\sigma(\zeta)$, one gets the second condition of (3.24). □


Exercises 


1. Consider the linear difference equation (3.6) with

       $$ \varrho(\zeta) = \alpha_k \zeta^k + \alpha_{k-1} \zeta^{k-1} + \ldots + \alpha_0 $$

   as characteristic polynomial. Let $\zeta_1, \ldots, \zeta_l$ be the different roots of $\varrho(\zeta)$ and
   let $m_j \ge 1$ be the multiplicity of the root $\zeta_j$. Show that for $1 \le j \le l$ and
   $0 \le i \le m_j - 1$ the sequences

       $$ \Big\{ \binom{n}{i}\, \zeta_j^{\,n-i} \Big\}_{n \ge 0} $$

   form a system of $k$ linearly independent solutions of (3.6).

2. Show that all roots of the polynomial $p(z)$ of formula (3.11) except the simple
   root 0 lie in the annulus

       $$ \frac{k}{k-1} \le |z| \le 2. $$

   Hint. Use the following lemma, which can be found in Marden (1966), p. 137:
   if all coefficients of the polynomial $a_k z^k + a_{k-1} z^{k-1} + \ldots + a_0$ are real and
   positive, then its roots lie in the annulus $\varrho_1 \le |z| \le \varrho_2$ with
   $\varrho_1 = \min(a_j/a_{j+1})$ and $\varrho_2 = \max(a_j/a_{j+1})$.

3. Apply the lemma of the above exercise to $\varrho(\zeta)/(\zeta - 1)$ and show that the
   BDF-formulas are stable for $k = 1, 2, 3, 4$.

4. Give a different proof of Theorem 3.4 by applying the Schur-Cohn criterion to
   the polynomial

       $$ f(z) = z^k \varrho\Big(\frac{1}{z}\Big) = \sum_{j=1}^{k} \frac{1}{j} (1 - z)^j. $$    (3.25)

   Schur-Cohn criterion (see e.g., Marden (1966), Chapter X). For a given polynomial
   with real coefficients

       $$ f(z) = a_0 + a_1 z + \ldots + a_k z^k $$

   we consider the coefficients $a_i^{(j)}$ where

       $$ a_i^{(0)} = a_i, \qquad i = 0, 1, \ldots, k, $$
       $$ a_i^{(j+1)} = a_0^{(j)} a_i^{(j)} - a_{k-j}^{(j)} a_{k-j-i}^{(j)}, \qquad i = 0, 1, \ldots, k-j-1, $$    (3.26)

   and also the products

       $$ P_1 = a_0^{(1)}, \qquad P_{j+1} = P_j\, a_0^{(j+1)} \quad \text{for } j = 1, \ldots, k-1. $$    (3.27)

   We further denote by $n$ the number of negative elements among the values
   $P_1, \ldots, P_k$ and by $p$ the number of positive elements. Then $f(z)$ has at least
   $n$ zeros inside the unit disk and at least $p$ zeros outside it.

   a) Prove the following formulas for the coefficients of (3.25):

       $$ a_0 = \sum_{i=1}^{k} \frac{1}{i}, \qquad a_1 = -k, \qquad a_2 = \frac{k(k-1)}{4}, $$
       $$ a_{k-2} = (-1)^k \Big(\frac{k+1}{2} + \frac{1}{k-2}\Big), \qquad a_{k-1} = (-1)^{k-1} \frac{k}{k-1}, \qquad a_k = \frac{(-1)^k}{k}. $$    (3.28)

   b) Verify that the coefficients $a_0^{(j)}$ of (3.26) have the sign structure of Table
      3.1. For $k < 13$ these tedious calculations can be performed on a computer.
      The verification of $a_0^{(1)} > 0$ and $a_0^{(2)} > 0$ is easy for all $k > 2$. In order to
      verify $a_0^{(3)} = (a_0^{(2)})^2 - (a_{k-2}^{(2)})^2 < 0$ for $k \ge 13$ consider the expression

       $$ a_0^{(2)} - (-1)^k a_{k-2}^{(2)} = a_0^{(1)} \big(a_0^2 - a_k^2 - a_0 |a_{k-2}| + a_2 |a_k|\big) - |a_{k-1}^{(1)}| \cdot (a_0 + |a_k|)(|a_{k-1}| + a_1) $$    (3.29)

      which can be written in the form $(a_0 + |a_k|)\, \varphi(k)$ with

       $$ \varphi(k) = (a_0 - |a_k|)\big(a_0^2 - a_k^2 - a_0 |a_{k-2}| + a_2 |a_k|\big) - |a_{k-1}^{(1)}|\, (a_1 + |a_{k-1}|) $$
       $$ = a_0^3 - a_0^2 \Big(\frac{k}{2} + \frac{1}{2} + \frac{1}{k-2} + \frac{1}{k}\Big) + a_0 \Big(\frac{5k}{4} + \frac{1}{4} + \frac{1}{2k-4} - \frac{1}{k-1} - \frac{1}{(k-1)^2} - \frac{1}{k^2}\Big) - \Big(k - \frac{3}{4} - \frac{1}{k-1} - \frac{1}{4k} - \frac{1}{k^3}\Big). $$

      Show that $\varphi(13) < 0$ and that $\varphi$ is monotonically decreasing for $k \ge 13$
      (observe that $a_0 = a_0(k)$ actually depends on $k$ and that
      $a_0(k + 1) = a_0(k) + 1/(k + 1)$). Finally, deduce from the negativeness of (3.29) that
      $a_0^{(3)} < 0$ for $k \ge 13$.

   Table 3.1. Signs of $a_0^{(j)}$.

         k   |  2   3   4   5   6   7   8   9  10  11  12  13  >13
       ------+-----------------------------------------------------
       j = 1 |  +   +   +   +   +   +   +   +   +   +   +   +   +
       j = 2 |  0   +   +   +   +   +   +   +   +   +   +   +   +
       j = 3 |      0   +   +   +   +   +   +   +   +   +   -   -
       j = 4 |          0   +   +   +   -   -   -   -   -
       j = 5 |              0   +   -

   c) Use Table 3.1 and the Schur-Cohn criterion for the verification of Theorem
      3.4.
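The "tedious calculations" of part b) are well suited to exact rational arithmetic. The following sketch (our own function names) builds the coefficients of (3.25) from the binomial theorem and runs the recursion (3.26), returning the leading coefficients $a_0^{(1)}, \ldots, a_0^{(k)}$ whose signs Table 3.1 records. Unlike the table, the code does not stop at the first zero or negative entry.

```python
from fractions import Fraction
from math import comb

def bdf_f_coefficients(k):
    """Coefficients a_0..a_k of f(z) = sum_{j=1}^k (1 - z)^j / j, see (3.25)."""
    a = [Fraction(0)] * (k + 1)
    for j in range(1, k + 1):
        for i in range(j + 1):
            a[i] += Fraction((-1) ** i * comb(j, i), j)
    return a

def schur_cohn_leading(a):
    """Return [a_0^(1), ..., a_0^(k)] computed with recursion (3.26)."""
    k = len(a) - 1
    current = list(a)                    # a_i^(j) for i = 0..k-j
    leading = []
    for j in range(k):
        top = current[k - j]             # a_{k-j}^(j)
        current = [current[0] * current[i] - top * current[k - j - i]
                   for i in range(k - j)]
        leading.append(current[0])
    return leading

def sign_row(k):
    values = schur_cohn_leading(bdf_f_coefficients(k))
    return ["+" if v > 0 else "-" if v < 0 else "0" for v in values]

for k in range(2, 14):
    print(k, " ".join(sign_row(k)[:5]))
```

For $k = 2$ and $k = 3$ the recursion can be followed by hand: $f(z) = 3/2 - 2z + z^2/2$ gives $a_0^{(1)} = 2$, $a_0^{(2)} = 0$, and $k = 3$ gives $13/4$, $15/2$, $0$.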


5. (Multistep methods of maximal order). Verify the following statements:

   a) there is no $k$-step method of order $2k + 1$,

   b) there is a unique (implicit) $k$-step method of order $2k$,

   c) there is a unique explicit $k$-step method of order $2k - 1$.

6. Prove that symmetric multistep methods are always of even order. More
   precisely, if a symmetric multistep method is of order $2s - 1$ then it is also of
   order $2s$.


7. Show that all stable 4-step methods of order 6 are given by

       $$ \varrho(\zeta) = (\zeta^2 - 1)(\zeta^2 + 2\mu\zeta + 1), \qquad |\mu| < 1, $$
       $$ \sigma(\zeta) = \frac{1}{45}(14 - \mu)(\zeta^4 + 1) + \frac{1}{45}(64 + 34\mu)\,\zeta(\zeta^2 + 1) + \frac{1}{15}(8 + 38\mu)\,\zeta^2. $$

   Compute the error constant and observe that it cannot become arbitrarily small.

   Result. $C = -(16 - 5\mu)/(7560(1 + \mu))$.

8. Prove the following bounds for the error constant:

   a) For stable methods of order $k + 2$

       $$ C \le -2^{-1-k} \mu_{k+1}. $$

   b) For stable methods of order $k + 1$ with odd $k$ we have

       $$ C \le -2^{-k} \mu_k. $$

   c) For stable explicit methods of order $k$ we have ($\mu_j = 0$ for even $j$)

       $$ C \ge 2^{1-k} \Big(\frac{1}{2} - \sum_{j=1}^{k-1} \mu_j\Big). $$

   Show that all these bounds are optimal.

   Hint. Compare the formulas (3.18) and (3.21) and use the relation
   $\sigma(1) = 2^{k-1} a_{k-1}$ of Lemma 3.6.

9. The coefficients $\mu_j$ of formula (3.19) satisfy the recurrence relation

       $$ \mu_{2j+1} + \frac{1}{3}\mu_{2j-1} + \ldots + \frac{1}{2j+1}\mu_1 = \frac{1}{4j+6}. $$    (3.30)

   The first of these coefficients are given by

       $$ \mu_1 = \frac{1}{6}, \qquad \mu_3 = \frac{2}{45}, \qquad \mu_5 = \frac{22}{945}, \qquad \mu_7 = \frac{214}{14175}. $$
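The recurrence (3.30) determines the coefficients one after the other, which makes the listed values easy to check. A minimal sketch in exact arithmetic (the function name is ours):

```python
from fractions import Fraction

def mu_coefficients(count):
    """Return [mu_1, mu_3, ..., mu_{2*count-1}] from the recurrence (3.30)."""
    mu = []
    for j in range(count):
        # (3.30): sum_{i=0}^{j} mu_{2i+1} / (2(j-i)+1) = 1/(4j+6);
        # the term i = j has coefficient 1 and is the unknown mu_{2j+1}.
        known = sum(mu[i] / (2 * (j - i) + 1) for i in range(j))
        mu.append(Fraction(1, 4 * j + 6) - known)
    return mu

for index, value in zip(range(1, 12, 2), mu_coefficients(6)):
    print(f"mu_{index} = {value}")
```

All computed values are positive, in agreement with Lemma 3.8; this is of course only an observation for finitely many indices and does not replace the proof.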


10. Another proof of Lemma 3.8: multiplying (3.30) by $2j + 3$ and subtracting
    from it the same formula with $j$ replaced by $j - 1$ yields

       $$ (2j + 3)\,\mu_{2j+1} + \sum_{i=0}^{j-1} \mu_{2i+1} \Big(\frac{2j+3}{2j-2i+1} - \frac{2j+1}{2j-2i-1}\Big) = 0. $$

    Show that the expression in brackets is negative and deduce the result of
    Lemma 3.8 by a simple induction argument.

..., the Adams method is markedly superior to every other one.
If it is nevertheless not applied generally enough and,
especially in Germany, takes second place to the methods developed by
Runge, Heun and Kutta, this may be because a usable investigation
of the accuracy of the Adams integration has so far been lacking.
This gap is to be filled here, ...

(R. v. Mises 1930)


The convergence of Adams methods was investigated in the influential article of 
von Mises (1930), which was followed by an avalanche of papers improving the error
bounds and applying the ideas to other special multistep methods, e.g., Tollmien
(1938), Fricke (1949), Weissinger (1950), Vietoris (1953). A general convergence 
proof for the method (2.1), however, was first given by Dahlquist (1956), who gave 
necessary and sufficient conditions for convergence. Great elegance was introduced 
in the proofs by the ideas of Butcher (1966), where multistep formulas are written 
as one-step formulas in a higher dimensional space. Furthermore, the resulting 
presentation can easily be extended to a more general class of integration methods 
(see Section III.8). 

We cannot expect reasonable convergence of numerical methods, if the differential
equation problem

    $$ y' = f(x, y), \qquad y(x_0) = y_0 $$    (4.1)

does not possess a unique solution. We therefore make the following assumptions,
which were seen in Sections I.7 and I.9 to be natural for our purpose:

    $$ f \text{ is continuous on } D = \{(x, y);\ x \in [x_0, \hat x],\ \| y(x) - y \| \le b\} $$    (4.2a)

where $y(x)$ denotes the exact solution of (4.1) and $b$ is some positive number. We
further assume that $f$ satisfies a Lipschitz condition, i.e.,

    $$ \| f(x, y) - f(x, z) \| \le L\, \| y - z \| \quad \text{for } (x, y), (x, z) \in D. $$    (4.2b)

If we apply the multistep method (2.1) with step size $h$ to the problem (4.1) we
obtain a sequence $\{y_i\}$. For given $x$ and $h$ such that $(x - x_0)/h = n$ is an integer,
we introduce the following notation for the numerical solution:

    $$ y_h(x) = y_n \quad \text{if } x - x_0 = nh. $$    (4.3)

Definition 4.1 (Convergence). i) The linear multistep method (2.1) is called
convergent, if for all initial value problems (4.1) satisfying (4.2),

    $$ y(x) - y_h(x) \to 0 \quad \text{for } h \to 0,\ x \in [x_0, \hat x] $$

whenever the starting values satisfy

    $$ y(x_0 + ih) - y_h(x_0 + ih) \to 0 \quad \text{for } h \to 0,\ i = 0, 1, \ldots, k-1. $$

ii) Method (2.1) is convergent of order $p$, if to any problem (4.1) with $f$
sufficiently differentiable, there exists a positive $h_0$ such that

    $$ \| y(x) - y_h(x) \| \le C h^p \quad \text{for } h \le h_0 $$

whenever the starting values satisfy

    $$ \| y(x_0 + ih) - y_h(x_0 + ih) \| \le C_0 h^p \quad \text{for } h \le h_0,\ i = 0, 1, \ldots, k-1. $$

In this definition we clearly assume that a solution of (4.1) exists on $[x_0, \hat x]$.

The aim of this section is to prove that stability together with consistency are 
necessary and sufficient for the convergence of a multistep method. This is
expressed in the famous slogan

convergence = stability + consistency 

(compare also Lax & Richtmyer 1956). We begin with the study of necessary 
conditions for convergence. 

Theorem 4.2. If the multistep method (2.1) is convergent, then it is necessarily

i) stable and

ii) consistent (i.e. of order 1: $\varrho(1) = 0$, $\varrho'(1) = \sigma(1)$).

Proof. Application of the multistep method (2.1) to the differential equation $y' = 0$,
$y(0) = 0$ yields the difference equation (3.6). Suppose, by contradiction, that $\varrho(\zeta)$
has a root $\zeta_1$ with $|\zeta_1| > 1$, or a root $\zeta_2$ on the unit circle whose multiplicity
exceeds 1. $\zeta_1^n$ and $n \zeta_2^n$ are then divergent solutions of (3.6). Multiplying by
$\sqrt{h}$ we achieve that the starting values converge to $y_0 = 0$ for $h \to 0$. Since
$y_h(x) = \sqrt{h}\, \zeta_1^{x/h}$ and $y_h(x) = (x/\sqrt{h})\, \zeta_2^{x/h}$ remain divergent for every fixed $x$,
we have a contradiction to the assumption of convergence. The method (2.1) must
therefore be stable.

We next consider the initial value problem $y' = 0$, $y(0) = 1$ with exact solution
$y(x) = 1$. The corresponding difference equation is again that of (3.6), which, in
the new notation, can be written as

    $$ \alpha_k y_h(x + kh) + \alpha_{k-1} y_h(x + (k-1)h) + \ldots + \alpha_0 y_h(x) = 0. $$

Letting $h \to 0$, convergence immediately implies that $\varrho(1) = 0$.

Finally we apply method (2.1) to the problem $y' = 1$, $y(0) = 0$. The exact
solution is $y(x) = x$. Since we already know that $\varrho(1) = 0$, it is easy to verify
that a particular numerical solution is given by $y_n = nhK$ or $y_h(x) = xK$ where
$K = \sigma(1)/\varrho'(1)$. By convergence, $K = 1$ is necessary. □

Although the statement of Theorem 4.2 was derived from a consideration of
almost trivial differential equations, it is remarkable that conditions (i) and (ii) turn
out to be not only necessary but also sufficient for convergence.


Formulation as One-Step Method 


We are now at the point where it is useful to rewrite a multistep method as a
one-step method in a higher dimensional space (see Butcher 1966, Skeel 1976). For
this let $\psi = \psi(x_i, y_i, \ldots, y_{i+k-1}, h)$ be defined implicitly by

    $$ \psi = \sum_{j=0}^{k-1} \beta_j' f(x_i + jh, y_{i+j}) + \beta_k' f\Big(x_i + kh,\ h\psi - \sum_{j=0}^{k-1} \alpha_j' y_{i+j}\Big) $$    (4.4)

where $\alpha_j' = \alpha_j/\alpha_k$ and $\beta_j' = \beta_j/\alpha_k$. Multistep formula (2.1) can then be written
as

    $$ y_{i+k} = -\sum_{j=0}^{k-1} \alpha_j' y_{i+j} + h\psi. $$    (4.5)

Introducing the $m \cdot k$-dimensional vectors ($m$ is the dimension of the differential
equation)

    $$ Y_i = (y_{i+k-1}, y_{i+k-2}, \ldots, y_i)^T, \qquad i \ge 0 $$    (4.6)

and

    $$ A = \begin{pmatrix} -\alpha_{k-1}' & -\alpha_{k-2}' & \cdots & \cdots & -\alpha_0' \\ 1 & 0 & \cdots & \cdots & 0 \\ & 1 & \ddots & & \vdots \\ & & \ddots & \ddots & \vdots \\ & & & 1 & 0 \end{pmatrix}, \qquad e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ \vdots \\ 0 \end{pmatrix}, $$    (4.7)

the multistep method (4.5) can be written, after adding some trivial identities,
in compact form as

    $$ Y_{i+1} = (A \otimes I)\, Y_i + h\, \Phi(x_i, Y_i, h), \qquad i \ge 0 $$    (4.8)

with

    $$ \Phi(x_i, Y_i, h) = (e_1 \otimes I)\, \psi(x_i, Y_i, h). $$    (4.8a)

Here, $A \otimes I$ denotes the Kronecker tensor product, i.e. the $m \cdot k$-dimensional block
matrix with $(m, m)$-blocks $a_{ij} I$. Readers unfamiliar with the notation and properties
of this product may assume for simplicity that (4.1) is a scalar equation ($m = 1$)
and $A \otimes I = A$.
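For a scalar problem and an explicit method ($\beta_k = 0$, so that $\psi$ is given explicitly) the reformulation (4.8) takes only a few lines. The sketch below, with names of our own choosing, builds the matrix $A$ of (4.7) and checks on $y' = y$ that one step of (4.8) reproduces the familiar recursion of the 2-step explicit Adams method, $y_{n+2} = y_{n+1} + h(\frac{3}{2} f_{n+1} - \frac{1}{2} f_n)$.

```python
import math

def companion(alpha):
    """Matrix A of (4.7) for alpha = (alpha_0, ..., alpha_k)."""
    k = len(alpha) - 1
    a = [x / alpha[k] for x in alpha[:k]]                  # alpha'_0 .. alpha'_{k-1}
    A = [[0.0] * k for _ in range(k)]
    A[0] = [-a[k - 1 - c] for c in range(k)]               # -alpha'_{k-1}, ..., -alpha'_0
    for r in range(1, k):
        A[r][r - 1] = 1.0                                  # the "trivial identities"
    return A

def one_step(alpha, beta, f, x_i, Y, h):
    """One step of (4.8) for a scalar problem and an explicit method (beta_k = 0).
    Y = (y_{i+k-1}, ..., y_i) as in (4.6)."""
    k = len(alpha) - 1
    ys = Y[::-1]                                           # y_i, ..., y_{i+k-1}
    psi = sum(beta[j] / alpha[k] * f(x_i + j * h, ys[j]) for j in range(k))
    A = companion(alpha)
    Y_new = [sum(A[r][c] * Y[c] for c in range(k)) for r in range(k)]
    Y_new[0] += h * psi                                    # Phi = e_1 * psi
    return Y_new

def max_mismatch(n_steps=10, h=0.1):
    """Largest difference between (4.8) and the direct two-step recursion."""
    alpha, beta = [0.0, -1.0, 1.0], [-0.5, 1.5, 0.0]       # 2-step explicit Adams
    f = lambda x, y: y
    Y = [math.exp(h), 1.0]
    y_prev, y_curr = 1.0, math.exp(h)
    worst = 0.0
    for i in range(n_steps):
        Y = one_step(alpha, beta, f, i * h, Y, h)
        y_prev, y_curr = y_curr, y_curr + h * (1.5 * y_curr - 0.5 * y_prev)
        worst = max(worst, abs(Y[0] - y_curr), abs(Y[1] - y_prev))
    return worst

print(companion([0.0, -1.0, 1.0]), max_mismatch())
```

The second row of $A$ merely copies $y_{i+k-1}$ into the next vector, which is the "trivial identity" mentioned above; only the first component carries new information.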

The following lemmas express the concepts of order and stability in this new
notation.

Lemma 4.3. Let $y(x)$ be the exact solution of (4.1). For $i = 0, 1, 2, \ldots$ we define
the vector $\hat Y_{i+1}$ as the numerical solution of one step

    $$ \hat Y_{i+1} = (A \otimes I)\, Y(x_i) + h\, \Phi(x_i, Y(x_i), h) $$

with correct starting values

    $$ Y(x_i) = (y(x_{i+k-1}), y(x_{i+k-2}), \ldots, y(x_i))^T. $$

i) If the multistep method (2.1) is of order 1 and if $f$ satisfies (4.2), then an
$h_0 > 0$ exists such that for $h \le h_0$,

    $$ \| Y(x_{i+1}) - \hat Y_{i+1} \| \le h\,\omega(h), \qquad 0 \le i \le \hat x/h - k $$

where $\omega(h) \to 0$ for $h \to 0$.

ii) If the multistep method (2.1) is of order $p$ and if $f$ is sufficiently
differentiable then a constant $M$ exists such that for $h$ small enough,

    $$ \| Y(x_{i+1}) - \hat Y_{i+1} \| \le M h^{p+1}, \qquad 0 \le i \le \hat x/h - k. $$

Proof. The first component of $Y(x_{i+1}) - \hat Y_{i+1}$ is the local error as given by
Definition 2.1. Since the remaining components all vanish, Exercise 5 of Section III.2
and Definition 2.3 yield the result. □


Lemma 4.4. Suppose that the multistep method (2.1) is stable. Then there exists a
vector norm (on $\mathbb{R}^{mk}$) such that the matrix $A$ of (4.7) satisfies

    $$ \| A \otimes I \| \le 1 $$

in the subordinate matrix norm.

Proof. If $\lambda$ is a root of $\varrho(\zeta)$, then the vector $(\lambda^{k-1}, \lambda^{k-2}, \ldots, 1)$ is an eigenvector
of the matrix $A$ with eigenvalue $\lambda$. Therefore the eigenvalues of $A$ (which are
the roots of $\varrho(\zeta)$) satisfy the root condition by Definition 3.2. A transformation to
Jordan canonical form therefore yields (see Section I.12)

    $$ T^{-1} A T = J = \operatorname{diag}\left\{ \lambda_1, \ldots, \lambda_l, \begin{pmatrix} \lambda_{l+1} & \varepsilon_{l+1} & & \\ & \ddots & \ddots & \\ & & \ddots & \varepsilon_{k-1} \\ & & & \lambda_k \end{pmatrix} \right\} $$    (4.9)

where $\lambda_1, \ldots, \lambda_l$ are the eigenvalues of modulus 1, which must be simple, and each
$\varepsilon_j$ is either 0 or 1. We further find by a suitable multiplication of the columns of $T$
that $|\varepsilon_j| < 1 - |\lambda_j|$ for $j = l + 1, \ldots, k - 1$. Because of (9.11') of Chapter I we
then have $\| J \otimes I \|_\infty \le 1$. Using the transformation $T$ of (4.9) we define the norm

    $$ \| x \| := \| (T^{-1} \otimes I)\, x \|_\infty. $$

This yields

    $$ \| (A \otimes I) x \| = \| (T^{-1} \otimes I)(A \otimes I) x \|_\infty = \| (J \otimes I)(T^{-1} \otimes I) x \|_\infty \le \| (T^{-1} \otimes I) x \|_\infty = \| x \| $$

and hence also $\| A \otimes I \| \le 1$. □


Proof of Convergence 


The convergence theorem for multistep methods can now be established. 

Theorem 4.5. If the multistep method (2.1) is stable and of order 1 then it is 
convergent. If method (2.1) is stable and of order p then it is convergent of order 

p. 

Proof. As in the convergence theorem for one-step methods (Section II.3) we may assume without loss of generality that f(x,y) is defined for all $y \in \mathbb{R}^m$, $x \in [x_0, \hat x]$ and satisfies there a (global) Lipschitz condition. This implies that for sufficiently small h the functions $\psi(x_i, Y_i, h)$ and $\Phi(x_i, Y_i, h)$ satisfy a Lipschitz condition with respect to the second argument (with Lipschitz constant $L^*$). For the function G, defined by formula (4.8), which maps the vector $Y_i$ onto $Y_{i+1}$ we thus obtain from Lemma 4.4

$$\|G(Y_i) - G(Z_i)\| \le (1 + hL^*)\,\|Y_i - Z_i\|. \tag{4.10}$$

The rest of the proof now proceeds in the same way as for one-step methods and is 
illustrated in Fig. 4.1. 



The arrows in Fig. 4.1 indicate the application of G. From Lemma 4.3 we know that $\|Y(x_{i+1}) - G(Y(x_i))\| \le h\,\omega(h)$. This together with (4.10) shows that the local error $Y(x_{i+1}) - G(Y(x_i))$ at stage i + 1 causes an error at stage n, which is at most $h\,\omega(h)(1 + hL^*)^{n-i-1}$. Thus we have

$$\begin{aligned} \|Y(x_n) - Y_n\| &\le \|Y(x_0) - Y_0\|(1 + hL^*)^n + h\,\omega(h)\bigl((1 + hL^*)^{n-1} + (1 + hL^*)^{n-2} + \ldots + 1\bigr) \\ &\le \|Y(x_0) - Y_0\| \exp(nhL^*) + \frac{\omega(h)}{L^*}\bigl(\exp(nhL^*) - 1\bigr). \end{aligned} \tag{4.11}$$

Convergence of method (2.1) is now an immediate consequence of formula (4.11). If the multistep method is of order p, the same proof with $\omega(h)$ replaced by $Mh^p$ yields convergence of order p. □


Exercises 


1. Consider the function (for $x \ge 0$)

$$f(x,y) = \begin{cases} 2x & \text{for } y \le 0, \\ 2x - \dfrac{4y}{x} & \text{for } 0 < y < x^2, \\ -2x & \text{for } y \ge x^2. \end{cases}$$

a) Show that $y(x) = x^2/3$ is the unique solution of $y' = f(x,y)$, $y(0) = 0$, although f does not satisfy a Lipschitz condition near the origin.

b) Apply the mid-point rule (1.13') with starting values $y_0 = 0$, $y_1 = -h^2$ to the above problem and verify that the numerical solution at x = nh is given by $y_h(x) = (-1)^n x^2$ (Taubert 1976, see also Grigorieff 1977).


2. Another motivation for the meaning of the error constant: suppose that 1 is the only eigenvalue of A in (4.7) of modulus one. Show that $(1, 1, \ldots, 1)^T$ is the right eigenvector and $(1,\, 1 + \alpha'_{k-1},\, 1 + \alpha'_{k-1} + \alpha'_{k-2},\, \ldots)$ is the left eigenvector to this eigenvalue. The global contribution of the local error after many steps is then given by

$$A^\infty \begin{pmatrix} C_{p+1} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = C \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}. \tag{4.12}$$

Multiply this equation from the left by the left eigenvector to show with (2.6) that C is the error constant defined in (2.13).

Remark. For multistep methods with several eigenvalues of modulus 1, formula (4.12) remains valid if $A^\infty$ is replaced by E (see Section III.8).
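
Part (b) of Exercise 1 can be checked by direct computation. The following is a minimal sketch, assuming the mid-point rule (1.13') has the usual form $y_{n+1} = y_{n-1} + 2h f(x_n, y_n)$; the function names and the use of exact rational arithmetic are choices made for this illustration only.

```python
from fractions import Fraction

def f(x, y):
    # Right-hand side of Exercise 1 (defined for x >= 0).
    if y <= 0:
        return 2 * x
    if y < x * x:
        return 2 * x - 4 * y / x
    return -2 * x

def midpoint(h, n_steps):
    # Mid-point rule y_{n+1} = y_{n-1} + 2 h f(x_n, y_n)
    # with the starting values y_0 = 0, y_1 = -h^2 of the exercise.
    ys = [Fraction(0), -h * h]
    for n in range(1, n_steps):
        x_n = n * h
        ys.append(ys[n - 1] + 2 * h * f(x_n, ys[n]))
    return ys

h = Fraction(1, 10)
ys = midpoint(h, 10)
for n, y in enumerate(ys):
    # The numerical values alternate in sign around the exact solution x^2/3.
    print(n, y, (-1) ** n * (n * h) ** 2)
```

With rational arithmetic the comparison in the loop is exact, so the oscillating behaviour is not an effect of rounding.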
III.5 Variable Step Size Multistep Methods


Des war a harter Brockn, des ... (Tyrolean dialect)


It is clear from the considerations of Section II.4 that an efficient integrator must be 
able to change the step size. However, changing the step size with multistep meth¬ 
ods is difficult since the formulas of the preceding sections require the numerical 
approximations at equidistant points. There are in principle two possibilities for 
overcoming this difficulty: 

i) use polynomial interpolation to reproduce the starting values at the new (equi¬ 
distant) grid; 

ii) construct methods which are adjusted to variable grid points. 

This section is devoted to the second approach. We investigate consistency, stability 
and convergence. The actual implementation (order and step size strategies) will 
be considered in Section III.7. 


Variable Step Size Adams Methods 


F. Ceschino (1961) was apparently the first person to propose a “smooth” transition from a step size h to a new step size $\omega h$. C.V.D. Forrington (1961) and later on F.T. Krogh (1969) extended his ideas: we consider an arbitrary grid $(x_n)$ and denote the step sizes by $h_n = x_{n+1} - x_n$. We assume that approximations $y_j$ to $y(x_j)$ are known for $j = n-k+1, \ldots, n$ and we put $f_j = f(x_j, y_j)$. In the same way as in Section III.1 we denote by p(t) the polynomial which interpolates the values $(x_j, f_j)$ for $j = n-k+1, \ldots, n$. Using Newton's interpolation formula we have

$$p(t) = \sum_{j=0}^{k-1} \prod_{i=0}^{j-1} (t - x_{n-i}) \cdot \delta^j f[x_n, x_{n-1}, \ldots, x_{n-j}] \tag{5.1}$$

where the divided differences $\delta^j f[x_n, \ldots, x_{n-j}]$ are defined recursively by

$$\begin{aligned} \delta^0 f[x_n] &= f_n \\ \delta^j f[x_n, \ldots, x_{n-j}] &= \frac{\delta^{j-1} f[x_n, \ldots, x_{n-j+1}] - \delta^{j-1} f[x_{n-1}, \ldots, x_{n-j}]}{x_n - x_{n-j}}. \end{aligned} \tag{5.2}$$
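
The Newton form of the interpolation polynomial and the recursive definition of the divided differences translate directly into a short program. The sketch below is an illustration only; the function names are hypothetical, and the nodes are assumed to be listed newest first, i.e. as $x_n, x_{n-1}, \ldots$.

```python
from fractions import Fraction

def divided_differences(xs, fs):
    # xs = [x_n, x_{n-1}, ...], fs = [f_n, f_{n-1}, ...].
    # Returns coef with coef[j] = delta^j f[x_n, ..., x_{n-j}].
    coef = list(fs)
    m = len(xs)
    for j in range(1, m):
        # Update from the back so that coef[i - 1] is still the old value.
        for i in range(m - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(xs, coef, t):
    # Evaluate p(t) = sum_j coef[j] * prod_{i<j} (t - xs[i]) by Horner's scheme.
    result = coef[-1]
    for j in range(len(coef) - 2, -1, -1):
        result = result * (t - xs[j]) + coef[j]
    return result

# Non-equidistant nodes; a cubic is reproduced exactly by four nodes.
xs = [Fraction(5, 2), Fraction(2), Fraction(1), Fraction(0)]
g = lambda t: t ** 3 - 2 * t + 1
coef = divided_differences(xs, [g(x) for x in xs])
print(coef, newton_eval(xs, coef, Fraction(7)), g(Fraction(7)))
```

Since divided differences are symmetric in their arguments, the ordering of the nodes affects only the form of the polynomial, not its values.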
For actual computations (see Krogh 1969) it is practical to rewrite (5.1) as

$$p(t) = \sum_{j=0}^{k-1} \prod_{i=0}^{j-1} \frac{t - x_{n-i}}{x_{n+1} - x_{n-i}} \cdot \Phi_j^*(n) \tag{5.1'}$$

where

$$\Phi_j^*(n) = \prod_{i=0}^{j-1} (x_{n+1} - x_{n-i}) \cdot \delta^j f[x_n, \ldots, x_{n-j}]. \tag{5.3}$$

We now define the approximation to $y(x_{n+1})$ by

$$y_{n+1} = y_n + \int_{x_n}^{x_{n+1}} p(t)\, dt. \tag{5.4}$$

Inserting formula (5.1') into (5.4) we obtain

$$y_{n+1} = y_n + h_n \sum_{j=0}^{k-1} g_j(n)\, \Phi_j^*(n) \tag{5.5}$$

with

$$g_j(n) = \frac{1}{h_n} \int_{x_n}^{x_{n+1}} \prod_{i=0}^{j-1} \frac{t - x_{n-i}}{x_{n+1} - x_{n-i}}\, dt. \tag{5.6}$$

Formula (5.5) is the extension of the explicit Adams method (1.5) to variable step sizes. Observe that for constant step sizes the above expressions reduce to (Exercise 1)

$$g_j(n) = \gamma_j, \qquad \Phi_j^*(n) = \nabla^j f_n.$$

The variable step size implicit Adams methods can be deduced similarly. In analogy to Section III.1 we let $p^*(t)$ be the polynomial of degree k that interpolates $(x_j, f_j)$ for $j = n-k+1, \ldots, n, n+1$ (the value $f_{n+1} = f(x_{n+1}, y_{n+1})$ contains the unknown solution $y_{n+1}$). Again, using Newton's interpolation formula we obtain

$$p^*(t) = p(t) + \prod_{i=0}^{k-1} (t - x_{n-i}) \cdot \delta^k f[x_{n+1}, x_n, \ldots, x_{n-k+1}].$$

The numerical solution, defined by

$$y_{n+1} = y_n + \int_{x_n}^{x_{n+1}} p^*(t)\, dt,$$

is now given by

$$y_{n+1} = p_{n+1} + h_n\, g_k(n)\, \Phi_k(n+1), \tag{5.7}$$

where $p_{n+1}$ is the numerical approximation obtained by the explicit Adams method

$$p_{n+1} = y_n + h_n \sum_{j=0}^{k-1} g_j(n)\, \Phi_j^*(n)$$

and where

$$\Phi_k(n+1) = \prod_{i=0}^{k-1} (x_{n+1} - x_{n-i}) \cdot \delta^k f[x_{n+1}, x_n, \ldots, x_{n-k+1}]. \tag{5.8}$$

Recurrence Relations for $g_j(n)$, $\Phi_j(n)$ and $\Phi_j^*(n)$

The cost of computing integration coefficients is the biggest disadvantage to permitting arbitrary variations in the step size.

(F.T. Krogh 1973)


The values $\Phi_j^*(n)$ $(j = 0, \ldots, k-1)$ and $\Phi_k(n+1)$ can be computed efficiently with the recurrence relations

$$\begin{aligned} \Phi_0(n) &= \Phi_0^*(n) = f_n \\ \Phi_{j+1}(n) &= \Phi_j(n) - \Phi_j^*(n-1) \\ \Phi_j^*(n) &= \beta_j(n)\, \Phi_j(n), \end{aligned} \tag{5.9}$$

which are an immediate consequence of Definitions (5.3) and (5.8). The coefficients

$$\beta_j(n) = \prod_{i=0}^{j-1} \frac{x_{n+1} - x_{n-i}}{x_n - x_{n-i-1}}$$

can be calculated by

$$\beta_0(n) = 1, \qquad \beta_j(n) = \beta_{j-1}(n)\, \frac{x_{n+1} - x_{n-j+1}}{x_n - x_{n-j}}.$$

The calculation of the coefficients $g_j(n)$ is trickier (F.T. Krogh 1974). We introduce the q-fold integral

$$c_{jq}(x) = \frac{(q-1)!}{h_n^q} \int_{x_n}^{x} \int_{x_n}^{\xi_{q-1}} \cdots \int_{x_n}^{\xi_1} \prod_{i=0}^{j-1} \frac{\xi_0 - x_{n-i}}{x_{n+1} - x_{n-i}}\, d\xi_0 \cdots d\xi_{q-1} \tag{5.10}$$

and observe that

$$g_j(n) = c_{j1}(x_{n+1}).$$


Lemma 5.1. We have

$$c_{0q}(x_{n+1}) = \frac{1}{q}, \qquad c_{1q}(x_{n+1}) = \frac{1}{q(q+1)},$$

$$c_{jq}(x_{n+1}) = c_{j-1,q}(x_{n+1}) - c_{j-1,q+1}(x_{n+1}) \cdot \frac{h_n}{x_{n+1} - x_{n-j+1}}.$$
Proof. The first two relations follow immediately from (5.10). In order to prove the recurrence relation we denote by d(x) the difference

$$d(x) = c_{jq}(x) - c_{j-1,q}(x)\, \frac{x - x_{n-j+1}}{x_{n+1} - x_{n-j+1}} + c_{j-1,q+1}(x)\, \frac{h_n}{x_{n+1} - x_{n-j+1}}.$$

Clearly, $d^{(i)}(x_n) = 0$ for $i = 0, 1, \ldots, q-1$. Moreover, the q-th derivative of d(x) vanishes, since by the Leibniz rule

$$\begin{aligned} \frac{d^q}{dx^q} \Bigl( c_{j-1,q}(x) \cdot \frac{x - x_{n-j+1}}{x_{n+1} - x_{n-j+1}} \Bigr) &= c_{j-1,q}^{(q)}(x)\, \frac{x - x_{n-j+1}}{x_{n+1} - x_{n-j+1}} + q\, c_{j-1,q}^{(q-1)}(x)\, \frac{1}{x_{n+1} - x_{n-j+1}} \\ &= c_{jq}^{(q)}(x) + c_{j-1,q+1}^{(q)}(x)\, \frac{h_n}{x_{n+1} - x_{n-j+1}}. \end{aligned}$$

Therefore we have $d(x) \equiv 0$ and the statement follows by putting $x = x_{n+1}$. □


Using the above recurrence relation one can successively compute $c_{2q}(x_{n+1})$ for $q = 1, \ldots, k-1$; $c_{3q}(x_{n+1})$ for $q = 1, \ldots, k-2$; ...; $c_{kq}(x_{n+1})$ for q = 1. This procedure yields in an efficient way the coefficients $g_j(n) = c_{j1}(x_{n+1})$ of the Adams methods.
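
The triangular scheme of Lemma 5.1 is short enough to be written out. The following is a sketch under stated assumptions, not Krogh's actual code: the function name `adams_g` is hypothetical, the grid is passed as an increasing list whose last entry is $x_{n+1}$, and exact rational arithmetic is used so that the result can be compared with the classical constant step size values $\gamma_j$.

```python
from fractions import Fraction

def adams_g(grid, k):
    # grid: increasing list of grid points ending with x_{n+1};
    #       it must contain at least x_{n-k+1}, ..., x_n, x_{n+1}.
    # Returns [g_0(n), ..., g_k(n)] via the recurrence of Lemma 5.1.
    x_next, x_n = grid[-1], grid[-2]
    h = Fraction(x_next - x_n)
    # c[q] holds c_{jq}(x_{n+1}) for the current j; for j = 0 it equals 1/q.
    c = {q: Fraction(1, q) for q in range(1, k + 2)}
    g = [c[1]]
    for j in range(1, k + 1):
        ratio = h / (x_next - grid[-1 - j])      # h_n / (x_{n+1} - x_{n-j+1})
        c = {q: c[q] - c[q + 1] * ratio for q in range(1, k + 2 - j)}
        g.append(c[1])
    return g

# Equidistant grid: the classical coefficients 1, 1/2, 5/12, 3/8, 251/720.
print(adams_g([0, 1, 2, 3, 4, 5], 4))
# Variable grid with h_{n-1} = 1, h_n = 2.
print(adams_g([0, 1, 3], 2))
```

Each pass over j shortens the range of q by one, so the total work is proportional to $k^2$ per step, which is the cost Krogh's remark above refers to.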

Variable Step Size BDF 

The BDF-formulas (1.22) can also be extended in a natural way to variable step size. Denote by q(t) the polynomial of degree k that interpolates $(x_i, y_i)$ for $i = n+1, n, \ldots, n-k+1$. It can be expressed, using divided differences, by

$$q(t) = \sum_{j=0}^{k} \prod_{i=0}^{j-1} (t - x_{n+1-i}) \cdot \delta^j y[x_{n+1}, x_n, \ldots, x_{n-j+1}]. \tag{5.11}$$

The requirement

$$q'(x_{n+1}) = f(x_{n+1}, y_{n+1})$$

immediately leads to the variable step size BDF-formulas

$$\sum_{j=1}^{k} h_n \prod_{i=1}^{j-1} (x_{n+1} - x_{n+1-i}) \cdot \delta^j y[x_{n+1}, \ldots, x_{n-j+1}] = h_n f(x_{n+1}, y_{n+1}). \tag{5.12}$$

The computation of the coefficients is much easier here than for the Adams methods.
General Variable Step Size Methods and Their Orders


For theoretical investigations it is convenient to write the methods in a form where the $y_j$ and $f_j$ values appear linearly. For example, the implicit Adams method (5.7) becomes (k = 2)

$$y_{n+1} = y_n + \frac{h_n}{6(1 + \omega_n)} \Bigl( (3 + 2\omega_n) f_{n+1} + (3 + \omega_n)(1 + \omega_n) f_n - \omega_n^2 f_{n-1} \Bigr), \tag{5.13}$$

where we have introduced the notation $\omega_n = h_n/h_{n-1}$ for the step size ratio. Or, the 2-step BDF-formula (5.12) can be written as

$$y_{n+1} - \frac{(1 + \omega_n)^2}{1 + 2\omega_n}\, y_n + \frac{\omega_n^2}{1 + 2\omega_n}\, y_{n-1} = h_n\, \frac{1 + \omega_n}{1 + 2\omega_n}\, f_{n+1}. \tag{5.14}$$
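
As a plausibility check of the two-step BDF formula (5.14), one can insert polynomials into both sides on a non-equidistant grid. The sketch below is only an illustration with hypothetical names; `bdf2_defect` returns the difference of the left- and right-hand sides when $y$ is replaced by a polynomial q and $f$ by its derivative.

```python
from fractions import Fraction

def bdf2_defect(q, dq, x_prev, x_n, x_next):
    # Defect of formula (5.14) for the function q with derivative dq
    # on the three grid points x_{n-1}, x_n, x_{n+1}.
    h_n = x_next - x_n
    omega = Fraction(h_n) / (x_n - x_prev)       # step size ratio omega_n
    lhs = (q(x_next)
           - (1 + omega) ** 2 / (1 + 2 * omega) * q(x_n)
           + omega ** 2 / (1 + 2 * omega) * q(x_prev))
    rhs = h_n * (1 + omega) / (1 + 2 * omega) * dq(x_next)
    return lhs - rhs

# Grid 0, 1, 3, i.e. omega_n = 2.
quadratic = (lambda x: 3 * x * x - x + 2, lambda x: 6 * x - 1)
cubic = (lambda x: x ** 3, lambda x: 3 * x * x)
print(bdf2_defect(*quadratic, 0, 1, 3))   # vanishes: polynomials of degree 2 are exact
print(bdf2_defect(*cubic, 0, 1, 3))       # does not vanish
```

The vanishing defect for all polynomials of degree at most 2, independently of the grid, is what the notion of order introduced next requires for p = 2.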


In order to give a unified theory for all these variable step size multistep methods we consider formulas of the form

$$y_{n+k} + \sum_{j=0}^{k-1} \alpha_{jn}\, y_{n+j} = h_{n+k-1} \sum_{j=0}^{k} \beta_{jn}\, f_{n+j}. \tag{5.15}$$

The coefficients $\alpha_{jn}$ and $\beta_{jn}$ actually depend on the ratios $\omega_i = h_i/h_{i-1}$ for $i = n+1, \ldots, n+k-1$. In analogy to the constant step size case we give


Definition 5.2. Method (5.15) is consistent of order p, if

$$q(x_{n+k}) + \sum_{j=0}^{k-1} \alpha_{jn}\, q(x_{n+j}) = h_{n+k-1} \sum_{j=0}^{k} \beta_{jn}\, q'(x_{n+j})$$

holds for all polynomials q(x) of degree $\le p$ and for all grids $(x_j)$.

By definition, the explicit Adams method (5.5) is of order k, the implicit Adams method (5.7) is of order k + 1, and the BDF-formula (5.12) is of order k.

The notion of consistency certainly has to be related to the local error. Indeed, if the method is of order p, if the ratios $h_j/h_n$ are bounded for $j = n+1, \ldots, n+k-1$ and if the coefficients satisfy

$$\alpha_{jn},\ \beta_{jn} \ \text{are bounded}, \tag{5.16}$$

then a Taylor expansion argument implies that

$$y(x_{n+k}) + \sum_{j=0}^{k-1} \alpha_{jn}\, y(x_{n+j}) - h_{n+k-1} \sum_{j=0}^{k} \beta_{jn}\, y'(x_{n+j}) = O(h_n^{p+1}) \tag{5.17}$$

for sufficiently smooth y(x). Interpreting y(x) as the solution of the differential equation, a trivial extension of Lemma 2.2 to variable step sizes shows that the local error at $x_{n+k}$ (cf. Definition 2.1) is also $O(h_n^{p+1})$.
This motivates the investigation of condition (5.16). The methods (5.13) and (5.14) are seen to satisfy (5.16) whenever the step size ratio $h_n/h_{n-1}$ is bounded from above. In general we have

Lemma 5.3. For the explicit and implicit Adams methods as well as for the BDF-formulas the coefficients $\alpha_{jn}$ and $\beta_{jn}$ are bounded whenever for some $\Omega$

$$h_n/h_{n-1} \le \Omega.$$

Proof. We prove the statement for the explicit Adams methods only. The proof for the other methods is similar and thus omitted. We see from formula (5.5) that the coefficients $\alpha_{jn}$ do not depend on n and hence are bounded. The $\beta_{jn}$ are composed of products of $g_j(n)$ with the coefficients of $\Phi_j^*(n)$, when written as a linear combination of $f_n, \ldots, f_{n-j}$. From formula (5.6) we see that $|g_j(n)| \le 1$. It follows from $(x_{n+1} - x_{n-j+1}) \le \max(1, \Omega)(x_n - x_{n-j})$ and from an induction argument that the coefficients of $\Phi_j^*(n)$ are also bounded. Hence the $\beta_{jn}$ are bounded, which proves the lemma. □


The condition $h_n/h_{n-1} \le \Omega$ is a reasonable assumption which can easily be satisfied by a code.


Stability 


So geht das einfach ... (R.D. Grigorieff, Halle 1983) 

The study of stability for variable step size methods was begun in the articles of 
Gear & Tu (1974) and Gear & Watanabe (1974). Further investigations are due to 
Grigorieff (1983) and Crouzeix & Lisbona (1984). 

We have seen in Section III.3 that for equidistant grids stability is equivalent to the boundedness of the numerical solution, when applied to the scalar differential equation y' = 0. Let us do the same here for the general case. Method (5.15), applied to y' = 0, gives the difference equation with variable coefficients

$$y_{n+k} + \sum_{j=0}^{k-1} \alpha_{jn}\, y_{n+j} = 0.$$

If we introduce the vector $Y_n = (y_{n+k-1}, \ldots, y_n)^T$, this difference equation is equivalent to

$$Y_{n+1} = A_n Y_n$$

with

$$A_n = \begin{pmatrix} -\alpha_{k-1,n} & \cdots & -\alpha_{1,n} & -\alpha_{0,n} \\ 1 & & 0 & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}, \tag{5.18}$$

the companion matrix.


Definition 5.4. Method (5.15) is called stable, if

$$\|A_{n+l} A_{n+l-1} \cdots A_{n+1} A_n\| \le M \tag{5.19}$$

for all n and $l \ge 0$.


Observe that in general $A_n$ depends on the step ratios $\omega_{n+1}, \ldots, \omega_{n+k-1}$. Therefore, condition (5.19) will usually lead to a restriction on these values. For the Adams methods (5.5) and (5.7) the coefficients $\alpha_{jn}$ do not depend on n, and hence these methods are stable for any step size sequence.

In the following three theorems we present stability results for general variable 
step size methods. The first one, taken from Crouzeix & Lisbona (1984), is a sort 
of perturbation result: the variable step size method is considered as a perturbation 
of a strongly stable fixed step size method. 

Theorem 5.5. Let the method (5.15) satisfy the following properties:

a) it is of order $p \ge 0$, i.e., $1 + \sum_{j=0}^{k-1} \alpha_{jn} = 0$;

b) the coefficients $\alpha_{jn} = \alpha_j(\omega_{n+1}, \ldots, \omega_{n+k-1})$ are continuous in a neighbourhood of (1, ..., 1);

c) the underlying constant step size formula is strongly stable, i.e., all roots of

$$\zeta^k + \sum_{j=0}^{k-1} \alpha_j(1, \ldots, 1)\, \zeta^j = 0$$

lie in the open unit disc $|\zeta| < 1$, with the exception of $\zeta_1 = 1$.

Then there exist real numbers $\omega$, $\Omega$ $(\omega < 1 < \Omega)$ such that the method is stable if

$$\omega \le h_n/h_{n-1} \le \Omega \quad \text{for all } n. \tag{5.20}$$


Proof. Let A be the companion matrix of the constant step size formula. As in the proof of Lemma 4.4 we transform A to Jordan canonical form and obtain

$$T^{-1}AT = \left( \hat A \;\; \begin{matrix} 0 \\ \vdots \\ 0 \\ 1 \end{matrix} \right)$$

where, by assumption (c), $\|\hat A\|_1 < 1$. Observe that the last column of T, the eigenvector of A corresponding to 1, is given by $t_k = (1, \ldots, 1)^T$. Assumption (a) implies that this vector $t_k$ is also an eigenvector for each $A_n$. Therefore we have

$$T^{-1}A_nT = \left( \hat A_n \;\; \begin{matrix} 0 \\ \vdots \\ 0 \\ 1 \end{matrix} \right)$$

and, by continuity, $\|\hat A_n\|_1 \le 1$, if $\omega_{n+1}, \ldots, \omega_{n+k-1}$ are sufficiently close to 1.

Stability now follows from the fact that

$$\|T^{-1}A_nT\|_1 = \max(\|\hat A_n\|_1, 1) = 1,$$

which implies that

$$\|A_{n+l} \cdots A_{n+1} A_n\| \le \|T\| \cdot \|T^{-1}\|. \qquad □$$


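The next result (Grigorieff 1983) is based on a reduction of the dimension of the matrices $A_n$ by one. The idea is to use the transformation

$$T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ & 1 & \cdots & 1 \\ & & \ddots & \vdots \\ 0 & & & 1 \end{pmatrix}, \qquad T^{-1} = \begin{pmatrix} 1 & -1 & & 0 \\ & 1 & \ddots & \\ & & \ddots & -1 \\ 0 & & & 1 \end{pmatrix}.$$

Observe that the last column of T is just $t_k$ of the above proof. A simple calculation shows that

$$T^{-1}A_nT = \begin{pmatrix} A_n^* & 0 \\ e_{k-1}^T & 1 \end{pmatrix}$$

where $e_{k-1}^T = (0, \ldots, 0, 1)$ and

$$A_n^* = \begin{pmatrix} -\alpha^*_{k-2,n} & \cdots & -\alpha^*_{1,n} & -\alpha^*_{0,n} \\ 1 & & 0 & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix} \tag{5.21}$$

with

$$\alpha^*_{k-2,n} = 1 + \alpha_{k-1,n}, \qquad \alpha^*_{k-j-1,n} - \alpha^*_{k-j,n} = \alpha_{k-j,n} \quad \text{for } j = 2, \ldots, k-1.$$

We remark that the coefficients $\alpha^*_{jn}$ are just the coefficients of the polynomial defined by

$$\zeta^k + \alpha_{k-1,n}\zeta^{k-1} + \ldots + \alpha_{1,n}\zeta + \alpha_{0,n} = (\zeta - 1)\bigl(\zeta^{k-1} + \alpha^*_{k-2,n}\zeta^{k-2} + \ldots + \alpha^*_{1,n}\zeta + \alpha^*_{0,n}\bigr).$$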
Theorem 5.6. Let the method (5.15) be of order $p \ge 0$. Then the method is stable if and only if for all n and $l \ge 0$,

a) $\|A^*_{n+l} \cdots A^*_{n+1} A^*_n\| \le M_1$,

b) $\Bigl\| e_{k-1}^T \sum_{j=n}^{n+l} \prod_{i=n}^{j-1} A^*_i \Bigr\| \le M_2$.


Since in this theorem the dimension of the matrices under consideration is reduced by one, it is especially useful for the stability investigation of two-step methods.

Example. Consider the two-step BDF-method (5.14). Here

$$\alpha_{0n} = \frac{\omega_{n+1}^2}{1 + 2\omega_{n+1}}, \qquad \alpha_{1n} = -1 - \alpha_{0n}.$$

The matrix (5.21) becomes in this case

$$A_n^* = (-\alpha^*_{0n}), \qquad \alpha^*_{0n} = -\frac{\omega_{n+1}^2}{1 + 2\omega_{n+1}}.$$

If $|\alpha^*_{0n}| \le q < 1$ the conditions of Theorem 5.6 are satisfied and imply stability. This is the case, if

$$0 < h_{n+1}/h_n \le \Omega < 1 + \sqrt{2}.$$

An interesting consequence of the theorem above is the instability of the two-step BDF-formula if the step sizes increase at least like $h_{n+1}/h_n \ge 1 + \sqrt{2}$.
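
The threshold $1 + \sqrt{2}$ can be observed numerically by applying the two-step BDF formula to y' = 0 on a grid with constant step size ratio. This is a small sketch with hypothetical function names; it uses the identity $(1+\omega)^2/(1+2\omega) = 1 + \omega^2/(1+2\omega)$ to write the recursion in terms of a single coefficient.

```python
from math import sqrt

def alpha0(omega):
    # Coefficient omega^2 / (1 + 2*omega) of the two-step BDF formula (5.14).
    return omega * omega / (1.0 + 2.0 * omega)

def bdf2_zero_rhs(omega, steps, y0=0.0, y1=1.0):
    # Iterate (5.14) for y' = 0 with constant ratio h_{n+1}/h_n = omega:
    #   y_{n+1} = (1 + a0) * y_n - a0 * y_{n-1}.
    a0 = alpha0(omega)
    ys = [y0, y1]
    for _ in range(steps):
        ys.append((1.0 + a0) * ys[-1] - a0 * ys[-2])
    return ys

print(alpha0(1.0 + sqrt(2.0)))                       # the critical ratio gives modulus one
print(max(abs(y) for y in bdf2_zero_rhs(2.0, 200)))  # ratio below the bound: bounded
print(abs(bdf2_zero_rhs(3.0, 200)[-1]))              # ratio above the bound: growth
```

The differences $y_{n+1} - y_n$ are multiplied by $\omega^2/(1+2\omega)$ in every step, which is the scalar matrix $A_n^*$ of the example.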

The investigation of stability for k-step ($k \ge 3$) methods becomes much more difficult, because several step size ratios $\omega_{n+1}, \omega_{n+2}, \ldots$ are involved. Grigorieff (1983) calculated the bounds (5.20) given in Table 5.1 for the higher order BDF-methods which ensure stability. These bounds are surely unrealistic, since all pathological step size variations are admitted.

Table 5.1. Bounds (5.20) for k-step BDF formulas

    k       2       3       4       5
    ω       0       0.836   0.979   0.997
    Ω       2.414   1.127   1.019   1.003

A less pessimistic result is obtained if the step sizes are supposed to vary more smoothly (Gear & Tu 1974): the local error is known to be of the form $d(x_n)h_n^{p+1} + O(h_n^{p+2})$, where d(x) is the principal error function. This local error is, by the step size control, kept equal to Tol. Hence, if d(x) is bounded away from zero we have

$$h_n = |Tol/d(x_n)|^{1/(p+1)} + O(h_n^2)$$

which implies (if $h_{n+1}/h_n \le \Omega$) that

$$h_{n+1}/h_n = |d(x_n)/d(x_{n+1})|^{1/(p+1)} + O(h_n).$$

If d(x) is differentiable, we obtain

$$|h_{n+1}/h_n - 1| \le C h_n. \tag{5.22}$$

Several stability results of Gear & Tu are based on this hypothesis (“Consequently, we can expect either method to be stable if the fixed step method is stable. ...”). Adding up (5.22) we obtain

$$\sum_{j=n}^{n+l} |h_{j+1}/h_j - 1| \le C(\hat x - x_0),$$

a condition which contains only step size ratios. This motivates the following theorem:

Theorem 5.7. Let the coefficients $\alpha_{jn}$ of method (5.15) be continuously differentiable functions of $\omega_{n+1}, \ldots, \omega_{n+k-1}$ in a neighbourhood of the set

$$\{(\omega_{n+1}, \ldots, \omega_{n+k-1})\,;\ \omega \le \omega_j \le \Omega\}$$

and assume that the method is stable for constant step sizes (i.e., for $\omega_j = 1$). Then the condition

$$\sum_{j=n}^{n+l} |h_{j+1}/h_j - 1| \le C \quad \text{for all } n \text{ and } l \ge 0, \tag{5.23}$$

together with $\omega \le h_{j+1}/h_j \le \Omega$, implies the stability condition (5.19).

Proof. As in the proof of Theorem 5.5 we denote by A the companion matrix of the constant step size formula and by T a suitable transformation such that $\|T^{-1}AT\| = 1$. The mean value theorem, applied to $\alpha_j(\omega_{n+1}, \ldots, \omega_{n+k-1}) - \alpha_j(1, \ldots, 1)$, implies that

$$\|T^{-1}A_nT - T^{-1}AT\| \le K \sum_{j=n+1}^{n+k-1} |\omega_j - 1|.$$

Hence

$$\|T^{-1}A_nT\| \le 1 + K \sum_{j=n+1}^{n+k-1} |\omega_j - 1| \le \exp\Bigl( K \sum_{j=n+1}^{n+k-1} |\omega_j - 1| \Bigr).$$

From this inequality we deduce that

$$\|A_{n+l} \cdots A_{n+1} A_n\| \le \|T\| \cdot \|T^{-1}\| \cdot \exp\bigl(K \cdot (k-1)\, C\bigr). \qquad □$$


Convergence 


Convergence for variable step size Adams methods was first studied by Piotrowski (1969). In order to prove convergence for the general case we introduce the vector $Y_n = (y_{n+k-1}, \ldots, y_{n+1}, y_n)^T$. In analogy to (4.8) the method (5.15) then becomes equivalent to

$$Y_{n+1} = (A_n \otimes I)\, Y_n + h_{n+k-1}\, \Phi_n(x_n, Y_n, h_n) \tag{5.24}$$

where $A_n$ is given by (5.18) and

$$\Phi_n(x_n, Y_n, h_n) = (e_1 \otimes I)\, \psi_n(x_n, Y_n, h_n).$$

The value $\psi = \psi_n(x_n, Y_n, h_n)$ is defined implicitly by

$$\psi = \sum_{j=0}^{k-1} \beta_{jn}\, f(x_{n+j}, y_{n+j}) + \beta_{kn}\, f\Bigl(x_{n+k},\ h_{n+k-1}\psi - \sum_{j=0}^{k-1} \alpha_{jn}\, y_{n+j}\Bigr).$$

Let us further denote by

$$Y(x_n) = (y(x_{n+k-1}), \ldots, y(x_{n+1}), y(x_n))^T$$

the exact values to be approximated by $Y_n$. The convergence theorem can now be formulated as follows:

Theorem 5.8. Assume that

a) the method (5.15) is stable, of order p, and has bounded coefficients $\alpha_{jn}$ and $\beta_{jn}$;

b) the starting values satisfy $\|Y(x_0) - Y_0\| = O(h_0^p)$;

c) the step size ratios are bounded ($h_n/h_{n-1} \le \Omega$).

Then the method is convergent of order p, i.e., for each differential equation $y' = f(x, y)$, $y(x_0) = y_0$ with f sufficiently differentiable the global error satisfies

$$\|y(x_n) - y_n\| \le C h^p \quad \text{for } x_n \le \hat x,$$

where $h = \max h_j$.
Proof. Since the method is of order p and the coefficients and step size ratios are bounded, formula (5.17) shows that the local error

$$\delta_{n+1} = Y(x_{n+1}) - (A_n \otimes I)\, Y(x_n) - h_{n+k-1}\, \Phi_n(x_n, Y(x_n), h_n) \tag{5.25}$$

satisfies

$$\delta_{n+1} = O(h_n^{p+1}). \tag{5.26}$$

Subtracting (5.24) from (5.25) we obtain

$$Y(x_{n+1}) - Y_{n+1} = (A_n \otimes I)(Y(x_n) - Y_n) + h_{n+k-1}\bigl(\Phi_n(x_n, Y(x_n), h_n) - \Phi_n(x_n, Y_n, h_n)\bigr) + \delta_{n+1}$$

and by induction it follows that

$$\begin{aligned} Y(x_{n+1}) - Y_{n+1} ={}& ((A_n \cdots A_0) \otimes I)(Y(x_0) - Y_0) \\ &+ \sum_{j=0}^{n} h_{j+k-1} ((A_n \cdots A_{j+1}) \otimes I)\bigl(\Phi_j(x_j, Y(x_j), h_j) - \Phi_j(x_j, Y_j, h_j)\bigr) \\ &+ \sum_{j=0}^{n} ((A_n \cdots A_{j+1}) \otimes I)\, \delta_{j+1}. \end{aligned}$$

As in the proof of Theorem 4.5 we deduce that the $\Phi_n$ satisfy a uniform Lipschitz condition with respect to $Y_n$. This, together with stability and (5.26), implies that

$$\|Y(x_{n+1}) - Y_{n+1}\| \le \sum_{j=0}^{n} h_{j+k-1}\, L\, \|Y(x_j) - Y_j\| + C_1 h^p.$$

In order to solve this inequality we introduce the sequence $\{\varepsilon_n\}$ defined by

$$\varepsilon_0 = \|Y(x_0) - Y_0\|, \qquad \varepsilon_{n+1} = \sum_{j=0}^{n} h_{j+k-1}\, L\, \varepsilon_j + C_1 h^p. \tag{5.27}$$

A simple induction argument shows that

$$\|Y(x_n) - Y_n\| \le \varepsilon_n. \tag{5.28}$$

From (5.27) we obtain for $n \ge 1$

$$\varepsilon_{n+1} = \varepsilon_n + h_{n+k-1} L\, \varepsilon_n \le \exp(h_{n+k-1} L)\, \varepsilon_n$$

so that also

$$\varepsilon_n \le \exp((\hat x - x_0)L)\, \varepsilon_1 = \exp((\hat x - x_0)L) \cdot \bigl(h_{k-1} L\, \|Y(x_0) - Y_0\| + C_1 h^p\bigr).$$

This inequality together with (5.28) completes the proof of Theorem 5.8. □
Exercises

1. Prove that for constant step sizes the expressions $g_j(n)$ and $\Phi_j^*(n)$ (formulas (5.3) and (5.6)) reduce to

$$g_j(n) = \gamma_j, \qquad \Phi_j^*(n) = \nabla^j f_n,$$

where $\gamma_j$ is given by (1.6).

2. (Grigorieff 1983). For the k-step BDF-methods consider grids with constant mesh ratio $\omega$, i.e., $h_n = \omega h_{n-1}$ for all n. In this case the elements of $A_n^*$ (see (5.21)) are independent of n. Show numerically that all eigenvalues of $A_n^*$ are of absolute value less than one for $0 < \omega < R_k$ where

    k       2       3       4       5       6
    R_k     2.414   1.618   1.280   1.127   1.044



III.6 Nordsieck Methods 


While [the method] is primarily designed to optimize the efficiency of large-scale calculations on automatic computers, its essential procedures also lend themselves well to hand computation.

(A. Nordsieck 1962) 

Two further problems must be dealt with in order to implement the 
automatic choice and revision of the elementary interval, namely, 
choosing which quantities to remember in such a way that the 
interval may be changed rapidly and conveniently ... 

(A. Nordsieck 1962) 


In an important paper Nordsieck (1962) considered a class of methods for ordinary differential equations which allow a convenient way of changing the step size (see Section III.7). He already remarked that his methods are equivalent to the implicit Adams methods, in a certain sense. Let us begin with his derivation of these methods and then investigate their relation to linear multistep methods.

Nordsieck (1962) remarked “... that all methods of numerical integration are equivalent to finding an approximating polynomial for y(x) ...”. His idea was to represent such a polynomial by the 0th to kth derivatives, i.e., by a vector (“the Nordsieck vector”)

$$z_n = \Bigl( y_n,\ h y_n',\ \frac{h^2}{2!} y_n'',\ \ldots,\ \frac{h^k}{k!} y_n^{(k)} \Bigr)^T. \tag{6.1}$$

The $y_n^{(j)}$ are meant to be approximations to $y^{(j)}(x_n)$, where y(x) is the exact solution of the differential equation

y' = f(x,y). (6.2) 


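In order to define the integration procedure we have to give a rule for determining $z_{n+1}$ when $z_n$ and the differential equation (6.2) are given. By Taylor's expansion, such a rule is (e.g., for k = 3)

$$\begin{aligned} y_{n+1} &= y_n + h y_n' + \frac{h^2}{2!} y_n'' + \frac{h^3}{3!} y_n''' + \frac{h^4}{4!}\, e \\ h y_{n+1}' &= h y_n' + 2\, \frac{h^2}{2!} y_n'' + 3\, \frac{h^3}{3!} y_n''' + 4\, \frac{h^4}{4!}\, e \\ \frac{h^2}{2!} y_{n+1}'' &= \frac{h^2}{2!} y_n'' + 3\, \frac{h^3}{3!} y_n''' + 6\, \frac{h^4}{4!}\, e \\ \frac{h^3}{3!} y_{n+1}''' &= \frac{h^3}{3!} y_n''' + 4\, \frac{h^4}{4!}\, e, \end{aligned} \tag{6.3}$$

where the value e is determined in such a way that

$$y_{n+1}' = f(x_{n+1}, y_{n+1}). \tag{6.4}$$

Inserting (6.4) into the second relation of (6.3) yields

$$4\, \frac{h^4}{4!}\, e = h\bigl(f(x_{n+1}, y_{n+1}) - f_n^p\bigr) \tag{6.5}$$

with

$$h f_n^p = h y_n' + 2\, \frac{h^2}{2!} y_n'' + 3\, \frac{h^3}{3!} y_n'''.$$

With this relation for e the above method becomes

$$\begin{aligned} y_{n+1} &= y_n + h y_n' + \frac{h^2}{2!} y_n'' + \frac{h^3}{3!} y_n''' + \frac{1}{4}\, h\bigl(f(x_{n+1}, y_{n+1}) - f_n^p\bigr) \\ h y_{n+1}' &= h y_n' + 2\, \frac{h^2}{2!} y_n'' + 3\, \frac{h^3}{3!} y_n''' + h\bigl(f(x_{n+1}, y_{n+1}) - f_n^p\bigr) \\ \frac{h^2}{2!} y_{n+1}'' &= \frac{h^2}{2!} y_n'' + 3\, \frac{h^3}{3!} y_n''' + \frac{3}{2}\, h\bigl(f(x_{n+1}, y_{n+1}) - f_n^p\bigr) \\ \frac{h^3}{3!} y_{n+1}''' &= \frac{h^3}{3!} y_n''' + h\bigl(f(x_{n+1}, y_{n+1}) - f_n^p\bigr). \end{aligned} \tag{6.6}$$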

The first equation constitutes an implicit formula for $y_{n+1}$, the others are explicit.

Observe that for sufficiently accurate approximations to $y^{(j)}(x_n)$ the value e (formula (6.5)) is an approximation to $y^{(4)}(x_n)$. This seems to be a desirable property from the point of view of accuracy. Unfortunately, method (6.6) is unstable. To see this, we put f(x, y) = 0 in (6.6). In this case the method becomes the linear transformation

$$z_{n+1} = M z_n \tag{6.7}$$

where

$$M = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1/4 \\ 1 \\ 3/2 \\ 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 2 & 3 \end{pmatrix}.$$

The eigenvalues of M are seen to be 1, 0, $-(2 + \sqrt{3})$ and $-1/(2 + \sqrt{3})$, implying that (6.6) is unstable and therefore of no use. The phenomenon that highly accurate methods are often unstable is, after our experiences in Section III.3, no longer astonishing.

To overcome this difficulty Nordsieck proposed to replace the constants 1/4, 1, 3/2, 1 which appear in front of the brackets in (6.6) by arbitrary values $(l_0, l_1, l_2, l_3)$, and to use this extra freedom to achieve stability. In compact form this modification can be written as

$$z_{n+1} = (P \otimes I)\, z_n + (l \otimes I)\bigl(h f(x_{n+1}, y_{n+1}) - (e_1^T P \otimes I)\, z_n\bigr). \tag{6.8}$$

Here $z_n$ is given by (6.1), P is the Pascal triangle matrix defined by

$$p_{ij} = \begin{cases} \dbinom{j}{i} & \text{for } 0 \le i \le j \le k, \\ 0 & \text{else,} \end{cases}$$

$l = (l_0, l_1, \ldots, l_k)^T$ and $e_1 = (0, 1, 0, \ldots, 0)^T$. Observe that the indices of vectors and matrices start from zero.
For notational simplicity in the following theorems, we consider from now on scalar differential equations only, so that method (6.8) becomes

$$z_{n+1} = P z_n + l\,(h f_{n+1} - e_1^T P z_n). \tag{6.8'}$$

All results, of course, remain valid for systems of equations. Condition (6.4), which relates the method to the differential equation, fixes the value of $l_1$ as

$$l_1 = 1. \tag{6.9}$$

The above stability analysis applied to the general method (6.8) leads to the difference equation (6.7) with

$$M = P - l\, e_1^T P. \tag{6.10}$$

For instance, for k = 3 this matrix is given by

$$M = \begin{pmatrix} 1 & 1 - l_0 & 1 - 2l_0 & 1 - 3l_0 \\ 0 & 0 & 0 & 0 \\ 0 & -l_2 & 1 - 2l_2 & 3 - 3l_2 \\ 0 & -l_3 & -2l_3 & 1 - 3l_3 \end{pmatrix}.$$

One observes that 1 and 0 are two eigenvalues of M and that its characteristic polynomial is independent of $l_0$. Nordsieck determined $l_2, \ldots, l_k$ in such a way that the remaining eigenvalues of M are zero. For k = 3 this yields $l_2 = 3/4$ and $l_3 = 1/6$. The coefficient $l_0$ can be chosen such that the error constant of the method (see Theorem 6.2 below) vanishes. In our situation one gets $l_0 = 3/8$, so that the resulting method is given by

$$l = (3/8,\ 1,\ 3/4,\ 1/6)^T.$$

It is interesting to note that this method is equivalent to the implicit 3-step Adams method. Indeed, an elimination of the terms $(h^3/3!)\,y_n'''$ and $(h^2/2!)\,y_n''$ by using formula (6.8) with reduced indices leads to (cf. formula (1.9''))

$$y_{n+1} = y_n + \frac{h}{24}\bigl(9 y_{n+1}' + 19 y_n' - 5 y_{n-1}' + y_{n-2}'\bigr). \tag{6.11}$$


Equivalence with Multistep Methods 


More insight into the connection between Nordsieck methods and multistep methods is due to Descloux (1963), Osborne (1966), and Skeel (1979). The following two theorems show that every Nordsieck method is equivalent to a multistep formula and that the order of this method is at least k.
Theorem 6.1. Consider the Nordsieck method (6.8) where $l_1 = 1$. The first two components of $z_n$ then satisfy the linear multistep formula (for $n \ge 0$)

$$\sum_{i=0}^{k} \alpha_i\, y_{n+i} = h \sum_{i=0}^{k} \beta_i\, f_{n+i} \tag{6.12}$$

where the generating polynomials are given by

$$\begin{aligned} \varrho(\zeta) &= \det(\zeta I - P) \cdot e_1^T (\zeta I - P)^{-1}\, l \\ \sigma(\zeta) &= \det(\zeta I - P) \cdot e_0^T (\zeta I - P)^{-1}\, l. \end{aligned} \tag{6.13}$$
Proof. The proof of the original papers simplifies considerably, if we work with the generating functions (discrete Laplace transformation)

$$Z(\zeta) = \sum_{n \ge 0} z_n \zeta^n, \qquad Y(\zeta) = \sum_{n \ge 0} y_n \zeta^n, \qquad F(\zeta) = \sum_{n \ge 0} f_n \zeta^n, \quad \ldots$$

Multiplying formula (6.8') by $\zeta^{n+1}$ and adding up we obtain

$$Z(\zeta) = \zeta P Z(\zeta) + l\,\bigl(hF(\zeta) - e_1^T P \zeta Z(\zeta)\bigr) + (z_0 - l\, h f_0). \tag{6.14}$$

Similarly, the linear multistep method (6.12) can be written as

$$\hat\varrho(\zeta)\, Y(\zeta) = h\, \hat\sigma(\zeta)\, F(\zeta) + p_{k-1}(\zeta), \tag{6.15}$$

where

$$\hat\varrho(\zeta) = \zeta^k \varrho(1/\zeta), \qquad \hat\sigma(\zeta) = \zeta^k \sigma(1/\zeta) \tag{6.16}$$

and $p_{k-1}$ is a polynomial of degree k - 1 depending on the starting values. In order to prove the theorem we have to show that the first two components of $Z(\zeta)$ satisfy a relation of the form (6.15). We first rewrite equation (6.14) in the form

$$Z(\zeta) = (I - \zeta P)^{-1}\, l\,\bigl(hF(\zeta) - e_1^T P \zeta Z(\zeta)\bigr) + (I - \zeta P)^{-1}(z_0 - l\, h f_0)$$

so that its first two components become

$$\begin{aligned} Y(\zeta) &= e_0^T (I - \zeta P)^{-1}\, l\,\bigl(hF(\zeta) - e_1^T P \zeta Z(\zeta)\bigr) + e_0^T (I - \zeta P)^{-1}(z_0 - l\, h f_0) \\ hF(\zeta) &= e_1^T (I - \zeta P)^{-1}\, l\,\bigl(hF(\zeta) - e_1^T P \zeta Z(\zeta)\bigr) + e_1^T (I - \zeta P)^{-1}(z_0 - l\, h f_0). \end{aligned}$$

Eliminating the term in brackets and multiplying by $\det(I - \zeta P)$ we arrive at formula (6.15) with

$$\begin{aligned} \hat\varrho(\zeta) &= \det(I - \zeta P) \cdot e_1^T (I - \zeta P)^{-1}\, l \\ \hat\sigma(\zeta) &= \det(I - \zeta P) \cdot e_0^T (I - \zeta P)^{-1}\, l \\ p_{k-1}(\zeta) &= \det(I - \zeta P)\bigl(e_1^T (I - \zeta P)^{-1} l\; e_0^T (I - \zeta P)^{-1} - e_0^T (I - \zeta P)^{-1} l\; e_1^T (I - \zeta P)^{-1}\bigr) z_0. \end{aligned} \tag{6.17}$$

With the help of (6.16) we immediately get formulas (6.13). Therefore, it remains to show that $p_{k-1}$, given by (6.17), is a polynomial of degree k - 1. Since the dimension of P is (k + 1), $p_{k-1}$ behaves like $\zeta^{k-1}$ for $|\zeta| \to \infty$. Finally, the relation (6.15) implies that the Laurent series of $p_{k-1}$ cannot contain negative powers. □


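Putting $(\zeta I - P)^{-1} l = u$ in (6.13) and applying Cramer's rule to the linear system $(\zeta I - P)u = l$ we obtain from (6.13) the elegant expressions

$$\varrho(\zeta) = \det \begin{pmatrix} \zeta - 1 & l_0 & -1 & \cdots & -1 \\ 0 & l_1 & -2 & \cdots & -k \\ 0 & l_2 & \zeta - 1 & \cdots & -\binom{k}{2} \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & l_k & 0 & \cdots & \zeta - 1 \end{pmatrix} \tag{6.13a}$$

$$\sigma(\zeta) = \det \begin{pmatrix} l_0 & -1 & -1 & \cdots & -1 \\ l_1 & \zeta - 1 & -2 & \cdots & -k \\ l_2 & 0 & \zeta - 1 & \cdots & -\binom{k}{2} \\ \vdots & \vdots & & \ddots & \vdots \\ l_k & 0 & 0 & \cdots & \zeta - 1 \end{pmatrix} \tag{6.13b}$$

We observe that $\varrho(\zeta)$ does not depend on $l_0$. Further, $\zeta = 1$ is a simple root of $\varrho(\zeta)$ if and only if $l_k \ne 0$. We have

$$\varrho'(1) = \sigma(1) = k!\, l_k. \tag{6.18}$$

Condition (6.9) is equivalent to $\alpha_k = 1$.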


Theorem 6.2. Assume that $l_k \ne 0$. The multistep method defined by (6.13) is of order at least k and its error constant (see (2.13)) is given by

$$C = -\frac{b^T l}{k!\, l_k}.$$

Here the components of

$$b^T = (B_0, B_1, \ldots, B_k) = \Bigl(1, -\frac{1}{2}, \frac{1}{6}, 0, -\frac{1}{30}, 0, \frac{1}{42}, \ldots\Bigr)$$

are the Bernoulli numbers.

Proof. By Theorem 2.4 we have order k iff

$$\varrho(\zeta) - \log\zeta \cdot \sigma(\zeta) = C_{k+1}(\zeta - 1)^{k+1} + O((\zeta - 1)^{k+2}).$$

Since $\det(\zeta I - P) = (\zeta - 1)^{k+1}$ this is equivalent to

$$e_1^T (\zeta I - P)^{-1} l - \log\zeta \cdot e_0^T (\zeta I - P)^{-1} l = C_{k+1} + O((\zeta - 1))$$

and, by (6.18), it suffices to show that

$$(\log\zeta \cdot e_0^T - e_1^T)(\zeta I - P)^{-1} = b^T + O((\zeta - 1)). \tag{6.19}$$

Denoting the left-hand side of (6.19) by $b^T(\zeta)$ we obtain

$$(\zeta I - P)^T b(\zeta) = (\log\zeta \cdot e_0 - e_1). \tag{6.20}$$

The q-th component ($q \ge 2$) of this equation

$$\zeta\, b_q(\zeta) - \sum_{j=0}^{q} \binom{q}{j} b_j(\zeta) = 0$$

is equivalent to

$$\zeta\, \frac{b_q(\zeta)}{q!} - \sum_{j=0}^{q} \frac{b_j(\zeta)}{j!} \cdot \frac{1}{(q-j)!} = 0,$$

which is seen to be a Cauchy product. Hence, formula (6.20) becomes

$$\zeta \sum_{q \ge 0} \frac{b_q(\zeta)}{q!}\, t^q - e^t \sum_{q \ge 0} \frac{b_q(\zeta)}{q!}\, t^q = \log\zeta - t$$

which yields

$$\sum_{q \ge 0} \frac{b_q(\zeta)}{q!}\, t^q = \frac{t - \log\zeta}{e^t - \zeta}.$$

If we set $\zeta = 1$ in this formula we obtain

$$\sum_{q \ge 0} \frac{b_q(1)}{q!}\, t^q = \frac{t}{e^t - 1},$$

therefore $b_q(1) = B_q$, the q-th Bernoulli number (see Abramowitz & Stegun, Chapter 23). □


We have thus shown that to each Nordsieck method (6.8) there corresponds a linear multistep method of order at least k. Our next aim is to establish a correspondence in the opposite direction.

Theorem 6.3. Let $(\varrho, \sigma)$ be the generating polynomials of a k-step method (6.12) of order at least k and assume $\alpha_k = 1$. Then we have:

a) There exists a unique vector l such that $\varrho$ and $\sigma$ are given by (6.13).

b) If in addition, the multistep method is irreducible, then there exists a non-singular transformation T such that the solution of (6.8') is related to that of (6.12) by

$$u_n = T z_n \tag{6.21}$$

where the j-th component of $u_n$ is given by

$$u_j^{(n)} = \begin{cases} \sum_{i=0}^{j} (\alpha_{k-j+i}\, y_{n+i} - h \beta_{k-j+i}\, f_{n+i}) & \text{for } 0 \le j \le k-1, \\ h f_n & \text{for } j = k. \end{cases} \tag{6.22}$$

Proof. a) For every k-th order multistep method the polynomial $\varrho(\zeta)$ is uniquely determined by $\sigma(\zeta)$ (see Theorem 2.4). Expanding the determinant in (6.13b) with respect to the first column we see that

$$\sigma(\zeta) = l_0 (\zeta - 1)^k + l_1 (\zeta - 1)^{k-1} r_1(\zeta) + \ldots + l_k r_k(\zeta),$$

where $r_j(\zeta)$ is a polynomial of degree j satisfying $r_j(1) \ne 0$. Hence, l can be computed from $\sigma(\zeta)$.

b) Let $y_0, \ldots, y_{k-1}$ and $f_0, \ldots, f_{k-1}$ be given. Then the polynomial $p_{k-1}(\zeta)$ in (6.15) satisfies

$$p_{k-1}(\zeta) = u_0^{(0)} + u_1^{(0)} \zeta + \ldots + u_{k-1}^{(0)} \zeta^{k-1}.$$

On the other hand, if the starting vector $z_0$ for the Nordsieck method defined by l of (a) is known, then $p_{k-1}(\zeta)$ is given by (6.17). Equating both expressions we obtain

$$\sum_{j=0}^{k-1} u_j^{(0)} \zeta^j = \bigl(\hat\varrho(\zeta)\, e_0^T - \hat\sigma(\zeta)\, e_1^T\bigr)(I - \zeta P)^{-1} z_0. \tag{6.23}$$

We now denote by $t_j^T$ $(j = 0, \ldots, k-1)$ the coefficients of the vector polynomial

$$\bigl(\hat\varrho(\zeta)\, e_0^T - \hat\sigma(\zeta)\, e_1^T\bigr)(I - \zeta P)^{-1} = \sum_{j=0}^{k-1} t_j^T \zeta^j \tag{6.24}$$

and set $t_k^T = e_1^T$. Then let T be the square matrix whose j-th row is $t_j^T$ so that $u_0 = T z_0$ is a consequence of (6.23) and $h f_n = h y_n'$. The same argument applied to $y_n, \ldots, y_{n+k-1}$ and $f_n, \ldots, f_{n+k-1}$ instead of $y_0, \ldots, y_{k-1}$ and $f_0, \ldots, f_{k-1}$ yields $u_n = T z_n$ for all n.

To complete the proof it remains to verify the non-singularity of T. Let $v = (v_0, v_1, \ldots, v_k)^T$ be a non-zero vector satisfying Tv = 0. By definition of $t_k^T$ we have $v_1 = 0$ and from (6.24) it follows (using the transformation (6.16)) that

$$\varrho(\zeta)\, \tau_0(\zeta) = \sigma(\zeta)\, \tau_1(\zeta), \tag{6.25}$$

where $\tau_i(\zeta) = \det(\zeta I - P)\, e_i^T (\zeta I - P)^{-1} v$ are polynomials of degree at most k. Moreover, Cramer's rule shows that the degree of $\tau_1(\zeta)$ is at most k - 1, since $v_1 = 0$. Hence from (6.25) at least one of the roots of $\varrho(\zeta)$ must be a root of $\sigma(\zeta)$. This is in contradiction with the assumption that the method is irreducible. □
Table 6.1. Coefficients $l_j$ of the $k$-step implicit Adams methods

         l_0            l_1    l_2        l_3      l_4      l_5      l_6
k = 1    1/2            1
k = 2    5/12           1      1/2
k = 3    3/8            1      3/4        1/6
k = 4    251/720        1      11/12      1/3      1/24
k = 5    95/288         1      25/24      35/72    5/48     1/120
k = 6    19087/60480    1      137/120    5/8      17/96    1/40     1/720

Table 6.2. Coefficients $l_j$ of the $k$-step BDF-methods

         l_0       l_1    l_2        l_3       l_4       l_5      l_6
k = 1    1         1
k = 2    2/3       1      1/3
k = 3    6/11      1      6/11       1/11
k = 4    12/25     1      7/10       1/5       1/50
k = 5    60/137    1      225/274    85/274    15/274    1/274
k = 6    20/49     1      58/63      5/12      25/252    1/84     1/1764

The vectors l which correspond to the implicit Adams methods and to the 
BDF-methods are given in Tables 6.1 and 6.2. For these two classes of methods we 
shall investigate the equivalence in some more detail. 


Implicit Adams Methods 


The following results are due to Byrne & Hindmarsh (1975). Since their “efficient
package” EPISODE and the successor VODE are based on the Nordsieck
representation of variable step size methods, we extend our considerations to this case.
The Adams methods define in a natural way a polynomial which approximates the
unknown solution of (6.2). Namely, if $y_n$ and $f_n, \ldots, f_{n-k+1}$ are given, then the
$k$-step Adams method is equivalent to the construction of a polynomial $p_{n+1}(x)$
of degree $k + 1$ which satisfies

$$\begin{aligned} p_{n+1}(x_n) &= y_n, \qquad p_{n+1}(x_{n+1}) = y_{n+1}, \\ p'_{n+1}(x_j) &= f_j \qquad \text{for } j = n-k+1, \ldots, n+1. \end{aligned} \tag{6.26}$$

Condition (6.26) defines $y_{n+1}$ implicitly. We observe that the difference of two
consecutive polynomials, $p_{n+1}(x) - p_n(x)$, vanishes at $x_n$ and that its derivative
is zero at $x_{n-k+1}, \ldots, x_n$. Therefore, if we let $e_{n+1} = y_{n+1} - p_n(x_{n+1})$, this
difference can be written as

$$p_{n+1}(x) - p_n(x) = \Lambda\Bigl(\frac{x - x_{n+1}}{x_{n+1} - x_n}\Bigr)\, e_{n+1} \tag{6.27}$$

where $\Lambda$ is the unique polynomial of degree $(k + 1)$ defined by

$$\Lambda(0) = 1, \qquad \Lambda(-1) = 0, \qquad \Lambda'\Bigl(\frac{x_j - x_{n+1}}{x_{n+1} - x_n}\Bigr) = 0 \quad \text{for } j = n-k+1, \ldots, n. \tag{6.28}$$

The derivative of (6.27) taken at $x = x_{n+1}$ shows that with $h_n = x_{n+1} - x_n$,

$$h_n f_{n+1} - h_n p'_n(x_{n+1}) = \Lambda'(0)\, e_{n+1}.$$

If we introduce the Nordsieck vector

$$\bar z_n = \Bigl(p_n(x_n),\ h_n p'_n(x_n),\ \ldots,\ \frac{h_n^{k+1}}{(k+1)!}\, p_n^{(k+1)}(x_n)\Bigr)^T$$

and the coefficients $\bar l_j$ by

$$\Lambda(t) = \sum_{j=0}^{k+1} \bar l_j t^j, \tag{6.29}$$

then (6.27) becomes equivalent to

$$\bar z_{n+1} = P \bar z_n + \bar l\, \bar l_1^{-1}\bigl(h_n f_{n+1} - e_1^T P \bar z_n\bigr) \tag{6.30}$$

with $\bar l = (\bar l_0, \bar l_1, \ldots, \bar l_{k+1})^T$. This method is of the form (6.8'). However, it is of
dimension $k + 2$ and not, as expected by Theorem 6.3, of dimension $k + 1$. The
reason is the following: let $\rho(\zeta)$ and $\sigma(\zeta)$ be the generating polynomials of the
multistep method which corresponds to (6.30). Then the conditions $\Lambda(-1) = 0$
and $\Lambda'(-1) = 0$ imply that $\sigma(0) = \rho(0) = 0$, so that this method is reducible.
Nevertheless, method (6.30) is useful, since the last component of $\bar z_n$ can be used
for step size control.

Remark. For $k \ge 2$ the coefficients $\bar l_j$, defined by (6.29), depend on the step size
ratios $h_j/h_{j-1}$ for $j = n-k+2, \ldots, n$. They can be computed from the formula

$$\Lambda(t) = \frac{\int_{-1}^{t} \prod_{j=1}^{k} (s - t_j)\, ds}{\int_{-1}^{0} \prod_{j=1}^{k} (s - t_j)\, ds} \tag{6.31}$$

where $t_j = (x_{n-j+1} - x_{n+1})/(x_{n+1} - x_n)$ (see also Exercise 1).
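For constant step sizes the nodes in (6.28) are $t_j = -j$, so $\Lambda'$ is a multiple of $\prod_{j=1}^{k}(t + j)$ and $\Lambda$ follows by integrating from $-1$ and scaling so that $\Lambda(0) = 1$. The sketch below carries this out in exact arithmetic and divides by the coefficient of $t$, as in (6.30). The helper names (`poly_mul`, `adams_lambda`, `normalised`) are ours; this is an illustration under the constant step size assumption, not code from EPISODE or VODE.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_eval(p, t):
    """Evaluate a coefficient list at t."""
    return sum(c * t**i for i, c in enumerate(p))

def adams_lambda(k):
    """Coefficients of Lambda(t) for constant step sizes (nodes t_j = -j)."""
    integrand = [Fraction(1)]
    for j in range(1, k + 1):
        integrand = poly_mul(integrand, [Fraction(j), Fraction(1)])  # factor (s + j)
    # antiderivative with zero constant term
    anti = [Fraction(0)] + [c / (i + 1) for i, c in enumerate(integrand)]
    anti[0] = -poly_eval(anti, Fraction(-1))   # enforce Lambda(-1) = 0
    scale = anti[0]                            # value at t = 0 before scaling
    return [c / scale for c in anti]           # enforce Lambda(0) = 1

def normalised(k):
    """Coefficients of Lambda divided by the coefficient of t."""
    lam = adams_lambda(k)
    return [c / lam[1] for c in lam]

print(normalised(2))
```

For $k = 2$ this gives $5/12, 1, 3/4, 1/6$: the first entry is $l_0$ of the row $k = 2$ of Table 6.1, and the remaining entries are $l_1, l_2, l_3$ of the row $k = 3$, which is the relation asked for in Exercise 1.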
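BDF-Methods


One step of the $k$-step BDF method consists in constructing a polynomial $q_{n+1}(x)$
of degree $k$ which satisfies

$$\begin{aligned} q_{n+1}(x_j) &= y_j \qquad \text{for } j = n-k+1, \ldots, n+1, \\ q'_{n+1}(x_{n+1}) &= f_{n+1} \end{aligned} \tag{6.32}$$

and in computing a value $y_{n+1}$ which makes this possible. As for the Adams
methods we have

$$q_{n+1}(x) - q_n(x) = \Lambda\Bigl(\frac{x - x_{n+1}}{x_{n+1} - x_n}\Bigr) \cdot \bigl(y_{n+1} - q_n(x_{n+1})\bigr), \tag{6.33}$$

where $\Lambda(t)$ is the polynomial of degree $k$ defined by

$$\Lambda\Bigl(\frac{x_j - x_{n+1}}{x_{n+1} - x_n}\Bigr) = 0 \quad \text{for } j = n-k+1, \ldots, n, \qquad \Lambda(0) = 1.$$

With the vector

$$\bar z_n = \Bigl(q_n(x_n),\ h_n q'_n(x_n),\ \ldots,\ \frac{h_n^{k}}{k!}\, q_n^{(k)}(x_n)\Bigr)^T$$

and the coefficients $\bar l_j$ given by

$$\Lambda(t) = \sum_{j=0}^{k} \bar l_j t^j,$$

equation (6.33) becomes

$$\bar z_{n+1} = P \bar z_n + \bar l\, \bar l_1^{-1}\bigl(h_n f_{n+1} - e_1^T P \bar z_n\bigr). \tag{6.34}$$

The vector $\bar l = (\bar l_0, \bar l_1, \ldots, \bar l_k)^T$ can be computed from the formula

$$\Lambda(t) = \prod_{j=1}^{k} \Bigl(1 - \frac{t}{t_j}\Bigr)$$

where $t_j = (x_{n-j+1} - x_{n+1})/(x_{n+1} - x_n)$. For constant step sizes formula (6.34)
corresponds to that of Theorem 6.3 and the coefficients $l_j = \bar l_j/\bar l_1$ coincide with
those of Table 6.2.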
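The agreement with Table 6.2 is easy to check by machine. For constant step sizes the zeros of $\Lambda$ are $t_j = -j$, so that $\Lambda(t) = \prod_{j=1}^{k}(1 + t/j)$; expanding the product and dividing by the coefficient of $t$ gives the tabulated values. A minimal Python sketch (the function name `bdf_nordsieck` is ours):

```python
from fractions import Fraction

def bdf_nordsieck(k):
    """Coefficients of Lambda(t) = prod_{j=1..k} (1 + t/j), divided by that of t."""
    lam = [Fraction(1)]
    for j in range(1, k + 1):
        # multiply the current polynomial by (1 + t/j)
        new = lam + [Fraction(0)]
        for i, c in enumerate(lam):
            new[i + 1] += c / j
        lam = new
    return [c / lam[1] for c in lam]

for k in range(1, 7):
    print(k, [str(c) for c in bdf_nordsieck(k)])
```

For $k = 5$ the third entry comes out as $225/274$, in line with the common denominator 274 of the other entries in that row.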
Exercises

1. Let $l_j^{(k)}$ ($j = 0, \ldots, k$) be the Nordsieck coefficients of the $k$-step implicit
Adams methods (defined by Theorem 6.3 and given in Table 6.1). Further,
denote by $\bar l_j^{(k)}$ ($j = 0, \ldots, k+1$) the coefficients given by (6.29) and (6.31)
for the case of constant step sizes. Show that

$$\frac{\bar l_j^{(k)}}{\bar l_1^{(k)}} = \begin{cases} l_0^{(k)} & \text{for } j = 0, \\ l_j^{(k+1)} & \text{for } j = 1, \ldots, k+1. \end{cases}$$

Use these relations to verify Table 6.1.

2. a) Calculate the matrix $T$ of Theorem 6.3 for the 3-step implicit Adams
method.

Result.

$$T^{-1} = \begin{pmatrix} 1 & 0 & 0 & 3/8 \\ 0 & 0 & 0 & 1 \\ 0 & 6 & 6 & 3/4 \\ 0 & 4 & 12 & 1/6 \end{pmatrix}.$$

Show that the Nordsieck vector $z_n$ is given by

$$z_n = \bigl(y_n,\ h f_n,\ (3hf_n - 4hf_{n-1} + hf_{n-2})/4,\ (hf_n - 2hf_{n-1} + hf_{n-2})/6\bigr)^T.$$

b) The vector $\bar z_n$ for the 2-step implicit Adams method (6.30) (constant step
sizes) also satisfies

$$\bar z_n = \bigl(y_n,\ h f_n,\ (3hf_n - 4hf_{n-1} + hf_{n-2})/4,\ (hf_n - 2hf_{n-1} + hf_{n-2})/6\bigr)^T,$$

but this time $y_n$ is a less accurate approximation to $y(x_n)$.




III.7 Implementation and Numerical Comparisons 


There is a great deal of freedom in the implementation of multistep methods (even 
if we restrict our considerations to the Adams methods). One can either directly 
use the variable step size methods of Section III.5 or one can take a fixed step size 
method and determine the necessary offgrid values, which are needed for a change 
of step size, by interpolation. Further, it is possible to choose between the divided 
difference formulation (5.7) and the Nordsieck representation (6.30). 

The historical approach was the use of formula (1.9) together with interpolation
(J.C. Adams (1883): “We may, of course, change the value of ω (the step size)
whenever the more or less rapid rate of diminution of the successive differences
shews that it is expedient to increase or diminish the interval. It is only necessary,
by selection from or interpolation between the values already calculated, to
find the coordinates for a few values of p separated from each other by the newly
chosen interval.”). It is theoretically more satisfactory and more elegant to work
with the variable step size method (5.7). For both of these approaches the change
of step size is rather expensive whereas the change of order is very simple: one
just has to add a further term to the expansion (1.9). If the Nordsieck
representation (6.30) is implemented, the situation is the opposite. There, the change of
order is not as direct as above, but the step size can be changed simply by multiplying
the Nordsieck-vector (6.1) by the diagonal matrix with entries $(1, \omega, \omega^2, \ldots)$
where $\omega = h_{\mathrm{new}}/h_{\mathrm{old}}$ is the step size ratio. Indeed, this was the main reason for
introducing this representation.
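The rescaling is a one-liner once the Nordsieck vector is stored as $(y, hy', \tfrac{h^2}{2!}y'', \ldots)$: the $j$th component carries a factor $h^j$, so replacing $h$ by $\omega h$ multiplies it by $\omega^j$. The following Python sketch illustrates this on exact derivative data of $y(x) = x^3$; the helper names are ours and no error control is attempted.

```python
from math import factorial

def nordsieck_vector(derivs, h):
    """Nordsieck vector (y, h y', h^2/2! y'', ...) built from derivative values."""
    return [h**j / factorial(j) * d for j, d in enumerate(derivs)]

def rescale(z, omega):
    """Multiply z by diag(1, omega, omega^2, ...) to change h into omega * h."""
    return [omega**j * zj for j, zj in enumerate(z)]

# y(x) = x^3 at x = 2: the values of y, y', y'', y''' are 8, 12, 12, 6
derivs = [8.0, 12.0, 12.0, 6.0]
z_old = nordsieck_vector(derivs, 0.1)     # step size h = 0.1
z_new = rescale(z_old, 0.5)               # halve the step size
print(z_new)
print(nordsieck_vector(derivs, 0.05))     # same vector built directly with h = 0.05
```

Both printed vectors coincide up to rounding, which is why no interpolation is needed for a step size change in this representation.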


Step Size and Order Selection 


Much was made of the starting of multistep computations and the need for Runge-
Kutta methods in the literature of the 1960s (see e.g., Ralston 1962). Nowadays,
codes for multistep methods simply start with order one and very small step sizes 
and are therefore self-starting. The following step size and order selection is closely 
related to the description of Shampine & Gordon (1975). 

Suppose that the numerical integration has proceeded successfully until $x_n$
and that a further step with step size $h_n$ and order $k + 1$ is taken, which yields the
approximation $y_{n+1}$ to $y(x_{n+1})$. To decide whether $y_{n+1}$ will be accepted or not,
we need an estimate of the local truncation error. Such an estimate is e.g. given by

$$le_{k+1}(n+1) = y^*_{n+1} - y_{n+1}$$

where $y^*_{n+1}$ is the result of the $(k+2)$nd order implicit Adams formula. Subtracting
formula (5.7) from the same formula with $k$ replaced by $k + 1$, we obtain

$$le_{k+1}(n+1) = h_n\bigl(g_{k+1}(n) - g_k(n)\bigr)\Phi_{k+1}(n+1). \tag{7.1}$$

Without changing the leading term in this expression we can replace the expression
$\Phi_{k+1}(n+1)$ by

$$\Phi^p_{k+1}(n+1) = \prod_{i=0}^{k} (x_{n+1} - x_{n-i})\ \delta^{k+1} f^p[x_{n+1}, x_n, \ldots, x_{n-k}]. \tag{7.2}$$

The superscript $p$ of $f$ indicates that $f_{n+1} = f(x_{n+1}, y_{n+1})$ is replaced by
$f(x_{n+1}, p_{n+1})$ when forming the divided differences. If the implicit equation (5.7)
is solved iteratively with $p_{n+1}$ as predictor, then $\Phi^p_{k+1}(n+1)$ has to be calculated
anyway. Therefore, the only cost for computing the estimate

$$LE_{k+1}(n+1) = h_n\bigl(g_{k+1}(n) - g_k(n)\bigr)\Phi^p_{k+1}(n+1) \tag{7.3}$$

is the computation of $g_{k+1}(n)$. After the expression (7.3) has been calculated, we
require (in the norm (4.11) of Section II.4)

$$\|LE_{k+1}(n+1)\| \le 1 \tag{7.4}$$

for the step to be successful.

If the Nordsieck representation (6.30) is considered instead of (5.7), then the
estimate of the local error is not as simple, since the $l$-vectors in (6.30) are totally
different for different orders. For a possible error-estimate we refer to the article of
Byrne & Hindmarsh (1975).

Suppose now that $y_{n+1}$ is accepted. We next have to choose a new step size
and a new order. The idea of the step size selection is to find the largest $h_{n+1}$ for
which the predicted local error is acceptable, i.e., for which

$$h_{n+1} \cdot |g_{k+1}(n+1) - g_k(n+1)| \cdot \|\Phi_{k+1}(n+2)\| \le 1.$$

However, this procedure is of no practical use, since the expressions $g_j(n+1)$ and
$\Phi_{k+1}(n+2)$ depend in a complicated manner on the unknown step size $h_{n+1}$.
Also, the coefficients $g_{k+1}(n+1)$ and $g_k(n+1)$ are too expensive to calculate.
To overcome this difficulty we assume the grid to be equidistant (this is a doubtful
assumption, but leads to a simple formula for the new step size). In this case the
local error (for the method of order $k + 1$) is of the form $C(x_{n+2})h^{k+2} + O(h^{k+3})$
with $C$ depending smoothly on $x$. The local error at $x_{n+2}$ can thus be
approximated by that at $x_{n+1}$ and in the same way as for one-step methods (cf. Section II.4
formula (4.12)) we obtain

$$h^{(k+1)}_{\mathrm{opt}} = h_n \cdot \Bigl(\frac{1}{\|LE_{k+1}(n+1)\|}\Bigr)^{1/(k+2)} \tag{7.5}$$

as optimal step size. The local error $LE_{k+1}(n+1)$ is given by (7.3) or, again under
the assumption of an equidistant grid, by

$$LE_{k+1}(n+1) = h_n \gamma^*_{k+1} \Phi^p_{k+1}(n+1) \tag{7.6}$$

with $\gamma^*_{k+1}$ from Table 1.2 (see Exercise 1 of Section III.5 and Exercise 4 of
Section III.1).
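The acceptance test (7.4) and the step size proposal (7.5), i.e. $h_{\mathrm{opt}} = h_n\,(1/\|LE_{k+1}(n+1)\|)^{1/(k+2)}$, translate into a few lines of code. The sketch below is only an illustration of these two formulas: the function names are ours, the error norm is assumed to be already scaled by the tolerance as in (4.11), and the safety factors and step ratio bounds that production codes add are deliberately left out.

```python
def step_accepted(le_norm):
    """Acceptance test (7.4): the scaled error estimate must not exceed 1."""
    return le_norm <= 1.0

def optimal_step(h_n, le_norm, k):
    """Step size (7.5) for a method of order k + 1.

    le_norm is the scaled norm of the local error estimate LE_{k+1}(n+1).
    """
    return h_n * (1.0 / le_norm) ** (1.0 / (k + 2))

# Example: order 3 (k = 2), last step h_n = 0.1, scaled error estimate 16
print(step_accepted(16.0))           # the step would be rejected
print(optimal_step(0.1, 16.0, 2))    # proposal: 0.1 * (1/16)**(1/4)
```

With an error estimate 16 times too large and exponent $1/4$, the proposal is half of the previous step size; an estimate below 1 would lead to an increase.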

We next describe how an optimal order can be determined. Since the number
of necessary function evaluations is the same for all orders, there are essentially
two strategies for selecting the new order. One can choose the order $k + 1$ either
such that the local error estimate is minimal, or such that the new optimal step size
is maximal. Because of the exponent $1/(k + 2)$ in formula (7.5), the two strategies
are not always equivalent. For more details see the description of the code DEABM
below. It should be mentioned that each implementation of the Adams methods
(and there are many) contains refinements of the above description and has in
addition several ad-hoc devices. One of them is to keep the step size constant if
$h_{\mathrm{new}}/h_{\mathrm{old}}$ is near to 1. In this way the computation of the coefficients $g_j(n)$ is
simplified.


Some Available Codes 

We have chosen the three codes DEABM, VODE and LSODE to illustrate the 
order- and step size strategies for multistep methods. 

DEABM is a modification of the code DE/STEP/INTRP described in the book 
of Shampine & Gordon (1975). It belongs to the package DEPAC, designed by 
Shampine & Watts (1979). Our numerical tests use the revised version from
February 1984. For European users it is available from the “Rechenzentrum der RWTH
Aachen, Seffenter Weg 23, D-5100 Aachen, Germany”. 

This code implements the variable step size, divided difference representation
(5.7) of the Adams formulas. In order to solve the nonlinear equation (5.7) for
$y_{n+1}$ the value $p_{n+1}$ is taken as predictor (P), then $f^p_{n+1} = f(x_{n+1}, p_{n+1})$ is
calculated (E) and one corrector iteration (C) is performed, to obtain $y_{n+1}$. Finally,
in the case of a successful step, $f_{n+1} = f(x_{n+1}, y_{n+1})$ is evaluated (E) for the
next step. This PECE implementation needs two function evaluations for each
successful step. Let us also outline the order strategy of this code: after performing a
step with order $k + 1$, one computes $LE_{k-1}(n+1)$, $LE_k(n+1)$ and $LE_{k+1}(n+1)$
using a slight modification of (7.6). Then the order is reduced by one, if

$$\max\bigl(\|LE_{k-1}(n+1)\|,\ \|LE_k(n+1)\|\bigr) \le \|LE_{k+1}(n+1)\|. \tag{7.7}$$

An increase in the order is considered only if the step is successful, (7.7) is violated
and a constant step size is used. In this case one computes the estimate

$$LE_{k+2}(n+1) = h_n \gamma^*_{k+2} \Phi_{k+2}(n+1)$$

using the new value $f_{n+1} = f(x_{n+1}, y_{n+1})$ and increases the order by one if

$$\|LE_{k+2}(n+1)\| < \|LE_{k+1}(n+1)\|.$$

In Fig. 7.1 we demonstrate the variation of the step size and order on the example
of Section II.4 (see Fig. 4.1 and also Fig. 9.5 of Section II.9). We plot the solution
obtained with Rtol = Atol = $10^{-3}$, the step size and order for the tolerances $10^{-3}$
and $10^{-8}$. We observe that the step size, and not the order, drops significantly
at passages where the solution varies more rapidly. Furthermore, constant
step sizes are taken over long intervals, and the order is changed rather often
(especially for Tol = $10^{-8}$). This is in agreement with the observation of Shampine &
Gordon (1975): “... small reductions in the estimated error may cause the order
to fluctuate, which in turn helps the code continue with constant step size.”

VODE with parameter MF =10 is an implementation of the variable-coefficient 
Adams method in Nordsieck form (6.30). It is due to Brown, Byrne & Hindmarsh 
(1989) and supersedes the older code EPISODE of Byrne & Hindmarsh (1975). 
The authors recommend their code “for problems with widely different active time 
scales”. We used the version of August 31, 1992. It can be obtained by sending an 
electronic mail to “netlib @ research.att.com” with the message 

send vode.f from ode to obtain double precision VODE, 

send svode.f from ode to obtain single precision VODE. 

The code VODE differs in several respects from DEABM. The nonlinear equation
(first component of (6.30)) is solved by fixed-point iteration until convergence.
No final $f$-evaluation is performed. This method can thus be interpreted as a
P(EC)$^M$-method, where $M$, the number of iterations, may be different from step
to step. E.g., in the example of Fig. 7.2 (Tol = $10^{-8}$) only 930 function evaluations
are needed for 535 steps (519 accepted and 16 rejected). This shows that for many
steps one iteration is sufficient. The order selection in VODE is based on maximizing
the step size among $h^{(k)}_{\mathrm{opt}}$, $h^{(k+1)}_{\mathrm{opt}}$, $h^{(k+2)}_{\mathrm{opt}}$. Fig. 7.2 presents the step size and
order variation for VODE for the same example as above: compared to DEABM
we observe that much lower orders are taken. Further, the order is constant over
long intervals. This is reasonable, since a change in the order is not natural for the
Nordsieck representation.

LSODE (with parameter MF = 10) is another implementation of the Adams
methods. This is a successor of the code GEAR (Hindmarsh 1972), which is itself a
revised and improved code based on DIFSUB of Gear (1971). We used the version 
of March 30, 1987. LSODE is based on the Nordsieck representation of the fixed 
step size Adams formulas. It has the same interface as VODE and can be obtained 
by sending an electronic mail to “netlib @ research.att.com” with the message 
send lsode.f from odepack 

to obtain the double precision version. Fig. 7.3 shows the step sizes and orders 
chosen by this code. It behaves similarly to VODE. 
Numerical Comparisons

Of the three families of methods, the fixed order Runge-Kutta is 
the simplest, in several respects the best understood, and the least 
efficient. (Shampine & Gordon 1975) 

It is, of course, interesting to study the numerical performance of the above imple¬ 
mentations of the Adams methods: 

DEABM — symbol K 
VODE — symbol O 
LSODE — symbol A 

In order to compare the results with those of a typical one-step Runge-Kutta method 
we include the results of the code 

DOP853 — symbol 0 
described in Section II.5. 

With all these methods we have computed the numerical solution for the
six problems EULR, AREN, LRNZ, PLEI, ROPE, BRUS of Section II.10 using
many different tolerances between $10^{-3}$ and $10^{-14}$ (the “integer” tolerances
$10^{-3}, 10^{-4}, \ldots$ are distinguished by enlarged symbols). Fig. 7.4 gives the number
of function evaluations plotted against the achieved accuracy in double logarithmic
scale. Some general tendencies can be distinguished in the crowds of numerical
results. LSODE and DEABM usually require, for equal obtained accuracy,
fewer function evaluations, with DEABM becoming champion for higher precision
(Tol $\le 10^{-6}$).

The situation changes dramatically in favour of the Runge-Kutta code DOP853 
if computing time is measured instead of function evaluations (see Fig. 7.5; the CPU 
time is that of a Sun Workstation, SunBlade 100). We observe that for problems 
with cheap function evaluations (EULR, AREN, LRNZ) the Runge-Kutta code 
needs much less CPU time than the multistep codes, although more function evalu¬ 
ations are necessary in general. For the problems PLEI and ROPE, where the right 
hand side is rather expensive to evaluate, the discrepancy is not as large. For the 
last problem (BRUS) the dimension is very high, but the individual components are 
not too complicated. In this situation, the CPU time of DOP853 is also significantly 
less than for the multistep codes; this indicates that their overhead also increases 
with the dimension of the problem. 
Fig. 7.4. Precision versus function calls for the problems of Section II.10

Fig. 7.5. Precision versus computing time for the problems of Section II.10




III.8 General Linear Methods


... methods sufficiently general as to include linear multistep and
Runge-Kutta methods as special cases ...

(K. Burrage & J.C. Butcher 1980) 


In a remarkably short period (1964-1966) many independent papers appeared
which tried to generalize either Runge-Kutta methods in the direction of multistep
or multistep methods in the direction of Runge-Kutta. The motivation was
either to make the advantages of multistep accessible to Runge-Kutta methods or
to “break the Dahlquist barrier” by modifying the multistep formulas.
“Generalized multistep methods” were introduced by Gragg and Stetter (1964), “modified
multistep methods” by Butcher (1965a), and in the same year there appeared the
work of Gear (1965) on “hybrid methods”. A year later Byrne and Lambert (1966)
published their work on “pseudo Runge-Kutta methods”. All these methods fall
into the class of “general linear methods” to be discussed in this section.

An example of such a method is the following (Butcher (1965a), order 5)

$$\begin{aligned} y_{n+1/2} &= y_{n-1} + \frac{h}{8}\bigl(9 f_n + 3 f_{n-1}\bigr) \\ \hat y_{n+1} &= \frac{1}{5}\bigl(28 y_n - 23 y_{n-1}\bigr) + \frac{h}{15}\bigl(32 f_{n+1/2} - 60 f_n - 26 f_{n-1}\bigr) \\ y_{n+1} &= \frac{1}{31}\bigl(32 y_n - y_{n-1}\bigr) + \frac{h}{93}\bigl(64 f_{n+1/2} + 15 \hat f_{n+1} + 12 f_n - f_{n-1}\bigr). \end{aligned} \tag{8.1}$$

We now have the choice of developing a theory of “generalized” multistep methods
or of developing a theory of “generalized” Runge-Kutta methods. After having
seen in Section III.4 that the convergence theory becomes much nicer when multistep
methods are interpreted as one-step methods in higher dimension, we choose
the second possibility: since formula (8.1) uses $y_n$ and $y_{n-1}$ as previous
information, we introduce the vector $u_n = (y_n, y_{n-1})^T$ so that the last line of (8.1)
becomes

$$\begin{pmatrix} y_{n+1} \\ y_n \end{pmatrix} = \begin{pmatrix} \frac{32}{31} & -\frac{1}{31} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} y_n \\ y_{n-1} \end{pmatrix} + \begin{pmatrix} \frac{64}{93} & \frac{15}{93} & \frac{12}{93} & -\frac{1}{93} \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} h f(y_{n+1/2}) \\ h f(\hat y_{n+1}) \\ h f(y_n) \\ h f(y_{n-1}) \end{pmatrix}$$

which, together with lines 1 and 2 of (8.1), is of the form

$$u_{n+1} = S u_n + h\Phi(x_n, u_n, h). \tag{8.2}$$

Properties of such general methods have been investigated by Butcher (1966),
Hairer & Wanner (1973), Skeel (1976), Cooper (1978), Albrecht (1978, 1985) and
others. Clearly, nothing prevents us from letting $S$ and $\Phi$ be arbitrary, or from
allowing also other interpretations of $u_n$.


A General Integration Procedure 


We consider the system

$$y' = f(x, y), \qquad y(x_0) = y_0 \tag{8.3}$$

where $f$ satisfies the regularity condition (4.2). Let $m$ be the dimension of the
differential equation (8.3), $q \ge m$ be the dimension of the difference equation (8.2)
and $x_n = x_0 + nh$ be the subdivision points of an equidistant grid. The methods
under consideration consist of three parts:

i) a forward step procedure, i.e., a formula (8.2), where the square matrix $S$ is
independent of (8.3).

ii) a correct value function $z(x, h)$, which gives an interpretation of the values
$u_n$: $z_n = z(x_n, h)$ is to be approximated by $u_n$, so that the global error is
given by $u_n - z_n$. It is assumed that the exact solution $y(x)$ of (8.3) can be
recovered from $z(x, h)$.

iii) a starting procedure $\varphi(h)$, which specifies the starting value $u_0 = \varphi(h)$.
$\varphi(h)$ approximates $z_0 = z(x_0, h)$.

The discrete problem corresponding to (8.3) is thus given by

$$u_0 = \varphi(h), \tag{8.4a}$$

$$u_{n+1} = S u_n + h\Phi(x_n, u_n, h), \qquad n = 0, 1, 2, \ldots, \tag{8.4b}$$

which yields the numerical solution $u_0, u_1, u_2, \ldots$. We remark that the increment
function $\Phi(x, u, h)$, the starting procedure $\varphi(h)$ and the correct value function
$z(x, h)$ depend on the differential equation (8.3), although this is not stated
explicitly.

Example 8.1. The most simple cases are one-step methods. A characteristic
feature of these is that the dimensions of the differential and difference equation are
equal (i.e., $m = q$) and that $S$ is the identity matrix. Furthermore, $\varphi(h) = y_0$ and
$z(x, h) = y(x)$. They have been investigated in Chapter II.

Example 8.2. We have seen in Section III.4 that linear multistep methods also fall
into the class (8.4). For $k$-step methods the dimension of the difference equation
is $q = km$ and the forward step procedure is given by formula (4.8). A starting
procedure yields the vector $\varphi(h) = (y_{k-1}, \ldots, y_1, y_0)^T$ and, finally, the correct
value function is given by

$$z(x, h) = \bigl(y(x + (k-1)h), \ldots, y(x + h), y(x)\bigr)^T.$$
The most common way of implementing an implicit multistep method is a
predictor-corrector process (compare (1.11) and Section III.7): an approximation
$y^{(0)}_{n+k}$ to $y_{n+k}$ is “predicted” by an explicit multistep method, say

$$\alpha^p_k y^{(0)}_{n+k} + \alpha^p_{k-1} y_{n+k-1} + \ldots + \alpha^p_0 y_n = h\bigl(\beta^p_{k-1} f_{n+k-1} + \ldots + \beta^p_0 f_n\bigr) \tag{8.5;P}$$

and is then “corrected” (usually once or twice) by

$$f^{(l-1)}_{n+k} := f\bigl(x_{n+k}, y^{(l-1)}_{n+k}\bigr) \tag{8.5;E}$$

$$\alpha_k y^{(l)}_{n+k} + \alpha_{k-1} y_{n+k-1} + \ldots + \alpha_0 y_n = h\bigl(\beta_k f^{(l-1)}_{n+k} + \beta_{k-1} f_{n+k-1} + \ldots + \beta_0 f_n\bigr). \tag{8.5;C}$$

If the iteration (8.5) is carried out until convergence, the process is identical to that
of Example 8.2. In practice, however, only a fixed number, say $M$, of iterations
are carried out and the method is theoretically no longer a “pure” multistep method.
We distinguish two predictor-corrector (PC) methods, depending on whether it ends
with a correction (8.5;C) or not. The first algorithm is symbolized as P(EC)$^M$ and
the second possibility, where $f_{n+k}$ is once more updated by (8.5;E) for further use
in the subsequent steps, as P(EC)$^M$E. We shall now see how these two procedures
can be interpreted as methods of type (8.4).

Example 8.2a. P(EC)$^M$E-methods. The starting procedure and the correct value
function are the same as for multistep methods and also $q = km$. Furthermore we
have $S = A \otimes I$, where $A$ is given by (4.7) and $I$ is the $m$-dimensional identity
matrix. Observe that $S$ depends only on the corrector-formula and not on the
predictor-formula. Here, the increment function is given by

$$\Phi(x, u, h) = (e_1 \otimes I)\, \psi(x, u, h)$$

with $e_1 = (1, 0, \ldots, 0)^T$. For $u = (u^1, \ldots, u^k)^T$ with $u^j \in \mathbb{R}^m$ the function
$\psi(x, u, h)$ is defined by

$$\psi(x, u, h) = \alpha_k^{-1}\Bigl(\beta_k f(x + kh, y^{(M)}) + \beta_{k-1} f(x + (k-1)h, u^1) + \ldots + \beta_0 f(x, u^k)\Bigr)$$

where the value $y^{(M)}$ is calculated from

$$\alpha^p_k y^{(0)} + \alpha^p_{k-1} u^1 + \ldots + \alpha^p_0 u^k = h\Bigl(\beta^p_{k-1} f(x + (k-1)h, u^1) + \ldots + \beta^p_0 f(x, u^k)\Bigr)$$

$$\alpha_k y^{(l)} + \alpha_{k-1} u^1 + \ldots + \alpha_0 u^k = h\Bigl(\beta_k f(x + kh, y^{(l-1)}) + \beta_{k-1} f(x + (k-1)h, u^1) + \ldots + \beta_0 f(x, u^k)\Bigr)$$

(for $l = 1, \ldots, M$).
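To make the two variants concrete, here is a minimal Python sketch of a 2-step predictor-corrector scheme. The choice of formulas is ours and only serves as an illustration: the predictor is the 2-step explicit Adams formula $y_{n+2} = y_{n+1} + h(\tfrac32 f_{n+1} - \tfrac12 f_n)$ and the corrector the 2-step implicit Adams formula $y_{n+2} = y_{n+1} + \tfrac{h}{12}(5 f_{n+2} + 8 f_{n+1} - f_n)$. The flag `final_eval` switches between P(EC)$^M$E (the stored $f$-value is recomputed at the final $y$) and P(EC)$^M$ (the stored $f$-value is the one from the last E inside the loop).

```python
import math

def pec_solve(f, x0, y0, y1, h, n_steps, M=1, final_eval=True):
    """Integrate y' = f(x, y) with a 2-step P(EC)^M E or P(EC)^M scheme (M >= 1).

    y0 and y1 are starting values at x0 and x0 + h.
    """
    xs = [x0, x0 + h]
    ys = [y0, y1]
    fs = [f(x0, y0), f(x0 + h, y1)]
    for _ in range(n_steps):
        x_new = xs[-1] + h
        # P: explicit Adams prediction
        y = ys[-1] + h * (1.5 * fs[-1] - 0.5 * fs[-2])
        for _ in range(M):
            f_new = f(x_new, y)                                      # E
            y = ys[-1] + h * (5 * f_new + 8 * fs[-1] - fs[-2]) / 12  # C
        if final_eval:
            f_new = f(x_new, y)                                      # final E
        xs.append(x_new)
        ys.append(y)
        fs.append(f_new)
    return xs, ys

# Test problem y' = -y, y(0) = 1, with an exact second starting value
h = 0.01
xs, ys = pec_solve(lambda x, y: -y, 0.0, 1.0, math.exp(-h), h, 99)
print(xs[-1], ys[-1], math.exp(-xs[-1]))
```

In the P(EC)$^M$ variant the stored value `fs[-1]` is not $f$ at the accepted $y$, which is exactly why Example 8.2b has to carry the $hf$-values as additional components of $u_n$.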






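Example 8.2b. For P(EC)$^M$-methods, the formulation as a method of type (8.4)
becomes more complicated, since the information to be carried over to the next
step is determined not only by $y_{n+k-1}, \ldots, y_n$, but also depends on the values
$hf_{n+k-1}, \ldots, hf_n$, where $hf_{n+j} = hf(x_{n+j}, y^{(M-1)}_{n+j})$. Therefore the dimension
of the difference equation becomes $q = 2km$. A usual starting procedure (as for
multistep methods) yields

$$\varphi(h) = \bigl(y_{k-1}, \ldots, y_0,\ hf(x_{k-1}, y_{k-1}), \ldots, hf(x_0, y_0)\bigr)^T.$$

If we define the correct value function by

$$z(x, h) = \bigl(y(x + (k-1)h), \ldots, y(x),\ hy'(x + (k-1)h), \ldots, hy'(x)\bigr)^T,$$

the forward step procedure is given by

$$S = \begin{pmatrix} A & B \\ 0 & N \end{pmatrix} \otimes I, \qquad \Phi(x, u, h) = \Bigl(\begin{pmatrix} \beta'_k e_1 \\ e_1 \end{pmatrix} \otimes I\Bigr)\, \psi(x, u, h).$$

Here $A$ is the matrix given by (4.7), $\beta'_j = \beta_j/\alpha_k$ and

$$N = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} \beta'_{k-1} & \cdots & \beta'_0 \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}, \qquad e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

For $u = (u^1, \ldots, u^k, hv^1, \ldots, hv^k)$ the function $\psi(x, u, h) \in \mathbb{R}^m$ is defined by

$$\psi(x, u, h) = f(x + kh, y^{(M-1)})$$

where $y^{(M-1)}$ is given by

$$\alpha^p_k y^{(0)} + \alpha^p_{k-1} u^1 + \ldots + \alpha^p_0 u^k = h\bigl(\beta^p_{k-1} v^1 + \ldots + \beta^p_0 v^k\bigr)$$

$$\alpha_k y^{(l)} + \alpha_{k-1} u^1 + \ldots + \alpha_0 u^k = h\bigl(\beta_k f(x + kh, y^{(l-1)}) + \beta_{k-1} v^1 + \ldots + \beta_0 v^k\bigr).$$

Again we observe that $S$ depends only on the corrector-formula.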

Example 8.3. Nordsieck methods are also of the form (8.4). This follows
immediately from the representation (6.8). In this case the correct value function

$$z(x, h) = \Bigl(y(x),\ hy'(x),\ \frac{h^2}{2!}\, y''(x),\ \ldots,\ \frac{h^k}{k!}\, y^{(k)}(x)\Bigr)^T$$

is composed not only of values of the exact solution, but also contains their
derivatives.

Example 8.4. Cyclic multistep methods. Donelson & Hansen (1971) have
investigated the possibility of basing a discretization scheme on several different $k$-step
methods which are used cyclically. Let $S_j$ and $\Phi_j$ represent the forward step
procedure of the $j$th multistep method; then the numerical solution $u_0, u_1, \ldots$ is
defined by

$$\begin{aligned} u_0 &= \varphi(h) \\ u_{n+1} &= S_j u_n + h\Phi_j(x_n, u_n, h) \qquad \text{if } n \equiv (j - 1) \bmod m. \end{aligned}$$

In order to get a method (8.4) with $S$ independent of the step number, we consider
one cycle of the method as one step of a new method

$$u^*_{n+1} = S u^*_n + h^*\Phi(x^*_n, u^*_n, h^*)$$

with step size $h^* = mh$. Here $x^*_n = x_0 + nh^*$, $S = S_m \cdots S_2 S_1$ and $\Phi$ has to be
chosen suitably. E.g., in the case $m = 2$ we have

$$\Phi(x^*, u^*, h^*) = \frac12 S_2 \Phi_1\Bigl(x^*, u^*, \frac{h^*}{2}\Bigr) + \frac12 \Phi_2\Bigl(x^* + \frac{h^*}{2},\ S_1 u^* + \frac{h^*}{2}\Phi_1\Bigl(x^*, u^*, \frac{h^*}{2}\Bigr),\ \frac{h^*}{2}\Bigr).$$

It is interesting to note that cyclically used $k$-step methods can lead to convergent
methods of order $2k - 1$ (or even $2k$). The “first Dahlquist barrier” (Theorem 3.5)
can be broken in this way. For more details see Stetter (1973), Albrecht (1979) and
Exercise 2.


Example 8.5. General linear methods. 

Following the advice of Aristotle ... (the original Greek can be 
found in Butcher’s paper) ... we look for the greatest good as a 
mean between extremes. (J.C. Butcher 1985a) 


Introduced by Burrage & Butcher (1980), these methods are general enough to
include all previous examples as special cases, but at the same time the increment
function is given explicitly in terms of the differential equation and several free
parameters. They are defined by

$$v_i^{(n)} = \sum_{j=1}^{k} \tilde a_{ij}\, u_j^{(n)} + h \sum_{j=1}^{s} \tilde b_{ij}\, f\bigl(x_n + c_j h, v_j^{(n)}\bigr), \qquad i = 1, \ldots, s, \tag{8.7a}$$

$$u_i^{(n+1)} = \sum_{j=1}^{k} a_{ij}\, u_j^{(n)} + h \sum_{j=1}^{s} b_{ij}\, f\bigl(x_n + c_j h, v_j^{(n)}\bigr), \qquad i = 1, \ldots, k. \tag{8.7b}$$

The stages $v_i^{(n)}$ ($i = 1, \ldots, s$) are the internal stages and do not leave the “black
box” of the current step. The stages $u_i^{(n)}$ ($i = 1, \ldots, k$) are called the external
stages since they contain all the necessary information from the previous step used
in carrying out the current step. The coefficients $a_{ij}$ in (8.7b) form the matrix $S$
of (8.4b). Very often, some internal stages are identical to external ones, as for
example in method (8.1), where

$$v_n = \bigl(y_{n+1/2},\ \hat y_{n+1},\ y_n,\ y_{n-1}\bigr)^T.$$

One-step Runge-Kutta methods are characterized by $k = 1$. At the end of this
section we shall discuss the algebraic conditions for general linear methods to be
of order $p$.
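The structure of such a method, with internal stages $v_i = \sum_j \tilde a_{ij} u_j + h\sum_j \tilde b_{ij} f(x_n + c_j h, v_j)$ and external stages $u_i^{\text{new}} = \sum_j a_{ij} u_j + h\sum_j b_{ij} f(x_n + c_j h, v_j)$, can be sketched in a few lines. The Python function below is only an illustration under simplifying assumptions that are ours: the problem is scalar and the matrix $\tilde b$ is strictly lower triangular, so that the internal stages can be computed one after the other. The names `glm_step`, `a_tilde`, `b_tilde` are hypothetical.

```python
def glm_step(f, x, u, h, a_tilde, b_tilde, a, b, c):
    """One step of an explicit general linear method for a scalar ODE.

    u is the list of k external stages; b_tilde must be strictly lower
    triangular (explicit method). Returns the new external stages.
    """
    s, k = len(c), len(u)
    fv = []  # f(x + c_j h, v_j) for the internal stages computed so far
    for i in range(s):
        v_i = sum(a_tilde[i][j] * u[j] for j in range(k))
        v_i += h * sum(b_tilde[i][j] * fv[j] for j in range(i))
        fv.append(f(x + c[i] * h, v_i))
    return [sum(a[i][j] * u[j] for j in range(k))
            + h * sum(b[i][j] * fv[j] for j in range(s))
            for i in range(k)]

# A one-step Runge-Kutta method (k = 1): the explicit trapezoidal rule
a_tilde = [[1.0], [1.0]]
b_tilde = [[0.0, 0.0], [1.0, 0.0]]
a = [[1.0]]
b = [[0.5, 0.5]]
c = [0.0, 1.0]
print(glm_step(lambda x, y: -y, 0.0, [1.0], 0.1, a_tilde, b_tilde, a, b, c))
```

With $k = 1$ the matrix $a$ reduces to the scalar 1, in agreement with the remark that one-step methods have $S = I$; for $y' = -y$, $u = 1$, $h = 0.1$ the step yields $1 - 0.1 \cdot \tfrac12(1 + 0.9) = 0.905$.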


Example 8.6. In order to illustrate the fact that the analysis of this section is not
only applicable to numerical methods that discretize first order differential
equations, we consider the second order initial value problem

$$y'' = g(x, y), \qquad y(x_0) = y_0, \qquad y'(x_0) = y'_0. \tag{8.8}$$

Replacing $y''(x)$ by a central difference yields

$$y_{n+1} - 2y_n + y_{n-1} = h^2 g(x_n, y_n),$$

and with the additional variables

$$h y'_n = y_{n+1} - y_n$$

this method can be written as

$$\begin{pmatrix} y_{n+1} \\ y'_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y_n \\ y'_n \end{pmatrix} + h \begin{pmatrix} y'_n \\ g(x_{n+1}, y_n + h y'_n) \end{pmatrix}.$$

It now has the form of a method (8.4) with the correct value function $z(x, h) =
(y(x), (y(x + h) - y(x))/h)^T$. Here $y(x)$ denotes the exact solution of (8.8).

Clearly, all Nyström methods (Section II.14) fit into this framework, as do
multistep methods for second order differential equations. They will be investigated in
more detail in Section III.10.
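That the two formulations of this example produce the same numbers can be checked directly. The Python sketch below (function names are ours) advances the pair $(y_n, y'_n)$ with $y_{n+1} = y_n + hy'_n$, $y'_{n+1} = y'_n + h\,g(x_{n+1}, y_n + hy'_n)$ and compares it with the two-term recursion $y_{n+1} - 2y_n + y_{n-1} = h^2 g(x_n, y_n)$, using $hy'_0 = y_1 - y_0$ as starting value.

```python
import math

def one_step_form(g, x0, y0, yp0, h, n_steps):
    """Advance (y_n, y'_n), where y'_n stands for (y_{n+1} - y_n)/h."""
    x, y, yp = x0, y0, yp0
    ys = [y]
    for _ in range(n_steps):
        y_next = y + h * yp
        yp = yp + h * g(x + h, y_next)
        x, y = x + h, y_next
        ys.append(y)
    return ys

def central_difference(g, x0, y0, y1, h, n_steps):
    """Direct recursion y_{n+1} - 2 y_n + y_{n-1} = h^2 g(x_n, y_n)."""
    ys = [y0, y1]
    for n in range(1, n_steps):
        ys.append(2 * ys[-1] - ys[-2] + h * h * g(x0 + n * h, ys[-1]))
    return ys

# y'' = -y with y(0) = 0; the second starting value is taken as sin(h)
h = 0.1
y0, y1 = 0.0, math.sin(h)
a = one_step_form(lambda x, y: -y, 0.0, y0, (y1 - y0) / h, h, 50)
b = central_difference(lambda x, y: -y, 0.0, y0, y1, h, 50)
print(max(abs(p - q) for p, q in zip(a, b)))
```

The printed difference is at rounding level, confirming that the extra variable $y'_n$ merely repackages the information $y_{n+1} - y_n$.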


Example 8.7. Multi-step multi-stage multi-derivative methods seem to be the most
general class of explicitly given linear methods and generalize the methods of
Section II.13. In the notation of that section, we can write

$$v_i^{(n)} = \sum_{j=1}^{k} \tilde a_{ij}\, u_j^{(n)} + \sum_{r=1}^{q} \frac{h^r}{r!} \sum_{j=1}^{s} \tilde b_{ij}^{(r)}\, D^r y\bigl(x_n + c_j h, v_j^{(n)}\bigr), \qquad i = 1, \ldots, s,$$

$$u_i^{(n+1)} = \sum_{j=1}^{k} a_{ij}\, u_j^{(n)} + \sum_{r=1}^{q} \frac{h^r}{r!} \sum_{j=1}^{s} b_{ij}^{(r)}\, D^r y\bigl(x_n + c_j h, v_j^{(n)}\bigr), \qquad i = 1, \ldots, k.$$

Such methods have been studied in Hairer & Wanner (1973).





Stability and Order 


The following study of stability, order and convergence follows mainly the lines
of Skeel (1976). Stability of a numerical scheme just requires that for $h \to 0$ the
numerical solution remain bounded. This motivates the following definition.

Definition 8.8. Method (8.4) is called stable if $\|S^n\|$ is uniformly bounded for all
$n \ge 0$.
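Definition 8.8 can be probed numerically by tracking the norms of the first few powers of $S$. The sketch below is only an illustration: the helper names and the two sample matrices are assumptions, and a finite scan can suggest but never prove uniform boundedness.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_power_norm(S, n_max):
    """Largest infinity norm of S^n for n = 0, ..., n_max."""
    n = len(S)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]   # S^0
    worst = 0
    for _ in range(n_max + 1):
        worst = max(worst, max(sum(abs(v) for v in row) for row in P))
        P = mat_mul(P, S)
    return worst

S_simple_roots = [[0, 1], [1, 0]]     # eigenvalues +1 and -1, both simple
S_double_root = [[2, -1], [1, 0]]     # eigenvalue 1 carrying a Jordan chain

bounded = max_power_norm(S_simple_roots, 50)
growing = max_power_norm(S_double_root, 50)
```

For the first matrix the norms stay at 1, while for the second they grow linearly with $n$, which is the behaviour that stability excludes.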


The local error of method (8.4) is defined in exactly the same way as for one- 
step methods (Section II.3) and multistep methods (Section III.2). 


Definition 8.9. Let $z(x,h)$ be the correct value function for the method (8.4) and
let $z_n = z(x_n, h)$. The local error is then given by (see Fig. 8.1)

$$ d_0 = z_0 - \varphi(h), \qquad
   d_{n+1} = z_{n+1} - S z_n - h\Phi(x_n, z_n, h), \quad n = 0, 1, \ldots \tag{8.9} $$

[Fig. 8.1: illustration of the local error]



The definition of order is not as straightforward. The requirement that the local
error be $O(h^{p+1})$ (cf. one-step and multistep methods) will turn out to be sufficient
but in general not necessary for convergence of order $p$. For an appropriate
definition we need the spectral decomposition of the matrix $S$.

First observe that, whenever the local error (8.9) tends to zero for $h \to 0$
($nh = x - x_0$ fixed), we get

$$ 0 = z(x, 0) - S z(x, 0), \tag{8.10} $$

so that 1 is an eigenvalue of $S$ and $z(x, 0)$ a corresponding eigenvector. Furthermore,
by stability, no eigenvalue of $S$ can lie outside the unit disc and the
eigenvalues of modulus one cannot give rise to Jordan chains. Denoting the eigenvalues
of modulus one by $\zeta_1 (= 1), \zeta_2, \ldots, \zeta_l$, the Jordan canonical form of $S$ (see
(I.12.14)) is therefore the block diagonal matrix

$$ S = T \,\mathrm{diag}\left\{
   \begin{pmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{pmatrix},
   \begin{pmatrix} \zeta_2 & & \\ & \ddots & \\ & & \zeta_2 \end{pmatrix},
   \ldots,
   \begin{pmatrix} \zeta_l & & \\ & \ddots & \\ & & \zeta_l \end{pmatrix},
   \tilde J \right\} T^{-1}. $$

If we decompose this matrix into the terms which correspond to the single eigenvalues
we obtain

$$ S = E + \zeta_2 E_2 + \ldots + \zeta_l E_l + \tilde E \tag{8.11} $$

where

$$ E = T \,\mathrm{diag}\{I, 0, 0, \ldots\}\, T^{-1}, \quad
   E_2 = T \,\mathrm{diag}\{0, I, 0, \ldots\}\, T^{-1}, \ \ldots, \quad
   E_l = T \,\mathrm{diag}\{0, \ldots, 0, I, 0\}\, T^{-1}, \quad
   \tilde E = T \,\mathrm{diag}\{0, 0, 0, \ldots, \tilde J\}\, T^{-1}. \tag{8.12} $$
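For a small matrix the projector $E$ of (8.12) can be written down without computing $T$. The following sketch assumes a $2 \times 2$ matrix $S$ with the two simple eigenvalues 1 and $\mu$, in which case $E = (S - \mu I)/(1 - \mu)$. The sample matrix is the one listed as $A$ in (8.36) and is used here only as an illustration; the helper name `mat_mul` is hypothetical.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = [[F(16, 11), F(-5, 11)], [F(1), F(0)]]   # eigenvalues 1 and 5/11
mu = F(5, 11)
I2 = [[F(1), F(0)], [F(0), F(1)]]

# Projector onto the eigenspace of the eigenvalue 1 (valid for simple eigenvalues).
E = [[(S[i][j] - mu * I2[i][j]) / (1 - mu) for j in range(2)] for i in range(2)]
# Remaining part of S, so that S = E + E_tilde as in (8.11).
E_tilde = [[S[i][j] - E[i][j] for j in range(2)] for i in range(2)]
```

Exact rational arithmetic makes it easy to confirm the relations $E^2 = E$, $SE = ES = E$ and $E\tilde E = 0$ that are used repeatedly in the proofs.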

We are now prepared to give 

Definition 8.10. The method (8.4) is of order $p$ (consistent of order $p$), if for all
problems (8.3) with $p$ times continuously differentiable $f$, the local error satisfies

$$ d_0 = O(h^p), \qquad
   E(d_0 + d_1 + \ldots + d_n) + d_{n+1} = O(h^p) \quad \text{for } 0 \le nh \le \mathrm{Const}. \tag{8.13} $$


Remark. This property is called quasi-consistency of order p by Skeel (1976). 

If the right-hand side of the differential equation (8.3) is $p$-times continuously
differentiable then, in general, $\varphi(h)$, $\Phi(x,u,h)$ and $z(x,h)$ are also smooth, so
that the local error (8.9) can be expanded into a Taylor series in $h$:

$$ d_0 = \gamma_0 + \gamma_1 h + \ldots + \gamma_{p-1} h^{p-1} + O(h^p), $$
$$ d_{n+1} = \delta_0(x_n) + \delta_1(x_n) h + \ldots + \delta_p(x_n) h^p + O(h^{p+1}). \tag{8.14} $$

The function $\delta_j(x)$ is then $(p - j + 1)$-times continuously differentiable. The following
lemma gives a more practical characterization of the order of the methods (8.4).

Lemma 8.11. Assume that the local error of method (8.4) satisfies (8.14) with
continuous $\delta_j(x)$. The method is then of order $p$, if and only if

$$ d_n = O(h^p) \quad \text{for } 0 \le nh \le \mathrm{Const}, \quad \text{and} \quad E\delta_p(x) = 0. \tag{8.15} $$

Proof. The condition (8.15) is equivalent to

$$ d_n = O(h^p), \qquad E d_{n+1} = O(h^{p+1}) \quad \text{for } 0 \le nh \le \mathrm{Const}, \tag{8.16} $$

which is clearly sufficient for order $p$. We now show that (8.15) is also necessary.
Since $E^2 = E$ (see (8.12)) order $p$ implies

$$ d_n = O(h^p), \qquad E(d_1 + \ldots + d_n) = O(h^p) \quad \text{for } 0 \le nh \le \mathrm{Const}. \tag{8.17} $$

This is best seen by multiplying (8.13) by $E$. Consider now pairs $(n, h)$ such
that $nh = x - x_0$ for some fixed $x$. We insert (8.14) (observe that $d_n = O(h^p)$)
into $E(d_1 + \ldots + d_n)$ and approximate the resulting sum by the corresponding
Riemann integral

$$ E(d_1 + \ldots + d_n) = h^p E \sum_{j=1}^{n} \delta_p(x_{j-1}) + O(h^p)
   = h^{p-1} E \int_{x_0}^{x} \delta_p(s)\, ds + O(h^p). $$

It follows from (8.17) that $E \int_{x_0}^{x} \delta_p(s)\, ds = 0$ and by differentiation that
$E\delta_p(x) = 0$. □


Convergence 


In addition to the numerical solution given by (8.4) we consider a perturbed numerical
solution $(\hat u_n)$ defined by

$$ \hat u_0 = \varphi(h) + r_0, \qquad
   \hat u_{n+1} = S \hat u_n + h\Phi(x_n, \hat u_n, h) + r_{n+1}, \quad n = 0, 1, \ldots, N-1, \tag{8.18} $$

for some perturbation $R = (r_0, r_1, \ldots, r_N)$. For example, the exact solution
$z_n = z(x_n, h)$ can be interpreted as a perturbed solution, where the perturbation is just
the local error. The following lemma gives the best possible qualitative bound on
the difference $u_n - \hat u_n$ in terms of the perturbation $R$. We have to assume that
the increment function $\Phi(x,u,h)$ satisfies a Lipschitz condition with respect to $u$
(on a compact neighbourhood of the solution). This is the case for all reasonable
methods.


Lemma 8.12. Let the method (8.4) be stable and assume the sequences $(u_n)$ and
$(\hat u_n)$ be given by (8.4) and (8.18), respectively. Then there exist positive constants
$c$ and $C$ such that for any perturbation $R$ and for $hN \le \mathrm{Const}$

$$ c\,\|R\|_S \le \max_{0 \le n \le N} \|u_n - \hat u_n\| \le C\,\|R\|_S $$

with

$$ \|R\|_S = \max_{0 \le n \le N} \Bigl\| \sum_{j=0}^{n} S^{n-j} r_j \Bigr\|. $$

Remark. $\|R\|_S$ is a norm on $\mathbb{R}^{(N+1)k}$. Its positivity is seen as follows: if $\|R\|_S = 0$
then for $n = 0, 1, 2, \ldots$ one obtains $r_0 = 0$, $r_1 = 0$, ... recursively.

Proof. Set $\Delta u_n = \hat u_n - u_n$ and $\Delta\Phi_n = \Phi(x_n, \hat u_n, h) - \Phi(x_n, u_n, h)$. Then we
have

$$ \Delta u_{n+1} = S \Delta u_n + h \Delta\Phi_n + r_{n+1}. \tag{8.19} $$

By assumption there exists a constant $L$ such that $\|\Delta\Phi_n\| \le L \|\Delta u_n\|$. Solving
the difference equation (8.19) gives $\Delta u_0 = r_0$ and

$$ \Delta u_{n+1} = \sum_{j=0}^{n} S^{n-j} h \Delta\Phi_j + \sum_{j=0}^{n+1} S^{n+1-j} r_j. \tag{8.20} $$

By stability there exists a constant $B$ such that

$$ \|S^n\|\, L \le B \quad \text{for all } n \ge 0. \tag{8.21} $$

Thus (8.20) becomes

$$ \|\Delta u_{n+1}\| \le hB \sum_{j=0}^{n} \|\Delta u_j\| + \|R\|_S. $$

By induction on $n$ it follows that

$$ \|\Delta u_n\| \le (1 + hB)^n \|R\|_S \le \exp(\mathrm{Const} \cdot B) \cdot \|R\|_S, $$

which proves the second inequality in the lemma. From (8.20) and (8.21)

$$ \Bigl\| \sum_{j=0}^{n} S^{n-j} r_j \Bigr\| \le (1 + nhB) \max_{0 \le j \le n} \|\Delta u_j\|, $$

and we thus obtain for $Nh \le \mathrm{Const}$

$$ \|R\|_S \le (1 + \mathrm{Const} \cdot B) \cdot \max_{0 \le n \le N} \|\Delta u_n\|. $$

□
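The norm $\|R\|_S$ of Lemma 8.12 does not require powers of $S$: the inner sums $w_n = \sum_{j \le n} S^{n-j} r_j$ obey $w_n = S w_{n-1} + r_n$. The sketch below uses this recursion with the maximum norm; the function names and the two sample perturbations are assumptions made for illustration.

```python
def mat_vec(S, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def stability_functional(S, R):
    """max_n || sum_{j=0}^{n} S^(n-j) r_j ||_inf via w_n = S w_{n-1} + r_n."""
    w = [0] * len(S)
    worst = 0
    for r in R:
        w = [a + b for a, b in zip(mat_vec(S, w), r)]
        worst = max(worst, max(abs(x) for x in w))
    return worst

S = [[0, 1], [1, 0]]
cancelling = [[1, 0], [0, -1], [0, 0]]     # second perturbation undoes the first
accumulating = [[1, 0], [0, 1], [1, 0]]    # perturbations reinforce each other

norm_cancel = stability_functional(S, cancelling)
norm_accum = stability_functional(S, accumulating)
```

The two sequences have entries of the same size, yet their norms differ (1 versus 3). This is why the two-sided bound is stated in terms of $\|R\|_S$ rather than the size of the individual $r_j$.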


Remark. Two-sided error bounds, such as in Lemma 8.12, were first studied, in the
case of multi-step methods, by Spijker (1971). This theory has become prominent
through the treatment of Stetter (1973, pp. 81-84). Extensions to general linear
methods are due to Skeel (1976) and Albrecht (1978).

Using the lemma above we can prove

Theorem 8.13. Consider a stable method (8.4) and assume that the local error satisfies
(8.14) with $\delta_p(x)$ continuously differentiable. The method is then convergent
of order $p$, i.e., the global error $u_n - z_n$ satisfies

$$ u_n - z_n = O(h^p) \quad \text{for } 0 \le nh \le \mathrm{Const}, $$

if and only if it is consistent of order $p$.

Proof. The identity

$$ E(d_0 + \ldots + d_n) + d_{n+1}
   = \sum_{j=0}^{n+1} S^{n+1-j} d_j - (S - E) \sum_{j=0}^{n} S^{n-j} d_j, $$

which is a consequence of $ES = E$ (see (8.11) and (8.12)), implies that for
$n \le N-1$ and $D = (d_0, \ldots, d_N)$,

$$ \| E(d_0 + \ldots + d_n) + d_{n+1} \| \le (1 + \|S - E\|) \cdot \|D\|_S. \tag{8.22} $$

The lower bound of Lemma 8.12, with $r_n$ and $\hat u_n$ replaced by $d_n$ and $z_n$ respectively,
yields the “only if” part of the theorem.

For the “if” part we use the upper bound of Lemma 8.12. We have to show that
consistency of order $p$ implies

$$ \max_{0 \le n \le N} \Bigl\| \sum_{j=0}^{n} S^{n-j} d_j \Bigr\| = O(h^p). \tag{8.23} $$

By (8.11) and (8.12) we have

$$ S^{n-j} = E + \zeta_2^{n-j} E_2 + \ldots + \zeta_l^{n-j} E_l + \tilde E^{n-j}. $$

This identity together with Lemma 8.11 implies

$$ \sum_{j=0}^{n} S^{n-j} d_j
   = h^p E_2 \sum_{j=1}^{n} \zeta_2^{n-j} \delta_p(x_{j-1}) + \ldots
   + h^p E_l \sum_{j=1}^{n} \zeta_l^{n-j} \delta_p(x_{j-1})
   + \sum_{j=0}^{n} \tilde E^{n-j} d_j + O(h^p). $$

The last term in this expression is $O(h^p)$ since in a suitable norm $\|\tilde E\| < 1$ and
therefore

$$ \Bigl\| \sum_{j=0}^{n} \tilde E^{n-j} d_j \Bigr\|
   \le \sum_{j=0}^{n} \|\tilde E\|^{n-j} \|d_j\|
   \le \frac{1}{1 - \|\tilde E\|} \max_{0 \le j \le n} \|d_j\|. $$

For the rest we use partial summation (Abel 1826)

$$ \sum_{j=1}^{n} \zeta^{n-j} \delta(x_{j-1})
   = \frac{1 - \zeta^{n}}{1 - \zeta} \cdot \delta(x_0)
   + \sum_{j=1}^{n} \frac{1 - \zeta^{n-j}}{1 - \zeta} \cdot \bigl(\delta(x_j) - \delta(x_{j-1})\bigr) = O(1), $$

whenever $|\zeta| = 1$, $\zeta \ne 1$ and $\delta$ is of bounded variation. □

Order Conditions for General Linear Methods

For the construction of a $p$th order general linear method (8.7) the conditions (8.15)
are still not very practical. One would like to have instead algebraic conditions in
the free parameters, as is the case for Runge-Kutta methods. We shall demonstrate
how this can be achieved using the theory of B-series of Section II.12 (see also
Burrage & Moss 1980). In order to avoid tensor products we assume in what follows
that the differential equation under consideration is a scalar one. All results,
however, are also valid for systems. We further assume the differential equation to
be autonomous, so that the theory of Section II.12 is directly applicable. This will
be justified in Remark 8.17 below.

Suppose now that the components of the correct value function
$z(x,h) = (z_1(x,h), \ldots, z_k(x,h))^T$ possess an expansion as a B-series

$$ z_i(x,h) = B(\mathbf{z}_i, y(x)) $$

so that with $\mathbf{z}(t) = (\mathbf{z}_1(t), \ldots, \mathbf{z}_k(t))^T$,

$$ z(x,h) = \mathbf{z}(\emptyset)\, y(x) + h\, \mathbf{z}(\tau) f(y(x)) + \ldots. \tag{8.24} $$

Before deriving the order conditions we observe that (8.7a) makes sense only if
$v_j^{(n)} \to y(x_n)$ for $h \to 0$. Otherwise $f(v_j^{(n)})$ need not be defined. Since $u_j^{(n)}$ is
an approximation of $z_j(x_n, h)$, this leads to the condition $\sum_j \tilde a_{ij} \mathbf{z}_j(\emptyset) = 1$. This
together with (8.10) are the so-called preconsistency conditions:

$$ A \mathbf{z}(\emptyset) = \mathbf{z}(\emptyset), \qquad \tilde A \mathbf{z}(\emptyset) = \mathbb{1}. \tag{8.25} $$

$A$ and $\tilde A$ are the matrices with entries $a_{ij}$ and $\tilde a_{ij}$, respectively, and $\mathbb{1}$ is the
column vector $(1, \ldots, 1)^T$. Recall that the local error (8.9) for the general linear
method (8.7) is given by

$$ d_i^{(n+1)} = z_i(x_n + h, h) - \sum_{j=1}^{k} a_{ij} z_j(x_n, h) - \sum_{j=1}^{s} b_{ij}\, h f(v_j) \tag{8.26a} $$

where

$$ v_i = \sum_{j=1}^{k} \tilde a_{ij} z_j(x_n, h) + \sum_{j=1}^{s} \tilde b_{ij}\, h f(v_j). \tag{8.26b} $$

For the derivation of the order conditions we write $v_i$ and $d_i^{(n+1)}$ as B-series

$$ v_i = B(\mathbf{v}_i, y(x_n)), \qquad d_i^{(n+1)} = B(\mathbf{d}_i, y(x_n)). $$

By the composition theorem for B-series and by formula (12.10) of Section II.12
we have

$$ z_i(x_n + h, h) = B(\mathbf{z}_i, y(x_n + h)) = B(\mathbf{z}_i, B(\mathbf{p}, y(x_n))) = B(\mathbf{p}\mathbf{z}_i, y(x_n)). $$

Inserting all these series into (8.26) and comparing the coefficients we arrive at

$$ \mathbf{d}_i(t) = (\mathbf{p}\mathbf{z}_i)(t) - \sum_{j=1}^{k} a_{ij} \mathbf{z}_j(t) - \sum_{j=1}^{s} b_{ij} \mathbf{v}'_j(t), $$
$$ \mathbf{v}_i(t) = \sum_{j=1}^{k} \tilde a_{ij} \mathbf{z}_j(t) + \sum_{j=1}^{s} \tilde b_{ij} \mathbf{v}'_j(t). \tag{8.27} $$

An application of Lemma 8.11 now yields

Theorem 8.14. Let $\mathbf{d}(t) = (\mathbf{d}_1(t), \ldots, \mathbf{d}_k(t))^T$ with $\mathbf{d}_i(t)$ be given by (8.27). The
general linear method (8.7) is of order $p$, iff

$$ \mathbf{d}(t) = 0 \quad \text{for } t \in T,\ \varrho(t) \le p - 1, $$
$$ E\,\mathbf{d}(t) = 0 \quad \text{for } t \in T,\ \varrho(t) = p, \tag{8.28} $$

where the matrix $E$ is defined in (8.12). □

Corollary 8.15. Sufficient conditions for the general linear method to be of order
$p$ are

$$ \mathbf{d}(t) = 0 \quad \text{for } t \in T,\ \varrho(t) \le p. \tag{8.29} $$

□

Remark 8.16. The expression $(\mathbf{p}\mathbf{z}_i)(t)$ in (8.27) can be computed using formula
(12.8) of Section II.12. Since $\mathbf{p}(t) = 1$ for all trees $t$, we have

$$ (\mathbf{p}\mathbf{z}_i)(t) = \sum_{j=0}^{\varrho(t)} \binom{\varrho(t)}{j} \frac{1}{\alpha(t)}
   \sum_{\text{all labellings}} \mathbf{z}_i(s_j(t)). \tag{8.30} $$

This rather complicated formula simplifies considerably if we assume that the coefficients
$\mathbf{z}_i(t)$ of the correct value function depend only on the order of $t$, i.e.,
that

$$ \mathbf{z}_i(t) = \mathbf{z}_i(u) \quad \text{whenever } \varrho(t) = \varrho(u). \tag{8.31} $$

In this case formula (8.30) becomes

$$ (\mathbf{p}\mathbf{z}_i)(t) = \sum_{j=0}^{\varrho(t)} \binom{\varrho(t)}{j} \mathbf{z}_i(\tau^j). \tag{8.32} $$

Here $\tau^j$ represents any tree of order $j$, e.g.,

$$ \tau^j = [\underbrace{\tau, \ldots, \tau}_{j-1}], \qquad \tau^1 = \tau, \qquad \tau^0 = \emptyset. \tag{8.33} $$

Usually the components of $z(x,h)$ are composed of

$$ y(x), \quad y(x + jh), \quad h y'(x), \quad h^2 y''(x), \ldots, $$

in which case assumption (8.31) is satisfied.
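As a small worked instance of (8.32), offered as an illustration rather than as part of the original text, take the correct value function $z(x,h) = (y(x), y(x-h))^T$ and assume that the B-series coefficients of $y(x-h)$ are $(-1)^{\varrho(t)}$. Then (8.31) holds and, for a tree of order $m \ge 1$:

```latex
\mathbf{z}(\emptyset) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\mathbf{z}(\tau^j) = \begin{pmatrix} 0 \\ (-1)^j \end{pmatrix} \quad (j \ge 1),
\qquad
(\mathbf{p}\mathbf{z})(\tau^m)
  = \sum_{j=0}^{m} \binom{m}{j} \mathbf{z}(\tau^j)
  = \begin{pmatrix} 1 \\ \sum_{j=0}^{m} \binom{m}{j} (-1)^j \end{pmatrix}
  = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
```

This agrees with $z(x+h,h) = (y(x+h), y(x))^T$: the first component has the coefficients $\mathbf{p}(t) = 1$, and the second component is $y(x)$ itself, whose coefficients vanish for all trees of positive order.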

Remark 8.17. Non-autonomous systems. For the differential equation $x' = 1$,
formula (8.7a) becomes

$$ v_n = \tilde A u_n + h \tilde B \mathbb{1}. $$

Assuming that $x' = 1$ is integrated exactly, i.e., $u_n = \mathbf{z}(\emptyset) x_n + h \mathbf{z}(\tau)$, we obtain
$v_n = x_n \mathbb{1} + h c$, where $c = (c_1, \ldots, c_s)^T$ is given by

$$ c = \tilde A \mathbf{z}(\tau) + \tilde B \mathbb{1}. \tag{8.34} $$

This definition of the $c_i$ implies that the numerical results for $y' = f(x,y)$ and for
the augmented autonomous differential equation are the same and the above results
are also valid in the general case.

Table 8.1 presents the order conditions up to order 3 in addition to the preconsistency
conditions (8.25). We assume that (8.31) is satisfied and that $c$ is given by
(8.34). Furthermore, $c^j$ denotes the vector $(c_1^j, \ldots, c_s^j)^T$.


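Table 8.1. Order conditions for general linear methods

  $t$          $\varrho(t)$   order condition
  $\tau$       1              $A\mathbf{z}(\tau) + B\mathbb{1} = \mathbf{z}(\tau) + \mathbf{z}(\emptyset)$
  $\tau^2$     2              $A\mathbf{z}(\tau^2) + 2Bc = \mathbf{z}(\tau^2) + 2\mathbf{z}(\tau) + \mathbf{z}(\emptyset)$
  $\tau^3$     3              $A\mathbf{z}(\tau^3) + 3Bc^2 = \mathbf{z}(\tau^3) + 3\mathbf{z}(\tau^2) + 3\mathbf{z}(\tau) + \mathbf{z}(\emptyset)$
  $[\tau^2]$   3              $A\mathbf{z}(\tau^3) + 3B\mathbf{v}(\tau^2) = \mathbf{z}(\tau^3) + 3\mathbf{z}(\tau^2) + 3\mathbf{z}(\tau) + \mathbf{z}(\emptyset)$
                              with $\mathbf{v}(\tau^2) = \tilde A\mathbf{z}(\tau^2) + 2\tilde B c$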


Construction of General Linear Methods 

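Let us demonstrate on an example how low order methods can be constructed: we
set $k = s = 2$ and fix the correct value function as

$$ z(x,h) = (y(x), y(x-h))^T. $$

This choice satisfies (8.24) and (8.31) with

$$ \mathbf{z}(\emptyset) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
   \mathbf{z}(\tau^j) = \begin{pmatrix} 0 \\ (-1)^j \end{pmatrix}. $$

Since the second component of $z(x+h,h)$ is equal to the first component of
$z(x,h)$, it is natural to look for methods with

$$ A = \begin{pmatrix} a_{11} & a_{12} \\ 1 & 0 \end{pmatrix}, \qquad
   B = \begin{pmatrix} b_{11} & b_{12} \\ 0 & 0 \end{pmatrix}. $$

We further impose

$$ \tilde B = \begin{pmatrix} 0 & 0 \\ \tilde b_{21} & 0 \end{pmatrix} $$

so that the resulting method is explicit.

The preconsistency condition (8.25), formula (8.34) and the order conditions
of Table 8.1 yield the following equations to be solved:

$$ a_{11} + a_{12} = 1 \tag{8.35a} $$
$$ \tilde a_{11} + \tilde a_{12} = 1, \qquad \tilde a_{21} + \tilde a_{22} = 1 \tag{8.35b} $$
$$ c_1 = -\tilde a_{12}, \qquad c_2 = \tilde b_{21} - \tilde a_{22} \tag{8.35c} $$
$$ -a_{12} + b_{11} + b_{12} = 1 \tag{8.35d} $$
$$ a_{12} + 2(b_{11} c_1 + b_{12} c_2) = 1 \tag{8.35e} $$
$$ -a_{12} + 3(b_{11} c_1^2 + b_{12} c_2^2) = 1 \tag{8.35f} $$
$$ -a_{12} + 3\bigl(b_{11} \tilde a_{12} + b_{12}(\tilde a_{22} + 2\tilde b_{21} c_1)\bigr) = 1. \tag{8.35g} $$

These are 9 equations in 11 unknowns. Letting $c_1$ and $c_2$ be free parameters, we
obtain the solution in the following way: compute $a_{12}$, $b_{11}$ and $b_{12}$ from the linear
system (8.35d,e,f), then $\tilde a_{12}$, $\tilde a_{22}$ and $\tilde b_{21}$ from (8.35c,g) and finally $a_{11}$, $\tilde a_{11}$ and
$\tilde a_{21}$ from (8.35a,b). A particular solution for $c_1 = 1/2$, $c_2 = -2/5$ is:

$$ A = \begin{pmatrix} 16/11 & -5/11 \\ 1 & 0 \end{pmatrix}, \qquad
   B = \begin{pmatrix} 104/99 & -50/99 \\ 0 & 0 \end{pmatrix}, $$
$$ \tilde A = \begin{pmatrix} 3/2 & -1/2 \\ 3/2 & -1/2 \end{pmatrix}, \qquad
   \tilde B = \begin{pmatrix} 0 & 0 \\ -9/10 & 0 \end{pmatrix}. \tag{8.36} $$

This method, which represents a stable explicit 2-step, 2-stage method of order 3,
is due to Butcher (1984).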
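The coefficients (8.36) can be checked against the conditions of Table 8.1 in exact rational arithmetic. The sketch below assumes the correct value function $z(x,h) = (y(x), y(x-h))^T$ with $\mathbf{z}(\emptyset) = (1,1)^T$ and $\mathbf{z}(\tau^j) = (0, (-1)^j)^T$; the helper names `mv`, `lin` and `checks` are hypothetical.

```python
from fractions import Fraction as F

A = [[F(16, 11), F(-5, 11)], [F(1), F(0)]]
B = [[F(104, 99), F(-50, 99)], [F(0), F(0)]]
At = [[F(3, 2), F(-1, 2)], [F(3, 2), F(-1, 2)]]     # A tilde
Bt = [[F(0), F(0)], [F(-9, 10), F(0)]]              # B tilde

def mv(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lin(*terms):
    """Linear combination sum(coef * vec) of 2-vectors."""
    return [sum(coef * vec[i] for coef, vec in terms) for i in range(2)]

one = [F(1), F(1)]
z0 = [F(1), F(1)]                                    # z(emptyset)
z = {j: [F(0), F((-1) ** j)] for j in (1, 2, 3)}     # z(tau^j)

c = lin((1, mv(At, z[1])), (1, mv(Bt, one)))         # formula (8.34)
c2 = [ci ** 2 for ci in c]
v2 = lin((1, mv(At, z[2])), (2, mv(Bt, c)))          # v(tau^2) of Table 8.1

rhs3 = lin((1, z[3]), (3, z[2]), (3, z[1]), (1, z0))
checks = {
    "preconsistency A": mv(A, z0) == z0,
    "preconsistency A~": mv(At, z0) == one,
    "tau": lin((1, mv(A, z[1])), (1, mv(B, one))) == lin((1, z[1]), (1, z0)),
    "tau^2": lin((1, mv(A, z[2])), (2, mv(B, c))) == lin((1, z[2]), (2, z[1]), (1, z0)),
    "tau^3": lin((1, mv(A, z[3])), (3, mv(B, c2))) == rhs3,
    "[tau^2]": lin((1, mv(A, z[3])), (3, mv(B, v2))) == rhs3,
}
```

All six checks hold exactly and $c$ comes out as $(1/2, -2/5)^T$. The characteristic polynomial of $A$ is $\lambda^2 - \tfrac{16}{11}\lambda + \tfrac{5}{11}$ with roots 1 and $5/11$, in agreement with the stability claim.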

The construction of higher order methods soon becomes very complicated, and
the use of simplifying assumptions will be very helpful:

Theorem 8.18 (Burrage & Moss 1980). Assume that the correct value function
satisfies (8.31). The simplifying assumptions

$$ \tilde A \mathbf{z}(\tau^j) + j \tilde B c^{j-1} = c^j, \qquad j = 1, \ldots, p-1 \tag{8.37} $$

together with the preconsistency relations (8.25) and the order conditions for the
“bushy trees”

$$ \mathbf{d}(\tau^j) = 0, \qquad j = 1, \ldots, p $$

imply that the method (8.7) is of order $p$.

Proof. An induction argument based on (8.27) implies that

$$ \mathbf{v}(t) = \mathbf{v}(\tau^j) \quad \text{for } \varrho(t) = j, \quad j = 1, \ldots, p-1 $$

and consequently also that

$$ \mathbf{d}(t) = \mathbf{d}(\tau^j) \quad \text{for } \varrho(t) = j, \quad j = 1, \ldots, p. $$

□

The simplifying assumptions (8.37) allow an interesting interpretation: they are
equivalent to the fact that the internal stages $v_i^{(n)}$ approximate the exact solution at
$x_n + c_i h$ up to order $p-1$, i.e., that

$$ v_i^{(n)} - y(x_n + c_i h) = O(h^p). $$

In the case of Runge-Kutta methods (8.37) reduces to the conditions $C(p-1)$ of
Section II.7.

For further examples of general linear methods satisfying (8.37) we refer to
Burrage & Moss (1980) and Butcher (1981). See also Burrage (1985) and Butcher
(1985a).

Exercises 

1. Consider the composition of (cf. Example 8.5) 

a) explicit and implicit Euler method; 

b) implicit and explicit Euler method. 

To which methods are they equivalent? What is the order of the composite 
methods? 

2. a) Suppose that each of the $m$ multistep methods $(\varrho_i, \sigma_i)$, $i = 1, \ldots, m$, is of
order $p$. Prove that the corresponding cyclic method is of order at least $p$.

b) Construct a stable, 2-cyclic, 3-step linear multistep method of order 5:
find first a one-parameter family of linear 3-step methods of order 5
(which are necessarily unstable).

Result. 

*(0-+*4-i) 

Then determine $c_1$ and $c_2$, such that the eigenvalues of the matrix $S$ for
the composite method become 1, 0, 0.

3. Prove that the composition of two different general linear methods (with the
same correct value function) again gives a general linear method. As a consequence,
the cyclic methods of Example 8.4 are general linear methods.

4. Suppose that all eigenvalues of $S$ (except $\zeta_1 = 1$) lie inside the unit circle.
Then

$$ \|R\|_E = \max_{0 \le n \le N} \Bigl\| r_n + E \sum_{j=0}^{n-1} r_j \Bigr\| $$

is a minimal stability functional.

5. Verify for linear multistep methods that the consistency conditions (2.6) are 
equivalent to consistency of order 1 in the sense of Lemma 8.11. 

6. Write method (8.1) as general linear method (8.7) and determine its order
(answer: $p = 5$).


7. Interpret the method of Caira, Costabile & Costabile (1990) 

i-1 


K = h f ( x n + C A Vn + E a H k j 1 + E a ij k j ) 


3 = 1 


y n+1 = y n + E 




as general linear method. Show that, if 

lkp-hy'(x 0 + ( Ci -l)h)\\<C-h?, 




i=l 


i — 1 


E^Vi- 1 ) 91 +E°« 


j=i 


i=i 


<?-l = ^ 

J q 


3 = 1 




then the method is of order at least p. Find parallels of these conditions with 
those of Theorem 8.18. 


8. Jackiewicz & Zennaro (1992) propose the following two-step Runge-Kutta
method

$$ Y_i^{n-1} = y_{n-1} + h_{n-1} \sum_{j=1}^{i-1} a_{ij} f(Y_j^{n-1}), \qquad
   Y_i^{n} = y_n + h_{n-1} \xi \sum_{j=1}^{i-1} a_{ij} f(Y_j^{n}), $$
$$ y_{n+1} = y_n + h_{n-1} \sum_{i=1}^{s} v_i f(Y_i^{n-1}) + h_{n-1} \xi \sum_{i=1}^{s} w_i f(Y_i^{n}), \tag{8.38} $$

where $\xi = h_n / h_{n-1}$. The coefficients $v_i$, $w_i$ may depend on $\xi$, but the $a_{ij}$ do
not. Hence, this method requires $s$ function evaluations per step.

a) Show that the order of method (8.38) is $p$ (according to Definition 8.10) if
and only if for all trees $t$ with $1 \le \varrho(t) \le p$

$$ \xi^{\varrho(t)} = \sum_{i=1}^{s} v_i (\mathbf{y}^{-1}\mathbf{g}_i)'(t) + \xi^{\varrho(t)} \sum_{i=1}^{s} w_i \mathbf{g}'_i(t), \tag{8.39} $$

where, as for Runge-Kutta methods, $\mathbf{g}_i(t) = \sum_{j=1}^{i-1} a_{ij} \mathbf{g}'_j(t)$. The coefficients
$\mathbf{y}^{-1}(t) = (-1)^{\varrho(t)}$ are those of $y(x_n - h) = B(\mathbf{y}^{-1}, y(x_n))$.

b) Under the assumption 


v i + £ p w i = 0 for i = 2,. 

the order conditions (8.39) are equivalent to 


r— 1 




3 = 1 


e= E •?' (j) (~v r ~ j E^E 1 +(i - r~ p ) r&cp 1 , 


i=i 


i =1 


^^(g-(rt) - £(n)cf w) 1 )=0 for 




(8.40) 


(8.41a) 

r = 2, ...,p, 
(8.41b) 

(8.41c) 


c) The conditions (8.41a,b) uniquely define JE w i , JE 1 as functions 
of C>0 (for j = l,...,p-l). 

d) For each continuous Runge-Kutta method of order p — 1 > 2 there exists 
a method (8.38) of order p with the same coefficient matrix (a-) . 

Hints. To obtain (8.41c) subtract equation (8.40) from the same equation where 
t is replaced by the bushy tree of order g{t) . Then proceed by induction. The 
conditions ^E V A~ 1 = fj(0 » 3 — 1? • • • ,P ~ 1, obtained from (c), together 
with (8.41c) have the same structure as the order conditions (order p — 1) of a 
continuous Runge-Kutta method (Theorem II.6.1).

The asymptotic expansion of the global error of multistep methods was studied
in the famous thesis of Gragg (1964). His proof is very technical and can also
be found in a modified version in the book of Stetter (1973), pp. 234-245. The
existence of asymptotic expansions for general linear methods was conjectured by
Skeel (1976). The proof given below (Hairer & Lubich 1984) is based on the ideas
of Section II.8.


An Instructive Example 


Let us start with an example in order to understand which kind of asymptotic expansion
may be expected. We consider the simple differential equation

$$ y' = -y, \qquad y(0) = 1, $$

take a constant step size $h$ and apply the 3-step BDF-formula (1.22') with one of
the following three starting procedures:

$$ y_0 = 1, \quad y_1 = \exp(-h), \quad y_2 = \exp(-2h) \quad \text{(exact values)} \tag{9.1a} $$
$$ y_0 = 1, \quad y_1 = 1 - h + \frac{h^2}{2} - \frac{h^3}{6}, \quad y_2 = 1 - 2h + 2h^2 - \frac{4h^3}{3}, \tag{9.1b} $$
$$ y_0 = 1, \quad y_1 = 1 - h + \frac{h^2}{2}, \quad y_2 = 1 - 2h + 2h^2. \tag{9.1c} $$

The three pictures on the left of Fig. 9.1 (they correspond to the three starting procedures
in the same order) show the global error divided by $h^3$ for the five step
sizes $h = 1/5, 1/10, 1/20, 1/40, 1/80$.

For the first two starting procedures we observe uniform convergence to the
function $e_3(x) = x e^{-x}/4$ (cf. formula (2.12)), so that

$$ y_n - y(x_n) = e_3(x_n) h^3 + O(h^4), \tag{9.2} $$

valid uniformly for $0 \le nh \le \mathrm{Const}$. In the third case we have convergence to
$e_3(x) = (9 + x) e^{-x}/4$ (Exercise 2), but this time the convergence is no longer
uniform. Therefore (9.2) only holds for $x_n$ bounded away from $x_0$, i.e., for
$0 < \alpha \le nh \le \mathrm{Const}$. In the three pictures on the right of Fig. 9.1 the functions

$$ \bigl(y_n - y(x_n) - e_3(x_n) h^3\bigr)/h^4 \tag{9.3} $$

are plotted. Convergence to functions $e_4(x)$ is observed in all cases. Clearly, since
$e_3(x_0) \ne 0$ for the starting procedure (9.1c), the sequence (9.3) diverges at $x_0$ like
$O(1/h)$ in this case.

[Fig. 9.1. The values $(y_n - y(x_n))/h^3$ (left), $(y_n - y(x_n) - e_3(x_n)h^3)/h^4$ (right)
for the 3-step BDF method and for three different starting procedures]
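The experiment is easy to repeat. The sketch below assumes the usual normalization of the 3-step BDF formula, $\tfrac{11}{6} y_{n+3} - 3 y_{n+2} + \tfrac{3}{2} y_{n+1} - \tfrac{1}{3} y_n = h f_{n+3}$, which for $y' = -y$ can be solved explicitly for $y_{n+3}$. Function names and the step size are illustrative choices, not taken from the text.

```python
import math

def bdf3_scaled_error(h, start, n_steps):
    """(y_n - y(x_n)) / h^3 at n = n_steps for y' = -y, y(0) = 1."""
    y = list(start(h))                           # y_0, y_1, y_2
    for _ in range(3, n_steps + 1):
        # implicit BDF3 step, solved for the new value because f(y) = -y is linear
        y.append((3 * y[-1] - 1.5 * y[-2] + y[-3] / 3) / (11 / 6 + h))
    return (y[n_steps] - math.exp(-n_steps * h)) / h ** 3

def start_a(h):
    return 1.0, math.exp(-h), math.exp(-2 * h)

def start_b(h):
    return 1.0, 1 - h + h**2 / 2 - h**3 / 6, 1 - 2 * h + 2 * h**2 - 4 * h**3 / 3

def start_c(h):
    return 1.0, 1 - h + h**2 / 2, 1 - 2 * h + 2 * h**2

n = 320
h = 1.0 / n                                      # so that x_n = 1
err_a = bdf3_scaled_error(h, start_a, n)
err_b = bdf3_scaled_error(h, start_b, n)
err_c = bdf3_scaled_error(h, start_c, n)
limit_ab = math.exp(-1.0) / 4                    # x e^{-x} / 4 at x = 1
limit_c = 10 * math.exp(-1.0) / 4                # (9 + x) e^{-x} / 4 at x = 1
```

At $x = 1$ the scaled errors of the first two starting procedures lie close to $e^{-1}/4$ and that of the third close to $10\,e^{-1}/4$, as described above. The non-uniform behaviour near $x_0$ shows up only when the same quantity is evaluated for small $n$.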

We conclude from this example that for linear multistep methods there is in
general no asymptotic expansion of the form

$$ y_n - y(x_n) = e_p(x_n) h^p + e_{p+1}(x_n) h^{p+1} + \ldots $$

which holds uniformly for $0 \le nh \le \mathrm{Const}$. It will be necessary to add perturbation
terms

$$ y_n - y(x_n) = \bigl(e_p(x_n) + \varepsilon_n^p\bigr) h^p + \bigl(e_{p+1}(x_n) + \varepsilon_n^{p+1}\bigr) h^{p+1} + \ldots \tag{9.4} $$

which compensate the irregularity near $x_0$. If the perturbations $\varepsilon_n^j$ decay exponentially
(for $n \to \infty$), then they have no influence on the asymptotic expansion for
$x_n$ bounded away from $x_0$.

Asymptotic Expansion for Strictly Stable Methods (8.4)


In order to extend the techniques of Section II.8 to multistep methods it is useful
to write them as a “one-step” method in a higher dimensional space (cf. (4.8) and
Example 8.2). This suggests we study at once the asymptotic expansion for the
general method (8.4). Because of the presence of $\varepsilon_n^j h^j$ in (9.4), the iterative proof
of Theorem 9.1 below will lead us to increment functions which also depend on $n$,
of the form

$$ \Phi_n(x,u,h) = \Phi(x, u + h\alpha_n(h), h) + \beta_n(h). \tag{9.5} $$

We therefore consider for an equidistant grid $(x_n)$ the numerical procedure

$$ u_0 = \varphi(h), \qquad u_{n+1} = S u_n + h\Phi_n(x_n, u_n, h), \tag{9.6} $$

where $\Phi_n$ is given by (9.5) and the correct value function is again denoted by
$z(x,h)$. The following additional assumptions will simplify the discussion of an
asymptotic expansion:

A1) Method (9.6) is strictly stable; i.e., it is stable (Definition 8.8) and 1 is the
only eigenvalue of $S$ with modulus one. In this case the spectral radius of
$S - E$ (cf. formula (8.11)) is smaller than 1;

A2) $\alpha_n(h)$ and $\beta_n(h)$ are polynomials, whose coefficients decay exponentially
like $O(\varrho_0^n)$ for $n \to \infty$. Here $\varrho_0$ denotes some number lying between the
spectral radius of $S - E$ and one; i.e. $\varrho(S - E) < \varrho_0 < 1$;

A3) the functions $\varphi$, $z$ and $\Phi$ are sufficiently differentiable.


Assumption A3 allows us to expand the local error, defined by (8.9), into a Taylor
series:

$$ \begin{aligned}
d_{n+1} &= z(x_n + h, h) - S z(x_n, h) - h\Phi\bigl(x_n, z(x_n,h) + h\alpha_n(h), h\bigr) - h\beta_n(h) \\
        &= d_0(x_n) + d_1(x_n) h + \ldots + d_{N+1}(x_n) h^{N+1} \\
        &\quad - h^2 \frac{\partial\Phi}{\partial u}\bigl(x_n, z(x_n, 0), 0\bigr)\, \alpha_n(h) - \ldots - h\beta_n(h) + O(h^{N+2}).
\end{aligned} $$

The expressions involving $\alpha_n(h)$ can be simplified further. Indeed, for a smooth
function $G(x)$ we have

$$ G(x_n)\alpha_n(h) = G(x_0)\alpha_n(h) + h G'(x_0)\, n\alpha_n(h) + \ldots + h^{N+1} R(n,h). $$

We observe that $n^j \alpha_n(h)$ is again a polynomial in $h$ and that its coefficients decay
like $O(\varrho^n)$ where $\varrho$ satisfies $\varrho_0 < \varrho < 1$. The same argument shows the boundedness
of the remainder $R(n,h)$ for $0 \le nh \le \mathrm{Const}$. As a consequence we can
write the local error in the form

$$ \begin{aligned}
d_0 &= \gamma_0 + \gamma_1 h + \ldots + \gamma_N h^N + O(h^{N+1}) \\
d_{n+1} &= \bigl(d_0(x_n) + \delta_n^0\bigr) + \ldots + \bigl(d_{N+1}(x_n) + \delta_n^{N+1}\bigr) h^{N+1} + O(h^{N+2})
\end{aligned} \tag{9.7} $$

for $0 \le nh \le \mathrm{Const}$.

The functions $d_j(x)$ are smooth and the perturbations $\delta_n^j$ satisfy $\delta_n^j = O(\varrho^n)$. The
expansion (9.7) is unique, because $\delta_n^j \to 0$ for $n \to \infty$.

Method (9.6) is called consistent of order $p$, if the local error (9.7) satisfies
(Lemma 8.11)

$$ d_n = O(h^p) \quad \text{for } 0 \le nh \le \mathrm{Const}, \quad \text{and} \quad E d_p(x) = 0. \tag{9.8} $$

Observe that by this definition the perturbations $\delta_n^j$ have to vanish for $j = 0, \ldots,
p-1$, but no condition is imposed on $\delta_n^p$. The exponential decay of these terms
implies that we still have

$$ d_{n+1} + E(d_n + \ldots + d_0) = O(h^p) \quad \text{for } 0 \le nh \le \mathrm{Const}, $$

in agreement with Definition 8.10. One can now easily verify that Lemma 8.12 ($\Phi_n$
satisfies a Lipschitz condition with the same constant as $\Phi$) and the Convergence
Theorem 8.13 remain valid for method (9.6). In the following theorem we use, as
for one-step methods, the notation $u_h(x) = u_n$ when $x = x_n$.

Theorem 9.1 (Hairer & Lubich 1984). Let the method (9.6) satisfy A1-A3 and be
consistent of order $p \ge 1$. Then the global error has an asymptotic expansion of
the form

$$ u_h(x) - z(x,h) = e_p(x) h^p + \ldots + e_N(x) h^N + E(x,h) h^{N+1} \tag{9.9} $$

where the $e_j(x)$ are given in the proof (cf. formula (9.18)) and $E(x,h)$ is bounded
uniformly in $h \in [0, h_0]$ and for $x$ in compact intervals not containing $x_0$. More
precisely than (9.9), there is an expansion

$$ u_n - z_n = \bigl(e_p(x_n) + \varepsilon_n^p\bigr) h^p + \ldots + \bigl(e_N(x_n) + \varepsilon_n^N\bigr) h^N + \tilde E(n,h) h^{N+1} \tag{9.10} $$

where $\varepsilon_n^j = O(\varrho^n)$ with $\varrho(S - E) < \varrho < 1$ and $\tilde E(n,h)$ is bounded for
$0 \le nh \le \mathrm{Const}$.

Remark. We obtain from (9.10) and (9.9)

$$ E(x_n, h) = \tilde E(n,h) + h^{-1}\varepsilon_n^N + h^{-2}\varepsilon_n^{N-1} + \ldots + h^{p-N-1}\varepsilon_n^p, $$

so that the remainder term $E(x,h)$ is in general not uniformly bounded in $h$ for
$x$ varying in an interval $[x_0, \bar x]$. However, if $x$ is bounded away from $x_0$, say
$x \ge x_0 + \delta$ ($\delta > 0$ fixed), the sequence $\varepsilon_n^j$ goes to zero faster than any power of
$\delta/n \le h$.

Proof. a) As for one-step methods (cf. proof of Theorem 8.1, Chapter II) we
construct a new method, which has as numerical solution 

$$ \hat u_n = u_n - \bigl(e(x_n) + \varepsilon_n\bigr) h^p \tag{9.11} $$

for a given smooth function $e(x)$ and a given sequence $\varepsilon_n$ satisfying $\varepsilon_n = O(\varrho^n)$.
Such a method is given by

$$ \hat u_0 = \hat\varphi(h), \qquad \hat u_{n+1} = S \hat u_n + h \hat\Phi_n(x_n, \hat u_n, h) \tag{9.12} $$

where $\hat\varphi(h) = \varphi(h) - (e(x_0) + \varepsilon_0) h^p$ and

$$ \hat\Phi_n(x,u,h) = \Phi_n\bigl(x, u + (e(x) + \varepsilon_n) h^p, h\bigr)
   - \bigl(e(x+h) - S e(x)\bigr) h^{p-1} - (\varepsilon_{n+1} - S\varepsilon_n) h^{p-1}. $$

Since $\Phi_n$ is of the form (9.5), $\hat\Phi_n$ is also of this form, so that its local error has
an expansion (9.7). We shall now determine $e(x)$ and $\varepsilon_n$ in such a way that the
method (9.12) is consistent of order $p + 1$.

b) The local error $\hat d_n$ of (9.12) can be expanded as

$$ \begin{aligned}
\hat d_0 &= z_0 - \hat u_0 = \bigl(\gamma_p + e(x_0) + \varepsilon_0\bigr) h^p + O(h^{p+1}) \\
\hat d_{n+1} &= z_{n+1} - S z_n - h \hat\Phi_n(x_n, z_n, h) \\
  &= d_{n+1} + \bigl((I - S) e(x_n) + (\varepsilon_{n+1} - S\varepsilon_n)\bigr) h^p \\
  &\quad + \bigl(-G(x_n)(e(x_n) + \varepsilon_n) + e'(x_n)\bigr) h^{p+1} + O(h^{p+2}).
\end{aligned} $$

Here

$$ G(x) = \frac{\partial\Phi_n}{\partial u}\bigl(x, z(x, 0), 0\bigr) $$

which is independent of $n$ by (9.5). The method (9.12) is consistent of order $p + 1$,
if (see (9.8))

i) $\varepsilon_0 = -\gamma_p - e(x_0)$,

ii) $d_p(x) + (I - S) e(x) + \delta_n^p + \varepsilon_{n+1} - S\varepsilon_n = 0$ for $x = x_n$,

iii) $E e'(x) = E G(x) e(x) - E d_{p+1}(x)$.

We assume for the moment that the system (i)-(iii) can be solved for $e(x)$ and $\varepsilon_n$.
This will actually be demonstrated in part (d) of the proof. By the Convergence
Theorem 8.13 the method (9.12) is convergent of order $p + 1$. Hence

$$ \hat u_n - z_n = O(h^{p+1}) \quad \text{uniformly for } 0 \le nh \le \mathrm{Const}, $$

which yields the statement (9.10) for $N = p$.

c) The method (9.12) satisfies the assumptions of the theorem with $p$ replaced
by $p + 1$ and $\varrho_0$ by $\varrho$. As in Theorem 8.1 (Section II.8) an induction argument
yields the result.

d) It remains to find a solution of the system (i)-(iii). Condition (ii) is satisfied
if 

(iia) $d_p(x) = (S - I)(e(x) + c)$

(iib) $\varepsilon_{n+1} - c = S(\varepsilon_n - c) - \delta_n^p$

hold for some constant $c$. Using $(I - S + E)^{-1}(I - S) = (I - E)$, which is a
consequence of $SE = E^2 = E$ (see (8.11)), formula (iia) is equivalent to

$$ (I - S + E)^{-1} d_p(x) = -(I - E)(e(x) + c). \tag{9.13} $$

From (i) we obtain $\varepsilon_0 - c = -\gamma_p - (e(x_0) + c)$, so that by (9.13)

$$ (I - E)(\varepsilon_0 - c) = -(I - E)\gamma_p + (I - S + E)^{-1} d_p(x_0). $$

Since $E d_p(x_0) = 0$, this relation is satisfied in particular if

$$ \varepsilon_0 - c = -(I - E)\gamma_p + (I - S + E)^{-1} d_p(x_0). \tag{9.14} $$

The numbers $\varepsilon_n - c$ are now determined by the recurrence relation (iib)

$$ \begin{aligned}
\varepsilon_n - c &= S^n(\varepsilon_0 - c) - \sum_{j=1}^{n} S^{n-j} \delta_{j-1}^p \\
  &= E(\varepsilon_0 - c) + (S - E)^n(\varepsilon_0 - c) - E \sum_{j=0}^{\infty} \delta_j^p
     + E \sum_{j=n}^{\infty} \delta_j^p - \sum_{j=1}^{n} (S - E)^{n-j} \delta_{j-1}^p,
\end{aligned} $$

where we have used $S^n = E + (S - E)^n$. If we put

$$ c = E \sum_{j=0}^{\infty} \delta_j^p \tag{9.15} $$

the sequence $\{\varepsilon_n\}$ defined above satisfies $\varepsilon_n = O(\varrho^n)$, since $E(\varepsilon_0 - c) = 0$ by
(9.14) and since $\delta_n^p = O(\varrho^n)$.

In order to find $e(x)$ we define

$$ v(x) = E e(x). $$

With the help of formulas (9.15) and (9.13) we can recover $e(x)$ from $v(x)$ by

$$ e(x) = v(x) - (I - S + E)^{-1} d_p(x). \tag{9.16} $$

Equation (iii) can now be rewritten as the differential equation

$$ v'(x) = E G(x)\bigl(v(x) - (I - S + E)^{-1} d_p(x)\bigr) - E d_{p+1}(x), \tag{9.17} $$

and condition (i) yields the starting value $v(x_0) = -E(\gamma_p + \varepsilon_0)$. This initial value
problem can be solved for $v(x)$ and we obtain $e(x)$ by (9.16). This function and
the $\varepsilon_n$ defined above represent a solution of (i)-(iii). □

Remarks. a) It follows from (9.15)-(9.17) that the principal error term satisfies

$$ \begin{aligned}
e'_p(x) &= E G(x) e_p(x) - E d_{p+1}(x) - (I - S + E)^{-1} d'_p(x) \\
e_p(x_0) &= -E\gamma_p - E \sum_{j=0}^{\infty} \delta_j^p - (I - S + E)^{-1} d_p(x_0).
\end{aligned} \tag{9.18} $$

b) Since $e_{p+1}(x)$ is just the principal error term of method (9.12), it satisfies
the differential equation (9.18) with $d_j$ replaced by $\hat d_{j+1}$. By an induction argument
we therefore have for $j \ge p$

$$ e'_j(x) = E G(x) e_j(x) + \text{inhomogeneity}(x). $$

Weakly Stable Methods 

We next study the asymptotic expansion for stable methods, which are not strictly
stable. For example, the explicit mid-point rule (1.13'), treated in connection with
the GBS-algorithm (Section II.9), is of this type. As at the beginning of this section,
we apply the mid-point rule to the problem $y' = -y$, $y(0) = 1$ and consider the
following three starting procedures

$$ y_0 = 1, \quad y_1 = \exp(-h) \tag{9.19a} $$
$$ y_0 = 1, \quad y_1 = 1 - h + \frac{h^2}{2} \tag{9.19b} $$
$$ y_0 = 1, \quad y_1 = 1 - h. \tag{9.19c} $$

The three pictures on the left of Fig. 9.2 show the global error divided by $h^2$. For
the first two starting procedures we have convergence to the function $x e^{-x}/6$,
while for (9.19c) the divided error $(y_n - y(x_n))/h^2$ converges to

$$ e^{-x}\Bigl(\frac{x}{6} - \frac{1}{4}\Bigr) + \frac{e^{x}}{4} \quad \text{for } n \text{ even}, \qquad
   e^{-x}\Bigl(\frac{x}{6} - \frac{1}{4}\Bigr) - \frac{e^{x}}{4} \quad \text{for } n \text{ odd}. $$

We then subtract the $h^2$-term from the global error and divide by $h^3$ in the case
(9.19a) and by $h^4$ for (b) and (c). The result is plotted in the pictures on the right
of Fig. 9.2.

This example nicely illustrates the fact that we no longer have an asymptotic
expansion of the form (9.9) or (9.10) but that there exists one expansion for $x_n$
with $n$ even, and a different expansion for $x_n$ with $n$ odd (see also Exercise 2 of
Section II.9). Similar results for more general methods will be obtained here.
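The even/odd splitting is easy to observe numerically. The following sketch applies the explicit mid-point rule $y_{n+1} = y_{n-1} - 2h y_n$ to $y' = -y$. The limit functions for the Euler start (9.19c) used in the comparison follow from the two characteristic roots of this recursion and should be read as an assumption of the sketch; all names are illustrative.

```python
import math

def midpoint_scaled_errors(h, y1, n_steps):
    """List of (y_n - exp(-x_n)) / h^2 for n = 0, ..., n_steps."""
    y_prev, y_curr = 1.0, y1
    errs = [0.0, (y1 - math.exp(-h)) / h ** 2]
    for n in range(2, n_steps + 1):
        y_prev, y_curr = y_curr, y_prev - 2 * h * y_curr   # y_n = y_{n-2} - 2 h y_{n-1}
        errs.append((y_curr - math.exp(-n * h)) / h ** 2)
    return errs

def smooth_limit(x):
    """Limit for exact starting values (9.19a)."""
    return x * math.exp(-x) / 6

def euler_start_limit(x, n):
    """Assumed limit for the start (9.19c); the sign of the e^x term alternates with n."""
    sign = 1 if n % 2 == 0 else -1
    return (x / 6 - 0.25) * math.exp(-x) + sign * math.exp(x) / 4

h = 1.0 / 200
errs_a = midpoint_scaled_errors(h, math.exp(-h), 201)
errs_c = midpoint_scaled_errors(h, 1 - h, 201)
```

For the exact start the values at $n = 200$ and $n = 201$ are nearly equal, whereas for the Euler start they differ by about $e/2$ near $x = 1$. That gap is the growing component attached to the eigenvalue $-1$.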

We say that a method of the form (8.4) is weakly stable, if it is stable, but if
the matrix $S$ has, besides $\zeta_1 = 1$, further eigenvalues of modulus 1, say $\zeta_2, \ldots, \zeta_l$.
The matrix $S$ therefore has the representation (cf. (8.11))

$$ S = \zeta_1 E_1 + \zeta_2 E_2 + \ldots + \zeta_l E_l + R \tag{9.20} $$

where the $E_j$ are the projectors (corresponding to $\zeta_j$) and the spectral radius of $R$
satisfies $\varrho(R) < 1$.

[Fig. 9.2. Asymptotic expansion of the mid-point rule (three different starting procedures)]

In what follows we restrict ourselves to the case where all $\zeta_j$ ($j = 1, \ldots, l$)
are roots of unity. This allows a simple proof for the existence of an asymptotic
expansion and is at the same time by far the most important special case. For the
general situation we refer to Hairer & Lubich (1984).

Theorem 9.2. Let the method (9.6) with $\Phi_n$ independent of $n$ be stable, consistent
of order $p$ and satisfy A3. If all eigenvalues (of $S$) of modulus 1 satisfy $\zeta_j^q = 1$
($j = 1, \ldots, l$) for some positive integer $q$, then we have an asymptotic expansion
of the form ($\omega = e^{2\pi i/q}$)

$$ u_n - z_n = \sum_{s=0}^{q-1} \omega^{ns} \bigl(e_{ps}(x_n) h^p + \ldots + e_{Ns}(x_n) h^N\bigr) + E(n,h) h^{N+1} \tag{9.21} $$

where the $e_{js}(x)$ are smooth functions and $E(n,h)$ is uniformly bounded for
$0 < \delta \le nh \le \mathrm{Const}$.

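Proof. The essential idea of the proof is to consider $q$ consecutive steps of method
(9.6) as one method over a large step. Putting $\tilde u_n = u_{nq+i}$ ($0 \le i \le q - 1$ fixed),
$\tilde h = qh$ and $\tilde x_n = x_i + n\tilde h$, this method becomes

$$\tilde u_{n+1} = S^q \tilde u_n + \tilde h\, \tilde\Phi(\tilde x_n, \tilde u_n, \tilde h) \tag{9.22}$$

with a suitably chosen $\tilde\Phi$. E.g., for $q = 2$ we have

$$\tilde\Phi(\tilde x, \tilde u, \tilde h) = \frac12\, S\, \Phi\Bigl(\tilde x, \tilde u, \frac{\tilde h}{2}\Bigr) + \frac12\, \Phi\Bigl(\tilde x + \frac{\tilde h}{2},\ S\tilde u + \frac{\tilde h}{2}\, \Phi\Bigl(\tilde x, \tilde u, \frac{\tilde h}{2}\Bigr),\ \frac{\tilde h}{2}\Bigr).$$

The assumption on the eigenvalues implies

$$S^q = E_1 + \ldots + E_l + R^q$$

so that (9.22) is seen to be a strictly stable method. A straightforward calculation
shows that the local error of (9.22) satisfies

$$\tilde d_0 = O(h^p), \qquad \tilde d_{n+1} = (I + S + \ldots + S^{q-1})\, d_p(\tilde x_n) h^p + O(h^{p+1}).$$

Inserting (9.20) and using $\zeta_j^q = 1$ we obtain, with $\tilde E = E_1 + \ldots + E_l$,

$$\tilde E (I + S + \ldots + S^{q-1})\, d_p(x) = \Bigl( q E_1 + \sum_{j=2}^{l} \frac{1 - \zeta_j^q}{1 - \zeta_j}\, E_j \Bigr) d_p(x) = q E_1 d_p(x)$$

which vanishes by (8.15). Hence, also method (9.22) is consistent of order $p$. All
the assumptions of Theorem 9.1 are thus verified for method (9.22). We therefore
obtain

$$u_{nq+i} - z_{nq+i} = \tilde e_{pi}(x_{nq+i}) h^p + \ldots + \tilde e_{Ni}(x_{nq+i}) h^N + \tilde E_i(n, h) h^{N+1}$$

where $\tilde E_i(n, h)$ has the desired boundedness properties. If we define $e_{js}(x)$ as a
solution of the Vandermonde-type system

$$\sum_{s=0}^{q-1} \omega^{is} e_{js}(x) = \tilde e_{ji}(x)$$

we obtain (9.21).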


□


The Adjoint Method

For a method (8.4) the correct value function $z(x, h)$, the starting procedure $\varphi(h)$
and the increment function $\Phi(x, u, h)$ are usually also defined for negative $h$ (see
the examples of Section III.8). As for one-step methods (Section II.8) we shall give
here a precise meaning to the numerical solution $u_h(x)$ for negative $h$. This then
leads in a natural way to the study of asymptotic expansions in even powers of $h$.

With the notation $u_h(x) = u_n$ for $x = x_0 + nh$ ($h > 0$) the method (8.4) becomes

$$u_h(x_0) = \varphi(h), \qquad u_h(x + h) = S u_h(x) + h\Phi(x, u_h(x), h) \quad \text{for } x = x_0 + nh. \tag{9.23}$$

We first replace $h$ by $-h$ in (9.23) to obtain

$$u_{-h}(x_0) = \varphi(-h), \qquad u_{-h}(x - h) = S u_{-h}(x) - h\Phi(x, u_{-h}(x), -h)$$

and then $x$ by $x + h$ which gives

$$u_{-h}(x_0) = \varphi(-h), \qquad u_{-h}(x) = S u_{-h}(x + h) - h\Phi(x + h, u_{-h}(x + h), -h).$$

For sufficiently small $h$ this equation can be solved for $u_{-h}(x + h)$ (Implicit Function
Theorem) and we obtain

$$u_{-h}(x_0) = \varphi(-h), \qquad u_{-h}(x + h) = S^{-1} u_{-h}(x) + h\Phi^*(x, u_{-h}(x), h). \tag{9.24}$$

The method (9.24), which is again of the form (8.4), is called the adjoint method
of (9.23). Its correct value function is $z^*(x, h) = z(x, -h)$. Observe that for given
$S$ and $\Phi$ the new increment function $\Phi^*$ is just defined by the pair of formulas

$$v = Su - h\Phi(x + h, u, -h), \qquad u = S^{-1} v + h\Phi^*(x, v, h). \tag{9.25}$$


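Example 9.3. Consider a linear multistep method with generating functions

$$\varrho(\zeta) = \sum_{j=0}^{k} \alpha_j \zeta^j, \qquad \sigma(\zeta) = \sum_{j=0}^{k} \beta_j \zeta^j.$$

Then we have

$$S = \begin{pmatrix} -\alpha_{k-1}/\alpha_k & -\alpha_{k-2}/\alpha_k & \cdots & -\alpha_0/\alpha_k \\ 1 & 0 & \cdots & 0 \\ & \ddots & \ddots & \vdots \\ & & 1 & 0 \end{pmatrix}, \qquad
\Phi(x, u, h) = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \psi(x, u, h)$$

where $\psi = \psi(x, u, h)$ is the solution of ($u = (u_{k-1}, \ldots, u_0)^T$)

$$\alpha_k \psi = \sum_{j=0}^{k-1} \beta_j f(x + jh, u_j) + \beta_k f\Bigl(x + kh,\ h\psi - \sum_{j=0}^{k-1} \frac{\alpha_j}{\alpha_k} u_j\Bigr).$$

A straightforward use of the formulas (9.25) shows that

$$S^{-1} = \begin{pmatrix} 0 & 1 & & \\ \vdots & & \ddots & \\ 0 & & & 1 \\ -\alpha_k/\alpha_0 & -\alpha_{k-1}/\alpha_0 & \cdots & -\alpha_1/\alpha_0 \end{pmatrix}, \qquad
\Phi^*(x, v, h) = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix} \psi^*(x, v, h)$$

where $\psi^* = \psi^*(x, v, h)$ (with $v = (v_0, \ldots, v_{k-1})^T$) is given by

$$-\alpha_0 \psi^* = \sum_{j=0}^{k-1} \beta_{k-j} f(x + (j - k + 1)h, v_j) + \beta_0 f\Bigl(x + h,\ h\psi^* - \sum_{j=0}^{k-1} \frac{\alpha_{k-j}}{\alpha_0} v_j\Bigr).$$

This shows that the adjoint method is again a linear multistep method. Its generating
polynomials are

$$\varrho^*(\zeta) = -\zeta^k \varrho(\zeta^{-1}), \qquad \sigma^*(\zeta) = \zeta^k \sigma(\zeta^{-1}). \tag{9.26}$$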
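In terms of coefficients, (9.26) simply reverses the order of the $\alpha_j$ and $\beta_j$ and changes the sign of the $\alpha_j$. The following sketch illustrates this; the function names are hypothetical and coefficients are stored as lists with index $j$ belonging to $\zeta^j$.

```python
from fractions import Fraction

def adjoint_coefficients(alpha, beta):
    """Coefficients of rho*(z) = -z^k rho(1/z) and sigma*(z) = z^k sigma(1/z).

    alpha[j] and beta[j] multiply z^j, j = 0..k, as in (9.26).
    """
    alpha_star = [-a for a in reversed(alpha)]
    beta_star = list(reversed(beta))
    return alpha_star, beta_star

def is_self_adjoint(alpha, beta):
    """True if rho* = rho and sigma* = sigma."""
    return adjoint_coefficients(alpha, beta) == (list(alpha), list(beta))

half = Fraction(1, 2)
# explicit Euler: y_{n+1} - y_n = h f_n
print(adjoint_coefficients([-1, 1], [1, 0]))
# trapezoidal rule and explicit mid-point rule
print(is_self_adjoint([-1, 1], [half, half]))
print(is_self_adjoint([-1, 0, 1], [0, 2, 0]))
```

The adjoint of the explicit Euler method comes out as the implicit Euler method, while the trapezoidal rule and the mid-point rule coincide with their adjoints.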


Our next aim is to prove that the adjoint method has exactly the same asymptotic
expansion as the original method, with $h$ replaced by $-h$. For this it is
necessary that $S^{-1}$ also be a stable matrix. Therefore all eigenvalues of $S$ must
lie on the unit circle.

Theorem 9.4. Let the method (9.23) be stable, consistent of order $p$ and assume
that all eigenvalues of $S$ satisfy $\zeta^q = 1$ for some positive integer $q$. Then the global
error has an asymptotic expansion of the form ($\omega = e^{2\pi i/q}$)

$$u_h(x_n) - z(x_n, h) = \sum_{s=0}^{q-1} \omega^{ns} \bigl( e_{ps}(x_n) h^p + \ldots + e_{Ns}(x_n) h^N \bigr) + E(x_n, h) h^{N+1}, \tag{9.27}$$

valid for positive and negative $h$. The remainder $E(x, h)$ is uniformly bounded
for $|h| \le h_0$ and $x_0 \le x \le \hat x$.

Proof. As in the proof of Theorem 9.2 we consider $q$ consecutive steps of method
(9.23) as one new method. The assumption on the eigenvalues implies that $S^q =
I$ = identity. Therefore the new method is essentially a one-step method. The only
difference is that here the starting procedure and the correct value function may
depend on $h$. A straightforward extension of Theorem 8.5 of Chapter II (Exercise
3) implies the existence of an expansion

$$u_h(x_{nq+i}) - z(x_{nq+i}, h) = \tilde e_{pi}(x_{nq+i}) h^p + \ldots + \tilde e_{Ni}(x_{nq+i}) h^N + \tilde E_i(x_{nq+i}, h) h^{N+1}.$$

This expansion is valid for positive and negative $h$; the remainder $\tilde E_i(x, h)$ is
bounded for $|h| \le h_0$ and $x_0 \le x \le \hat x$. The same argument as in the proof of
Theorem 9.2 now leads to the desired expansion. □


Symmetric Methods 


The definition of symmetry for general linear methods is not as straightforward as
for one-step methods. In Example 9.3 we saw that the components of the numerical
solution of the adjoint method are in inverse order. Therefore, it is too restrictive to
require that $\varphi(h) = \varphi(-h)$, $S = S^{-1}$ and $\Phi = \Phi^*$.

However, for many methods of practical interest the correct value function
satisfies a symmetry relation of the form

$$z(x, h) = Q z(x + qh, -h) \tag{9.28}$$

where $Q$ is a square matrix and $q$ an integer. This is for instance the case for linear
multistep methods, where the correct value function is given by

$$z(x, h) = \bigl( y(x + (k-1)h), \ldots, y(x) \bigr)^T.$$

The relation (9.28) holds with

$$Q = \begin{pmatrix} & & 1 \\ & \iddots & \\ 1 & & \end{pmatrix} \qquad \text{and} \qquad q = k - 1. \tag{9.29}$$

Definition 9.5. Suppose that the correct value function satisfies (9.28). Method
(9.23) is called symmetric (with respect to (9.28)), if the numerical solution satisfies
its analogue

$$u_h(x) = Q u_{-h}(x + qh). \tag{9.30}$$

Example 9.6. Consider a linear multistep method and suppose that the generating
polynomials of the adjoint method (9.26) satisfy

$$\varrho^*(\zeta) = \varrho(\zeta), \qquad \sigma^*(\zeta) = \sigma(\zeta). \tag{9.31}$$

This is equivalent to the requirement (cf. (3.24))

$$\alpha_{k-j} = -\alpha_j, \qquad \beta_{k-j} = \beta_j.$$

A straightforward calculation (using the formulas of Example 9.3) then shows that
the symmetry relation (9.30) holds for all $x = x_0 + nh$ whenever it holds for $x =
x_0$. This imposes an additional condition on the starting procedure $\varphi(h)$.

Let us finally demonstrate how Theorem 9.4 can be used to prove asymptotic
expansions in even powers of $h$. Denote by $u_h^j(x)$ the $j$th component of $u_h(x)$.
The symmetry relation (9.30) for multistep methods then implies

$$u_{-h}^k(x) = u_h^1(x - (k-1)h).$$

Furthermore, for any multistep method we have

$$u_h^k(x) = u_h^1(x - (k-1)h)$$

so that

$$u_h^k(x) = u_{-h}^k(x)$$

for symmetric methods. As a consequence of Theorem 9.4 the asymptotic expansion
of the global error is in even powers of $h$, whenever the multistep method is
symmetric in the sense of Definition 9.5.

Exercises 

1. Consider a strictly stable, $p$th order, linear multistep method written in the
form (9.6) (see Example 9.3) and set

$$G(x) = \frac{\partial \Phi}{\partial u}(x, z(x, 0), 0).$$

a) Prove that

$$E\, G(x)\, \mathbb{1} = \mathbb{1}\, \frac{\partial f}{\partial y}(x, y(x))$$

where $E$ is the matrix given by (8.11) and $\mathbb{1} = (1, \ldots, 1)^T$.

b) Show that the function $e_p(x)$ in the expansion (9.9) is given by $e_p(x) =
\mathbb{1}\, e_p(x)$, where the scalar function on the right-hand side satisfies

$$e_p'(x) = \frac{\partial f}{\partial y}(x, y(x))\, e_p(x) - C y^{(p+1)}(x)$$

and $C$ is the error constant (cf. (2.13)). Compute also $e_p(x_0)$.

2. For the 3-step BDF-method, applied to $y' = -y$, $y(0) = 1$ with starting
procedure (9.1c), compute the function $e_3(x)$ and the perturbations $\{\varepsilon_n^3\}_{n \ge 0}$ in
the expansion (9.4). Compare your result with Fig. 9.1.

3. Consider the method

$$u_0 = \varphi(h), \qquad u_{n+1} = u_n + h\Phi(x_n, u_n, h) \tag{9.32}$$

with correct value function $z(x, h)$.

a) Prove that the global error has an asymptotic expansion of the form

$$u_n - z_n = e_p(x_n) h^p + \ldots + e_N(x_n) h^N + E(x_n, h) h^{N+1}$$

where $E(x, h)$ is uniformly bounded for $0 < h \le h_0$ and $x_0 \le x \le \hat x$.

b) Show that Theorem 8.5 of Chapter II remains valid for method (9.32).



III.10 Multistep Methods for Second Order
Differential Equations


In 1904 I needed such a method to compute the trajectories
of electrified corpuscles in a magnetic field, and, having tried
various methods already known without finding them convenient
enough for my purpose, I was led to work out a rather simple
method myself, which I subsequently used.

(C. Störmer 1921)


Because of their importance, second order differential equations deserve some
additional attention. We already saw in Section II.14 that for special second order
differential equations certain direct one-step methods are more efficient than the
classical Runge-Kutta methods. We now investigate whether a similar situation
also holds for multistep methods.

Consider the second order differential equation

$$y'' = f(x, y, y') \tag{10.1}$$

where $y$ is allowed to be a vector. We rewrite (10.1) in the usual way as a first
order system and apply a multistep method

$$\sum_{i=0}^{k} \alpha_i y_{n+i} = h \sum_{i=0}^{k} \beta_i y'_{n+i}, \qquad
\sum_{i=0}^{k} \alpha_i y'_{n+i} = h \sum_{i=0}^{k} \beta_i f(x_{n+i}, y_{n+i}, y'_{n+i}). \tag{10.2}$$

If the right hand side of the differential equation does not depend on $y'$,

$$y'' = f(x, y), \tag{10.3}$$

it is natural to look for numerical methods which do not involve the first derivative.
An elimination of $\{y'_n\}$ in the equations (10.2) results in

$$\sum_{i=0}^{2k} \hat\alpha_i y_{n+i} = h^2 \sum_{i=0}^{2k} \hat\beta_i f(x_{n+i}, y_{n+i}) \tag{10.4}$$

where the new coefficients $\hat\alpha_i$, $\hat\beta_i$ are given by

$$\sum_{i=0}^{2k} \hat\alpha_i \zeta^i = \Bigl( \sum_{i=0}^{k} \alpha_i \zeta^i \Bigr)^2, \qquad
\sum_{i=0}^{2k} \hat\beta_i \zeta^i = \Bigl( \sum_{i=0}^{k} \beta_i \zeta^i \Bigr)^2. \tag{10.5}$$
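The elimination leading to (10.4) and (10.5) can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it squares the generating polynomials of the trapezoidal rule and verifies that the $y$-values produced by the trapezoidal rule on the test equation $y'' = -y$ (chosen here only for convenience) satisfy the resulting two-step recursion up to rounding.

```python
def poly_mul(p, q):
    """Product of two polynomials given by coefficient lists (index = power)."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

# trapezoidal rule: rho(z) = z - 1, sigma(z) = (z + 1)/2
alpha, beta = [-1.0, 1.0], [0.5, 0.5]
alpha_hat = poly_mul(alpha, alpha)   # coefficients of rho(z)^2
beta_hat = poly_mul(beta, beta)      # coefficients of sigma(z)^2

def trapezoidal_oscillator(h, n_steps, y=1.0, v=0.0):
    """Trapezoidal rule applied to y' = v, v' = -y; returns the y-values."""
    c, d = 1.0 - 0.25 * h * h, 1.0 + 0.25 * h * h
    ys = [y]
    for _ in range(n_steps):
        y, v = (c * y + h * v) / d, (c * v - h * y) / d
        ys.append(y)
    return ys

def max_residual(h, n_steps):
    """Largest defect of the y-values in the eliminated formula (10.4), f = -y."""
    ys = trapezoidal_oscillator(h, n_steps)
    worst = 0.0
    for n in range(n_steps - 1):
        lhs = sum(a * ys[n + i] for i, a in enumerate(alpha_hat))
        rhs = h * h * sum(b * (-ys[n + i]) for i, b in enumerate(beta_hat))
        worst = max(worst, abs(lhs - rhs))
    return worst

print(alpha_hat, beta_hat, max_residual(0.1, 50))
```

The residual is at rounding level, which reflects the identity $\varrho(E)^2 y_n = h^2 \sigma(E)^2 f_n$ obtained by applying $\varrho(E)$ to the first equation of (10.2).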

In what follows we investigate (10.4) with coefficients that do not necessarily
satisfy (10.5). It is hoped to achieve the same order with a smaller step number.


Explicit Störmer Methods

His lecturing, by the way, is rather dry and boring ...

(B. Riemann's opinion about Encke, 1847)

Had the Ast. Ges. Essay been entirely free from numerical blunders,
... (P.H. Cowell & A.C.D. Crommelin 1910)

Since most differential equations of celestial mechanics are of the form (10.3) it is
not surprising that the first attempts at developing special methods for these
equations were made by astronomers.

For his extensive numerical calculations concerning the aurora borealis (see
below), C. Störmer (1907) developed an accurate and simple method as follows:
by adding the Taylor series for $y(x_n + h)$ and $y(x_n - h)$ we obtain

$$y(x_n + h) - 2y(x_n) + y(x_n - h) = h^2 y''(x_n) + \frac{h^4}{12} y^{(4)}(x_n) + \frac{h^6}{360} y^{(6)}(x_n) + \ldots.$$

If we insert $y''(x_n)$ from the differential equation (10.3) and neglect higher terms,
we get

$$y_{n+1} - 2y_n + y_{n-1} = h^2 f_n$$

as a first simple method, which is sometimes called Störmer's or Encke's method.
For greater precision, we replace the higher derivatives of $y$ by central differences
of $f$

$$h^2 y^{(4)}(x_n) = \Delta^2 f_{n-1} - \frac{1}{12} \Delta^4 f_{n-2} + \ldots, \qquad
h^4 y^{(6)}(x_n) = \Delta^4 f_{n-2} + \ldots$$

and obtain

$$y_{n+1} - 2y_n + y_{n-1} = h^2 \Bigl( f_n + \frac{1}{12} \Delta^2 f_{n-1} - \frac{1}{240} \Delta^4 f_{n-2} + \ldots \Bigr). \tag{10.6}$$

This formula is not yet very practical, since the differences of the right hand side
contain the unknown expressions $f_{n+1}$ and $f_{n+2}$. Neglecting fifth-order
differences (i.e., putting $\Delta^4 f_{n-2} \approx \Delta^4 f_{n-4}$ and $\Delta^2 f_{n-1} = \Delta^2 f_{n-2} + \Delta^3 f_{n-3} +
\Delta^4 f_{n-3} \approx \Delta^2 f_{n-2} + \Delta^3 f_{n-3} + \Delta^4 f_{n-4}$) one gets

$$y_{n+1} - 2y_n + y_{n-1} = h^2 f_n + \frac{h^2}{12} \Bigl( \Delta^2 f_{n-2} + \Delta^3 f_{n-3} + \frac{19}{20} \Delta^4 f_{n-4} \Bigr) \tag{10.7}$$

("... a formula which is fundamental in our method ...", C. Störmer 1907).

Some years later Cowell & Crommelin (1910) used the same ideas to investigate
the motion of Halley's comet. They considered one additional term in the
series (10.6), namely

$$\frac{31}{60480} \Delta^6 f_{n-3} \approx \frac{1}{1951} \Delta^6 f_{n-3}.$$

Arbitrary orders. Integrating equation (10.3) twice we obtain

$$y(x + h) = y(x) + h y'(x) + h^2 \int_0^1 (1 - s) f\bigl(x + sh, y(x + sh)\bigr)\, ds. \tag{10.8}$$

In order to eliminate the first derivative of $y(x)$ we write the same formula with $h$
replaced by $-h$ and add the two expressions:

$$y(x + h) - 2y(x) + y(x - h) = h^2 \int_0^1 (1 - s) \Bigl( f\bigl(x + sh, y(x + sh)\bigr) + f\bigl(x - sh, y(x - sh)\bigr) \Bigr)\, ds. \tag{10.9}$$

As in the derivation of the Adams formulas (Section III.1) we replace the unknown
function $f(t, y(t))$ by the interpolation polynomial $p(t)$ of formula (1.4). This
yields the explicit method

$$y_{n+1} - 2y_n + y_{n-1} = h^2 \sum_{j=0}^{k-1} \sigma_j \nabla^j f_n \tag{10.10}$$

with coefficients $\sigma_j$ given by

$$\sigma_j = (-1)^j \int_0^1 (1 - s) \Bigl( \binom{-s}{j} + \binom{s}{j} \Bigr)\, ds. \tag{10.11}$$

See Table 10.1 for their numerical values and Exercise 2 for their computation.
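The coefficients can be generated in exact arithmetic. The sketch below is one possible approach, not necessarily the authors' procedure: it uses the generating-function identities of Exercise 2, namely that $\sum_j \sigma_j^* t^j$ is the reciprocal of $(\log(1-t)/t)^2$ and that $\sigma_j$ is the partial sum of the $\sigma_i^*$ (here $\sigma_j^*$ denotes the coefficients of the implicit methods, as in Exercise 2). The function name is hypothetical.

```python
from fractions import Fraction

def stoermer_coefficients(n):
    """Return (sigma, sigma_star) for j = 0..n as exact fractions."""
    # d_j: coefficients of (log(1-t)/t)^2 = (1 + t/2 + t^2/3 + ...)^2
    d = [sum(Fraction(1, (i + 1) * (j - i + 1)) for i in range(j + 1))
         for j in range(n + 1)]
    # sigma_star: coefficients of the reciprocal power series (d_0 = 1)
    sigma_star = [Fraction(1)]
    for j in range(1, n + 1):
        sigma_star.append(-sum(d[i] * sigma_star[j - i] for i in range(1, j + 1)))
    # sum_j sigma_j t^j = (sum_j sigma*_j t^j) / (1 - t), i.e. partial sums
    sigma, total = [], Fraction(0)
    for s in sigma_star:
        total += s
        sigma.append(total)
    return sigma, sigma_star

sigma, sigma_star = stoermer_coefficients(9)
for j, (a, b) in enumerate(zip(sigma, sigma_star)):
    print(j, a, b)
```

Working with `Fraction` avoids any rounding, so the output can be compared digit by digit with the tabulated values.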


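Table 10.1. Coefficients of the method (10.10)

  j          0    1    2      3      4        5      6            7          8               9
  sigma_j    1    0    1/12   1/12   19/240   3/40   863/12096    275/4032   33953/518400    8183/129600


Special cases of (10.10) are

$$\begin{aligned}
k = 2:&\quad y_{n+1} - 2y_n + y_{n-1} = h^2 f_n \\
k = 3:&\quad y_{n+1} - 2y_n + y_{n-1} = h^2 \Bigl( \tfrac{13}{12} f_n - \tfrac{1}{6} f_{n-1} + \tfrac{1}{12} f_{n-2} \Bigr) \\
k = 4:&\quad y_{n+1} - 2y_n + y_{n-1} = h^2 \Bigl( \tfrac{7}{6} f_n - \tfrac{5}{12} f_{n-1} + \tfrac{1}{3} f_{n-2} - \tfrac{1}{12} f_{n-3} \Bigr).
\end{aligned} \tag{10.10'}$$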

Method (10.10) with $k = 5$ is formula (10.7), the method used by Störmer (1907,
1921), and for $k = 6$ one obtains the method used by Cowell & Crommelin (1910).
The simplest of these methods ($k = 1$ or $k = 2$) has been successfully applied as
the basis of an extrapolation method (Section II.14, formula (14.32)).


Implicit Störmer Methods

The first terms of (10.6)

$$y_{n+1} - 2y_n + y_{n-1} = h^2 \Bigl( f_n + \frac{1}{12} \Delta^2 f_{n-1} \Bigr) = \frac{h^2}{12} \bigl( f_{n+1} + 10 f_n + f_{n-1} \bigr) \tag{10.12}$$

form an implicit equation for $y_{n+1}$. This can either be used in a predictor-corrector
fashion, or, as advocated by B. Numerov (1924, 1927), by solving this implicit
nonlinear equation directly for $y_{n+1}$.
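For a linear right-hand side the implicit equation (10.12) can be solved in closed form, which makes it easy to observe the fourth-order behaviour. The following sketch applies (10.12) to the test equation $y'' = -\omega^2 y$; the test problem, the exact starting value $y_1$ and the function names are assumptions of this illustration.

```python
import math

def numerov_linear(omega2, y0, y1, h, n_steps):
    """Formula (10.12) for y'' = -omega2 * y, solved for y_{n+1} at each step."""
    a = 1.0 + h * h * omega2 / 12.0          # coefficient of y_{n+1} and y_{n-1}
    b = 2.0 * (1.0 - 5.0 * h * h * omega2 / 12.0)  # coefficient of y_n
    ys = [y0, y1]
    for _ in range(n_steps - 1):
        ys.append((b * ys[-1] - a * ys[-2]) / a)
    return ys

def error_at_3(h):
    """Global error at x = 3 for y'' = -y, y(0) = 0, y'(0) = 1 (solution sin x)."""
    n = round(3.0 / h)
    ys = numerov_linear(1.0, 0.0, math.sin(h), h, n)
    return abs(ys[n] - math.sin(n * h))

e1, e2 = error_at_3(0.1), error_at_3(0.05)
print(e1, e2, e1 / e2)
```

Halving the step size should reduce the error by a factor close to $2^4 = 16$.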

To obtain more accurate formulas, analogous to the implicit Adams methods,
we use the interpolation polynomial $p^*(t)$ of (1.8), which passes through the
additional point $(x_{n+1}, f_{n+1})$. This yields the implicit method

$$y_{n+1} - 2y_n + y_{n-1} = h^2 \sum_{j=0}^{k} \sigma_j^* \nabla^j f_{n+1}, \tag{10.13}$$

where the coefficients $\sigma_j^*$ are defined by

$$\sigma_j^* = (-1)^j \int_0^1 (1 - s) \Bigl( \binom{-s+1}{j} + \binom{s+1}{j} \Bigr)\, ds \tag{10.14}$$

and are given in Table 10.2 for $j \le 9$.


Table 10.2. Coefficients of the implicit method (10.13)

  j          0    1     2      3    4        5        6             7           8                9
  sigma*_j   1    -1    1/12   0    -1/240   -1/240   -221/60480    -19/6048    -9829/3628800    -407/172800


Further methods can be derived by using the ideas of Nyström and Milne for
first order equations. With the substitutions $h \to 2h$, $2s \to s$ and $x \to x - h$ formula
(10.9) becomes

$$y(x + h) - 2y(x - h) + y(x - 3h) = h^2 \int_0^2 (2 - s) \Bigl( f\bigl(x + (s-1)h, y(x + (s-1)h)\bigr) + f\bigl(x - (s+1)h, y(x - (s+1)h)\bigr) \Bigr)\, ds. \tag{10.15}$$

If one replaces $f(t, y(t))$ by the polynomial $p(t)$ (respectively $p^*(t)$) one obtains
new classes of explicit (respectively implicit) methods.


Numerical Example

We have computed more than 120 different trajectories, an immense
task that required more than 4500 hours ... With sufficient
practice, one computes about three points (R, z) per hour.

(C. Störmer 1907)


We choose the historical problem treated by Störmer in 1907: Störmer's aim was to
confirm numerically the conjecture of Birkeland, who explained in 1896 the aurora
borealis as being produced by electrical particles emanating from the sun and dancing
in the earth's magnetic field. Suppose that an elementary magnet is situated at
the origin with its axis along the $z$-axis. The trajectory $(x(s), y(s), z(s))$ of an
electrical particle in this magnetic field then satisfies

$$\begin{aligned}
x'' &= \frac{1}{r^5} \bigl( 3yzz' - (3z^2 - r^2) y' \bigr) \\
y'' &= \frac{1}{r^5} \bigl( (3z^2 - r^2) x' - 3xzz' \bigr) \\
z'' &= \frac{1}{r^5} \bigl( 3xzy' - 3yzx' \bigr)
\end{aligned} \tag{10.16}$$

where $r^2 = x^2 + y^2 + z^2$. Introducing the polar coordinates

$$x = R\cos\varphi, \qquad y = R\sin\varphi \tag{10.17}$$

the system (10.16) becomes equivalent to

$$R'' = \Bigl( \frac{2\gamma}{R} + \frac{R}{r^3} \Bigr) \Bigl( \frac{2\gamma}{R^2} + \frac{3R^2}{r^5} - \frac{1}{r^3} \Bigr) \tag{10.18a}$$

$$z'' = \Bigl( \frac{2\gamma}{R} + \frac{R}{r^3} \Bigr) \frac{3Rz}{r^5} \tag{10.18b}$$

$$\varphi' = \Bigl( \frac{2\gamma}{R} + \frac{R}{r^3} \Bigr) \frac{1}{R} \tag{10.18c}$$


where now $r^2 = R^2 + z^2$ and $\gamma$ is some constant arising from the integration of
$\varphi''$. The two equations (10.18a,b) constitute a second order differential equation
of type (10.3), which can be solved numerically by the methods of this section.
$\varphi$ is then obtained by simple integration of (10.18c). Störmer found after long
calculations that the initial values

$$\begin{aligned}
&R_0 = 0.257453, \qquad z_0 = 0.314687, \qquad \gamma = -0.5, \\
&R_0' = \sqrt{Q_0}\cos u, \qquad z_0' = \sqrt{Q_0}\sin u, \qquad u = 5\pi/4, \\
&r_0 = \sqrt{R_0^2 + z_0^2}, \qquad Q_0 = 1 - \bigl( 2\gamma/R_0 + R_0/r_0^3 \bigr)^2
\end{aligned} \tag{10.18d}$$

produce a specially interesting solution curve approaching very closely the North
Pole. Fig. 10.1 shows 125 solution curves (in the $x, y, z$-space) with these and
neighbouring initial values to give an impression of how an aurora borealis comes
into being.

Fig. 10.2. Performance of Störmer and Adams methods
(Störmer: explicit (10.10), k = 2,...,5, and implicit (PECE), k = 2,...,5;
Adams: explicit, k = 1,...,5, and implicit (PECE), k = 0,...,5)

Fig. 10.2 compares the performance of the Störmer methods (10.10) and
(10.13) (in PECE mode) with that of the Adams methods by integrating subsystem
(10.18a,b) with initial values (10.18d) for $0 \le s \le 0.3$. The diagrams compare
the Euclidean norm in $\mathbb{R}^2$ of the error of the final solution point $(R, z)$
with the number of function evaluations fe. The step numbers used are $\{n =
50 \cdot 2^{0.3 i}\}_{i=0,1,\ldots,30} = \{50, 61, 75, 93, 114, \ldots, 25600\}$. The starting values were
computed very precisely with an explicit Runge-Kutta method and step size $h_{RK} =
h/10$. It can be observed that the Störmer methods are substantially more precise
due to the smaller error constants (compare Tables 10.1 and 10.2 with Tables 1.1
and 1.2). In addition, they have lower overhead. However, they must be implemented
carefully in order to avoid rounding errors (see below).
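The subsystem (10.18a,b) with the initial values (10.18d) is easy to set up. The sketch below is an illustration only, not the code used for Fig. 10.2: it codes the right-hand sides and advances the solution with the simplest Störmer method ($y_{n+1} - 2y_n + y_{n-1} = h^2 f_n$) in increment form. The first increment is taken from a second-order Taylor step, which is an assumption of this sketch (the text uses a Runge-Kutta method for the starting values), and all function names are hypothetical.

```python
import math

GAMMA = -0.5

def dipole_term(R, z):
    """The common factor 2*gamma/R + R/r^3 of (10.18a-c)."""
    r = math.hypot(R, z)
    return 2.0 * GAMMA / R + R / r**3

def accel(R, z):
    """Right-hand sides (R'', z'') of (10.18a,b)."""
    r = math.hypot(R, z)
    g = dipole_term(R, z)
    return (g * (2.0 * GAMMA / R**2 + 3.0 * R**2 / r**5 - 1.0 / r**3),
            g * 3.0 * R * z / r**5)

def initial_values():
    """Initial values (10.18d): returns R0, z0, R0', z0'."""
    R0, z0, u = 0.257453, 0.314687, 5.0 * math.pi / 4.0
    Q0 = 1.0 - dipole_term(R0, z0) ** 2
    return R0, z0, math.sqrt(Q0) * math.cos(u), math.sqrt(Q0) * math.sin(u)

def stoermer_simplest(h, n_steps):
    """Simplest Stoermer method, propagating the increments y_{n+1} - y_n."""
    R, z, dR, dz = initial_values()
    aR, az = accel(R, z)
    iR = h * dR + 0.5 * h * h * aR   # first increment: Taylor step (assumption)
    iz = h * dz + 0.5 * h * h * az
    for _ in range(n_steps):
        R, z = R + iR, z + iz
        aR, az = accel(R, z)
        iR, iz = iR + h * h * aR, iz + h * h * az
    return R, z

print(stoermer_simplest(0.3 / 3000, 3000))
```

A convenient consistency check of the right-hand sides is that they equal half the gradient of $Q(R, z) = 1 - (2\gamma/R + R/r^3)^2$, as differentiating this expression shows.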


General Formulation 

Our next aim is to study stability, consistency and convergence of general linear
multistep methods for (10.3). We write them in the form

$$\sum_{i=0}^{k} \alpha_i y_{n+i} = h^2 \sum_{i=0}^{k} \beta_i f(x_{n+i}, y_{n+i}). \tag{10.19}$$

The generating polynomials of the coefficients $\alpha_i$ and $\beta_i$ are again denoted by

$$\varrho(\zeta) = \sum_{i=0}^{k} \alpha_i \zeta^i, \qquad \sigma(\zeta) = \sum_{i=0}^{k} \beta_i \zeta^i. \tag{10.20}$$

If we apply method (10.19) to the initial value problem

$$y'' = f(x, y), \qquad y(x_0) = y_0, \qquad y'(x_0) = y_0' \tag{10.21}$$

it is natural to require that the starting values be consistent with both initial values,
i.e., that

y_i - Vo_Z _ 0 for ft-* o, * = 0,1,..., As — 1. (10.22) 

h 

For the stability condition of method (10.19) we consider the simple problem

$$y'' = 0, \qquad y_0 = 0, \qquad y_0' = 0.$$

Its numerical solution satisfies a linear difference equation with $\varrho(\zeta)$ as characteristic
polynomial. The same considerations as in the proof of Theorem 4.2 show
that the following stability condition is necessary for convergence.

Definition 10.1. Method (10.19) is called stable, if the generating polynomial $\varrho(\zeta)$
satisfies:

i) The roots of $\varrho(\zeta)$ lie on or within the unit circle;

ii) The multiplicity of the roots on the unit circle is at most two.

For the order conditions we introduce, similarly to formula (2.3), the linear
difference operator

$$L(y, x, h) = \varrho(E) y(x) - h^2 \sigma(E) y''(x) = \sum_{i=0}^{k} \bigl( \alpha_i y(x + ih) - h^2 \beta_i y''(x + ih) \bigr), \tag{10.23}$$

where $E$ is the shift operator. As in Definition 2.3 we now have:

Definition 10.2. Method (10.19) is consistent of order $p$ if for all sufficiently
smooth functions $y(x)$,

$$L(y, x, h) = O(h^{p+2}). \tag{10.24}$$

The following theorem is then proved similarly to Theorem 2.4.

Theorem 10.3. The multistep method (10.19) is of order $p$ if and only if the
following equivalent conditions hold:

i) $\sum_{i=0}^{k} \alpha_i = 0$, $\sum_{i=0}^{k} i\alpha_i = 0$
and $\sum_{i=0}^{k} \alpha_i i^q = q(q-1) \sum_{i=0}^{k} \beta_i i^{q-2}$ for $q = 2, \ldots, p+1$,

ii) $\varrho(e^h) - h^2 \sigma(e^h) = O(h^{p+2})$ for $h \to 0$,

iii) $\dfrac{\varrho(\zeta)}{(\log \zeta)^2} - \sigma(\zeta) = O((\zeta - 1)^p)$ for $\zeta \to 1$.

□


As for Adams methods one easily verifies that the method (10.10) is of order
$k$, and that (10.13) is of order $k + 1$.
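Condition i) of Theorem 10.3 lends itself to a mechanical check. The sketch below, with a hypothetical function name, determines the order of a method (10.19) from its coefficients in exact rational arithmetic; `alpha[i]` and `beta[i]` are the coefficients of $y_{n+i}$ and $f_{n+i}$.

```python
from fractions import Fraction

def order_of_method(alpha, beta, max_order=20):
    """Order p of (10.19) according to condition i) of Theorem 10.3.

    Returns None if the conditions fail already for p = 1.
    Coefficients should be integers or Fractions so that tests are exact.
    """
    if sum(alpha) != 0 or sum(i * a for i, a in enumerate(alpha)) != 0:
        return None
    p = 0
    for q in range(2, max_order + 2):
        lhs = sum(a * Fraction(i) ** q for i, a in enumerate(alpha))
        rhs = q * (q - 1) * sum(b * Fraction(i) ** (q - 2)
                                for i, b in enumerate(beta))
        if lhs != rhs:
            break
        p = q - 1
    return p if p >= 1 else None

F = Fraction
print(order_of_method([1, -2, 1], [0, 1, 0]))                          # simplest Stoermer method
print(order_of_method([0, 1, -2, 1], [F(1, 12), F(-1, 6), F(13, 12), 0]))  # (10.10) with k = 3
print(order_of_method([1, -2, 1], [F(1, 12), F(10, 12), F(1, 12)]))    # Numerov (10.12)
```

The three calls should report the orders 2, 3 and 4, in agreement with the statement on the orders of (10.10) and (10.13).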

The following order barriers are similar to those of Theorems 3.5 and 3.9; their
proofs are similar too (see, e.g., Dahlquist 1959, Henrici 1962):

Theorem 10.4. The order $p$ of a stable linear multistep method (10.19) satisfies

$$p \le k + 2 \quad \text{if } k \text{ is even,} \qquad p \le k + 1 \quad \text{if } k \text{ is odd.} \qquad □$$

Theorem 10.5. Stable multistep methods (10.19) of order $k + 2$ are symmetric,
i.e.,

$$\alpha_j = \alpha_{k-j}, \qquad \beta_j = \beta_{k-j} \qquad \text{for all } j. \qquad □$$


Convergence 

Theorem 10.6. Suppose that method (10.19) is stable, of order $p$, and that the
starting values satisfy

$$y(x_j) - y_j = O(h^{p+1}) \qquad \text{for } j = 0, 1, \ldots, k - 1. \tag{10.25}$$

Then we have convergence of order $p$, i.e.,

$$\| y(x_n) - y_n \| \le C h^p \qquad \text{for } 0 \le hn \le \text{Const}.$$

Proof. It is possible to develop a theory analogous to that of Sections III.2 - III.4.
This is due to Dahlquist (1959) and can also be found in the book of Henrici (1962).
We prefer to rewrite (10.19) in a one-step formulation of the form (8.4) and to
apply directly the results of Sections III.8 and III.9 (see Example 8.6). In order
to achieve this goal, we could put $u_n = (y_{n+k-1}, \ldots, y_n)^T$, which seems to be a
natural choice. But then the corresponding matrix $S$ does not satisfy the stability
condition of Definition 8.8 because of the double roots of modulus 1. To overcome
this difficulty we separate these roots. We split the characteristic polynomial $\varrho(\zeta)$
into

$$\varrho(\zeta) = \varrho_1(\zeta) \cdot \varrho_2(\zeta) \tag{10.26}$$

such that each polynomial ($l + m = k$)

$$\varrho_1(\zeta) = \sum_{i=0}^{l} \gamma_i \zeta^i, \qquad \varrho_2(\zeta) = \sum_{i=0}^{m} \kappa_i \zeta^i \tag{10.27}$$

has only simple roots of modulus 1. Without loss of generality we assume in the
sequel that $m \ge l$ and $\alpha_k = \gamma_l = \kappa_m = 1$. Using the shift operator $E$, method
(10.19) can be written as

$$\varrho(E) y_n = h^2 \sigma(E) f_n.$$

The main idea is to introduce $\varrho_2(E) y_n$ as a new variable, say $h v_n$, so that the
multistep formula becomes equivalent to the system

$$\varrho_1(E) v_n = h\sigma(E) f_n, \qquad \varrho_2(E) y_n = h v_n. \tag{10.28}$$

In this formula $\psi$ is written as a function of $x_n$, $(y_{n+k-1}, \ldots, y_n)$ and $h$. But
the second relation of (10.28) shows that each value $y_{n+k-1}, \ldots, y_{n+m}$ can be
expressed as a linear combination of the elements of $u_n$. Therefore $\psi$ is in fact a
function of $(x_n, u_n, h)$.

Formula (10.29a) defines our forward step procedure. The corresponding starting
procedure is

$$\varphi(h) = (v_{l-1}, \ldots, v_0, y_{m-1}, \ldots, y_0)^T \tag{10.29b}$$

which, by (10.28), is uniquely determined by $(y_{k-1}, \ldots, y_0)^T$. As correct value
function we have

$$z(x, h) = \Bigl( \frac{1}{h} \varrho_2(E) y(x + (l-1)h), \ldots, \frac{1}{h} \varrho_2(E) y(x),\ y(x + (m-1)h), \ldots, y(x) \Bigr)^T. \tag{10.29c}$$

By our choice of $\varrho_1(\zeta)$ and $\varrho_2(\zeta)$ (both have only simple roots of modulus 1) the
matrices $G$ and $K$ are power bounded. Therefore $S$ is also power bounded and
method (10.29) is stable in the sense of Definition 8.8.

We now verify the conditions of Definition 8.10 and for this start with the error
in the initial values

$$d_0 = z(x_0, h) - \varphi(h).$$

The first $l$ components of this vector are

$$\frac{1}{h} \varrho_2(E) y(x_j) - v_j = \frac{1}{h} \sum_{i=0}^{m} \kappa_i \bigl( y(x_{i+j}) - y_{i+j} \bigr), \qquad j = 0, 1, \ldots, l - 1$$

and the last $m$ components are just

$$y(x_j) - y_j, \qquad j = 0, \ldots, m - 1.$$

Thus hypothesis (10.25) ensures that $d_0 = O(h^p)$. Consider next the local error at
$x_n$,

$$d_{n+1} = z(x_n + h, h) - S z(x_n, h) - h\Phi(x_n, z(x_n, h), h).$$

All components of $d_{n+1}$ vanish except the first, which equals

$$d_{n+1}^{(1)} = \frac{1}{h} \varrho(E) y(x_n) - h\psi(x_n, z(x_n, h), h).$$

Using formula (10.31), an application of the mean value theorem yields

$$d_{n+1}^{(1)} = \frac{1}{h} L(y, x_n, h) + h^2 \beta_k \frac{\partial f}{\partial y}(x_{n+k}, \eta) \cdot d_{n+1}^{(1)} \tag{10.32}$$

with $\eta$ as in Lemma 2.2. We therefore have

$$d_{n+1} = O(h^{p+1}) \qquad \text{since} \qquad L(y, x_n, h) = O(h^{p+2}).$$

Finally Theorem 8.13 yields the stated convergence result. □


Asymptotic Formula for the Global Error

Assume that the method (10.19) is stable and consistent of order $p$. The local
truncation error of (10.29) is then given by

$$d_{n+1} = e_1 h^{p+1} C_{p+2}\, y^{(p+2)}(x_n) + O(h^{p+2}) \tag{10.33}$$

with

$$C_{p+2} = \frac{1}{(p+2)!} \sum_{i=0}^{k} \bigl( \alpha_i i^{p+2} - (p+2)(p+1) \beta_i i^p \bigr).$$

Formula (10.33) can be verified by developing $L(y, x_n, h)$ into a Taylor series in
(10.32). An application of Theorem 9.1 (if 1 is the only root of modulus 1 of $\varrho(\zeta)$)
or of Theorem 9.2 shows that the global error of method (10.29) is of the form

$$u_h(x) - z(x, h) = e(x) h^p + O(h^{p+1})$$

where $e(x)$ is the solution of

$$e'(x) = E\, \frac{\partial \Phi}{\partial u}(x, z(x, 0), 0)\, e(x) - E e_1 \cdot C_{p+2}\, y^{(p+2)}(x). \tag{10.34}$$

Here $E$ is the matrix defined in (8.12). Since no $h^p$-term is present in the local
error (10.33), it follows from (9.16) that $e(x) = E e(x)$. Therefore (see Exercise
4a) this function can be written as

$$e(x) = \begin{pmatrix} \gamma(x) \mathbb{1} \\ \kappa(x) \mathbb{1} \end{pmatrix}.$$

A straightforward calculation of $\frac{\partial \Phi}{\partial u}(x, z(x, 0), 0)$ and $E e_1$ (for details see
Exercise 4) shows that (10.34) becomes equivalent to the system

$$\gamma'(x) = \frac{\sigma(1)}{\varrho_1'(1)} \frac{\partial f}{\partial y}(x, y(x))\, \kappa(x) - \frac{C_{p+2}}{\varrho_1'(1)}\, y^{(p+2)}(x) \tag{10.35a}$$

$$\kappa'(x) = \frac{1}{\varrho_2'(1)}\, \gamma(x). \tag{10.35b}$$

Differentiating (10.35b) and inserting $\gamma'(x)$ from (10.35a), we finally obtain

$$\kappa''(x) = \frac{\partial f}{\partial y}(x, y(x))\, \kappa(x) - C y^{(p+2)}(x) \tag{10.36}$$

with

$$C = \frac{C_{p+2}}{\sigma(1)}. \tag{10.37}$$

Here we have used the relation $\sigma(1) = \varrho_1'(1) \cdot \varrho_2'(1)$, which is an immediate
consequence of (10.26), and the assumption that the order of the method is at least 1.
The constant $C$ in (10.37) is called the error constant of method (10.19). It plays
the same role as (2.13) for first order equations.

Since the last component of the vector $u_n$ is $y_n$ we have the desired result

$$y_n - y(x_n) = \kappa(x_n) h^p + O(h^{p+1})$$

with $\kappa(x)$ satisfying (10.36). Further terms in the asymptotic expansion of the
global error can also be obtained by specializing the results of Section III.9.


Rounding Errors 

A direct implementation of Störmer's methods, for which (10.19) specializes to

$$y_{n+1} - 2y_n + y_{n-1} = h^2 \sum_{i=0}^{k} \beta_i f_{n+i-k+1}, \tag{10.38}$$

by storing the $y$-values $y_0, y_1, \ldots, y_{k-1}$ and computing successively the values
$y_k, y_{k+1}, \ldots$ with the help of (10.38) leads to numerical instabilities for small $h$.
This instability is caused by the double root of $\varrho(\zeta)$ on the unit circle. It can be
observed numerically in Fig. 10.3, where the left picture is a zoom of Fig. 10.2,
while the right image contains the results of a code implementing (10.38) directly.



In order to obtain the stabilized version of the algorithm, we apply the following
two ideas:

a) Split, as in (10.26), the polynomial $\varrho(\zeta)$ as $(\zeta - 1)(\zeta - 1)$. Then (10.28) leads
to $h v_n = y_{n+1} - y_n$ and (10.38) becomes the mathematically equivalent
formulation

$$v_n - v_{n-1} = h \sum_{i=0}^{k} \beta_i f_{n+i-k+1}, \qquad y_{n+1} - y_n = h v_n. \tag{10.38'}$$

Here the corresponding matrix $S$ of (10.30) is stable.

b) Avoid the use of $v_n = (y_{n+1} - y_n)/h$ for the computation of the starting
values $v_0, v_1, \ldots, v_{k-2}$, since the difference is a numerically unstable operation.
Instead, add up the increments of the Runge-Kutta method, which you use for
the computation of the starting values, directly.

These two ideas together then produce the "stabilized" results in Fig. 10.3 and
Fig. 10.2.
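The two formulations can be put side by side for the simplest Störmer method, where the right-hand side of (10.38) reduces to $h^2 f_n$. The sketch below is illustrative only (scalar problem, hypothetical function names): the stabilized variant propagates the increment $h v_n = y_{n+1} - y_n$ and receives the first increment directly instead of forming a difference of $y$-values.

```python
import math

def stoermer_direct(f, y0, y1, h, n_steps):
    """Direct form: y_{n+1} = 2 y_n - y_{n-1} + h^2 f(y_n)."""
    y_prev, y = y0, y1
    for _ in range(n_steps - 1):
        y_prev, y = y, 2.0 * y - y_prev + h * h * f(y)
    return y

def stoermer_stabilized(f, y0, hv0, h, n_steps):
    """Form (10.38'): update the increment h*v_n, then add it to y_n."""
    y, hv = y0 + hv0, hv0
    for _ in range(n_steps - 1):
        hv += h * h * f(y)   # h*v_n = h*v_{n-1} + h^2 f_n
        y += hv              # y_{n+1} = y_n + h*v_n
    return y

f = lambda y: -y                      # test problem y'' = -y, solution sin x
h, n = 1.0e-3, 1000
increment = math.sin(h)               # y_1 - y_0, supplied directly
print(stoermer_direct(f, 0.0, increment, h, n),
      stoermer_stabilized(f, 0.0, increment, h, n),
      math.sin(n * h))
```

Both variants are mathematically equivalent; they differ only in how rounding errors enter, which becomes visible for very small step sizes and long integrations.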


Exercises 


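1. Compute the solution of Störmer's problem (10.18) with one of the methods of
this section.

2. a) Show that the generating functions of the coefficients $\sigma_j$ and $\sigma_j^*$ (defined
in (10.11) and (10.14))

$$S(t) = \sum_{j=0}^{\infty} \sigma_j t^j, \qquad S^*(t) = \sum_{j=0}^{\infty} \sigma_j^* t^j$$

satisfy

$$S(t) = \Bigl( \frac{t}{\log(1 - t)} \Bigr)^2 \frac{1}{1 - t}, \qquad S^*(t) = \Bigl( \frac{t}{\log(1 - t)} \Bigr)^2.$$

b) Compute the coefficients $d_j$ of

$$\sum_{j=0}^{\infty} d_j t^j = \Bigl( \frac{\log(1 - t)}{t} \Bigr)^2 = \Bigl( 1 + \frac{t}{2} + \frac{t^2}{3} + \frac{t^3}{4} + \ldots \Bigr)^2$$

and derive a recurrence relation for the $\sigma_j$ and $\sigma_j^*$.

c) Prove that $\sigma_j^* = \sigma_j - \sigma_{j-1}$.

3. Let $\varrho(\zeta)$ be a polynomial of degree $k$ which has 1 as root of multiplicity 2.
Then there exists a unique $\sigma(\zeta)$ such that the corresponding method is of order
$k + 1$.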


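4. Consider the method (10.29) and, for simplicity, assume the differential
equation to be a scalar one.

a) For any vector $w$ in $\mathbb{R}^k$ the image vector $Ew$, with $E$ given by (8.12),
satisfies

$$Ew = \begin{pmatrix} \gamma \mathbb{1} \\ \kappa \mathbb{1} \end{pmatrix}$$

where $\gamma$, $\kappa$ are real numbers and $\mathbb{1}$ is the vector with all elements equal to
1. The dimensions of $\gamma\mathbb{1}$ and $\kappa\mathbb{1}$ are $l$ and $m$, respectively.

b) Verify that for $e_1 = (1, 0, \ldots, 0)^T$

$$E e_1 = \begin{pmatrix} (1/\varrho_1'(1))\, \mathbb{1} \\ 0 \end{pmatrix}.$$

c) Show that

$$E\, \frac{\partial \Phi}{\partial u}(x, z(x, 0), 0) \begin{pmatrix} \gamma \mathbb{1} \\ \kappa \mathbb{1} \end{pmatrix} = \begin{pmatrix} (\sigma(1)/\varrho_1'(1))\, (\partial f/\partial y)(x, y(x))\, \kappa\, \mathbb{1} \\ (1/\varrho_2'(1))\, \gamma\, \mathbb{1} \end{pmatrix}.$$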

Hint. With $Y_n = (y_{n+k-1}, \ldots, y_n)^T$ the formula (10.31) expresses $\psi$ as
a function of $(x_n, Y_n, h)$. The second formula of (10.28) relates $Y_n$ and
$u_n$ as

KY n = Lu n + 0(h) where KA = L 

and $K$ is invertible. Use the chain rule for the computation of $\partial \psi/\partial u$.
See also Exercise 2 of Section III.4 and Exercise 1 of Section III.9.



5. Compute the error constant (10.37) for the methods (10.10) and (10.13).
Result. $\sigma_k$ and $\sigma_{k+1}^*$, respectively.


... but the software is in various states of development from experimental
(a euphemism for badly written) to what we might call

(C.W. Gear, in Aiken 1985)


Several Fortran codes have been developed for our numerical computations. Those 
of the first edition have been improved and several new options have been included, 
e.g., automatic choice of initial step size, stiffness detection, dense output. We have 
seen many of the ideas, which are incorporated in these codes, in the programs of 
P. Deuflhard, A.C. Hindmarsh and L.F. Shampine. 

Experiences with all of our codes are welcome. The programs can be obtained
from the authors' homepage (http://www.unige.ch/~hairer).

Address: Section de Mathématiques, Case postale 240,
CH-1211 Genève 24, Switzerland

E-mail: Ernst.Hairer@math.unige.ch  Gerhard.Wanner@math.unige.ch


Driver for the Code DOPRI5 


The driver given here is for the differential equation (II.0.1) with initial values and
$x_{end}$ given in (II.0.2). This is the problem AREN of Section II.10. The subroutine
FAREN ("F for AREN") computes the right-hand side of this differential equation.
The subroutine SOLOUT ("Solution out"), which is called by DOPRI5 after every
successful step, and the dense output routine CONTD5 are used to print the solution
at equidistant points. The (optional) common block STATD5 gives statistical
information after the call to DOPRI5. The common blocks COD5R and COD5I transfer
the necessary information to CONTD5.


      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (NDGL=4,LWORK=8*NDGL+10,LIWORK=10)
      PARAMETER (NRDENS=2,LRCONT=5*NRDENS+2,LICONT=NRDENS+1)
      DIMENSION Y(NDGL),WORK(LWORK),IWORK(LIWORK)
      COMMON/STATD5/NFCN,NSTEP,NACCPT,NREJCT
      COMMON /COD5R/RCONT(LRCONT)
      COMMON /COD5I/ICONT(LICONT)
      EXTERNAL FAREN,SOLOUT
C --- DIMENSION OF THE SYSTEM
      N=NDGL
C --- OUTPUT ROUTINE (AND DENSE OUTPUT) IS USED DURING INTEGRATION
      IOUT=2
C --- INITIAL VALUES AND ENDPOINT OF INTEGRATION
      X=0.0D0
      Y(1)=0.994D0
      Y(2)=0.0D0
      Y(3)=0.0D0
      Y(4)=-2.00158510637908252240537862224D0
      XEND=17.0652165601579625588917206249D0
C --- REQUIRED (RELATIVE AND ABSOLUTE) TOLERANCE
      ITOL=0
      RTOL=1.0D-7
      ATOL=RTOL
C --- DEFAULT VALUES FOR PARAMETERS
      DO 10 I=1,10
      IWORK(I)=0
  10  WORK(I)=0.D0
C --- DENSE OUTPUT IS USED FOR THE TWO POSITION COORDINATES 1 AND 2
      IWORK(5)=NRDENS
      ICONT(2)=1
      ICONT(3)=2
C --- CALL OF THE SUBROUTINE DOPRI5
      CALL DOPRI5(N,FAREN,X,Y,XEND,
     +            RTOL,ATOL,ITOL,
     +            SOLOUT,IOUT,
     +            WORK,LWORK,IWORK,LIWORK,LRCONT,LICONT,IDID)
C --- PRINT FINAL SOLUTION
      WRITE (6,99) Y(1),Y(2)
  99  FORMAT(1X,'X = XEND     Y =',2E18.10)
C --- PRINT STATISTICS
      WRITE (6,91) RTOL,NFCN,NSTEP,NACCPT,NREJCT
  91  FORMAT('     tol=',D8.2,'   fcn=',I5,' step=',I4,
     +       ' accpt=',I4,' rejct=',I3)
      STOP
      END

C 

      SUBROUTINE SOLOUT (NR,XOLD,X,Y,N,IRTRN)
C --- PRINTS SOLUTION AT EQUIDISTANT OUTPUT-POINTS BY USING "CONTD5"
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N)
      COMMON /INTERN/XOUT
      IF (NR.EQ.1) THEN
         WRITE (6,99) X,Y(1),Y(2),NR-1
         XOUT=X+2.0D0
      ELSE
  10     CONTINUE
         IF (X.GE.XOUT) THEN
            WRITE (6,99) XOUT,CONTD5(1,XOUT),CONTD5(2,XOUT),NR-1
            XOUT=XOUT+2.0D0
            GOTO 10
         END IF
      END IF
  99  FORMAT(1X,'X =',F6.2,'    Y =',2E18.10,'    NSTEP =',I4)
      RETURN
      END
C 

      SUBROUTINE FAREN(N,X,Y,F)
C --- ARENSTORF ORBIT
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),F(N)
      AMU=0.012277471D0
      AMUP=1.D0-AMU
      F(1)=Y(3)
      F(2)=Y(4)
      R1=(Y(1)+AMU)**2+Y(2)**2
      R1=R1*SQRT(R1)
      R2=(Y(1)-AMUP)**2+Y(2)**2
      R2=R2*SQRT(R2)
      F(3)=Y(1)+2*Y(4)-AMUP*(Y(1)+AMU)/R1-AMU*(Y(1)-AMUP)/R2
      F(4)=Y(2)-2*Y(3)-AMUP*Y(2)/R1-AMU*Y(2)/R2
      RETURN
      END


The result, obtained on an Apollo workstation, is the following: 


 X =  0.00    Y =  0.9940000000E+00  0.0000000000E+00    NSTEP =   0
 X =  2.00    Y = -0.5798781411E+00  0.6090775251E+00    NSTEP =  60
 X =  4.00    Y = -0.1983335270E+00  0.1137638086E+01    NSTEP =  73
 X =  6.00    Y = -0.4735743943E+00  0.2239068118E+00    NSTEP =  91
 X =  8.00    Y = -0.1174553350E+01 -0.2759466982E+00    NSTEP = 110
 X = 10.00    Y = -0.8398073466E+00  0.4468302268E+00    NSTEP = 122
 X = 12.00    Y =  0.1314712468E-01 -0.8385751499E+00    NSTEP = 145
 X = 14.00    Y = -0.6031129504E+00 -0.9912598031E+00    NSTEP = 159
 X = 16.00    Y =  0.2427110999E+00 -0.3899948833E+00    NSTEP = 177
 X = XEND     Y =  0.9940021016E+00  0.8911185978E-05
     tol=0.10E-06   fcn= 1442 step= 240 accpt= 216 rejct=  22




Subroutine DOPRI5 


Explicit Runge-Kutta code based on the method of Dormand & Prince (see Table 5.2
of Section II.5). It is provided with the step control algorithm of Section II.4
and the dense output of Section II.6. 
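
The step size control interacts with three of the tuning parameters listed in the DOPRI5 header: the safety factor WORK(2) and the bounds WORK(3), WORK(4) on the ratio HNEW/HOLD. The following is a minimal sketch of that interaction, assuming the classical controller of Section II.4 with the stabilization switched off (BETA=0) and the default values 0.9, 0.2 and 10. The names STPDEM and HNEWD5 and the sample error values are illustrative only; they do not occur in DOPRI5, whose actual implementation also contains the BETA term.

```fortran
C --- SKETCH (NOT PART OF DOPRI5): STEP SIZE PREDICTION WITH BETA=0
      PROGRAM STPDEM
      IMPLICIT REAL*8 (A-H,O-Z)
      H=0.1D0
C --- ERR IS THE SCALED ERROR ESTIMATE (ERR.LE.1 MEANS ACCEPTED)
      WRITE (6,*) HNEWD5(H,1.0D-3)
      WRITE (6,*) HNEWD5(H,1.0D0)
      WRITE (6,*) HNEWD5(H,1.0D+6)
      END
C
      FUNCTION HNEWD5(H,ERR)
      IMPLICIT REAL*8 (A-H,O-Z)
C --- ASSUMED DEFAULTS: SAFE=WORK(2), FACL=WORK(3), FACR=WORK(4)
      PARAMETER (SAFE=0.9D0,FACL=0.2D0,FACR=10.D0)
C --- OPTIMAL FACTOR FOR AN ERROR ESTIMATOR OF ORDER 4 (EXPONENT 1/5)
      FAC=SAFE*(1.D0/ERR)**0.2D0
C --- ENFORCE  FACL <= HNEW/H <= FACR
      HNEWD5=H*MIN(FACR,MAX(FACL,FAC))
      RETURN
      END
```

For the third sample value the unconstrained factor lies below 0.2, so the lower bound WORK(3) determines the new step size. ERR must be positive in this sketch.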


      SUBROUTINE DOPRI5(N,FCN,X,Y,XEND,
     +                  RTOL,ATOL,ITOL,
     +                  SOLOUT,IOUT,
     +                  WORK,LWORK,IWORK,LIWORK,LRCONT,LICONT,IDID)
C ----------------------------------------------------------
C     NUMERICAL SOLUTION OF A SYSTEM OF FIRST ORDER
C     ORDINARY DIFFERENTIAL EQUATIONS  Y'=F(X,Y).
C     THIS IS AN EXPLICIT RUNGE-KUTTA METHOD OF ORDER (4)5
C     DUE TO DORMAND & PRINCE (WITH STEPSIZE CONTROL AND
C     DENSE OUTPUT).
C
C     AUTHORS: E. HAIRER AND G. WANNER
C              UNIVERSITE DE GENEVE, DEPT. DE MATHEMATIQUES
C              CH-1211 GENEVE 24, SWITZERLAND
C              E-MAIL:  HAIRER@UNI2A.UNIGE.CH,  WANNER@UNI2A.UNIGE.CH
C
C     THIS CODE IS DESCRIBED IN:
C         E. HAIRER, S.P. NORSETT AND G. WANNER, SOLVING ORDINARY
C         DIFFERENTIAL EQUATIONS I. NONSTIFF PROBLEMS. 2ND EDITION.
C         SPRINGER SERIES IN COMPUTATIONAL MATHEMATICS,
C         SPRINGER-VERLAG (1993)
C
C     VERSION OF OCTOBER 3, 1991

C 

C     INPUT PARAMETERS
C     ----------------
C     N           DIMENSION OF THE SYSTEM
C
C     FCN         NAME (EXTERNAL) OF SUBROUTINE COMPUTING THE
C                 VALUE OF F(X,Y):
C                    SUBROUTINE FCN(N,X,Y,F)
C                    REAL*8 X,Y(N),F(N)
C                    F(1)=...   ETC.
C
C     X           INITIAL X-VALUE
C
C     Y(N)        INITIAL VALUES FOR Y
C
C     XEND        FINAL X-VALUE (XEND-X MAY BE POSITIVE OR NEGATIVE)
C
C     RTOL,ATOL   RELATIVE AND ABSOLUTE ERROR TOLERANCES. THEY
C                 CAN BE BOTH SCALARS OR ELSE BOTH VECTORS OF LENGTH N.
C
C     ITOL        SWITCH FOR RTOL AND ATOL:
C                   ITOL=0: BOTH RTOL AND ATOL ARE SCALARS.
C                     THE CODE KEEPS, ROUGHLY, THE LOCAL ERROR OF
C                     Y(I) BELOW RTOL*ABS(Y(I))+ATOL
C                   ITOL=1: BOTH RTOL AND ATOL ARE VECTORS.
C                     THE CODE KEEPS THE LOCAL ERROR OF Y(I) BELOW
C                     RTOL(I)*ABS(Y(I))+ATOL(I).
C
C     SOLOUT      NAME (EXTERNAL) OF SUBROUTINE PROVIDING THE
C                 NUMERICAL SOLUTION DURING INTEGRATION.
C                 IF IOUT.GE.1, IT IS CALLED AFTER EVERY SUCCESSFUL STEP.
C                 SUPPLY A DUMMY SUBROUTINE IF IOUT=0.
C                 IT MUST HAVE THE FORM
C                    SUBROUTINE SOLOUT (NR,XOLD,X,Y,N,IRTRN)
C                    REAL*8 X,Y(N)
C                    ....
C                 SOLOUT FURNISHES THE SOLUTION "Y" AT THE NR-TH
C                    GRID-POINT "X" (THEREBY THE INITIAL VALUE IS
C                    THE FIRST GRID-POINT).
C                 "XOLD" IS THE PRECEDING GRID-POINT.
C                 "IRTRN" SERVES TO INTERRUPT THE INTEGRATION. IF IRTRN
C                    IS SET <0, DOPRI5 WILL RETURN TO THE CALLING PROGRAM.
C
C          -----  CONTINUOUS OUTPUT: -----
C                 DURING CALLS TO "SOLOUT", A CONTINUOUS SOLUTION
C                 FOR THE INTERVAL [XOLD,X] IS AVAILABLE THROUGH
C                 THE FUNCTION
C                        >>>   CONTD5(I,S)   <<<
C                 WHICH PROVIDES AN APPROXIMATION TO THE I-TH
C                 COMPONENT OF THE SOLUTION AT THE POINT S. THE VALUE
C                 S SHOULD LIE IN THE INTERVAL [XOLD,X].
C
C     IOUT        SWITCH FOR CALLING THE SUBROUTINE SOLOUT:
C                    IOUT=0: SUBROUTINE IS NEVER CALLED
C                    IOUT=1: SUBROUTINE IS USED FOR OUTPUT.
C                    IOUT=2: DENSE OUTPUT IS PERFORMED IN SOLOUT
C                            (IN THIS CASE IWORK(5) MUST BE SPECIFIED)

C 

C     WORK        ARRAY OF WORKING SPACE OF LENGTH "LWORK".
C                 "LWORK" MUST BE AT LEAST  8*N+10
C
C     LWORK       DECLARED LENGTH OF ARRAY "WORK".
C
C     IWORK       INTEGER WORKING SPACE OF LENGTH "LIWORK".
C                 IWORK(1),...,IWORK(5) SERVE AS PARAMETERS
C                 FOR THE CODE. FOR STANDARD USE, SET THEM
C                 TO ZERO BEFORE CALLING.
C                 "LIWORK" MUST BE AT LEAST 10 .
C
C     LIWORK      DECLARED LENGTH OF ARRAY "IWORK".
C
C     LRCONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /COD5R/RCONT(LRCONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LRCONT" MUST BE AT LEAST
C                              5 * NRDENS + 2
C                 WHERE  NRDENS=IWORK(5) (SEE BELOW).
C
C     LICONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /COD5I/ICONT(LICONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LICONT" MUST BE AT LEAST
C                              NRDENS + 1
C                 THESE COMMON BLOCKS ARE USED FOR STORING THE COEFFICIENTS
C                 OF THE CONTINUOUS SOLUTION AND MAKE THE CALLING LIST FOR
C                 THE FUNCTION "CONTD5" AS SIMPLE AS POSSIBLE.
C
C ----------------------------------------------------------------------
C
C     SOPHISTICATED SETTING OF PARAMETERS
C     -----------------------------------
C              SEVERAL PARAMETERS (WORK(1),...,IWORK(1),...) ALLOW
C              TO ADAPT THE CODE TO THE PROBLEM AND TO THE NEEDS OF
C              THE USER. FOR ZERO INPUT, THE CODE CHOOSES DEFAULT VALUES.
C
C     WORK(1)   UROUND, THE ROUNDING UNIT, DEFAULT 2.3D-16.
C
C     WORK(2)   THE SAFETY FACTOR IN STEP SIZE PREDICTION,
C               DEFAULT 0.9D0.
C
C     WORK(3), WORK(4)   PARAMETERS FOR STEP SIZE SELECTION
C               THE NEW STEP SIZE IS CHOSEN SUBJECT TO THE RESTRICTION
C                  WORK(3) <= HNEW/HOLD <= WORK(4)
C               DEFAULT VALUES: WORK(3)=0.2D0, WORK(4)=10.D0
C
C     WORK(5)   IS THE "BETA" FOR STABILIZED STEP SIZE CONTROL
C               (SEE SECTION IV.2). LARGER VALUES OF BETA ( <= 0.1 )
C               MAKE THE STEP SIZE CONTROL MORE STABLE. DOPRI5 NEEDS
C               A LARGER BETA THAN HIGHAM & HALL. NEGATIVE WORK(5)
C               PROVOKE BETA=0.
C               DEFAULT 0.04D0.
C
C     WORK(6)   MAXIMAL STEP SIZE, DEFAULT XEND-X.
C
C     WORK(7)   INITIAL STEP SIZE, FOR WORK(7)=0.D0 AN INITIAL GUESS
C               IS COMPUTED WITH HELP OF THE FUNCTION HINIT
C
C     IWORK(1)  THIS IS THE MAXIMAL NUMBER OF ALLOWED STEPS.
C               THE DEFAULT VALUE (FOR IWORK(1)=0) IS 100000.
C
C     IWORK(2)  SWITCH FOR THE CHOICE OF THE COEFFICIENTS
C               IF IWORK(2).EQ.1  METHOD DOPRI5 OF DORMAND AND PRINCE
C               (TABLE 5.2 OF SECTION II.5).
C               AT THE MOMENT THIS IS THE ONLY POSSIBLE CHOICE.
C               THE DEFAULT VALUE (FOR IWORK(2)=0) IS IWORK(2)=1.
C
C     IWORK(3)  SWITCH FOR PRINTING ERROR MESSAGES
C               IF IWORK(3).LT.0 NO MESSAGES ARE BEING PRINTED
C               IF IWORK(3).GT.0 MESSAGES ARE PRINTED WITH
C               WRITE (IWORK(3),*) ...
C               DEFAULT VALUE (FOR IWORK(3)=0) IS IWORK(3)=6
C
C     IWORK(4)  TEST FOR STIFFNESS IS ACTIVATED AFTER STEP NUMBER
C               J*IWORK(4) (J INTEGER), PROVIDED IWORK(4).GT.0.
C               FOR NEGATIVE IWORK(4) THE STIFFNESS TEST IS
C               NEVER ACTIVATED; DEFAULT VALUE IS IWORK(4)=1000
C
C     IWORK(5)  = NRDENS = NUMBER OF COMPONENTS, FOR WHICH DENSE OUTPUT
C               IS REQUIRED; DEFAULT VALUE IS IWORK(5)=0;
C               FOR   0 < NRDENS < N   THE COMPONENTS (FOR WHICH DENSE
C               OUTPUT IS REQUIRED) HAVE TO BE SPECIFIED IN
C               ICONT(2),...,ICONT(NRDENS+1);
C               FOR  NRDENS=N  THIS IS DONE BY THE CODE.
C
C ----------------------------------------------------------------------
C
C     OUTPUT PARAMETERS
C     -----------------
C     X           X-VALUE FOR WHICH THE SOLUTION HAS BEEN COMPUTED
C                 (AFTER SUCCESSFUL RETURN X=XEND).
C
C     Y(N)        NUMERICAL SOLUTION AT X
C
C     H           PREDICTED STEP SIZE OF THE LAST ACCEPTED STEP
C
C     IDID        REPORTS ON SUCCESSFULNESS UPON RETURN:
C                   IDID= 1  COMPUTATION SUCCESSFUL,
C                   IDID= 2  COMPUT. SUCCESSFUL (INTERRUPTED BY SOLOUT)
C                   IDID=-1  INPUT IS NOT CONSISTENT,
C                   IDID=-2  LARGER NMAX IS NEEDED,
C                   IDID=-3  STEP SIZE BECOMES TOO SMALL.
C                   IDID=-4  PROBLEM IS PROBABLY STIFF (INTERRUPTED).


C *** *** *** *** *** *** *** *** *** *** *** *** ***
C          DECLARATIONS
C *** *** *** *** *** *** *** *** *** *** *** *** ***
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),ATOL(1),RTOL(1),WORK(LWORK),IWORK(LIWORK)
      LOGICAL ARRET
      EXTERNAL FCN,SOLOUT
      COMMON/STATD5/NFCN,NSTEP,NACCPT,NREJCT
C --- COMMON STATD5 CAN BE INSPECTED FOR STATISTICAL PURPOSES:
C ---    NFCN      NUMBER OF FUNCTION EVALUATIONS
C ---    NSTEP     NUMBER OF COMPUTED STEPS
C ---    NACCPT    NUMBER OF ACCEPTED STEPS
C ---    NREJCT    NUMBER OF REJECTED STEPS (AFTER AT LEAST ONE STEP
C                  HAS BEEN ACCEPTED)


Subroutine DOP853

Explicit Runge-Kutta code of order 8 based on the method of Dormand & Prince,
described in Section II.5. The local error estimation and the step size control are
based on embedded formulas of orders 5 and 3 (see Section II.10). This method
is provided with a dense output of order 7. In the following description we have 
omitted the parts which are identical to those for DOPRI5. 
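
To call DOP853 instead of DOPRI5, the declarations of the driver have to be enlarged. A hedged sketch, using the minimal lengths stated in the DOP853 header (11*N+10 for WORK, 8*NRDENS+2 for RCONT, NRDENS+1 for ICONT) and assuming that LIWORK=10 carries over from DOPRI5, since the excerpt does not restate it. NDGL=4 and NRDENS=2 are example values; the program name WSD853 is hypothetical.

```fortran
C --- SKETCH: WORKSPACE DECLARATIONS FOR A CALL TO DOP853
C --- (NDGL AND NRDENS ARE EXAMPLE VALUES, LIWORK=10 IS ASSUMED)
      PROGRAM WSD853
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (NDGL=4,LWORK=11*NDGL+10,LIWORK=10)
      PARAMETER (NRDENS=2,LRCONT=8*NRDENS+2,LICONT=NRDENS+1)
      DIMENSION WORK(LWORK),IWORK(LIWORK)
      COMMON /COD8R/RCONT(LRCONT)
      COMMON /COD8I/ICONT(LICONT)
C --- ZERO INPUT SELECTS THE DEFAULT PARAMETERS
      DO 10 I=1,10
         IWORK(I)=0
         WORK(I)=0.D0
   10 CONTINUE
C --- DENSE OUTPUT FOR COMPONENTS 1 AND 2
      IWORK(5)=NRDENS
      ICONT(2)=1
      ICONT(3)=2
      WRITE (6,*) 'LWORK =',LWORK,'  LRCONT =',LRCONT
      END
```

Compared with the DOPRI5 driver, only the factors 11 and 8 and the names of the common blocks change.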


      SUBROUTINE DOP853(N,FCN,X,Y,XEND,
     +                  RTOL,ATOL,ITOL,
     +                  SOLOUT,IOUT,
     +                  WORK,LWORK,IWORK,LIWORK,LRCONT,LICONT,IDID)
C ----------------------------------------------------------
C     NUMERICAL SOLUTION OF A SYSTEM OF FIRST ORDER
C     ORDINARY DIFFERENTIAL EQUATIONS  Y'=F(X,Y).
C     THIS IS AN EXPLICIT RUNGE-KUTTA METHOD OF ORDER 8(5,3)
C     DUE TO DORMAND & PRINCE (WITH STEPSIZE CONTROL AND
C     DENSE OUTPUT)

C
C     VERSION OF NOVEMBER 29, 1992

C          -----  CONTINUOUS OUTPUT: -----
C                 DURING CALLS TO "SOLOUT", A CONTINUOUS SOLUTION
C                 FOR THE INTERVAL [XOLD,X] IS AVAILABLE THROUGH
C                 THE FUNCTION
C                        >>>   CONTD8(I,S)   <<<
C                 WHICH PROVIDES AN APPROXIMATION TO THE I-TH

C
C     WORK        ARRAY OF WORKING SPACE OF LENGTH "LWORK".
C                 "LWORK" MUST BE AT LEAST  11*N+10

C
C     LRCONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /COD8R/RCONT(LRCONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LRCONT" MUST BE AT LEAST
C                              8 * NRDENS + 2
C                 WHERE  NRDENS=IWORK(5) (SEE BELOW).
C
C     LICONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /COD8I/ICONT(LICONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LICONT" MUST BE AT LEAST
C                              NRDENS + 1
C                 THESE COMMON BLOCKS ARE USED FOR STORING THE COEFFICIENTS
C                 OF THE CONTINUOUS SOLUTION AND MAKE THE CALLING LIST FOR
C                 THE FUNCTION "CONTD8" AS SIMPLE AS POSSIBLE.

C
C     WORK(3), WORK(4)   PARAMETERS FOR STEP SIZE SELECTION
C               THE NEW STEP SIZE IS CHOSEN SUBJECT TO THE RESTRICTION
C                  WORK(3) <= HNEW/HOLD <= WORK(4)
C               DEFAULT VALUES: WORK(3)=0.333D0, WORK(4)=6.D0


Subroutine ODEX

Extrapolation code for y' = f(x,y), based on the GBS algorithm (Section II.9). It
uses variable order and variable step sizes and is provided with a high-order dense 
output. Again, the missing parts in the description are identical to those of DOPRI5. 
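
The array lengths required by ODEX depend on N, on the maximal number KM of columns of the extrapolation table, and on NRDENS. As a worked example, the following sketch evaluates the four bounds quoted in the ODEX header for the illustrative choice N=4, KM=9 (the default) and NRDENS=2; the program name WSODEX is hypothetical.

```fortran
C --- SKETCH: MINIMAL ARRAY LENGTHS FOR ODEX (EXAMPLE VALUES)
      PROGRAM WSODEX
      INTEGER N,KM,NRDENS,LWORK,LIWORK,LRCONT,LICONT
C --- N=4 EQUATIONS, DEFAULT KM=9, DENSE OUTPUT FOR 2 COMPONENTS
      N=4
      KM=9
      NRDENS=2
C --- BOUNDS AS STATED IN THE HEADER OF ODEX
      LWORK=N*(KM+5)+5*KM+10+2*KM*(KM+1)*NRDENS
      LIWORK=2*KM+10+NRDENS
      LRCONT=(2*KM+5)*NRDENS+2
      LICONT=NRDENS+2
      WRITE (6,*) LWORK,LIWORK,LRCONT,LICONT
      END
```

For these values the bounds are LWORK=471, LIWORK=30, LRCONT=48 and LICONT=4. The term 2*KM*(KM+1)*NRDENS dominates, so restricting dense output to the components actually needed saves most of the storage.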


      SUBROUTINE ODEX(N,FCN,X,Y,XEND,H,
     +                RTOL,ATOL,ITOL,
     +                SOLOUT,IOUT,
     +                WORK,LWORK,IWORK,LIWORK,LRCONT,LICONT,IDID)
C ----------------------------------------------------------
C     NUMERICAL SOLUTION OF A SYSTEM OF FIRST ORDER
C     ORDINARY DIFFERENTIAL EQUATIONS  Y'=F(X,Y).
C     THIS IS AN EXTRAPOLATION-ALGORITHM (GBS), BASED ON THE
C     EXPLICIT MIDPOINT RULE (WITH STEPSIZE CONTROL,
C     ORDER SELECTION AND DENSE OUTPUT).
C
C     AUTHORS: E. HAIRER AND G. WANNER
C              UNIVERSITE DE GENEVE, DEPT. DE MATHEMATIQUES
C              CH-1211 GENEVE 24, SWITZERLAND
C              E-MAIL:  HAIRER@UNI2A.UNIGE.CH,  WANNER@UNI2A.UNIGE.CH
C              DENSE OUTPUT WRITTEN BY E. HAIRER AND A. OSTERMANN

C
C     VERSION DECEMBER 18, 1991

C
C     H           INITIAL STEP SIZE GUESS;
C                 H=1.D0/(NORM OF F'), USUALLY 1.D-1 OR 1.D-3, IS GOOD.
C                 THIS CHOICE IS NOT VERY IMPORTANT, THE CODE QUICKLY
C                 ADAPTS ITS STEP SIZE. WHEN YOU ARE NOT SURE, THEN
C                 STUDY THE CHOSEN VALUES FOR A FEW
C                 STEPS IN SUBROUTINE "SOLOUT".
C                 (IF H=0.D0, THE CODE PUTS H=1.D-4).

C
C          -----  CONTINUOUS OUTPUT (IF IOUT=2): -----
C                 DURING CALLS TO "SOLOUT", A CONTINUOUS SOLUTION
C                 FOR THE INTERVAL [XOLD,X] IS AVAILABLE THROUGH
C                 THE REAL*8 FUNCTION
C                        >>>   CONTEX(I,S)   <<<
C                 WHICH PROVIDES AN APPROXIMATION TO THE I-TH
C                 COMPONENT OF THE SOLUTION AT THE POINT S. THE VALUE
C                 S SHOULD LIE IN THE INTERVAL [XOLD,X].

C
C     WORK        ARRAY OF WORKING SPACE OF LENGTH "LWORK".
C                 SERVES AS WORKING SPACE FOR ALL VECTORS.
C                 "LWORK" MUST BE AT LEAST
C                    N*(KM+5)+5*KM+10+2*KM*(KM+1)*NRDENS
C                 WHERE NRDENS=IWORK(8) (SEE BELOW) AND
C                        KM=9                IF IWORK(2)=0
C                        KM=IWORK(2)         IF IWORK(2).GT.0
C                 WORK(1),...,WORK(10) SERVE AS PARAMETERS
C                 FOR THE CODE. FOR STANDARD USE, SET THESE
C                 PARAMETERS TO ZERO BEFORE CALLING.
C
C     IWORK       INTEGER WORKING SPACE OF LENGTH "LIWORK".
C                 "LIWORK" MUST BE AT LEAST
C                               2*KM+10+NRDENS
C                 IWORK(1),...,IWORK(9) SERVE AS PARAMETERS
C                 FOR THE CODE. FOR STANDARD USE, SET THESE
C                 PARAMETERS TO ZERO BEFORE CALLING.

C
C     LRCONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /CONTR/RCONT(LRCONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LRCONT" MUST BE AT LEAST
C                          ( 2 * KM + 5 ) * NRDENS + 2
C                 WHERE KM=IWORK(2) AND NRDENS=IWORK(8) (SEE BELOW).
C
C     LICONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /CONTI/ICONT(LICONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LICONT" MUST BE AT LEAST
C                               NRDENS + 2
C                 THESE COMMON BLOCKS ARE USED FOR STORING THE COEFFICIENTS
C                 OF THE CONTINUOUS SOLUTION AND MAKE THE CALLING LIST FOR
C                 THE FUNCTION "CONTEX" AS SIMPLE AS POSSIBLE.


C
C     WORK(2)   MAXIMAL STEP SIZE, DEFAULT XEND-X.
C
C     WORK(3)   STEP SIZE IS REDUCED BY FACTOR WORK(3), IF THE
C               STABILITY CHECK IS NEGATIVE, DEFAULT 0.5.
C
C     WORK(4), WORK(5)   PARAMETERS FOR STEP SIZE SELECTION
C               THE NEW STEP SIZE FOR THE J-TH DIAGONAL ENTRY IS
C               CHOSEN SUBJECT TO THE RESTRICTION
C                  FACMIN/WORK(5) <= HNEW(J)/HOLD <= 1/FACMIN
C               WHERE FACMIN=WORK(4)**(1/(2*J-1))
C               DEFAULT VALUES: WORK(4)=0.02D0, WORK(5)=4.D0
C
C     WORK(6), WORK(7)   PARAMETERS FOR THE ORDER SELECTION
C               STEP SIZE IS DECREASED IF    W(K-1) <= W(K)*WORK(6)
C               STEP SIZE IS INCREASED IF    W(K) <= W(K-1)*WORK(7)
C               DEFAULT VALUES: WORK(6)=0.8D0, WORK(7)=0.9D0
C
C     WORK(8), WORK(9)   SAFETY FACTORS FOR STEP CONTROL ALGORITHM
C               HNEW=H*WORK(9)*(WORK(8)*TOL/ERR)**(1/(J-1))
C               DEFAULT VALUES: WORK(8)=0.65D0,
C                        WORK(9)=0.94D0  IF "HOPE FOR CONVERGENCE"
C                        WORK(9)=0.90D0  IF "NO HOPE FOR CONVERGENCE"


C
C     IWORK(2)  THE MAXIMUM NUMBER OF COLUMNS IN THE EXTRAPOLATION
C               TABLE. THE DEFAULT VALUE (FOR IWORK(2)=0) IS 9.
C               IF IWORK(2).NE.0 THEN IWORK(2) SHOULD BE .GE.3.
C
C     IWORK(3)  SWITCH FOR THE STEP SIZE SEQUENCE (EVEN NUMBERS ONLY)
C               IF IWORK(3).EQ.1 THEN 2,4,6,8,10,12,14,16,...
C               IF IWORK(3).EQ.2 THEN 2,4,8,12,16,20,24,28,...
C               IF IWORK(3).EQ.3 THEN 2,4,6,8,12,16,24,32,...
C               IF IWORK(3).EQ.4 THEN 2,6,10,14,18,22,26,30,...
C               IF IWORK(3).EQ.5 THEN 4,8,12,16,20,24,28,32,...
C               THE DEFAULT VALUE IS IWORK(3)=1 IF IOUT.LE.1;
C               THE DEFAULT VALUE IS IWORK(3)=4 IF IOUT.GE.2.
C
C     IWORK(4)  STABILITY CHECK IS ACTIVATED AT MOST IWORK(4) TIMES IN
C               ONE LINE OF THE EXTRAP. TABLE, DEFAULT IWORK(4)=1.
C
C     IWORK(5)  STABILITY CHECK IS ACTIVATED ONLY IN THE LINES
C               1 TO IWORK(5) OF THE EXTRAP. TABLE, DEFAULT IWORK(5)=1.
C
C     IWORK(6)  IF  IWORK(6)=0  ERROR ESTIMATOR IN THE DENSE
C               OUTPUT FORMULA IS ACTIVATED. IT CAN BE SUPPRESSED
C               BY PUTTING IWORK(6)=1.
C               DEFAULT IWORK(6)=0  (IF IOUT.GE.2).
C
C     IWORK(7)  DETERMINES THE DEGREE OF INTERPOLATION FORMULA
C               MU = 2 * KAPPA - IWORK(7) + 1
C               IWORK(7) SHOULD LIE BETWEEN 1 AND 6
C               DEFAULT IWORK(7)=4  (IF IWORK(7)=0).
C
C     IWORK(8)  = NRDENS = NUMBER OF COMPONENTS, FOR WHICH DENSE OUTPUT
C               IS REQUIRED
C
C     IWORK(10),...,IWORK(NRDENS+9) INDICATE THE COMPONENTS, FOR WHICH
C               DENSE OUTPUT IS REQUIRED


C 

C IDID REPORTS ON SUCCESSFULNESS UPON RETURN: 

C IDID=1 COMPUTATION SUCCESSFUL, 

C IDID=-1 COMPUTATION UNSUCCESSFUL. 


Subroutine ODEX2 

Extrapolation code for second order differential equations y'' = f(x,y) (Section II.14).
It uses variable order and variable step sizes and is provided with a
high-order dense output. The missing parts of the description are identical to those 
of ODEX. 
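
Since ODEX2 integrates y'' = f(x,y) directly, the right-hand side receives only the positions Y and returns the second derivatives. A small sketch for the Kepler problem y_i'' = -y_i/r^3 with r^2 = y_1^2 + y_2^2, assuming that FCN has the same calling sequence FCN(N,X,Y,F) as for ODEX (the excerpt does not restate it). The names KEPDEM and FKEP are hypothetical; the main program only evaluates the right-hand side once and does not call ODEX2.

```fortran
C --- SKETCH: RIGHT-HAND SIDE Y''=F(X,Y) FOR THE KEPLER PROBLEM
      PROGRAM KEPDEM
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(2),F(2)
C --- SAMPLE POSITION ON THE UNIT CIRCLE
      Y(1)=1.D0
      Y(2)=0.D0
      CALL FKEP(2,0.D0,Y,F)
      WRITE (6,*) F(1),F(2)
      END
C
      SUBROUTINE FKEP(N,X,Y,F)
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),F(N)
C --- ONLY POSITIONS ENTER; THE VELOCITIES YP ARE KEPT BY THE CALLER
      R3=(Y(1)**2+Y(2)**2)**1.5D0
      F(1)=-Y(1)/R3
      F(2)=-Y(2)/R3
      RETURN
      END
```

In contrast to a first order formulation for DOPRI5 or ODEX, no auxiliary equations y' = v are needed; the initial velocities are passed separately in YP.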


      SUBROUTINE ODEX2(N,FCN,X,Y,YP,XEND,H,
     +                 RTOL,ATOL,ITOL,
     +                 SOLOUT,IOUT,
     +                 WORK,LWORK,IWORK,LIWORK,LRCONT,LICONT,IDID)
C ----------------------------------------------------------
C     NUMERICAL SOLUTION OF A SYSTEM OF SECOND ORDER
C     ORDINARY DIFFERENTIAL EQUATIONS  Y''=F(X,Y).
C     THIS IS AN EXTRAPOLATION-ALGORITHM, BASED ON
C     THE STOERMER RULE (WITH STEPSIZE CONTROL
C     ORDER SELECTION AND DENSE OUTPUT).

C
C     VERSION MARCH 30, 1992

C
C     Y(N)        INITIAL VALUES FOR Y
C
C     YP(N)       INITIAL VALUES FOR Y'


C 

C     ITOL        SWITCH FOR RTOL AND ATOL:
C                   ITOL=0: BOTH RTOL AND ATOL ARE SCALARS.
C                     THE CODE KEEPS, ROUGHLY, THE LOCAL ERROR OF
C                     Y(I)  BELOW RTOL*ABS(Y(I))+ATOL
C                     YP(I) BELOW RTOL*ABS(YP(I))+ATOL
C                   ITOL=1: BOTH RTOL AND ATOL ARE VECTORS.
C                     THE CODE KEEPS THE LOCAL ERROR OF
C                     Y(I)  BELOW RTOL(I)*ABS(Y(I))+ATOL(I).
C                     YP(I) BELOW RTOL(I+N)*ABS(YP(I))+ATOL(I+N).
C
C     SOLOUT      NAME (EXTERNAL) OF SUBROUTINE PROVIDING THE
C                 NUMERICAL SOLUTION DURING INTEGRATION.
C                 IF IOUT>=1, IT IS CALLED AFTER EVERY SUCCESSFUL STEP.
C                 SUPPLY A DUMMY SUBROUTINE IF IOUT=0.
C                 IT MUST HAVE THE FORM
C                    SUBROUTINE SOLOUT (NR,XOLD,X,Y,YP,N,IRTRN)
C                    REAL*8 X,Y(N),YP(N)
C                    ....
C                 SOLOUT FURNISHES THE SOLUTIONS "Y, YP" AT THE NR-TH
C                    GRID-POINT "X" (THEREBY THE INITIAL VALUE IS
C                    THE FIRST GRID-POINT).
C                 "XOLD" IS THE PRECEDING GRID-POINT.
C                 "IRTRN" SERVES TO INTERRUPT THE INTEGRATION. IF IRTRN
C                    IS SET <0, ODEX2 WILL RETURN TO THE CALLING PROGRAM.
C
C          -----  CONTINUOUS OUTPUT (IF IOUT=2): -----
C                 DURING CALLS TO "SOLOUT", A CONTINUOUS SOLUTION
C                 FOR THE INTERVAL [XOLD,X] IS AVAILABLE THROUGH
C                 THE REAL*8 FUNCTION
C                        >>>   CONTX2(I,S)   <<<
C                 WHICH PROVIDES AN APPROXIMATION TO THE I-TH
C                 COMPONENT OF THE SOLUTION AT THE POINT S. THE VALUE
C                 S SHOULD LIE IN THE INTERVAL [XOLD,X].

C
C     WORK        ARRAY OF WORKING SPACE OF LENGTH "LWORK".
C                 SERVES AS WORKING SPACE FOR ALL VECTORS.
C                 "LWORK" MUST BE AT LEAST
C                    N*(2*KM+6)+5*KM+10+KM*(2*KM+3)*NRDENS
C                 WHERE NRDENS=IWORK(8) (SEE BELOW) AND
C                        KM=9                IF IWORK(2)=0
C                        KM=IWORK(2)         IF IWORK(2).GT.0
C                 WORK(1),...,WORK(10) SERVE AS PARAMETERS
C                 FOR THE CODE. FOR STANDARD USE, SET THESE
C                 PARAMETERS TO ZERO BEFORE CALLING.

C
C     IWORK       INTEGER WORKING SPACE OF LENGTH "LIWORK".
C                 "LIWORK" MUST BE AT LEAST
C                               KM+9+NRDENS
C                 IWORK(1),...,IWORK(9) SERVE AS PARAMETERS
C                 FOR THE CODE. FOR STANDARD USE, SET THESE
C                 PARAMETERS TO ZERO BEFORE CALLING.


C 

C     LRCONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /CONTR2/RCONT(LRCONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LRCONT" MUST BE AT LEAST
C                          ( 2 * KM + 6 ) * NRDENS + 2
C                 WHERE KM=IWORK(2) AND NRDENS=IWORK(8) (SEE BELOW).
C
C     LICONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /CONTI2/ICONT(LICONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LICONT" MUST BE AT LEAST
C                               NRDENS + 2
C                 THESE COMMON BLOCKS ARE USED FOR STORING THE COEFFICIENTS
C                 OF THE CONTINUOUS SOLUTION AND MAKE THE CALLING LIST FOR
C                 THE FUNCTION "CONTX2" AS SIMPLE AS POSSIBLE.


C 

C WORK(3) STEP SIZE IS REDUCED BY FACTOR WORK(3), IF DURING THE 
C COMPUTATION OF THE EXTRAPOLATION TABLEAU DIVERGENCE 

C IS OBSERVED; DEFAULT 0.5. 


C
C     IWORK(3)  SWITCH FOR THE STEP SIZE SEQUENCE (EVEN NUMBERS ONLY)
C               IF IWORK(3).EQ.1 THEN 2,4,6,8,10,12,14,16,...
C               IF IWORK(3).EQ.2 THEN 2,4,8,12,16,20,24,28,...
C               IF IWORK(3).EQ.3 THEN 2,4,6,8,12,16,24,32,...
C               IF IWORK(3).EQ.4 THEN 2,6,10,14,18,22,26,30,...
C               THE DEFAULT VALUE IS IWORK(3)=1 IF IOUT.LE.1;
C               THE DEFAULT VALUE IS IWORK(3)=4 IF IOUT.GE.2.

C
C     IWORK(7)  DETERMINES THE DEGREE OF INTERPOLATION FORMULA
C               MU = 2 * KAPPA - IWORK(7) + 1
C               IWORK(7) SHOULD LIE BETWEEN 1 AND 8
C               DEFAULT IWORK(7)=6  (IF IWORK(7)=0).


Driver for the Code RETARD 


We consider the delay equation (II.17.14) with initial values and initial functions
given there. This is a 3-dimensional problem, but only the second component
is used with retarded argument (hence NRDENS=1). We require that the points
1,2,3,...,9,10,20 (points of discontinuity of the derivatives of the solution) are
hit exactly by the integration routine.


      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (NDGL=3,NGRID=11,LWORK=8*NDGL+11+NGRID,LIWORK=10)
      PARAMETER (NRDENS=1,LRCONT=500,LICONT=NRDENS+1)
      DIMENSION Y(NDGL),WORK(LWORK),IWORK(LIWORK)
      COMMON/STATRE/NFCN,NSTEP,NACCPT,NREJCT
      COMMON /CORER/RCONT(LRCONT)
      COMMON /COREI/ICONT(LICONT)
      EXTERNAL FCN,SOLOUT
C --- DIMENSION OF THE SYSTEM
      N=NDGL
C --- OUTPUT ROUTINE IS USED DURING INTEGRATION
      IOUT=1
C --- INITIAL VALUES AND ENDPOINT OF INTEGRATION
      X=0.0D0
      Y(1)=5.0D0
      Y(2)=0.1D0
      Y(3)=1.0D0
      XEND=40.D0
C --- REQUIRED (RELATIVE AND ABSOLUTE) TOLERANCE
      ITOL=0
      RTOL=1.0D-5
      ATOL=RTOL
C --- DEFAULT VALUES FOR PARAMETERS
      DO 10 I=1,10
      IWORK(I)=0
  10  WORK(I)=0.D0
C --- SECOND COMPONENT USES RETARDED ARGUMENT
      IWORK(5)=NRDENS
      ICONT(2)=2
C --- USE AS GRID-POINTS
      IWORK(6)=NGRID
      DO 12 I=1,NGRID-1
  12  WORK(10+I)=I
      WORK(10+NGRID)=20.D0
C --- CALL OF THE SUBROUTINE RETARD
      CALL RETARD(N,FCN,X,Y,XEND,
     +            RTOL,ATOL,ITOL,
     +            SOLOUT,IOUT,
     +            WORK,LWORK,IWORK,LIWORK,LRCONT,LICONT,IDID)
C --- PRINT FINAL SOLUTION
      WRITE (6,99) Y(1),Y(2),Y(3)
  99  FORMAT(1X,'X = XEND     Y =',3E18.10)
C --- PRINT STATISTICS
      WRITE (6,91) RTOL,NFCN,NSTEP,NACCPT,NREJCT
  91  FORMAT('     tol=',D8.2,'   fcn=',I5,' step=',I4,
     +       ' accpt=',I4,' rejct=',I3)
      STOP
      END

C 

C 

      SUBROUTINE SOLOUT (NR,XOLD,X,Y,N,IRTRN)
C --- PRINTS SOLUTION AT EQUIDISTANT OUTPUT-POINTS
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N)
      EXTERNAL PHI
      COMMON /INTERN/XOUT
      IF (NR.EQ.1) THEN
         WRITE (6,99) X,Y(1),NR-1
         XOUT=X+5.D0
      ELSE
  10     CONTINUE
         IF (X.GE.XOUT) THEN
            WRITE (6,99) X,Y(1),NR-1
            XOUT=XOUT+5.D0
            GOTO 10
         END IF
      END IF
  99  FORMAT(1X,'X =',F6.2,'    Y =',E18.10,'    NSTEP =',I4)
      RETURN
      END
C
      SUBROUTINE FCN(N,X,Y,F)
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),F(N)
      EXTERNAL PHI
      Y2L1=YLAG(2,X-1.D0,PHI)
      Y2L10=YLAG(2,X-10.D0,PHI)
      F(1)=-Y(1)*Y2L1+Y2L10
      F(2)=Y(1)*Y2L1-Y(2)
      F(3)=Y(2)-Y2L10
      RETURN
      END
C
      FUNCTION PHI(I,X)
      IMPLICIT REAL*8 (A-H,O-Z)
      IF (I.EQ.2) PHI=0.1D0
      RETURN
      END


The result, obtained on an Apollo workstation, is the following: 


 X =  0.00    Y =  0.5000000000E+01    NSTEP =   0
 X =  5.00    Y =  0.2533855892E+00    NSTEP =  18
 X = 10.00    Y =  0.3328560326E+00    NSTEP =  32
 X = 15.29    Y =  0.4539376456E+01    NSTEP =  40
 X = 20.00    Y =  0.1706635702E+00    NSTEP =  52
 X = 25.22    Y =  0.2524799457E+00    NSTEP =  62
 X = 30.48    Y =  0.5134266860E+01    NSTEP =  68
 X = 35.10    Y =  0.3610797907E+00    NSTEP =  78
 X = 40.00    Y =  0.9125544555E-01    NSTEP =  89
 X = XEND     Y =  0.9125544555E-01  0.2029882456E-01  0.5988445730E+01
     tol=0.10E-04   fcn=  586 step=  97 accpt=  89 rejct=   8


Subroutine RETARD 


Modification of the code DOPRI5 for delay differential equations (see Section II.17).
The missing parts of the description are identical to those of DOPRI5.
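
The length LRCONT determines how far back YLAG can reach, because each stored step occupies 5*NRDENS+2 entries of RCONT. A short sketch of this arithmetic for the values used in the driver above (NRDENS=1, LRCONT=500); the program name BACKST is hypothetical.

```fortran
C --- SKETCH: HOW MANY BACK STEPS FIT INTO THE COMMON BLOCK /CORER/
      PROGRAM BACKST
      INTEGER NRDENS,LRCONT,MXST
C --- VALUES OF THE DRIVER ABOVE
      NRDENS=1
      LRCONT=500
C --- EACH STORED STEP NEEDS 5*NRDENS+2 REAL*8 NUMBERS
      MXST=LRCONT/(5*NRDENS+2)
      WRITE (6,*) 'MXST =',MXST
      END
```

Integer division gives MXST=71, so the driver keeps the dense output of the last 71 steps. Whether this covers the largest delay (here 10) depends on the step sizes actually taken; if it does not, LRCONT has to be increased.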


      SUBROUTINE RETARD(N,FCN,X,Y,XEND,
     +                  RTOL,ATOL,ITOL,
     +                  SOLOUT,IOUT,
     +                  WORK,LWORK,IWORK,LIWORK,LRCONT,LICONT,IDID)
C ----------------------------------------------------------
C     NUMERICAL SOLUTION OF A SYSTEM OF FIRST ORDER DELAY
C     ORDINARY DIFFERENTIAL EQUATIONS  Y'(X)=F(X,Y(X),Y(X-A))
C     THIS CODE IS BASED ON AN EXPLICIT RUNGE-KUTTA METHOD OF
C     ORDER (4)5 DUE TO DORMAND & PRINCE (WITH STEPSIZE CONTROL
C     AND DENSE OUTPUT).

C
C     VERSION OF APRIL 24, 1992

C
C     FCN         NAME (EXTERNAL) OF SUBROUTINE COMPUTING THE RIGHT-
C                 HAND-SIDE OF THE DELAY EQUATION, E.G.,
C                    SUBROUTINE FCN(N,X,Y,F)
C                    REAL*8 X,Y(N),F(N)
C                    EXTERNAL PHI
C                    F(1)=(1.4D0-YLAG(1,X-1.D0,PHI))*Y(1)
C                    F(2)=...   ETC.
C                 FOR AN EXPLICATION OF YLAG SEE BELOW.
C                 DO NOT USE YLAG(I,X-0.D0,PHI) !
C                 THE INITIAL FUNCTION HAS TO BE SUPPLIED BY:
C                    FUNCTION PHI(I,X)
C                    REAL*8 PHI,X
C                 WHERE I IS THE COMPONENT AND X THE ARGUMENT

C
C     Y(N)        INITIAL VALUES FOR Y (MAY BE DIFFERENT FROM PHI(I,X),
C                 IN THIS CASE IT IS HIGHLY RECOMMENDED TO SET IWORK(6)
C                 AND WORK(11),..., SEE BELOW)


C 

C          -----  CONTINUOUS OUTPUT: -----
C                 DURING CALLS TO "SOLOUT" AS WELL AS TO "FCN", A
C                 CONTINUOUS SOLUTION IS AVAILABLE THROUGH THE FUNCTION
C                        >>>   YLAG(I,S,PHI)   <<<
C                 WHICH PROVIDES AN APPROXIMATION TO THE I-TH
C                 COMPONENT OF THE SOLUTION AT THE POINT S. THE VALUE S
C                 HAS TO LIE IN AN INTERVAL WHERE THE NUMERICAL SOLUTION
C                 IS ALREADY COMPUTED. IT DEPENDS ON THE SIZE OF LRCONT
C                 (SEE BELOW) HOW FAR BACK THE SOLUTION IS AVAILABLE.
C
C     IOUT        SWITCH FOR CALLING THE SUBROUTINE SOLOUT:
C                    IOUT=0: SUBROUTINE IS NEVER CALLED
C                    IOUT=1: SUBROUTINE IS USED FOR OUTPUT.
C
C     WORK        ARRAY OF WORKING SPACE OF LENGTH "LWORK".
C                 "LWORK" MUST BE AT LEAST  8*N+11+NGRID
C                 WHERE NGRID=IWORK(6)

C
C     LRCONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /CORER/RCONT(LRCONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LRCONT" MUST BE SUFFICIENTLY LARGE. IF THE DENSE
C                 OUTPUT OF MXST BACK STEPS HAS TO BE STORED, IT MUST
C                 BE AT LEAST
C                          MXST * ( 5 * NRDENS + 2 )
C                 WHERE NRDENS=IWORK(5) (SEE BELOW).
C
C     LICONT      DECLARED LENGTH OF COMMON BLOCK
C                  >>>  COMMON /COREI/ICONT(LICONT)  <<<
C                 WHICH MUST BE DECLARED IN THE CALLING PROGRAM.
C                 "LICONT" MUST BE AT LEAST
C                               NRDENS + 1
C                 THESE COMMON BLOCKS ARE USED FOR STORING THE COEFFICIENTS
C                 OF THE CONTINUOUS SOLUTION AND MAKE THE CALLING LIST FOR
C                 THE FUNCTION "CONTD5" AS SIMPLE AS POSSIBLE.

C
C     WORK(11),...,WORK(10+NGRID)   PRESCRIBED POINTS, WHICH THE
C               INTEGRATION METHOD HAS TO TAKE AS GRID-POINTS
C               X < WORK(11) < WORK(12) < ... < WORK(10+NGRID) <= XEND
C
C     IWORK(5)  = NRDENS = NUMBER OF COMPONENTS, FOR WHICH DENSE OUTPUT
C               IS REQUIRED (EITHER BY "SOLOUT" OR BY "FCN");
C               DEFAULT VALUE (FOR IWORK(5)=0) IS IWORK(5)=N;
C               FOR   0 < NRDENS < N   THE COMPONENTS (FOR WHICH DENSE
C               OUTPUT IS REQUIRED) HAVE TO BE SPECIFIED IN
C               ICONT(2),...,ICONT(NRDENS+1);
C               FOR  NRDENS=N  THIS IS DONE BY THE CODE.
C
C     IWORK(6)  = NGRID = NUMBER OF PRESCRIBED POINTS IN THE
C               INTEGRATION INTERVAL WHICH HAVE TO BE GRID-POINTS
C               IN THE INTEGRATION. USUALLY, AT THESE POINTS THE
C               SOLUTION OR ONE OF ITS DERIVATIVES HAS A DISCONTINUITY.
C               DEFINE THESE POINTS IN WORK(11),...,WORK(10+NGRID)
C               DEFAULT VALUE:  IWORK(6)=0


C
C     IDID        REPORTS ON SUCCESSFULNESS UPON RETURN:
C                   IDID= 1  COMPUTATION SUCCESSFUL,
C                   IDID= 2  COMPUT. SUCCESSFUL (INTERRUPTED BY SOLOUT)
C                   IDID=-1  INPUT IS NOT CONSISTENT,
C                   IDID=-2  LARGER NMAX IS NEEDED,
C                   IDID=-3  STEP SIZE BECOMES TOO SMALL.
C                   IDID=-4  PROBLEM IS PROBABLY STIFF (INTERRUPTED).
C                   IDID=-5  COMPUT. INTERRUPTED BY YLAG



To Evi and Myriam



From the Preface to the First Edition 


“Whatever regrets may be, we have done our best.”
(Sir Ernest Shackleton, turning back on 9 January 1909 at 88° 23’ South.)


Brahms struggled for 20 years to write his first symphony. Compared to this, the 
10 years we have been working on these two volumes may even appear short. 

This second volume treats stiff differential equations and differential-algebraic
equations. It contains three chapters: Chapter IV on one-step (Runge-Kutta) methods
for stiff problems, Chapter V on multistep methods for stiff problems, and
Chapter VI on singular perturbation and differential-algebraic equations.

Each chapter is divided into sections. Usually the first sections of a chapter 
are of an introductory nature, explain numerical phenomena and exhibit numerical 
results. Investigations of a more theoretical nature are presented in the later sections 
of each chapter. 

As in Volume I, the formulas, theorems, tables and figures are numbered consecutively
in each section and indicate, in addition, the section number. In cross
references to other chapters the (Latin) chapter number is put first. References to the
bibliography are again by “author” plus “year” in parentheses. The bibliography
again contains only those papers which are discussed in the text and is in no way
meant to be complete.

It is a pleasure to thank J. Butcher, G. Dahlquist, and S.P. Nørsett (coauthor of
Volume I) for their interest in the subject and for the numerous discussions we had 
with them which greatly inspired our work. Special thanks go to the participants 
of our seminar in Geneva, in particular Ch. Lubich, A. Ostermann and M. Roche, 
where all the subjects of this book have been presented and discussed over the 
years. Much help in preparing the manuscript was given by J. Steinig, Ch. Lubich 
and A. Ostermann who read and re-read the whole text and made innumerable 
corrections and suggestions for improvement. We express our sincere gratitude to 
them. Many people have seen particular sections and made invaluable suggestions 
and remarks: M. Crouzeix, P. Deuflhard, K. Gustafsson, G. Hall, W. Hundsdorfer,
L. Jay, R. Jeltsch, J.-P. Kauthen, H. Kraaijevanger, R. März, and O. Nevanlinna. ...
Several pictures were produced by our children Klaudia Wanner and Martin Hairer,
the one by drawing, the other by hacking.

The marvellous, perfect and never failing TEX program of D. Knuth allowed 
us to deliver a camera-ready manuscript to Springer Verlag, so that the book could 
be produced rapidly and at a reasonable price. We acknowledge with pleasure 
the numerous remarks of the planning and production group of Springer Verlag 
concerning fonts, style and other questions of elegance. 


March, 1991 


The Authors 





Preface to the Second Edition 

The preparation of the second edition allowed us to improve the first edition by 
rewriting many sections and by eliminating errors and misprints which have been 
discovered. In particular we have included new material on 

- methods with extended stability (Chebyshev methods) (Sect. IV.2); 

- improved computer codes and new numerical tests for one- and multistep methods
(Sects. IV.10 and V.5);

- new results on properties of error growth functions (Sects. IV.11 and IV.12);

- quasilinear differential equations with state-dependent mass matrix (Sect. VI.6).

We have completely reorganized the chapter on differential-algebraic equations by
including three new sections on

- index reduction methods (Sect. VII.2); 

- half-explicit methods for index-2 systems (Sect. VII.6); 

- symplectic methods for constrained Hamiltonian systems and backward error 
analysis on manifolds (Sect. VII.8). 

Our sincere thanks go to many persons who have helped us with our work: 

- all readers who kindly drew our attention to several errors and misprints in the 
first edition, in particular C. Bendtsen, R. Chan, P. Chartier, T. Eirola, L. Jay, 
P. Kaps, J.-P. Kauthen, P. Leone, S. Maset, B. Owren, and L.F. Shampine; 

- those who read preliminary versions of the new parts of this edition for their 
invaluable suggestions: M. Arnold, J. Cash, D.J. Higham, P. Kunkel, Chr. Lubich, 
A. Medovikov, A. Murua, A. Ostermann, and J. Verwer; 

- the staff of the Geneva computing center and of the mathematics library for their 
constant help; 

- the planning and production group of Springer-Verlag for numerous suggestions 
on presentation and style. 

All figures have been recomputed and printed, together with the text, in Postscript. 
All computations and text processings were done on the SUN workstations of the 
Mathematics Department of the University of Geneva. 

April 1996 


The Authors 
Chapter IV. Stiff Problems - One-Step Methods 


This chapter introduces stiff (styv (Swedish first!), steif (German), stíf (Icelandic), 
stijf (Dutch), raide (French), rígido (Spanish), rígido (Portuguese), stiff (Italian), 
kankea (Finnish), δύσκαμπτο (Greek), merev (Hungarian), rigid (Rumanian), 
tog (Slovenian), čvrst (Serbo-Croatian), tuhý (Czecho-Slovakian), sztywny (Polish), 
jäik (Estonian), stiegrs (Latvian), standus (Lithuanian), stign (Breton), zurrun 
(Basque), sert (Turkish), жесткий (Russian), твърд (Bulgarian), … (Hebrew), 
… (Arabic), … (Urdu), … (Persian), … (Sanscrit), … (Hindi), … (Chinese), 
… (Japanese), … (Vietnamese), ngumu (Swaheli) ...) differential equations. While the 
intuitive meaning of stiff is clear to all specialists, much controversy is going on about 
its correct mathematical definition (see e.g. pp. 360-363 of Aiken (1985)). The most 
pragmatic opinion is also historically the first one (Curtiss & Hirschfelder 1952): 
stiff equations are equations where certain implicit methods, in particular BDF, 
perform better, usually tremendously better, than explicit ones. The eigenvalues of 
the Jacobian $\partial f/\partial y$ certainly play a role in this decision, but quantities such as the 
dimension of the system, the smoothness of the solution or the integration interval 
are also important (Sections IV.1 and IV.2). 

Stiff equations need new concepts of stability (A-stability, Sect. IV.3) and lead 
to mathematical theories on order restrictions (order stars, Sect. IV.4). Stiff equations 
require implicit methods; we therefore focus in Sections IV.5 and IV.6 on implicit 
Runge-Kutta methods, in IV.7 on (semi-implicit) Rosenbrock methods and in 
IV.9 on semi-implicit extrapolation methods. The actual efficient implementation 
of implicit Runge-Kutta methods poses a number of problems which are discussed 
in Sect. IV.8. Section IV.10 then reports on some numerical experience for all these 
methods. 

With Sections IV.11, IV.12 and IV.13 we begin with the discussion of contractivity 
(B-stability) for linear and nonlinear differential equations. The chapter ends 
with questions of existence and numerical stability of the implicit Runge-Kutta 
solutions (Sect. IV.14) and a convergence theory which is independent of the stiffness 
(B-convergence, Sect. IV.15). 



IV.1 Examples of Stiff Equations 


... Around 1960, things became completely different and everyone 
became aware that the world was full of stiff problems. 

(G. Dahlquist in Aiken 1985) 


Stiff equations are problems for which explicit methods don’t work. Curtiss & 
Hirschfelder (1952) explain stiffness on one-dimensional examples such as 

$$y' = -50\,(y - \cos x). \tag{1.1}$$

Fig. 1.1. Solution curves of (1.1) with implicit Euler solution 

Fig. 1.2. Explicit Euler for y(0) = 0, h = 1.974/50 and 1.875/50 


Solution curves of Equation (1.1) are shown in Fig. 1.1. There is apparently a 
smooth solution in the vicinity of $y \approx \cos x$ and all other solutions reach this one 
after a rapid “transient phase”. Such transients are typical of stiff equations, but are 
neither sufficient nor necessary. For example, the solution with initial value $y(0) = 1$ 
(more precisely 2500/2501) has no transient. Fig. 1.2 shows Euler polygons 
for the initial value $y(0) = 0$ and step sizes $h = 1.974/50$ (38 steps) and 
$h = 1.875/50$ (40 steps). We observe that whenever the step size is a little too large 
(larger than 2/50), the numerical solution goes too far beyond the equilibrium and 
violent oscillations occur. 
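
The experiment of Fig. 1.2 is easy to repeat. The following is a minimal sketch, not the code used for the figure: the helper names `explicit_euler` and `max_deviation` are ours, and the third step size 2.1/50 is an extra value (not in the text) chosen to lie just beyond the limit 2/50.

```python
import math

def explicit_euler(f, x0, y0, h, steps):
    """Return the Euler approximations y_0, ..., y_steps."""
    ys = [y0]
    x, y = x0, y0
    for _ in range(steps):
        y = y + h * f(x, y)
        x = x + h
        ys.append(y)
    return ys

def f(x, y):
    # Right-hand side of (1.1).
    return -50.0 * (y - math.cos(x))

def max_deviation(h, steps):
    """Largest distance |y_n - cos(x_n)| over the second half of the run."""
    ys = explicit_euler(f, 0.0, 0.0, h, steps)
    return max(abs(ys[n] - math.cos(n * h)) for n in range(steps // 2, steps + 1))

if __name__ == "__main__":
    for factor in (1.875, 1.974, 2.1):
        print(f"h = {factor}/50: max deviation {max_deviation(factor / 50.0, 40):.3g}")
```

For the two step sizes of Fig. 1.2 the oscillations die out slowly; for the slightly larger step they grow from step to step.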

Looking for better methods for differential equations such as (1.1), Curtiss 
and Hirschfelder discovered the BDF method (see Sect. III.1): the approximation 
$y \approx \cos x$ (i.e., $f(x, y) = 0$) is only a crude approximation to the smooth solution, 
since the derivative of $\cos x$ is not zero. It is much better, for a given solution 
value $y_n$, to search for a point $y_{n+1}$ where the slope of the vector field is directed 
towards $y_n$, hence 

$$\frac{y_{n+1} - y_n}{h} = f(x_{n+1}, y_{n+1}). \tag{1.2}$$

This is the implicit Euler method. The dotted line in Fig. 1.1 consists of three 
implicit Euler steps and demonstrates impressively the good stability property of 
this method. Equation (1.1) is thus apparently “stiff” in the sense of Curtiss and 
Hirschfelder. 
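
Since (1.1) is linear in $y$, the implicit relation (1.2) can be solved for $y_{n+1}$ in closed form. The sketch below takes three implicit Euler steps from $y(0) = 0$; the step size $h = 0.5$ is an assumption (the text only says "three steps"), and the function names are ours. The smooth solution used for comparison is the one through $y(0) = 2500/2501$.

```python
import math

def implicit_euler_step(x, y, h, lam=50.0):
    """One implicit Euler step for y' = -lam * (y - cos x)."""
    x_new = x + h
    # (y_new - y) / h = -lam * (y_new - cos(x_new)) is linear in y_new.
    y_new = (y + h * lam * math.cos(x_new)) / (1.0 + h * lam)
    return x_new, y_new

def smooth_solution(x):
    """Solution of (1.1) without transient, y(0) = 2500/2501."""
    return (2500.0 * math.cos(x) + 50.0 * math.sin(x)) / 2501.0

def three_steps(h=0.5):
    x, y = 0.0, 0.0
    trajectory = [(x, y)]
    for _ in range(3):
        x, y = implicit_euler_step(x, y, h)
        trajectory.append((x, y))
    return trajectory

if __name__ == "__main__":
    for x, y in three_steps():
        print(f"x = {x:.1f}  y = {y:.5f}  smooth = {smooth_solution(x):.5f}")
```

Although $h = 0.5$ is far above the explicit limit 2/50, the numerical values approach the smooth solution after the first step.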

Extending the above idea “by taking higher order polynomials to fit y at a large 
number of points” then leads to the BDF methods. 


Chemical Reaction Systems 

When the equations represent the behaviour of a system containing 
a number of fast and slow reactions, a forward integration of 
these equations becomes difficult. (H.H. Robertson 1966) 

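The following example of Robertson’s (1966) has become very popular in numerical 
studies (Willoughby 1974): 

$$\begin{array}{ll}
A \xrightarrow{\;0.04\;} B & \text{(slow)} \\
B + B \xrightarrow{\;3\cdot 10^7\;} C + B & \text{(very fast)} \\
B + C \xrightarrow{\;10^4\;} A + C & \text{(fast)}
\end{array} \tag{1.3}$$

which leads to the equations 

$$\begin{array}{lll}
A: & y_1' = -0.04\,y_1 + 10^4\,y_2 y_3 & y_1(0) = 1 \\
B: & y_2' = 0.04\,y_1 - 10^4\,y_2 y_3 - 3\cdot 10^7\,y_2^2 & y_2(0) = 0 \\
C: & y_3' = 3\cdot 10^7\,y_2^2 & y_3(0) = 0.
\end{array} \tag{1.4}$$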

After a bad experience with explicit Euler just before, let’s try a higher order 
method and a more elaborate code for this example: DOPRI5 (cf. Volume I). The 
numerical solutions obtained for $y_2$ with Rtol = $10^{-2}$ (209 steps) as well as with 
Rtol = $10^{-3}$ (205 steps) and Atol = $10^{-6} \cdot$ Rtol are displayed in Fig. 1.3. Fig. 1.4 
presents the step sizes used by the code and also the local error estimates. There, 
all rejected steps are crossed out. 

We observe that the solution $y_2$ rapidly reaches a quasi-stationary position 
in the vicinity of $y_2' = 0$, which in the beginning ($y_1 = 1$, $y_3 = 0$) is at 
$0.04 \approx 3 \cdot 10^7 y_2^2$, hence $y_2 \approx 3.65 \cdot 10^{-5}$, and then very slowly goes back to zero again. 
The numerical method, however, integrates this smooth solution by thousands of 
apparently unnecessary steps. Moreover, the chosen step sizes are more or less 
independent of the chosen tolerance. Hence, they seem to be governed by stability 
rather than by precision requirements. It can also be seen that an implicit Runge-Kutta 
code (such as RADAU5 described in Sections IV.5 and IV.8) integrates this 
equation without any problem. 

Fig. 1.3. Numerical solution for problem (1.4) with DOPRI5 and RADAU5 

Fig. 1.4. Step sizes and local error estimates of DOPRI5, Tol = $10^{-2}$ 
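
The quasi-stationary value quoted above can be checked directly from (1.4). A small sketch follows; the function name `robertson` is ours, and nothing here reproduces DOPRI5 or RADAU5.

```python
import math

def robertson(y):
    """Right-hand side of the Robertson system (1.4)."""
    y1, y2, y3 = y
    return [-0.04 * y1 + 1.0e4 * y2 * y3,
            0.04 * y1 - 1.0e4 * y2 * y3 - 3.0e7 * y2 * y2,
            3.0e7 * y2 * y2]

# Quasi-stationary state of y2 at the start (y1 = 1, y3 = 0):
# 0.04 * y1 = 3e7 * y2**2.
y2_quasi = math.sqrt(0.04 / 3.0e7)
rhs = robertson([1.0, y2_quasi, 0.0])

if __name__ == "__main__":
    print("quasi-stationary y2:", y2_quasi)
    print("y2' at that point:  ", rhs[1])
    print("sum of derivatives: ", sum(rhs))  # y1 + y2 + y3 is conserved
```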


Electrical Circuits 


This behavior is known, at least in part, to any experienced worker 
in the field. (G. Hall 1985) 

One of the simplest nonlinear equations describing a circuit is van der Pol’s equation 
(see Sect. I.16) 

$$\begin{array}{ll}
y_1' = y_2 & y_1(0) = 2 \\
y_2' = \mu(1 - y_1^2)\,y_2 - y_1 & y_2(0) = 0.
\end{array} \tag{1.5}$$

Fig. 1.5. Numerical solution for DEABM at equation (1.5’), Rtol = $10^{-2}$, Atol = $10^{-7}$ 

We have seen in Chapter II that this equation is easily integrated for moderate 
values of $\mu$. But we now choose $\mu = 500$ and suspect that the problem might 
become difficult. It turns out that the period of the solution increases with $\mu$. 
We therefore rescale the solutions and introduce $t = x/\mu$, $z_1(t) = y_1(x)$, 
$z_2(t) = \mu y_2(x)$. In the resulting equation the factor $\mu^2$ multiplies the entire second line of 
$f$. Substituting again $y$ for $z$, $x$ for $t$ and $\mu^2 = 1/\varepsilon$ we obtain 

$$\begin{array}{l} y_1' = y_2 \\ y_2' = \mu^2\big((1 - y_1^2)\,y_2 - y_1\big) \end{array}
\qquad\text{or}\qquad
\begin{array}{l} y_1' = y_2 \\ \varepsilon y_2' = (1 - y_1^2)\,y_2 - y_1. \end{array} \tag{1.5'}$$

The steady-state approximation (see Vol. I, Formula (I.16.5)) then becomes independent 
of $\mu$. 
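
The rescaling can be verified line by line. With $t = x/\mu$ we have $dx/dt = \mu$, so that, using only the substitutions stated above and the original equation (1.5),

$$\begin{aligned}
\frac{dz_1}{dt} &= \mu\,\frac{dy_1}{dx} = \mu\,y_2 = z_2, \\
\frac{dz_2}{dt} &= \mu^2\,\frac{dy_2}{dx} = \mu^2\big(\mu(1 - y_1^2)\,y_2 - y_1\big) = \mu^2\big((1 - z_1^2)\,z_2 - z_1\big).
\end{aligned}$$

Dividing the second line by $\mu^2 = 1/\varepsilon$ gives the second form of (1.5’).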

Why not try a multistep code this time? For example the predictor-corrector 
Adams code DEABM of Shampine & Watts. Figures 1.5 and 1.6 show the numerical 
solution, the step sizes and the orders for the first 450 steps. Eventually the code 
stops with the message Idid = -4 (“the problem appears to be stiff”). The implicit 
Runge-Kutta code RADAU5 integrates over the same interval in 11 steps. 


Diffusion 


Stalling numerical processes must be wrong. 

(A “golden rule” of Achi Brandt) 


Another source of stiffness is the translation of diffusion terms by divided differences 
(method of lines, see Sect. I.1) into a large system of ODE’s. We choose the 
Brusselator (see (16.12) of Sect. I.16) in one spatial variable $x$ 

$$\begin{aligned}
\frac{\partial u}{\partial t} &= A + u^2 v - (B + 1)\,u + \alpha\,\frac{\partial^2 u}{\partial x^2} \\
\frac{\partial v}{\partial t} &= B u - u^2 v + \alpha\,\frac{\partial^2 v}{\partial x^2}
\end{aligned} \tag{1.6}$$

with $0 \le x \le 1$, $A = 1$, $B = 3$, $\alpha = 1/50$ and boundary conditions 

$$u(0, t) = u(1, t) = 1, \quad v(0, t) = v(1, t) = 3, \quad u(x, 0) = 1 + \sin(2\pi x), \quad v(x, 0) = 3.$$

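We replace the second spatial derivatives by finite differences on a grid of $N$ points 
$x_i = i/(N + 1)$ $(1 \le i \le N)$, $\Delta x = 1/(N + 1)$ and obtain from (1.6) 

$$\begin{aligned}
u_i' &= 1 + u_i^2 v_i - 4 u_i + \frac{\alpha}{(\Delta x)^2}\,(u_{i-1} - 2u_i + u_{i+1}), \\
v_i' &= 3 u_i - u_i^2 v_i + \frac{\alpha}{(\Delta x)^2}\,(v_{i-1} - 2v_i + v_{i+1}), \\
u_0(t) &= u_{N+1}(t) = 1, \qquad v_0(t) = v_{N+1}(t) = 3, \\
u_i(0) &= 1 + \sin(2\pi x_i), \qquad v_i(0) = 3, \qquad i = 1, \dots, N.
\end{aligned} \tag{1.6'}$$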
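
For readers who want to experiment, the semi-discrete system (1.6’) can be coded in a few lines. This is a sketch in plain Python; the function names and the list-based layout are our choices, not those of the Fortran codes used in the book.

```python
import math

def brusselator_rhs(u, v, alpha=1.0 / 50.0):
    """Right-hand side of (1.6') for interior values u_1..u_N, v_1..v_N."""
    n = len(u)
    c = alpha * (n + 1) ** 2          # alpha / (Delta x)^2
    ue = [1.0] + list(u) + [1.0]      # boundary values u_0 = u_{N+1} = 1
    ve = [3.0] + list(v) + [3.0]      # boundary values v_0 = v_{N+1} = 3
    du, dv = [], []
    for i in range(1, n + 1):
        du.append(1.0 + ue[i] ** 2 * ve[i] - 4.0 * ue[i]
                  + c * (ue[i - 1] - 2.0 * ue[i] + ue[i + 1]))
        dv.append(3.0 * ue[i] - ue[i] ** 2 * ve[i]
                  + c * (ve[i - 1] - 2.0 * ve[i] + ve[i + 1]))
    return du, dv

def initial_values(n):
    """Initial data u_i(0) = 1 + sin(2 pi x_i), v_i(0) = 3."""
    xs = [i / (n + 1) for i in range(1, n + 1)]
    return [1.0 + math.sin(2.0 * math.pi * x) for x in xs], [3.0] * n
```

The constant state $u \equiv 1$, $v \equiv 3$ is a stationary solution of (1.6’), which gives a quick sanity check of the implementation.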


Table 1.1. Results for (1.6’) with ODEX for 0 ≤ t ≤ 10 

   N     Tol       accepted steps   rejected steps   function calls 
  10     10^-4           21                3               365 
  20     10^-4           81               25              1138 
  30     10^-4          167               45              2459 
  40     10^-4          275               62              4316 
  40     10^-2          266               59              3810 
Fig. 1.7. Solution u(x, t) of (1.6’) with N = 40 using ODEX 

Fig. 1.8. Step size and order of ODEX at (1.6’) with N = 40 

This time we try the extrapolation code ODEX (see Volume I) and integrate 
over $0 \le t \le 10$ with Atol = Rtol = Tol. The number of necessary steps increases 
curiously with $N$, as is shown in Table 1.1. Again, for $N$ large, the computing time 
is nearly independent of the desired tolerance; the computed solutions, however, 
differ considerably (see Fig. 1.7). Even the smooth $10^{-4}$-solution shows curious 
stripes which are evidently unconnected with the behaviour of the solution. Fig. 1.8 
shows the extremely ragged step size and order changes which take place in this 
example. 

We again have all the characteristics of a “stiff” problem, and the use of an 
implicit method promises better results. However, when applying such a method, 
one must carefully take advantage of the banded or sparse structure of the Jacobian 
matrix. Otherwise the numerical work involved in the linear algebra would increase 
with $N^3$, precisely as the work for the explicit method ($N^2$ for the number of steps 
and $N$ for the work per step). 


A “Stiff” Beam 


Although it is common to talk about “stiff differential equations,” 
an equation per se is not stiff, a particular initial value problem 
for that equation may be stiff, in some regions, but the sizes of 
these regions depend on the initial values and the error tolerance. 

(C.W. Gear 1982) 


Let us conclude our series of examples by a problem from mechanics: the motion 
of an elastic beam. We suppose the beam inextensible of length 1 and thin. So we 
neglect shearing forces and rotatory inertia. We further want to allow it arbitrarily 
large movements. Thus, the most natural coordinate system to use is the angle $\theta$ 
as a function of arc length $s$ and time $t$. We further suppose the beam clamped at 
$s = 0$ and a force $F = (F_x, F_y)$ acting at the free end $s = 1$. The beam is then 
described by the equations 

$$x(s,t) = \int_0^s \cos\theta(\sigma, t)\,d\sigma, \qquad y(s,t) = \int_0^s \sin\theta(\sigma, t)\,d\sigma. \tag{1.7}$$

In order to obtain the equations of motion for this problem, we apply Lagrange 
theory (Lagrange 1788). This requires that we form $L = T - U$ where $T$ is the 
kinetic and $U$ the potential energy. For the first of these we have simply 

$$T = \frac{1}{2}\int_0^1 \Big( (\dot x(s,t))^2 + (\dot y(s,t))^2 \Big)\,ds. \tag{1.8}$$
The potential energy is made up of energy from bending (depending on the curvature) 
and from exterior forces as follows: 

$$U = \frac{1}{2}\int_0^1 (\theta'(s,t))^2\,ds - F_x(t)\,x(1,t) - F_y(t)\,y(1,t). \tag{1.9}$$

Here dots and primes denote derivatives with respect to $t$ and $s$ respectively. The 
equations of motion are now obtained by a “trivial” calculation (we are grateful to 
our colleague J. Descloux for having shown us how this must be done!) using the 
Hamilton principle which leads to (see Exercise 2) 


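$$\begin{aligned}
\int_0^1 G(s,\sigma)\cos\big(\theta(s,t) - \theta(\sigma,t)\big)\,\ddot\theta(\sigma,t)\,d\sigma
&= \theta''(s,t) + \cos\theta(s,t)\,F_y(t) - \sin\theta(s,t)\,F_x(t) \\
&\quad - \int_0^1 G(s,\sigma)\sin\big(\theta(s,t) - \theta(\sigma,t)\big)\,(\dot\theta(\sigma,t))^2\,d\sigma,
\qquad 0 < s < 1,
\end{aligned} \tag{1.10}$$

$$\theta(0,t) = 0, \qquad \theta'(1,t) = 0 \tag{1.11}$$

where 

$$G(s,\sigma) = 1 - \max(s,\sigma) \tag{1.12}$$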


is Green’s function for the problem $-w''(s) = g(s)$, $w'(0) = w(1) = 0$. If we 
discretize the integrals with the help of the midpoint rule 

$$\int_0^1 f(\theta(\sigma,t))\,d\sigma \approx \frac{1}{n}\sum_{k=1}^n f(\theta_k), \qquad
\theta_k = \theta\big((k - \tfrac12)/n,\ t\big), \qquad k = 1,\dots,n, \tag{1.13}$$


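Equations (1.10) become 

$$\sum_{k=1}^n a_{lk}\,\ddot\theta_k = n^4(\theta_{l-1} - 2\theta_l + \theta_{l+1}) + n^2(\cos\theta_l\,F_y - \sin\theta_l\,F_x)
- \sum_{k=1}^n g_{lk}\sin(\theta_l - \theta_k)\,\dot\theta_k^2, \qquad l = 1,\dots,n, \tag{1.10'}$$

$$\theta_0 = -\theta_1, \qquad \theta_{n+1} = \theta_n \tag{1.11'}$$

where 

$$a_{lk} = g_{lk}\cos(\theta_l - \theta_k), \qquad g_{lk} = n + \tfrac12 - \max(l,k). \tag{1.14}$$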


Integration without preparation is frustration. 

(Reverend Leon Sullivan) 

Numerical integration of (1.10’) seems quite tedious, since the acceleration $\ddot\theta$ 
is only given implicitly. The computation of $\ddot\theta_k$ requires the solution of a linear 
system $A\ddot\theta = v$. Due to the special structure of $A$, this can be done efficiently, 
since with $B = (b_{lk})$, $b_{lk} = g_{lk}\sin(\theta_l - \theta_k)$, we have 

$$A + iB = \mathrm{diag}(e^{i\theta_1},\dots,e^{i\theta_n})\;G\;\mathrm{diag}(e^{-i\theta_1},\dots,e^{-i\theta_n}). \tag{1.15}$$

The matrix $G = (g_{lk})$ has the beautiful inverse 

$$G^{-1} = \begin{pmatrix}
1 & -1 & & & \\
-1 & 2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 & -1 \\
 & & & -1 & 3
\end{pmatrix}, \tag{1.16}$$

a positive definite tridiagonal matrix (a natural coincidence: $G^{-1}$ represents the 
second order difference operator, and $G$ comes from the Green function for a second 
order integration problem). Now 

$$(A + iB)^{-1} = C + iD = \mathrm{diag}(e^{i\theta_1},\dots,e^{i\theta_n})\;G^{-1}\;\mathrm{diag}(e^{-i\theta_1},\dots,e^{-i\theta_n})$$

and 

$$AC - BD = I, \qquad AD + BC = 0 \tag{1.17}$$

lead to $A^{-1} = C + DC^{-1}D$. We can also simplify the term 
$-\sum_k g_{lk}\sin(\theta_l - \theta_k)\,\dot\theta_k^2$, which in vector notation is $-B\dot\theta^2$, with the formula 
$A^{-1}B = -DC^{-1}$ (from (1.17)). The accelerations $\ddot\theta_k$ are now obtained from (1.10’) as follows. 

a) Let $v_l = n^4(\theta_{l-1} - 2\theta_l + \theta_{l+1}) + n^2(\cos\theta_l\,F_y - \sin\theta_l\,F_x)$; 

b) Compute $w = Dv + \dot\theta^2$ ($D$ is bidiagonal); 

c) Solve the tridiagonal system $Cu = w$; 

d) Compute $\ddot\theta = Cv + Du$. 
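
Steps a) to d) can be sketched as follows. This is an illustration in plain Python under our own naming (`beam_acceleration`, `dense_acceleration` and the helpers are hypothetical names, not routines of the book's codes); the dense solver is included only as a reference to check the fast evaluation against a direct solution of $A\ddot\theta = v - B\dot\theta^2$.

```python
import math

def tridiagonal_parts(theta):
    """Diagonal of C and upper off-diagonals of C and D, where
    C + iD = diag(e^{i theta}) G^{-1} diag(e^{-i theta}); needs len(theta) >= 2."""
    n = len(theta)
    diag = [1.0] + [2.0] * (n - 2) + [3.0]
    c_up = [-math.cos(theta[l] - theta[l + 1]) for l in range(n - 1)]
    d_up = [-math.sin(theta[l] - theta[l + 1]) for l in range(n - 1)]
    return diag, c_up, d_up

def apply_c(diag, c_up, x):
    n = len(x)
    return [diag[l] * x[l]
            + (c_up[l] * x[l + 1] if l < n - 1 else 0.0)
            + (c_up[l - 1] * x[l - 1] if l > 0 else 0.0) for l in range(n)]

def apply_d(d_up, x):
    # D is antisymmetric with zero diagonal: D[l+1][l] = -D[l][l+1].
    n = len(x)
    return [(d_up[l] * x[l + 1] if l < n - 1 else 0.0)
            - (d_up[l - 1] * x[l - 1] if l > 0 else 0.0) for l in range(n)]

def solve_c(diag, c_up, w):
    """Thomas algorithm for the symmetric tridiagonal system C u = w."""
    n = len(w)
    piv, rhs = [diag[0]], [w[0]]
    for l in range(1, n):
        m = c_up[l - 1] / piv[l - 1]
        piv.append(diag[l] - m * c_up[l - 1])
        rhs.append(w[l] - m * rhs[l - 1])
    u = [0.0] * n
    u[-1] = rhs[-1] / piv[-1]
    for l in range(n - 2, -1, -1):
        u[l] = (rhs[l] - c_up[l] * u[l + 1]) / piv[l]
    return u

def force_terms(theta, fx, fy):
    """Step a): v_l, with ghost values theta_0 = -theta_1, theta_{n+1} = theta_n."""
    n = len(theta)
    ext = [-theta[0]] + list(theta) + [theta[-1]]
    return [n ** 4 * (ext[l] - 2.0 * ext[l + 1] + ext[l + 2])
            + n ** 2 * (math.cos(theta[l]) * fy - math.sin(theta[l]) * fx)
            for l in range(n)]

def beam_acceleration(theta, theta_dot, fx, fy):
    """O(n) evaluation of the accelerations in (1.10')."""
    diag, c_up, d_up = tridiagonal_parts(theta)
    v = force_terms(theta, fx, fy)                                      # a)
    w = [dv + td * td for dv, td in zip(apply_d(d_up, v), theta_dot)]   # b)
    u = solve_c(diag, c_up, w)                                          # c)
    return [cv + du for cv, du in
            zip(apply_c(diag, c_up, v), apply_d(d_up, u))]              # d)

def dense_acceleration(theta, theta_dot, fx, fy):
    """Reference: solve A x = v - B theta_dot^2 by Gaussian elimination."""
    n = len(theta)
    v = force_terms(theta, fx, fy)
    a, rhs = [], []
    for l in range(n):
        g = [n - 0.5 - max(l, k) for k in range(n)]   # g_lk with 0-based l, k
        a.append([g[k] * math.cos(theta[l] - theta[k]) for k in range(n)])
        rhs.append(v[l] - sum(g[k] * math.sin(theta[l] - theta[k]) * theta_dot[k] ** 2
                              for k in range(n)))
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        a[i], a[p], rhs[i], rhs[p] = a[p], a[i], rhs[p], rhs[i]
        for r in range(i + 1, n):
            m = a[r][i] / a[i][i]
            a[r] = [arc - m * aic for arc, aic in zip(a[r], a[i])]
            rhs[r] -= m * rhs[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (rhs[i] - sum(a[i][c] * x[c] for c in range(i + 1, n))) / a[i][i]
    return x
```

The fast routine never forms $A$ or $B$; it only touches the diagonal and the two neighbouring diagonals of $C$ and $D$.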

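Thus the evaluation of (1.10’) reduces to $O(n)$ operations (instead of $O(n^3)$). We 
choose the initial conditions 

$$\theta(s,0) = 0, \qquad \dot\theta(s,0) = 0 \tag{1.18}$$

and apply the exterior forces 

$$F_x = -\varphi(t), \qquad F_y = \varphi(t), \qquad
\varphi(t) = \begin{cases} 1.5\,\sin^2 t & 0 \le t \le \pi \\ 0 & \pi \le t. \end{cases} \tag{1.19}$$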


The resulting system of ODE’s is then integrated for $0 \le t \le 5$ by the code DOP853 
of Volume I, although strictly speaking, the code is of too high an order for such a 
problem. The results are summarized in Table 1.2. 

We observe the same phenomenon as before: the number of necessary steps 
increases like $O(n^2)$ (the numerical work like $O(n^3)$), and is more or less independent 
of the chosen tolerance. The numerical solution for $n = 40$ is displayed 
in Fig. 1.9. Only each 20th of the nearly 9000 steps is drawn (otherwise the picture 
would just be completely black). The computed solution looks perfectly smooth 
and there is no apparent reason for the need of so many steps. In fact due to lack 
Table 1.2. Results for the beam (1.10’) with DOP853 

   n     Tol       accepted steps   rejected steps   function calls 
   5     10^-7          142               35              2091 
  10     10^-7          383               26              4884 
  20     10^-7         1397              273             19769 
  40     10^-7         6913             1347             97775 
  20     10^-3         1486              450             22784 
  20     10^-5         1967              266             26532 
  20     10^-7         1397              273             19769 


of stability, the numerical method produces small vibrations which are invisible for 
Tol = $10^{-7}$, and which force the integrator to such small step sizes. If we relax the 
high precision requirement, these oscillations become visible (Fig. 1.10). 


High Oscillations 


Let us now choose slightly perturbed initial values in the beam equation (1.10’). 
Instead of (1.18) we put 

$$\theta_1 = \dots = \theta_{n-1} = 0, \qquad \theta_n = 0.4, \qquad \dot\theta_1 = \dots = \dot\theta_n = 0. \tag{1.18'}$$

This time, the correct solution for $n = 10$ of (1.10’) computed with Tol = $10^{-6}$ 
and more than 2000 steps is displayed in Fig. 1.11. 

The solution is highly oscillatory, no damping wipes out the fast vibrations 
since the system is conservative. Hence also an implicit method, if required to 
follow all these oscillations, would need the same number of steps and there would 
of course be no advantage in using it. So we see that the decision whether a problem 
should be regarded as stiff or nonstiff (“... that is the question”), may also depend 
on the chosen initial conditions. On the other hand, we shall see in Sect. IV.2 that 
whenever these high oscillations are not desired, implicit methods are a marvellous 
instrument for wiping them out. 


Exercises 

1. (Curtiss & Hirschfelder 1952). “It is interesting to notice that this method 
of integration (the implicit Euler) may be used in either direction”. Integrate 
equation (1.1) backward with step size $-0.5$ and initial value $y(1.5) = 0$ in 
three steps. Observe that the numerical solution remains stable and follows the 
smooth solution. 

2. Derive the equations of motion (1.10) for the elastic beam from (1.8) and (1.9). 
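Hint. If you want to avoid differentiation in function spaces, then discretize the 
beam as, say, 

$$x_j = \Delta s\sum_{k=1}^{j}\cos\theta_k, \qquad y_j = \Delta s\sum_{k=1}^{j}\sin\theta_k, \qquad j = 1,\dots,n, \qquad \Delta s = \frac{1}{n}, \tag{1.20}$$

$$T = \frac{\Delta s}{2}\sum_{j=1}^{n}(\dot x_j^2 + \dot y_j^2), \qquad
\dot x_j = -\Delta s\sum_{k=1}^{j}\sin\theta_k\cdot\dot\theta_k, \qquad
\dot y_j = \Delta s\sum_{k=1}^{j}\cos\theta_k\cdot\dot\theta_k,$$

$$U = \frac{\Delta s}{2}\sum_{j=1}^{n}\Big(\frac{\theta_j - \theta_{j-1}}{\Delta s}\Big)^2
- F_x\,\Delta s\sum_{k=1}^{n}\cos\theta_k - F_y\,\Delta s\sum_{k=1}^{n}\sin\theta_k,$$

form the Lagrange function $L = T - U$ and apply $n$-dimensional Lagrange 
theory (Lagrange (1788), Vol. II, Sect. VII and VIII; a very clear derivation 
can be found in Sommerfeld (1942), Vol. I, §36) 

$$\frac{d}{dt}\Big(\frac{\partial L}{\partial\dot\theta_k}\Big) - \frac{\partial L}{\partial\theta_k} = 0
\qquad\text{or}\qquad
\sum_{l=1}^{n} L_{\dot\theta_k\dot\theta_l}\,\ddot\theta_l = L_{\theta_k} - \sum_{l=1}^{n} L_{\dot\theta_k\theta_l}\,\dot\theta_l. \tag{1.21}$$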

3. Apply an explicit code to the Oregonator (Chapter I, Equation (16.15)) 

$$\begin{aligned}
y_1' &= 77.27\,\big(y_2 + y_1(1 - 8.375\times 10^{-6}\,y_1 - y_2)\big) \\
y_2' &= \frac{1}{77.27}\,\big(y_3 - (1 + y_1)\,y_2\big) \\
y_3' &= 0.161\,(y_1 - y_3)
\end{aligned}$$

and study its performance. 


4. a) Compute the equations of motion of the hanging rope (Fig. 1.12) of length 1 
by using the results of Exercise 2. The potential energy has to be replaced by 


U = - 



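Result. 

$$\int_0^1 G(s,\sigma)\cos\big(\theta(s,t) - \theta(\sigma,t)\big)\,\ddot\theta(\sigma,t)\,d\sigma
= -\int_0^1 G(s,\sigma)\sin\big(\theta(s,t) - \theta(\sigma,t)\big)\,(\dot\theta(\sigma,t))^2\,d\sigma - (1 - s)\sin\theta(s,t) \tag{1.23}$$

for $0 < s < 1$, or, when discretized 

$$\sum_{k=1}^{n} a_{lk}\,\ddot\theta_k = -\sum_{k=1}^{n} g_{lk}\sin(\theta_l - \theta_k)\,\dot\theta_k^2 - n\big(n + \tfrac12 - l\big)\sin\theta_l. \tag{1.23'}$$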

b) Do numerical computations with DOPRI5 or DOP853. Choose as initial position 
a hanging rope in equilibrium which is then released at one end. 

Hint. The hanging rope in equilibrium satisfies, in the usual coordinates, 

$$\int_{x_0}^{x_1} y\,\sqrt{1 + (y')^2}\,dx = \min \qquad\text{with}\qquad \int_{x_0}^{x_1}\sqrt{1 + (y')^2}\,dx = 1,$$

which, using a Lagrange multiplier, becomes 

$$\int_{x_0}^{x_1} (y - \lambda)\,\sqrt{1 + (y')^2}\,dx = \text{stat}.$$

Applying (2.6) of Sect. I.2 yields $y - \lambda = K\sqrt{1 + (y')^2}$ with solution 
$y = \lambda + K\cosh\big((x - \alpha)/K\big)$. 

Suitable choices of the parameters and change of coordinates ($K = 1/2$, 
$\lambda = -K\cosh(\alpha/K)$, $x \to y$, $y \to -x$) then lead to 

$$\theta(s,0) = \pi/2 - \arctan\big(\sinh(2\alpha) - 2s\big). \tag{1.24}$$

Result. DOP853 has computed the solution for $0 \le t \le 5$, $n = 60$ and Tol = $10^{-5}$, 
$\alpha = 0.6$, in 203 steps (Fig. 1.12). The number of steps increases here 
like $O(n)$, so the rope is, evidently, less stiff than the beam. 

Fig. 1.12. Movement of hanging rope, every step drawn 
IV.2 Stability Analysis for Explicit RK Methods 


... we shall see for the initial value problem of hyperbolic equations 
that convergence holds in general only if the ratios of the 
mesh widths in the different directions satisfy certain 
inequalities. 

(Courant, Friedrichs & Lewy 1928) 


The first analysis of instability phenomena and step size restrictions for hyperbolic 
equations was made in the famous paper of Courant, Friedrichs & Lewy (1928). 
Later, many authors undertook a stability analysis, very often independently, in 
order to explain the phenomena encountered in the foregoing section. An early and 
beautiful paper on this subject is Guillou & Lago (1961). 


Stability Analysis for Euler’s Method 

Let $\varphi(x)$ be a smooth solution of $y' = f(x, y)$. We linearize $f$ in its neighbourhood 
as follows 

$$y'(x) = f(x,\varphi(x)) + \frac{\partial f}{\partial y}(x,\varphi(x))\,\big(y(x) - \varphi(x)\big) + \dots \tag{2.1}$$

and introduce $y(x) - \varphi(x) = \bar y(x)$ to obtain 

$$\bar y'(x) = \frac{\partial f}{\partial y}(x,\varphi(x))\,\bar y(x) + \dots = J(x)\,\bar y(x) + \dots\,. \tag{2.2}$$

As a first approximation we consider the Jacobian $J(x)$ as constant and neglect the 
error terms. Omitting the bars we arrive at 

$$y' = Jy. \tag{2.2'}$$

If we now apply, say, Euler’s method to (2.2’), we obtain 

$$y_{m+1} = R(hJ)\,y_m \tag{2.3}$$

with 

$$R(z) = 1 + z. \tag{2.4}$$

The behaviour of (2.3) is studied by transforming $J$ to Jordan canonical form (see 
Sect. I.12). We suppose that $J$ is diagonalizable with eigenvectors $v_1, \dots, v_n$ and 
write $y_0$ in this basis as 

$$y_0 = \sum_{i=1}^{n}\alpha_i v_i. \tag{2.5}$$
Inserting this into (2.3) we obtain 

$$y_m = \sum_{i=1}^{n}\big(R(h\lambda_i)\big)^m\,\alpha_i v_i, \tag{2.6}$$

where the $\lambda_i$ are the corresponding eigenvalues (see also Exercises 1 and 2). Clearly 
$y_m$ remains bounded for $m \to \infty$, if for all eigenvalues the complex number 
$z = h\lambda_i$ lies in the set 

$$S = \{z \in \mathbb{C};\ |R(z)| \le 1\} = \{z \in \mathbb{C};\ |z - (-1)| \le 1\}$$

which is the disc of radius 1 and centre $-1$. This leads to the explanation of the 
results encountered in Example (1.1). There we have $\lambda = -50$, and $h\lambda \in S$ means 
that $0 \le h \le 2/50$, in perfect accordance with the numerical observations. 


Explicit Runge-Kutta Methods 


An explicit Runge-Kutta method (Sect. II.2, Formula (2.3)) applied to (2.2’) gives 

$$\begin{aligned}
g_i &= y_m + hJ\sum_{j=1}^{i-1} a_{ij}\,g_j, \qquad i = 1,\dots,s, \\
y_{m+1} &= y_m + hJ\sum_{j=1}^{s} b_j\,g_j.
\end{aligned} \tag{2.7}$$

Inserting $g_j$ repeatedly from the first line, this becomes 

$$y_{m+1} = R(hJ)\,y_m$$

where 

$$R(z) = 1 + z\sum_{j} b_j + z^2\sum_{j,k} b_j a_{jk} + z^3\sum_{j,k,l} b_j a_{jk} a_{kl} + \dots \tag{2.8}$$

is a polynomial of degree $\le s$. 
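
Formula (2.8) says that the coefficient of $z^q$ is $b^T A^{q-1}\mathbb{1}$. The sketch below evaluates these coefficients in exact rational arithmetic; the function name is ours, and the classical fourth order Runge-Kutta tableau is used only as a familiar example.

```python
from fractions import Fraction as F

def stability_coefficients(a, b):
    """Coefficients [1, c_1, ..., c_s] of R(z) = 1 + sum_q c_q z^q,
    with c_q = b^T A^(q-1) (1,...,1)^T, for an explicit s-stage method."""
    s = len(b)
    coeffs = [F(1)]
    vec = [F(1)] * s                       # A^(q-1) applied to (1,...,1)^T
    for _ in range(s):
        coeffs.append(sum(b[j] * vec[j] for j in range(s)))
        vec = [sum(a[j][k] * vec[k] for k in range(s)) for j in range(s)]
    return coeffs

# Classical Runge-Kutta method of order 4 (example tableau).
A_RK4 = [[0, 0, 0, 0],
         [F(1, 2), 0, 0, 0],
         [0, F(1, 2), 0, 0],
         [0, 0, 1, 0]]
B_RK4 = [F(1, 6), F(1, 3), F(1, 3), F(1, 6)]

if __name__ == "__main__":
    print(stability_coefficients(A_RK4, B_RK4))
```

Because $A$ is strictly lower triangular, $A^s = 0$, which is why the loop stops after $s$ terms.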

Definition 2.1. The function $R(z)$ is called the stability function of the method. It 
can be interpreted as the numerical solution after one step for 

$$y' = \lambda y, \qquad y_0 = 1, \qquad z = h\lambda, \tag{2.9}$$

the famous Dahlquist test equation. The set 

$$S = \{z \in \mathbb{C};\ |R(z)| \le 1\} \tag{2.10}$$

is called the stability domain of the method. 


Theorem 2.2. If the Runge-Kutta method is of order $p$, then 

$$R(z) = 1 + z + \frac{z^2}{2!} + \dots + \frac{z^p}{p!} + O(z^{p+1}).$$

Proof. The exact solution of (2.9) is $e^z$ and therefore the numerical solution $y_1 = R(z)$ 
must satisfy 

$$e^z - R(z) = O(h^{p+1}) = O(z^{p+1}). \tag{2.11}$$

Another argument is that the expressions in (2.8) appear in the order conditions for 
the “tall” trees $\tau, t_{21}, t_{32}, t_{44}, t_{59}, \dots$ (see Table 2.2 of Sect. II.2, p. 148). They are 
therefore equal to $1/q!$ for $q \le p$. □ 




Fig. 2.1. Stability domains for explicit Runge-Kutta methods of order p = s 

Fig. 2.2. Stability domains for DOPRI methods 


As a consequence, all explicit Runge-Kutta methods with $p = s$ possess the 
stability function 

$$R(z) = \sum_{j=0}^{s}\frac{z^j}{j!}. \tag{2.12}$$

The corresponding stability domains are represented in Fig. 2.1. 
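
To get a feeling for the size of these domains, one can locate the point where the stability domain of (2.12) leaves the negative real axis. The following is a rough sketch (scan plus bisection, with our own function names and default parameters), not the procedure used to draw Fig. 2.1.

```python
import math

def taylor_stability_function(s):
    """R(z) of (2.12): the degree-s Taylor polynomial of exp(z)."""
    def r(z):
        return sum(z ** j / math.factorial(j) for j in range(s + 1))
    return r

def real_stability_limit(r, step=0.01, x_max=20.0):
    """Approximate the largest x with |r(-t)| <= 1 for all 0 <= t <= x.
    The scan assumes the first exit from the domain lies below x_max."""
    lo = 0.0
    while lo + step <= x_max and abs(r(-(lo + step))) <= 1.0:
        lo += step
    hi = lo + step
    for _ in range(60):                    # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if abs(r(-mid)) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    for s in (1, 2, 3, 4):
        limit = real_stability_limit(taylor_stability_function(s))
        print(f"s = {s}: stable for -{limit:.4f} <= z <= 0 on the real axis")
```

For $s = 1$ this reproduces the interval $[-2, 0]$ found for Euler's method above.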

The method of Dormand & Prince DOPRI5 (Sect. II.5, Table 5.2) is of order 5 
with $s = 6$ (the 7th stage is for error estimation only). Here $R(z)$ is obtained by 
direct computation. The result is 

$$R(z) = 1 + z + \frac{z^2}{2} + \frac{z^3}{6} + \frac{z^4}{24} + \frac{z^5}{120} + \frac{z^6}{600}. \tag{2.13}$$

For DOP853 (Sect. II.5, Fig. 5.3), $R(z)$ becomes 

$$\begin{aligned}
R(z) = \sum_{j=0}^{8}\frac{z^j}{j!} &+ 2.6916922001691\cdot 10^{-6}\,z^{9} + 2.3413451082098\cdot 10^{-7}\,z^{10} \\
&+ 1.4947364854592\cdot 10^{-8}\,z^{11} + 3.6133245781282\cdot 10^{-10}\,z^{12}.
\end{aligned} \tag{2.14}$$

The stability domains for these two methods are given in Fig. 2.2. 


Extrapolation Methods 


The GBS-algorithm (see Sect. II.9, Formulas (9.10), (9.13)) applied to $y' = \lambda y$, 
$y(0) = 1$ leads with $z = H\lambda$ to 

$$\begin{aligned}
y_0 &= 1, \qquad y_1 = 1 + \frac{z}{n_j}, \\
y_{i+1} &= y_{i-1} + 2\,\frac{z}{n_j}\,y_i, \qquad i = 1, 2, \dots, n_j, \\
T_{j1} &= \tfrac14\,\big(y_{n_j - 1} + 2y_{n_j} + y_{n_j + 1}\big), \\
T_{j,k+1} &= T_{j,k} + \frac{T_{j,k} - T_{j-1,k}}{(n_j/n_{j-k})^2 - 1}.
\end{aligned} \tag{2.15}$$

The stability domains for the diagonal terms $T_{22}$, $T_{33}$, $T_{44}$, and $T_{55}$ for the 
harmonic sequence 

$$\{n_j\} = \{2, 4, 6, 8, 10, \dots\}$$

(the one which is used in ODEX) are displayed in Fig. 2.3. We have also added 
those for the methods without the smoothing step (II.9.13c), which shows some 
difference for negative real eigenvalues. 
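
Recursion (2.15) translates directly into a small routine that evaluates $T_{kk}(z)$ for real or complex $z$. This is a sketch with our own function name and argument conventions; it is not taken from ODEX.

```python
import math

def gbs_stability(z, k, smoothing=True):
    """T_kk(z) from (2.15) for the sequence n_j = 2j, z = H * lambda."""
    seq = [2 * j for j in range(1, k + 1)]
    table = []
    for j, n in enumerate(seq):
        y_prev, y = 1.0, 1.0 + z / n                   # y_0 and y_1
        for _ in range(1, n):                          # ends with y_{n-1}, y_n
            y_prev, y = y, y_prev + 2.0 * (z / n) * y
        if smoothing:
            y_next = y_prev + 2.0 * (z / n) * y        # y_{n+1}
            row = [0.25 * (y_prev + 2.0 * y + y_next)]
        else:
            row = [y]
        for m in range(1, j + 1):                      # Aitken-Neville extrapolation
            ratio = (seq[j] / seq[j - m]) ** 2
            row.append(row[m - 1] + (row[m - 1] - table[j - 1][m - 1]) / (ratio - 1.0))
        table.append(row)
    return table[-1][-1]

if __name__ == "__main__":
    # Accuracy for small z, and modulus at a few points of the negative real axis.
    print(abs(gbs_stability(0.01, 3) - math.exp(0.01)))
    for x in (-2.0, -4.0, -6.0):
        print(x, abs(gbs_stability(x, 4)), abs(gbs_stability(x, 4, smoothing=False)))
```

A point $z$ belongs to the stability domain of $T_{kk}$ when the returned value has modulus at most 1.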


Analysis of the Examples of IV.1 

The Jacobian for the Robertson reaction (1.3) is given by 

$$\begin{pmatrix}
-0.04 & 10^4 y_3 & 10^4 y_2 \\
0.04 & -10^4 y_3 - 6\cdot 10^7 y_2 & -10^4 y_2 \\
0 & 6\cdot 10^7 y_2 & 0
\end{pmatrix}$$

which in the neighbourhood of the equilibrium $y_1 = 1$, $y_2 = 0.0000365$, $y_3 = 0$ is 

$$\begin{pmatrix}
-0.04 & 0 & 0.365 \\
0.04 & -2190 & -0.365 \\
0 & 2190 & 0
\end{pmatrix}$$

with eigenvalues $\lambda_1 = 0$, $\lambda_2 \approx -0.405$, $\lambda_3 \approx -2189.6$. 
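
The eigenvalues of this $3\times 3$ matrix can be obtained without a linear algebra library: its columns sum to zero, so one eigenvalue is 0 and the other two are roots of a quadratic. A minimal sketch, with function names of our own choosing:

```python
import math

def robertson_jacobian(y1, y2, y3):
    """Jacobian of the Robertson system (1.4); y1 does not enter."""
    return [[-0.04, 1.0e4 * y3, 1.0e4 * y2],
            [0.04, -1.0e4 * y3 - 6.0e7 * y2, -1.0e4 * y2],
            [0.0, 6.0e7 * y2, 0.0]]

def nonzero_eigenvalues(j):
    """The determinant vanishes (column sums are zero), so the characteristic
    polynomial is lam * (lam^2 - tr * lam + m), m = sum of principal 2x2 minors.
    Assumes real roots and a negative trace, as is the case here."""
    tr = j[0][0] + j[1][1] + j[2][2]
    m = (j[0][0] * j[1][1] - j[0][1] * j[1][0]
         + j[0][0] * j[2][2] - j[0][2] * j[2][0]
         + j[1][1] * j[2][2] - j[1][2] * j[2][1])
    big = 0.5 * (tr - math.sqrt(tr * tr - 4.0 * m))   # no cancellation for tr < 0
    return m / big, big

if __name__ == "__main__":
    small, big = nonzero_eigenvalues(robertson_jacobian(1.0, 3.65e-5, 0.0))
    print("eigenvalues: 0,", small, ",", big)
```

The large negative eigenvalue is the one that limits the step size of an explicit method.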




Fig. 2.3. Stability domains for GBS extrapolation methods (left: with smoothing step; right: without smoothing step) 



( 2 . 18 ) 
and therefore the double eigenvalues of the right hand matrix in (2.16) are 

$$-\frac{4\alpha}{(\Delta x)^2}\Big(\sin\frac{\pi k}{2N + 2}\Big)^2 = -4\alpha(N + 1)^2\Big(\sin\frac{\pi k}{2N + 2}\Big)^2, \tag{2.19}$$

and are located between $-4\alpha(N + 1)^2$ and 0. Since this matrix is symmetric, 
its eigenvalues are well conditioned and the first matrix on the right side of (2.16) 
with much smaller coefficients can be regarded as a small perturbation. Therefore 
the eigenvalues of $J$ in (2.16) will remain close to those of the unperturbed 
matrix and lie in a stripe neighbouring the interval $[-4\alpha(N + 1)^2, 0]$. Numerical 
computations for $N = 40$ show for example that the largest negative eigenvalue 
of $J$ varies between $-133.3$ and $-134.9$, while the unperturbed value is 
$-4\cdot 41^2\cdot\sin^2(40\pi/82)/50 = -134.28$. Since most stability domains for ODEX 
end close to $-5.5$ on the real axis (Fig. 2.3), this leads for $N = 40$ to $h \le 0.04$ and 
the number of steps must be $\ge 250$ (compare with Table 1.1). 
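
The arithmetic behind these bounds is short enough to script. In the sketch below, the function names are ours and the value 5.5 for the real stability limit is simply the figure quoted in the text.

```python
import math

def diffusion_eigenvalue(k, n, alpha=1.0 / 50.0):
    """Eigenvalue (2.19) of the discretized diffusion operator on N = n points."""
    return -4.0 * alpha * (n + 1) ** 2 * math.sin(math.pi * k / (2 * n + 2)) ** 2

def minimal_steps(n, t_end=10.0, real_limit=5.5):
    """Lower bound on the number of steps if h * |lambda| must stay below real_limit."""
    h_max = real_limit / abs(diffusion_eigenvalue(n, n))
    return t_end / h_max

if __name__ == "__main__":
    for n in (10, 20, 30, 40):
        print(n, round(diffusion_eigenvalue(n, n), 2), round(minimal_steps(n)))
```

Since the extreme eigenvalue grows like $(N+1)^2$, the step count bound grows quadratically with $N$, in line with the trend of Table 1.1.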


In order to explain the behaviour of the beam equation, we linearize it in the 
neighbourhood of the solution $\theta_k = \dot\theta_k = 0$, $F_x = F_y = 0$. There (1.10’) becomes 

$$G\ddot\theta = n^4\begin{pmatrix}
-3 & 1 & & & \\
1 & -2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & -2 & 1 \\
 & & & 1 & -1
\end{pmatrix}\theta, \tag{2.20}$$

since for $\theta = 0$ we have $A = G$ and $B = 0$. We now insert $G^{-1}$ from (1.16) and 
observe that the matrices involved are, with the exception of two elements, equal 
to $\pm K$ of (2.17). We therefore approximate (2.20) by 

$$\ddot\theta = -n^4 K^2\,\theta. \tag{2.21}$$

This second order equation was integrated in IV.1 as a first order system 

$$\begin{pmatrix}\theta \\ \dot\theta\end{pmatrix}' = E\begin{pmatrix}\theta \\ \dot\theta\end{pmatrix}, \qquad
E = \begin{pmatrix} 0 & I \\ -n^4K^2 & 0\end{pmatrix}. \tag{2.22}$$

By solving 

$$\begin{pmatrix} 0 & I \\ -n^4K^2 & 0\end{pmatrix}\begin{pmatrix}u \\ v\end{pmatrix} = \lambda\begin{pmatrix}u \\ v\end{pmatrix} \tag{2.23}$$

we find that $\lambda$ is an eigenvalue of $E$ iff $\lambda^2$ is an eigenvalue of $-n^4K^2$. Thus 
Formula (2.18) shows that the eigenvalues of $E$ are situated on the imaginary axis 
between $-4n^2 i$ and $+4n^2 i$. We see from Fig. 2.2 that the stability domain of 
DOP853 covers the imaginary axis between approximately $-6i$ and $+6i$. Hence 
for stability we need $h \le 1.5/n^2$ and the number of steps for the interval 
$0 \le t \le 5$ must be larger than $\approx 10n^2/3$. This, again, was observed in the numerical 
calculations (Table 1.2). 



IV.2 Stability Analysis for Explicit RK Methods 




Automatic Stiffness Detection 

Neither is perfect, but even an imperfect test can be quite useful, 
as we can show from experience ... (L.F. Shampine 1977) 


Explicit codes applied to stiff problems are apparently not very efficient and the 
remaining part of the book will be devoted to the construction of more stable algorithms. 
In order to avoid that an explicit code waste too much effort when encountering 
stiffness (and to enable a switch to a more suitable method), it is important 
that the code be equipped with a cheap means of detecting stiffness. The analysis 
of the preceding subsection demonstrates that, whenever a nonstiff code encounters 
stiffness, the product of the step size with the dominant eigenvalue of the Jacobian 
lies near the border of the stability domain. We shall show two manners of exploiting 
this observation to detect stiffness. 

Firstly, we adapt the ideas of Shampine & Hiebert (1977) to the Dormand & 
Prince method of order 5(4), given in Table II.5.2. The method possesses an error 
estimator $err_1 = \hat y_1 - y_1$ which, in the nonstiff situation, is $O(h^5)$. However 
in the stiff case, when the method is working near the border of the stability domain 
$S$, the distance $d_1 = y_1 - y(x_0 + h)$ to the smooth solution is approximately 
$d_1 \approx R(hJ)\,d_0$, where $J$ denotes the Jacobian of the system, $R(z)$ is the stability 
function of the method, and $d_0 = y_0 - y(x_0)$. Here we have neglected the local 
error for an initial value on the smooth solution $y(x)$. A similar formula, with 
$R$ replaced by $\hat R$, holds for the embedded method. The error estimator satisfies 
$err_1 \approx E(hJ)\,d_0$ with $E(z) = \hat R(z) - R(z)$. The idea is now to search for a second 
error estimator $\widetilde{err}_1$ (with $\widetilde{err}_1 \approx \widetilde E(hJ)\,d_0$) such that 

i) $|\widetilde E(z)| \le \theta\,|E(z)|$ on $\partial S \cap \mathbb{C}^-$ with a small $\theta < 1$; 

ii) $\widetilde{err}_1 = O(h^2)$ for $h \to 0$. 

Condition (i) implies that $\|\widetilde{err}_1\| < \|err_1\|$ when $h\lambda$ is near $\partial S$ (the problem 
is possibly stiff), and condition (ii) will lead to $\|\widetilde{err}_1\| \gg \|err_1\|$ for step sizes 
which are determined by accuracy requirements (when the problem is not stiff). If 
$\|\widetilde{err}_1\| < \|err_1\|$ occurs several times in succession (say 15 times) then a stiff code 
might be more efficient. 

For the construction of $\widetilde{err}_1$ we put $\widetilde{err}_1 = h(d_1 k_1 + d_2 k_2 + \dots + d_s k_s)$, where 
the $k_i = f(x_0 + c_i h, g_i)$ are the available function values of the method. The coefficients 
$d_i$ are determined in such a way that 

$$\sum_{i=1}^{s} d_i = 0, \qquad \sum_{i=1}^{s} d_i c_i = 0.02 \tag{2.24}$$

(so that (ii) holds) and that $\theta$ in (i) is minimized. A computer search gave values 
which have been rounded to 

$$\begin{array}{lll}
d_1 = -0.08536, & d_2 = 0.088, & d_3 = -0.0096, \\
d_4 = 0.0052, & d_5 = 0.00576, & d_6 = -0.004.
\end{array} \tag{2.25}$$

The factor 0.02 in (2.24) has been chosen such that $\theta$ in (i) is close to 0.3 on 
large parts of the border of $S$, but $|\widetilde E(z)/E(z)|$ soon becomes larger than 1 if $z$ 
approaches the origin. 
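
The rounded coefficients (2.25) can be checked against the two conditions (2.24) in exact arithmetic. The nodes $c_i$ used below are the standard ones of the Dormand & Prince 5(4) tableau (an input not restated in this section); the variable names are ours.

```python
from fractions import Fraction as F

# Nodes c_1..c_6 of the Dormand & Prince 5(4) method (standard tableau values).
c = [F(0), F(1, 5), F(3, 10), F(4, 5), F(8, 9), F(1)]

# Rounded coefficients d_1..d_6 from (2.25).
d = [F("-0.08536"), F("0.088"), F("-0.0096"),
     F("0.0052"), F("0.00576"), F("-0.004")]

sum_d = sum(d)                                       # first condition of (2.24)
sum_dc = sum(di * ci for di, ci in zip(d, c))        # second condition of (2.24)

if __name__ == "__main__":
    print("sum d_i     =", sum_d)
    print("sum d_i c_i =", sum_dc)
```

The first condition removes the $O(h)$ term of $h\sum d_i k_i$; the second fixes the size of the remaining $O(h^2)$ term.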

In Fig. 2.4 we present the contour lines $|\widetilde E(z)/E(z)| = Const$ ($Const = 4, 2, 1, 
0.5, 0.3, 0.2, 0.14, 0.1$) together with the stability domain of the method. A numerical 
experiment is illustrated in Fig. 2.5. We applied the code DOPRI5 (see the 
Appendix to Volume I) to the van der Pol equation (1.5’) with $\varepsilon = 0.003$. The upper 
picture shows the first component of the solution, the second picture displays the 
quotient $\|\widetilde{err}_1\|/\|err_1\|$ for the three tolerances Tol = $10^{-3}, 10^{-5}, 10^{-7}$. The last 
picture is a plot of $h|\lambda|/3.3$ where $h$ is the current step size and $\lambda$ the dominant 
eigenvalue of the Jacobian and 3.3 is the approximate distance of $\partial S$ to the origin. 



A second possibility for detecting stiffness is to estimate directly the dominant eigenvalue of the Jacobian of the problem. If $v$ denotes an approximation to the corresponding eigenvector with $\|v\|$ sufficiently small then, by the mean value theorem,

$$|\lambda| \approx \frac{\|f(x, y+v) - f(x, y)\|}{\|v\|}$$

will be a good approximation to the leading eigenvalue. For the Dormand & Prince method (Table II.5.2) we have $c_6 = c_7 = 1$. Therefore, a natural choice is

$$\varrho = \frac{\|k_7 - k_6\|}{\|g_7 - g_6\|} \tag{2.26}$$

where $k_i = f(x_0 + c_i h, g_i)$ are the function values of the current step. Both values, $g_7 = y_1$ and $g_6$, approximate the exact solution $y(x_0 + h)$ and it can be shown by Taylor expansion that $g_7 - g_6 = O(h^3)$. This difference is thus sufficiently small, in general. The same argument also shows that $g_7 - g_6 = E(hJ)d_0$, where $J$ is the Jacobian of the linearized differential equation and $E(z)$ is a polynomial with subdegree 4. Hence, $g_7 - g_6$ is essentially the vector obtained by 4 iterations of the power method applied to the matrix $hJ$. It will be a good approximation to the eigenvector corresponding to the leading eigenvalue. As in the above numerical experiment we applied the code DOPRI5 to the van der Pol equation (1.5’) with $\varepsilon = 0.003$. Fig. 2.6 presents a plot of $h\varrho/3.3$ where $h$ is the current step size and $\varrho$ the estimate (2.26). This is in perfect agreement with the exact values $h|\lambda|/3.3$ (see third picture of Fig. 2.5).

Fig. 2.6. Estimation of Lipschitz constant with DOPRI5
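The difference quotient behind (2.26) is easily tried out on a small example. The sketch below is not the DOPRI5 implementation; it uses a hypothetical linear test problem with eigenvalues $-1$ and $-1000$ and a perturbation $v$ that is only roughly aligned with the dominant eigenvector, in order to show how well the quotient recovers $|\lambda| = 1000$.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def lipschitz_estimate(f, x, y, v):
    """Difference quotient ||f(x, y+v) - f(x, y)|| / ||v||."""
    fy = f(x, y)
    fyv = f(x, [yi + vi for yi, vi in zip(y, v)])
    return norm([a - b for a, b in zip(fyv, fy)]) / norm(v)

def f(x, y):
    """Hypothetical test problem y' = A y + g(x); A has eigenvalues -1 and -1000
    with eigenvectors (1, 1) and (1, -1)."""
    return [-500.5 * y[0] + 499.5 * y[1] + math.sin(x),
            499.5 * y[0] - 500.5 * y[1]]

y = [1.0, 0.5]
v = [1.0e-6, -1.1e-6]      # small, roughly along the dominant eigenvector (1, -1)
rho = lipschitz_estimate(f, 0.3, y, v)
print("estimate of |lambda|:", rho)
```

In the code the role of $v$ is played by $g_7 - g_6$, which comes for free and has already been filtered by the power method; here it is simply prescribed.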

Further numerical examples have shown that the estimate (2.26) also gives satisfactory approximations of $|\lambda|$ when the dominant eigenvalue $\lambda$ is complex. However, if the argument of $\lambda$ is needed too, one can extend the power method as proposed by Wilkinson (1965, page 579). This has been elaborated by Sottas (1984) and Robertson (1987).

The two techniques above allow us to detect the regions where the step size is restricted by stability. In order to decide whether a stiff integrator will be more efficient, one has to compare the expense of both methods. Studies on this question have been undertaken in Petzold (1983), Sottas (1984) and Butcher (1990).


Step-Control Stability 


We now come to the explanation of another phenomenon encountered in Sect. IV.1, that of the ragged behaviour of the step size (e.g. Fig. 1.4 or 1.8), a line of research initiated by G. Hall (1985/86) and continued by G. Hall & D.J. Higham (1988). Do there exist methods or stiff equations for which the step sizes $h_n$ behave smoothly and no frequent step rejections appear?

We make a numerical study on the equation

$$\begin{aligned} y_1' &= -2000\,(\cos x \cdot y_1 + \sin x \cdot y_2 + 1), & y_1(0) &= 1 \\ y_2' &= -2000\,(-\sin x \cdot y_1 + \cos x \cdot y_2 + 1), & y_2(0) &= 0 \end{aligned} \tag{2.27}$$

for $0 \le x \le 1.57$, whose eigenvalues move slowly on a large circle from $-2000$ to $\pm 2000i$. If we apply Fehlberg’s method RKF5(4) (Table II.5.1) in local extrapolation mode (i.e., we continue the integration with the higher order result) and DOPRI5 to this equation (with Euclidean error norm without scaling), we obtain the step size behaviour presented in Fig. 2.7. There all rejected steps are crossed out (3 rejected steps for RKF5(4) and 104 for DOPRI5).

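Fig. 2.7. Step sizes of RKF5(4) and DOPRI5 for (2.27)

In order to explain this behaviour, we consider for $y' = \lambda y$ (of course!) the numerical process

$$y_{n+1} = R(h_n\lambda)\,y_n, \qquad err_n = E(h_n\lambda)\,y_n, \qquad h_{n+1} = h_n \left(\frac{Tol}{|err_n|}\right)^{\alpha} \tag{2.28}$$

(where $err_n$ is the estimated error, $E(z) = \hat R(z) - R(z)$, $\alpha = 1/(p+1)$ and $p$ is the order of $R$) as a dynamical system whose fixed points and stability we have to study. A possible safety factor (“fac” of formula (4.13) of Sect. II.4) can easily be incorporated into $Tol$ and does not affect the theory. The analysis simplifies if we introduce logarithms

$$\eta_n = \log|y_n|, \qquad \chi_n = \log h_n \tag{2.29}$$

so that (2.28) becomes

$$\begin{aligned} \eta_{n+1} &= \log|R(e^{\chi_n}\lambda)| + \eta_n, \\ \chi_{n+1} &= \alpha\,(\gamma - \log|E(e^{\chi_n}\lambda)| - \eta_n) + \chi_n, \end{aligned} \tag{2.30}$$

where $\gamma$ is a constant. This is now a map $\mathbb{R}^2 \to \mathbb{R}^2$. Its fixed point $(\eta, \chi)$ satisfies

$$|R(e^{\chi}\lambda)| = 1, \tag{2.31}$$

which determines the step size $e^{\chi}$ so that the point $z = e^{\chi}\lambda$ must be on the border of the stability domain. Further,

$$\eta = \gamma - \log|E(z)|$$

determines $\eta$. The stability of this fixed point is governed by the Jacobian of the map (2.30), which is obtained by differentiating (2.30) with respect to $\eta_n$ and $\chi_n$:

$$C = \begin{pmatrix} 1 & u \\ -\alpha & 1 - \alpha v \end{pmatrix}, \qquad u = \mathrm{Re}\left(\frac{zR'(z)}{R(z)}\right), \qquad v = \mathrm{Re}\left(\frac{zE'(z)}{E(z)}\right). \tag{2.32}$$
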
Proposition 2.3. The step-control mechanism is stable for $h\lambda = z$ on the boundary of the stability domain if and only if the spectral radius of $C$ in (2.32) satisfies

$$\varrho(C) < 1.$$

We then call the method SC-stable at $z$. □


The matrix $C$ is independent of the given differential equation and of the given tolerance. It is therefore a characteristic of the numerical method and the boundary of its stability domain. Let us study some methods of Sect. II.5.

a) RKF4(5) (Table 5.1), $\alpha = 1/5$:



(2.33)

Fig. 2.8. Regions of step-control stability

b) DOPRI5 (Table 5.2), $\alpha = 1/5$:

$$R(z) = \text{see (2.13)}, \qquad E(z) = \frac{97}{120000}z^5 - \frac{13}{40000}z^6 + \frac{1}{24000}z^7$$

c) RKF5(4) (Table 5.1, with local extrapolation), $\alpha = 1/5$:

$$R(z) = 1 + z + \frac{z^2}{2} + \frac{z^3}{6} + \frac{z^4}{24} + \frac{z^5}{120} + \frac{z^6}{2080}, \qquad E(z) \text{ same as (2.33)}.$$

d) HIHA5 (Method of Higham & Hall, see Table 2.1 below), $\alpha = 1/5$:

. z 2 z 3 z 4 z 5 z 6 

R( z ) = l + z + - + - + - + — + —, 

E ^ ~~12W Z + 2400 2 + 14400^ 


The corresponding stability domains are represented in Fig. 2.8. There, the regions of the boundary, for which $\varrho(C) < 1$ is satisfied, are represented as thick lines. It can be observed that the phenomena of Fig. 2.7, as well as those of Sect. IV.1, are nicely verified.
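The criterion of Proposition 2.3 is simple to evaluate numerically. The sketch below does this for DOPRI5 at the point where the boundary of the stability domain meets the negative real axis. It assumes that $C$ has the form $\begin{pmatrix} 1 & u \\ -\alpha & 1-\alpha v\end{pmatrix}$ with $u = \mathrm{Re}(zR'/R)$, $v = \mathrm{Re}(zE'/E)$ (the upper left block of (2.45)), and that $R(z)$ of (2.13) is the usual DOPRI5 polynomial $1 + z + \ldots + z^5/120 + z^6/600$; $E(z)$ is the polynomial of item b).

```python
import cmath

ALPHA = 1.0 / 5.0

def poly(coeffs, z):
    """Evaluate sum(coeffs[k] * z**k) by Horner's rule."""
    r = 0.0
    for a in reversed(coeffs):
        r = r * z + a
    return r

def dpoly(coeffs, z):
    """Evaluate the derivative of the polynomial above."""
    return poly([k * a for k, a in enumerate(coeffs)][1:], z)

# DOPRI5 (assumption for R, see the lead-in); ascending powers of z
R = [1, 1, 1 / 2, 1 / 6, 1 / 24, 1 / 120, 1 / 600]
E = [0, 0, 0, 0, 0, 97 / 120000, -13 / 40000, 1 / 24000]

def sc_spectral_radius(z):
    """Spectral radius of C = [[1, u], [-ALPHA, 1 - ALPHA*v]] at the point z."""
    u = (z * dpoly(R, z) / poly(R, z)).real
    v = (z * dpoly(E, z) / poly(E, z)).real
    tr = 2.0 - ALPHA * v
    det = 1.0 - ALPHA * v + ALPHA * u
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

# Real end point of the stability interval: bisection on R(x) - 1
lo, hi = -3.5, -3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if poly(R, mid) > 1.0:
        lo = mid
    else:
        hi = mid
x_b = 0.5 * (lo + hi)
rho = sc_spectral_radius(complex(x_b, 0.0))
print("boundary point:", x_b, " spectral radius of C:", rho)
```

Under these assumptions the spectral radius at the real boundary point comes out slightly larger than 1, in agreement with the many step rejections of DOPRI5 in Fig. 2.7.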


DOP853. The step size control of the code DOP853 (Volume I) is slightly more complicated. It is based on a “stretched” error estimator (see Sect. II.10) and, for the test equation $y' = \lambda y$, it is equivalent to replacing $|E(z)|$ of (2.30) by

$$|E(z)| = \frac{|E_5(z)|^2}{\sqrt{|E_5(z)|^2 + 0.01\,|E_3(z)|^2}}$$

where $E_3(z) = \hat R_3(z) - R(z)$, $E_5(z) = \hat R_5(z) - R(z)$, and $\hat R_3(z)$, $\hat R_5(z)$ are the stability functions of third and fifth order embedded methods, respectively. The above analysis is still valid if the expression $v$ of (2.32) is replaced by the derivative of $\log|E(e^{\chi}\lambda)|$ with respect to $\chi$, which is

$$v = 2v_5 - \frac{v_5\,|E_5(z)|^2 + 0.01\,v_3\,|E_3(z)|^2}{|E_5(z)|^2 + 0.01\,|E_3(z)|^2} \tag{2.38}$$

where $v_5 = \mathrm{Re}\,(zE_5'(z)/E_5(z))$ and $v_3 = \mathrm{Re}\,(zE_3'(z)/E_3(z))$. Since $|E(z)| = O(|z|^8)$ for $|z| \to 0$, we have to use the value $\alpha = 1/8$ in (2.32). The regions of SC-stability are shown in Fig. 2.8.

Table 2.1. Method HIHA5 of Higham and Hall


SC-Stable Dormand and Prince Pairs of Order 5. We see from Fig. 2.8 that the method DOPRI5 is not SC-stable at the intersection of the real axis with the boundary of the stability region. We are therefore interested in finding 5(4)-th order explicit Runge-Kutta pairs from the family of Dormand & Prince (1980) with larger regions of SC-stability.

Requiring the simplifying assumption (II.5.15), Algorithm 5.2 of Sect. II.5 yields a class of Runge-Kutta methods with $c_3$, $c_4$, $c_5$ as free parameters. Higham & Hall (1990) have made an extensive computer search for good choices of these parameters in order to have a reasonable size of the stability domain, large parts of SC-stability and a small 6th order error constant. It turned out that the larger one wants the region of SC-stability, the larger the error constant becomes. A compromise choice between Scylla and Charybdis, which in addition yields nice rational coefficients, is given by $c_3 = 1/3$, $c_4 = 1/2$ and $c_5 = 3/5$. This then leads to the method of Table 2.1 which has satisfactory stability properties as can be seen from Fig. 2.8.


A PI Step Size Control

We saw that it was an I-controler ... and a control-man knows 
that PI is always better than I ... 

(K. Gustafsson, June 1990) 

In 1986/87 two students of control theory attended a course of numerical analysis at the University of Lund. The outcome of this contact was the idea to resolve the above instability phenomena in stiff computations by using the concept of “PID control” (Gustafsson, Lundh & Söderlind 1988). The motivation for PID control, a classic in control theory (Callender, Hartree & Porter 1936), is as follows:

Suppose we have a continuous-time control problem where $\theta(t)$ is the departure, at time $t$, of a quantity to be controlled from its normal value. Then one might suppose that

$$\dot\theta(t) = C(t) - m\theta(t) \tag{2.39}$$

where $C(t)$ denotes the effect of the control and the term $-m\theta(t)$ represents a self-regulating effect such as “a vessel in a constant temperature bath”. The most simple assumption for the control would be

$$-\dot C(t) = n_1\theta(t) \tag{2.40}$$

which represents, say, a valve opened or closed in dependence of $\theta$. The equations (2.39) and (2.40) together lead to

$$\ddot\theta + m\dot\theta + n_1\theta = 0 \tag{2.41}$$

which, for $n_1 > 0$, $m > 0$, is always stable. If, however, we assume (more realistically) that our system has some time-lag, we must replace (2.40) by

$$-\dot C(t) = n_1\theta(t - T) \tag{2.40’}$$

and the stability of the process may be destroyed. This is precisely the same effect as the instability of Equation (17.6) of Sect. II.17 and is discussed similarly. In order to preserve stability, one might replace (2.40’) by

$$-\dot C(t) = n_1\theta(t - T) + n_2\dot\theta(t - T) \tag{2.40”}$$

or even by

$$-\dot C(t) = n_1\theta(t - T) + n_2\dot\theta(t - T) + n_3\ddot\theta(t - T). \tag{2.40”’}$$

Here, the first term on the right hand side represents the “Integral feedback” (I), the second term “Proportional feedback” (P) and the last term is the “Derivative feedback” (D). The P-term especially increases the constant $m$ in (2.41), thus adds extra friction to the equation. It is thus natural to expect that the system becomes more stable. The precise tuning of the parameters $n_1$, $n_2$, $n_3$ is, however, a long task of analytic study and practical experience.

In order to adapt the continuous-time model (2.40”) to our situation, we replace

$C(t) \longleftrightarrow \log h_n$ (the “control variable”)

$\theta(t) \longleftrightarrow \log|err_n| - \log Tol$ (the “deviation”)

and replace derivatives in $t$ by differences. Then the formula (see (2.28))

$$h_{n+1} = h_n \left(\frac{Tol}{|err_n|}\right)^{n_1},$$

which is

$$-(\log h_{n+1} - \log h_n) = n_1(\log|err_n| - \log Tol),$$

corresponds to (2.40’). The PI-control (2.40”) would read

$$-(\log h_{n+1} - \log h_n) = n_1(\log|err_n| - \log Tol) + n_2\bigl((\log|err_n| - \log Tol) - (\log|err_{n-1}| - \log Tol)\bigr),$$

or when resolved,

$$h_{n+1} = h_n \left(\frac{Tol}{|err_n|}\right)^{n_1} \left(\frac{|err_{n-1}|}{|err_n|}\right)^{n_2}. \tag{2.42}$$
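Formula (2.42) translates into a few lines of code. The sketch below is an illustration only, not the step size routine of any particular code; the exponents `n1 = 0.06`, `n2 = 0.08` and the error values are hypothetical. It also evaluates the equivalent form with $\alpha = n_1 + n_2$ and $\beta = n_2$ used in the analysis that follows, and compares with a pure I-controller having the same $\alpha$.

```python
def pi_step(h, err, err_prev, tol, n1, n2):
    """New step size from (2.42): I-part with exponent n1, P-part with exponent n2."""
    return h * (tol / err) ** n1 * (err_prev / err) ** n2

def pi_step_alpha_beta(h, err, err_prev, tol, alpha, beta):
    """The same controller written with alpha = n1 + n2 and beta = n2."""
    return h * (tol / err) ** alpha * (err_prev / tol) ** beta

# Hypothetical situation: the error estimate has grown from 5e-5 to 2e-4
h, err, err_prev, tol = 0.1, 2.0e-4, 5.0e-5, 1.0e-4

h_new = pi_step(h, err, err_prev, tol, n1=0.06, n2=0.08)
h_alt = pi_step_alpha_beta(h, err, err_prev, tol, alpha=0.14, beta=0.08)
h_int = pi_step(h, err, err_prev, tol, n1=0.14, n2=0.0)   # I-control only

print("PI control      :", h_new)
print("alpha/beta form :", h_alt)
print("I control       :", h_int)
```

Since the error is increasing, the P-term reduces the step size further than the I-controller alone would do; this is the additional “friction” mentioned above.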


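In order to perform a theoretical analysis of this new algorithm we again choose the problem $y' = \lambda y$ and have as in (2.28)

$$y_{n+1} = R(h_n\lambda)\,y_n \tag{2.43a}$$

$$err_n = E(h_n\lambda)\,y_n \tag{2.43b}$$

$$h_{n+1} = h_n \left(\frac{Tol}{|err_n|}\right)^{n_1} \left(\frac{|err_{n-1}|}{|err_n|}\right)^{n_2} = h_n \left(\frac{Tol}{|err_n|}\right)^{\alpha} \left(\frac{|err_{n-1}|}{Tol}\right)^{\beta} \tag{2.43c}$$

where $\alpha = n_1 + n_2$, $\beta = n_2$. With the notation (2.29) this process becomes

$$\begin{aligned} \eta_{n+1} &= \log|R(e^{\chi_n}\lambda)| + \eta_n \\ \chi_{n+1} &= \chi_n - \alpha\log|E(e^{\chi_n}\lambda)| - \alpha\eta_n + \beta\log|E(e^{\chi_{n-1}}\lambda)| + \beta\eta_{n-1} + \gamma \end{aligned}$$

with some constant $\gamma$. This can be considered as a map $(\eta_n, \chi_n, \eta_{n-1}, \chi_{n-1}) \to (\eta_{n+1}, \chi_{n+1}, \eta_n, \chi_n)$. At a fixed point $(\eta, \chi)$, which again satisfies (2.31), the Jacobian is given by

$$\tilde C = \frac{\partial(\eta_{n+1}, \chi_{n+1}, \eta_n, \chi_n)}{\partial(\eta_n, \chi_n, \eta_{n-1}, \chi_{n-1})} = \begin{pmatrix} 1 & u & 0 & 0 \\ -\alpha & 1 - \alpha v & \beta & \beta v \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \tag{2.45}$$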

with $u$ and $v$ as in (2.32). A numerical study of the spectral radius $\varrho(\tilde C)$ with $\alpha = 1/p$ (where $p$ is the exponent of $h$ of the leading term in the error estimator), $\beta = 0.08$ along the boundary of the stability domains of the above RK-methods shows an impressive improvement (see Fig. 2.9) as compared to the standard algorithm of Fig. 2.8. The only exception is DOP853, which becomes unstable close to the real axis, whereas it was SC-stable for $\beta = 0$. For this method, the value $\beta = 0.04$ is more suitable.

Fig. 2.9. Regions of step-control stability with stabilization factor $\beta = 0.08$

The step size behaviour of DOPRI5 with the new strategy ($\beta = 0.13$) applied to the problem (1.6’) is compared in Fig. 2.10 to the undamped step size control ($\beta = 0$). The improvement needs no comment. In order to make the difference clearly visible, we have chosen an extra-large tolerance $Atol = Rtol = 8 \cdot 10^{-2}$. With $\beta = 0.13$ the numerical solution becomes smooth in the time-direction. The zig-zag error in the $x$-direction represents the eigenvector corresponding to the largest eigenvalue of the Jacobian and its magnitude is below $Atol$.

Man sieht dass selbst der frömmste Mann
nicht allen Leuten gefallen kann.
(One sees that even the most pious man cannot please everybody.)

(W. Busch, Kritik des Herzens 1874)


Study for small $h$. For the non-stiff case the new step size strategy may be slightly less efficient. In order to understand this, we assume that $|err_n| \approx Ch_n^p$ so that (2.43c) becomes

$$h_{n+1} = h_n \left(\frac{Tol}{Ch_n^p}\right)^{\alpha} \left(\frac{Ch_{n-1}^p}{Tol}\right)^{\beta} \tag{2.46}$$

or, by taking logarithms,

$$\log h_{n+1} + (p\alpha - 1)\log h_n - p\beta\log h_{n-1} = (\alpha - \beta)\log\left(\frac{Tol}{C}\right).$$

This is a linear difference equation with characteristic equation

$$\lambda^2 + (p\alpha - 1)\lambda - p\beta = 0, \tag{2.47}$$

the roots of which govern the response of the system to variations in $C$. Obviously, the choice $\alpha = 1/p$ and $\beta = 0$ would be most perfect by making both roots equal to zero; but this is just the classical step size control. We therefore have to compromise by choosing $\alpha$ and $\beta$ such that (2.45) remains stable for large parts of the stability boundary and at the same time keeping the roots of (2.47) significantly smaller than one. A fairly good choice, found by Gustafsson (1991) after some numerical computations, is

$$\alpha \approx 0.7/p, \qquad \beta \approx 0.4/p. \tag{2.48}$$

Fig. 2.10. Numerical solution of (1.6’) with $Tol = 8 \cdot 10^{-2}$. Left: without stabilisation ($\beta = 0$), 294 steps, 212 accepted, 82 rejected. Right: with stabilisation ($\beta = 0.13$), 162 steps, 162 accepted, 0 rejected.
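For the choice (2.48) the roots of (2.47) can be written down at once, since the equation becomes $\lambda^2 - 0.3\lambda - 0.4 = 0$ independently of $p$. A small sketch (the function name `response_roots` is ours):

```python
import cmath

def response_roots(p, alpha, beta):
    """Roots of lambda^2 + (p*alpha - 1)*lambda - p*beta = 0, cf. (2.47)."""
    b = p * alpha - 1.0
    c = -p * beta
    disc = cmath.sqrt(b * b - 4.0 * c)
    return (-b + disc) / 2.0, (-b - disc) / 2.0

p = 5
classical = response_roots(p, 1.0 / p, 0.0)        # alpha = 1/p, beta = 0
gustafsson = response_roots(p, 0.7 / p, 0.4 / p)   # the choice (2.48)

print("classical control:", classical)
print("choice (2.48)    :", gustafsson)
```

The classical control gives the double root zero, the choice (2.48) the roots $0.8$ and $-0.5$: perturbations of $C$ are no longer compensated in one step, but are still damped out geometrically.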


Stabilized Explicit Runge-Kutta Methods 


For many problems, usually not very stiff, of large dimension, and with eigenvalues 
known to lie in a certain region, explicit methods with large stability domains can 
be very efficient. We consider here methods with extended stability domains along 
the negative real axis, which are, therefore, especially suited for the time integration 
of systems of parabolic PDEs. An excellent survey article with additional details 
and references is Verwer (1996). 

Our problem is to find, for a given $s$, a polynomial of the form $R(z) = 1 + z + a_2z^2 + \ldots + a_sz^s$ such that the corresponding stability domain is, in the direction of the negative axis, as large as possible. The main ingredient for these methods are the Chebyshev polynomials (Chebyshev 1854)

$$T_s(x) = \cos(s\arccos x) \tag{2.49}$$

or

$$T_s(x) = 2xT_{s-1}(x) - T_{s-2}(x), \qquad T_0(x) = 1, \quad T_1(x) = x \tag{2.49’}$$

which remain for $-1 \le x \le 1$ between $-1$ and $+1$ and among these polynomials have the largest possible derivative $T_s'(1) = s^2$ (A.A. Markov 1890). Therefore, one must set (Saul’ev 1960, Saul’ev’s postgraduate student Yuan Chzao Din 1958, Franklin 1959, Guillou & Lago 1961)

$$R_s(z) = T_s(1 + z/s^2) \tag{2.50}$$

so that $R_s(0) = 1$, $R_s'(0) = 1$, and $|R_s(z)| \le 1$ for $-2s^2 \le z \le 0$ (see Fig. 2.11). In particular we have

$$\begin{aligned} R_1(z) &= 1 + z \\ R_2(z) &= 1 + z + \tfrac{1}{8}z^2 \\ R_3(z) &= 1 + z + \tfrac{4}{27}z^2 + \tfrac{4}{729}z^3 \\ R_4(z) &= 1 + z + \tfrac{5}{32}z^2 + \tfrac{1}{128}z^3 + \tfrac{1}{8192}z^4 \\ R_5(z) &= 1 + z + \tfrac{4}{25}z^2 + \tfrac{28}{3125}z^3 + \tfrac{16}{78125}z^4 + \tfrac{16}{9765625}z^5 \end{aligned} \tag{2.50’}$$

whose stability domains are represented in Fig. 2.12.

Fig. 2.11. Shifted Chebyshev polynomial $T_9(1 + z/81)$ and its zeros

Fig. 2.12. Stability domains for shifted Chebyshev polynomials ($s = 2, 3, 4, 5$) (dots represent limiting case $s \to \infty$, see Exercise 8 below)

Fig. 2.13. Stability domains for damped Chebyshev stability functions, $\varepsilon = 0.05$

Damping. In the points where $T_s(1 + z/s^2) = \pm 1$, there is no damping at all of the higher frequencies and the stability domain has zero width. We therefore choose a small $\varepsilon > 0$, say $\varepsilon = 0.05$, and put (already suggested by Guillou & Lago 1961)

$$R_s(z) = \frac{1}{T_s(w_0)}\,T_s(w_0 + w_1z), \qquad w_0 = 1 + \frac{\varepsilon}{s^2}, \qquad w_1 = \frac{T_s(w_0)}{T_s'(w_0)}. \tag{2.51}$$

These polynomials oscillate between approximately $1 - \varepsilon$ and $-1 + \varepsilon$ and again satisfy $R_s(z) = 1 + z + O(z^2)$. The stability domains become a bit shorter (by $(4\varepsilon/3)s^2$), but the boundary is in a safe distance from the real axis (see Fig. 2.13).
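The coefficients of the undamped polynomials (2.50) follow mechanically from the recurrence (2.49’) with $x = 1 + z/s^2$. The following sketch (exact rational arithmetic, helper names are ours) reproduces $R_4(z)$ of (2.50’) and checks the bound $|R_4(z)| \le 1$ at the integer points of the stability interval $[-32, 0]$.

```python
from fractions import Fraction as F

def poly_mul(p, q):
    """Product of two polynomials given by ascending coefficient lists."""
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_sub(p, q):
    """Difference p - q of two coefficient lists."""
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def chebyshev_stability_poly(s):
    """Coefficients (ascending powers of z) of R_s(z) = T_s(1 + z/s^2)."""
    x = [F(1), F(1, s * s)]               # the polynomial 1 + z/s^2
    t_prev, t_cur = [F(1)], x             # T_0 and T_1
    for _ in range(s - 1):
        two_x_t = poly_mul([F(2)], poly_mul(x, t_cur))
        t_prev, t_cur = t_cur, poly_sub(two_x_t, t_prev)
    return t_cur

def evaluate(coeffs, z):
    """Horner evaluation."""
    r = F(0)
    for a in reversed(coeffs):
        r = r * z + a
    return r

R4 = chebyshev_stability_poly(4)
print([str(a) for a in R4])
inside = all(abs(evaluate(R4, F(-k))) <= 1 for k in range(33))
outside = abs(evaluate(R4, F(-33)))
print("bounded on [-32, 0]:", inside, "  |R_4(-33)| =", float(outside))
```

Just beyond $z = -2s^2$ the modulus exceeds 1, which illustrates that the stability interval cannot be enlarged for this polynomial.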


Lebedev’s Realization. Our next problem is to find Runge-Kutta methods which realize these stability polynomials. A first idea, mentioned by Saul’ev (1960) and Guillou & Lago (1961), is to write

$$R_s(z) = \prod_{i=1}^{s}(1 + \delta_iz) \quad \text{where} \quad \delta_i = -\frac{1}{z_i}, \quad z_i \text{ roots of } R_s(z) \tag{2.52}$$

and to represent the RK method as the composition of explicit Euler steps

$$g_0 := y_0, \qquad g_i := g_{i-1} + h\delta_if(g_{i-1}) \quad (i = 1, 2, \ldots, s), \qquad y_1 := g_s. \tag{2.53}$$

A disadvantage here is the fact that for the first of these roots, which in absolute value is much smaller than the others, we shall have a very large Euler step, which is surely not good. Lebedev’s idea (Lebedev 1989, 1994) is therefore to group the roots symmetrically two-by-two together and to represent the corresponding quadratic factor

$$(1 + \delta_iz)(1 + \delta_i'z) = 1 + 2\alpha_iz + \beta_iz^2 \tag{2.54}$$

by a two-stage scheme

$$\begin{aligned} g_i^* &:= g_{i-1} + h\alpha_if(g_{i-1}) \\ g_{i+1}^* &:= g_i^* + h\alpha_if(g_i^*) \\ g_{i+1} &:= g_{i+1}^* - \alpha_i\gamma_ih\,\bigl(f(g_i^*) - f(g_{i-1})\bigr) = g_{i+1}^* - \gamma_i\bigl((g_{i+1}^* - g_i^*) - (g_i^* - g_{i-1})\bigr) \end{aligned} \tag{2.55}$$

which produces (2.54) if $\beta_i = \alpha_i^2(1 - \gamma_i)$. This nearly halves the largest Euler step size and also allows complex conjugate pairs of roots. The expression $(g_{i+1}^* - g_i^*) - (g_i^* - g_{i-1}) \approx h^2\alpha_i^2y''$ can be used for error estimations and step size selections. For odd $s$, there remains one single root which gives rise to an Euler step (2.53).
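Both realizations can be tried out on the scalar test equation $y' = \lambda y$. The sketch below uses the undamped polynomial $T_9(1 + z/81)$, whose roots are known in closed form, and assumes the two-stage scheme in the form of two Euler steps of size $h\alpha_i$ followed by the correction with $\gamma_i$; the pairing of the smallest with the largest root is our own illustrative choice.

```python
import math

s = 9
# Zeros of R_s(z) = T_s(1 + z/s^2), ordered by increasing modulus
roots = [s * s * (math.cos((2 * i - 1) * math.pi / (2 * s)) - 1.0)
         for i in range(1, s + 1)]
delta = [-1.0 / z for z in roots]          # Euler factors of (2.52)

def euler_composition(lam, h, y0):
    """s explicit Euler steps (2.53) for the scalar problem f(y) = lam*y."""
    g = y0
    for d in delta:
        g = g + h * d * lam * g
    return g

def two_stage(lam, h, g, a, gam):
    """Two-stage scheme for one quadratic factor, scalar problem f(y) = lam*y."""
    g1 = g + h * a * lam * g               # first Euler step of size h*a
    g2 = g1 + h * a * lam * g1             # second Euler step of size h*a
    return g2 - a * gam * h * (lam * g1 - lam * g)

z = -10.0
direct = math.cos(s * math.acos(1.0 + z / (s * s)))   # T_s(1 + z/s^2) by (2.49)
composed = euler_composition(z, 1.0, 1.0)

# Pair the smallest root with the largest one
d1, d2 = delta[0], delta[-1]
a = 0.5 * (d1 + d2)                        # 2*a = d1 + d2
gam = 1.0 - d1 * d2 / (a * a)              # from beta = a^2 (1 - gamma) = d1*d2
pair = two_stage(z, 1.0, 1.0, a, gam)
factor = (1.0 + d1 * z) * (1.0 + d2 * z)

print("T_s(1+z/s^2):", direct, "  Euler composition:", composed)
print("largest Euler factor:", max(delta), "  paired step factor:", a)
print("quadratic factor:", factor, "  two-stage scheme:", pair)
```

For $s = 9$ the largest Euler factor is about $0.81$, whereas the pair needs two steps of about $0.41$ only.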

Best Ordering. Some attention is now necessary for the decision in which order 
the roots shall be used (Lebedev & Finogenov 1976). This is done by two require¬ 
ments: firstly, the quantities 

3 

$j = max I 1 + <^1 If I 1 + 2 a i 0 +/?i z2 |> 


which express the stability of the internal stages, must be $\le 1$ (here, the max is taken over real $z$ in the stability interval of the method). Secondly, the quantities

$$Q_j = \max_z \prod_{i=j+1}^{s} |1 + 2\alpha_iz + \beta_iz^2|,$$

which describe the propagation of rounding errors, must be as small as possible. These conditions, evaluated numerically for the case $s = 9$, lead to the ordering indicated in Fig. 2.11.

Second Order Methods. If the stability polynomial is a second order approximation to $e^z$, i.e., if

$$R_s(z) = 1 + z + \frac{z^2}{2} + a_3z^3 + \ldots + a_sz^s \tag{2.56}$$

then it can be seen from (2.8) that any corresponding Runge-Kutta scheme is also of second order for nonlinear problems. Analytic expressions, in terms of an elliptic integral, for such optimal polynomials have been obtained by Lebedev & Medovikov (1994). Their stability region reaches to $-0.821842 \cdot s^2$ for $s \gg 1$. Their practical computation is usually done numerically (Remez 1957, Lebedev 1995). For example, in the case $s = 9$ and for a damping factor $\varepsilon = 0.015$, we obtain the roots

$$\begin{aligned} z_9 &= -64.64238389, & z_8 &= -60.67479347, & z_7 &= -53.21695488, \\ z_6 &= -43.16527010, & z_5 &= -31.72471699, & z_4 &= -20.25474163, \\ z_3 &= -10.05545938, & z_{2,1} &= -1.30596166 \pm i\,1.34047517. \end{aligned} \tag{2.57}$$

The corresponding stability polynomials, which are stable for $-65.15 \le z \le 0$, the stability domain, and the best ordering are shown in Fig. 2.14. We see that we now have a pair of complex roots.

Lebedev’s computer code, called DUMKA, incorporates the formulas of the above type with automatic selection of $h$ and $s$ in a wide range.



2nd 


.985 


.985 



Fig. 2.14. Second order Zolotarev approximation with stability domain 


Numerical Example. As an illustration, the method corresponding to (2.55) and (2.57) has been applied to problem (1.6’). Theory predicts stability for approximately $h \le 65.15/135 = 0.4826$. The leftmost picture of Fig. 2.15 is computed with $h = 0.48865$, which is a little too large and produces instability. The middle picture is produced by the code DUMKA with $Tol = 3 \cdot 10^{-3}$.

Fig. 2.15. Problem (1.6’): Lebedev9, $h = 0.48865$ (left), DUMKA (middle), RKC (right) (all internal stages drawn)


The Approach of van der Houwen & Sommeijer. An elegant idea for a second realization has been found by van der Houwen & Sommeijer (1980): apply scaled and shifted Chebyshev polynomials and use the three-term recursion formula (2.49’) for defining the internal stages. We therefore, following Bakker (1973), set

$$R_s(z) = a_s + b_sT_s(w_0 + w_1z), \qquad w_0 = 1 + \varepsilon/s^2, \qquad \varepsilon \approx 0.15. \tag{2.58}$$

The conditions for second order

$$R_s(0) = 1, \qquad R_s'(0) = 1, \qquad R_s''(0) = 1$$

lead to

$$a_s = 1 - b_sT_s(w_0), \qquad b_s = \frac{T_s''(w_0)}{(T_s'(w_0))^2}, \qquad w_1 = \frac{T_s'(w_0)}{T_s''(w_0)},$$

with damping $a_s + b_s \approx 1 - \varepsilon/3$ (see Ex. 9). We now put for the internal stages

$$R_j(z) = a_j + b_jT_j(w_0 + w_1z), \qquad j = 0, 1, \ldots, s-1. \tag{2.60}$$

It has been discovered by Sommeijer (see Sommeijer & Verwer 1980), that these $R_j(z)$ can, for $j \ge 2$, be approximations of second order at certain points $x_0 + c_jh$

$$R_j(0) = 1, \qquad R_j'(0) = c_j, \qquad R_j''(0) = c_j^2$$

which gives

$$b_j = \frac{T_j''(w_0)}{(T_j'(w_0))^2}, \qquad R_j(z) - 1 = b_j\bigl(T_j(w_0 + w_1z) - T_j(w_0)\bigr).$$


(2.62)

The three-term recurrence relation (2.49’) now leads to

$$R_j(z) - 1 = \mu_j\,(R_{j-1}(z) - 1) + \nu_j\,(R_{j-2}(z) - 1) + \kappa_j \cdot z \cdot (R_{j-1}(z) - a_{j-1})$$

where

$$\mu_j = \frac{2b_jw_0}{b_{j-1}}, \qquad \nu_j = -\frac{b_j}{b_{j-2}}, \qquad \kappa_j = \frac{2b_jw_1}{b_{j-1}}, \qquad j = 2, 3, \ldots, s. \tag{2.63}$$

This formula allows us, in the case of a nonlinear differential system, to define the scheme

$$\begin{aligned} g_0 - y_0 &= 0, \\ g_1 - y_0 &= \kappa_1hf(g_0), \\ g_j - y_0 &= \mu_j(g_{j-1} - y_0) + \nu_j(g_{j-2} - y_0) + \kappa_jhf(g_{j-1}) - a_{j-1}\kappa_jhf(g_0), \end{aligned} \tag{2.64}$$

which, being of second order for $y' = \lambda y$, is of second order for nonlinear equations too (again because of (2.8)). For $j = 1$ only first order is possible and $\kappa_1$ can be chosen freely. Sommeijer & Verwer (1980) suggest to put

$$b_0 = b_2, \qquad b_1 = b_2 \qquad \text{which gives} \qquad \kappa_1 = c_1 = \frac{c_2}{T_2'(w_0)} \approx \frac{c_2}{4}.$$

Fig. 2.16 shows, for $s = 9$ as usual, the functions $R_s(z)$ and $R_j(z)$, $j = 2, \ldots, s-1$ together with the stability domain of $R_s(z)$ (the “Venus of Willendorf”) in exactly the same frame as Lebedev’s Zolotarev polynomial of Fig. 2.14. We see that the stability domain becomes a little shorter, but we have closed analytic expressions and a smoother behaviour of the internal stages $g_j$ (see Fig. 2.15, right). All internal stages satisfy $|R_j(z)| \le 1$, and the method can be seen to possess a satisfactory numerical stability (see Verwer, Hundsdorfer & Sommeijer 1990). The above formulas have been implemented in a research code RKC (“Runge-Kutta-Chebyshev”) by Sommeijer (1991). As can be seen from Fig. 2.15, it performs well for equation (1.6’). More numerical results shall be reported in Sect. IV.10.
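The formulas (2.58) to (2.64) fit into a short program. The following is a sketch for a scalar autonomous problem, not the code RKC; it assumes $\nu_j = -b_j/b_{j-2}$, $\kappa_1 = b_1w_1$ and the choice $b_0 = b_1 = b_2$, and it checks on $y' = \lambda y$ that the recursion reproduces the closed form $a_s + b_sT_s(w_0 + w_1z)$.

```python
import math

def cheb(j, x):
    """Return T_j(x), T_j'(x), T_j''(x), using (2.49') and its derivatives."""
    t0, d0, dd0 = 1.0, 0.0, 0.0
    if j == 0:
        return t0, d0, dd0
    t1, d1, dd1 = x, 1.0, 0.0
    for _ in range(j - 1):
        t2 = 2 * x * t1 - t0
        d2 = 2 * t1 + 2 * x * d1 - d0
        dd2 = 4 * d1 + 2 * x * dd1 - dd0
        t0, d0, dd0, t1, d1, dd1 = t1, d1, dd1, t2, d2, dd2
    return t1, d1, dd1

def rkc_coefficients(s, eps=0.15):
    """w_0, w_1 and the lists a_j, b_j (j = 0..s) of (2.58)-(2.62)."""
    w0 = 1.0 + eps / s**2
    _, dTs, ddTs = cheb(s, w0)
    w1 = dTs / ddTs
    b = [0.0] * (s + 1)
    for j in range(2, s + 1):
        _, dT, ddT = cheb(j, w0)
        b[j] = ddT / dT**2
    b[0] = b[1] = b[2]                      # choice b_0 = b_2, b_1 = b_2
    a = [1.0 - b[j] * cheb(j, w0)[0] for j in range(s + 1)]
    return w0, w1, a, b

def rkc_step(f, y0, h, s, eps=0.15):
    """One step of scheme (2.64) for a scalar autonomous problem y' = f(y)."""
    w0, w1, a, b = rkc_coefficients(s, eps)
    f0 = f(y0)
    g_prev, g = y0, y0 + b[1] * w1 * h * f0     # g_0 and g_1
    for j in range(2, s + 1):
        mu = 2 * b[j] * w0 / b[j - 1]
        nu = -b[j] / b[j - 2]
        kappa = 2 * b[j] * w1 / b[j - 1]
        g_new = (y0 + mu * (g - y0) + nu * (g_prev - y0)
                 + kappa * h * f(g) - a[j - 1] * kappa * h * f0)
        g_prev, g = g, g_new
    return g

s, lam = 9, -1.0
w0, w1, a, b = rkc_coefficients(s)

def R_closed(z):
    """Closed form (2.58) of the stability polynomial."""
    return a[s] + b[s] * cheb(s, w0 + w1 * z)[0]

z_small, z_big = -0.01, -40.0
step_small = rkc_step(lambda y: lam * y, 1.0, -z_small, s)
step_big = rkc_step(lambda y: lam * y, 1.0, -z_big, s)
print("damping a_s + b_s:", a[s] + b[s])
print("z = -0.01:", step_small, R_closed(z_small), math.exp(z_small))
print("z = -40  :", step_big, R_closed(z_big))
```

For small $z$ the result agrees with $e^z$ up to $O(z^3)$, and for $z = -40$, well inside the stability interval for $s = 9$, the step is damped.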



Fig. 2.16. Stability function and domain for RKC method, $s = 9$, $\varepsilon = 0.15$

Combined Approach of Abdulle & Medovikov. Recent research and a code ROCK4 are presented in: A. Abdulle, Fourth order Chebyshev methods with recurrence relation, to appear in SIAM J. Sci. Comput. 2002.


Exercises

1. Prove that Runge-Kutta methods are invariant under linear transformations $y = Tz$ (i.e., if one applies the method to $y' = f(x, y)$ and to $z' = T^{-1}f(x, Tz)$ with initial values satisfying $y_0 = Tz_0$, then we have $y_1 = Tz_1$).

2. Consider the differential equation $y' = Ay$ and a numerical solution given by $y_{n+1} = R(hA)y_n$. Suppose that $R(z)$ is A-stable, i.e., it satisfies

$$|R(z)| \le 1 \quad \text{for} \quad \mathrm{Re}\,z \le 0,$$

and show, by transforming $A$ to Jordan canonical form, that

a) if $y' = Ay$ is stable, then $\{y_n\}$ is bounded;

b) if $y' = Ay$ is asymptotically stable, then $y_n \to 0$ for $n \to \infty$.

3. (Optimal stability for hyperbolic problems, van der Houwen (1968), (1977), p. 99): Given $m$, find a polynomial $R_m(z) = 1 + z + \ldots$ of degree $m+1$ such that $|R(iy)| \le 1$ for $-\beta \le y \le \beta$ with $\beta$ as large as possible.

Result. The solution (Sonneveld & van Leer 1985) is given by

$$R_m(z) = \tfrac{1}{2}V_{m-1}(\zeta) + V_m(\zeta) + \tfrac{1}{2}V_{m+1}(\zeta), \qquad \zeta = \frac{z}{m} \tag{2.65}$$

where $V_m(\zeta) = i^mT_m(\zeta/i)$ are the Chebyshev polynomials with positive coefficients. $R_m(iy)$ is stable for $-m \le y \le m$. The first $R_m$ are (see Abramowitz & Stegun, p. 795)

$$\begin{aligned} R_1(z) &= 1 + \zeta + \zeta^2 \\ R_2(z) &= 1 + 2\zeta + 2\zeta^2 + 2\zeta^3 \\ R_3(z) &= 1 + 3\zeta + 5\zeta^2 + 4\zeta^3 + 4\zeta^4 \\ R_4(z) &= 1 + 4\zeta + 8\zeta^2 + 12\zeta^3 + 8\zeta^4 + 8\zeta^5 \end{aligned} \qquad \zeta = \frac{z}{m} \tag{2.66}$$

Similar as for Chebyshev polynomials, they satisfy the recurrence relation $R_{m+1} = 2\zeta R_m + R_{m-1}$ ($m \ge 2$). Their stability domains are given in Fig. 2.17.



Fig. 2.17. Stability domains for hyperbolic approximations

4. Linearize the rope equation (1.24) in the neighbourhood of $\theta = \dot\theta = 0$ and make a stability analysis. Re-obtain Lagrange’s equation (I.6.2) from the linearized equation with the coordinate transformation

$$y = \begin{pmatrix} 1 & & & \\ 1 & 1 & & \\ 1 & 1 & 1 & \\ \vdots & & & \ddots \end{pmatrix}\theta, \qquad \theta = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & -1 & 1 & \\ & & \ddots & \ddots \end{pmatrix}y.$$

5. Fig. 2.18 shows the numerical results of the classical 4th order Runge-Kutta 
method with equidistant steps over 0 < t < 5 for the beam problem (1.7)-( 1.20) 
with n = 8. Explain the result with the help of Fig. 2.1. 



421 steps 425 steps 430 steps 

Fig. 2.18. Classical Runge-Kutta method (constant step sizes) on the beam problem 


6. For the example of Exercise 5, the explicit Euler method, although converging for $h \to 0$, is never stable (see Fig. 2.19). Why?

7. Let $\lambda$ be an eigenvalue of the two-dimensional left upper submatrix of $\tilde C$ in (2.45) (matrix $C$ of (2.32)) and denote its analytic continuation as eigenvalue of $\tilde C$ by $\lambda(\beta)$. Prove that

a) If Re $\lambda \ne 0$, then for some $\gamma \in \mathbb{R}$

A(/3) = A • (l — £ (1 — Re A) + i(3y + 0(/? 2 )). 

This shows that $|\lambda(\beta)| < |\lambda|$ for small $\beta > 0$ if Re $\lambda < 1$.

b) If $\lambda$ and $\mu$ are two distinct real eigenvalues of the above mentioned submatrix, then

aot=a-(i-^(i- i) 2 r^ +o ^0- 
Fig. 2.19. Explicit Euler on the beam problem (every 50th step drawn); panels: $h = 5/20000$, $h = 5/28000$, $h = 5/36000$

Hint. Write the characteristic polynomial of $\tilde C$ in the form

$$\det(\lambda I - \tilde C) = \lambda\,(\lambda p(\lambda) + \beta q(\lambda)),$$

where $p(\lambda) = \det(\lambda I - C)$ is the characteristic polynomial of $C$, and differentiate with respect to $\beta$.

8. Show that for the Chebyshev stability functions (2.50) we have

$$\lim_{s \to \infty} R_s(z) = \cos(\sqrt{-2z}).$$

Hint. Insert $\arccos(1 - x^2/2) \approx x$ into (2.49) and (2.50). The corresponding stability domain is indicated by dotted lines in the last picture of Fig. 2.12.

9. Show (for example with the help of (2.49’)) that for the Chebyshev polynomials

$$T_s'(1) = s^2, \qquad T_s''(1) = \frac{s^2(s^2 - 1)}{3}$$

and obtain asymptotic values (for $\varepsilon \to 0$) for $w_1$, $b_s$, $a_s$, the damping factor and the stability interval of the Bakker polynomials (2.58).

10. (Cross-shaped stability domains). For $-1 \le \varphi \le 1$ we put $z = -b \pm \sqrt{a(\varphi - 1) + b^2}$, so that $z$ moves on a cross $-2b \le z \le 0$ and $z = -b \pm iy$. Thus (an idea of Lebedev)

$$R_{2s}(z) = T_s(\varphi(z))$$

is a stability function for eigenvalues on crosses (as, e.g., for the PLATE problem). Determine $a$ in dependence of $b$ from the condition $R'(0) = 1$ and find the maximal value for $y$.

Result. $R_{2s}(z) = T_s(1 + z/s^2 + z^2/(2bs^2))$; $y_{\max} = \sqrt{4bs^2 - b^2}$.


I didn’t like all these “strong”, “perfect”, “absolute”, “generalized”, “super”, “hyper”, “complete” and so on in mathematical definitions, I wanted something neutral; and having been impressed by David Young’s “property A”, I chose the term “A-stable”.

(G. Dahlquist, in 1979)

There are at least two ways to combat stiffness. One is to design a better computer, the other, to design a better algorithm.

(H. Lomax in Aiken 1985)

Methods are called A-stable if there are no stability restrictions for $y' = \lambda y$, Re $\lambda \le 0$ and $h > 0$. This concept was introduced by Dahlquist (1963) for linear multistep methods, but also applied to Runge-Kutta processes. Ehle (1968) and Axelsson (1969) then independently investigated the A-stability of implicit Runge-Kutta methods and proposed new classes of A-stable methods. A nice paper of Wright (1970) studied collocation methods.

The Stability Function 

We start with the implicit Euler method. This method, $y_1 = y_0 + hf(x_1, y_1)$, applied to Dahlquist’s equation $y' = \lambda y$ becomes $y_1 = y_0 + h\lambda y_1$ which, after solving for $y_1$, gives

$$y_1 = R(h\lambda)\,y_0 \qquad \text{with} \qquad R(z) = \frac{1}{1 - z}.$$

This time, the stability domain is the exterior of the circle with radius 1 and centre +1. The stability domain thus covers the entire negative half-plane and a large part of the positive half-plane as well. The implicit Euler method is very stable.

Proposition 3.1. The $s$-stage implicit Runge-Kutta method

$$g_i = y_0 + h\sum_{j=1}^{s} a_{ij}f(x_0 + c_jh, g_j), \qquad i = 1, \ldots, s \tag{3.1a}$$

$$y_1 = y_0 + h\sum_{j=1}^{s} b_jf(x_0 + c_jh, g_j) \tag{3.1b}$$

applied to $y' = \lambda y$ yields $y_1 = R(h\lambda)y_0$ with

$$R(z) = 1 + zb^T(I - zA)^{-1}\mathbf{1}, \tag{3.2}$$

where $b^T = (b_1, \ldots, b_s)$, $A = (a_{ij})_{i,j=1}^{s}$, $\mathbf{1} = (1, \ldots, 1)^T$.

Remark. As in Definition 2.1, $R(z)$ is called the stability function of Method (3.1).

Proof. Equation (3.1a) with $f(x, y) = \lambda y$, $z = h\lambda$ becomes a linear system for the computation of $g_1, \ldots, g_s$. Solving this and inserting into (3.1b) leads to (3.2). □

Another useful formula for $R(z)$ is the following (Stetter 1973, Scherer 1979):

Proposition 3.2. The stability function of (3.1) satisfies

$$R(z) = \frac{\det(I - zA + z\mathbf{1}b^T)}{\det(I - zA)}. \tag{3.3}$$

Proof. Applying (3.1) to (2.9) yields the linear system

$$\begin{pmatrix} I - zA & 0 \\ -zb^T & 1 \end{pmatrix} \begin{pmatrix} g \\ y_1 \end{pmatrix} = y_0 \begin{pmatrix} \mathbf{1} \\ 1 \end{pmatrix}.$$

Cramer’s rule (Cramer 1750) implies that the denominator of $R(z)$ is $\det(I - zA)$, and its numerator

$$\det\begin{pmatrix} I - zA & \mathbf{1} \\ -zb^T & 1 \end{pmatrix} = \det\begin{pmatrix} I - zA + z\mathbf{1}b^T & 0 \\ -zb^T & 1 \end{pmatrix} = \det(I - zA + z\mathbf{1}b^T).$$

□

Fig. 3.1. Stability domains for implicit Runge-Kutta methods 


The stability functions for the methods of Sect. II.7 are presented in Table 3.1. The corresponding stability domains are displayed in Fig. 3.1.

We see that for implicit methods $R(z)$ becomes a rational function with numerator and denominator of degree $\le s$. We write

$$R(z) = \frac{P(z)}{Q(z)}, \qquad \deg P = k, \quad \deg Q = j. \tag{3.4}$$

Table 3.1. Stability functions for implicit Runge-Kutta methods of Sect. II.7

     Method                                                                   R(z)

a)   θ-method (II.7.2)                                                        $\dfrac{1 + z(1 - \theta)}{1 - z\theta}$

b)   implicit Euler (II.7.3)                                                  $\dfrac{1}{1 - z}$

c)   implicit midpoint (II.7.4), trapezoidal rule (II.7.5)                    $\dfrac{1 + z/2}{1 - z/2}$

d)   Hammer-Hollingsworth (II.7.6)                                            $\dfrac{1 + 4z/6 + z^2/6}{1 - z/3}$

e)   SDIRK order 3 (Table II.7.2)                                             $\dfrac{1 + z(1 - 2\gamma) + z^2(1/2 - 2\gamma + \gamma^2)}{(1 - \gamma z)^2}$

f)   Hammer-Hollingsw. 4 (Table II.7.3), Lobatto IIIA, order 4 (Table II.7.7) $\dfrac{1 + z/2 + z^2/12}{1 - z/2 + z^2/12}$

g)   Kuntzm.-Butcher 6 (Table II.7.4)                                         $\dfrac{1 + z/2 + z^2/10 + z^3/120}{1 - z/2 + z^2/10 - z^3/120}$

h)   Butcher’s Lobatto 4 (Table II.7.6)                                       $\dfrac{1 + 3z/4 + z^2/4 + z^3/24}{1 - z/4}$

i)   Butcher’s Lobatto 6 (Table II.7.6)                                       $\dfrac{1 + 2z/3 + z^2/5 + z^3/30 + z^4/360}{1 - z/3 + z^2/30}$

j)   Radau IIA, order 5 (Table II.7.7)                                        $\dfrac{1 + 2z/5 + z^2/20}{1 - 3z/5 + 3z^2/20 - z^3/60}$

If the method is of order $p$, then

$$ e^z - R(z) = Cz^{p+1} + \mathcal{O}(z^{p+2}) \quad \text{for } z \to 0 \tag{3.5} $$

(see Theorem 2.2). The constant $C$ is usually $\ne 0$. If not, we increase $p$ in (3.5)
until $C$ becomes $\ne 0$. We then call $R(z)$ a rational approximation to $e^z$ of order
$p$ and $C$ its error constant.
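Order and error constant in the sense of (3.5) can be read off from the Taylor coefficients of $P(z)/Q(z)$. The following sketch does this in exact rational arithmetic; the helper names are hypothetical, and coefficient lists are stored with the constant term first.

```python
from fractions import Fraction
from math import factorial

def series_of_quotient(P, Q, n):
    """First n Taylor coefficients of P(z)/Q(z) around z = 0."""
    P = [Fraction(c) for c in P] + [Fraction(0)] * n
    Q = [Fraction(c) for c in Q] + [Fraction(0)] * n
    r = []
    for m in range(n):
        # coefficient comparison in Q(z) * r(z) = P(z)
        acc = P[m] - sum(Q[i] * r[m - i] for i in range(1, m + 1))
        r.append(acc / Q[0])
    return r

def order_and_error_constant(P, Q, max_terms=20):
    """Return (p, C) such that e^z - R(z) = C z^(p+1) + O(z^(p+2))."""
    r = series_of_quotient(P, Q, max_terms)
    for m in range(max_terms):
        diff = Fraction(1, factorial(m)) - r[m]
        if diff != 0:
            return m - 1, diff
    raise ValueError("no nonzero error term among the first max_terms coefficients")

# Implicit Euler, entry b) of Table 3.1
print(order_and_error_constant([1], [1, -1]))
# Radau IIA of order 5, entry j) of Table 3.1
P_radau = [1, Fraction(2, 5), Fraction(1, 20)]
Q_radau = [1, Fraction(-3, 5), Fraction(3, 20), Fraction(-1, 60)]
print(order_and_error_constant(P_radau, Q_radau))
```

For implicit Euler this gives order 1 with $C = -1/2$, and for the Radau IIA entry order 5 with $C = -1/7200$.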


A-Stability 


We observe that some methods are stable on the entire left half-plane $\mathbb{C}^-$. This
is precisely the set of eigenvalues where the exact solution of (2.9) is stable too
(Sect. I.13, Theorem 13.1). A desirable property for a numerical method is that it
preserves this stability property.

Definition 3.3 (Dahlquist 1963). A method whose stability domain satisfies

$$ S \supset \mathbb{C}^- = \{z \,;\ \operatorname{Re} z \le 0\} $$

is called A-stable.

A Runge-Kutta method with (3.4) as stability function is A-stable if and only
if

$$ |R(iy)| \le 1 \quad \text{for all real } y \tag{3.6} $$

and

$$ R(z) \text{ is analytic for } \operatorname{Re} z < 0. \tag{3.7} $$

This follows from the maximum principle applied to $\mathbb{C}^-$. By a slight abuse of
language, we also call $R(z)$ A-stable in this case (or, as many authors say,
“A-acceptable”, Ehle 1968).

Condition (3.6) alone means stability on the imaginary axis and may be called
I-stability. It is equivalent to the fact that the polynomial

$$ E(y) = |Q(iy)|^2 - |P(iy)|^2 = Q(iy)Q(-iy) - P(iy)P(-iy) \tag{3.8} $$

satisfies

$$ E(y) \ge 0 \quad \text{for all } y \in \mathbb{R}. \tag{3.9} $$


Proposition 3.4. $E(y)$, defined by (3.8), is an even polynomial of degree
$\le 2\max(\deg P, \deg Q)$. If $R(z)$ is an approximation of order $p$, then

$$ E(y) = \mathcal{O}(y^{p+1}) \quad \text{for } y \to 0. $$

Proof. Taking absolute values in (3.5) gives

$$ |e^z| - \frac{|P(z)|}{|Q(z)|} = \mathcal{O}(z^{p+1}). $$

Putting $z = iy$ and using $|e^{iy}| = 1$ leads to

$$ |Q(iy)| - |P(iy)| = \mathcal{O}(y^{p+1}). $$

The result now follows from

$$ E(y) = \bigl(|Q(iy)| + |P(iy)|\bigr)\bigl(|Q(iy)| - |P(iy)|\bigr). $$

□


Examples 3.5. For the implicit midpoint rule, the trapezoidal rule, the Hammer
& Hollingsworth, the Kuntzmann & Butcher and Lobatto IIIA methods (c, f, g
of Table 3.1) we have $E(y) \equiv 0$ since $Q(z) = P(-z)$. This also follows from
Proposition 3.4 because $p = 2j$. A straightforward computation shows that (3.7) is
satisfied, hence these methods are A-stable.

For methods d, h, i of Table 3.1 we have $\deg P > \deg Q$ and the leading
coefficient of $E$ is negative. Therefore (3.9) cannot be true for $y \to \infty$ and these
methods are not A-stable.

For the Radau IIA method of order 5 (case j) we obtain $E(y) = y^6/3600$ and
by inspection of the zeros of $Q(z)$ the method is seen to be A-stable.

For the two-stage SDIRK method (case e) $E(y)$ becomes

$$ E(y) = (\gamma - 1/2)^2 (4\gamma - 1)\,y^4. \tag{3.10} $$

Thus the method is A-stable for $\gamma \ge 1/4$. The 3rd order method is A-stable for
$\gamma = (3 + \sqrt{3})/6$, but not for $\gamma = (3 - \sqrt{3})/6$ (see Fig. 3.1).
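The polynomial $E(y)$ of (3.8) is easily computed from the coefficients of $P$ and $Q$. The sketch below (helper names are ours; coefficient lists have the constant term first) uses exact rational arithmetic and reproduces the two values of $E(y)$ quoted above for the Radau IIA method and for the SDIRK method with the sample value $\gamma = 1$.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def modulus_squared_on_imaginary_axis(c):
    """Coefficients in y of c(iy) * c(-iy) for a real polynomial c."""
    c = [Fraction(x) for x in c]
    c_minus = [(-1) ** n * x for n, x in enumerate(c)]  # coefficients of c(-z)
    prod = poly_mul(c, c_minus)                         # c(z) c(-z) is even in z
    # substitute z = iy, i.e. z^(2m) -> (-1)^m y^(2m)
    return [(-1) ** (n // 2) * x if n % 2 == 0 else Fraction(0)
            for n, x in enumerate(prod)]

def E_polynomial(P, Q):
    """Coefficients of E(y) = |Q(iy)|^2 - |P(iy)|^2, cf. (3.8)."""
    eq = modulus_squared_on_imaginary_axis(Q)
    ep = modulus_squared_on_imaginary_axis(P)
    n = max(len(eq), len(ep))
    eq += [Fraction(0)] * (n - len(eq))
    ep += [Fraction(0)] * (n - len(ep))
    return [a - b for a, b in zip(eq, ep)]

# Radau IIA, order 5 (case j)
E_radau = E_polynomial([1, Fraction(2, 5), Fraction(1, 20)],
                       [1, Fraction(-3, 5), Fraction(3, 20), Fraction(-1, 60)])
# Two-stage SDIRK (case e) with gamma = 1
g = Fraction(1)
E_sdirk = E_polynomial([1, 1 - 2 * g, Fraction(1, 2) - 2 * g + g * g],
                       [1, -2 * g, g * g])
print(E_radau, E_sdirk)
```

The only nonzero coefficient of `E_radau` is $1/3600$ at $y^6$, and `E_sdirk` has the single coefficient $(\gamma - 1/2)^2(4\gamma - 1) = 3/4$ at $y^4$.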

The following general result explains the I-stability properties of the foregoing
examples.

Proposition 3.6. A rational function (3.4) of order $p \ge 2j - 2$ is I-stable if and
only if $|R(\infty)| \le 1$.

Proof. $|R(\infty)| \le 1$ implies $k \le j$. By Proposition 3.4, $E(y)$ must be of the form
$K \cdot y^{2j}$. By letting $y \to \infty$ in (3.6) and (3.9), we see that $|R(\infty)| \le 1$ is equivalent
to $K \ge 0$. □


L-Stability and A(α)-Stability

The trapezoidal rule for the numerical integration of first-order or¬ 
dinary differential equations is shown to possess, for a certain type 
of problem, an undesirable property. (A.R. Gourlay 1970) 

A-stability is not the whole answer to the problem of stiff equa¬ 
tions. (R. Alexander 1977) 

Some of the above methods seem to be optimal in the sense that the stability region
coincides exactly with the negative half-plane. This property is not as desirable as
it may appear, since for a rational function

$$ \lim_{z \to -\infty} R(z) = \lim_{z \to \infty} R(z) = \lim_{z = iy,\ y \to \infty} R(z). $$

The latter must then be 1 in modulus, since $|R(iy)| = 1$ for all real $y$. This means
that for $z$ close to the real axis with a very large negative real part, $|R(z)|$ is,
although $< 1$, very close to one. As a consequence, stiff components in (2.6) are
damped out only very slowly. We demonstrate this with the example

$$ y' = -2000(y - \cos x), \qquad y(0) = 0, \qquad 0 \le x \le 1.5, \tag{3.11} $$

which is the same as (1.1), but with increased stiffness. The numerical results for
the trapezoidal rule are compared to those of implicit Euler in Fig. 3.2. The implicit
Euler damps out the transient phase much faster than the trapezoidal rule. It thus
appears to be a desirable property of a method that $|R(z)|$ be much smaller than 1
for $z \to -\infty$.
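The different damping behaviour on (3.11) is easy to reproduce. Since the problem is linear, both implicit steps can be solved in closed form. This is a sketch only: the step size $h = 1.5/40$ is our assumption (the step size used for Fig. 3.2 is not stated here), and the function names are hypothetical.

```python
import math

LAM = 2000.0  # stiffness parameter of (3.11)

def implicit_euler(h, n_steps):
    """Implicit Euler for y' = -LAM (y - cos x), y(0) = 0."""
    x, y, out = 0.0, 0.0, [(0.0, 0.0)]
    for _ in range(n_steps):
        x_new = x + h
        # y_new = y - h LAM (y_new - cos x_new), solved for y_new
        y = (y + h * LAM * math.cos(x_new)) / (1.0 + h * LAM)
        x = x_new
        out.append((x, y))
    return out

def trapezoidal(h, n_steps):
    """Trapezoidal rule for the same problem."""
    x, y, out = 0.0, 0.0, [(0.0, 0.0)]
    for _ in range(n_steps):
        x_new = x + h
        a = 0.5 * h * LAM
        # y_new = y - a (y - cos x) - a (y_new - cos x_new), solved for y_new
        y = ((1.0 - a) * y + a * (math.cos(x) + math.cos(x_new))) / (1.0 + a)
        x = x_new
        out.append((x, y))
    return out

h = 1.5 / 40  # assumed step size, z = -LAM * h = -75
ie = implicit_euler(h, 40)
tr = trapezoidal(h, 40)
for (x, y_ie), (_, y_tr) in list(zip(ie, tr))[:8]:
    print(f"x = {x:6.4f}   implicit Euler: {y_ie:9.6f}   trapezoidal: {y_tr:9.6f}   cos x: {math.cos(x):9.6f}")
```

With this step size $R(z) = 1/76$ for implicit Euler, whereas $R(z) = -36.5/38.5 \approx -0.95$ for the trapezoidal rule, so that the initial deviation from the smooth solution changes sign and decays by only about 5% per step.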

Definition 3.7 (Ehle 1969). A method is called L-stable if it is A-stable and if in
addition

$$ \lim_{z \to \infty} R(z) = 0. \tag{3.12} $$

Fig. 3.2. Trapezoidal rule versus implicit Euler on (3.11)


Among the methods of Table 3.1, the implicit Euler, the SDIRK method (e)
with $\gamma = (2 \pm \sqrt{2})/2$, as well as the Radau IIA formula (j) are L-stable.

Proposition 3.8. If an implicit Runge-Kutta method with nonsingular $A$ satisfies
one of the following conditions:

$$ a_{sj} = b_j, \qquad j = 1, \ldots, s, \tag{3.13} $$

$$ a_{i1} = b_1, \qquad i = 1, \ldots, s, \tag{3.14} $$

then $R(\infty) = 0$. This makes A-stable methods L-stable.

Proof. By (3.2)

$$ R(\infty) = 1 - b^T A^{-1} \mathbb{1} \tag{3.15} $$

and (3.13) means that $A^T e_s = b$ where $e_s = (0, \ldots, 0, 1)^T$. Therefore
$R(\infty) = 1 - e_s^T \mathbb{1} = 1 - 1 = 0$. In the case of (3.14) use $A e_1 = \mathbb{1} b_1$. □
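Formula (3.15) can be evaluated directly for a given tableau. In the sketch below the function name is hypothetical, and the coefficients of the two-stage Radau IIA method (order 3) are the standard ones and are not taken from this section; its last row of $A$ equals $b^T$, so that (3.13) holds. The implicit midpoint rule serves as a counterexample.

```python
import numpy as np

def R_at_infinity(A, b):
    """R(inf) = 1 - b^T A^{-1} 1, formula (3.15); A must be nonsingular."""
    return 1.0 - b @ np.linalg.solve(A, np.ones(len(b)))

# Two-stage Radau IIA (standard coefficients, assumed here): a_{sj} = b_j
A_radau = np.array([[5 / 12, -1 / 12],
                    [3 / 4, 1 / 4]])
b_radau = np.array([3 / 4, 1 / 4])

# Implicit midpoint rule: (3.13) is violated
A_mid = np.array([[0.5]])
b_mid = np.array([1.0])

print(R_at_infinity(A_radau, b_radau))  # close to 0
print(R_at_infinity(A_mid, b_mid))      # close to -1
```

The second value agrees with the limit of $(1 + z/2)/(1 - z/2)$ for $z \to \infty$.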


Methods satisfying (3.13) are called stiffly accurate (Prothero & Robinson 
1974). They are important for the solution of singularly perturbed problems and 
for differential-algebraic equations (see Chapters VI and VII). 

The definition of A-stability is on the one hand too weak, as we have just seen, 
and on the other hand too strong in the sense that many methods which are not so 
bad at all are not A-stable. The following definition is a little weaker and will be 
specially useful in the chapter on multistep methods. 

Definition 3.9 (Widlund 1967). A method is said to be A(α)-stable if the sector

$$ S_\alpha = \{z \,;\ |\arg(-z)| < \alpha,\ z \ne 0\} $$

is contained in the stability region.

For example, the Padé approximation $R_{03}(z) = \bigl(1 - z + \frac{z^2}{2!} - \frac{z^3}{3!}\bigr)^{-1}$ (see
(3.29) below) is A(α)-stable for $\alpha \le 88.23^\circ$.

Numerical Results 

To show the effects of good stability properties on the stiff examples of Sect. IV.1,
we choose the 3-stage Radau IIA formula (Table 5.6 of Sect. IV.5) which, as we
have seen, is A-stable, L-stable and of reasonably high order. It has been coded
(Subroutine RADAU5 of the Appendix) and the details of this program will be
discussed later (Sect. IV.8). This program integrates all the examples of Sect. IV.1 in a
couple of steps and the plots of Fig. 1.3 and Fig. 1.5 show a clear difference.

The beam equation (1.10’) with $n = 40$ is integrated, with $Rtol = Atol = 10^{-3}$
(absolute) and smooth initial values, in 28 steps (Fig. 3.3).



Fig. 3.3. RADAU5 on the beam 
(1.10’), every step drawn 



Fig. 3.4. RADAU5 on oscillatory beam 
with large Tol (107 steps, all drawn) 


Since the Radau5 formula is L-stable, the stability domain also covers the
imaginary axis and large parts of the right half-plane $\mathbb{C}^+$. This means that high
oscillations of the true solution may be damped by the numerical method. This
effect, sometimes judged undesirable (B. Lindberg (1974): “dangerous property
... ”), may also be welcome to suppress uninteresting oscillations. This is
demonstrated by applying RADAU5 with very large tolerance ($Rtol = Atol = 1$) to the
beam equation (1.10’) with $n = 10$ and the perturbed initial value $\theta_n(0) = 0.4$.
Here, the high oscillations soon disappear and the numerical solution becomes
perfectly smooth (Fig. 3.4). If, however, the tolerance requirement is increased, the
program is forced to follow all the oscillations and the picture remains the same as
in Fig. 1.11.

Stability Functions of Order ≥ s


Consider rational functions $R(z) = P(z)/Q(z)$, where $Q(0) = 1$, and both $P(z)$
and $Q(z)$ are polynomials of degree at most $s$. If $R(z)$ is an approximation of $e^z$
of order $\ge s$, then it follows from (3.5) that

$$ e^z Q(z) = P(z) + C_1 z^{s+1} + C_2 z^{s+2} + \ldots . \tag{3.16} $$

Consequently, the polynomial $P(z)$ and also the error constants $C_1, C_2, \ldots$ are
uniquely determined in terms of the coefficients of $Q(z)$. For

$$ Q(z) = q_0 + q_1 z + q_2 z^2 + \ldots + q_s z^s, \qquad q_0 = 1, \tag{3.17} $$

an expansion of $e^z Q(z)$ into powers of $z$ yields

$$ P(z) = q_0 + z\Bigl(\frac{q_0}{1!} + \frac{q_1}{0!}\Bigr) + z^2\Bigl(\frac{q_0}{2!} + \frac{q_1}{1!} + \frac{q_2}{0!}\Bigr) + \ldots + z^s\Bigl(\frac{q_0}{s!} + \frac{q_1}{(s-1)!} + \ldots + \frac{q_s}{0!}\Bigr), \tag{3.18} $$

and for the error constants

$$ C_1 = \frac{q_0}{(s+1)!} + \frac{q_1}{s!} + \ldots + \frac{q_{s-1}}{2!} + \frac{q_s}{1!}, \tag{3.19} $$

$$ C_2 = \frac{q_0}{(s+2)!} + \frac{q_1}{(s+1)!} + \ldots + \frac{q_{s-1}}{3!} + \frac{q_s}{2!}. \tag{3.20} $$

The Polynomial M(x). With help of the polynomial

$$ M(x) = q_s + q_{s-1}\frac{x}{1!} + q_{s-2}\frac{x^2}{2!} + \ldots + q_0\frac{x^s}{s!} \tag{3.21} $$

the formulas for $Q(z)$ and $P(z)$ become more symmetric. We have

$$ Q(z) = M^{(s)}(0) + M^{(s-1)}(0)\,z + \ldots + M(0)\,z^s \tag{3.22} $$

$$ P(z) = M^{(s)}(1) + M^{(s-1)}(1)\,z + \ldots + M(1)\,z^s, \tag{3.23} $$

and the error constants are given by

$$ C_1 = \int_0^1 M(x)\,dx, \qquad C_2 = \int_0^1 (1 - x)M(x)\,dx. \tag{3.24} $$
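Formulas (3.18) to (3.20) amount to reading off the coefficients of $e^z Q(z)$. A minimal sketch in exact arithmetic follows; the function name is ours and $q$ is stored as $[q_0, \ldots, q_s]$.

```python
from fractions import Fraction
from math import factorial

def numerator_and_error_constants(q):
    """Given q = [q_0, ..., q_s] with q_0 = 1, return (P, C_1, C_2)
    according to (3.18), (3.19) and (3.20)."""
    s = len(q) - 1
    q = [Fraction(c) for c in q]

    def coeff(m):
        # coefficient of z^m in the product e^z * Q(z)
        return sum(q[i] / factorial(m - i) for i in range(min(m, s) + 1))

    P = [coeff(m) for m in range(s + 1)]
    return P, coeff(s + 1), coeff(s + 2)

# Q(z) = 1 - z/2 (denominator of the trapezoidal rule)
P, C1, C2 = numerator_and_error_constants([1, Fraction(-1, 2)])
print(P, C1, C2)
```

For $Q(z) = 1 - z/2$ one obtains $P(z) = 1 + z/2$ and $C_1 = 0$, so that the order is 2 instead of $s = 1$, with error constant $C_2 = -1/12$.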


For the stability function of collocation methods we have the following nice result. 


Theorem 3.10 (K. Wright 1970, S.P. Nørsett 1975). The stability function of
the collocation method based on the points $c_1, c_2, \ldots, c_s$ is given by $R(z) =
P(z)/Q(z)$, where $Q(z)$ and $P(z)$ are the polynomials of (3.22) and (3.23),
respectively, with $M(x)$ given by

$$ M(x) = \frac{1}{s!} \prod_{i=1}^{s} (x - c_i). \tag{3.25} $$

Proof (Nørsett & Wanner 1979). We assume $x_0 = 0$, $h = 1$, $\lambda = z$, $y_0 = 1$ and
let $u(x)$ be the collocation polynomial. Since $u'(x) - zu(x)$ is a polynomial of
degree $s$ which vanishes at the collocation points, there are constants $K_0$ and $K$
such that

$$ u'(x) - zu(x) = K_0 M(x) \qquad \text{or} \qquad \Bigl(1 - \frac{D}{z}\Bigr) u(x) = K M(x) \tag{3.26} $$

with the polynomial $M(x)$ of (3.25) ($D$ denotes the differentiation operator).
Expanding $(1 - D/z)^{-1}$ into a geometric series yields

$$ u(x) = K \Bigl(1 + \frac{D}{z} + \frac{D^2}{z^2} + \ldots + \frac{D^s}{z^s}\Bigr) M(x), \tag{3.27} $$

because $M^{(j)}(x) = 0$ for $j > s$. From $u(1) = R(z)u(0)$ we have the relation
$R(z) = u(1)/u(0)$, which leads to (3.22) and (3.23). □
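Theorem 3.10 translates into a few lines of polynomial arithmetic: build $M(x)$ from the nodes and evaluate its derivatives at 0 and 1. The following sketch uses exact fractions; the helper names are hypothetical, and the node sets (Gauss with $s = 1$, Radau IIA with $s = 2$) are standard choices used for illustration.

```python
from fractions import Fraction
from math import factorial

def poly_eval(c, x):
    """Horner evaluation; c has the constant term first."""
    acc = Fraction(0)
    for a in reversed(c):
        acc = acc * x + a
    return acc

def poly_deriv(c):
    return [i * c[i] for i in range(1, len(c))]

def collocation_stability_function(nodes):
    """Coefficients of P and Q from (3.22), (3.23) with M(x) of (3.25)."""
    s = len(nodes)
    M = [Fraction(1, factorial(s))]
    for c in nodes:                      # multiply M by (x - c)
        new = [Fraction(0)] * (len(M) + 1)
        for i, m in enumerate(M):
            new[i] -= Fraction(c) * m
            new[i + 1] += m
        M = new
    derivs = [M]
    for _ in range(s):
        derivs.append(poly_deriv(derivs[-1]))
    # the coefficient of z^m is M^(s-m)(0) for Q and M^(s-m)(1) for P
    Q = [poly_eval(derivs[s - m], Fraction(0)) for m in range(s + 1)]
    P = [poly_eval(derivs[s - m], Fraction(1)) for m in range(s + 1)]
    return P, Q

P_gauss, Q_gauss = collocation_stability_function([Fraction(1, 2)])
P_radau, Q_radau = collocation_stability_function([Fraction(1, 3), Fraction(1)])
print(P_gauss, Q_gauss)
print(P_radau, Q_radau)
```

The node $c_1 = 1/2$ gives $(1 + z/2)/(1 - z/2)$, and the nodes $1/3, 1$ give $(1 + z/3)/(1 - 2z/3 + z^2/6)$; the vanishing leading coefficient of $P$ in the second case reflects $M(1) = 0$.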


Padé Approximations to the Exponential Function

Comme cela est souvent le cas en ce qui concerne les découvertes
scientifiques, leur inventeur n’est pas H. Padé.
[As is often the case with scientific discoveries, their inventor is
not H. Padé.]

(C. Brezinski 1984, Œuvres de H. Padé, p. 5)


Padé approximations (Padé 1892) are rational functions which, for a given degree
of the numerator and the denominator, have highest order of approximation. Their
origin lies in the theory of continued fractions and they played a fundamental role
in Hermite’s (1873) proof of the transcendency of $e$.

These optimal approximations can be obtained for the exponential function $e^z$
from (3.22) and (3.23) by the following idea (Padé 1899): choose $M(x)$ such that
in (3.22) and (3.23) as many terms as possible involving high powers of $z$ become
zero, i.e.,

$$ M(x) = \frac{x^k (x - 1)^j}{(k + j)!}; \tag{3.28} $$

then $M^{(i)}(0) = 0$ for $i = 0, \ldots, k - 1$ and $M^{(i)}(1) = 0$ for $i = 0, \ldots, j - 1$.


Theorem 3.11. The $(k, j)$-Padé approximation to $e^z$ is given by

$$ R_{kj}(z) = \frac{P_{kj}(z)}{Q_{kj}(z)}, $$

where

$$ P_{kj}(z) = 1 + \frac{k}{j+k}\,z + \frac{k(k-1)}{(j+k)(j+k-1)}\,\frac{z^2}{2!} + \ldots + \frac{k(k-1)\cdots 1}{(j+k)\cdots(j+1)}\,\frac{z^k}{k!}, $$

$$ Q_{kj}(z) = 1 - \frac{j}{k+j}\,z + \frac{j(j-1)}{(k+j)(k+j-1)}\,\frac{z^2}{2!} - \ldots + (-1)^j\,\frac{j(j-1)\cdots 1}{(k+j)\cdots(k+1)}\,\frac{z^j}{j!} = P_{jk}(-z), \tag{3.29} $$

with error

$$ e^z - R_{kj}(z) = (-1)^j\,\frac{j!\,k!}{(j+k)!\,(j+k+1)!}\,z^{j+k+1} + \mathcal{O}(z^{j+k+2}). \tag{3.30} $$

It is the unique rational approximation to $e^z$ of order $j + k$, such that the degrees
of numerator and denominator are $k$ and $j$, respectively.


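Table 3.2. Padé approximations $R_{kj}(z)$ for $e^z$
(columns: numerator degree $k$; rows: denominator degree $j$)

         k = 0                                              k = 1                                                                       k = 2

 j = 0   $\frac{1}{1}$                                      $\frac{1 + z}{1}$                                                           $\frac{1 + z + \frac{z^2}{2!}}{1}$

 j = 1   $\frac{1}{1 - z}$                                  $\frac{1 + \frac12 z}{1 - \frac12 z}$                                       $\frac{1 + \frac23 z + \frac13 \frac{z^2}{2!}}{1 - \frac13 z}$

 j = 2   $\frac{1}{1 - z + \frac{z^2}{2!}}$                 $\frac{1 + \frac13 z}{1 - \frac23 z + \frac13 \frac{z^2}{2!}}$              $\frac{1 + \frac12 z + \frac16 \frac{z^2}{2!}}{1 - \frac12 z + \frac16 \frac{z^2}{2!}}$

 j = 3   $\frac{1}{1 - z + \frac{z^2}{2!} - \frac{z^3}{3!}}$   $\frac{1 + \frac14 z}{1 - \frac34 z + \frac12 \frac{z^2}{2!} - \frac14 \frac{z^3}{3!}}$   $\frac{1 + \frac25 z + \frac{1}{10} \frac{z^2}{2!}}{1 - \frac35 z + \frac{3}{10} \frac{z^2}{2!} - \frac{1}{10} \frac{z^3}{3!}}$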

Proof. Inserting (3.28) into (3.22) and (3.23) gives the formulas for $P_{kj}(z)$, $Q_{kj}(z)$
and (3.30). The uniqueness is a consequence of the fact that the $(j + k)$-degree
polynomial $M(x)$ of (3.21) must have a zero of multiplicity $k$ at $x = 0$, and one
of multiplicity $j$ at $x = 1$. □


Table 3.2 shows the first Padé approximations to $e^z$. We observe that the
stability functions of many methods of Table 3.1 are Padé approximations. The diagonal
Padé approximations are those with $k = j$.
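The coefficients in (3.29) are products of simple ratios and can be generated exactly. The sketch below (function names are ours; coefficient lists have the constant term first) reproduces, for instance, the entry $k = 2$, $j = 3$ of Table 3.2, which coincides with the stability function of the Radau IIA method of order 5 in Table 3.1.

```python
from fractions import Fraction
from math import factorial

def pade_numerator(k, j):
    """Coefficients of P_kj(z) in (3.29)."""
    coeffs = []
    for i in range(k + 1):
        num = factorial(k) // factorial(k - i)          # k (k-1) ... (k-i+1)
        den = factorial(j + k) // factorial(j + k - i)  # (j+k) ... (j+k-i+1)
        coeffs.append(Fraction(num, den * factorial(i)))
    return coeffs

def pade(k, j):
    """Return (P_kj, Q_kj), using Q_kj(z) = P_jk(-z)."""
    P = pade_numerator(k, j)
    Q = [(-1) ** i * c for i, c in enumerate(pade_numerator(j, k))]
    return P, Q

P23, Q23 = pade(2, 3)
print(P23)  # 1 + 2z/5 + z^2/20
print(Q23)  # 1 - 3z/5 + 3z^2/20 - z^3/60
```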


Exercises 

1. Let $R(z)$ be the stability function of (3.1) and $R^*(z)$ the stability function of
its adjoint method (see Sect. II.8). Prove that

$$ R^*(z) = \bigl(R(-z)\bigr)^{-1}. $$


2. Consider an implicit Runge-Kutta method of order $p \ge s$ with nonsingular $A$,
distinct $c_i$ and non-zero $b_i$. Show

a) If $C(s)$ and $c_s = 1$ then (3.13);

b) If $D(s)$ and $c_1 = 0$ then (3.14).

In both cases the stability function satisfies $R(\infty) = 0$.

(For the definition of the assumptions $C(s)$ and $D(s)$ see Sect. IV.5).

3. Show that collocation methods can only be L-stable if $M(1) = 0$, i.e., if one
of the $c$’s, usually $c_s$, equals 1.

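4. (Padé (1899), see also Lagrange (1776)). Show that the continued fraction

$$ 1 + \cfrac{x}{1 - \dfrac{x}{2} + \cfrac{\frac{1}{1\cdot 3}\,\frac{x^2}{4}}{1 + \cfrac{\frac{1}{3\cdot 5}\,\frac{x^2}{4}}{1 + \cfrac{\frac{1}{5\cdot 7}\,\frac{x^2}{4}}{1 + \cfrac{\frac{1}{7\cdot 9}\,\frac{x^2}{4}}{1 + \ldots}}}}} $$

leads to the diagonal Padé approximations for $e^x$.

Hint. Compute the first partial fractions. If you don’t succeed in finding a
general proof, read Sect. IV.5.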


5. The trapezoidal rule

       0  |  0    0
       1  | 1/2  1/2
      ----+----------
          | 1/2  1/2

satisfies $a_{si} = b_i$, but not $R(\infty) = 0$. Why doesn’t this contradict
Proposition 3.8?


6. Show that

$$ y_1 = y_0 + hf\bigl(y_0 + \theta(y_1 - y_0)\bigr) $$

$$ y_1 = y_0 + h(1 - \theta)f(y_0) + h\theta f(y_1) $$

are both nonlinear extensions of the θ-method. Find others.

7. The composition of a step of the θ-method with step-size $\alpha h$, followed by a
θ'-method with step-size $(1 - 2\alpha)h$ and again a θ-method with step-size $\alpha h$
leads to

$$ R(z) = \Bigl(\frac{1 + \alpha z(1 - \theta)}{1 - \alpha z\theta}\Bigr)^2 \Bigl(\frac{1 + (1 - 2\alpha)z(1 - \theta')}{1 - (1 - 2\alpha)z\theta'}\Bigr). $$

Show that this method, for $\theta' = 1 - \theta$, is of order 2 if $\alpha = 1 - \sqrt{2}/2$ and
strongly A-stable (i.e., A-stable and $|R(\infty)| < 1$) for $\theta > 1/2$. The
authors Müller, Prohl, Rannacher & Turek (1994) call this method “fractional
θ-method” and use it successfully for computations of the incompressible
Navier-Stokes equations.

Mein hochgeehrter Lehrer, der vor wenigen Jahren verstorbene
Geheime Hofrath Gauss in Göttingen, pflegte in vertraulichem
Gespräche häufig zu äussern, die Mathematik sei weit mehr eine
Wissenschaft für das Auge als eine für das Ohr. Was das Auge
mit einem Blicke sogleich übersieht ...
[My highly esteemed teacher, Privy Councillor Gauss of Göttingen,
who died a few years ago, often used to remark in confidential
conversation that mathematics is far more a science for the eye
than one for the ear. What the eye takes in at a single glance ...]

(J.F. Encke 1861, publ. in Kronecker’s Werke, Vol. 5, page 391)


Order stars, discovered by searching for a better understanding of the stability
properties of the Padé approximations to $e^z$ (Wanner, Hairer & Nørsett 1978),
offered nice and unexpected access to many other results: the “second barrier” of
Dahlquist, the Daniel & Moore conjecture, highest possible order with real poles,
comparison of stability domains (Jeltsch & Nevanlinna 1981, 1982), order bounds
for hyperbolic or parabolic difference schemes (e.g., Iserles & Strang 1983, Iserles
& Williamson 1983, Jeltsch 1988).


Introduction 


When I wrote my book in 1971 I wanted to draw “relative stability
domains”, but curious stars came out from the plotter. I thought
of an error in the program and I threw them away ...

(C.W. Gear, in 1979)

We present in Fig. 4.1 the stability domains for the Padé approximations $R_{33}$, $R_{24}$,
$R_{15}$, $R_{06}$ of Theorem 3.11, which are all 6th order approximations to $\exp(z)$. It
can be observed that $R_{33}$ and $R_{24}$ are nicely A-stable. The other two are not: $R_{15}$
violates (3.6) and $R_{06}$ violates (3.7). After some meditation on these and similar
figures, trying to obtain a better understanding of these phenomena, one is finally
led to

Definition 4.1. The set

$$ A = \{z \in \mathbb{C} \,;\ |R(z)| > |e^z|\} = \{z \in \mathbb{C} \,;\ |q(z)| > 1\} \tag{4.1} $$

where $q(z) = R(z)/e^z$, is called the order star of $R$.

The order star does not compare $|R(z)|$ to 1, as does the stability domain, but
to the exact solution $|e^z| = e^x$ and it is hoped that this might give more
information. As we always assume that the coefficients of $R(z)$ are real, the order star
is symmetric with respect to the real axis. Furthermore, since $|e^{iy}| = 1$, $A$ is the
complementary set of the stability domain $S$ on the imaginary axis. Therefore we
have from (3.6) and (3.7):

Fig. 4.1. Stability domains for Padé approximations


Lemma 4.2. $R(z)$ is I-stable if and only if

(i) $A \cap i\mathbb{R} = \emptyset$.

Further, $R(z)$ is A-stable if and only if (i) and

(ii) all poles of $R(z)$ (= poles of $q(z)$) lie in the positive half plane $\mathbb{C}^+$. □


Fig. 4.2 shows the order stars corresponding to the functions of Fig. 4.1. These
order stars show a nice and regular behaviour: there are $j$ black “fingers” to the
right, each containing a pole of $R_{kj}$, and $k$ white “fingers” to the left, each
containing a zero. Exactly two boundary curves of $A$ tend to infinity near to the imaginary
axis. These properties are a consequence of the following three Lemmas.

Fig. 4.2. Order stars for Padé approximations

Lemma 4.3. If $R(z)$ is an approximation to $e^z$ of order $p$, i.e., if

$$ e^z - R(z) = Cz^{p+1} + \mathcal{O}(z^{p+2}) \tag{4.2} $$

with $C \ne 0$, then, for $z \to 0$, $A$ behaves like a “star” with $p + 1$ sectors of equal
width $\pi/(p + 1)$, separated by $p + 1$ similar “white” sectors of the complementary
set. The positive real axis is inside a black sector iff $C < 0$ and inside a white sector
iff $C > 0$.

Proof. Dividing the error formula (4.2) by $e^z$ gives

$$ \frac{R(z)}{e^z} = 1 - Cz^{p+1} + \mathcal{O}(z^{p+2}). $$

Thus the value $R(z)/e^z$ surrounds the point 1 as often as $z^{p+1}$ surrounds the
origin, namely $p + 1$ times. So, $R(z)/e^z$ is $p + 1$ times alternatively inside or
outside the unit circle. It lies inside for small positive real $z$ whenever $C > 0$. □

Lemma 4.4. If $z = re^{i\theta}$ and $r \to \infty$, then $z \in A$ for $\pi/2 < \theta < 3\pi/2$ and $z \notin A$
for $-\pi/2 < \theta < \pi/2$. The border $\partial A$ possesses only two branches which go to
infinity. If

$$ R(z) = Kz^{\ell} + \mathcal{O}(z^{\ell - 1}) \quad \text{for } z \to \infty, \tag{4.3} $$

these branches asymptotically approach

$$ x = \log|K| + \ell \log|y|. \tag{4.4} $$


Proof. The first assertion is the well-known fact that the exponential function, for
$\operatorname{Re} z \to \pm\infty$, is much stronger than any polynomial or rational function. In order to
show the uniqueness of the border lines, we consider for $r \to \infty$ the two functions

$$ \varphi_1(\theta) = |e^z|^2 = e^{2r\cos\theta}, \qquad \varphi_2(\theta) = |R(z)|^2 = R(re^{i\theta})R(re^{-i\theta}). $$

Differentiation gives

$$ \frac{\varphi_1'}{\varphi_1} = -2r\sin\theta, \qquad \frac{\varphi_2'}{\varphi_2} = 2r\,\operatorname{Re}\Bigl(ie^{i\theta}\,\frac{R'(re^{i\theta})}{R(re^{i\theta})}\Bigr). $$

Since $|R'/R| \to 0$ for $r \to \infty$, we have

$$ \frac{\varphi_1'(\theta)}{\varphi_1(\theta)} < \frac{\varphi_2'(\theta)}{\varphi_2(\theta)} \quad \text{for } \theta \in [\varepsilon, \pi - \varepsilon]. \tag{4.5} $$

Hence in this interval there can only be one value of $\theta$ with $\varphi_1(\theta) = \varphi_2(\theta)$.
Formula (4.4) is obtained from (4.3) by

$$ |K|(x^2 + y^2)^{\ell/2} \approx e^x, \qquad \log|K| + \frac{\ell}{2}\log(x^2 + y^2) \approx x $$

and by neglecting $x^2$, which is justified because $x/y \to 0$ whenever $x + iy$ tends
to infinity on the border of $A$. □
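The local star structure described in Lemma 4.3 can be observed numerically by walking once around a small circle and counting how often one enters the set $A$ of (4.1). The following is a minimal sketch with hypothetical function names; the radius and the number of sample points are our choices.

```python
import cmath
import math

def in_order_star(R, z):
    """True if z belongs to A = {z : |R(z)| > |e^z|}, cf. (4.1)."""
    return abs(R(z)) > abs(cmath.exp(z))

def count_black_sectors(R, radius=0.1, n=3600):
    """Number of arcs of the circle |z| = radius that lie in A."""
    flags = [in_order_star(R, radius * cmath.exp(2j * math.pi * (m + 0.5) / n))
             for m in range(n)]
    changes = sum(flags[m] != flags[m - 1] for m in range(n))
    return changes // 2

def implicit_euler(z):   # order p = 1, error constant C = -1/2
    return 1 / (1 - z)

def midpoint(z):         # order p = 2, error constant C = -1/12
    return (1 + z / 2) / (1 - z / 2)

print(count_black_sectors(implicit_euler))  # p + 1 = 2 sectors
print(count_black_sectors(midpoint))        # p + 1 = 3 sectors
print(in_order_star(implicit_euler, 0.5))   # C < 0: positive real axis is black
```

In both examples $C < 0$, so that the positive real axis lies inside a black sector, in agreement with the lemma.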


It is clear from the maximum principle that each bounded “finger” of A in 
Fig. 4.2 must contain a pole of q(z). A still stronger result is the following: 

Lemma 4.5. Each bounded subset $F \subset A$ with common boundary $\partial F \subset \partial A$
collecting $m$ sectors at the origin must contain at least $m$ poles of $q(z)$ (each
counted according to its multiplicity). Analogously, each bounded “white” subset
$F \subset \mathbb{C} \setminus A$ with $m$ sectors at the origin must contain at least $m$ zeros of $q(z)$.

Proof. Suppose first that $\partial F$ is represented by a parametrized positively
oriented loop $c(t)$, $t_0 \le t \le t_1$. Let $a = (c_1'(t), c_2'(t))$ be the tangent vector and
$n = (c_2'(t), -c_1'(t))$ an exterior normal vector. We write

$$ q(z) = r(x, y) \cdot e^{i\varphi(x, y)}, \qquad z = x + iy $$

so that $\log q(z) = \log r(x, y) + i\varphi(x, y)$. Since the modulus increases inside $F$,
we have

$$ \frac{\partial(\log r)}{\partial n} \le 0. \tag{4.6} $$

Now the Cauchy-Riemann differential equations for $\log q$ are

$$ \frac{\partial(\log r)}{\partial x} = \frac{\partial\varphi}{\partial y}, \qquad \frac{\partial(\log r)}{\partial y} = -\frac{\partial\varphi}{\partial x}, \tag{4.7} $$

so that (4.6) becomes

$$ \frac{\partial\varphi}{\partial a} \le 0. \tag{4.8} $$

This inequality is strict except at a finite number of points, because $q'(c(t)) \cdot c'(t) =
i \cdot q(c(t)) \cdot \partial\varphi/\partial a$ and the number of zeros of $q'(z)$ is finite. Thus the argument
of $q$ decreases along $c$. If the contour curve $c(t)$ returns $m$ times to the origin,
where the argument is a multiple of $2\pi$, the vector $q(z)$ must perform at least $m$
complete revolutions in the negative sense (Fig. 4.3). Thus the argument principle
(an idea which we have already encountered in Sect. I.13; see Volume I, pages 81
and 382), ensures the presence of at least $m$ poles inside $F$ (there are no zeros,
because these are not in $A$).

If the boundary curve is represented by several curves, all rotation numbers are
added up. For “white” subsets the proof is similar, just that $\partial(\log r)/\partial n \ge 0$ and
the argument rotates in the other sense. □



Fig. 4.3. SDIRK methods, order 3; arrows indicate direction of q(z) 


Fig. 4.3 gives an illustration of two order stars for the SDIRK methods of order
3 (Table 3.1, case e). Here, $q(z)$ possesses a double pole at $z = 1/\gamma$. However,
for $\gamma = (3 - \sqrt{3})/6$, the bounded component $F$ of $A$ collects only one sector at
the origin. Since the vector $q(z)$ performs two rotations, there is in addition to
the origin a second point on $\partial F$ for which $\arg(q) = 0$, i.e., $\arg(R(z)) = \arg(e^z)$.
Thus, because $|R(z)| = |e^z|$ on $\partial A$, we have $R(z) = e^z$. These points are called
exponential fitting points. Another version of Lemma 4.5 is thus (Iserles 1981):

Lemma 4.5’. Each bounded subset $F \subset A$ with $\partial F \subset \partial A$ contains exactly as
many poles as there are exponential fitting points on its boundary. □


Order and Stability for Rational Approximations 


In the sequel we suppose R(z) to be an arbitrary rational approximation of order 
p with k zeros and j poles. 

Theorem 4.6. If $R(z)$ is A-stable, then $p \le 2k_1 + 2$, where $k_1$ is the number of
different zeros of $R(z)$ in $\mathbb{C}^-$.

Proof. At least $[(p + 1)/2]$ sectors of $A$ start in $\mathbb{C}^-$ (Lemma 4.3). By A-stability
these have to be infinite and enclose at least $[(p + 1)/2] - 1$ bounded white fingers,
each containing at least one zero by Lemma 4.5. Therefore $[(p + 1)/2] - 1 \le k_1$. □


Theorem 4.7. If $R(z)$ is I-stable, then $p \le 2j_1$, where $j_1$ is the number of poles
of $R(z)$ in $\mathbb{C}^+$.

Proof. At least $[(p + 1)/2]$ sectors of $A$ start in $\mathbb{C}^+$. They cannot cross $i\mathbb{R}$ and
must therefore be bounded (Lemma 4.4). Again by Lemma 4.5 we have
$[(p + 1)/2] \le j_1$. □


Theorem 4.8. Suppose that $p \ge 2j - 1$ and $|R(\infty)| \le 1$. Then, $R(z)$ is A-stable.

Proof. By Proposition 3.6 the function $R(z)$ is I-stable. Applying Theorem 4.7
we get $j_1 \ge j$ so that I-stability implies A-stability. □


Theorem 4.9 (Crouzeix & Ruamps 1977). Suppose $p \ge 2j - 2$, $|R(\infty)| \le 1$, and
the coefficients of the denominator $Q(z)$ have alternating signs. Then, $R(z)$ is
A-stable.


Proof. A similar argument as in the foregoing proof allows at most one pole in $\mathbb{C}^-$.
It would then be real and its existence would contradict the hypothesis on signs of
$Q(z)$. □

Theorem 4.10. Suppose $p \ge 2j - 3$, $R(z)$ is I-stable, and the coefficients of $Q(z)$
have alternating signs. Then, $R(z)$ is A-stable.

Proof. For $p \ge 2j - 3$ the argument of the foregoing proof is still valid. However
Proposition 3.6 is no longer applicable and we need the hypothesis on I-stability. □


We see from Fig. 4.2 that all poles and all zeros for Padé approximations must
be simple. Whenever two poles coalesce, the corresponding sectors create a bounded
white finger between them with the need for an additional zero. Thus the
presence of multiple zeros or poles will require an order reduction.

Theorem 4.11. Let $R(z)$ possess $k_0$ distinct zeros and $j_0$ distinct poles. Then,
$p \le k_0 + j_0$.



Fig. 4.4. Order star on Gaussian sphere 


Proof. We identify the complex plane with the Gaussian sphere and the order star
with a CW-complex decomposition of this sphere (Fig. 4.4). Let $s_2$ be the number
of 2-cells $f_i$, $s_1$ the number of 1-cells $l_i$ (paths), and $s_0$ the number of vertices.
Then Euler’s polyhedral formula (“Si enim numerus angulorum solidorum fuerit
= S, numerus acierum = A et numerus hedrarum = H, semper habetur S + H =
A + 2, hincque vel S = A + 2 − H vel H = A + 2 − S vel A = S + H − 2, quae
relationis simplicitas ob demonstrationis difficultatem ...”, Euler (1752)), implies

$$ s_0 - s_1 + s_2 = 2. \tag{4.9} $$

Modern versions are in any book on algebraic topology, for particularly easy
reading see e.g. Massey (1980, p. 87, Corollary 4.4). Formula (4.9) is only true if all $f_i$
are homeomorphic to disks. Otherwise, they have to be cut into disks by additional
paths (dotted in Fig. 4.4). So, in general, we have

$$ s_0 - s_1 + s_2 \ge 2. \tag{4.9’} $$

Since each vertex is reached by at least 2 paths, the origin by hypothesis by $2p + 2$,
and since every path has two extremities, we have

$$ s_1 - s_0 \ge p. \tag{4.10} $$

By Lemma 4.5 each 2-cell, with the exception of two (the two “infinite” ones) must
contain at least a pole or a zero, so we have

$$ s_2 \le k_0 + j_0 + 2. \tag{4.11} $$

These three inequalities give $p \le k_0 + j_0$. □


Stability of Padé Approximations

... evidence is given to suggest that these are the only L-acceptable
Padé approximations to the exponential.

(B.L. Ehle 1973)

Theorem 4.12. A Padé approximation $R_{kj}(z)$, given in (3.29), is A-stable if and
only if $k \le j \le k + 2$. All zeros and all poles are simple.

Proof. The “if”-part is a consequence of Theorem 4.9. The “only if”-part follows
from Theorem 4.6 since $p = k + j$. For the same reason Theorem 4.11 shows that
all poles and zeros are simple. □
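Theorem 4.12 can be checked experimentally for small $k$ and $j$ by testing conditions (3.6) and (3.7) numerically. This is a sampled check and not a proof; the function names, the sampling range on the imaginary axis and the tolerance are our assumptions.

```python
import numpy as np
from math import factorial

def pade_coeffs(k, j):
    """Coefficients of P_kj and Q_kj from (3.29), constant term first."""
    def numer(k, j):
        return [factorial(k) / factorial(k - i)
                * factorial(j + k - i) / factorial(j + k) / factorial(i)
                for i in range(k + 1)]
    P = numer(k, j)
    Q = [(-1) ** i * c for i, c in enumerate(numer(j, k))]
    return np.array(P), np.array(Q)

def is_A_stable(k, j, y_max=50.0, n=20001, tol=1e-10):
    P, Q = pade_coeffs(k, j)
    poles = np.roots(Q[::-1]) if j > 0 else np.array([])
    analytic = np.all(poles.real > 0)                    # condition (3.7)
    z = 1j * np.linspace(0.0, y_max, n)
    modulus = np.abs(np.polyval(P[::-1], z) / np.polyval(Q[::-1], z))
    i_stable = np.all(modulus <= 1.0 + tol)              # condition (3.6), sampled
    return bool(analytic and i_stable)

for k, j in [(3, 3), (2, 4), (1, 5), (0, 6)]:
    print(f"R_{k}{j}: A-stable = {is_A_stable(k, j)}")
```

For the four approximations of order 6 considered in Fig. 4.1, the check accepts $R_{33}$ and $R_{24}$ and rejects $R_{15}$ and $R_{06}$, as the condition $k \le j \le k + 2$ predicts.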


Comparing Stability Domains 


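Da ist der allerärmste Mann
dem ander’n viel zu reich,
das Schicksal setzt den Hobel an
und hobelt beide gleich.
[Then the very poorest man is far too rich for the other;
fate applies the plane and planes them both alike.]

(F. Raimund, das Hobellied)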

Jeltsch & Nevanlinna (1978) proved the following “disk theorem”: If $S$ is the
stability domain of an $s$-stage explicit Runge-Kutta method and $D$ the disk with
centre $-s$ and radius $s$ (i.e., the stability domain of $s$ explicit Euler steps with step
size $h/s$), then

$$ S \not\supset D \tag{4.12} $$

unless $S = D$ and the method in question is Euler's method. This curious result
expresses the fact that Euler’s method is “the most stable” of all methods with
equal numerical work. After the discovery of order stars it became clear that the
result is much more general and that any method has the same property (Jeltsch &
Nevanlinna 1981). We shall also see in Chapter V that this result generalizes to
many multistep methods. The main tool of this theory is

Definition 4.13. Let $R_1(z)$ and $R_2(z)$ be rational approximations to $e^z$, then their
relative order star is defined as

$$ B = \Bigl\{z \in \mathbb{C} \,;\ \Bigl|\frac{R_1(z)}{R_2(z)}\Bigr| > 1\Bigr\}. \tag{4.13} $$

Here, the stability function for method 1 is compared to the stability function
for method 2 instead of to the exact solution $e^z$. The following order relations

$$ e^z - R_1(z) = C_1 z^{p_1 + 1} + \ldots $$

$$ e^z - R_2(z) = C_2 z^{p_2 + 1} + \ldots $$

lead, by subtraction, to

$$ \frac{R_1(z)}{R_2(z)} = 1 - Cz^{p+1} + \ldots \tag{4.14} $$

where $p = \min(p_1, p_2)$ and

$$ C = \begin{cases} C_1 - C_2 & \text{if } p_1 = p_2 \\ C_1 & \text{if } p_1 < p_2 \\ -C_2 & \text{if } p_1 > p_2. \end{cases} \tag{4.15} $$

Remark 4.14. The statement of Lemma 4.3 remains unchanged for $B$, whenever
$C \ne 0$. Since the fraction $R_1(z)/R_2(z)$ has no essential singularity at infinity,
there is no analogue of Lemma 4.4. Further, the boundedness assumption on $F$
can be omitted in Lemmas 4.5 and 4.5’ (if $\infty$ is a pole of $R_1(z)/R_2(z)$, it has to
be counted also). With the correspondences displayed in Table 4.1, the statements
of Theorems 4.6 and 4.7 remain true for $B$.

Table 4.1. Correspondences between A and B

   order star A (4.1)        relative order star B (4.13)

   imaginary axis            $\partial S_2$
   $\mathbb{C}^-$            interior of $S_2$
   $\mathbb{C}^+$            exterior of $S_2$
   method A-stable           $S_1 \supset S_2$
   $p$                       $\min(p_1, p_2)$

Theorem 4.15. If $R_1(z)$ and $R_2(z)$ are polynomial stability functions of degree $s$
and orders $\ge 1$, then the corresponding stability domains satisfy

$$ S_1 \not\supset S_2 \qquad \text{and} \qquad S_1 \not\subset S_2. \tag{4.16} $$

Proof. Suppose that $S_1 \supset S_2$ (i.e., by Table 4.1, suppose “A-stability”). Then the
analogue of Theorem 4.7 requires that $R_1(z)/R_2(z)$ have a pole outside $S_2$. Since
$R_1(z)$ and $R_2(z)$ have the same degree, $R_1(z)/R_2(z)$ has no pole at infinity.
Therefore the only poles of $R_1(z)/R_2(z)$ are the zeros of $R_2(z)$ and these are
inside $S_2$. This is a contradiction and proves the first part of (4.16). The second
part is obtained by exchanging $R_1(z)$ and $R_2(z)$. □


In order to compare numerical methods with different numerical work, we
consider scaled stability domains.

Definition 4.16. Let $R(z)$ be the stability function of degree $s$ of an explicit
Runge-Kutta method (usually with $s$ stages), then

$$ S^{scal} = \{z \,;\ |R(sz)| \le 1\} = \{z \,;\ s \cdot z \in S\} = \frac{1}{s}\,S \tag{4.17} $$

will be called the scaled stability domain of the method.



Fig. 4.5. Scaled stability domains for Taylor methods (2.12) 


Theorem 4.17 (Jeltsch & Nevanlinna 1981). If $R_1(z)$ and $R_2(z)$ are the stability
functions of degrees $s_1$ resp. $s_2$ of two explicit Runge-Kutta methods of orders
$\ge 1$, then

$$ S_1^{scal} \not\supset S_2^{scal} \qquad \text{and} \qquad S_1^{scal} \not\subset S_2^{scal}, \tag{4.18} $$

i.e., a scaled stability domain can never completely contain another.

The interesting interpretation of this result is that for any two methods, there
exists a differential equation $y' = \lambda y$ such that one of them performs better than
the other. No “miracle” method is possible.

Proof. We compare $s_2$ steps of method 1 with step size $h/s_2$ to $s_1$ steps of method
2 with step size $h/s_1$. Both procedures then have comparable numerical work for
the same advance in step size. Applied to $y' = \lambda y$, this compares the two polynomials

$$ \bigl(R_1(z/s_2)\bigr)^{s_2} \qquad \text{and} \qquad \bigl(R_2(z/s_1)\bigr)^{s_1} $$

of the same degree. Theorem 4.15 now gives

$$ s_2 \cdot S_1 \not\supset s_1 \cdot S_2 \qquad \text{or} \qquad S_1^{scal} \not\supset S_2^{scal}. $$

□


As an illustration to this theorem, we present in Fig. 4.5 the scaled stability 
domains for the Taylor methods of orders 1, 2, 3, 4 (compare with Fig. 2.1). It can 
clearly be observed that none of them contains another. 
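For the Taylor methods of orders 1 and 2 the statement of Theorem 4.17 can be verified with two explicit points. The sketch assumes that the Taylor method of order $s$ has the truncated exponential series of degree $s$ as stability function; the function names and the two test points are ours.

```python
from math import factorial

def taylor_R(order, z):
    """Stability polynomial 1 + z + ... + z^order / order! (assumed form)."""
    return sum(z ** n / factorial(n) for n in range(order + 1))

def in_scaled_domain(order, z):
    """z in S^scal = {z : |R(s z)| <= 1} with s = order, cf. (4.17)."""
    return abs(taylor_R(order, order * z)) <= 1.0

z_a = -1.5            # in the scaled domain of Euler's method only
z_b = -0.25 + 0.75j   # in the scaled domain of the order 2 method only

for z in (z_a, z_b):
    print(z, in_scaled_domain(1, z), in_scaled_domain(2, z))
```

Each of the two scaled domains therefore contains a point which is missing in the other.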


Rational Approximations with Real Poles 

The surprising result is that the maximum reachable order is m +1. 

(Nprsett & Wolfbrandt 1977) 


The stability functions of diagonally implicit Runge-Kutta methods (DIRK meth¬ 
ods), i.e., methods with a { - = 0 for i < j , are 


R(z) 


_ P(z) 

~ h Z ){ 1 ~ l2 Z ) ■ ■ ■ ~ 1 s Z Y 


(4.19) 


where 7 • = a • • (i = 1,..., s) and degree P<s. This follows at once from Formula 
(3.3) of Proposition 3.2, since the determinant of a triangular matrix is the product 
of its diagonal elements. Thus R(z) possesses real poles 1 / 7 ^ l/ 7 2 ,..., 1 / 7 S . 
Such approximations to e z will also appear in the next sections as stability func¬ 
tions of Rosenbrock methods and so-called singly-implicit Runge-Kutta methods. 
They thus merit a more thorough study. Research on these real-pole approxima¬ 
tions was started by Nprsett (1974) and Wolfbrandt (1977). Many results are col¬ 
lected in their joint paper Nprsett & Wolfbrandt (1977). 

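If the method is of order at least $s$, $P(z)$ is given by (3.18). We shall here, and
in the sequel, very often write the formulas for $s = 3$ without always mentioning
how trivial their extension to arbitrary $s$ is. Hence for $s = 3$

$$ R(z) = \frac{1 + z\left(\frac{S_0}{1!} - \frac{S_1}{0!}\right) + z^2\left(\frac{S_0}{2!} - \frac{S_1}{1!} + \frac{S_2}{0!}\right) + z^3\left(\frac{S_0}{3!} - \frac{S_1}{2!} + \frac{S_2}{1!} - \frac{S_3}{0!}\right)}{1 - zS_1 + z^2S_2 - z^3S_3} \tag{4.20} $$

where

$$ S_0 = 1, \quad S_1 = \gamma_1+\gamma_2+\gamma_3, \quad S_2 = \gamma_1\gamma_2+\gamma_1\gamma_3+\gamma_2\gamma_3, \quad S_3 = \gamma_1\gamma_2\gamma_3. $$

The error constant is for $p = s$

$$ C = \frac{S_0}{4!} - \frac{S_1}{3!} + \frac{S_2}{2!} - \frac{S_3}{1!}. \tag{4.21} $$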
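The formulas (4.20) and (4.21) can be evaluated mechanically for any number of poles. The following is a minimal sketch, assuming the numerator coefficients and the error constant are read off as the coefficients of $z^0,\ldots,z^{s+1}$ in $e^z(1 - zS_1 + z^2S_2 - \ldots)$; the function names are illustrative and not taken from the text.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial, prod


def elementary_symmetric(gammas):
    """Return [S_0, S_1, ..., S_s] with S_0 = 1 (elementary symmetric functions)."""
    s = len(gammas)
    return [sum((prod(c) for c in combinations(gammas, j)), Fraction(0))
            for j in range(s + 1)]


def numerator_and_error_constant(gammas):
    """Coefficients of P(z) in (4.20) (increasing powers) and the constant C of (4.21)."""
    s = len(gammas)
    S = elementary_symmetric(gammas)

    def coeff(n):
        # coefficient of z^n in exp(z) * (1 - S_1 z + S_2 z^2 - ...)
        return sum((-1) ** j * S[j] / factorial(n - j)
                   for j in range(min(n, s) + 1))

    return [coeff(n) for n in range(s + 1)], coeff(s + 1)


if __name__ == "__main__":
    # implicit Euler: one pole, gamma = 1
    print(numerator_and_error_constant([Fraction(1)]))
    # three poles at gamma = 1/2
    print(numerator_and_error_constant([Fraction(1, 2)] * 3))
```

Exact rational arithmetic is used so that membership in the sets $L$ (vanishing leading coefficient of $P$) and $H$ (vanishing $C$) discussed below can be tested without rounding.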


Theorem 4.18. Let $R(z)$ be an approximation to $e^z$ of order $p$ with real poles
only and let $k$ be the degree of its numerator. Then,

$$ p \le k + 1. $$

Proof. If a sector of the order star $A$ ends up with a pole on the real axis, then by
symmetry the complex conjugate sector must join the first one. All white sectors
enclosed by these two must therefore be finite (Fig. 4.6). The same is true for
sectors joining the infinite part of $A$. There is thus on each side of the real axis
at most one white sector which can be infinite. Thus the remaining $p-1$ white
sectors require together at least $p-1$ zeros by Lemma 4.5, i.e., we have $p - 1 \le k$.

□



Fig. 4.6. An approximation with real poles, 3 zeros, order 4 


Remark 4.19. If $p \ge k$, then at least one white sector must be
unbounded. This is then either the first sector on the positive
real axis, or, by symmetry, there is a pair of two sectors. By
the proof of Theorem 4.18 the pair is unique and we shall call
it Cary Grant’s part.



Remark 4.20. If p = k + 1, the optimal case, there are k + 2 white sectors, two of 
them are infinite. Hence each of the remaining k sectors must then contain exactly 
one root of P(z). As a consequence, C < 0 iff P(z) has no positive real root 
between the origin and the first pole. 


The Real-Pole Sandwich 

We now analyze the approximations (4.19) with order $p \ge s$ in more detail (Nørsett
& Wanner 1979). We are interested in two sets:

Definition 4.21. Let $L$ be the set of $(\gamma_1,\ldots,\gamma_s)$ for which $\deg P(z)$ in (4.20) is
$\le s-1$, i.e., $R(\infty) = 0$ for $\gamma_i \ne 0$ ($i = 1,\ldots,s$).

Definition 4.22. Denote by $H$ the set of $(\gamma_1,\ldots,\gamma_s)$ for which the error constant
(4.21) is zero, i.e., for which the approximation has highest possible order $p = s+1$.

A consequence of Theorem 4.18 is

$$ L \cap H = \emptyset. \tag{4.22} $$

Written for the case $s = 3$ (generalizations to arbitrary $s$ are straightforward) and
using (4.20) and (4.21) the sets $L$ and $H$ become

$$
\begin{aligned}
L &= \Bigl\{(\gamma_1,\gamma_2,\gamma_3);\ \frac{1}{3!} - \frac{\gamma_1+\gamma_2+\gamma_3}{2!} + \frac{\gamma_1\gamma_2+\gamma_1\gamma_3+\gamma_2\gamma_3}{1!} - \frac{\gamma_1\gamma_2\gamma_3}{0!} = 0\Bigr\} \\
H &= \Bigl\{(\gamma_1,\gamma_2,\gamma_3);\ \frac{1}{4!} - \frac{\gamma_1+\gamma_2+\gamma_3}{3!} + \frac{\gamma_1\gamma_2+\gamma_1\gamma_3+\gamma_2\gamma_3}{2!} - \frac{\gamma_1\gamma_2\gamma_3}{1!} = 0\Bigr\}
\end{aligned}
\tag{4.23}
$$


Theorem 4.23 (Nørsett & Wanner 1979). The surfaces $H$ and $L$ are each composed
of $s$ disjoint connected sheets

$$ L = L_1 \cup L_2 \cup \ldots \cup L_s, \qquad H = H_1 \cup H_2 \cup \ldots \cup H_s. \tag{4.24} $$

If a direction $\delta = (\delta_1,\ldots,\delta_s)$ is chosen with all $\delta_i \ne 0$ and if $k$ of them are positive,
then the ray

$$ X = \{ t\delta ;\ 0 < t < \infty \} \tag{4.25} $$

intersects the sheets $H_1, L_1, H_2, L_2, \ldots, H_k, L_k$ in this order and no others.



Proof. When the $\delta_i$ have been chosen, inserting $\gamma_i = t\delta_i$ into (4.23) gives

$$
\begin{aligned}
\frac{1}{3!} - t\,\frac{\delta_1+\delta_2+\delta_3}{2!} + t^2\,\frac{\delta_1\delta_2+\delta_1\delta_3+\delta_2\delta_3}{1!} - t^3\,\frac{\delta_1\delta_2\delta_3}{0!} &= 0 \\
\frac{1}{4!} - t\,\frac{\delta_1+\delta_2+\delta_3}{3!} + t^2\,\frac{\delta_1\delta_2+\delta_1\delta_3+\delta_2\delta_3}{2!} - t^3\,\frac{\delta_1\delta_2\delta_3}{1!} &= 0
\end{aligned}
\tag{4.26}
$$

for $L$ and $H$, respectively. These are third (in general $s$-th) degree polynomials
whose positive roots we have to study. We vary the $\delta$'s, and hence the ray $X$,
starting with all $\delta$'s negative. The polynomials (4.26) then have all coefficients
positive and obviously no positive real roots. When now one delta, say $\delta_3$, changes
sign, the leading coefficients of (4.26) become zero and one root becomes infinite
for each equation and satisfies asymptotically

$$
\begin{aligned}
t^2\,\frac{\delta_1\delta_2}{1!} - t^3\,\frac{\delta_1\delta_2\delta_3}{0!} &\approx 0 \\
t^2\,\frac{\delta_1\delta_2}{2!} - t^3\,\frac{\delta_1\delta_2\delta_3}{1!} &\approx 0
\end{aligned}
\tag{4.27}
$$

for $L$ and $H$, respectively. Thus $H$ comes below and $L$ comes above. Because
of $L \cap H = \emptyset$ (4.22) these two roots can never cross and must therefore remain in
this configuration (see Fig. 4.7).

When then successively $\delta_2$ and $\delta_1$ change sign, the same scene repeats itself
again and again, always two sheets of $H$ and $L$ descend from above in that order
and are laid on the lower sheets like slices of bread and ham of a giant sandwich.
Because $L \cap H = \emptyset$, these sheets can never cross, two roots for $L$ or $H$ can never
come together and become complex. So all roots must remain real and the theorem
must be true.

A three-dimensional view of these surfaces is given in Fig. 4.8. □

Fig. 4.8. The sandwich for s = 3 ... and for s = 5


The following theorem describes the form of the corresponding order star in
all these sheets.

Theorem 4.24. Let $G_1,\ldots,G_s$ be the open connected components of $\mathbb{R}^s \setminus H$ such
that $L_i$ lies in $G_i$, and let $G_0$ be the component containing the origin. Then the
order star of $R(z)$ given by (4.20) possesses exactly $k$ bounded fingers to the right
of Cary Grant’s part if and only if

$$ (\gamma_1,\ldots,\gamma_s) \in G_k \cup H_k. $$


Proof. We prove this by a continuity argument letting the point $(\gamma_1,\ldots,\gamma_s)$ travel
through the sandwich. Since Cary Grant’s part is always present (Remark 4.19),
the number of bounded sectors can change only where the error constant $C$ (4.21)
changes sign, i.e., on the surfaces $H_1, H_2, \ldots, H_s$. Fig. 4.9 gives some snap-shots
from this voyage for $s = 3$ and $\gamma_1 = \gamma_2 = \gamma_3 = \gamma$. In this case the equations (4.23)
become

$$ \frac{1}{3!} - \frac{3\gamma}{2!} + \frac{3\gamma^2}{1!} - \frac{\gamma^3}{0!} = 0 \tag{4.28} $$

$$ \frac{1}{4!} - \frac{3\gamma}{3!} + \frac{3\gamma^2}{2!} - \frac{\gamma^3}{1!} = 0 \tag{4.29} $$

whose roots

$$
\begin{aligned}
\lambda_1 &= 0.158984, & \lambda_2 &= 0.435867, & \lambda_3 &= 2.40515 \\
\chi_1 &= 0.128886, & \chi_2 &= 0.302535, & \chi_3 &= 1.06858
\end{aligned}
$$

do interlace nicely as required by Theorem 4.23. The affirmation of Theorem 4.24
for $s = 3$ can be clearly observed in Fig. 4.9.
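The interlacing of the roots of (4.28) and (4.29) is easy to confirm numerically. A small sketch, assuming NumPy is available (the variable and function names are illustrative):

```python
import numpy as np

# Coefficients of (4.28) and (4.29) in increasing powers of gamma.
L_COEFFS = [1 / 6, -3 / 2, 3.0, -1.0]    # 1/3! - 3g/2! + 3g^2/1! - g^3/0!
H_COEFFS = [1 / 24, -3 / 6, 3 / 2, -1.0]  # 1/4! - 3g/3! + 3g^2/2! - g^3/1!


def sorted_real_roots(coeffs_increasing):
    """np.roots expects the highest power first, so reverse the list."""
    return np.sort(np.roots(coeffs_increasing[::-1]).real)


lam = sorted_real_roots(L_COEFFS)   # roots on L
chi = sorted_real_roots(H_COEFFS)   # roots on H

# Interlacing H_1, L_1, H_2, L_2, H_3, L_3 along the ray gamma_1 = gamma_2 = gamma_3.
interlaced = all(chi[i] < lam[i] for i in range(3)) and \
             all(lam[i] < chi[i + 1] for i in range(2))

if __name__ == "__main__":
    print("lambda:", lam)
    print("chi:   ", chi)
    print("interlaced:", interlaced)
```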

For the proof of the general statement we also put $\gamma_1 = \ldots = \gamma_s = \gamma$ and
investigate the two extreme cases:

1. $\gamma = 0$: Here $R(z)$ is the Taylor polynomial $1 + z + \ldots + z^s/s!$ whose order
star has no bounded sector at all.

2. $\gamma \to \infty$: The numerator of $R(z)$ in (4.20) becomes for $s = 3$

$$ P(z) = 1 + z\Bigl(\frac{1}{1!} - 3\gamma\Bigr) + z^2\Bigl(\frac{1}{2!} - \frac{3\gamma}{1!} + 3\gamma^2\Bigr) + z^3\Bigl(\frac{1}{3!} - \frac{3\gamma}{2!} + \frac{3\gamma^2}{1!} - \gamma^3\Bigr). \tag{4.30} $$

If we let $\gamma \to \infty$, this becomes with $z\gamma = w$

$$ 1 - w\bigl(3 + O(\tfrac{1}{\gamma})\bigr) + w^2\bigl(3 + O(\tfrac{1}{\gamma})\bigr) - w^3\bigl(1 + O(\tfrac{1}{\gamma})\bigr). $$

Therefore all roots $w_i \to 1$, hence $z_i \to 1/\gamma$ (see the last picture of Fig. 4.9). Therefore
no zero of $R(z)$ can remain left of Cary Grant’s part and we have $s$ bounded
fingers.

Since between these extreme cases, there are at most $s$ crossings of the surface
$H$, Theorem 4.24 must be true. □

Fig. 4.9. Order stars for $\gamma$ travelling through the sandwich

Theorem 4.25. The function $R(z)$ defined by (4.20) can be I-stable only if

$$
\begin{aligned}
(\gamma_1,\ldots,\gamma_s) &\in H_q \cup G_q \cup H_{q+1} && \text{if } s = 2q-1, \\
(\gamma_1,\ldots,\gamma_s) &\in G_q \cup H_{q+1} \cup G_{q+1} && \text{if } s = 2q.
\end{aligned}
$$


Proof. The reason for this result is similar to Theorem 4.12. For I-stability the
imaginary axis cannot intersect the order star and must therefore reach the origin
through Cary Grant’s part. Thus I-stability (and hence A-stability) is only possible
(roughly) in the middle of the sandwich. Since at most $[(p+2)/2]$ and at least
$[(p+1)/2]$ of the $p+1$ sectors of $A$ start in $\mathbb{C}^+$, the number $k$ of bounded fingers
satisfies

$$ \Bigl[\frac{p+2}{2}\Bigr] \ge k \qquad\text{and}\qquad \Bigl[\frac{p+1}{2}\Bigr] \le k. $$

Inserting $p = s+1$ on $H$ and $p = s$ on $G$ we get the above results. □


Multiple Real-Pole Approximations 

... the next main result is obtained, saying that the least value 
of C is obtained when all the zeros of the denominator are equal 

(Nørsett & Wolfbrandt 1977)


Approximations for which all poles are equal, i.e., for which $\gamma_1 = \gamma_2 = \ldots = \gamma_s = \gamma$,
are called “multiple” real-pole approximations (Nørsett 1974). We again consider
only approximations for which the order is $\ge s$. These satisfy, for $s = 3$,

$$ R(z) = \frac{P(z)}{(1-\gamma z)^3} \tag{4.31} $$

where $P(z)$ is given by (4.30), and their error constant is

$$ C = \frac{1}{4!} - \frac{3\gamma}{3!} + \frac{3\gamma^2}{2!} - \frac{\gamma^3}{1!}. \tag{4.32} $$

Approximations with multiple poles have many computational advantages (the linear
systems to be solved in Rosenbrock or DIRK methods have all the same matrix
(see Sections IV.6 and IV.7)). We are now pleased to see that they also have the
smallest error constants (Nørsett & Wolfbrandt 1977).


Theorem 4.26. On each of the surfaces $L_i$ and $H_i$ ($i = 1,\ldots,s$) the error constant
$C$ of (4.20) is minimized (in absolute value) when $\gamma_1 = \gamma_2 = \ldots = \gamma_s$.

Proof. Our proof uses relative order stars (similar to (4.13))

$$ B = \{ z \in \mathbb{C};\ |q(z)| > 1 \}, \qquad q(z) = \frac{R_{new}(z)}{R_{old}(z)} \tag{4.33} $$

where $R_{old}(z)$ is a real-pole approximation of order $p = s+1$ corresponding to
$\gamma_1,\ldots,\gamma_s$ and $R_{new}(z)$ is obtained by an infinitely small change of the $\gamma$'s. We
assume that not all $\gamma_i$ are identical and shall show that then the error constant can
be decreased. After a permutation of the indices, we assume $\gamma_1 = \max(\gamma_i)$ (by
Theorem 4.23 $\gamma_1 > 0$, so that $1/\gamma_1$ represents the pole on the positive real axis
which is closest to the origin) and $\gamma_s < \gamma_1$. We don’t allow arbitrary changes of the
$\gamma$'s but we decrease $\gamma_1$, keep $\gamma_2,\ldots,\gamma_{s-1}$ fixed and determine $\gamma_s$ by the defining
equations for $H$ (see (4.23)). For example, for $s = 3$ we have

$$ \gamma_3 = \frac{\dfrac{1}{4!} - \dfrac{\gamma_1+\gamma_2}{3!} + \dfrac{\gamma_1\gamma_2}{2!}}{\dfrac{1}{3!} - \dfrac{\gamma_1+\gamma_2}{2!} + \dfrac{\gamma_1\gamma_2}{1!}}. \tag{4.34} $$

Since the poles and zeros of $R_{old}(z)$ depend continuously on the $\gamma_i$, poles and
zeros of $q(z)$ appear always in pairs (we call them dipoles). By the maximum
principle or by Remark 4.14, each boundary curve of $B$ leaving the origin must
lead to at least one dipole before it rejoins the origin. Since there are $s+2 = p+1$
dipoles of $q(z)$ (identical poles for $R_{old}(z)$ and $R_{new}(z)$ don’t give rise to a dipole
of $q(z)$) and $p+1$ pairs of boundary curves of $B$ leaving the origin (Remark 4.14),
each such boundary curve passes through exactly one dipole before rejoining the
origin. As a consequence no boundary curve of $B$ can cross the real axis except at
dipoles.

If the error constant of $R_{old}(z)$ satisfies $C_{old} < 0$, then, by Remark 4.20,
$R_{old}(z)$ has no zero between $1/\gamma_1$ and the origin. Therefore also $q(z)$ possesses
no dipole in this region. Since the pole of $R_{new}(z)$ is slightly larger than $1/\gamma_1$
(that of $R_{old}(z)$), the real axis between $1/\gamma_1$ and the origin must belong to the
complement of $B$. Thus we have $C_{new} - C_{old} > 0$ by (4.14) and (4.15).

If $C_{old} > 0$ there is one additional dipole of $q(z)$ between $1/\gamma_1$ and the origin
(see Remark 4.20). As above we conclude this time that $C_{new} - C_{old} < 0$.

In both cases $|C_{new}| < |C_{old}|$, since by continuity $C_{new}$ is near to $C_{old}$. As
a consequence no $(\gamma_1,\ldots,\gamma_s) \in H$ with at least two different $\gamma_i$ can minimize
the error constant. As it becomes large in modulus when at least one $\gamma_i$ tends to
$\infty$ (this follows from Theorem 4.18 and from the fact that in this case $R(z)$ tends
to an approximation with $s$ replaced by $s-1$) the minimal value of $C$ must be
attained when all poles are identical.

The proof for $L$ is the same, there are only $s-1$ zeros of $R(z)$ and the order
is $p = s$. □


An illustration of the order star $B$ compared to $A$ is given in Fig. 4.10. Another
advantage of multiple real-pole approximations is exhibited by the following
theorem:

Theorem 4.27 (Keeling 1989). On each surface $H_i \cap \{(\gamma_1,\ldots,\gamma_s);\ \gamma_j > 0\}$ the
value $|R(\infty)|$ of (4.20) is minimized when $\gamma_1 = \gamma_2 = \ldots = \gamma_s$.

Fig. 4.10. Order star $A$ compared to $B$ (left pictures: $C_{old} < 0$; right pictures: $C_{old} > 0$)


Proof. The beginning of the proof is identical to that of Theorem 4.26. Besides
$1/\gamma_1$ and $1/\gamma_s$ there is at best an even number of dipoles on the positive real axis
to the right of $1/\gamma_1$. As in the proof above we conclude that a right-neighbourhood
of $1/\gamma_1$ belongs to $B$ so that $\infty$ must lie in its complement (cf. Fig. 4.10). This
implies

$$ |R_{new}(\infty)| < |R_{old}(\infty)|. $$

As a consequence no element of $H \cap \{(\gamma_1,\ldots,\gamma_s);\ \gamma_j > 0\}$ with at least two
$\gamma_j$ different can minimize $|R(\infty)|$. Also $|R(\infty)|$ increases if some $\gamma_j \to \infty$. The
statement now follows from the fact that $|R(\infty)|$ tends to infinity when at least
one $\gamma_j$ approaches zero. □

Exercises

1. (Ehle 1968). Compute the polynomial $E(y)$ for the third and fourth Padé
subdiagonal $R_{k,k+3}(z)$ and $R_{k,k+4}(z)$ (which, by Proposition 3.4, consists of
two terms only). Show that these approximations violate (3.6) and cannot be
A-stable.

2. Prove the general formula 

E ^ )= ((kTj)d E ^JZ7)r(n U-q + m + Q)(r-k-q)jy 2r 

r= [(*+7 + 2 )/ 2 ] 9-1 

for the Padé approximations $R_{kj}$ ($j > k$).

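3. (For the fans of mathematical precision). Derive the following formulas for the
roots $\lambda_i$ and $\chi_i$ of (4.28) and (4.29):

$$
\begin{aligned}
\lambda_1 &= 1 + \sqrt{2}\,\cos\Bigl(\frac{\theta + 2\pi}{3}\Bigr), & \chi_1 &= \frac12 + \frac{1}{\sqrt3}\cos\frac{13\pi}{18}, \\
\lambda_2 &= 1 + \sqrt{2}\,\cos\Bigl(\frac{\theta + 4\pi}{3}\Bigr), & \chi_2 &= \frac12 + \frac{1}{\sqrt3}\cos\frac{25\pi}{18}, \\
\lambda_3 &= 1 + \sqrt{2}\,\cos\Bigl(\frac{\theta}{3}\Bigr), & \chi_3 &= \frac12 + \frac{1}{\sqrt3}\cos\frac{\pi}{18},
\end{aligned}
$$

where $\theta = \arctan(\sqrt{2}/4)$.

Hint. Use the Cardano-Viète formula (e.g., Hairer & Wanner (1995), page 66).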

4. Prove that all zeros of

$$ \frac{x^s}{s!} - S_1\frac{x^{s-1}}{(s-1)!} + S_2\frac{x^{s-2}}{(s-2)!} - \ldots \pm S_s $$

are real and distinct whenever all zeros of

$$ Q(z) = 1 - zS_1 + z^2S_2 - \ldots \pm z^sS_s, \qquad S_s \ne 0 $$

are real. Also, both polynomials have the same number of positive (and negative)
zeros (Nørsett & Wanner 1979, Bales, Karakashian & Serbin 1988).

Hint. Apply Theorem 4.23. This furnishes a geometric proof of a classical
result (see e.g., Pólya & Szegő (1925), Volume II, Part V, No. 65) and allows us
to interpret $R(z)$ as the stability function of a (real) collocation method.

5. Prove that $(\gamma,\ldots,\gamma) \in L$ (Definition 4.21) if and only if $L_s(1/\gamma) = 0$, where
$L_s(x)$ denotes the Laguerre polynomial of degree $s$ (see Abramowitz & Stegun
(1964), Formula 22.3.9 or Formula (6.11) below).



IV.5 Construction of Implicit Runge-Kutta Methods


Although most of these methods appear at the moment to be largely 
of theoretical interest ... (B.L. Ehle 1968) 


In Sect. II.7 the first implicit Runge-Kutta methods were introduced. As we saw in 
Sect. IV.3, not all of them are suitable for the solution of stiff differential equations. 
This section is devoted to the collection of several classes of fully implicit Runge- 
Kutta methods possessing good stability properties. 

The construction of such methods relies heavily on the simplifying assumptions

$$
\begin{aligned}
B(p):&\quad \sum_{i=1}^{s} b_i c_i^{q-1} = \frac{1}{q}, && q = 1,\ldots,p; \\
C(\eta):&\quad \sum_{j=1}^{s} a_{ij} c_j^{q-1} = \frac{c_i^q}{q}, && i = 1,\ldots,s,\ q = 1,\ldots,\eta; \\
D(\zeta):&\quad \sum_{i=1}^{s} b_i c_i^{q-1} a_{ij} = \frac{b_j}{q}\,(1 - c_j^q), && j = 1,\ldots,s,\ q = 1,\ldots,\zeta.
\end{aligned}
$$

Condition $B(p)$ simply means that the quadrature formula $(b_i, c_i)$ is of order $p$.
The importance of the other two conditions is seen from the following fundamental
theorem, which was derived in Sect. II.7.

Theorem 5.1 (Butcher 1964). If the coefficients $b_i, c_i, a_{ij}$ of a Runge-Kutta method
satisfy $B(p)$, $C(\eta)$, $D(\zeta)$ with $p \le \eta + \zeta + 1$ and $p \le 2\eta + 2$, then the method is
of order $p$. □
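The simplifying assumptions are plain algebraic identities and can be tested exactly for a given tableau. The following sketch checks $B(p)$, $C(\eta)$ and $D(\zeta)$ in rational arithmetic; the helper names are illustrative, and the 2-stage Radau IIA coefficients (order 3) are used only as sample input.

```python
from fractions import Fraction as F


def check_B(b, c, p):
    """B(p): sum_i b_i c_i^(q-1) = 1/q for q = 1..p."""
    return all(sum(bi * ci ** (q - 1) for bi, ci in zip(b, c)) == F(1, q)
               for q in range(1, p + 1))


def check_C(A, c, eta):
    """C(eta): sum_j a_ij c_j^(q-1) = c_i^q / q for all i and q = 1..eta."""
    s = len(c)
    return all(sum(A[i][j] * c[j] ** (q - 1) for j in range(s)) == c[i] ** q / q
               for i in range(s) for q in range(1, eta + 1))


def check_D(A, b, c, zeta):
    """D(zeta): sum_i b_i c_i^(q-1) a_ij = b_j (1 - c_j^q) / q for all j and q = 1..zeta."""
    s = len(c)
    return all(sum(b[i] * c[i] ** (q - 1) * A[i][j] for i in range(s))
               == b[j] * (1 - c[j] ** q) / q
               for j in range(s) for q in range(1, zeta + 1))


# Sample input: 2-stage Radau IIA method.
c = [F(1, 3), F(1)]
b = [F(3, 4), F(1, 4)]
A = [[F(5, 12), F(-1, 12)],
     [F(3, 4), F(1, 4)]]

if __name__ == "__main__":
    print("B(3):", check_B(b, c, 3), " B(4):", check_B(b, c, 4))
    print("C(2):", check_C(A, c, 2))
    print("D(1):", check_D(A, b, c, 1), " D(2):", check_D(A, b, c, 2))
```

With $B(3)$, $C(2)$, $D(1)$ satisfied, Theorem 5.1 applies with $p = 3 \le \eta + \zeta + 1 = 4$ and $p \le 2\eta + 2 = 6$.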


Gauss Methods 


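These processes, named “Kuntzmann-Butcher methods” in Sect. II.7, are collocation
methods based on the Gaussian quadrature formulas, i.e., $c_1,\ldots,c_s$ are the
zeros of the shifted Legendre polynomial of degree $s$,

$$ \frac{d^s}{dx^s}\Bigl(x^s(x-1)^s\Bigr). $$

For the sake of completeness we present the first of these in Tables 5.1 and 5.2.

Table 5.1. Gauss methods of order 2 and 4

$$
\begin{array}{c|c}
\frac12 & \frac12 \\ \hline
 & 1
\end{array}
\qquad\qquad
\begin{array}{c|cc}
\frac12 - \frac{\sqrt3}{6} & \frac14 & \frac14 - \frac{\sqrt3}{6} \\
\frac12 + \frac{\sqrt3}{6} & \frac14 + \frac{\sqrt3}{6} & \frac14 \\ \hline
 & \frac12 & \frac12
\end{array}
$$

Table 5.2. Gauss method of order 6

$$
\begin{array}{c|ccc}
\frac12 - \frac{\sqrt{15}}{10} & \frac{5}{36} & \frac29 - \frac{\sqrt{15}}{15} & \frac{5}{36} - \frac{\sqrt{15}}{30} \\
\frac12 & \frac{5}{36} + \frac{\sqrt{15}}{24} & \frac29 & \frac{5}{36} - \frac{\sqrt{15}}{24} \\
\frac12 + \frac{\sqrt{15}}{10} & \frac{5}{36} + \frac{\sqrt{15}}{30} & \frac29 + \frac{\sqrt{15}}{15} & \frac{5}{36} \\ \hline
 & \frac{5}{18} & \frac49 & \frac{5}{18}
\end{array}
$$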


Theorem 5.2 (Butcher 1964, Ehle 1968). The $s$-stage Gauss method is of order
$2s$. Its stability function is the $(s,s)$-Padé approximation and the method is
A-stable.

Proof. The order result has already been proved in Sect. II.7. Since the degrees of
the numerator and the denominator are not larger than $s$ for any $s$-stage
Runge-Kutta method, the stability function of this $2s$-order method is the $(s,s)$-Padé
approximation by Theorem 3.11. A-stability thus follows from Theorem 4.12. □
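As a numerical illustration of Theorem 5.2, the sketch below evaluates the stability function of the 2-stage Gauss method through the standard formula $R(z) = 1 + z\,b^T(I - zA)^{-1}\mathbb{1}$ and compares it with the $(2,2)$-Padé approximation $(1 + z/2 + z^2/12)/(1 - z/2 + z^2/12)$. NumPy and the function names are assumptions of this example, not part of the text.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# 2-stage Gauss method (order 4), cf. Table 5.1.
A = np.array([[1 / 4, 1 / 4 - SQ3 / 6],
              [1 / 4 + SQ3 / 6, 1 / 4]])
b = np.array([1 / 2, 1 / 2])


def stability_function(A, b, z):
    """R(z) = 1 + z b^T (I - zA)^{-1} 1 for a Runge-Kutta tableau (A, b)."""
    s = len(b)
    stages = np.linalg.solve(np.eye(s) - z * A, np.ones(s))
    return 1 + z * (b @ stages)


def pade_22(z):
    """Diagonal (2,2)-Pade approximation to exp(z)."""
    return (1 + z / 2 + z ** 2 / 12) / (1 - z / 2 + z ** 2 / 12)


if __name__ == "__main__":
    for z in (-1 + 2j, 0.3j, -5.0):
        print(z, stability_function(A, b, z), pade_22(z))
    # On the imaginary axis the diagonal Pade approximation has modulus 1.
    print(abs(stability_function(A, b, 3j)))
```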


Radau IA and Radau IIA Methods 

Butcher (1964) introduced Runge-Kutta methods based on the Radau and Lobatto
quadrature formulas. He called them processes of type I, II or III according to
whether $c_1,\ldots,c_s$ are the zeros of

$$ \text{I:}\quad \frac{d^{s-1}}{dx^{s-1}}\Bigl(x^s(x-1)^{s-1}\Bigr), \qquad \text{(Radau left)} \tag{5.1} $$

$$ \text{II:}\quad \frac{d^{s-1}}{dx^{s-1}}\Bigl(x^{s-1}(x-1)^s\Bigr), \qquad \text{(Radau right)} \tag{5.2} $$

$$ \text{III:}\quad \frac{d^{s-2}}{dx^{s-2}}\Bigl(x^{s-1}(x-1)^{s-1}\Bigr). \qquad \text{(Lobatto)} \tag{5.3} $$

The weights $b_1,\ldots,b_s$ are chosen such that the quadrature formula satisfies $B(s)$,
which implies $B(2s-1)$ in the Radau case and $B(2s-2)$ in the Lobatto case
(see Lemma 5.15 below). Unfortunately, none of these methods of Butcher turned
out to be A-stable (see e.g. Table 3.1). Ehle (1969) took up the ideas of Butcher
and constructed methods of type I, II and III with excellent stability properties.
Independently, Axelsson (1969) found the Radau IIA methods together with an
elegant proof of their A-stability.

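The $s$-stage Radau IA method is of type I, where the coefficients $a_{ij}$ ($i,j =
1,\ldots,s$) are defined by condition $D(s)$. This is uniquely possible since the $c_i$ are
distinct and the $b_i$ not zero. Tables 5.3 and 5.4 present the first of these methods.

Table 5.3. Radau IA methods of orders 1 and 3

$$
\begin{array}{c|c}
0 & 1 \\ \hline
 & 1
\end{array}
\qquad\qquad
\begin{array}{c|cc}
0 & \frac14 & -\frac14 \\
\frac23 & \frac14 & \frac{5}{12} \\ \hline
 & \frac14 & \frac34
\end{array}
$$

Table 5.4. Radau IA method of order 5

$$
\begin{array}{c|ccc}
0 & \frac19 & \frac{-1-\sqrt6}{18} & \frac{-1+\sqrt6}{18} \\
\frac{6-\sqrt6}{10} & \frac19 & \frac{88+7\sqrt6}{360} & \frac{88-43\sqrt6}{360} \\
\frac{6+\sqrt6}{10} & \frac19 & \frac{88+43\sqrt6}{360} & \frac{88-7\sqrt6}{360} \\ \hline
 & \frac19 & \frac{16+\sqrt6}{36} & \frac{16-\sqrt6}{36}
\end{array}
$$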

Ehle’s type II processes are obtained by imposing condition $C(s)$. By Theorem
II.7.7 this results in the collocation methods based on the zeros of (5.2). They
are called Radau IIA methods. Examples are given in Tables 5.5 and 5.6. For $s = 1$
we obtain the implicit Euler method.

Theorem 5.3. The $s$-stage Radau IA method and the $s$-stage Radau IIA method
are of order $2s-1$. Their stability function is the $(s-1,s)$ subdiagonal Padé
approximation. Both methods are A-stable.

Proof. The stated orders follow from Theorem 5.1 and Lemma 5.4 below. Since
$c_1 = 0$ for the Radau IA method, $D(s)$ with $j = 1$ and $B(2s-1)$ imply (3.14).
Similarly, for the Radau IIA method, $c_s = 1$ and $C(s)$ imply (3.13). Therefore, in
both cases, the numerator of the stability function is of degree $\le s-1$ by
Proposition 3.8. The statement now follows from Theorem 3.11 and Theorem 4.12.

□

Table 5.5. Radau IIA methods of orders 1 and 3

$$
\begin{array}{c|c}
1 & 1 \\ \hline
 & 1
\end{array}
\qquad\qquad
\begin{array}{c|cc}
\frac13 & \frac{5}{12} & -\frac{1}{12} \\
1 & \frac34 & \frac14 \\ \hline
 & \frac34 & \frac14
\end{array}
$$

Table 5.6. Radau IIA method of order 5

$$
\begin{array}{c|ccc}
\frac{4-\sqrt6}{10} & \frac{88-7\sqrt6}{360} & \frac{296-169\sqrt6}{1800} & \frac{-2+3\sqrt6}{225} \\
\frac{4+\sqrt6}{10} & \frac{296+169\sqrt6}{1800} & \frac{88+7\sqrt6}{360} & \frac{-2-3\sqrt6}{225} \\
1 & \frac{16-\sqrt6}{36} & \frac{16+\sqrt6}{36} & \frac19 \\ \hline
 & \frac{16-\sqrt6}{36} & \frac{16+\sqrt6}{36} & \frac19
\end{array}
$$


Lemma 5.4. Let an $s$-stage Runge-Kutta method have distinct $c_1,\ldots,c_s$ and
non-zero weights $b_1,\ldots,b_s$. Then we have

a) $C(s)$ and $B(s+\nu)$ imply $D(\nu)$;

b) $D(s)$ and $B(s+\nu)$ imply $C(\nu)$.


Proof. Put

$$ d_j^{(q)} = \sum_{i=1}^{s} b_i c_i^{q-1} a_{ij} - \frac{b_j}{q}\,(1 - c_j^q). $$

Conditions $C(s)$ and $B(s+\nu)$ imply

$$ \sum_{j=1}^{s} d_j^{(q)} c_j^{k-1} = 0 \quad\text{for } k = 1,\ldots,s \text{ and } q = 1,\ldots,\nu. \tag{5.4} $$

The vector $(d_1^{(q)},\ldots,d_s^{(q)})$ must vanish, because it is the solution of a homogeneous
linear system with a non-singular matrix of Vandermonde type. This proves
$D(\nu)$.

For part (b) one defines

$$ e_i^{(q)} = \sum_{j=1}^{s} a_{ij} c_j^{q-1} - \frac{c_i^q}{q} $$

and applies a similar argument to

$$ \sum_{i=1}^{s} b_i c_i^{k-1} e_i^{(q)} = 0 \quad\text{for } k = 1,\ldots,s \text{ and } q = 1,\ldots,\nu. \qquad \square $$


Lobatto IIIA, IIIB and IIIC Methods


For all type III processes the $c_i$ are the zeros of the polynomial (5.3) and the
weights $b_i$ are such that $B(2s-2)$ is satisfied.

The coefficients $a_{ij}$ are defined by $C(s)$ for the Lobatto IIIA methods. It is
therefore a collocation method. For the Lobatto IIIB methods we impose $D(s)$
and, finally, for the Lobatto IIIC methods we put

$$ a_{i1} = b_1 \quad\text{for } i = 1,\ldots,s \tag{5.5} $$

and determine the remaining $a_{ij}$ by $C(s-1)$. Ehle (1969) introduced the first two
classes, and presented the IIIC methods for $s \le 3$. The general definition of the
IIIC methods is due to Chipman (1971); see also Axelsson (1972). Examples are
given in Tables 5.7-5.12.

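Table 5.7. Lobatto IIIA methods of orders 2 and 4

$$
\begin{array}{c|cc}
0 & 0 & 0 \\
1 & \frac12 & \frac12 \\ \hline
 & \frac12 & \frac12
\end{array}
\qquad\qquad
\begin{array}{c|ccc}
0 & 0 & 0 & 0 \\
\frac12 & \frac{5}{24} & \frac13 & -\frac{1}{24} \\
1 & \frac16 & \frac23 & \frac16 \\ \hline
 & \frac16 & \frac23 & \frac16
\end{array}
$$

Table 5.8. Lobatto IIIA method of order 6

$$
\begin{array}{c|cccc}
0 & 0 & 0 & 0 & 0 \\
\frac{5-\sqrt5}{10} & \frac{11+\sqrt5}{120} & \frac{25-\sqrt5}{120} & \frac{25-13\sqrt5}{120} & \frac{-1+\sqrt5}{120} \\
\frac{5+\sqrt5}{10} & \frac{11-\sqrt5}{120} & \frac{25+13\sqrt5}{120} & \frac{25+\sqrt5}{120} & \frac{-1-\sqrt5}{120} \\
1 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12} \\ \hline
 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12}
\end{array}
$$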


Theorem 5.5. The $s$-stage Lobatto IIIA, IIIB and IIIC methods are of order $2s-2$.
The stability function for the Lobatto IIIA and IIIB methods is the diagonal
$(s-1,s-1)$-Padé approximation. For the Lobatto IIIC method it is the $(s-2,s)$-Padé
approximation. All these methods are A-stable.

Proof. We first prove that the IIIC methods satisfy $D(s-1)$. Condition (5.5)
implies $d_1^{(q)} = 0$ ($q = 1,\ldots,s-1$) for $d_1^{(q)}$ given by (5.4). Conditions $C(s-1)$
and $B(2s-2)$ then yield

$$ \sum_{j=2}^{s} d_j^{(q)} c_j^{k-1} = 0 \quad\text{for } k = 1,\ldots,s-1 \text{ and } q = 1,\ldots,s-1. $$

As in the proof of Lemma 5.4 we deduce $D(s-1)$. All order statements now
follow from Lemma 5.4 and Theorem 5.1.

By definition, the first row of the Runge-Kutta matrix $A$ vanishes for the IIIA
methods, and its last column vanishes for the IIIB methods. The denominator of
the stability function is therefore of degree $\le s-1$. Similarly, the last row of
$A - \mathbb{1}b^T$ vanishes for IIIA, and the first column of $A - \mathbb{1}b^T$ for IIIB. Therefore,
the numerator of the stability function is also of degree $\le s-1$ by Formula (3.3).
It now follows from Theorem 3.11 that both methods have the $(s-1,s-1)$-Padé
approximation as stability function.

For the IIIC process the first column as well as the last row of $A - \mathbb{1}b^T$ vanish.
Thus the degree of the numerator of the stability function is at most $s-2$ by
Formula (3.3). Again, Theorem 3.11 and Theorem 4.12 imply the statement. □


For a summary of these statements see Table 5.13. 


Table 5.13. Fully implicit Runge-Kutta methods

method         simplifying assumptions           order    stability function
Gauss          B(2s)     C(s)      D(s)          2s       (s, s)-Padé
Radau IA       B(2s-1)   C(s-1)    D(s)          2s-1     (s-1, s)-Padé
Radau IIA      B(2s-1)   C(s)      D(s-1)        2s-1     (s-1, s)-Padé
Lobatto IIIA   B(2s-2)   C(s)      D(s-2)        2s-2     (s-1, s-1)-Padé
Lobatto IIIB   B(2s-2)   C(s-2)    D(s)          2s-2     (s-1, s-1)-Padé
Lobatto IIIC   B(2s-2)   C(s-1)    D(s-1)        2s-2     (s-2, s)-Padé


The W -Transformation 


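We now attack the explicit construction of all Runge-Kutta methods covered by
Theorem 5.1. The first observation is (Chipman 1971, Burrage 1978) that $C(\eta)$
can be written as

$$
\begin{pmatrix} a_{11} & \ldots & a_{1s} \\ \vdots & & \vdots \\ a_{s1} & \ldots & a_{ss} \end{pmatrix}
\begin{pmatrix} 1 & c_1 & \ldots & c_1^{\eta-1} \\ \vdots & \vdots & & \vdots \\ 1 & c_s & \ldots & c_s^{\eta-1} \end{pmatrix}
=
\begin{pmatrix} 1 & c_1 & \ldots & c_1^{\eta} \\ \vdots & \vdots & & \vdots \\ 1 & c_s & \ldots & c_s^{\eta} \end{pmatrix}
\begin{pmatrix} 0 & 0 & \ldots & 0 \\ 1 & 0 & \ldots & 0 \\ 0 & 1/2 & \ldots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & 1/\eta \end{pmatrix}.
\tag{5.6}
$$

Hence, if $V$ is the Vandermonde matrix

$$ V = \begin{pmatrix} 1 & c_1 & \ldots & c_1^{s-1} \\ \vdots & \vdots & & \vdots \\ 1 & c_s & \ldots & c_s^{s-1} \end{pmatrix}, $$

then the first $\eta$ (for $\eta \le s-1$) columns of $V^{-1}AV$ must have the special structure
(with many zeros) of the rightmost matrix in (5.6). This “$V$-transformation”
already considerably simplifies the discussion of order and stability of methods
governed by $C(\eta)$ with $\eta$ close to $s$ (Burrage 1978). Thus, collocation methods
($\eta = s$) are characterized by

$$
V^{-1}AV =
\begin{pmatrix}
0 & & & & -\rho_0/s \\
1 & 0 & & & -\rho_1/s \\
 & 1/2 & 0 & & -\rho_2/s \\
 & & \ddots & \ddots & \vdots \\
 & & & 1/(s-1) & -\rho_{s-1}/s
\end{pmatrix}
\tag{5.7}
$$

where the $\rho$'s are the coefficients of $M(t) = \prod_{i=1}^{s}(t - c_i)$ and appear when the $c_i^s$
in (5.6) are replaced by lower powers. Whenever some of the columns of $V^{-1}AV$
are not as in (5.7), a nice idea of Nørsett allows one to interpret the method as a
perturbed collocation method (see Nørsett & Wanner (1981) for more details).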

However, the $V$-transformation has some drawbacks: it does not allow a similar
characterization of $D(\zeta)$, and the discussions of A- and B-stability remain
fairly complicated (see e.g. the above cited papers). It was then discovered (Hairer
& Wanner 1981, 1982) that nicer results are obtained, if the Vandermonde matrix $V$
is replaced by a matrix $W$ whose elements are orthogonal polynomials evaluated
at $c_i$. We therefore use the (non standard) notation

$$ P_k(x) = \frac{\sqrt{2k+1}}{k!}\,\frac{d^k}{dx^k}\Bigl(x^k(x-1)^k\Bigr) = \sqrt{2k+1}\,\sum_{j=0}^{k} (-1)^{j+k}\binom{k}{j}\binom{j+k}{j}x^j \tag{5.8} $$

for the shifted Legendre polynomials normalized so that

$$ \int_0^1 P_k^2(x)\,dx = 1. \tag{5.9} $$

These polynomials satisfy the integration formulas

$$
\begin{aligned}
\int_0^x P_0(t)\,dt &= \xi_1 P_1(x) + \tfrac12 P_0(x) \\
\int_0^x P_k(t)\,dt &= \xi_{k+1}P_{k+1}(x) - \xi_k P_{k-1}(x), \qquad k = 1,2,\ldots
\end{aligned}
\tag{5.10}
$$

with

$$ \xi_k = \frac{1}{2\sqrt{4k^2-1}} \tag{5.11} $$

(Exercise 1). Instead of (5.7) we now have the following result.

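Theorem 5.6. Let $W$ be defined by

$$ w_{ij} = P_{j-1}(c_i), \qquad i = 1,\ldots,s, \quad j = 1,\ldots,s, \tag{5.12} $$

and let $A$ be the coefficient matrix for the Gauss method of order $2s$. Then,

$$
W^{-1}AW =
\begin{pmatrix}
1/2 & -\xi_1 & & & \\
\xi_1 & 0 & -\xi_2 & & \\
 & \xi_2 & \ddots & \ddots & \\
 & & \ddots & 0 & -\xi_{s-1} \\
 & & & \xi_{s-1} & 0
\end{pmatrix}
=: X_G.
\tag{5.13}
$$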


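Proof. We first write $C(\eta)$ in the form

$$ \sum_{j=1}^{s} a_{ij}\,p(c_j) = \int_0^{c_i} p(x)\,dx \quad\text{if } \deg(p) \le \eta - 1, \tag{5.14} $$

which, by (5.10), is equivalent to

$$
\begin{aligned}
\sum_{j=1}^{s} a_{ij}P_0(c_j) &= \xi_1 P_1(c_i) + \tfrac12 P_0(c_i) \\
\sum_{j=1}^{s} a_{ij}P_k(c_j) &= \xi_{k+1}P_{k+1}(c_i) - \xi_k P_{k-1}(c_i), \qquad k = 1,\ldots,\eta-1.
\end{aligned}
\tag{5.15}
$$

For $\eta = s$, inserting (5.12), and using matrix notation, this becomes

$$
\begin{pmatrix} a_{11} & \ldots & a_{1s} \\ \vdots & & \vdots \\ a_{s1} & \ldots & a_{ss} \end{pmatrix} W
=
\begin{pmatrix} w_{11} & \ldots & w_{1s} & P_s(c_1) \\ \vdots & & \vdots & \vdots \\ w_{s1} & \ldots & w_{ss} & P_s(c_s) \end{pmatrix}
\begin{pmatrix}
1/2 & -\xi_1 & & & \\
\xi_1 & 0 & -\xi_2 & & \\
 & \xi_2 & \ddots & \ddots & \\
 & & \ddots & 0 & -\xi_{s-1} \\
 & & & \xi_{s-1} & 0 \\
 & & & & \xi_s
\end{pmatrix}.
\tag{5.16}
$$

Since for the Gauss processes we have $P_s(c_1) = \ldots = P_s(c_s) = 0$, the last column
respectively row of the right hand matrices can be dropped and we obtain (5.13).

□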
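The proof rests on the normalization (5.9) and the integration formulas (5.10). A short sketch, assuming NumPy's `Polynomial` class and illustrative helper names, builds $P_k$ from the explicit sum in (5.8) and checks both properties for small $k$:

```python
from math import comb, sqrt

import numpy as np
from numpy.polynomial import Polynomial


def shifted_legendre(k):
    """P_k of (5.8): shifted Legendre polynomial with unit L2 norm on [0, 1]."""
    coeffs = [(-1) ** (j + k) * comb(k, j) * comb(j + k, j) for j in range(k + 1)]
    return sqrt(2 * k + 1) * Polynomial(coeffs)


def xi(k):
    """xi_k of (5.11)."""
    return 1.0 / (2.0 * sqrt(4 * k * k - 1))


def integral_01(p):
    """Integral of the polynomial p over [0, 1]."""
    return p.integ()(1.0)   # integ() returns the antiderivative vanishing at 0


def integration_formula_defect(k):
    """Largest coefficient of  int_0^x P_k dt - (right-hand side of (5.10))."""
    lhs = shifted_legendre(k).integ()
    if k == 0:
        rhs = xi(1) * shifted_legendre(1) + 0.5 * shifted_legendre(0)
    else:
        rhs = xi(k + 1) * shifted_legendre(k + 1) - xi(k) * shifted_legendre(k - 1)
    return float(np.max(np.abs((lhs - rhs).coef)))


if __name__ == "__main__":
    gram = [[integral_01(shifted_legendre(j) * shifted_legendre(k))
             for k in range(4)] for j in range(4)]
    print(np.round(gram, 12))
    print([integration_formula_defect(k) for k in range(4)])
```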


In what follows we shall study similar results for other implicit Runge-Kutta
methods. We first formulate the following lemma, which is an immediate consequence
of (5.15) and (5.16).

Lemma 5.7. Let $A$ be the coefficient matrix of an implicit Runge-Kutta method and
let $W$ be a nonsingular matrix with

$$ w_{ij} = P_{j-1}(c_i) \quad\text{for } i = 1,\ldots,s, \quad j = 1,\ldots,\eta+1. $$

Then $C(\eta)$ (with $\eta \le s-1$) is equivalent to the fact that the first $\eta$ columns of
$W^{-1}AW$ are equal to those of $X_G$ in (5.13). □

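The second type of simplifying assumption, $D(\zeta)$, is now written in the form

$$ \sum_{i=1}^{s} b_i\,p(c_i)\,a_{ij} = b_j\int_{c_j}^{1} p(x)\,dx \quad\text{if } \deg(p) \le \zeta - 1. \tag{5.17} $$

The integration formulas (5.10) together with orthogonality relations

$$ \int_0^1 P_0(x)\,dx = 1, \qquad \int_0^1 P_k(x)\,dx = \int_0^1 P_0(x)P_k(x)\,dx = 0 \quad\text{for } k = 1,2,\ldots $$

show that $D(\zeta)$ (i.e., (5.17)) is equivalent to

$$
\begin{aligned}
\sum_{i=1}^{s} P_0(c_i)\,b_i a_{ij} &= \bigl(\tfrac12 P_0(c_j) - \xi_1 P_1(c_j)\bigr)\,b_j \\
\sum_{i=1}^{s} P_k(c_i)\,b_i a_{ij} &= \bigl(\xi_k P_{k-1}(c_j) - \xi_{k+1}P_{k+1}(c_j)\bigr)\,b_j, \qquad k = 1,\ldots,\zeta-1.
\end{aligned}
\tag{5.18}
$$

This can be stated as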


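Lemma 5.8. As in the preceding lemma, let $W$ be a nonsingular matrix with

$$ w_{ij} = P_{j-1}(c_i) \quad\text{for } i = 1,\ldots,s, \quad j = 1,\ldots,\zeta+1, $$

and let $B = \mathrm{diag}(b_1,\ldots,b_s)$ with $b_i \ne 0$. Then $D(\zeta)$ (with $\zeta \le s-1$) is equivalent
to the condition that the first $\zeta$ rows of the matrix $(W^TB)A(W^TB)^{-1}$ are equal
to those of $X_G$ in (5.13) (if $B$ is singular, we still have (5.19) below).

Proof. Formulas (5.18), written in matrix form, give

$$
W^TBA =
\begin{pmatrix}
1/2 & -\xi_1 & & & & \\
\xi_1 & 0 & -\xi_2 & & & \\
 & \ddots & \ddots & \ddots & & \\
 & & \xi_{\zeta-1} & 0 & -\xi_\zeta & \\
* & * & \cdots & \cdots & \cdots & *
\end{pmatrix}
W^TB,
\tag{5.19}
$$

where the rows below the first $\zeta$ are not determined by (5.18). □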


It is now a natural and interesting question, whether both transformation matrices
of the foregoing lemmas can be made equal, i.e., whether

$$ W^TB = W^{-1} \qquad\text{or}\qquad W^TBW = I. \tag{5.20} $$

A first result is:

Lemma 5.9. For any quadrature formula of order $\ge 2s-1$ the matrix

$$ W = \bigl(P_{j-1}(c_i)\bigr)_{i,j=1,\ldots,s} \tag{5.21} $$

satisfies (5.20).

Proof. If the quadrature formula is of sufficiently high order, the polynomials
$P_k(x)P_l(x)$ ($k + l \le 2s-2$) are integrated exactly, i.e.,

$$ \sum_{i=1}^{s} b_i P_k(c_i)P_l(c_i) = \int_0^1 P_k(x)P_l(x)\,dx = \delta_{kl}; \tag{5.22} $$

this, however, is simply $W^TBW = I$. □
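Both (5.20) and the tridiagonal form (5.13) can be observed numerically for the Gauss methods. The sketch below is an illustration under stated assumptions: the nodes and weights come from NumPy's `leggauss` shifted to $[0,1]$, the matrix $A$ is obtained from $C(s)$ via the Vandermonde relation, and all function names are hypothetical.

```python
from math import comb, sqrt

import numpy as np


def legendre_values(k, x):
    """P_k(x) from the explicit sum in (5.8), evaluated on an array x."""
    return sqrt(2 * k + 1) * sum((-1) ** (j + k) * comb(k, j) * comb(j + k, j) * x ** j
                                 for j in range(k + 1))


def gauss_method(s):
    """Nodes, weights and collocation matrix A of the s-stage Gauss method."""
    x, w = np.polynomial.legendre.leggauss(s)   # Gauss-Legendre rule on [-1, 1]
    c, b = (x + 1) / 2, w / 2                   # shifted to [0, 1]
    V = np.vander(c, s, increasing=True)        # V[j, q-1] = c_j^(q-1)
    rhs = np.column_stack([c ** q / q for q in range(1, s + 1)])
    A = rhs @ np.linalg.inv(V)                  # C(s):  A V = (c_i^q / q)
    return A, b, c


def x_gauss(s):
    """The matrix X_G of (5.13)."""
    X = np.zeros((s, s))
    X[0, 0] = 0.5
    for k in range(1, s):
        xi_k = 1.0 / (2.0 * sqrt(4 * k * k - 1))
        X[k, k - 1], X[k - 1, k] = xi_k, -xi_k
    return X


def check_w_transformation(s):
    """Return (W^T B W == I, W^{-1} A W == X_G) up to rounding."""
    A, b, c = gauss_method(s)
    W = np.column_stack([legendre_values(k, c) for k in range(s)])
    orthonormal = np.allclose(W.T @ np.diag(b) @ W, np.eye(s))
    tridiagonal = np.allclose(np.linalg.solve(W, A @ W), x_gauss(s))
    return orthonormal, tridiagonal


if __name__ == "__main__":
    for s in (2, 3, 4):
        print(s, check_w_transformation(s))
```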

Unfortunately, Condition (5.20) is too restrictive for many methods. We therefore
relax our requirements as follows:

Definition 5.10. Let $\eta$, $\zeta$ be given integers between $0$ and $s-1$. We say that an
$s \times s$-matrix $W$ satisfies $T(\eta,\zeta)$ for the quadrature formula $(b_i,c_i)_{i=1}^{s}$ if

a) $W$ is nonsingular

b) $w_{ij} = P_{j-1}(c_i)$, $\quad i = 1,\ldots,s$, $\ j = 1,\ldots,\max(\eta,\zeta)+1$

c) $W^TBW = \begin{pmatrix} I & 0 \\ 0 & R \end{pmatrix}$

where $I$ is the $(\zeta+1)\times(\zeta+1)$ identity matrix; $R$ is an arbitrary $(s-\zeta-1) \times
(s-\zeta-1)$ matrix.

The main result is given in the following theorem. Together with Theorem 5.1
it is very helpful for the construction of high order methods (see Examples 5.16
and 5.24, and Theorem 13.15).

Theorem 5.11. Let $W$ satisfy $T(\eta,\zeta)$ for the quadrature formula $(b_i,c_i)_{i=1}^{s}$.
Then for a Runge-Kutta method based on $(b_i,c_i)$ we have, for the matrix $X =
W^{-1}AW$,

a) the first $\eta$ columns of $X$ are those of $X_G$ $\iff$ $C(\eta)$,

b) the first $\zeta$ rows of $X$ are those of $X_G$ $\iff$ $D(\zeta)$.

Proof. The equivalence of (a) with $C(\eta)$ follows from Lemma 5.7. For the proof
of (b) we multiply (5.19) from the right by $W$ and obtain

$$ W^TBW \cdot X = \widetilde{X} \cdot W^TBW $$

where $\widetilde{X}$ is the large matrix of (5.19). Because of Condition (c) of $T(\eta,\zeta)$ the
first $\zeta$ rows of $\widetilde{X}$ and $X$ must be the same (write them as block matrices). The
statement now follows from Lemma 5.8. □


We have still left open the question of the existence of $W$ satisfying $T(\eta,\zeta)$.
The following two lemmas and Theorem 5.14 give an answer.

Lemma 5.12. If the quadrature formula has distinct nodes $c_i$ and all weights
positive ($b_i > 0$) and if it is of order $p$ with $p \ge 2\eta+1$ and $p \ge 2\zeta+1$, then the
matrix

$$ W = \bigl(p_{j-1}(c_i)\bigr)_{i,j=1,\ldots,s} \tag{5.23} $$

possesses property $T(\eta,\zeta)$ and satisfies (5.20). Here $p_j(x)$ is the polynomial of
degree $j$ orthonormalized for the scalar product

$$ \langle p, r \rangle = \sum_{i=1}^{s} b_i\,p(c_i)\,r(c_i). \tag{5.24} $$

Proof. The positivity of the $b$'s makes (5.24) a scalar product on the space of
polynomials of degree $\le s-1$. Because of the order property (compare with
(5.22)), the orthonormalized $p_j(x)$ must coincide for $j \le \max(\eta,\zeta)$ with the
Legendre polynomials $P_j(x)$. Orthonormality with respect to (5.24) means that
$W^TBW = I$. □


Lemma 5.13. If the quadrature formula has distinct nodes $c_i$ and is of order
$p \ge s + \zeta$, then $W$ defined by (5.21) has property $T(\eta,\zeta)$.

Proof. Because of $p \ge s + \zeta$, (5.22) holds for $k = 0,\ldots,s-1$ and $l = 0,\ldots,\zeta$.
This ensures (c) of Definition 5.10. □


Theorem 5.14. Let the quadrature formula be of order $p$. Then there exists a
transformation with property $T(\eta,\zeta)$ if and only if

$$ p \ge \eta + \zeta + 1 \qquad\text{and}\qquad p \ge 2\zeta + 1, \tag{5.25} $$

and at least $\max(\eta,\zeta)+1$ numbers among $c_1,\ldots,c_s$ are distinct.

Proof. Set $\nu = \max(\eta,\zeta)$ and denote the columns of the transformation $W$ by
$w_1,\ldots,w_s$. In virtue of (b) of $T(\eta,\zeta)$ we have

$$ w_j = \bigl(P_{j-1}(c_1),\ldots,P_{j-1}(c_s)\bigr)^T \quad\text{for } j = 1,\ldots,\nu+1. $$

These $\nu+1$ columns are linearly independent only if at least $\nu+1$ among $c_1,\ldots,
c_s$ are distinct. Now condition (c) of $T(\eta,\zeta)$ means that $w_1,\ldots,w_{\zeta+1}$ are
orthonormal to $w_1,\ldots,w_s$ for the bilinear form $u^TBv$. In particular, the
orthonormality of $w_1,\ldots,w_{\zeta+1}$ to $w_1,\ldots,w_{\nu+1}$ (compare with (5.22)) means that the
quadrature formula is exact for all polynomials of degree $\nu+\zeta$. Therefore, $p \ge
\nu+\zeta+1$ (which is the same as (5.25)) is a necessary condition for $T(\eta,\zeta)$.

To show its sufficiency, we complete $w_1,\ldots,w_{\nu+1}$ to a basis of $\mathbb{R}^s$. The new
basis vectors $\widehat{w}_{\nu+2},\ldots,\widehat{w}_s$ are then projected into the orthogonal complement of
$\mathrm{span}(w_1,\ldots,w_{\zeta+1})$ with respect to $u^TBv$ by a Gram-Schmidt type
orthogonalization. This yields

$$ w_j = \widehat{w}_j - \sum_{k=1}^{\zeta+1}\bigl(w_k^TB\widehat{w}_j\bigr)\,w_k \quad\text{for } j = \nu+2,\ldots,s. \qquad \square $$


Construction of Implicit Runge-Kutta Methods 

For the construction of implicit Runge-Kutta methods satisfying $B(p)$, $C(\eta)$ and
$D(\zeta)$ with the help of Theorem 5.11, we first have to choose a quadrature formula
of order $p$. The following lemma is the basic result for Gaussian integration.

Lemma 5.15. Let $c_1,\dots,c_s$ be real and distinct and let $b_1,\dots,b_s$ be determined
by condition $B(s)$ (i.e., the formula is “interpolatory”). Then this quadrature
formula is of order $2s-k$ if and only if the polynomial $M(x) = (x-c_1)(x-c_2)\cdots(x-c_s)$
is orthogonal to all polynomials of degree $\le s-k-1$, i.e., if and
only if

$$ M(x) = C\bigl(P_s(x) + \alpha_1 P_{s-1}(x) + \dots + \alpha_k P_{s-k}(x)\bigr). \tag{5.26} $$

For a proof see Exercise 2. □


We see from (5.26) that all quadrature formulas of order $2s-k$ can be specified
in terms of $k$ parameters $\alpha_1, \alpha_2, \dots, \alpha_k$.

Next, if the integers $\eta$ and $\zeta$ satisfy $\eta+\zeta+1 \le 2s-k$ and $2\zeta+1 \le 2s-k$ (cf.
(5.25)), we can compute a matrix $W$ satisfying $T(\eta,\zeta)$ from Theorem 5.14 (or one
of Lemmas 5.12 and 5.13). Finally a matrix $X$ is chosen which satisfies (a) and (b)
of Theorem 5.11. Then the Runge-Kutta method with coefficients $A = WXW^{-1}$
is of order at least $\min(\eta+\zeta+1,\ 2\eta+2)$ by Theorem 5.1.

Example 5.16. We search for all implicit Runge-Kutta methods satisfying $B(2s-2)$,
$C(s-1)$ and $D(s-2)$, i.e., methods which are of order at least $2s-2$ by
Theorem 5.1. As in (5.26), we put

$$ M(x) = C\bigl(P_s(x) + \alpha_1 P_{s-1}(x) + \alpha_2 P_{s-2}(x)\bigr). \tag{5.27} $$

If $\alpha_2$ satisfies

$$ \alpha_2 < \frac{s-1}{s}\,\frac{\sqrt{2s+1}}{\sqrt{2s-3}} $$

then the roots of $M$ are real and distinct (see Exercise 7). The matrix $W$ given in
(5.21) has Property $T(s-1, s-2)$ by Lemma 5.13. Finally we put

$$ X = \begin{pmatrix}
1/2 & -\xi_1 & & & \\
\xi_1 & 0 & \ddots & & \\
 & \ddots & \ddots & -\xi_{s-2} & \\
 & & \xi_{s-2} & 0 & \beta_{s-1} \\
 & & & \xi_{s-1} & \beta_s
\end{pmatrix} \tag{5.28} $$

(see Theorem 5.11), and obtain with $A = WXW^{-1}$ a family of implicit Runge-
Kutta methods of order $2s-2$ with the four parameters $\alpha_1$, $\alpha_2$, $\beta_s$, $\beta_{s-1}$.

All methods of Table 5.13 (with the exception of Lobatto IIIB) must be special
cases. The corresponding parameter values are indicated in Table 5.14 (for their
computation see Exercise 3). If we put $\alpha_1 = 0$ and $\alpha_2 = -\sqrt{2s+1}/\sqrt{2s-3}$
(Lobatto quadrature), we obtain the two-parameter family of Chipman (1976).


Table 5.14. Special cases of method (5.27, 5.28)

Method         $\alpha_1$                     $\alpha_2$                     $\beta_s$     $\beta_{s-1}$
Gauss          $0$                            $0$                            $0$           $-\xi_{s-1}$
Radau IA       $\sqrt{2s+1}/\sqrt{2s-1}$      $0$                            $1/(4s-2)$    $-\xi_{s-1}$
Radau IIA      $-\sqrt{2s+1}/\sqrt{2s-1}$     $0$                            $1/(4s-2)$    $-\xi_{s-1}$
Lobatto IIIA   $0$                            $-\sqrt{2s+1}/\sqrt{2s-3}$     $0$           $0$
Lobatto IIIC   $0$                            $-\sqrt{2s+1}/\sqrt{2s-3}$     $1/(2s-2)$    $-\xi_{s-1}(2s-1)/(s-1)$
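
The Gauss row of Table 5.14 can be checked numerically for $s = 2$. The following is a minimal sketch, not part of the original text. It assumes the 2-stage Gauss nodes $c = 1/2 \mp \sqrt3/6$ with weights $b = (1/2, 1/2)$, the orthonormal shifted Legendre polynomials $P_0 = 1$, $P_1 = \sqrt3(2x-1)$, and $\xi_k = 1/(2\sqrt{4k^2-1})$ as defined earlier in the chapter (outside this excerpt). The helper names `matmul` and `inv2` are illustrative.

```python
import math

def matmul(A, B):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

s3 = math.sqrt(3.0)
c = [0.5 - s3 / 6, 0.5 + s3 / 6]      # assumed: 2-stage Gauss nodes
b = [0.5, 0.5]                        # assumed: Gauss weights
xi1 = 1.0 / (2.0 * math.sqrt(3.0))    # assumed: xi_k = 1/(2*sqrt(4k^2-1)), k = 1

# Columns of W are P_0(c_i) = 1 and P_1(c_i) = sqrt(3)(2 c_i - 1), cf. (5.21).
W = [[1.0, s3 * (2 * ci - 1)] for ci in c]

# X of (5.28) for s = 2 with the Gauss values beta_s = 0, beta_{s-1} = -xi_1.
X = [[0.5, -xi1], [xi1, 0.0]]

A = matmul(matmul(W, X), inv2(W))     # Runge-Kutta matrix A = W X W^{-1}

# Orthonormality check W^T B W = I (property (5.20)).
Wt = [[W[i][j] for i in range(2)] for j in range(2)]
B = [[b[0], 0.0], [0.0, b[1]]]
G = matmul(matmul(Wt, B), W)
```

Under these assumptions `A` agrees with the familiar 2-stage Gauss matrix with entries $1/4$, $1/4 \mp \sqrt3/6$, and `G` is the identity up to rounding.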


Stability Function 


We try to express the stability function of an implicit Runge-Kutta method in terms
of the transformed Runge-Kutta matrix $X = W^{-1}AW$. From (b) and (c) of
Property $T(\eta,\zeta)$ it follows that

$$ W e_1 = \mathbb{1}, \qquad W^T B\,\mathbb{1} = e_1, \qquad e_1 = (1,0,\dots,0)^T. \tag{5.29} $$

Hence Formulas (3.2) and (3.3) become

$$ R(z) = 1 + z\, e_1^T (I - zX)^{-1} e_1, \tag{5.30} $$

$$ R(z) = \frac{\det(I - zX + z\, e_1 e_1^T)}{\det(I - zX)}. \tag{5.31} $$

The stability function depends only on $X$ and not on the underlying quadrature
formula. Hence, the stability function of the method of Example 5.16 depends
on $\beta_s$ and $\beta_{s-1}$ only. Formula (5.31) becomes more symmetric (Hairer & Türke
1984) if we introduce the arithmetic mean of the matrices $X$ and $X - e_1 e_1^T$ and
define

$$ Y = X - \tfrac12\, e_1 e_1^T, \tag{5.32} $$

which is just the matrix $X$ without the $1/2$ in the $(1,1)$-position.

Proposition 5.17. For a Runge-Kutta method (3.1) let $W$ satisfy $T(\eta,\zeta)$ for some
$\eta, \zeta \ge 0$, and let $Y$ be given by (5.32) where $X = W^{-1}AW$. The stability function
then satisfies

$$ R(z) = \frac{1 + \tfrac12\Psi(z)}{1 - \tfrac12\Psi(z)} \tag{5.33} $$

with

$$ \Psi(z) = z\, e_1^T (I - zY)^{-1} e_1. \tag{5.34} $$


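Proof. Applying the Runge-Kutta method to the test equation (2.9) yields

$$ g = \mathbb{1}\, y_0 + zAg, \qquad y_1 = y_0 + z\, b^T g. $$

With $W^{-1} g = \hat g = (\hat g_1,\dots,\hat g_s)^T$ this becomes

$$ (I - zY)\,\hat g = e_1\Bigl(y_0 + \frac z2\,\hat g_1\Bigr), \qquad y_1 = y_0 + z\,\hat g_1, \tag{5.35} $$

where we have used (5.29). Computing $\hat g_1$ from the first equation of (5.35) and
inserting this into the second one gives the result. □


If the Runge-Kutta method satisfies $B(2\nu+1)$, $C(\nu)$ and $D(\nu)$ for some integer
$\nu$, then $Y$ is given by (see Theorem 5.11)

$$ Y = \begin{pmatrix}
0 & -\xi_1 & & & \\
\xi_1 & 0 & \ddots & & \\
 & \ddots & \ddots & -\xi_{\nu-1} & \\
 & & \xi_{\nu-1} & 0 & -\xi_\nu e_1^T \\
 & & & \xi_\nu e_1 & Y_\nu
\end{pmatrix} \tag{5.36} $$

where $Y_\nu$ is the lower right $(s-\nu, s-\nu)$ block and $e_1$ in the last block row and
column is the first unit vector of dimension $s-\nu$.
In this case the computation of (5.34) for the $(s,s)$-matrix $Y$ can be reduced to
that of the smaller $(s-\nu, s-\nu)$-matrix $Y_\nu$ as follows: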

Theorem 5.18. If $Y$ is given by (5.36), the function $\Psi(z)$ of (5.34) has the
continued fraction representation

$$ \Psi(z) = \cfrac{z}{1 + \cfrac{\xi_1^2 z^2}{1 + \cdots + \cfrac{\xi_{\nu-1}^2 z^2}{1 + \xi_\nu^2\, z\,\Psi_\nu(z)}}} \tag{5.37} $$

where $\Psi_\nu(z) = z\, e_1^T (I - zY_\nu)^{-1} e_1$.

Proof. Let $Y_j$ (for $0 \le j \le \nu+1$) denote the $(s-j, s-j)$ principal minors of
$Y$, where the first $j$ rows and columns are suppressed. Expanding the determinant
of $I - zY_{j-1}$ with respect to the first row (and then the first column) gives for
$j = 1,\dots,\nu$

$$ \det(I - zY_{j-1}) = \det(I - zY_j) + \xi_j^2 z^2 \det(I - zY_{j+1}). \tag{5.38} $$

By Cramer’s rule, the functions $\Psi_j(z)$ can also be written as

$$ \Psi_j(z) = z\, e_1^T (I - zY_j)^{-1} e_1 = \frac{z\,\det(I - zY_{j+1})}{\det(I - zY_j)}. \tag{5.39} $$

Dividing (5.38) by $\det(I - zY_j)$ yields

$$ \Psi_{j-1}(z) = \frac{z}{1 + \xi_j^2\, z\,\Psi_j(z)}. \tag{5.40} $$

A repeated use of (5.40) gives (5.37) since $\Psi(z) = \Psi_0(z)$.
□
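
The recursion (5.40) is easy to evaluate from the inside out. The sketch below is an illustration, not taken from the original text. It assumes $\xi_k = 1/(2\sqrt{4k^2-1})$ (defined earlier in the chapter) and uses the Gauss values of Table 5.14, for which $\nu = s-1$ and $Y_{s-1}$ is the $1\times1$ zero matrix, so that $\Psi_{s-1}(z) = z$. The function names are hypothetical.

```python
def xi(k):
    # Assumed from earlier in the chapter: xi_k = 1 / (2 sqrt(4 k^2 - 1)).
    return 1.0 / (2.0 * (4.0 * k * k - 1.0) ** 0.5)

def psi_continued_fraction(z, xis, psi_nu):
    """Evaluate (5.37) by applying (5.40) for j = nu, nu-1, ..., 1.

    xis    -- [xi_1, ..., xi_nu]
    psi_nu -- the value Psi_nu(z) of the innermost function
    """
    psi = psi_nu
    for x in reversed(xis):
        psi = z / (1.0 + x * x * z * psi)
    return psi

def stability_gauss(z, s):
    """R(z) of the s-stage Gauss method via (5.33), assuming Psi_{s-1}(z) = z."""
    psi = psi_continued_fraction(z, [xi(k) for k in range(1, s)], z)
    return (1.0 + 0.5 * psi) / (1.0 - 0.5 * psi)
```

For $s = 2$ this gives $\Psi(z) = z/(1 + z^2/12)$, and (5.33) then reproduces the diagonal Padé approximation $(1 + z/2 + z^2/12)/(1 - z/2 + z^2/12)$.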


We are thus naturally led to continued fraction expansions, a technique which
was historically the earliest one. Birkhoff & Varga (1965) used it in their proof
of the A-stability of the diagonal Padé approximations. Later, Ehle (1969, 1973)
tried to extend “Varga’s proof” to verify the A-stability of the first and second
subdiagonals of the Padé table (“This was unsuccessful because the resulting
continued fraction expansions were not easily related to one another.”). Therefore,
Ehle (1973) and Ehle & Picel (1975) proved A-stability results for the first and
second subdiagonal and some generalizations by a completely different method. The
following study of A-stability (see Butcher 1977, Hairer 1982, Hairer & Türke
1984) combines the above continued fraction expansion with properties of positive
functions.

Positive Functions 


Many stability conditions for numerical methods can be expressed 
in the form that some associated function is positive. 

(G. Dahlquist 1978) 

A-stability of an implicit Runge-Kutta method is defined by the property

$$ |R(z)| < 1 \quad\text{for } \operatorname{Re} z < 0. \tag{5.41} $$

Since the transformation $(1+\zeta)/(1-\zeta)$ occurring in (5.33) maps the negative
half-plane onto the open unit disc, (5.41) is equivalent to

$$ \operatorname{Re}\Psi(z) < 0 \quad\text{for } \operatorname{Re} z < 0. \tag{5.42} $$

This condition means that $-\Psi(-z)$ is a positive function; for rational functions
the concept of positivity can be defined as follows:

Definition 5.19. A rational function $f(z)$ is called positive if
$\operatorname{Re} f(z) > 0$ for $\operatorname{Re} z > 0$.

A nice survey on the relevance of positive functions to numerical analysis is
given by Dahlquist (1978). The following lemmas collect some properties of
positive functions.

Lemma 5.20. Let $f(z)$ and $g(z)$ be positive functions. Then we have

a) $\alpha f(z) + \beta g(z)$ is positive, if $\alpha > 0$ and $\beta > 0$;

b) $1/f(z)$ is positive;

c) $f(g(z))$ is positive. □


Observe that the poles of a positive function cannot lie in the positive half-plane,
but poles on the imaginary axis are possible, e.g., the function $1/z$ is positive.

Lemma 5.21. Suppose that

$$ f(z) = \frac cz + g(z) \quad\text{with } g(z) = O(1) \text{ for } z \to 0, $$

and $g(z) \not\equiv 0$. Then $f(z)$ is positive if and only if $c \ge 0$ and $g(z)$ is positive.

Proof. The “if-part” follows from Lemma 5.20. Suppose now that $f(z)$ is positive.
The constant $c$ has to be non-negative, since for small positive values of $z$ we
have $\operatorname{Re} f(z) > 0$. On the imaginary axis we have (apart from poles)
$\operatorname{Re} g(iy) = \operatorname{Re} f(iy) \ge 0$ or more precisely

$$ \liminf_{z \to iy,\ \operatorname{Re} z > 0} \operatorname{Re} g(z) \ge 0 \quad\text{for } y \in \mathbb{R}. $$

The maximum principle for harmonic functions then implies that either $g(z) \equiv 0$
or $g(z)$ is positive. □


A consequence of this lemma is the following characterization of A-stability. 

Theorem 5.22. Consider a Runge-Kutta method whose stability function is given
by (5.33) with $Y$ as in (5.36). It is A-stable if and only if

$$ \operatorname{Re}\Psi_\nu(z) < 0 \quad\text{for } \operatorname{Re} z < 0 \tag{5.43} $$

where $\Psi_\nu(z) = z\, e_1^T (I - zY_\nu)^{-1} e_1$ as in (5.37).

Proof. We consider the submatrices $Y_j$ of $Y$ and the functions $\Psi_j(z)$ of (5.39).
As we prefer to work with positive functions we put

$$ \chi_j(z) = -\Psi_j(-z) = z\, e_1^T (I + zY_j)^{-1} e_1. \tag{5.44} $$

By (5.42), A-stability is equivalent to the positivity of $\chi_0(z)$ and condition (5.43)
means that $\chi_\nu(z)$ is a positive function. Relation (5.40) becomes

$$ \bigl(\chi_{j-1}(z)\bigr)^{-1} = \frac1z + \xi_j^2\,\chi_j(z). $$

Since all $\chi_j(z)$ are bounded near the origin and do not vanish identically (see
(5.44)), it follows from Lemma 5.21 that $\chi_j(z)$ is a positive function iff $\chi_{j-1}(z)$
is positive. This proves the theorem. □

Example 5.23. For the Runge-Kutta method of Example 5.16 with $X$ given by
(5.28) we have

$$ Y_{s-2} = \begin{pmatrix} 0 & \beta_{s-1} \\ \xi_{s-1} & \beta_s \end{pmatrix}, \qquad
\chi_{s-2}(z) = \frac{z(1 + \beta_s z)}{1 + \beta_s z - \xi_{s-1}\beta_{s-1} z^2}. $$

Since

$$ \frac{1}{\chi_{s-2}(z)} = \frac1z - \xi_{s-1}\beta_{s-1}\,\frac{z}{1 + \beta_s z}, $$

it follows from Lemma 5.21 and Theorem 5.22 that the method is A-stable iff

$$ \beta_{s-1} = 0 \quad\text{or}\quad (\beta_{s-1} < 0 \ \text{and}\ \beta_s \ge 0). \tag{5.45} $$

Comparing this result with Tables 5.14 and 5.13 leads to a second proof for the
A-stability of the diagonal and the first two subdiagonal Padé approximations for
$e^z$ (see Theorem 4.12).
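
As a small worked check, the criterion $\beta_{s-1} = 0$ or ($\beta_{s-1} < 0$ and $\beta_s \ge 0$) can be applied to the parameter values of Table 5.14. The sketch below is illustrative only: it assumes $\xi_k = 1/(2\sqrt{4k^2-1})$ from earlier in the chapter, and the function names are hypothetical.

```python
def xi(k):
    # Assumed from earlier in the chapter: xi_k = 1 / (2 sqrt(4 k^2 - 1)).
    return 1.0 / (2.0 * (4.0 * k * k - 1.0) ** 0.5)

def is_a_stable(beta_s, beta_sm1):
    """Criterion (5.45) for the family of Example 5.16."""
    return beta_sm1 == 0 or (beta_sm1 < 0 and beta_s >= 0)

def table_5_14(s):
    """(beta_s, beta_{s-1}) for the special cases of Table 5.14, s >= 2."""
    x = xi(s - 1)
    return {
        "Gauss": (0.0, -x),
        "Radau IA": (1.0 / (4 * s - 2), -x),
        "Radau IIA": (1.0 / (4 * s - 2), -x),
        "Lobatto IIIA": (0.0, 0.0),
        "Lobatto IIIC": (1.0 / (2 * s - 2), -x * (2 * s - 1) / (s - 1)),
    }

# Every listed method satisfies (5.45) for the stage numbers tried here.
results = {s: {name: is_a_stable(*betas) for name, betas in table_5_14(s).items()}
           for s in range(2, 7)}
```

The Radau and Lobatto IIIC entries also satisfy relation (5.49) of Exercise 3, $\beta_s = -\beta_{s-1}\xi_{s-1}\,2(2s-3)$, which can be confirmed with the same helper functions.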


Example 5.24 (Construction of all A-stable Runge-Kutta methods satisfying
$B(2s-4)$, $C(s-2)$ and $D(s-3)$). We take a quadrature formula of order $2s-4$
and construct, by Theorem 5.14, a matrix $W$ satisfying Property $T(s-2, s-3)$.
The Runge-Kutta matrix $A$ is then of the form

$$ A = W\bigl(Y + \tfrac12\, e_1 e_1^T\bigr) W^{-1} $$

with $Y$ given by (5.36), $\nu = s-3$ and

$$ Y_{s-3} = \begin{pmatrix}
0 & \gamma_{s-2} & \beta_{s-2} \\
\xi_{s-2} & \gamma_{s-1} & \beta_{s-1} \\
0 & \gamma_s & \beta_s
\end{pmatrix}. $$

For the study of A-stability we have to compute $\Psi_{s-3}(z)$ from (5.39). Expanding
$\det(I - zY_{s-3})$ with respect to its first column we obtain

$$ \bigl(\Psi_{s-3}(z)\bigr)^{-1} = \frac1z + \xi_{s-2}\,\frac{z(g_0 - g_1 z)}{1 - f_1 z + f_2 z^2} \tag{5.46} $$

where

$$ f_1 = \beta_s + \gamma_{s-1}, \qquad f_2 = \beta_s\gamma_{s-1} - \beta_{s-1}\gamma_s, $$
$$ g_0 = -\gamma_{s-2}, \qquad g_1 = -\beta_s\gamma_{s-2} + \beta_{s-2}\gamma_s. $$

By Lemma 5.21 and Theorem 5.22 we have A-stability iff either $g_0 = g_1 = 0$ or

$$ \frac{z(g_0 + g_1 z)}{1 + f_1 z + f_2 z^2} \tag{5.47} $$

is a positive function, which is equivalent to (see Exercise 4b)

$$ g_0 > 0, \quad g_1 \ge 0, \quad f_2 \ge 0, \quad g_0 f_1 - g_1 \ge 0. \tag{5.48} $$

A similar characterization of A-stable Runge-Kutta methods of order $2s-4$ is
given in Wanner (1980).

Exercises

1. Verify the integration formulas (5.10) for the shifted Legendre polynomials. 

Hint. By orthogonality $\int_0^x P_k(t)\,dt$ must be a linear combination of $P_{k+1}$, $P_k$
and $P_{k-1}$ only. The coefficient of $P_k$ vanishes by symmetry. For the rest just
look at the coefficients of and x k_1 . 

2. Give a proof of Lemma 5.15. 

Hint (Jacobi 1826). If $f(x)$ is a polynomial of degree $2s-k-1$, and $r(x)$
the interpolation polynomial of degree $s-1$, then $f(x) = q(x)M(x) + r(x)$,
where $\deg q(x) \le s-k-1$.

3. Let $R(z)$ be the stability function of the Runge-Kutta method of Example 5.16.

a) The degree of its denominator is $\le s-1$ iff $\beta_s = \beta_{s-1}\,\xi_{s-1}\,2(2s-3)$.

Hint. Use Formula (5.31) and the fact that $\det(I - zX_G)$ is the denominator
of the diagonal Padé approximation.

b) The degree of the numerator of $R(z)$ is $\le s-1$ iff

$$ \beta_s = -\beta_{s-1}\,\xi_{s-1}\,2(2s-3). \tag{5.49} $$

c) The degree of the numerator of $R(z)$ is $\le s-2$ iff in addition to (5.49) it
holds $\beta_s = 1/(2s-2)$.

d) Verify the entries of Table 5.14.

4. a) The function

$$ s(z) = \frac{\alpha + \beta z}{\gamma + \delta z} $$

with $\gamma > 0$ satisfies $\operatorname{Re} s(z) \ge 0$ for $\operatorname{Re} z > 0$ iff $\alpha \ge 0$, $\beta \ge 0$ and $\delta \ge 0$.
b) Use the identity (for $g_0 > 0$)

$$ \frac{1 + f_1 z + f_2 z^2}{z(g_0 + g_1 z)} = \frac{1}{z g_0} + \frac{(f_1 - g_1/g_0) + f_2 z}{g_0 + g_1 z} $$

to verify that the function given in (5.47) is positive iff (5.48) holds.

5. Suppose that

$$ f(z) = cz + g(z) \quad\text{with } g(z) = O(1) \text{ for } z \to \infty $$

and $g(z) \not\equiv 0$. Using the transformation $z \to 1/z$ in Lemma 5.21, show that
$f(z)$ is a positive function, if and only if $c \ge 0$ and $g(z)$ is positive.

6. Give an alternative proof of the Routh criterion (Theorem 13.4 of Chapter I):
All zeros of the real polynomial

$$ p(z) = a_0 z^n + a_1 z^{n-1} + \dots + a_n \qquad (a_0 > 0) $$

lie in the negative half-plane $\operatorname{Re} z < 0$ if and only if

$$ c_{i0} > 0 \quad\text{for } i = 0, 1, \dots, n. $$

The $c_{ij}$ are the coefficients of the polynomials

$$ p_i(z) = c_{i0} z^{n-i} + c_{i1} z^{n-i-2} + c_{i2} z^{n-i-4} + \dots $$

where

$$ p_0(z) = a_0 z^n + a_2 z^{n-2} + \dots, \quad\text{i.e., } c_{0j} = a_{2j}, $$
$$ p_1(z) = a_1 z^{n-1} + a_3 z^{n-3} + \dots, \quad\text{i.e., } c_{1j} = a_{2j+1}, $$

and

$$ p_{i+1}(z) = c_{i0}\, p_{i-1}(z) - c_{i-1,0}\, z\, p_i(z), \qquad i = 1,\dots,n-1. \tag{5.50} $$

Hint. By the maximum principle for harmonic functions the condition “$p(z) \ne 0$
for $\operatorname{Re} z \ge 0$” is equivalent to “$|p(-z)/p(z)| < 1$ for $\operatorname{Re} z > 0$” and the
condition that $p_0(z)$ and $p_1(z)$ are irreducible. Using the transformation (5.33)
this becomes equivalent to the positivity of $p_0(z)/p_1(z)$. Now divide (5.50)
by $c_{i0}\, p_i(z)$ and use Exercise 5 recursively.


7. Show that

$$ \alpha_2 < \frac{s-1}{s}\,\frac{\sqrt{2s+1}}{\sqrt{2s-3}} \tag{5.51} $$

is a sufficient condition for $M(x) = P_s(x) + \alpha_1 P_{s-1}(x) + \alpha_2 P_{s-2}(x)$ to have
real and pairwise distinct roots.

Hint. (See “Lemma 18” of Nørsett & Wanner 1981). Consider the set $D$ of
all pairs $(\alpha_1, \alpha_2)$ for which the roots $c_i$ of $M(x)$ are real and distinct, and
the corresponding interpolatory quadrature formula has positive $b_i$. Verify that
$(0,0) \in D$, and show that for $(\alpha_1, \alpha_2) \in \partial D$ either one $b_i$ becomes zero or
two $c_i$ coalesce but the quadrature formula remains of order $2s-2$. Therefore
it must be the Gaussian formula with $s-1$ nodes of order $2s-2$ and we must
have

$$ P_s(x) + \alpha_1 P_{s-1}(x) + \alpha_2 P_{s-2}(x) = C(x - \rho)\, P_{s-1}(x). \tag{5.52} $$

Now use the three-term recursion formula

$$ s\,\xi_s P_s(x) = (x - 1/2)\, P_{s-1}(x) - (s-1)\,\xi_{s-1} P_{s-2}(x) \tag{5.53} $$

(Abramowitz & Stegun p. 782, modified) to eliminate $xP_{s-1}$ on the right of
(5.52). Then obtain by comparing the coefficients of $P_s$ and $P_{s-2}$

$$ C = \frac{1}{s\,\xi_s}, \qquad \alpha_2 = \frac{s-1}{s}\,\frac{\sqrt{2s+1}}{\sqrt{2s-3}}. \tag{5.54} $$

If $\rho$ is one of the roots of $P_{s-1}$, then (5.52) has a double root and the estimate
(5.51) for $\alpha_2$ is optimal.
... they called their methods “diagonally implicit”, a term which
is reserved here for the special case where all diagonal entries are
equal ... (R. Alexander 1977)


We continue to quote from this nice paper: “To integrate a system of $n$ differential
equations, an implicit method with a full $s \times s$ matrix requires the solution of $ns$
simultaneous implicit (in general nonlinear) equations in each time step (...) One
way to circumvent this difficulty is to use a lower triangular matrix $(a_{ij})$ (i.e., a
matrix with $a_{ij} = 0$ for $i < j$); the equations may then be solved in $s$ successive
stages with only an $n$-dimensional system to be solved at each stage”. In accordance
with many authors, and in disaccordance with others (see above), we call
such a method diagonally implicit (DIRK).

“In solving the $n$-dimensional systems by Newton-type iterations one solves
linear systems at each stage with a coefficient matrix of the form $I - h a_{ii}\,\partial f/\partial y$.
If all $a_{ii}$ are equal one may hope to use repeatedly the stored LU-factorization of
a single such matrix”. When we want to emphasize this additional property for a
DIRK method, we shall call it a singly diagonally implicit (SDIRK) method.

It is a curious coincidence that in the early seventies at least four theses dedicated
a large part of their research to DIRK and SDIRK methods, very often having
in mind their usefulness for the treatment of partial differential equations (R. Alt
1971, M. Crouzeix 1975, A. Kurdi 1974, S.P. Nørsett 1974). The classical paper
on the subject is Alexander (1977).


Order Conditions 


The traditional problem of choosing the coefficients leads to a 
nonlinear algebraic jungle, to which civilization and order were 
brought in the pioneering work of J.C. Butcher, further refined in 
the Thesis of M. Crouzeix. (R. Alexander 1977) 

We want to make the “jungle” still a little more civilized by the following idea:
consider a SDIRK scheme

$$ \begin{array}{c|cccc}
c_1 & \gamma \\
c_2 & a_{21} & \gamma \\
\vdots & \vdots & & \ddots \\
c_s & a_{s1} & a_{s2} & \cdots & \gamma \\ \hline
 & b_1 & b_2 & \cdots & b_s
\end{array} $$

with $s$ stages. The order conditions (see Vol. I, Sect. II.2) consist of sums such as

$$ \sum_{j,k,l} b_j a_{jk} a_{kl} = \frac16. \tag{6.1} $$

Because there are now more non-zero entries in the matrix $A$ than for explicit
methods, this sum contains far more terms than it did before. The trick is to transfer
all expressions containing a $\gamma$ to the right-hand side of (6.1). The resulting sum,
denoted by $\sum'$, is then only built upon the subdiagonal entries as for explicit
Runge-Kutta methods. The right-hand side becomes (for this example)

$$ \sum_{j,k,l}{}' \, b_j a_{jk} a_{kl} = \sum_{j,k,l} b_j (a_{jk} - \gamma\delta_{jk})(a_{kl} - \gamma\delta_{kl}) \tag{6.1'} $$

where $\delta_{jk}$ denotes the Kronecker delta. Multiplying out we obtain

$$ \sum_{j,k,l}{}' \, b_j a_{jk} a_{kl} = \sum_{j,k,l} b_j a_{jk} a_{kl} - \gamma\Bigl(\sum_{j,l} b_j a_{jl} + \sum_{j,k} b_j a_{jk}\Bigr) + \gamma^2 \sum_j b_j. $$

For all sums on the right we insert order conditions (e.g. from Theorem 2.1 of
Sect. II.2) and obtain

$$ \sum_{j,k,l}{}' \, b_j a_{jk} a_{kl} = \frac16 - \gamma + \gamma^2. \tag{6.1''} $$

The general rule is that there appears an alternating polynomial in $\gamma$ whose
coefficients are sums of $1/\gamma(u)$, where $u$ runs through all trees which are obtained by
“short-circuiting” one, two, three, etc. vertices of $t$ (with exception of the root).
The conditions for order 4 obtained in this way are summarized in Table 6.1. For
$s = 2$, $p = 3$ and $s = 3$, $p = 4$ these simplified conditions have only very few
non-zero terms and the equations become especially simple to solve (see Exercise 1).


Stiffly Accurate SDIRK Methods 

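Our main interest here lies in methods satisfying

$$ a_{sj} = b_j \quad\text{for } j = 1,\dots,s, \tag{6.2} $$

i.e., in methods for which the numerical solution $y_1$ is identical to the last internal
stage. A first consequence of this property is that $R(\infty) = 0$ (see Proposition 3.8).
The order conditions for such methods can, instead of (6.1''), be simplified still
further: consider again the example (6.1), which can now be written as

$$ \sum_{j,k,l} a_{sj} a_{jk} a_{kl} = \frac16. $$

Table 6.1. Order conditions for SDIRK methods

$t$        $\gamma(t)$   previous conditions                              simplified conditions
$\tau$     1    $\sum b_j = 1$                                 $\sum' b_j = 1$
$t_{21}$   2    $\sum b_j a_{jk} = \frac12$                    $\sum' b_j a_{jk} = \frac12 - \gamma$
$t_{31}$   3    $\sum b_j a_{jk} a_{jl} = \frac13$             $\sum' b_j a_{jk} a_{jl} = \frac13 - \gamma + \gamma^2$
$t_{32}$   6    $\sum b_j a_{jk} a_{kl} = \frac16$             $\sum' b_j a_{jk} a_{kl} = \frac16 - \gamma + \gamma^2$
$t_{41}$   4    $\sum b_j a_{jk} a_{jl} a_{jm} = \frac14$      $\sum' b_j a_{jk} a_{jl} a_{jm} = \frac14 - \gamma + \frac32\gamma^2 - \gamma^3$
$t_{42}$   8    $\sum b_j a_{jk} a_{kl} a_{jm} = \frac18$      $\sum' b_j a_{jk} a_{kl} a_{jm} = \frac18 - \frac56\gamma + \frac32\gamma^2 - \gamma^3$
$t_{43}$   12   $\sum b_j a_{jk} a_{kl} a_{km} = \frac1{12}$   $\sum' b_j a_{jk} a_{kl} a_{km} = \frac1{12} - \frac23\gamma + \frac32\gamma^2 - \gamma^3$
$t_{44}$   24   $\sum b_j a_{jk} a_{kl} a_{lm} = \frac1{24}$   $\sum' b_j a_{jk} a_{kl} a_{lm} = \frac1{24} - \frac12\gamma + \frac32\gamma^2 - \gamma^3$

This time we have, instead of (6.1')

$$ \sum_{j,k,l}{}' \, a_{sj} a_{jk} a_{kl} = \sum_{j,k,l} (a_{sj} - \gamma\delta_{sj})(a_{jk} - \gamma\delta_{jk})(a_{kl} - \gamma\delta_{kl}) $$
$$ \quad = \sum_{j,k,l} a_{sj} a_{jk} a_{kl} - \gamma\Bigl(\sum_{j,k} a_{sj} a_{jk} + \sum_{j,l} a_{sj} a_{jl} + \sum_{k,l} a_{sk} a_{kl}\Bigr) + \gamma^2\Bigl(\sum_j a_{sj} + \sum_k a_{sk} + \sum_l a_{sl}\Bigr) - \gamma^3 \cdot 1. $$

Again inserting known order conditions, we now obtain

$$ \sum_{j,k,l}{}' \, a_{sj} a_{jk} a_{kl} = \frac16 - \frac32\gamma + 3\gamma^2 - \gamma^3. \tag{6.1'''} $$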

The general rule is similar to the one above: the difference is that all vertices
(including the root) are now available for being short-circuited. Another example,
for the tree $t_{42}$, is sketched in Fig. 6.1 and leads to the following right-hand side:

$$ \frac18 - \gamma\Bigl(\frac13 + \frac13 + \frac12 + \frac16\Bigr) + \gamma^2\Bigl(\frac12 + 1 + 1 + \frac12 + \frac12 + \frac12\Bigr) - \gamma^3(1 + 1 + 1 + 1) + \gamma^4 $$
$$ \quad = \frac18 - \frac43\gamma + 4\gamma^2 - 4\gamma^3 + \gamma^4. $$

The order conditions obtained in this manner are displayed in Table 6.2 for all trees
of order $\le 4$. The expressions are written explicitly for the SDIRK method
(6.3) with $s = 5$ satisfying condition (6.2), where

$$ c_2' = a_{21}, \qquad c_3' = a_{31} + a_{32}, \qquad c_4' = a_{41} + a_{42} + a_{43}. \tag{6.3} $$

Fig. 6.1. Short-circuiting tree $t_{42}$

Observe that they become very similar to those of Formulas (1.11) in Sect. II.1.


Table 6.2. Order conditions for method (6.3)

$\sum' a_{sj} = b_1 + b_2 + b_3 + b_4 = p_1$   (6.4;1)
$\sum' a_{sj} a_{jk} = b_2 c_2' + b_3 c_3' + b_4 c_4' = p_2$   (6.4;2)
$\sum' a_{sj} a_{jk} a_{jl} = b_2 c_2'^2 + b_3 c_3'^2 + b_4 c_4'^2 = p_3$   (6.4;3)
$\sum' a_{sj} a_{jk} a_{kl} = b_3 a_{32} c_2' + b_4 (a_{42} c_2' + a_{43} c_3') = p_4$   (6.4;4)
$\sum' a_{sj} a_{jk} a_{jl} a_{jm} = b_2 c_2'^3 + b_3 c_3'^3 + b_4 c_4'^3 = p_5$   (6.4;5)
$\sum' a_{sj} a_{jk} a_{kl} a_{jm} = b_3 c_3' a_{32} c_2' + b_4 c_4' (a_{42} c_2' + a_{43} c_3') = p_6$   (6.4;6)
$\sum' a_{sj} a_{jk} a_{kl} a_{km} = b_3 a_{32} c_2'^2 + b_4 (a_{42} c_2'^2 + a_{43} c_3'^2) = p_7$   (6.4;7)
$\sum' a_{sj} a_{jk} a_{kl} a_{lm} = b_4 a_{43} a_{32} c_2' = p_8$   (6.4;8)

$p_1 = 1 - \gamma$
$p_2 = \frac12 - 2\gamma + \gamma^2$
$p_3 = \frac13 - 2\gamma + 3\gamma^2 - \gamma^3$
$p_4 = \frac16 - \frac32\gamma + 3\gamma^2 - \gamma^3$
$p_5 = \frac14 - 2\gamma + \frac92\gamma^2 - 4\gamma^3 + \gamma^4$
$p_6 = \frac18 - \frac43\gamma + 4\gamma^2 - 4\gamma^3 + \gamma^4$
$p_7 = \frac1{12} - \gamma + \frac72\gamma^2 - 4\gamma^3 + \gamma^4$
$p_8 = \frac1{24} - \frac23\gamma + 3\gamma^2 - 4\gamma^3 + \gamma^4$

Solution of Equations (6.4). By clever elimination from equations (6.4;4) and
(6.4;6) as well as (6.4;4) and (6.4;7) we obtain

$$ b_3 a_{32} c_2' (c_4' - c_3') = c_4' p_4 - p_6, \qquad b_4 a_{43} c_3' (c_2' - c_3') = c_2' p_4 - p_7. \tag{6.5} $$

Multiplying these two equations and using (6.4;8) gives

$$ p_8\, b_3 (c_4' - c_3')(c_2' - c_3')\, c_3' = (c_4' p_4 - p_6)(c_2' p_4 - p_7). $$

We now compute $b_2, b_3, b_4$ from (6.4;2), (6.4;3), (6.4;5). This gives

$$ b_3 = \frac{-p_2 c_2' c_4' + p_3 (c_4' + c_2') - p_5}{c_3' (c_3' - c_2')(c_4' - c_3')} \tag{6.6} $$

and $b_2$ as well as $b_4$ by cyclic permutation. Comparing the last two equations leads
to

$$ c_4' = \frac{p_3 p_8 c_2' - p_5 p_8 - c_2' p_4 p_6 + p_6 p_7}{p_2 p_8 c_2' - p_3 p_8 - c_2' p_4^2 + p_4 p_7}. \tag{6.7} $$

We now choose $\gamma$, $c_2'$ and $c_3'$ as free parameters. Then $c_4'$ is obtained from (6.7);
$b_2, b_3, b_4$ from (6.6), $b_1$ from (6.4;1), $a_{32}$ and $a_{43}$ from (6.5), $a_{42}$ from (6.4;4),
and finally $a_{21}, a_{31}, a_{41}$ from (6.3).
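
The solution procedure just described can be carried out in exact rational arithmetic. The sketch below is an illustration, not the authors' code. It follows the steps (6.7), (6.6), (6.4;1), (6.5), (6.4;4), (6.3) in that order, with the $p_i$ taken as the polynomials in $\gamma$ that result from the short-circuiting rule for trees of order up to 4. The function name and the returned dictionary layout are hypothetical.

```python
from fractions import Fraction as F

def sdirk4_coefficients(g, c2, c3):
    """Solve (6.4) for given gamma = g and free parameters c2', c3' (all Fractions)."""
    p1 = 1 - g
    p2 = F(1, 2) - 2 * g + g**2
    p3 = F(1, 3) - 2 * g + 3 * g**2 - g**3
    p4 = F(1, 6) - F(3, 2) * g + 3 * g**2 - g**3
    p5 = F(1, 4) - 2 * g + F(9, 2) * g**2 - 4 * g**3 + g**4
    p6 = F(1, 8) - F(4, 3) * g + 4 * g**2 - 4 * g**3 + g**4
    p7 = F(1, 12) - g + F(7, 2) * g**2 - 4 * g**3 + g**4
    p8 = F(1, 24) - F(2, 3) * g + 3 * g**2 - 4 * g**3 + g**4

    # (6.7): c4' from the compatibility of (6.5), (6.6) and (6.4;8)
    c4 = ((p3 * p8 * c2 - p5 * p8 - c2 * p4 * p6 + p6 * p7)
          / (p2 * p8 * c2 - p3 * p8 - c2 * p4**2 + p4 * p7))

    def weight(ci, cj, ck):
        # (6.6) and its cyclic permutations, written symmetrically in cj, ck
        return (p5 - (cj + ck) * p3 + cj * ck * p2) / (ci * (ci - cj) * (ci - ck))

    b2, b3, b4 = weight(c2, c3, c4), weight(c3, c2, c4), weight(c4, c2, c3)
    b1 = p1 - b2 - b3 - b4                                   # (6.4;1)
    a32 = (c4 * p4 - p6) / (b3 * c2 * (c4 - c3))             # (6.5), first equation
    a43 = (c2 * p4 - p7) / (b4 * c3 * (c2 - c3))             # (6.5), second equation
    a42 = ((p4 - b3 * a32 * c2) / b4 - a43 * c3) / c2        # (6.4;4)
    return {"c4": c4, "b": [b1, b2, b3, b4, g],
            "a21": c2, "a31": c3 - a32, "a32": a32,          # (6.3)
            "a41": c4 - a42 - a43, "a42": a42, "a43": a43}

coeffs = sdirk4_coefficients(F(1, 4), F(1, 2), F(3, 10))
```

For $\gamma = 1/4$, $c_2' = 1/2$, $c_3' = 3/10$ this yields $c_4' = 1/4$ and the weights $25/24$, $-49/48$, $125/16$, $-85/12$, $1/4$, all rational.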


Embedded 3rd order formula: As proposed by Cash (1979), we can append to the
above formula a third order expression

$$ \hat y_1 = y_0 + h \sum_{i=1}^{4} \hat b_i k_i $$

(thus by omitting the term $b_5 = \gamma$) for the sake of step size control. The coefficients
$\hat b_1, \dots, \hat b_4$ are simply obtained by solving the first 4 equations of Table 6.1 (linear
system). Continuous embedded 3rd order formulas can be obtained in this way too
(see Theorem 6.1 of Sect. II.6)

$$ y(x_0 + \theta h) \approx y_0 + h \sum_{i=1}^{4} b_i(\theta)\, k_i. $$

The coefficients $b_1(\theta), \dots, b_4(\theta)$ are obtained by solving the first 4 (simplified)
conditions of Table 6.1, with the right-hand sides replaced by

$$ \theta, \qquad \frac{\theta^2}{2} - \gamma\theta, \qquad \frac{\theta^3}{3} - \gamma\theta^2 + \gamma^2\theta, \qquad \frac{\theta^3}{6} - \gamma\theta^2 + \gamma^2\theta, $$

respectively. The continuous solution obtained in this way becomes $\hat y_1$ for $\theta = 1$
instead of the 4-th order solution $y_1$. The global continuous solution would
therefore be discontinuous. In order to avoid this discontinuity, we add $b_5(\theta)$ and
include the fifth equation from Table 6.1 with right-hand side

$$ \frac{\theta^4}{4} - \gamma\theta^3 + \frac32\gamma^2\theta^2 - \gamma^3\theta. $$

The Stability Function


By Formula (3.3), the stability function $R(z)$ for a DIRK method is of the form

$$ R(z) = \frac{P(z)}{(1 - a_{11}z)(1 - a_{22}z)\cdots(1 - a_{ss}z)}, \tag{6.8} $$

because the determinant of a triangular matrix is the product of its diagonal entries.
The numerator $P(z)$ is a polynomial of degree $s$ at most. If the method is of order
$p \ge s$, this polynomial is uniquely determined by Formula (3.18). It is simply
obtained from the first terms of the power series for $(1 - a_{11}z)\cdots(1 - a_{ss}z)\cdot e^z$.

For SDIRK methods, with $a_{11} = \dots = a_{ss} = \gamma$, we obtain (see also Formula
(3.18) with $q_j = (-\gamma)^j \binom{s}{j}$)

$$ R(z) = \frac{P(z)}{(1 - \gamma z)^s}, \qquad P(z) = (-1)^s \sum_{j=0}^{s} L_s^{(s-j)}\Bigl(\frac1\gamma\Bigr)(\gamma z)^j \tag{6.9} $$

with error constant

$$ C = \frac{\gamma^s (-1)^{s+1}}{s+1}\, L_{s+1}^{(1)}\Bigl(\frac1\gamma\Bigr) \tag{6.10} $$

where

$$ L_s(x) = \sum_{j=0}^{s} (-1)^j \binom{s}{j} \frac{x^j}{j!} \tag{6.11} $$

is the $s$-degree Laguerre polynomial. $L_s^{(k)}(x)$ denotes its $k$-th derivative. Since
the function (6.9) is analytic in $\mathbb{C}^-$ for $\gamma > 0$, A-stability is equivalent to

$$ E(y) = Q(iy)Q(-iy) - P(iy)P(-iy) \ge 0 \quad\text{for all } y \tag{6.12} $$

(see (3.8)). This is an even polynomial of degree $2s$ (in general) and subdegree $2j$
where $j = [(p+2)/2]$ (see Proposition 3.4). We therefore define the polynomial
$F(x)$ by

$$ F(y^2) = E(y)/y^{2j}, \qquad j = [(p+2)/2], $$

and check the condition $F(x) \ge 0$ for $x \ge 0$ using Sturm sequences. We display
the results obtained (similar to Burrage 1978) in Table 6.3.
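
The polynomial $E(y)$ can be generated directly from its definition. The following sketch is illustrative and not taken from the original text. It builds the coefficients of $P(z)$ as the leading terms of the power series of $(1-\gamma z)^s e^z$, as stated above, and then forms $E(y) = (1+\gamma^2y^2)^s - P(iy)P(-iy)$ coefficient by coefficient. The function names and the optional `degree` argument (which truncates $P$ for the case $R(\infty) = 0$) are hypothetical. `gamma` should be a `Fraction` for exact results.

```python
from fractions import Fraction
from math import comb, factorial

def numerator_coeffs(s, gamma, degree=None):
    """p_0..p_degree of P(z): leading Taylor coefficients of (1 - gamma z)^s e^z."""
    if degree is None:
        degree = s
    return [sum(Fraction(comb(s, k)) * (-gamma) ** k / factorial(j - k)
                for k in range(j + 1))
            for j in range(degree + 1)]

def e_polynomial(s, gamma, degree=None):
    """Coefficients e_0..e_s with E(y) = sum_m e_m y^(2m), cf. (6.12)."""
    p = numerator_coeffs(s, gamma, degree)
    e = []
    for m in range(s + 1):
        q_part = comb(s, m) * gamma ** (2 * m)     # from (1 + gamma^2 y^2)^s
        # coefficient of y^(2m) in P(iy)P(-iy) is (-1)^m sum_{k+l=2m} (-1)^l p_k p_l
        p_part = sum((-1) ** (m + l) * p[2 * m - l] * p[l]
                     for l in range(len(p)) if 0 <= 2 * m - l < len(p))
        e.append(q_part - p_part)
    return e

def all_nonnegative(s, gamma, degree=None):
    """Sufficient test: every coefficient of E(y) is non-negative."""
    return all(c >= 0 for c in e_polynomial(s, gamma, degree))
```

For example, `e_polynomial(2, Fraction(1, 3))` returns `[0, 0, Fraction(1, 108)]`, which equals $(2\gamma-1)^2(\gamma-\frac14)$ at $\gamma = 1/3$. Non-negativity of all coefficients is only a sufficient test here; the Sturm sequence argument of the text is the complete check.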

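Table 6.3. A-stability of (6.9), order $p \ge s$

$s$   A-stability                                      A-stability and $p = s+1$
1     $1/2 \le \gamma < \infty$                        $1/2$
2     $1/4 \le \gamma < \infty$                        $(3+\sqrt3)/6$
3     $1/3 \le \gamma \le 1.06857902$                  $1.06857902$
4     $0.39433757 \le \gamma \le 1.28057976$           -
5     $0.24650519 \le \gamma \le 0.36180340$ and
      $0.42078251 \le \gamma \le 0.47326839$           $0.47326839$
6     $0.28406464 \le \gamma \le 0.54090688$           -
7     -                                                -
8     $0.21704974 \le \gamma \le 0.26471425$           -

For completeness, we give the following explicit formulas for $E(y)$.

$s = 1$; $p = 1$:

$$ E = y^2 (2\gamma - 1) $$

$s = 2$; $p = 2$:

$$ E = y^4 \bigl(-\tfrac14 + 2\gamma - 5\gamma^2 + 4\gamma^3\bigr) = y^4 (2\gamma - 1)^2 \bigl(\gamma - \tfrac14\bigr) $$

$s = 3$; $p = 3$:

$$ E = y^4 \bigl(\tfrac1{12} - \gamma + 3\gamma^2 - 2\gamma^3\bigr) + y^6 \bigl(-\tfrac1{36} + \tfrac{\gamma}{2} - \tfrac{13}{4}\gamma^2 + \tfrac{28}{3}\gamma^3 - 12\gamma^4 + 6\gamma^5\bigr) $$

$s = 4$; $p = 4$:

$$ E = y^6 \bigl(\tfrac1{72} - \tfrac{\gamma}{3} + \tfrac{17}{6}\gamma^2 - \tfrac{32}{3}\gamma^3 + 17\gamma^4 - 8\gamma^5\bigr) $$
$$ \quad + y^8 \bigl(-\tfrac1{576} + \tfrac{\gamma}{18} - \tfrac{25}{36}\gamma^2 + \tfrac{13}{3}\gamma^3 - \tfrac{173}{12}\gamma^4 + \tfrac{76}{3}\gamma^5 - 22\gamma^6 + 8\gamma^7\bigr). $$

A-stability means here that all coefficients must be non-negative. A general
formula is as follows.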

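Lemma 6.1. The $E$-polynomial for (6.8) with $a_{11} = \dots = a_{ss} = \gamma$ and $p \ge s$
satisfies

$$ E(y) = \Bigl(1 - L_s\bigl(\tfrac1\gamma\bigr)^2\Bigr)(\gamma y)^{2s} - 2 \sum_{j=[(p+2)/2]}^{s-1} (-1)^{s+j} (\gamma y)^{2j} \int_0^{1/\gamma} L_s(x)\, L_s^{(2s-2j+1)}(x)\,dx. \tag{6.13} $$

Proof. Inserting Formula (6.9) into the definition of $E(y)$ gives

$$ E(y) = (1 + \gamma^2 y^2)^s - P(iy)P(-iy) $$
$$ \quad = (1 + \gamma^2 y^2)^s - \sum_k \sum_l L_s^{(s-k)}\bigl(\tfrac1\gamma\bigr)\, L_s^{(s-l)}\bigl(\tfrac1\gamma\bigr)\, (\gamma i y)^{k+l} (-1)^l. $$

Using integration by parts for the verification of

$$ \sum_{k+l=2j} (-1)^l L_s^{(s-k)}(x)\, L_s^{(s-l)}(x)\,\Big|_0^{1/\gamma} = 2(-1)^s \int_0^{1/\gamma} L_s(x)\, L_s^{(2s-2j+1)}(x)\,dx $$

one obtains the result, since

$$ \sum_{k+l=2j} (-1)^l L_s^{(s-k)}(0)\, L_s^{(s-l)}(0) = (-1)^j \binom{s}{j}. \qquad\Box $$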

k+l=2j 


Multiple Real-Pole Approximations with $R(\infty) = 0$

For methods satisfying (6.2) we have $R(\infty) = 0$. Therefore the highest coefficient
of $P(z)$ in (6.9) is zero. If the order of the method is known to be $p \ge s-1$, the
remaining coefficients of $P(z)$ are still uniquely determined by $\gamma$ and we have

$$ P(z) = (-1)^s \sum_{j=0}^{s-1} L_s^{(s-j)}\Bigl(\frac1\gamma\Bigr)(\gamma z)^j \tag{6.14} $$

with error constant

$$ C = (-1)^s L_s\Bigl(\frac1\gamma\Bigr)\gamma^s. \tag{6.15} $$

The first polynomials $E(y)$ of (6.12) are now:

$s = 2$, $p = 1$:

$$ E = y^2 (-1 + 4\gamma - 2\gamma^2) + y^4 \gamma^4 $$

$s = 3$, $p = 2$:

$$ E = y^4 \bigl(-\tfrac14 + 3\gamma - 12\gamma^2 + 18\gamma^3 - 6\gamma^4\bigr) + y^6 \gamma^6 $$

$s = 4$, $p = 3$:

$$ E = y^4 \bigl(\tfrac1{12} - \tfrac43\gamma + 6\gamma^2 - 8\gamma^3 + 2\gamma^4\bigr) $$
$$ \quad + y^6 \bigl(-\tfrac1{36} + \tfrac23\gamma - 6\gamma^2 + \tfrac{76}{3}\gamma^3 - 52\gamma^4 + 48\gamma^5 - 12\gamma^6\bigr) + y^8 \gamma^8. $$

The regions of $\gamma$ for A- (and hence L-) stability are displayed in Table 6.4.


Table 6.4. L-stability of $R(z)$ with $P$ from (6.14), order $p \ge s-1$

$s$   L-stability                                      L-stability and $p = s$
2     $(2-\sqrt2)/2 \le \gamma \le (2+\sqrt2)/2$       $\gamma = (2 \pm \sqrt2)/2$
3     $0.18042531 \le \gamma \le 2.18560010$           $\gamma = 0.43586652$
4     $0.22364780 \le \gamma \le 0.57281606$           $\gamma = 0.57281606$
5     $0.24799464 \le \gamma \le 0.67604239$           $\gamma = 0.27805384$
6     $0.18391465 \le \gamma \le 0.33414237$           $\gamma = 0.33414237$
7     $0.20408345 \le \gamma \le 0.37886489$           -
8     $0.15665860 \le \gamma \le 0.23437316$           $\gamma = 0.23437316$

Choice of Method

We now determine the free parameters for method (6.3) with $s = 5$ and order 4.
For a good choice of $\gamma$, we have displayed in Fig. 6.2 the error constant $C$ as well
as the regions for A- and A(0)-stability.

This suggests that $\gamma$ between 0.25 and 0.29 is a good choice. The method is then
L-stable and the error constant is small. For various values of $\gamma$ in this range, we
determined (by a nonlinear Gauss-Newton code) $c_2'$ and $c_3'$ in order to minimize
the fifth-order error terms. It turned out that

$$ c_2' = 0.5, \qquad c_3' = 0.3 $$

is close to optimal. With this we coded two different choices of $\gamma$: $\gamma = 4/15 = 0.2666\ldots$,
which was numerically the better choice and $\gamma = 1/4$, which gave, via
Formulas (6.4), (6.5), (6.6) and (6.7), especially nice rational coefficients. These
latter are displayed in Table 6.5. We have included a continuous solution to this
method

$$ y(x_0 + \theta h) \approx y_0 + h \sum_{j=1}^{5} b_j(\theta)\, k_j $$

which is third order for $0 < \theta < 1$ and updates to the fourth order approximation
$y_1$ for $\theta = 1$.
Table 6.5. L-stable SDIRK method of order 4



(6.16) 


(6.17) 


Exercises 


1. (Crouzeix & Raviart 1980). Compute the SDIRK methods (Table 6.1) for $s = 3$,
$p = 4$. Obtain also (for $s = 2$, $p = 3$) once again the method of Table 7.2,
Sect. II.7.

Result. The last order condition is in both cases just a polynomial in $\gamma$. Among
the different solutions, the following presents an A-stable scheme:



2. Verify all details of Tables 6.1 and 6.2.
3. The four cases of A-stable SDIRK methods of order $p = s+1$ indicated in
Table 6.3 (right) are the only ones existing. This fact has not yet been rigorously
proved, because the “proof” given in Wanner, Hairer & Nørsett (1978) uses an
asymptotic formula without error estimation. Do better.

4. Cooper & Sayfy (1979) have derived many DIRK (which they call “semi- 
explicit”) methods of high order. Their main aim was to minimize the number 
of implicit stages and not to maximize stability. One of their methods is 


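$$ \begin{array}{c|ccccc}
\frac{6-\sqrt6}{10} & \frac{6-\sqrt6}{10} \\[4pt]
\frac{6+9\sqrt6}{35} & \frac{-6+5\sqrt6}{14} & \frac{6-\sqrt6}{10} \\[4pt]
1 & \frac{888+607\sqrt6}{2850} & \frac{126-161\sqrt6}{1425} & \frac{6-\sqrt6}{10} \\[4pt]
\frac{4-\sqrt6}{10} & \frac{3153-3082\sqrt6}{14250} & \frac{3213+1148\sqrt6}{28500} & \frac{-267+88\sqrt6}{500} & \frac{6-\sqrt6}{10} \\[4pt]
\frac{4+\sqrt6}{10} & \frac{-32583+14638\sqrt6}{71250} & \frac{-17199+364\sqrt6}{142500} & \frac{1329-544\sqrt6}{2500} & \frac{-96+131\sqrt6}{625} & \frac{6-\sqrt6}{10} \\[4pt] \hline
 & 0 & 0 & \frac19 & \frac{16-\sqrt6}{36} & \frac{16+\sqrt6}{36}
\end{array} $$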


Show that it is of order 5 and A-stable, but not L -stable. 


5. It can be seen in Table 6.4 that for $s = 2, 4, 6$, and $8$ the L-stability
superconvergence point coincides with the right end of the A-stability interval. Explain
this with the help of order star theory (Fig. 6.3.a).

Further, for $s = 7$, a superconvergence point is given by $\gamma = 0.20406693$,
which misses the A-stability interval given there by less than $2 \cdot 10^{-5}$. Should
the above argument also apply here and must there be a computation error
somewhere? Study the corresponding order star to show that this is not the
case (Fig. 6.3.b).

Fig. 6.3.a. Multiple pole order star, $s = 8$, $\gamma = 0.23437316$
Fig. 6.3.b. Multiple pole order star, $s = 7$, $\gamma = 0.20406693$

When the functions $\varphi$ are non-linear, implicit equations can in
general be solved only by iteration. This is a severe drawback,
as it adds to the problem of stability, that of convergence of the
iterative process. An alternative, which avoids this difficulty, is

(H.H. Rosenbrock 1962/63)


... is discussed in this section. Among the methods which already give satisfac¬ 
tory results for stiff equations, Rosenbrock methods are the easiest to program. We 
shall describe their theory in this section, which will lead us to our first “stiff” code. 
Rosenbrock methods belong to a large class of methods which try to avoid nonlin¬ 
ear systems and replace them by a sequence of linear systems. We therefore call 
these methods linearly implicit Runge-Kutta methods. In the literature such meth¬ 
ods are often called “semi-implicit” (or was it “semi-explicit”?), or “generalized” 
or “modified” or “adaptive” or “additive” Runge-Kutta methods. 


Derivation of the Method 


We start, say, with a diagonally implicit Runge-Kutta method 

k i = h f ( Vo + a ij k i +a ii k i) i = 1, - - - , 5 
V j =1 / 

5 

Vi =Vo + ^2 b i k i 

i— 1 

applied to the autonomous differential equation 

*/=/(*/)• 


(7.1) 


(7.2) 


The main idea is to linearize Formula (7.1). This yields

$$k_i = h f(g_i) + h f'(g_i)\, a_{ii} k_i, \qquad g_i = y_0 + \sum_{j=1}^{i-1} a_{ij} k_j, \tag{7.3}$$

and can be interpreted as the application of one Newton iteration to each stage
in (7.1) with starting values $k_i^{(0)} = 0$. Instead of continuing the iterations until
convergence, we consider (7.3) as a new class of methods and investigate anew its
order and stability properties.
An important computational advantage is obtained by replacing the Jacobians
$f'(g_i)$ by $J = f'(y_0)$, so that the method requires its calculation only once
(Calahan 1968). Many methods of this type and much numerical experience with them
have been obtained by van der Houwen (1973), Cash (1976) and Nørsett (1975).

We gain further freedom by introducing additional linear combinations of the
terms $J k_j$ into (7.3) (Nørsett & Wolfbrandt 1979, Kaps & Rentrop 1979). We then
arrive at the following class of methods:


Definition 7.1. An $s$-stage Rosenbrock method is given by the formulas

$$k_i = h f\Big(y_0 + \sum_{j=1}^{i-1} \alpha_{ij} k_j\Big) + h J \sum_{j=1}^{i} \gamma_{ij} k_j, \quad i = 1, \ldots, s, \qquad y_1 = y_0 + \sum_{j=1}^{s} b_j k_j \tag{7.4}$$

where $\alpha_{ij}$, $\gamma_{ij}$, $b_i$ are the determining coefficients and $J = f'(y_0)$.

Each stage of this method consists of a system of linear equations with unknowns
$k_i$ and with matrix $I - h\gamma_{ii} J$. Of special interest are methods for which
$\gamma_{11} = \ldots = \gamma_{ss} = \gamma$, so that we need only one LU-decomposition per step.
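The stage equations of Definition 7.1 can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not one of the codes of this book: the helper names `solve`, `rosenbrock_step` and `integrate` are hypothetical, and the two-stage coefficient set ($\gamma = 1 - 1/\sqrt{2}$, $\alpha_{21} = 1$, $\gamma_{21} = -2\gamma$, $b = (1/2, 1/2)$) is chosen here only because it satisfies $\sum b_j = 1$ and $\sum b_j(\alpha_{jk} + \gamma_{jk}) = 1/2$, so that second order convergence can be expected.

```python
import math

def solve(A, b):
    """Solve the small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for q in range(c, n + 1):
                M[r][q] -= m * M[c][q]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][q] * x[q] for q in range(r + 1, n))) / M[r][r]
    return x

def rosenbrock_step(f, jac, y0, h, alpha, gam, b):
    """One step of (7.4) for y' = f(y) with J = f'(y0); alpha and gam are s x s lists."""
    n, s = len(y0), len(b)
    J = jac(y0)
    k = []
    for i in range(s):
        # argument of f and the combination sum_{j<i} gamma_ij k_j
        g = [y0[c] + sum(alpha[i][j] * k[j][c] for j in range(i)) for c in range(n)]
        w = [sum(gam[i][j] * k[j][c] for j in range(i)) for c in range(n)]
        fg = f(g)
        # (I - h gamma_ii J) k_i = h f(g_i) + h J w
        rhs = [h * fg[r] + h * sum(J[r][c] * w[c] for c in range(n)) for r in range(n)]
        lhs = [[(1.0 if r == c else 0.0) - h * gam[i][i] * J[r][c] for c in range(n)]
               for r in range(n)]
        k.append(solve(lhs, rhs))
    return [y0[c] + sum(b[j] * k[j][c] for j in range(s)) for c in range(n)]

# Illustrative two-stage coefficients (an assumption of this sketch).
G = 1.0 - 1.0 / math.sqrt(2.0)
ALPHA = [[0.0, 0.0], [1.0, 0.0]]
GAM = [[G, 0.0], [-2.0 * G, G]]
B = [0.5, 0.5]

def integrate(n_steps):
    """Integrate y' = -y^2, y(0) = 1 up to x = 1 (exact solution 1/(1+x))."""
    f = lambda y: [-y[0] ** 2]
    jac = lambda y: [[-2.0 * y[0]]]
    y, h = [1.0], 1.0 / n_steps
    for _ in range(n_steps):
        y = rosenbrock_step(f, jac, y, h, ALPHA, GAM, B)
    return y[0]
```

Halving the step size in `integrate` should reduce the error at $x = 1$ by a factor of about four. In this sketch the matrix $I - h\gamma J$ is rebuilt and eliminated at every stage for brevity; with $\gamma_{11} = \gamma_{22} = \gamma$ a real code would decompose it once per step.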


Non-autonomous problems. The equation

$$y' = f(x, y) \tag{7.2a}$$

can be converted to autonomous form by adding $x' = 1$. If method (7.4) is applied
to the augmented system, the components corresponding to the $x$-variable can be
computed explicitly and we arrive at

$$k_i = h f\Big(x_0 + \alpha_i h,\; y_0 + \sum_{j=1}^{i-1} \alpha_{ij} k_j\Big) + \gamma_i h^2 \frac{\partial f}{\partial x}(x_0, y_0) + h \frac{\partial f}{\partial y}(x_0, y_0) \sum_{j=1}^{i} \gamma_{ij} k_j, \qquad y_1 = y_0 + \sum_{j=1}^{s} b_j k_j, \tag{7.4a}$$

where the additional coefficients are given by

$$\alpha_i = \sum_{j=1}^{i-1} \alpha_{ij}, \qquad \gamma_i = \sum_{j=1}^{i} \gamma_{ij}. \tag{7.5}$$


Implicit differential equations. Suppose the problem is of the form

$$M y' = f(x, y) \tag{7.2b}$$

where $M$ is a constant matrix (nonsingular for the moment). If we formally multiply
(7.2b) with $M^{-1}$, apply method (7.4a), and then multiply the resulting formula
with $M$, we obtain

$$M k_i = h f\Big(x_0 + \alpha_i h,\; y_0 + \sum_{j=1}^{i-1} \alpha_{ij} k_j\Big) + \gamma_i h^2 \frac{\partial f}{\partial x}(x_0, y_0) + h \frac{\partial f}{\partial y}(x_0, y_0) \sum_{j=1}^{i} \gamma_{ij} k_j, \qquad y_1 = y_0 + \sum_{j=1}^{s} b_j k_j. \tag{7.4b}$$

An advantage of this formulation is that the inversion of $M$ is avoided and that
possible band-structures of the matrices $M$ and $\partial f/\partial y$ are preserved.


Order Conditions 


Conditions on the free parameters which ensure that the method is of order $p$, i.e.,
the local error satisfies

$$y(x_0 + h) - y_1 = O(h^{p+1}),$$

can be obtained either by straightforward differentiation or by the use of the theorems
on B-series (Sect. II.12). We follow here the first approach, since it requires
only the knowledge of Sect. II.2. The second possibility is sketched in Exercise 2.
As in Sect. II.2, we write the system (7.2) in tensor notation and Method (7.4) as¹

$$k_j^J = h f^J(g_j) + h \sum_K f_K^J(y_0) \sum_k \gamma_{jk} k_k^K, \qquad g_i^J = y_0^J + \sum_j \alpha_{ij} k_j^J, \qquad y_1^J = y_0^J + \sum_j b_j k_j^J. \tag{7.4'}$$

¹ In the sequel, the reader will find many $k$'s of different meaning; on the one hand
the “$k$” in Formula (7.1) which goes back to Runge and Kutta, on the other hand “$k$” as
summation index as since ever in numerical analysis. Although this looks somewhat strange
in certain formulas, we prefer to retain the notation of previous sections.

Again, we use Leibniz’s rule (cf. (II.2.4))

$$\big(k_j^J\big)^{(q)}\big|_{h=0} = q\,\big(f^J(g_j)\big)^{(q-1)}\big|_{h=0} + q \sum_K f_K^J(y_0) \sum_k \gamma_{jk} \big(k_k^K\big)^{(q-1)}\big|_{h=0} \tag{7.6}$$

and have from the chain rule (cf. Sect. II.2, (2.6;1), (2.6;2))

$$\big(f^J(g_j)\big)' = \sum_K f_K^J(g_j) \cdot (g_j^K)'$$

$$\big(f^J(g_j)\big)'' = \sum_{K,L} f_{KL}^J(g_j) \cdot (g_j^K)' \cdot (g_j^L)' + \sum_K f_K^J(g_j) \cdot (g_j^K)''$$

etc. Inserting this into (7.6) we obtain recursively


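$$\big(k_j^J\big)^{(0)}\big|_{h=0} = 0 \tag{7.7;0}$$

$$\big(k_j^J\big)^{(1)}\big|_{h=0} = f^J \tag{7.7;1}$$

$$\big(k_j^J\big)^{(2)}\big|_{h=0} = 2 \sum_K f_K^J f^K \sum_k \alpha_{jk} + 2 \sum_K f_K^J f^K \sum_k \gamma_{jk} = 2 \sum_K f_K^J f^K \sum_k (\alpha_{jk} + \gamma_{jk}) \tag{7.7;2}$$

$$\big(k_j^J\big)^{(3)}\big|_{h=0} = 3 \sum_{K,L} f_{KL}^J f^K f^L \sum_{k,l} \alpha_{jk} \alpha_{jl} + 3 \cdot 2 \sum_{K,L} f_K^J f_L^K f^L \sum_{k,l} (\alpha_{jk} + \gamma_{jk})(\alpha_{kl} + \gamma_{kl}) \tag{7.7;3}$$

etc. All elementary differentials are evaluated at $y_0$. Comparing the derivatives of
the numerical solution ($q \ge 1$)

$$\big(y_1^J\big)^{(q)}\big|_{h=0} = \sum_j b_j \big(k_j^J\big)^{(q)}\big|_{h=0} \tag{7.8}$$

with those of the true solution (Sect. II.2, Formula (2.7;1), (2.7;2), (2.7;3)), we
arrive at the following conditions for order three:

$$\sum_j b_j = 1, \qquad \sum_{j,k} b_j (\alpha_{jk} + \gamma_{jk}) = \frac{1}{2},$$

$$\sum_{j,k,l} b_j \alpha_{jk} \alpha_{jl} = \frac{1}{3}, \qquad \sum_{j,k,l} b_j (\alpha_{jk} + \gamma_{jk})(\alpha_{kl} + \gamma_{kl}) = \frac{1}{6}.$$

The only difference with the order conditions for Runge-Kutta methods is that at
singly-branched vertices of the corresponding trees $\alpha_{jk}$ is replaced by $\alpha_{jk} + \gamma_{jk}$.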


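In order to arrive at a general result, the formulas obtained motivate the following
definition:

Definition 7.2. Let $t$ be a labelled tree of order $q$ with root $j$; we denote by

$$\Phi_j(t) = \sum_{k,l,\ldots} \phi_{j,k,l,\ldots}$$

the sum over the remaining $q - 1$ indices $k, l, \ldots$ etc. The summand $\phi_{j,k,l,\ldots}$ is a
product of $q - 1$ factors, which are

$\alpha_{kl} + \gamma_{kl}$   if $l$ is the only son of $k$;

$\alpha_{kl}$   if $l$ is a son of $k$ and $k$ has at least two sons.

Using the recursive representation of trees (Def. II.2.12) we have $\Phi_j(\tau) = 1$
for the only tree of order 1 and, as in (II.2.19),

$$\Phi_j(t) = \begin{cases} \displaystyle\sum_{k_1,\ldots,k_m} \alpha_{jk_1} \cdots \alpha_{jk_m} \Phi_{k_1}(t_1) \cdots \Phi_{k_m}(t_m) & \text{if } t = [t_1,\ldots,t_m],\ m \ge 2, \\[2mm] \displaystyle\sum_k (\alpha_{jk} + \gamma_{jk})\, \Phi_k(t_1) & \text{if } t = [t_1]. \end{cases} \tag{7.9}$$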

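Theorem 7.3. The derivatives of $k_j$, given by (7.4'), satisfy

$$\big(k_j^J\big)^{(q)}\big|_{h=0} = \sum_{t \in LT_q} \gamma(t)\, \Phi_j(t)\, F^J(t)(y_0) \tag{7.7;q}$$

and the numerical solution $y_1$ satisfies

$$\big(y_1^J\big)^{(q)}\big|_{h=0} = \sum_{t \in LT_q} \gamma(t) \sum_j b_j \Phi_j(t)\, F^J(t)(y_0), \tag{7.10}$$

where $F^J(t)$ are the elementary differentials (Definition II.2.3).

Proof. Because of (7.8) we only have to prove the first formula. This is done by
induction on $q$ and follows exactly the lines of the proof of Theorem II.2.11. We
use (7.6), replace the expression $\big(f^J(g_j)\big)^{(q-1)}$ by Faà di Bruno’s formula (Lemma
II.2.8), use

$$\big(g_j^K\big)^{(\delta)} = \sum_k \alpha_{jk} \big(k_k^K\big)^{(\delta)}$$

for the derivatives of $g_j$ and insert the induction hypothesis (7.7;$\delta$) with $\delta \le q - 1$.
This gives

$$\begin{aligned} \big(k_j^J\big)^{(q)}\big|_{h=0} = {} & q \sum_{u \in LS_q} \sum_{t_1 \in LT_{\delta_1}} \cdots \sum_{t_m \in LT_{\delta_m}} \gamma(t_1) \cdots \gamma(t_m) \cdot \sum_{k_1} \alpha_{jk_1} \Phi_{k_1}(t_1) \cdots \sum_{k_m} \alpha_{jk_m} \Phi_{k_m}(t_m) \\ & \qquad \cdot \sum_{K_1,\ldots,K_m} f_{K_1 \ldots K_m}^J(y_0)\, F^{K_1}(t_1)(y_0) \cdots F^{K_m}(t_m)(y_0) \\ & + q \sum_{t_1 \in LT_{q-1}} \gamma(t_1) \sum_k \gamma_{jk} \Phi_k(t_1) \sum_K f_K^J(y_0)\, F^K(t_1)(y_0). \end{aligned}$$

The one-to-one correspondence between the summation set
$\{(u, t_1, \ldots, t_m) \mid u \in LS_q,\ t_j \in LT_{\delta_j}\}$ and $LT_q$ together with the recursion formulas
(7.9), (II.2.17), (II.2.18) now yields the result. □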

Comparing Theorems 7.3 and II.2.6 we obtain: 
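Table 7.1. Trees and order conditions up to order 5 (tree graphs not reproduced)

$\varrho(t)$   $t$        $\gamma(t)$   $\Phi_j(t)$                                                   $p_t(\gamma)$

1     $\tau$     1      $1$                                                          $1$

2     $t_{21}$   2      $\sum_{k} \beta_{jk}$                                          $1/2 - \gamma$

3     $t_{31}$   3      $\sum_{k,l} \alpha_{jk}\alpha_{jl}$                            $1/3$
      $t_{32}$   6      $\sum_{k,l} \beta_{jk}\beta_{kl}$                              $1/6 - \gamma + \gamma^2$

4     $t_{41}$   4      $\sum_{k,l,m} \alpha_{jk}\alpha_{jl}\alpha_{jm}$               $1/4$
      $t_{42}$   8      $\sum_{k,l,m} \alpha_{jk}\beta_{kl}\alpha_{jm}$                $1/8 - \gamma/3$
      $t_{43}$   12     $\sum_{k,l,m} \beta_{jk}\alpha_{kl}\alpha_{km}$                $1/12 - \gamma/3$
      $t_{44}$   24     $\sum_{k,l,m} \beta_{jk}\beta_{kl}\beta_{lm}$                  $1/24 - \gamma/2 + 3\gamma^2/2 - \gamma^3$

5     $t_{51}$   5      $\sum \alpha_{jk}\alpha_{jl}\alpha_{jm}\alpha_{jp}$            $1/5$
      $t_{52}$   10     $\sum \alpha_{jk}\beta_{kl}\alpha_{jm}\alpha_{jp}$             $1/10 - \gamma/4$
      $t_{53}$   15     $\sum \alpha_{jk}\alpha_{kl}\alpha_{km}\alpha_{jp}$            $1/15$
      $t_{54}$   30     $\sum \alpha_{jk}\beta_{kl}\beta_{lm}\alpha_{jp}$              $1/30 - \gamma/4 + \gamma^2/3$
      $t_{55}$   20     $\sum \alpha_{jk}\beta_{kl}\alpha_{jm}\beta_{mp}$              $1/20 - \gamma/4 + \gamma^2/3$
      $t_{56}$   20     $\sum \beta_{jk}\alpha_{kl}\alpha_{km}\alpha_{kp}$             $1/20 - \gamma/4$
      $t_{57}$   40     $\sum \beta_{jk}\alpha_{kl}\beta_{lm}\alpha_{kp}$              $1/40 - 5\gamma/24 + \gamma^2/3$
      $t_{58}$   60     $\sum \beta_{jk}\beta_{kl}\alpha_{lm}\alpha_{lp}$              $1/60 - \gamma/6 + \gamma^2/3$
      $t_{59}$   120    $\sum \beta_{jk}\beta_{kl}\beta_{lm}\beta_{mp}$                $1/120 - \gamma/6 + \gamma^2 - 2\gamma^3 + \gamma^4$
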

Theorem 7.4. A Rosenbrock method (7.4) with $J = f'(y_0)$ is of order $p$ iff

$$\sum_j b_j \Phi_j(t) = \frac{1}{\gamma(t)} \qquad \text{for } \varrho(t) \le p. \tag{7.11}$$

□

The expressions $\Phi_j(t)$ simplify, if we introduce the abbreviation

$$\beta_{ij} = \alpha_{ij} + \gamma_{ij}. \tag{7.12}$$

The order conditions (7.11) for all trees up to order 5 are given in Table 7.1.
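The recursion (7.9) and the order conditions (7.11) lend themselves to a small mechanical check. The following sketch is only an illustration under assumptions of its own: trees are represented as nested tuples (the one-vertex tree is `()`), the function names are hypothetical, and the two-stage coefficients ($\gamma_{11} = \gamma_{22} = 1/2$, $\alpha_{21} = 1$, $\gamma_{21} = -1$, $b = (1/2, 1/2)$) are an arbitrary example that fulfils the conditions of order 2 but not those of order 3.

```python
from fractions import Fraction
from math import prod

def rho(t):
    """Order of the tree t (a tuple of subtrees; the one-vertex tree is ())."""
    return 1 + sum(rho(c) for c in t)

def gamma_t(t):
    """gamma(t) = rho(t) * gamma(t_1) * ... * gamma(t_m)."""
    return rho(t) * prod(gamma_t(c) for c in t)

def phi(t, j, alpha, gam):
    """Phi_j(t) from the recursion (7.9)."""
    s = len(alpha)
    if not t:
        return Fraction(1)
    if len(t) == 1:
        # singly-branched root: factor alpha_jk + gamma_jk
        return sum((alpha[j][k] + gam[j][k]) * phi(t[0], k, alpha, gam) for k in range(s))
    # multiply-branched root: factors alpha_jk only
    return prod(sum(alpha[j][k] * phi(c, k, alpha, gam) for k in range(s)) for c in t)

def residual(t, alpha, gam, b):
    """Left-hand side minus right-hand side of the order condition (7.11)."""
    lhs = sum(b[j] * phi(t, j, alpha, gam) for j in range(len(b)))
    return lhs - Fraction(1, gamma_t(t))

# Trees tau, t21, t31, t32 of Table 7.1 as nested tuples.
TAU, T21, T31, T32 = (), ((),), ((), ()), (((),),)

# Illustrative two-stage coefficients (an assumption of this sketch).
g = Fraction(1, 2)
ALPHA = [[Fraction(0), Fraction(0)], [Fraction(1), Fraction(0)]]
GAM = [[g, Fraction(0)], [-2 * g, g]]
B = [Fraction(1, 2), Fraction(1, 2)]

for name, t in (("tau", TAU), ("t21", T21), ("t31", T31), ("t32", T32)):
    print(name, gamma_t(t), residual(t, ALPHA, GAM, B))
```

Exact rational arithmetic is used so that a vanishing residual really means that the condition holds; the residuals of the two trees of order 3 are non-zero for this coefficient set.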

A further simplification of the order conditions (7.11) is possible if

$$\gamma_{ii} = \gamma \qquad \text{for all } i \tag{7.13}$$

(It is unfortunate that in the current literature the letter $\gamma$ is used for the parameter
in (7.4) as well as for $\gamma(t)$ in (7.11) and we hope that no confusion will arise). In
the same way as for DIRK methods, the summations in the expressions for $\Phi_j(t)$
in the $\Phi_j(t)$ column of Table 7.1 again contain more terms than the corresponding
expressions for explicit Runge-Kutta methods, since the matrix $(\gamma_{ij})$ (and hence $(\beta_{ij})$)
contains non-zero elements in the diagonal. The difference is that here these diagonal
$\gamma$ appear only for singly-branched vertices (see Definition 7.2). Therefore the
procedure explained in Sect. IV.6 (see Formulas (6.1') and (6.1'')) must be slightly
modified and leads to order conditions of the form

$$\sum_j b_j \Phi_j(t) = p_t(\gamma) \tag{7.11'}$$

(the sums in $\Phi_j(t)$ now being taken over the off-diagonal terms only),
where the polynomials $p_t(\gamma)$ are listed in the last column of Table 7.1.


The Stability Function 

If we apply Method (7.4) to the test equation $y' = \lambda y$ and if we assume
$J = f'(y_0) = \lambda$ then the numerical solution becomes $y_1 = R(h\lambda) y_0$ with

$$R(z) = 1 + z b^T (I - zB)^{-1} \mathbb{1} \tag{7.14}$$

where we have used the notation $b^T = (b_1, \ldots, b_s)$ and $B = (\beta_{ij})_{i,j=1}^{s}$. Since $B$
is a lower triangular matrix, the stability function (7.14) is equal to that of a DIRK-method
with RK-matrix $B$. Properties of such stability functions have already
been investigated in Sect. IV.6.
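Since $B$ is lower triangular, (7.14) can be evaluated by forward substitution without forming an inverse. The sketch below assumes nothing beyond (7.14) itself; the function name `stability_function` and the two example coefficient sets (the one-stage case $b = (1)$, $B = (1)$ and a two-stage case with $\gamma = 1 - 1/\sqrt{2}$, $\beta_{21} = 1 - 2\gamma$, $b = (1/2, 1/2)$) are illustrative choices, not methods taken from this section.

```python
import math

def stability_function(z, b, B):
    """R(z) = 1 + z b^T (I - zB)^{-1} 1 for a lower triangular B, cf. (7.14)."""
    s = len(b)
    x = []  # x solves (I - zB) x = (1, ..., 1)^T by forward substitution
    for i in range(s):
        rhs = 1.0 + z * sum(B[i][j] * x[j] for j in range(i))
        x.append(rhs / (1.0 - z * B[i][i]))
    return 1.0 + z * sum(b[i] * x[i] for i in range(s))

# One stage, gamma = 1: R(z) = 1/(1 - z).
print(stability_function(-2.0, [1.0], [[1.0]]))

# Two stages with an illustrative gamma (assumption of this sketch).
G = 1.0 - 1.0 / math.sqrt(2.0)
B2 = [[G, 0.0], [1.0 - 2.0 * G, G]]
b2 = [0.5, 0.5]
print(abs(stability_function(-1.0e8, b2, B2)))           # behaviour for large |z|
print(abs(stability_function(complex(0.0, 10.0), b2, B2)))  # a point on the imaginary axis
```

The function accepts complex arguments, so the modulus of $R$ along the imaginary axis or the limit behaviour for $z \to \infty$ can be inspected numerically for any given coefficient set.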


Construction of Methods of Order 4 


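In order to construct 4-stage Rosenbrock methods of order 4 we list, for convenience,
the whole set of order conditions (cf. Table 7.1).

$$b_1 + b_2 + b_3 + b_4 = 1 \tag{7.15a}$$

$$b_2 \beta_2' + b_3 \beta_3' + b_4 \beta_4' = \frac{1}{2} - \gamma = p_{21}(\gamma) \tag{7.15b}$$

$$b_2 \alpha_2^2 + b_3 \alpha_3^2 + b_4 \alpha_4^2 = \frac{1}{3} \tag{7.15c}$$

$$b_3 \beta_{32} \beta_2' + b_4 (\beta_{42} \beta_2' + \beta_{43} \beta_3') = \frac{1}{6} - \gamma + \gamma^2 = p_{32}(\gamma) \tag{7.15d}$$

$$b_2 \alpha_2^3 + b_3 \alpha_3^3 + b_4 \alpha_4^3 = \frac{1}{4} \tag{7.15e}$$

$$b_3 \alpha_3 \alpha_{32} \beta_2' + b_4 \alpha_4 (\alpha_{42} \beta_2' + \alpha_{43} \beta_3') = \frac{1}{8} - \frac{\gamma}{3} = p_{42}(\gamma) \tag{7.15f}$$

$$b_3 \beta_{32} \alpha_2^2 + b_4 (\beta_{42} \alpha_2^2 + \beta_{43} \alpha_3^2) = \frac{1}{12} - \frac{\gamma}{3} = p_{43}(\gamma) \tag{7.15g}$$

$$b_4 \beta_{43} \beta_{32} \beta_2' = \frac{1}{24} - \frac{\gamma}{2} + \frac{3}{2}\gamma^2 - \gamma^3 = p_{44}(\gamma) \tag{7.15h}$$

Here we have used the abbreviations

$$\alpha_i = \sum_{j=1}^{i-1} \alpha_{ij}, \qquad \beta_i' = \sum_{j=1}^{i-1} \beta_{ij}. \tag{7.16}$$

For the sake of step size control we also look for an embedded formula (Wolfbrandt
1977, Kaps & Rentrop 1979)

$$\hat y_1 = y_0 + \sum_{j=1}^{s} \hat b_j k_j \tag{7.17}$$

which uses the same $k_j$-values as (7.4), but has different weights. This method
should have order 3, i.e., the four conditions (7.15a)-(7.15d) should be satisfied
also for the $\hat b_i$. These equations constitute the linear system

$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & \beta_2' & \beta_3' & \beta_4' \\ 0 & \alpha_2^2 & \alpha_3^2 & \alpha_4^2 \\ 0 & 0 & \beta_{32}\beta_2' & \sum_j \beta_{4j}\beta_j' \end{pmatrix} \begin{pmatrix} \hat b_1 \\ \hat b_2 \\ \hat b_3 \\ \hat b_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 1/2 - \gamma \\ 1/3 \\ 1/6 - \gamma + \gamma^2 \end{pmatrix}. \tag{7.18}$$

Whenever the matrix in (7.18) is regular, uniqueness of the solutions of the linear
system implies $\hat b_i = b_i$ ($i = 1, \ldots, 4$) and the approximation $\hat y_1$ cannot be used for
step size control. We therefore have to require that the matrix (7.18) be singular,
i.e.,

$$(\beta_2' \alpha_4^2 - \beta_4' \alpha_2^2)\, \beta_{32} \beta_2' = (\beta_2' \alpha_3^2 - \beta_3' \alpha_2^2) \sum_{j=2}^{3} \beta_{4j} \beta_j'. \tag{7.19}$$

This condition guarantees the existence of a 3rd order embedded method (7.17),
whenever (7.15) possesses a solution. The computation of the coefficients $\alpha_{ij}$,
$\beta_{ij}$, $\gamma$, $b_i$ satisfying (7.15), (7.16) and (7.19) is now done in the following steps: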


Step 1. Choose $\gamma > 0$ such that the stability function (7.14) has desirable stability
properties (cf. Table 6.3).

Step 2. Choose $\alpha_2$, $\alpha_3$, $\alpha_4$ and $b_1$, $b_2$, $b_3$, $b_4$ in such a way that the three conditions
(7.15a), (7.15c), (7.15e) are fulfilled. One obviously has four degrees of freedom
in this choice. Observe that the $(b_i, \alpha_i)$ need not be the coefficients of a standard
quadrature formula, since $\sum b_i \alpha_i = 1/2$ need not be satisfied.

Step 3. Take $\beta_{43}$ as a free parameter and compute $\beta_{32}\beta_2'$ from (7.15h), then
$(\beta_{42}\beta_2' + \beta_{43}\beta_3')$ from (7.15d). These expressions, inserted into (7.19) yield a second
relation between $\beta_2'$, $\beta_3'$, $\beta_4'$ (the first one is (7.15b)). Eliminating
$(b_4\beta_{42} + b_3\beta_{32})$ from (7.15d) and (7.15g) gives

$$b_4 \beta_{43} (\beta_2' \alpha_3^2 - \beta_3' \alpha_2^2) = \beta_2'\, p_{43}(\gamma) - \alpha_2^2\, p_{32}(\gamma),$$

a third linear relation for $\beta_2'$, $\beta_3'$, $\beta_4'$. The resulting linear system is regular iff
$b_4 \beta_{43} \alpha_2 \gamma (3\gamma - 1) \ne 0$.

Step 4. Once the $\beta_i'$ are known we can find $\beta_{32}$ and $\beta_{42}$ from the values of $\beta_{32}\beta_2'$,
$(\beta_{42}\beta_2' + \beta_{43}\beta_3')$ obtained in Step 3.

Step 5. Choose $\alpha_{32}$, $\alpha_{42}$, $\alpha_{43}$ according to (7.15f). One has two degrees of freedom
to do this. Finally, the values $\alpha_i$, $\beta_i'$ yield $\alpha_{i1}$, $\beta_{i1}$ via condition (7.16).


Table 7.2. Rosenbrock methods of order 4

method                      $\gamma$      parameter choices                                      $A(\alpha)$-stable   $|R(\infty)|$

GRK4A (Kaps-Rentrop 79)     0.395      $\alpha_2 = 0.438$, $\alpha_3 = 0.87$, $b_4 = 0.25$      $\pi/2$       0.995
GRK4T (Kaps-Rentrop 79)     0.231      $\alpha_2 = 2\gamma$, (7.22), $b_3 = 0$                  89.3°         0.454
Shampine (1982)             0.5        $\alpha_2 = 2\gamma$, (7.22), $b_3 = 0$                  $\pi/2$       1/3
Veldhuizen (1984)           0.225708   $\alpha_2 = 2\gamma$, (7.22), $b_3 = 0$                  89.5°         0.24
Veldhuizen (1984)           0.5        $\alpha_2 = 2\gamma$, $\alpha_3 = 0.5$, $b_3 = 0$        $\pi/2$       1/3
L-stable method             0.572816   $\alpha_2 = 2\gamma$, (7.22), $b_3 = 0$                  $\pi/2$       0


Most of the popular Rosenbrock methods are special cases of this construction
(see Table 7.2). Usually the remaining free parameters are chosen as follows: if we
require

$$\alpha_{43} = 0, \qquad \alpha_{42} = \alpha_{32} \qquad \text{and} \qquad \alpha_{41} = \alpha_{31} \tag{7.20}$$

then the argument of $f$ in (7.4) is the same for $i = 3$ and $i = 4$. Hence, the
number of function evaluations is reduced by one. Further free parameters can be
determined so that several order conditions of order five are satisfied. Multiplying
the condition (7.15g) with $\alpha_2$ and subtracting it from the order condition for the
tree $t_{56}$ yields

$$b_4 \beta_{43} \alpha_3^2 (\alpha_3 - \alpha_2) = p_{56}(\gamma) - \alpha_2\, p_{43}(\gamma). \tag{7.21}$$

This determines $\beta_{43}$. The order condition for $t_{51}$ can also easily be fulfilled in
Step 2. If $\alpha_3 = \alpha_4$ (see (7.20)) this leads to the restriction

$$\alpha_3 = \frac{1/5 - \alpha_2/4}{1/4 - \alpha_2/3}. \tag{7.22}$$

In Table 7.2 we collect some well-known methods. All of them satisfy (7.20) and
(7.21) (Only exception: the second method of van Veldhuizen for $\gamma = 0.5$ has
$\beta_{43} = 0$ instead of (7.21)). The definition of the remaining free parameters is given
in the first two columns. The last columns indicate some properties of the stability
function.
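Step 2 of the construction, combined with the choices $\alpha_2 = 2\gamma$, $b_3 = 0$ and the restriction (7.22), reduces to a small amount of rational arithmetic. The sketch below is an illustration only: the function name `step2_weights` is hypothetical, and the value $\gamma = 231/1000$ is merely the rounded entry of Table 7.2, so the numbers produced are not claimed to be the published coefficients of any method.

```python
from fractions import Fraction

def step2_weights(gamma):
    """alpha_2 = 2*gamma, alpha_3 = alpha_4 from (7.22), b_3 = 0;
    b_1, b_2, b_4 are then determined by (7.15a), (7.15c), (7.15e)."""
    a2 = 2 * gamma
    a3 = (Fraction(1, 5) - a2 / 4) / (Fraction(1, 4) - a2 / 3)   # (7.22)
    # With b_3 = 0 and alpha_4 = alpha_3, (7.15c) and (7.15e) read
    #   b_2 a2^2 + b_4 a3^2 = 1/3,   b_2 a2^3 + b_4 a3^3 = 1/4.
    v = (Fraction(1, 4) - a2 / 3) / (a3 - a2)   # v = b_4 * a3^2
    u = Fraction(1, 3) - v                       # u = b_2 * a2^2
    b2, b4 = u / a2**2, v / a3**2
    b1 = 1 - b2 - b4                             # (7.15a)
    return a2, a3, (b1, b2, Fraction(0), b4)

a2, a3, b = step2_weights(Fraction(231, 1000))
print(float(a2), float(a3), [float(x) for x in b])
# The order condition for t51 (sum of b_j alpha_j^4 = 1/5) holds as a consequence of (7.22):
print(b[1] * a2**4 + b[3] * a3**4 == Fraction(1, 5))
```

Working with `Fraction` makes the last comparison exact, which shows that (7.22) is precisely the compatibility condition for the three quadrature-type conditions when only two distinct non-zero nodes carry weight.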
Higher Order Methods

As for explicit Runge-Kutta methods the construction of higher order methods is
facilitated by the use of simplifying assumptions. First, the condition

$$\sum_{i=j}^{s} b_i \beta_{ij} = b_j (1 - \alpha_j), \qquad j = 1, \ldots, s \tag{7.23}$$

plays a role similar to that of (II.1.12) for explicit Runge-Kutta methods. It implies
that the order condition of the left-hand tree in Fig. 7.1 is a consequence of the two
on the right-hand side. A difference to Runge-Kutta methods is that here the vertex
directly above the root has to be multiply-branched.

The second type of simplifying assumption is (with $\beta_k = \sum_{l=1}^{k} \beta_{kl}$)

$$\sum_{k=1}^{j-1} \alpha_{jk} \beta_k = \frac{\alpha_j^2}{2}, \qquad j = 2, \ldots, s. \tag{7.24}$$

It has an effect similar to that of (II.5.7). As a consequence of (7.24) the order
conditions of the two trees in Fig. 7.2 are equivalent. Again the vertex marked by
an arrow has to be multiply-branched.

The use of the above simplifying assumptions has been exploited by Kaps &
Wanner (1981) for their construction of methods up to order 6. Still higher order
methods would need generalizations of the above simplifying assumptions (in
analogy to $C(\eta)$ and $D(\zeta)$ of Sect. II.7).

Fig. 7.1. Reduction with (7.23)
Fig. 7.2. Reduction with (7.24)


Implementation of Rosenbrock-Type Methods

A direct implementation of (7.4) requires, at each stage, the solution of a linear
system with the matrix $I - h\gamma_{ii} J$ and also the matrix-vector multiplication
$J \cdot \sum \gamma_{ij} k_j$. The latter can be avoided by the introduction of the new variables

$$u_i = \sum_{j=1}^{i} \gamma_{ij} k_j, \qquad i = 1, \ldots, s.$$

If $\gamma_{ii} \ne 0$ for all $i$, the matrix $\Gamma = (\gamma_{ij})$ is invertible and the $k_i$ can be recovered
from the $u_i$:

$$k_i = \frac{1}{\gamma_{ii}} u_i - \sum_{j=1}^{i-1} c_{ij} u_j, \qquad C = \mathrm{diag}(\gamma_{11}^{-1}, \ldots, \gamma_{ss}^{-1}) - \Gamma^{-1}.$$

Inserting this formula into (7.4) and dividing by $h$ yields

$$\Big(\frac{1}{h\gamma_{ii}} I - J\Big) u_i = f\Big(y_0 + \sum_{j=1}^{i-1} a_{ij} u_j\Big) + \sum_{j=1}^{i-1} \Big(\frac{c_{ij}}{h}\Big) u_j, \quad i = 1, \ldots, s, \qquad y_1 = y_0 + \sum_{j=1}^{s} m_j u_j, \tag{7.25}$$

where

$$(a_{ij}) = (\alpha_{ij})\, \Gamma^{-1}, \qquad (m_1, \ldots, m_s) = (b_1, \ldots, b_s)\, \Gamma^{-1}.$$

Compared to (7.4) the formulation (7.25) of a Rosenbrock method avoids not only
the above mentioned matrix-vector multiplication, but also the $n^2$ multiplications
for $(\gamma_{ii} h) J$. Similar transformations were first proposed by Wolfbrandt (1977),
Kaps & Wanner (1981) and Shampine (1982). The formulation (7.25) can be found
in Kaps, Poon & Bui (1985).
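The passage from (7.4) to (7.25) can be checked numerically on a scalar problem, where every linear system is a division. The following is a sketch under assumptions of its own: all function names are hypothetical, the two-stage coefficients are the same illustrative set as elsewhere ($\gamma = 1 - 1/\sqrt{2}$, $\alpha_{21} = 1$, $\gamma_{21} = -2\gamma$, $b = (1/2, 1/2)$), and the test equation $y' = -y^2$ is an arbitrary choice.

```python
import math

def lower_inverse(G):
    """Inverse of a lower triangular matrix by forward substitution, column by column."""
    s = len(G)
    inv = [[0.0] * s for _ in range(s)]
    for c in range(s):
        for r in range(c, s):
            rhs = (1.0 if r == c else 0.0) - sum(G[r][j] * inv[j][c] for j in range(c, r))
            inv[r][c] = rhs / G[r][r]
    return inv

def transform(alpha, gam, b):
    """Coefficients of (7.25): a = alpha Gamma^{-1}, C = diag(1/gamma_ii) - Gamma^{-1},
    m = b Gamma^{-1}."""
    s = len(b)
    inv = lower_inverse(gam)
    a = [[sum(alpha[i][l] * inv[l][j] for l in range(s)) for j in range(s)] for i in range(s)]
    c = [[(1.0 / gam[i][i] if i == j else 0.0) - inv[i][j] for j in range(s)] for i in range(s)]
    m = [sum(b[l] * inv[l][j] for l in range(s)) for j in range(s)]
    return a, c, m

def step_k_form(f, J, y0, h, alpha, gam, b):
    """Scalar version of (7.4)."""
    k = []
    for i in range(len(b)):
        g = y0 + sum(alpha[i][j] * k[j] for j in range(i))
        rhs = h * f(g) + h * J * sum(gam[i][j] * k[j] for j in range(i))
        k.append(rhs / (1.0 - h * gam[i][i] * J))
    return y0 + sum(bj * kj for bj, kj in zip(b, k))

def step_u_form(f, J, y0, h, a, c, m, gam):
    """Scalar version of (7.25): no multiplication by J on the right-hand side."""
    u = []
    for i in range(len(m)):
        g = y0 + sum(a[i][j] * u[j] for j in range(i))
        rhs = f(g) + sum(c[i][j] / h * u[j] for j in range(i))
        u.append(rhs / (1.0 / (h * gam[i][i]) - J))
    return y0 + sum(mj * uj for mj, uj in zip(m, u))

G = 1.0 - 1.0 / math.sqrt(2.0)
ALPHA = [[0.0, 0.0], [1.0, 0.0]]
GAM = [[G, 0.0], [-2.0 * G, G]]
B = [0.5, 0.5]
A_T, C_T, M_T = transform(ALPHA, GAM, B)

f = lambda y: -y * y
y_k = step_k_form(f, -2.0, 1.0, 0.1, ALPHA, GAM, B)
y_u = step_u_form(f, -2.0, 1.0, 0.1, A_T, C_T, M_T, GAM)
print(y_k, y_u)
```

Both formulations should agree up to round-off. For systems the same structure applies with $1/(h\gamma_{ii})\,I - J$ decomposed once per step when all $\gamma_{ii}$ coincide.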


For non-autonomous problems this transformation yields

$$\Big(\frac{1}{h\gamma_{ii}} I - \frac{\partial f}{\partial y}(x_0, y_0)\Big) u_i = f\Big(x_0 + \alpha_i h,\; y_0 + \sum_{j=1}^{i-1} a_{ij} u_j\Big) + \sum_{j=1}^{i-1} \Big(\frac{c_{ij}}{h}\Big) u_j + \gamma_i h \frac{\partial f}{\partial x}(x_0, y_0) \tag{7.26}$$

with $\alpha_i$ and $\gamma_i$ given by (7.5).

For implicit differential equations of the form (7.2b) the transformed Rosenbrock
method becomes

$$\Big(\frac{1}{h\gamma_{ii}} M - \frac{\partial f}{\partial y}(x_0, y_0)\Big) u_i = f\Big(x_0 + \alpha_i h,\; y_0 + \sum_{j=1}^{i-1} a_{ij} u_j\Big) + M \sum_{j=1}^{i-1} \Big(\frac{c_{ij}}{h}\Big) u_j + \gamma_i h \frac{\partial f}{\partial x}(x_0, y_0). \tag{7.27}$$

Coding. Rosenbrock methods are nearly as simple to implement as explicit Runge-Kutta
methods. The only difference is that at each step the Jacobian $\partial f/\partial y$ has to
be evaluated and $s$ linear systems have to be solved. Thus, one can take an explicit
RK code (say DOPRI5), add four lines which compute $\partial f/\partial y$ by finite differences
(or call a user-supplied subroutine JAC which furnishes it analytically); add further
a call to a Gaussian DEComposition routine, and add to each evaluation-stage a
call to a linear SOLver. Since the method is of order 4(3), the step size prediction
formula

$$h_{new} = h \cdot \min\big\{ 6.,\; \max\big( 0.2,\; 0.9 \cdot (Tol/err)^{1/4} \big) \big\} \tag{7.28}$$

seems appropriate.
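Formula (7.28) translates directly into code. In the sketch below the function name and the keyword parameters are hypothetical, and the guard for a vanishing error estimate is an addition of this sketch that is not part of (7.28).

```python
def new_step_size(h, err, tol, fac_max=6.0, fac_min=0.2, safety=0.9, order=4):
    """Step size prediction (7.28); err is the norm of the estimated local error."""
    if err == 0.0:
        # guard added for this sketch: avoid division by zero, take the largest increase
        return h * fac_max
    return h * min(fac_max, max(fac_min, safety * (tol / err) ** (1.0 / order)))

print(new_step_size(0.1, 1.0e-6, 1.0e-6))   # err = Tol: factor 0.9
print(new_step_size(0.1, 1.0e-20, 1.0e-6))  # very small error: factor limited to 6
print(new_step_size(0.1, 1.0, 1.0e-6))      # very large error: factor limited to 0.2
```

The two limits keep the step size from changing by more than a factor 6 upwards or a factor 5 downwards in a single step.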
However, we want the code to work economically for non-autonomous problems
as well as for implicit equations. Further, if the dimension of the system
is large, it becomes crucial that the linear algebra be done, whenever possible,
in banded form. All these possibilities, autonomous or not, implicit or explicit,
$\partial f/\partial y$ banded or not, $B$ banded or not, $\partial f/\partial y$ analytic or not, (“... that is the
question”) lead to $2^5$ different cases, for each of which the code contains special
parts for high efficiency. Needless to say, it works well on all stiff problems of
Sect. IV.1. A more thorough comparison and testing will be given in Sect. IV.10.


The “Hump” 


On some very stiff equations, however, the code shows a curious behaviour: consider
the van der Pol equation in singular perturbation form (1.5’) with

$$\varepsilon = 10^{-6}, \qquad y_1(0) = 2, \qquad y_2(0) = -0.66. \tag{7.29}$$

We further select method GRK4T (Table 7.2; each other method there behaves similarly)
and $Tol = 7 \cdot 10^{-5}$. Fig. 7.3 shows the numerical solution $y_1$ as well as the
step sizes chosen by the code. There all rejected steps are indicated by an ×.




Curious step size drops (by a factor of about $10^{-3}$) occur without any apparent
exterior reason. Further, these drops are accompanied by a huge number of step
rejections (up to 20). In order to understand this phenomenon, we present in the
left picture of Fig. 7.4 the exact local error as well as the estimated local error
$\|\hat y_1 - y_1\|$ at $x = 0.55139$ as a function of the step size $h$ (both in logarithmic
scale). The current step size is marked by large symbols. The error behaves like
$C \cdot h^5$ only for very small $h$ ($\le 10^{-6} = \varepsilon$). Between $h = 10^{-5}$ and the step size
actually used ($\approx 10^{-2}$) the error is more or less constant. Whenever this constant
is larger than Tol (horizontal broken line), the code is forced to decrease the step
size until $h \approx \varepsilon$. As a first remedy, we accelerate this lengthy process, as Shampine
(1982) also did, by more drastic step size reductions ($h_{new} = h/10$) after each
second consecutive step rejection. It also turns out (see right picture of Fig. 7.4)
that the effect disappears in the neighbourhood of the actual step size for the
L-stable method (where $R(\infty) = 0$). Methods with $R(\infty) = 0$ and also $\hat R(\infty) = 0$
have been derived by Kaps & Ostermann (1990).

Fig. 7.4. Study of local error for (1.5’) at $x = 0.55139$ (right picture: L-stable method; curves: exact local error and estimated local error)

A more thorough understanding of these phenomena is possible by the consideration
of singular perturbation problems (Chapter VI).


Methods with Inexact Jacobian (W-Methods)

The relevant question is now, what is the cheapest type of implicitness
we have to require. (Steihaug & Wolfbrandt 1979)

All the above theory is built on the assumption that $J$ is the exact Jacobian $\partial f/\partial y$.
This implies that the matrix must be evaluated at every step, which can make the
computations costly. The following attempt, due to Steihaug & Wolfbrandt (1979),
searches for order conditions which assure classical order for all approximations
$A$ of $\partial f/\partial y$. The latter is then maintained over several steps and is just used to
assure stability. The derivation of the order conditions must now be done somewhat
differently: if $J$ is replaced by an arbitrary matrix $A$, Formula (7.6) becomes

$$\big(k_j^J\big)^{(q)}\big|_{h=0} = q\,\big(f^J(g_j)\big)^{(q-1)}\big|_{h=0} + q \sum_K A_K^J \sum_k \gamma_{jk} \big(k_k^K\big)^{(q-1)}\big|_{h=0} \tag{7.30}$$

where $A = (A_K^J)_{J,K=1}^{n}$, and we obtain

$$\big(k_j^J\big)^{(2)}\big|_{h=0} = 2 \sum_K f_K^J f^K \sum_k \alpha_{jk} + 2 \sum_K A_K^J f^K \sum_k \gamma_{jk}. \tag{7.31;2}$$

Inserted into (7.8), the first term must equal the derivative of the exact solution and
the second must be zero. Similarly, we obtain instead of (7.7;3)

$$\begin{aligned} \big(k_j^J\big)^{(3)}\big|_{h=0} = {} & 3 \sum_{K,L} f_{KL}^J f^K f^L \sum_{k,l} \alpha_{jk} \alpha_{jl} \\ & + 3 \cdot 2 \sum_{K,L} f_K^J f_L^K f^L \sum_{k,l} \alpha_{jk} \alpha_{kl} + 3 \cdot 2 \sum_{K,L} f_K^J A_L^K f^L \sum_{k,l} \alpha_{jk} \gamma_{kl} \\ & + 3 \cdot 2 \sum_{K,L} A_K^J f_L^K f^L \sum_{k,l} \gamma_{jk} \alpha_{kl} + 3 \cdot 2 \sum_{K,L} A_K^J A_L^K f^L \sum_{k,l} \gamma_{jk} \gamma_{kl} \end{aligned} \tag{7.31;3}$$

and the order conditions for order three become

$$\begin{aligned} & \sum b_j = 1, \qquad \sum b_j \alpha_{jk} = 1/2, \qquad \sum b_j \gamma_{jk} = 0, \\ & \sum b_j \alpha_{jk} \alpha_{jl} = 1/3, \qquad \sum b_j \alpha_{jk} \alpha_{kl} = 1/6, \\ & \sum b_j \alpha_{jk} \gamma_{kl} = 0, \qquad \sum b_j \gamma_{jk} \alpha_{kl} = 0, \qquad \sum b_j \gamma_{jk} \gamma_{kl} = 0. \end{aligned} \tag{7.32}$$


For a graphical representation of the elementary differentials in (7.31;q) and of the
order conditions (7.32) we need trees with two different kinds of vertices (one representing
$f$ and the other $A$). As in Sect. II.15 we use “meagre” and “fat” vertices
(see Definitions II.15.1 to II.15.4). Not all trees with meagre and fat vertices
(P-trees) have to be considered. From the above derivation we see that fat vertices
have to be singly-branched (derivatives of the constant matrix $A$ are zero) and that
they cannot be at the end of a branch. We therefore use the notation

$$TW = \{\, \text{P-trees};\ \text{end-vertices are meagre and fat vertices are singly-branched} \,\} \tag{7.33}$$

and if the vertices are labelled monotonically, we write $LTW$.


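Definition 7.5. The elementary differentials for trees $t \in TW$ are defined recursively
by $F^J(\tau)(y) = f^J(y)$ and

$$F^J(t)(y) = \begin{cases} \displaystyle\sum_{K_1,\ldots,K_m} f_{K_1 \ldots K_m}^J(y) \cdot F^{K_1}(t_1)(y) \cdots F^{K_m}(t_m)(y) & \text{if } t = {}_a[t_1,\ldots,t_m] \text{ (meagre root)} \\[2mm] \displaystyle\sum_K A_K^J \cdot F^K(t_1)(y) & \text{if } t = {}_b[t_1] \text{ (fat root).} \end{cases}$$

Definition 7.6. For $t \in TW$ we let $\Phi_j(\tau) = 1$ and

$$\Phi_j(t) = \begin{cases} \displaystyle\sum_{k_1,\ldots,k_m} \alpha_{jk_1} \cdots \alpha_{jk_m} \Phi_{k_1}(t_1) \cdots \Phi_{k_m}(t_m) & \text{if } t = {}_a[t_1,\ldots,t_m] \\[2mm] \displaystyle\sum_k \gamma_{jk} \Phi_k(t_1) & \text{if } t = {}_b[t_1]. \end{cases}$$

We remark that $T$ (the set of trees as considered for Runge-Kutta methods)
is a subset of $TW$ and that the above definitions coincide with Definitions II.2.3
and II.2.9 (cf. also Formulas (II.2.18) and (II.2.19)). The general result is now the
following

Theorem 7.7. A W-method (7.4) with $J = A$ arbitrary is of order $p$ iff

$$\sum_j b_j \Phi_j(t) = \frac{1}{\gamma(t)} \qquad \text{for } t \in T \text{ with } \varrho(t) \le p, \text{ and}$$

$$\sum_j b_j \Phi_j(t) = 0 \qquad \text{for } t \in TW \setminus T \text{ with } \varrho(t) \le p.$$

The proof is essentially the same as for Theorems 7.3 and 7.4. □
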


Table 7.3. Number of order conditions for W-methods

order $p$             1    2    3    4     5     6      7      8
no. of conditions     1    3    8    21    58    166    498    1540


The number of order conditions for W-methods is rather large (see Table 7.3),
since each tree of $T$ with $\kappa$ singly-branched vertices gives rise to $2^\kappa$ order conditions
(in the case of symmetry some may be identical). Therefore, W-methods of
higher order are best obtained by extrapolation (see Sect. IV.9).
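The statement of Theorem 7.7 for $p = 2$ can be observed experimentally. The sketch below rests on assumptions of its own: the names `w_step` and `global_error` are hypothetical, the problem $y' = -y^2$, $y(0) = 1$ is scalar, and the two-stage coefficients ($\gamma = 1 - 1/\sqrt{2}$, $\alpha_{21} = 1$, $\gamma_{21} = -2\gamma$, $b = (1/2, 1/2)$) are chosen because they satisfy the three conditions $\sum b_j = 1$, $\sum b_j \alpha_{jk} = 1/2$, $\sum b_j \gamma_{jk} = 0$ of (7.32) that belong to order 2.

```python
import math

G = 1.0 - 1.0 / math.sqrt(2.0)

def w_step(f, A, y0, h):
    """One step of a two-stage method of type (7.4) for a scalar problem,
    with J replaced by an arbitrary number A."""
    d = 1.0 - h * G * A
    k1 = h * f(y0) / d
    # (1 - h*gamma*A) k2 = h f(y0 + alpha_21 k1) + h A gamma_21 k1, gamma_21 = -2*gamma
    k2 = (h * f(y0 + k1) - 2.0 * G * h * A * k1) / d
    return y0 + 0.5 * (k1 + k2)

def global_error(A, n_steps):
    """Error at x = 1 for y' = -y^2, y(0) = 1 (exact value 1/2)."""
    f = lambda y: -y * y
    y, h = 1.0, 1.0 / n_steps
    for _ in range(n_steps):
        y = w_step(f, A, y, h)
    return abs(y - 0.5)

for A in (0.0, -1.5):
    print(A, global_error(A, 100) / global_error(A, 200))
```

For $A = 0$ the scheme degenerates to an explicit two-stage Runge-Kutta method, and for $A = -1.5$ the matrix is only a crude approximation of $f'(y) = -2y$; in both cases the error ratio for halved step size should be close to 4.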

The stability investigation for linearly implicit methods with $A \ne f'(y_0)$ is
very complicated. If we linearize the differential equation (as in the beginning of
Sect. IV.2) and assume the Jacobian to be constant, we arrive at a recursion of the
form

$$y_1 = R\big(h f'(y_0), hA\big)\, y_0.$$

Since, in general, the matrices $f'(y_0)$ and $A$ cannot be diagonalized simultaneously,
the consideration of scalar test equations is not justified. Stability investigations
for the case when $\|f'(y_0) - A\|$ is small will be considered in Sect. IV.11.
Exercises

1. (Kaps 1977). There exists no Rosenbrock method (7.4) with $s = 4$ and $p = 5$.
Prove this.

2. (Nørsett & Wolfbrandt 1979). Generalize the derivation of order conditions
for Runge-Kutta methods with the help of B-series (Sect. II.11, page 247) to
Rosenbrock methods.

Hint. Prove that, for a B-series $B(a, y_0)$ with $a: T \to \mathbb{R}$ satisfying $a(\emptyset) = 0$,

$$h f'(y_0)\, B(a, y_0) = B(\bar a, y_0)$$

is again a B-series with coefficients

$$\bar a(t) = \begin{cases} \varrho(t)\, a(t_1) & \text{if } t = [t_1] \\ 0 & \text{else.} \end{cases}$$

3. Cooper & Sayfy (1983) consider additive Runge-Kutta methods

$$g_i = y_0 + h \sum_{j=1}^{i-1} \alpha_{ij} f(x_0 + c_j h, g_j) + h J \sum_{j=1}^{i} \eta_{ij} g_j, \quad i = 1, \ldots, s+1, \qquad y_1 = g_{s+1} \tag{7.34}$$

whose coefficients satisfy $\sum_{j=1}^{i-1} \alpha_{ij} = c_i$, $\sum_{j=1}^{i} \eta_{ij} = 0$.

a) Prove that (7.34) is equivalent to (7.4) whenever $\alpha_{s+1,i} = b_i$ and

$$(\eta_{ij})(\alpha_{ij}) = (\alpha_{ij})(\gamma_{ij}). \tag{7.35}$$

Here all matrices are of dimension $(s+1) \times (s+1)$. The last line of $(\gamma_{ij})$
need not be specified since the last column of $(\alpha_{ij})$ is zero.

b) If the coefficients of (7.34) satisfy $\alpha_{i,i-1} \ne 0$ for all $i$, then we can always
find an equivalent method of type (7.4).

4. (Verwer 1980, Verwer & Scholz 1983). Derive order conditions for Rosenbrock
methods “with time-lagged Jacobian”, i.e., methods of type (7.4) where
$J$ is assumed to be $f'(y(x_0 - \omega h))$. If $\omega$ is the step ratio $h_{old}/h$, this allows
re-use of the Jacobian of the previous step.

5. (Kaps & Ostermann 1989). Show that some order conditions of (7.32) can be
shifted to higher orders if it is assumed that

$$f'(y_0) - J = O(h).$$

This makes the conditions of Exercise 4 independent of $\omega$.

Result. The number of order-shifts is equal to the number of fat nodes.
These have not been used to any great extent ...

(S.P. Nørsett 1976)

However, the implementation difficulties of these methods have
precluded their general use; ... (J.M. Varah 1979)

Although Runge-Kutta methods present an attractive alternative,
especially for stiff problems, ... it is generally believed that they
will never be competitive with multistep methods.

(K. Burrage, J.C. Butcher & F.H. Chipman 1980)

Runge-Kutta methods for stiff problems, we are just beginning to
explore them ... (L. Shampine in Aiken 1985)

If the dimension of the differential equation $y' = f(x, y)$ is $n$, then the $s$-stage
fully implicit Runge-Kutta method (3.1) involves a $n \cdot s$-dimensional nonlinear
system for the unknowns $g_1, \ldots, g_s$. An efficient solution of this system is the
main problem in the implementation of an implicit Runge-Kutta method.

Among the methods discussed in Sect. IV.5, the processes Radau IIA of Ehle,
which are L-stable and of high order, seem to be particularly promising. Most of
the questions arising (starting values and stopping criteria for the simplified Newton
iterations, efficient solution of the linear systems, and the selection of the step sizes)
are discussed here for the particular Ehle method with $s = 3$ and $p = 5$. This then
constitutes a description of the code RADAU5 of the appendix. An adaptation of the
described techniques to other fully implicit Runge-Kutta methods is more or less
straightforward, if the Runge-Kutta matrix has at least one real eigenvalue. We
also describe briefly our implementation of the diagonal implicit method SDIRK4
(Formula (6.16)).


Reformulation of the Nonlinear System 

In order to reduce the influence of round-off errors we prefer to work with the
smaller quantities

$$z_i = g_i - y_0. \tag{8.1}$$

Then (3.1a) becomes

$$z_i = h \sum_{j=1}^{s} a_{ij} f(x_0 + c_j h, y_0 + z_j), \qquad i = 1, \ldots, s. \tag{8.2a}$$

Whenever the solution $z_1, \ldots, z_s$ of the system (8.2a) is known, then (3.1b) is an
explicit formula for $y_1$. A direct application of this requires $s$ additional function
evaluations. These can be avoided, if the matrix $A = (a_{ij})$ of the Runge-Kutta
coefficients is nonsingular. Indeed, (8.2a) can be written as

$$\begin{pmatrix} z_1 \\ \vdots \\ z_s \end{pmatrix} = A \begin{pmatrix} h f(x_0 + c_1 h, y_0 + z_1) \\ \vdots \\ h f(x_0 + c_s h, y_0 + z_s) \end{pmatrix}$$

so that (3.1b) is seen to be equivalent to

$$y_1 = y_0 + \sum_{i=1}^{s} d_i z_i \tag{8.2b}$$

where

$$(d_1, \ldots, d_s) = (b_1, \ldots, b_s) A^{-1}. \tag{8.3}$$

For the 3-stage Radau IIA method (Table 5.6) the vector $d$ is simply $(0, 0, 1)$, since
$b_i = a_{si}$ for all $i$.

Another advantage of Formula (8.2b) is the following: the quantities $z_1, \ldots, z_s$
are computed iteratively and are therefore affected by iteration errors. The evaluation
of $f(x_0 + c_i h, y_0 + z_i)$ in Eq. (3.1b) would then, due to the large Lipschitz
constant of $f$, amplify these errors, which then “can be disastrously inaccurate for
a stiff problem” (L.F. Shampine 1980).


Simplified Newton Iterations 

For a general nonlinear differential equation the system (8.2a) has to be solved iteratively.
In the stone-age of stiff computation (i.e., before 1967) people were usually
thinking of simple fixed-point iteration. But this transforms the algorithm into an
explicit method and destroys the good stability properties. The paper of Liniger &
Willoughby (1970) then showed the advantages of using Newton’s method for this
purpose. Newton’s method applied to system (8.2a) needs for each iteration the
solution of a linear system with matrix

$$\begin{pmatrix} I - h a_{11} \frac{\partial f}{\partial y}(x_0 + c_1 h, y_0 + z_1) & \ldots & -h a_{1s} \frac{\partial f}{\partial y}(x_0 + c_s h, y_0 + z_s) \\ \vdots & & \vdots \\ -h a_{s1} \frac{\partial f}{\partial y}(x_0 + c_1 h, y_0 + z_1) & \ldots & I - h a_{ss} \frac{\partial f}{\partial y}(x_0 + c_s h, y_0 + z_s) \end{pmatrix}.$$

In order to simplify this, we replace all Jacobians $\frac{\partial f}{\partial y}(x_0 + c_i h, y_0 + z_i)$ by an
approximation

$$J \approx \frac{\partial f}{\partial y}(x_0, y_0).$$

Then the simplified Newton iterations for (8.2a) become

$$(I - hA \otimes J)\, \Delta Z^k = -Z^k + h (A \otimes I) F(Z^k), \qquad Z^{k+1} = Z^k + \Delta Z^k. \tag{8.4}$$
Here $Z^k = (z_1^k, \ldots, z_s^k)^T$ is the $k$-th approximation to the solution, and
$\Delta Z^k = (\Delta z_1^k, \ldots, \Delta z_s^k)^T$ are the increments. $F(Z^k)$ is an abbreviation for

$$F(Z^k) = \big( f(x_0 + c_1 h, y_0 + z_1^k), \ldots, f(x_0 + c_s h, y_0 + z_s^k) \big)^T.$$

Each iteration requires $s$ evaluations of $f$ and the solution of a $n \cdot s$-dimensional
linear system. The matrix $(I - hA \otimes J)$ is the same for all iterations. Its
LU-decomposition is done only once and is usually very costly.

Starting Values for the Newton Iteration. A natural and simple choice for the
starting values in the iteration (8.4) (or equivalently (8.13) below), since the exact
solution of (8.2a) satisfies $z_i = O(h)$, would be

$$z_i^0 = 0, \qquad i = 1, \ldots, s. \tag{8.5}$$

However, better choices are possible in general. If the implicit Runge-Kutta method
satisfies the condition $C(\eta)$ (see Sections IV.5 and II.7) for some $\eta \le s$, then

$$z_i = y(x_0 + c_i h) - y_0 + O(h^{\eta+1}). \tag{8.6}$$

Suppose now that $c_i \ne 0$ ($i = 1, \ldots, s$) and consider the interpolation polynomial
of degree $s$, defined by

$$q(0) = 0, \qquad q(c_i) = z_i, \quad i = 1, \ldots, s.$$

Since the interpolation error is of size $O(h^{s+1})$ we obtain together with (8.6)

$$y(x_0 + th) - y_0 - q(t) = O(h^{\eta+1})$$

(cf. Theorem 7.10 of Chapter II for collocation methods). We use the values of
$q(t)$ also beyond the interval $[0, 1]$ and take

$$z_i^0 = q(1 + \omega c_i) + y_0 - y_1, \qquad i = 1, \ldots, s, \qquad \omega = h_{new}/h_{old} \tag{8.5'}$$

as starting values for the Newton iteration in the subsequent step. Numerical experiments
with the 3-stage Radau IIA method have shown that (8.5’) usually leads
to a faster convergence than (8.5).

Stopping Criterion. This question is closely related to an estimation of the iteration
error. Since convergence is linear, we have

$$\|\Delta Z^{k+1}\| \le \Theta \|\Delta Z^k\|, \qquad \text{hopefully with } \Theta < 1. \tag{8.7}$$

Applying the triangle inequality to

$$Z^{k+1} - Z^* = (Z^{k+1} - Z^{k+2}) + (Z^{k+2} - Z^{k+3}) + \ldots$$

(where $Z^*$ is the exact solution of (8.2a)) yields the estimate

$$\|Z^{k+1} - Z^*\| \le \frac{\Theta}{1 - \Theta} \|\Delta Z^k\|. \tag{8.8}$$

The convergence rate $\Theta$ can be estimated by the computed quantities

$$\Theta_k = \|\Delta Z^k\| / \|\Delta Z^{k-1}\|, \qquad k \ge 1. \tag{8.9}$$
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It is clear that the iteration error should not be larger than the local discretization 
error, which is usually kept close to Tol. We therefore stop the iteration when 

rj k \\AZ k \\<K-Tol with r, k = (8.10) 

and accept as approximation to Z* . This strategy can only be applied after 

at least two iterations. In order to be able to stop the computations after the first 
iteration already (which is especially advantageous for linear systems) we take for 
k ~ 0 the quantity 

Vo = (ma x{r] 0 id, Uround)) o s 

where rj old is the last rj k of the preceding step. It remains to make a good choice 
for the parameter n in (8.10). To this end we applied the code RADAU5 for many 
different values of k between 10 and 10 -4 and with some different tolerances Tol 
to several differential equations. The observation was that the code works most 
efficiently for values of k around 10 -1 or 10 -2 . 

It is our experience that the code becomes more efficient when we allow a
relatively high number of iterations (e.g., $k_{max} = 7$ or 10). During these $k_{max}$
iterations, the computations are interrupted and restarted with a smaller stepsize
(for example with $h := h/2$) if one of the following situations occurs:

a) there is a $k$ with $\Theta_k \ge 1$ (the iteration “diverges”);

b) for some $k$,

$$\frac{\Theta_k^{\,k_{max}-k}}{1-\Theta_k}\,\|\Delta Z^k\| > \kappa \cdot Tol. \tag{8.11}$$

The left-hand expression in (8.11) is a rough estimate of the iteration error to be
expected after $k_{max} - 1$ iterations. The norm, used in all these formulas, should
be the same as the one used for the local error estimator.

If only one Newton iteration was necessary to satisfy (8.10) or if the last $\Theta_k$
was very small, say $\le 10^{-3}$, then we don't recompute the Jacobian in the next
step. As a consequence, the Jacobian is computed only once for linear problems
with constant coefficients (as long as no step rejection occurs).
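
The iteration (8.4) together with the tests (8.9), (8.10) and (8.11) can be sketched as follows. This is only an illustration under stated assumptions, not the RADAU5 implementation: the function names, the root-mean-square norm, the default values of `kappa` and `eta_old`, and the dense NumPy linear algebra are choices made for this sketch. The starting values are the simple ones of (8.5).

```python
import numpy as np


def rms_norm(v):
    """Root-mean-square norm (assumed here; it should match the error norm)."""
    return float(np.sqrt(np.mean(np.square(v))))


def simplified_newton(f, J, A, c, x0, y0, h, tol,
                      kappa=0.05, kmax=7, eta_old=1.0):
    """Iterate (8.4) until the stopping criterion (8.10) holds.

    Returns (Z, iterations, eta). Raises RuntimeError when the step should be
    repeated with a smaller h (situations a and b of the text).
    """
    s, n = len(c), len(y0)
    # The matrix is the same for all iterations. np.linalg.solve factorizes it
    # on every call; a production code would keep a single LU-decomposition.
    matrix = np.eye(s * n) - h * np.kron(A, J)
    Z = np.zeros((s, n))                      # starting values (8.5)
    norm_prev = None
    for k in range(kmax):
        F = np.array([f(x0 + c[i] * h, y0 + Z[i]) for i in range(s)])
        rhs = -Z + h * (A @ F)                # right-hand side of (8.4)
        dZ = np.linalg.solve(matrix, rhs.ravel()).reshape(s, n)
        Z = Z + dZ
        norm = rms_norm(dZ)
        if k == 0:
            # No rate estimate yet: reuse eta from the preceding step.
            eta = max(eta_old, np.finfo(float).eps) ** 0.8
        else:
            theta = norm / norm_prev          # convergence rate (8.9)
            if theta >= 1.0:
                raise RuntimeError("iteration diverges (situation a)")
            eta = theta / (1.0 - theta)
            predicted = theta ** (kmax - k) / (1.0 - theta) * norm
            if predicted > kappa * tol:       # test (8.11)
                raise RuntimeError("convergence too slow (situation b)")
        if eta * norm <= kappa * tol:         # stopping criterion (8.10)
            return Z, k + 1, eta
        norm_prev = norm
    raise RuntimeError("no convergence within kmax iterations")
```

For a linear problem with exact Jacobian the second increment vanishes up to rounding, so the sketch stops after two iterations; passing the returned `eta` as `eta_old` to the next step is what allows a stop after a single iteration.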


The Linear System 

A substantial reduction of the numerical work for the solution of the linear system
(8.4) is obtained by the following method, introduced independently by Butcher (1976)
and Bickart (1977), which exploits the special structure of the
matrix $I - hA \otimes J$ in (8.4).

The idea is to premultiply (8.4) by $(hA)^{-1} \otimes I$ (we suppose here that $A$ is
invertible) and to transform $A^{-1}$ to a simple matrix (diagonal, block diagonal,
triangular or Jordan canonical form)

$$T^{-1} A^{-1} T = \Lambda. \tag{8.12}$$

With the transformed variables $W^k = (T^{-1} \otimes I)Z^k$, the iteration (8.4) becomes
equivalent to

$$
(h^{-1}\Lambda \otimes I - I \otimes J)\,\Delta W^k = -h^{-1}(\Lambda \otimes I)W^k + (T^{-1} \otimes I)F\bigl((T \otimes I)W^k\bigr), \qquad
W^{k+1} = W^k + \Delta W^k. \tag{8.13}
$$

We also replace $Z^k$ and $\Delta Z^k$ by $W^k$ and $\Delta W^k$ in the formulas (8.7)-(8.11) (and
thereby again save some work).

For the sequel, we suppose that the matrix $A^{-1}$ has one real eigenvalue $\hat\gamma$
and one complex conjugate eigenvalue pair $\hat\alpha \pm i\hat\beta$. This is a typical situation for
3-stage implicit Runge-Kutta methods such as Radau IIA. With $\gamma = h^{-1}\hat\gamma$,
$\alpha = h^{-1}\hat\alpha$, $\beta = h^{-1}\hat\beta$ the matrix in (8.13) becomes

$$
\begin{pmatrix}
\gamma I - J & 0 & 0 \\
0 & \alpha I - J & -\beta I \\
0 & \beta I & \alpha I - J
\end{pmatrix} \tag{8.14}
$$

so that (8.13) splits into two linear systems of dimension $n$ and $2n$, respectively.
Several ideas are possible to exploit the special structure of the $2n \times 2n$ submatrix.
The easiest and numerically most stable way has turned out to be the following:
transform the real subsystem of dimension $2n$ into an $n$-dimensional, complex
system

$$\bigl((\alpha + i\beta)I - J\bigr)(u + iv) = a + ib \tag{8.14'}$$

and apply simple Gaussian elimination. For machines without complex arithmetic,
one just has to modify the linear algebra routines. Then a complex multiplication
consists of 4 real multiplications and the amount of work for the solution of (8.14')
becomes approximately $4n^3/3$ operations. Thus the total work for system (8.14) is
about $5n^3/3$ operations. Compared to $(3n)^3/3$, which would be the number of
operations necessary for decomposing the untransformed matrix $I - hA \otimes J$ in (8.4),
we gain a factor of about 5 in arithmetical operations. Observe that the
transformations, such as $Z^k = (T \otimes I)W^k$, need only $O(n)$ additions and multiplications.
The gain is still more drastic for methods with more than 3 stages.
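
The equivalence between the real $2n$-dimensional subsystem of (8.14) and the complex $n$-dimensional system (8.14') can be checked numerically. The following is a minimal sketch with hypothetical helper names and dense NumPy solves; it illustrates the identity only and makes no attempt to reproduce the operation counts quoted above.

```python
import numpy as np


def solve_pair_complex(J, alpha, beta, a, b):
    """Solve ((alpha + i*beta) I - J)(u + i v) = a + i b, as in (8.14')."""
    n = J.shape[0]
    w = np.linalg.solve((alpha + 1j * beta) * np.eye(n) - J, a + 1j * b)
    return w.real, w.imag


def solve_pair_real(J, alpha, beta, a, b):
    """Solve the real 2n x 2n block of (8.14) directly, for comparison."""
    n = J.shape[0]
    I = np.eye(n)
    block = np.block([[alpha * I - J, -beta * I],
                      [beta * I, alpha * I - J]])
    uv = np.linalg.solve(block, np.concatenate([a, b]))
    return uv[:n], uv[n:]
```

Taking real and imaginary parts of (8.14') gives $(\alpha I - J)u - \beta v = a$ and $\beta u + (\alpha I - J)v = b$, which are exactly the two block rows used in `solve_pair_real`.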


Transformation to Hessenberg Form. For large systems with a full Jacobian $J$ a
further gain is possible by transforming $J$ to Hessenberg form

$$
S^{-1} J S = H =
\begin{pmatrix}
* & \cdots & * & * \\
* & \ddots & \vdots & \vdots \\
  & \ddots & * & * \\
  &        & * & *
\end{pmatrix} \tag{8.15}
$$

This procedure was originally proposed for multistep methods by Enright (1978)
and extended to the Runge-Kutta case by Varah (1979). With the code ELMHES,
taken from LINPACK, this is performed with $2n^3/3$ operations. Because the
multiplication of $S$ with a vector needs only $n^2/2$ operations (observe that $S$ is
triangular) the solution of (8.13) is found in $O(n^2)$ operations, if the Hessenberg matrix
$H$ is known. This transformation is especially advantageous if the Jacobian $J$ is
not changed during several steps.


Step Size Selection

One possibility to select the step sizes is Richardson extrapolation (cf. Sect. II.4).
We describe here the use of an embedded pair of methods, which is easier to
program and which makes the code more flexible. The following formulas are for the
special case of the 3-stage Radau IIA methods; the same ideas are applicable to
all implicit Runge-Kutta methods whose Runge-Kutta matrix has at least one real
eigenvalue.

Embedded Formula. Since our method is of optimal order, it is impossible to
embed it efficiently into one of still higher order. Therefore we search for a lower
order method of the form

$$\hat y_1 = y_0 + h\Bigl(\hat b_0 f(x_0, y_0) + \sum_{i=1}^{3} \hat b_i f(x_0 + c_i h, g_i)\Bigr) \tag{8.16}$$

where $g_1, g_2, g_3$ are the values obtained from the Radau IIA method and $\hat b_0 \ne 0$
(the choice $\hat b_0 = \gamma_0 = \hat\gamma^{-1}$, where $\hat\gamma$ is the real eigenvalue of the matrix $A^{-1}$,
again saves some multiplications). The difference

$$\hat y_1 - y_1 = \gamma_0 h f(x_0, y_0) + \sum_{i=1}^{3} (\hat b_i - b_i)\, h f(x_0 + c_i h, g_i),$$

which can also be written in the form

$$\hat y_1 - y_1 = \gamma_0 h f(x_0, y_0) + e_1 z_1 + e_2 z_2 + e_3 z_3, \tag{8.17}$$

then serves for error estimation. In order that $\hat y_1 - y_1 = O(h^4)$ the coefficients
have to satisfy

$$(e_1, e_2, e_3) = \frac{\gamma_0}{3}\bigl(-13 - 7\sqrt{6},\; -13 + 7\sqrt{6},\; -1\bigr). \tag{8.18}$$

Unfortunately, for $y' = \lambda y$ and $h\lambda \to \infty$ the difference (8.17) behaves like
$\hat y_1 - y_1 \approx \gamma_0 h\lambda y_0$, which is unbounded and therefore not suitable for stiff equations.
We propose (an idea of Shampine) to use instead

$$err = (I - h\gamma_0 J)^{-1}(\hat y_1 - y_1). \tag{8.19}$$

The LU-decomposition of $((h\gamma_0)^{-1} I - J)$ is available anyway from the previous
work, so that the computation of (8.19) is cheap. For $h \to 0$ we still have $err = O(h^4)$,
and for $h\lambda \to \infty$ (if $y' = \lambda y$ and $J = \lambda$) we obtain $err \to -1$.
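
The behaviour of (8.17) and (8.19) on the test equation can be reproduced with a few lines. The sketch below is an illustration only: it assumes the standard coefficients of the 3-stage Radau IIA method, computes $\gamma_0$ numerically from $A^{-1}$, and takes $y_0 = 1$ and $J = \lambda$; the names `estimates`, `GAMMA0` and `E` are introduced here and do not come from RADAU5.

```python
import numpy as np

SQ6 = np.sqrt(6.0)
# Runge-Kutta matrix of the 3-stage Radau IIA method (order 5).
A = np.array([
    [(88 - 7 * SQ6) / 360, (296 - 169 * SQ6) / 1800, (-2 + 3 * SQ6) / 225],
    [(296 + 169 * SQ6) / 1800, (88 + 7 * SQ6) / 360, (-2 - 3 * SQ6) / 225],
    [(16 - SQ6) / 36, (16 + SQ6) / 36, 1 / 9],
])


def real_eigenvalue_of_inverse(matrix):
    """Return the real eigenvalue of matrix^{-1} (the one with zero imaginary part)."""
    eig = np.linalg.eigvals(np.linalg.inv(matrix))
    return float(eig[np.argmin(np.abs(eig.imag))].real)


GAMMA0 = 1.0 / real_eigenvalue_of_inverse(A)
E = GAMMA0 / 3.0 * np.array([-13 - 7 * SQ6, -13 + 7 * SQ6, -1.0])   # (8.18)


def estimates(hlam, y0=1.0):
    """Return (raw, filtered): (8.17) and (8.19) for y' = lam*y with J = lam."""
    # The stage increments z solve (I - h*lam*A) z = h*lam*A*(1,1,1)^T * y0.
    z = np.linalg.solve(np.eye(3) - hlam * A, hlam * (A @ np.ones(3)) * y0)
    raw = GAMMA0 * hlam * y0 + E @ z          # difference (8.17)
    filtered = raw / (1.0 - hlam * GAMMA0)    # filtered estimate (8.19)
    return raw, filtered
```

Halving a small `hlam` should reduce `raw` by a factor close to 16, in line with the $O(h^4)$ behaviour, while for very large negative `hlam` the unfiltered value grows without bound and the filtered one approaches $-1$.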

This behaviour (for $h\lambda \to \infty$) is already much better than that for $\hat y_1 - y_1$,
but it is not good enough to avoid the “hump” phenomenon, described in
Sect. IV.7. In the first step and after every rejected step for which $\|err\| > 1$, we
therefore use instead of (8.19) the expression

$$\widetilde{err} = (I - h\gamma_0 J)^{-1}\bigl(\gamma_0 h f(x_0, y_0 + err) + e_1 z_1 + e_2 z_2 + e_3 z_3\bigr) \tag{8.20}$$

for step size prediction. This requires one additional function evaluation, but
satisfies $\widetilde{err} \to 0$ for $h\lambda \to \infty$, as does the error of the numerical solution.


Standard Step Size Controller. Since the expressions (8.19) and (8.20) behave
like $O(h^4)$ for $h \to 0$, the standard step size prediction leads to

$$h_{new} = fac \cdot h_{old} \cdot \|err\|^{-1/4} \tag{8.21}$$

where

$$\|err\| = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\Bigl(\frac{err_i}{sc_i}\Bigr)^2}$$

and $sc_i = Atol_i + \max(|y_{0i}|, |y_{1i}|) \cdot Rtol_i$ as in (4.11) of Sect. II.4. Here, the
safety factor $fac$ is proposed to depend on $Newt$, the number of Newton iterations
of the current step, and on the maximal number of Newton iterations $k_{max}$, say, as
$fac = 0.9 \times (2k_{max} + 1)/(2k_{max} + Newt)$.

In order to save LU-decompositions of the matrix (8.14), we also include the
following strategy: if no Jacobian is recomputed and if the step size $h_{new}$, defined
by (8.21), satisfies

$$c_1 h_{old} \le h_{new} \le c_2 h_{old} \tag{8.22}$$

with, say, $c_1 = 1.0$ and $c_2 = 1.2$, then we retain $h_{old}$ for the following step.

Predictive Controller. The step size prediction by formula (8.21) has the
disadvantage that step size reductions by more than the factor $fac$ are not possible without
step rejections (observe that $h_{new} < fac \cdot h_{old}$ implies $\|err\| > 1$). For stiff
differential equations, however, a rapid decrease of the step size is often required (see
for example the situation of Fig. 8.1, where the step size drops from $10^{-2}$ to $10^{-7}$
within a very small time interval). Denoting by $err_{n+1}$ the error expression (8.19)
(or (8.20)), computed in the $n$th step with step size $h_n$, step size predictions are
typically derived from the asymptotic formula

$$\|err_{n+1}\| = C_n h_n^4. \tag{8.23}$$

The strategy (8.21) is based on the additional assumption $C_{n+1} \approx C_n$, which, as
we have seen, is not always very realistic.

A careful control-theoretic study of step size strategies has been undertaken by
Gustafsson (1994). He came to the conclusion that a better model is to assume that
$\log C_n$ is a linear function of $n$. This means that $\log C_{n+1} - \log C_n$ is constant
or, equivalently,

$$C_{n+1}/C_n \approx C_n/C_{n-1}. \tag{8.24}$$

Inserting $C_n$ and $C_{n-1}$ from (8.23) and $C_{n+1}$ from $1 = C_{n+1} h_{new}^4$ into (8.24)
yields

$$h_{new} = fac \cdot h_n \Bigl(\frac{1}{\|err_{n+1}\|}\Bigr)^{1/4} \cdot \frac{h_n}{h_{n-1}} \Bigl(\frac{\|err_n\|}{\|err_{n+1}\|}\Bigr)^{1/4}. \tag{8.25}$$

In our code RADAU5 we take the minimum of the two step sizes (8.21) and (8.25).
For the problem considered in Fig. 8.1, this new strategy reduces the number of
rejected steps from 27 to 7.
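
A minimal sketch of the two predictions and their combination is given below. The function names and the default `kmax=7` are assumptions of this illustration; the safety factor $fac$ is applied to both formulas, and the argument `err_old` stands for $\|err_n\|$, `err_new` for $\|err_{n+1}\|$.

```python
def safety_factor(newt, kmax=7):
    """fac = 0.9 * (2*kmax + 1) / (2*kmax + Newt)."""
    return 0.9 * (2 * kmax + 1) / (2 * kmax + newt)


def standard_step(h_old, err_new, fac):
    """Standard prediction (8.21)."""
    return fac * h_old * err_new ** (-0.25)


def predictive_step(h_n, h_prev, err_new, err_old, fac):
    """Predictive controller (8.25)."""
    return (fac * h_n * (1.0 / err_new) ** 0.25
            * (h_n / h_prev) * (err_old / err_new) ** 0.25)


def new_step_size(h_n, h_prev, err_new, err_old, newt, kmax=7):
    """Minimum of (8.21) and (8.25), as described for RADAU5 in the text."""
    fac = safety_factor(newt, kmax)
    return min(standard_step(h_n, err_new, fac),
               predictive_step(h_n, h_prev, err_new, err_old, fac))
```

For example, with $h_n = 0.1$, $h_{n-1} = 0.2$, $\|err_n\| = 1$, $\|err_{n+1}\| = 16$ and $Newt = 1$, formula (8.21) proposes $0.045$ whereas (8.25) proposes $0.01125$: the predictive controller reacts to the fact that the error grew although the step size had already been halved.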

Numerical Study of the Step-Control Mechanism. As a representative example
we choose the van der Pol equation (1.5') with $\varepsilon = 10^{-6}$, initial values $y_1(0) = 2$,
$y_2(0) = -0.6$ and integration interval $0 \le x \le 2$. Fig. 8.1 shows four pictures.
The first one presents the solution $y_1(x)$ with all accepted integration steps for
$Atol = Rtol = 10^{-4}$. Below this, the step sizes obtained by RADAU5 are plotted as
function of $x$. The solid line represents the accepted steps. The rejected steps are
indicated by ×'s. Observe the very small step sizes which are required in the rapid
transients between the smooth parts of the solution. The lowest two pictures give
the number of Newton iterations needed for solving the nonlinear system (8.2a),
once as function of $x$, and once as function of the step-number. The last picture
also indicates the steps where the Jacobian has been recomputed.

Another numerical experiment (Fig. 8.2) illustrates the quality of the error
estimates. We applied the code RADAU5 with $Atol = Rtol = 10^{-4}$ and initial step
size $h = 10^{-4}$ to the above problem and plotted at several chosen points of the
numerical solution

a) the exact local error (marked by small circles)

b) the estimates (8.19) and (8.20) (marked by ^ and X respectively)

as functions of $h$. The large symbols indicate the position of the actually used step
size. $Newt$ is the number of required Newton iterations.

Fig. 8.2. Exact local error and the estimates (8.19) and (8.20)

It is interesting to note that the local error behaves like $O(h^6)$ (straight line of
slope 6) only for $h \le \varepsilon$ and for large $h$. Between these regions, the local error grows
like $O(h^{-1})$ with decreasing $h$. This is the only region where the error estimate
(8.20) is significantly better than (8.19). Therefore, we use the more expensive
estimator (8.20) only in the first and after each rejected step. In any case, both error
estimators are always above the actual local error, so that the code usually produces
very precise results.


Implicit Differential Equations 


Many applications (such as space discretizations of parabolic differential
equations) lead to systems of the form

$$My' = f(x, y), \qquad y(x_0) = y_0 \tag{8.26}$$

with a constant matrix $M$. For such problems we formally replace all $f$'s by
$M^{-1}f$ and multiply the resulting equations by $M$. Formulas (8.13) and (8.19)
then have to be replaced by

$$(h^{-1}\Lambda \otimes M - I \otimes J)\,\Delta W^k = -h^{-1}(\Lambda \otimes M)W^k + (T^{-1} \otimes I)F\bigl((T \otimes I)W^k\bigr) \tag{8.13a}$$

$$err = \bigl((h\gamma_0)^{-1}M - J\bigr)^{-1}\Bigl(f(x_0, y_0) + (h\gamma_0)^{-1}M(e_1 z_1 + e_2 z_2 + e_3 z_3)\Bigr). \tag{8.19a}$$

Here the matrix $J$ is again an approximation to $\partial f/\partial y$. These formulas may
even be applied to certain problems (8.26) with singular $M$ (for more details see
Chapters VI and VII).

Solving the linear system (8.13a) is done by a decomposition of the matrix (see
(8.14), (8.14'))

$$
\begin{pmatrix}
\gamma M - J & 0 \\
0 & (\alpha + i\beta)M - J
\end{pmatrix} \tag{8.27}
$$

If $M$ and $J$ are banded or sparse, the matrices $\gamma M - J$ and $(\alpha + i\beta)M - J$
remain banded or sparse, respectively. The code RADAU5 of the appendix has options
for banded structures.


An SDIRK-Code


We have also coded, using many of the above ideas, the SDIRK formula (6.16)
together with the global solution (6.17). For this method too, it was very
important to replace the error estimator $\hat y_1 - y_1$ by (8.19).

Here, in contrast to fully implicit Runge-Kutta methods, one can treat the stages
one after the other. Such a serial computation has the advantage that the information
of the already computed stages can be used for a good choice of the starting values
for the Newton iterations in the subsequent stages. For example, suppose that

$$
\begin{aligned}
z_1 &= \gamma h f(x_0 + \gamma h, y_0 + z_1) \\
z_2 &= \gamma h f(x_0 + c_2 h, y_0 + z_2) + a_{21} h f(x_0 + \gamma h, y_0 + z_1)
\end{aligned}
$$

are already available. Since for all $i$

$$z_i = c_i h f(x_0, y_0) + \Bigl(\sum_j a_{ij} c_j\Bigr) h^2 (f_x + f_y f)(x_0, y_0) + O(h^3),$$

by solving

$$
\begin{pmatrix} c_1 & c_2 \\ \sum_j a_{1j} c_j & \sum_j a_{2j} c_j \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
=
\begin{pmatrix} c_3 \\ \sum_j a_{3j} c_j \end{pmatrix}
$$

one finds $\alpha_1$, $\alpha_2$ such that

$$\alpha_1 z_1 + \alpha_2 z_2 = z_3 + O(h^3).$$

The expression $\alpha_1 z_1 + \alpha_2 z_2$ then serves as starting value for the
computation of $z_3$. In the last stage one can take $\hat y_1$, which is then available, for starting
the Newton iterations for $g_5 = y_1$. The computation of $z_3$, $z_4$, $y_1$, done in this
way, needs few Newton iterations and a failure of convergence is usually already
detected in the first stage.

However, when parallel processors are available, the exploitation of the
triangular structure of the Runge-Kutta matrix may be less desirable. Whereas in the
iteration (8.13) all $s$ function evaluations and much of the linear algebra can be
done in parallel, this is no longer possible for DIRK-methods, when $z_1,\ldots,z_k$ are
used in the computation of $z_{k+1}$.


SIRK-Methods 


The fact that singly-implicit methods have a coefficient matrix 
with a one-point spectrum is the key to reducing the operation 
count for these methods to the level which prevails in linear mul¬ 
tistep methods. 

(J.C. Butcher, K. Burrage & F.H. Chipman 1980) 

In order to avoid the difficulties (in writing a Runge-Kutta code) caused by the
complex eigenvalues of the Runge-Kutta matrix $A$, one may look for methods with real
eigenvalues, especially with a single $s$-fold real eigenvalue. Such methods were
introduced by Nørsett (1976). Burrage (1978) provided them with error estimators,
and codes in ALGOL and FORTRAN are presented in Butcher, Burrage &
Chipman (1980). The basic methods for their code STRIDE are given by the following
lemma.


Lemma 8.1. For collocation methods (i.e., for Runge-Kutta methods satisfying
condition $C(s)$ of Sect. IV.5), we have

$$\det(I - zA) = (1 - \gamma z)^s \tag{8.28}$$

if and only if

$$c_i = \gamma x_i, \qquad i = 1,\ldots,s, \tag{8.29}$$

where $x_1,\ldots,x_s$ are the zeros of the Laguerre polynomial $L_s(x)$ (cf. Formula
(6.11)).

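Proof. The polynomial $\det(I - zA)$ is the denominator of the stability function
(Formula (3.3)), so that by Theorem 3.10

$$M^{(s)}(0) + M^{(s-1)}(0)z + \ldots + M(0)z^s = (1 - \gamma z)^s \tag{8.30}$$

with $M(x)$ given by (3.25). Computing $M^{(j)}(0)$ from (8.30) we obtain

$$\frac{1}{s!}\prod_{i=1}^{s}(x - c_i) = M(x) = \sum_{j=0}^{s} M^{(j)}(0)\frac{x^j}{j!} = \sum_{j=0}^{s}\binom{s}{j}(-\gamma)^{s-j}\frac{x^j}{j!} = (-\gamma)^s L_s\Bigl(\frac{x}{\gamma}\Bigr)$$

which leads to (8.29). □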


The stability function of the method of Lemma 8.1 has been studied in
Sections IV.4 (multiple real-pole approximations) and IV.6. We have further seen
(Proposition 3.8) that $R(\infty) = 0$ when $x_0 + h$ is a collocation point. This means
that $c_q = 1$ or $\gamma = 1/x_q$ for $q \in \{1,\ldots,s\}$ where $0 < x_1 < \ldots < x_s$ are the zeros of
$L_s(x)$. However, if we want A-stable methods, Theorem 4.25 restricts this point
to be in the middle (more precisely: $q = s/2$ or $s/2 + 1$ for $s$ even, $q = (s+1)/2$
for $s$ odd). An apparently undesirable consequence of this is that many of the
collocation points lie outside the integration interval (for example, for $s = 5$ and $q = 3$
we have $c_1 = 0.073$, $c_2 = 0.393$, $c_3 = 1$, $c_4 = 1.970$, $c_5 = 3.515$).
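
The nodes quoted for $s = 5$, $q = 3$, and the one-point spectrum (8.28), can be checked with the following sketch. It assumes NumPy's `laggauss` routine for the zeros of $L_s$ and builds the collocation matrix from condition $C(s)$ via a Vandermonde system; the function names are hypothetical and the construction is only meant for small $s$.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss


def sirk_nodes(s, q):
    """Return (gamma, c) with c_i = gamma * x_i and gamma = 1 / x_q, see (8.29)."""
    x, _ = laggauss(s)                 # zeros of L_s in ascending order
    gamma = 1.0 / x[q - 1]
    return gamma, gamma * x


def collocation_matrix(c):
    """Runge-Kutta matrix A defined by C(s): sum_j a_ij c_j^(q-1) = c_i^q / q."""
    s = len(c)
    V = np.vander(c, s, increasing=True)          # V[j, q-1] = c_j**(q-1)
    Cq = np.array([[ci ** q / q for q in range(1, s + 1)] for ci in c])
    return Cq @ np.linalg.inv(V)


def spectrum_defect(s, q, z):
    """|det(I - zA) - (1 - gamma*z)^s|, which should vanish by Lemma 8.1."""
    gamma, c = sirk_nodes(s, q)
    A = collocation_matrix(c)
    return abs(np.linalg.det(np.eye(s) - z * A) - (1.0 - gamma * z) ** s)
```

Calling `sirk_nodes(5, 3)` gives nodes that agree with the values above to three decimals, and `spectrum_defect(5, 3, z)` stays at rounding level for moderate values of `z`.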

Since these methods with $\gamma = 1/x_q$ are of order $p = s$ only, it is easy to embed
them into a method of higher order. Burrage (1978) added a further stage

$$g_{s+1} = y_0 + h\sum_{j=1}^{s+1} a_{s+1,j}\, f(x_0 + c_j h, g_j)$$

where $c_{s+1}$ and $a_{s+1,s+1}$ are arbitrary and the other $a_{s+1,j}$ are determined so
that the $(s+1)$-stage method satisfies $C(s)$ too. In order to avoid a new
LU-decomposition we choose $a_{s+1,s+1} = \gamma$. The coefficient $c_{s+1}$ is fixed arbitrarily
as $c_{s+1} = 0$. We then find a unique method

$$\hat y_1 = y_0 + h\sum_{j=1}^{s+1} \hat b_j\, f(x_0 + c_j h, g_j)$$

of order $s + 1$ by computing the coefficients of the interpolatory quadrature rule.
An explicit formula for the matrix $T$ which transforms the Runge-Kutta matrix
$A$ to Jordan canonical form and $A^{-1}$ to a very simple lower triangular matrix $\Lambda$
is given in Exercise 1. It can be used for economically solving the linear system
(8.13).


Exercises 


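1. (Butcher 1979). For the collocation method with $c_1,\ldots,c_s$ given by (8.29)
prove that (e.g. for $s = 4$)

$$
T^{-1}AT = \gamma
\begin{pmatrix}
1 & & & \\
-1 & 1 & & \\
 & -1 & 1 & \\
 & & -1 & 1
\end{pmatrix},
\qquad
T^{-1}A^{-1}T = \frac{1}{\gamma}
\begin{pmatrix}
1 & & & \\
1 & 1 & & \\
1 & 1 & 1 & \\
1 & 1 & 1 & 1
\end{pmatrix}
$$

where the transformation $T$ satisfies

$$T = \bigl(L_{j-1}(x_i)\bigr)_{i,j=1}^{s}$$

and $L_{j-1}(x)$ are the Laguerre polynomials.

Hint. Use the identities

$$L_n'(x) = L_{n-1}'(x) - L_{n-1}(x), \qquad L_n(x) = L_{n-1}(x) + \frac{x}{n}L_n'(x)$$

and the Christoffel-Darboux formula

$$\sum_{j=0}^{n} L_j(x)L_j(y) = \frac{n+1}{y - x}\bigl(L_{n+1}(x)L_n(y) - L_{n+1}(y)L_n(x)\bigr)$$

which, in the limit $y \to x$, becomes

$$\sum_{j=0}^{n}\bigl(L_j(x)\bigr)^2 = (n+1)\bigl(L_{n+1}(x)L_n'(x) - L_{n+1}'(x)L_n(x)\bigr).$$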



IV.9 Extrapolation Methods 


It seems that a suitable version of an IEM (implicit extrapolation 
method) which takes care of these difficulties may become a very 
strong competitor to any of the general discretization methods for 
stiff systems presently known. 

(the very last sentence of Stetter’s book, 1973) 


Extrapolation of explicit methods is an interesting approach to solving nonstiff
differential equations (see Sect. II.9). Here we show to what extent the idea of
extrapolation can also be used for stiff problems. We shall use the results of Sect. II.8
for the existence of asymptotic expansions and apply them to the study of those
implicit and linearly implicit methods which seem to be most suitable for the
computation of stiff differential equations. Our theory here is restricted to classical
$h \to 0$ order, the study of stability domains and A-stability.

A big difficulty, however, is the fact that the coefficients and remainders of the
asymptotic expansion can explode with increasing stiffness and the $h$-interval, for
which the expansion is meaningful, may tend to zero. Bounds on the remainder
which hold uniformly for a class of arbitrarily stiff problems will be discussed
later in Sect. VI.5.


Extrapolation of Symmetric Methods 


It is most natural to look first for symmetric one-step methods as the basic
integration scheme. Promising candidates are the trapezoidal rule

$$y_{i+1} = y_i + \frac{h}{2}\bigl(f(x_i, y_i) + f(x_{i+1}, y_{i+1})\bigr) \tag{9.1}$$

and the implicit mid-point rule

$$y_{i+1} = y_i + h f\Bigl(x_i + \frac{h}{2}, \frac{1}{2}(y_{i+1} + y_i)\Bigr). \tag{9.2}$$

We take some step-number sequence $n_1 < n_2 < n_3 < \ldots$, set $h_j = H/n_j$ and
define

$$T_{j1} = y_{h_j}(x_0 + H), \tag{9.3}$$

the numerical solution obtained by performing $n_j$ steps with step size $h_j$. As
described in Sect. II.9 we extrapolate these values according to

$$T_{j,k+1} = T_{j,k} + \frac{T_{j,k} - T_{j-1,k}}{(n_j/n_{j-k})^2 - 1}. \tag{9.4}$$

Fig. 9.1. Stability domains for the extrapolated trapezoidal rule


This provides an extrapolation tableau

$$
\begin{array}{llll}
T_{11} & & & \\
T_{21} & T_{22} & & \\
T_{31} & T_{32} & T_{33} & \\
\ldots & \ldots & \ldots & \ldots
\end{array} \tag{9.5}
$$

all entries of which represent diagonally implicit Runge-Kutta methods (see
Exercise 1). Due to the symmetry of the basic schemes (9.1) and (9.2), $T_{jk}$ is a
DIRK-method of order $2k$. In order to study the stability properties of these
methods, we apply them to the test equation $y' = \lambda y$. For both methods, (9.1) and (9.2),
we obtain

$$y_{i+1} = \frac{1 + \frac{h\lambda}{2}}{1 - \frac{h\lambda}{2}}\, y_i$$

so that the stability function $R_{jk}(z)$ of the method $T_{jk}$ is given recursively by
($z = H\lambda$)

$$R_{j1}(z) = \Bigl(\frac{1 + \frac{z}{2n_j}}{1 - \frac{z}{2n_j}}\Bigr)^{n_j}, \tag{9.6a}$$

$$R_{j,k+1}(z) = R_{j,k}(z) + \frac{R_{j,k}(z) - R_{j-1,k}(z)}{(n_j/n_{j-k})^2 - 1}. \tag{9.6b}$$

Already Dahlquist (1963) noticed that for $n_1 = 1$ and $n_2 = 2$ we have

$$R_{22}(z) = \frac{1}{3}\Bigl(4\Bigl(\frac{1 + \frac{z}{4}}{1 - \frac{z}{4}}\Bigr)^2 - \frac{1 + \frac{z}{2}}{1 - \frac{z}{2}}\Bigr) \to \frac{5}{3} \quad \text{for } z \to \infty, \tag{9.7}$$

an undesirable property when solving stiff problems. Stetter (1973) proposed
taking only even or only odd numbers in the step-number sequence $\{n_j\}$. Then,
all stability functions of the extrapolation tableau tend for $z \to \infty$ to $1$ or $-1$,
respectively. But even in this situation extrapolation immediately destroys the
A-stability of the underlying scheme (Exercise 2). Fig. 9.1 shows the stability domains
$\{z\,;\ |R_{kk}(z)| < 1\}$ for the sequence $\{1, 3, 5, 7, 9, \ldots\}$.
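
The recursion (9.6a), (9.6b) is easy to evaluate numerically, which makes the limit in (9.7) visible. The function below is a small sketch with a hypothetical name; it accepts real or complex `z` and returns the lower triangular tableau as a list of rows, so that `R[j-1][k-1]` corresponds to $R_{jk}(z)$.

```python
def trapezoidal_stability_tableau(z, n_seq):
    """Stability functions R_jk(z) of the extrapolated trapezoidal rule."""
    R = []
    for j, nj in enumerate(n_seq):
        # First column: n_j trapezoidal steps of size H / n_j, formula (9.6a).
        row = [((1 + z / (2 * nj)) / (1 - z / (2 * nj))) ** nj]
        for k in range(1, j + 1):
            ratio = (nj / n_seq[j - k]) ** 2
            # Aitken-Neville step in h^2, formula (9.6b).
            row.append(row[k - 1] + (row[k - 1] - R[j - 1][k - 1]) / (ratio - 1))
        R.append(row)
    return R
```

With `n_seq = [1, 2]` and a large negative `z`, the entry `R[1][1]` is close to $5/3$; with the odd sequence `[1, 3]` it is close to $-1$, in agreement with the remark on Stetter's proposal.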


Smoothing 


Some numerical examples reveal the power of the smoothing com¬ 
bined with extrapolation. (B. Lindberg 1971) 

Another possibility to overcome the difficulty encountered in (9.7) is smoothing
(Lindberg 1971). The idea is to replace the definition (9.3) by Gragg's smoothing
step

$$T_{j1} = S_{h_j}(x_0 + H), \tag{9.8}$$

$$S_h(x) = \frac{1}{4}\bigl(y_h(x - h) + 2y_h(x) + y_h(x + h)\bigr). \tag{9.9}$$

With $y_h(x)$, $S_h(x)$ also possesses an asymptotic expansion in even powers of $h$.
Therefore, extrapolation according to (9.4) is justified. For the stability function of
$T_{j1}$ we now obtain

$$R_{j1}(z) = \frac{1}{\bigl(1 - \frac{z}{2n_j}\bigr)^2}\Bigl(\frac{1 + \frac{z}{2n_j}}{1 - \frac{z}{2n_j}}\Bigr)^{n_j - 1} \tag{9.10}$$

which is an L-stable approximation to the exponential function. The stability
functions $R_{jk}(z)$ (obtained from (9.6b)) all satisfy $R_{jk}(z) = O(z^{-2})$ for $z \to \infty$. For
the step-number sequence

$$\{n_j\} = \{1, 2, 3, 4, 5, 6, 7, \ldots\} \tag{9.11}$$

the stability domains of $R_{kk}(z)$ are plotted in Fig. 9.2.

Fig. 9.2. Stability domains of $R_{kk}(z)$


The Linearly Implicit Mid-Point Rule 

Extrapolation codes based on fully implicit methods are difficult to implement
efficiently. After extensive numerical computations, G. Bader and P. Deuflhard (1983)
found that a linearly implicit (Rosenbrock-type) extension of the GBS method of
Sect. II.9 gave promising results for stiff equations. This method is based on a
two-step algorithm, since one-step Rosenbrock methods (7.4) cannot be symmetric for
nonlinear differential equations.

The motivation for the Bader & Deuflhard method is based on Lawson's
transformation (Lawson 1967)

$$y(x) = e^{Jx} \cdot c(x), \tag{9.12}$$

where it is hoped that the matrix $J \approx f'(y)$ will neutralize the stiffness.
Differentiation gives

$$c' = e^{-Jx} \cdot g(x, e^{Jx}c) \qquad \text{with} \qquad g(x, y) = f(x, y) - Jy. \tag{9.13}$$

We now solve (9.13) by the Gragg algorithm (II.9.13b)

$$c_{i+1} = c_{i-1} + 2h\, e^{-Jx_i} \cdot g(x_i, e^{Jx_i}c_i)$$

and obtain by back-substitution of (9.12)

$$e^{-hJ} y_{i+1} = e^{hJ} y_{i-1} + 2h\, g(x_i, y_i). \tag{9.14}$$

For evident reasons of computational ease we now replace $e^{\pm hJ}$ by the
approximations $I \pm hJ$ and obtain, adding an appropriate starting and final smoothing
step,

$$(I - hJ)\,y_1 = y_0 + h\, g(x_0, y_0) \tag{9.15a}$$

$$(I - hJ)\,y_{i+1} = (I + hJ)\,y_{i-1} + 2h\, g(x_i, y_i) \tag{9.15b}$$

$$S_h(x) = \frac{1}{2}(y_{2m-1} + y_{2m+1}) \qquad \text{where } x = x_0 + 2mh. \tag{9.15c}$$

Substituting finally $g$ from (9.13), we arrive at (with $x = x_0 + 2mh$, $x_i = x_0 + ih$)

$$(I - hJ)(y_1 - y_0) = h f(x_0, y_0) \tag{9.16a}$$

$$(I - hJ)(y_{i+1} - y_i) = -(I + hJ)(y_i - y_{i-1}) + 2h f(x_i, y_i) \tag{9.16b}$$

$$S_h(x) = \frac{1}{2}(y_{2m-1} + y_{2m+1}) \tag{9.16c}$$

where $J$ stands for some approximation to the Jacobian $\frac{\partial f}{\partial y}(x_0, y_0)$. Putting $J = 0$,
Formulas (9.16a) and (9.16b) become equivalent to those of the GBS method. The
scheme (9.16b) is the linearly implicit (or semi-implicit) mid-point rule, Formula
(9.16a) the linearly implicit Euler method.
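
One basic step of (9.16a,b,c) over an interval of length $H$ with $n = 2m$ substeps may be sketched as follows. The function name and the dense `np.linalg.solve` calls are assumptions of this illustration; an actual code would factorize $I - hJ$ once per step size.

```python
import numpy as np


def linearly_implicit_midpoint(f, J, x0, y0, H, n):
    """Return S_h(x0 + H) from (9.16a,b,c) with h = H / n and n = 2m even."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    h = H / n
    I = np.eye(len(y0))
    M = I - h * J
    y_prev = np.array(y0, dtype=float)
    # Starting step: linearly implicit Euler method (9.16a).
    y_cur = y_prev + np.linalg.solve(M, h * f(x0, y_prev))
    for i in range(1, n + 1):
        # Linearly implicit mid-point rule (9.16b), producing y_{i+1}.
        rhs = -(I + h * J) @ (y_cur - y_prev) + 2 * h * f(x0 + i * h, y_cur)
        y_next = y_cur + np.linalg.solve(M, rhs)
        if i == n:
            # Smoothing step (9.16c): average of y_{2m-1} and y_{2m+1}.
            return 0.5 * (y_prev + y_next)
        y_prev, y_cur = y_cur, y_next
```

For the test equation $y' = \lambda y$ with $J = \lambda$, (9.16b) collapses to $(1 - h\lambda)y_{i+1} = (1 + h\lambda)y_{i-1}$, so the result of the sketch can be compared with the closed form $(1 - h\lambda)^{-2}\bigl((1 + h\lambda)/(1 - h\lambda)\bigr)^{m-1}y_0$.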

Theorem 9.1 (Bader & Deuflhard 1983). Let $f(x, y)$ be sufficiently often
differentiable and let $J$ be an arbitrary matrix; then the numerical solution defined by
(9.16a,b,c) possesses an asymptotic expansion of the form

$$y(x) - S_h(x) = \sum_{j=1}^{l} e_j(x)h^{2j} + h^{2l+2}C(x, h) \tag{9.17}$$

where $C(x, h)$ is bounded for $x_0 \le x \le \bar x$ and $0 \le h \le h_0$. For $J \ne 0$ we have in
general $e_j(x_0) \ne 0$.

Proof. As in Stetter's proof for the GBS algorithm we introduce the variables

$$
\begin{aligned}
h^* &= 2h, \qquad x_k^* = x_0 + kh^*, \qquad u_0 = v_0 = y_0, \qquad u_k = y_{2k}, \\
v_k &= (I - hJ)y_{2k+1} + hJy_{2k} - hf(x_{2k}, y_{2k}) \\
    &= (I + hJ)y_{2k-1} - hJy_{2k} + hf(x_{2k}, y_{2k}).
\end{aligned} \tag{9.18}
$$

Method (9.16a,b) can then be rewritten as

$$
\begin{pmatrix} u_{k+1} \\ v_{k+1} \end{pmatrix}
=
\begin{pmatrix} u_k \\ v_k \end{pmatrix}
+ h^*
\begin{pmatrix}
f\bigl(x_k^* + \frac{h^*}{2}, y_{2k+1}\bigr) - Jy_{2k+1} + J\,\frac{u_{k+1} + u_k}{2} \\[1ex]
\frac{1}{2}\bigl(f(x_k^* + h^*, u_{k+1}) + f(x_k^*, u_k)\bigr) + Jy_{2k+1} - J\,\frac{u_{k+1} + u_k}{2}
\end{pmatrix} \tag{9.19}
$$

where, from (9.18), we obtain the symmetric representation

$$y_{2k+1} = \frac{v_{k+1} + v_k}{2} + \frac{h^*}{4}J(u_{k+1} - u_k) - \frac{h^*}{4}\bigl(f(x_k^* + h^*, u_{k+1}) - f(x_k^*, u_k)\bigr).$$

The symmetry of (9.19) is illustrated in Fig. 9.3 and can be checked analytically
by exchanging $u_{k+1} \leftrightarrow u_k$, $v_{k+1} \leftrightarrow v_k$, $h^* \leftrightarrow -h^*$, and $x_k^* \leftrightarrow x_k^* + h^*$. Method
(9.19) is consistent with the differential equation

$$
\begin{aligned}
u' &= f(x, v) - J(v - u), & u(x_0) &= y_0 \\
v' &= f(x, u) + J(v - u), & v(x_0) &= y_0
\end{aligned}
$$

whose exact solution is $u(x) = v(x) = y(x)$, where $y(x)$ is the solution of the
original equation $y' = f(x, y)$. Applying Theorem II.8.10 we obtain

$$
\begin{aligned}
y(x) - u_{h^*}(x) &= \sum_{j=1}^{l} a_j(x)h^{2j} + h^{2l+2}A(x, h) \\
y(x) - v_{h^*}(x) &= \sum_{j=1}^{l} b_j(x)h^{2j} + h^{2l+2}B(x, h)
\end{aligned} \tag{9.20}
$$

with $a_j(x_0) = b_j(x_0) = 0$. With the help of Formulas (9.18) we can express the
numerical solution (9.16c) in terms of $u_m$ and $v_m$ as follows:

$$\frac{1}{2}(y_{2m+1} + y_{2m-1}) = (I - h^2J^2)^{-1}\bigl(v_m + h^2J(f(x_{2m}, u_m) - Ju_m)\bigr),$$

and we obtain for $x = x_0 + 2mh$,

$$y(x) - S_h(x) = (I - h^2J^2)^{-1}\Bigl(y(x) - v_{h^*}(x) - h^2J\bigl(f(x, u_{h^*}(x)) + J(y(x) - u_{h^*}(x))\bigr)\Bigr).$$

Inserting the expansions (9.20) we find (9.17). □

As an application of this theorem we obtain an interesting theoretical result on
the existence of W-methods (7.4) (with inexact Jacobian). We saw in Volume I
(Exercise 1 of Sect. II.9 and Theorem II.9.4) that the $T_{jk}$ of the extrapolated GBS
method represent explicit Runge-Kutta methods. By analogy, it is not difficult
to guess that the $T_{jk}$ for the above linearly implicit mid-point rule represent
W-methods (more details in Exercise 3) and we have the following existence result for
such methods.

Theorem 9.2. For $p$ even, there exists a W-method (7.4) of order $p$ with
$s = p(p + 2)/4$ stages.

Proof. It follows from (9.20) that for $x = x_0 + 2mh$ the numerical solution
$y_h(x) = y_{2m}$ possesses an $h^2$-expansion of the form (9.17) with $e_j(x_0) = 0$. Therefore,
extrapolation yields W-methods of order $2k$ (in the $k$-th column). The result
follows by taking $\{n_j\} = \{2, 4, 6, 8, 10, 12, \ldots\}$ and counting the number of
necessary function evaluations. □


Table 9.1. A(α)-stability of extrapolated linearly implicit mid-point rule

    90°
    90°     90°
    90°     90°     90°
    90°     89.34°  87.55°  87.34°
    90°     88.80°  86.87°  86.10°  86.02°
    90°     88.49°  87.30°  86.61°  86.36°  86.33°
    90°     88.43°  87.42°  87.00°  86.78°  86.70°  86.69°


For a stability analysis we apply the method (9.16) with $J = \lambda$ to the test
equation $y' = \lambda y$. In this case Formula (9.16b) reduces to

$$y_{i+1} = \frac{1 + h\lambda}{1 - h\lambda}\, y_{i-1}$$

and the numerical result is given by

$$S_h(x_0 + 2mh) = \frac{1}{(1 - h\lambda)^2}\Bigl(\frac{1 + h\lambda}{1 - h\lambda}\Bigr)^{m-1} y_0, \tag{9.21}$$

exactly the same as that obtained from the trapezoidal rule with smoothing (see
Formula (9.10)). We next have to choose a step-number sequence $\{n_j\}$. Clearly,
$n_j = 2m_j$ must be even. Bader & Deuflhard (1983) proposed taking only odd
numbers $m_j$, since then $S_h(x_0 + 2m_jh)$ in (9.21) has the same sign as the exact
solution $e^{\lambda 2m_jh}y_0$ for all real $h\lambda \le 0$. Consequently they were led to

$$\{n_j\} = \{2, 6, 10, 14, 22, 34, 50, \ldots\}. \tag{9.22}$$

Putting $T_{j1} = S_{h_j}(x_0 + H)$ with $h_j = H/n_j$ and defining $T_{jk}$ by (9.4) we obtain a
tableau of W-methods (7.4) (Exercise 3). By Theorem 9.1 the $k$-th column of this
tableau represents methods of order $2k - 1$ independent of the choice of $J$ (the
methods are not of order $2k$, since $e_j(x_0) \ne 0$ in (9.17)). The stability function of
$T_{j1}$ is given by

$$R_{j1}(z) = \frac{1}{\bigl(1 - \frac{z}{n_j}\bigr)^2}\Bigl(\frac{1 + \frac{z}{n_j}}{1 - \frac{z}{n_j}}\Bigr)^{n_j/2 - 1} \tag{9.23}$$

and those of $T_{jk}$ can be computed with the recursion (9.6b). An investigation of
the E-polynomial (3.8) for these rational functions shows that not only $T_{j1}$, but
also $T_{22}$, $T_{32}$ and $T_{33}$ are A-stable (Hairer, Bader & Lubich 1982). The angles of
A(α)-stability for some further elements in the extrapolation tableau are listed in
Table 9.1. Stability domains of $T_{kk}$ for $k = 2, 3, 4, 5, 6$ are plotted in Fig. 9.4.

Fig. 9.4. Stability domains of extrapolated linearly implicit mid-point rule

Implicit and Linearly Implicit Euler Method 


Why not consider also non-symmetric methods as basic integration schemes?
Deuflhard (1985) reports on experiments with extrapolation of the implicit Euler method

$$y_{i+1} = y_i + h f(x_{i+1}, y_{i+1}) \tag{9.24}$$

and of the linearly implicit Euler method

$$(I - hJ)(y_{i+1} - y_i) = h f(x_i, y_i), \tag{9.25}$$

where, again, $J$ is an approximation to $\frac{\partial f}{\partial y}(x_0, y_0)$. These methods are not
symmetric and have only an $h$-expansion of their global error. We therefore have to
extrapolate the numerical solutions at $x_0 + H$ according to

$$T_{j,k+1} = T_{j,k} + \frac{T_{j,k} - T_{j-1,k}}{(n_j/n_{j-k}) - 1}, \tag{9.26}$$

so that $T_{jk}$ represents a method of order $k$.
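
The extrapolation (9.26) applied to the linearly implicit Euler method (9.25) can be sketched in a few lines. The function names are hypothetical, the Jacobian approximation `J` is kept fixed over the whole basic step, and no step size or order control is attempted.

```python
import numpy as np


def linearly_implicit_euler(f, J, x0, y0, H, n):
    """Perform n steps of (9.25) with step size h = H / n."""
    h = H / n
    M = np.eye(len(y0)) - h * J
    y = np.array(y0, dtype=float)
    for i in range(n):
        y = y + np.linalg.solve(M, h * f(x0 + i * h, y))
    return y


def extrapolate_euler(f, J, x0, y0, H, n_seq):
    """Build the tableau T_jk of (9.26); T[j-1][k-1] corresponds to T_jk."""
    T = []
    for j, nj in enumerate(n_seq):
        row = [linearly_implicit_euler(f, J, x0, y0, H, nj)]
        for k in range(1, j + 1):
            ratio = nj / n_seq[j - k]
            # Aitken-Neville step in h (not h^2), formula (9.26).
            row.append(row[k - 1] + (row[k - 1] - T[j - 1][k - 1]) / (ratio - 1))
        T.append(row)
    return T
```

For $y' = -y$, $H = 0.1$ and the sequence $\{1, 2, 3, 4\}$, the error of the diagonal entries decreases rapidly from `T[0][0]` to `T[3][3]`, as expected from the order $k$ of $T_{jk}$.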

For both basic methods, (9.24) and (9.25), the stability function of $T_{jk}$ is the
same and defined recursively by

$$R_{j1}(z) = \Bigl(1 - \frac{z}{n_j}\Bigr)^{-n_j} \tag{9.27a}$$

$$R_{j,k+1}(z) = R_{j,k}(z) + \frac{R_{j,k}(z) - R_{j-1,k}(z)}{(n_j/n_{j-k}) - 1}. \tag{9.27b}$$

Taking the step-number sequence

$$\{n_j\} = \{1, 2, 3, 4, 5, 6, 7, \ldots\} \tag{9.28}$$

we have plotted in Fig. 9.5 the stability domains of $R_{kk}(z)$ (left picture) and
$R_{k,k-1}(z)$ (right picture). All these methods are seen to be A(α)-stable with α
close to 90°. The values of α (computed numerically) for $R_{jk}(z)$ with $j \le 8$ are
given in Table 9.2.

We shall see in the chapter on differential algebraic systems that it is preferable
to use the first subdiagonal of the extrapolation tableau resulting from (9.28). This
is equivalent to the use of the step number sequence $\{n_j\} = \{2, 3, 4, 5, \ldots\}$. Also
an effective construction of a dense output can best be motivated in the setting of
differential-algebraic equations (Sect. VI.5).

Table 9.2. A(α)-stability of extrapolated Euler

Fig. 9.5. Stability domains of extrapolated Euler


Implementation 


Extrapolation methods based on implicit discretizations are in general less efficient 
than those based on linearly implicit discretizations. The reason is that the arising 
nonlinear systems have to be solved very accurately, so that the asymptotic expan- 
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sion of the error is not destroyed. The first successful extrapolation code for stiff 
differential equations is METAN1 of Bader & Deuflhard (1983), which implements 
the linearly implicit mid-point rule (9.16). In fact, Formula (9.16b) is replaced by 
the equivalent formulation

$$\Delta y_i = \Delta y_{i-1} + 2(I - hJ)^{-1}\bigl(h f(x_i, y_i) - \Delta y_{i-1}\bigr), \qquad \Delta y_i = y_{i+1} - y_i, \tag{9.29}$$

which avoids a matrix-vector multiplication. The step size and order selection of 
this code is described in Deuflhard (1983). Modifications in the control of step size 
and order are proposed by Shampine (1987). We have implemented the following 
two extrapolation codes (see Appendix): 

SODEX is based on the linearly implicit mid-point rule (9.16), uses the step- 
number sequence (9.22) and is mathematically equivalent to METAN1. The step 
size and order selection in SODEX is with some minor changes that of the non-stiff 
code ODEX of Sect. II.9. We just mention that in the formula for the work per unit 
step (II.9.26) the number $A_k$ is augmented by the dimension of the differential
equation in order to take into account the Jacobian evaluation. 

SEULEX is an implementation of the linearly implicit Euler method (9.25) using
the step-number sequence {2,3,4,5,6,7,...} (other sequences can be chosen
as internal options). The step size and order selection is that of SODEX. The original
code (EULSIM, first discussed by Deuflhard 1985) uses the same numerical
method, but a different implementation.

Neither code can solve the van der Pol equation problem in a 
straightforward way because of overflow ... 

(L.F. Shampine 1987) 

A big difficulty in the implementation of extrapolation methods is the use of 
“large” step sizes. During the computation of $T_{j1}$ one may easily get into trouble
with exponential overflow when evaluating the right-hand side of the differential 
equation. As a remedy we propose the following strategies: 

a) In establishing the extrapolation tableau we compare the estimated error $err_j =
\|T_{j,j-1} - T_{jj}\|$ with the preceding one. Whenever $err_j > err_{j-1}$ for some
$j > 3$ we restart the computation of the step with a smaller $H$, say, $H = 0.5 \cdot H$.

b) In order to be able to interrupt the computations already after the first
$f$-evaluations, we require that the step sizes $h = H/n_i$ (for $i = 1$ and $i = 2$) be
small enough so that a simplified Newton iteration applied to the implicit Euler
method $y = y_0 + h f(x, y)$, $x = x_0 + h$ would converge (“stability check”, an
idea of Deuflhard). The first two iterations read

$$(I - hJ)\Delta_0 = h f(x_0, y_0), \qquad y^{(1)} = y_0 + \Delta_0,$$

$$(I - hJ)\Delta_1 = h f(x_0 + h, y^{(1)}) - \Delta_0.$$

The computations for the step are restarted with a smaller $H$, if $\|\Delta_1\| > \|\Delta_0\|$
(divergence of the iteration). Observe that for both methods, (9.16) and (9.25),
no additional function evaluations are necessary. For the linearly implicit mid-point
rule we have the simple relations $\Delta_0 = \Delta y_0$, $\Delta_1 = \frac{1}{2}(\Delta y_1 - \Delta y_0)$ (see
(9.29)).
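The stability check of strategy b) can be sketched for a scalar problem as follows. The function name and the scalar setting are assumptions of this illustration; a production code would reuse the LU decomposition of $I - hJ$ and the function values already computed for the tableau.

```python
def stability_check(f, J, x0, y0, h):
    """Two simplified Newton iterations for implicit Euler (scalar case).

    Returns True if the step size h passes the check, i.e. the second
    increment is not larger than the first one.
    """
    d0 = h * f(x0, y0) / (1.0 - h * J)             # (I - hJ) D0 = h f(x0, y0)
    y1 = y0 + d0
    d1 = (h * f(x0 + h, y1) - d0) / (1.0 - h * J)  # (I - hJ) D1 = h f(x0+h, y1) - D0
    return abs(d1) <= abs(d0)

stiff = lambda x, y: -100.0 * y

ok_good_jacobian = stability_check(stiff, -100.0, 0.0, 1.0, 0.1)  # exact J: passes
ok_bad_jacobian = stability_check(stiff, 0.0, 0.0, 1.0, 0.1)      # J = 0: iteration diverges
```

With the exact Jacobian of a linear problem the second increment vanishes up to round-off, whereas with $J = 0$ the increments grow and the step would be restarted with a smaller $H$.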

Non-Autonomous Differential Equations. Given a non-autonomous differential
equation $y' = f(x, y)$, one has several possibilities to apply the above extrapolation
algorithms:

i) apply the Formula (9.16) or (9.25) directly (this is justified, since all asymptotic 
expansions hold for general non-autonomous problems); 

ii) transform the differential equation into an autonomous system by adding $x' = 1$
and then apply the algorithm. This yields

$$(I - hJ)(y_{i+1} - y_i) = h f(x_i, y_i) + h^2 \frac{\partial f}{\partial x}(x_0, y_0) \tag{9.31}$$

for the linearly implicit Euler method (the derivative $\frac{\partial f}{\partial x}(x_0, y_0)$ can also be
replaced by some approximation). For the linearly implicit mid-point rule,
(9.16a) has to be replaced by (9.31) with $i = 0$; the remaining two formulas
(9.16b) and (9.16c) are not changed.

iii) apply one simplified Newton iteration to the implicit Euler discretization (9.24). 
This gives 

$$(I - hJ)(y_{i+1} - y_i) = h f(x_{i+1}, y_i). \tag{9.32}$$

The use of this formula avoids the computation of the derivative $\partial f/\partial x$, but
requires one additional function evaluation for each $T_{j1}$. In the case of the
linearly implicit mid-point rule the replacement of (9.16a) by (9.32) would
destroy symmetry and the expansions in $h^2$.

A theoretical study of the three different approaches for the linearly implicit Euler 
method applied to the Prothero-Robinson equation (see Exercise 4 below) indicates 
that the third approach is preferable. More theoretical insight into this question will 
be obtained from the study of singular perturbation problems (Chapter VI). 
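A small numerical experiment, given here only as a sketch, makes the difference between the three approaches visible on the Prothero-Robinson equation $y' = \lambda(y - \varphi(x)) + \varphi'(x)$. The choices $\varphi = \sin$, $\lambda = -10^6$, $h = 0.1$ and the function name are assumptions of this example.

```python
import math

def pr_error(variant, lam, x0, H, n):
    """Error y_n - phi(x_n) after n steps of size H/n, started on the exact solution."""
    phi, dphi = math.sin, math.cos
    d2phi = lambda x: -math.sin(x)
    f = lambda x, y: lam * (y - phi(x)) + dphi(x)
    h = H / n
    x, y = x0, phi(x0)
    fx0 = -lam * dphi(x0) + d2phi(x0)      # df/dx at the starting point
    for _ in range(n):
        if variant == "direct":            # formula (9.25)
            rhs = h * f(x, y)
        elif variant == "autonomous":      # formula (9.31)
            rhs = h * f(x, y) + h * h * fx0
        else:                              # formula (9.32)
            rhs = h * f(x + h, y)
        y += rhs / (1.0 - h * lam)         # J = lam is the exact Jacobian
        x += h
    return y - phi(x)

e_direct = pr_error("direct", -1.0e6, 0.0, 1.0, 10)
e_autonomous = pr_error("autonomous", -1.0e6, 0.0, 1.0, 10)
e_newton = pr_error("newton", -1.0e6, 0.0, 1.0, 10)
```

For these parameters the errors of the first two variants are of size $10^{-2}$ to $10^{-1}$, while the third variant gives an error below $10^{-6}$, in line with the local error terms of Exercise 4.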

Implicit Differential Equations. Our codes in the appendix are written for problems
of the form

$$M y' = f(x, y) \tag{9.33}$$

where $M$ is a constant square matrix. The necessary modifications in the basic formulas
are obtained, as usual, by replacing all $f$'s and $J$'s by $M^{-1} f$ and $M^{-1} J$,
and premultiplying by $M$. The linearly implicit Euler method then reads

$$(M - hJ)(y_{i+1} - y_i) = h f(x_i, y_i) \tag{9.34}$$

and the linearly implicit mid-point rule becomes, with $\Delta y_i = y_{i+1} - y_i$,

$$\Delta y_i = \Delta y_{i-1} + 2(M - hJ)^{-1}\bigl(h f(x_i, y_i) - M \Delta y_{i-1}\bigr). \tag{9.35}$$

Exercises

1. Consider the implicit mid-point rule (9.2) as basic integration scheme and define
$T_{jk}$ by (9.3) and (9.4).

a) Prove that $T_{jk}$ represents a DIRK-method of order $p = 2k$ with $s = n_1 +
n_2 + \ldots + n_j$ stages.

b ) ^> defined by (9.8) and (9.4), is equivalent to a DIRK-method of order 
p = 2k — 1 only. 

2. Let $R_{jk}(z)$ be given by (9.6) and assume that the step-number sequence consists
of even numbers only. Prove that $R_{j2}(z)$ cannot be A-stable. More
precisely, show that at most a finite number of points of the imaginary axis can



Fig. 9.6. How extrapolation destroys A -stability 


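3. Prove that $S_h(x)$, defined by (9.16), is the numerical result of the $(2n + 1)$-stage
W-method (7.4) with the following coefficients ($n = 2m$):

$$\alpha_{ij} = \begin{cases} 1/n & \text{if } j = 1 \text{ and } i \text{ even}, \\ 2/n & \text{if } 1 < j < i \text{ and } i - j \text{ odd}, \\ 0 & \text{else}, \end{cases}$$

$$\gamma_{ij} = \begin{cases} (-1)^{i-j}/n & \text{if } j = 1 \text{ or } j = i, \\ 2(-1)^{i-j}/n & \text{if } 1 < j < i, \end{cases}$$

$$b_i = \alpha_{n+1,i} + \gamma_{n+1,i} \quad \text{for all } i.$$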

4. Apply the three different versions of the linearly implicit Euler method (9.25),
(9.31) and (9.32) to the problem $y' = \lambda(y - \varphi(x)) + \varphi'(x)$. Prove that the
errors $e_i = y_i - \varphi(x_i)$ satisfy $e_{i+1} = (1 - h\lambda)^{-1} e_i + \delta_h(x_i)$, where for $h \to 0$
and $h\lambda \to \infty$,

$$\delta_h(x) = -h\varphi'(x) + O(h^2) + O(\lambda^{-1}),$$

$$\delta_h(x) = -\frac{h^2}{2}\varphi''(x) + (1 - h\lambda)^{-1} h^2 \lambda \bigl(\varphi'(x) - \varphi'(x_0)\bigr) + O(h^3) + O(h\lambda^{-1}),$$

$$\delta_h(x) = (1 - h\lambda)^{-1}\Bigl(\frac{h^2}{2}\varphi''(x) + O(h^3)\Bigr),$$
respectively.

Theory without practice cannot survive and dies as quickly as it
lives. (Leonardo da Vinci 1452-1519, cited from M. Kline, Math. Thought 1972, p. 224)

Sine experientia nihil sufficienter sciri potest (Without experience
it is not possible to know anything adequately).
(Inscription overlooking Botanic Garden, Oxford; found in The Latin Citation Calendar, Oxford 1996)

After having seen so many different methods and ideas in the foregoing sections, 
it is legitimate to study how all these theoretical properties pay off in numerical 
efficiency. 


The Codes Used 

We compared the following codes, some of which are described in the Appendix: 

RADAU5 and SDIRK4 are implicit Runge-Kutta codes. The first one is based 
on the Radau IIA method with $s = 3$ of order 5 (Table 5.6), whereas the second
one is based on the SDIRK method (6.16) of order 4. Both methods are L-stable.
Details of their implementation are given in Sect. IV.8.

RODAS and ROS4 are Rosenbrock codes of order 4 with an embedded 3rd 
order error estimator. ROS4 implements the methods of Table 7.2. A switch 
allows one to choose between the different coefficient sets. The underlying 
method of RODAS satisfies additional order conditions for differential-algebraic
equations (see Sect. VI.4 below), but requires a little more work per step.
RODAS5 is an extension of RODAS to order 5. Its coefficients are constructed 
by Di Marzo (1992). 

SEULEX and SODEX are extrapolation codes. They implement the (Stiff) linearly
implicit EULer Extrapolation method (9.32) and the extrapolation algorithm
based on the linearly implicit mid-point rule (method (9.16) of Bader &
Deuflhard 1983), respectively. Both methods are discussed in Sect. IV.9.

In the numerical experiments of this section we have also included the results of 
LSODE (a BDF code of Hindmarsh 1980). It is a representative of the class of 
multistep methods to be described in Chapter V. 

Many of the treated examples are very stiff and explicit methods would require 
hours to compute the solution. On some examples, however, it was also interesting 
to see their performance, especially for the methods with extended region of stability
(e.g., the Runge-Kutta-Chebyshev code RKC of Sommeijer (1991), explained in
Sect. IV.2), as well as for a standard explicit Runge-Kutta code, such as DOPRI5 of
Volume I.

Twelve Test Problems

Man hüte sich, auf Grund einzelner Beispiele allgemeine Schlüsse
über den Wert oder Unwert einer Methode zu ziehen. Dazu
gehört sehr viel Erfahrung (One should beware of drawing general
conclusions about the value or worthlessness of a method from
individual examples. This requires a great deal of experience).
(L. Collatz 1950)

The first extensive numerical comparisons for stiff equations were made by Enright,
Hull & Lindberg (1975). Their STIFF-DETEST set of problems has become
a veritable “must” for generations of software writers (see also the critical remarks
of Shampine 1981). Several additional test problems, usually from chemical kinetics,
have been proposed by Enright & Hull (1976). An interesting review article
containing also problems of large dimension is due to Byrne & Hindmarsh (1987).
The problems chosen here for our tests are the following:

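VDPOL — the van der Pol oscillator (see (1.5’) and Fig. 8.1)

$$\begin{aligned}
y_1' &= y_2 \\
y_2' &= \bigl((1 - y_1^2)\,y_2 - y_1\bigr)/\varepsilon, \qquad \varepsilon = 10^{-6} \\
y_1(0) &= 2, \quad y_2(0) = 0; \qquad x_{out} = 1, 2, 3, 4, \ldots, 11.
\end{aligned} \tag{10.1}$$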

ROBER — the reaction of Robertson (1966) (see (1.3) and (1.4))

$$\begin{aligned}
y_1' &= -0.04\,y_1 + 10^4\,y_2 y_3, & y_1(0) &= 1 \\
y_2' &= 0.04\,y_1 - 10^4\,y_2 y_3 - 3 \cdot 10^7\,y_2^2, & y_2(0) &= 0 \\
y_3' &= 3 \cdot 10^7\,y_2^2, & y_3(0) &= 0,
\end{aligned} \tag{10.2}$$

one of the most prominent examples of the “stiff” literature. It was usually treated
on the interval $0 < x < 40$, until Hindmarsh discovered that many codes fail if $x$
becomes very large ($10^{11}$, say). The reason is that whenever the numerical solution
of $y_2$ accidentally becomes negative, it then tends to $-\infty$ and the run ends by
overflow. We have therefore chosen $x_{out} = 1, 10, 10^2, 10^3, \ldots, 10^{11}$.
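A right-hand side for (10.2) may look as follows; this is only a sketch in Python (the codes of the Appendix are written in Fortran), and the function name is chosen for this example. The last lines illustrate the failure mechanism described above: for a sufficiently negative $y_2$ (here with the assumed late-time values $y_1 \approx 0$, $y_3 \approx 1$) the derivative of $y_2$ is negative, so the component runs away.

```python
def rober(x, y):
    """Right-hand side of the Robertson problem (10.2)."""
    y1, y2, y3 = y
    r1 = 0.04 * y1            # slow reaction
    r2 = 1.0e4 * y2 * y3      # fast reaction
    r3 = 3.0e7 * y2 * y2      # very fast reaction
    return [-r1 + r2, r1 - r2 - r3, r3]

d_start = rober(0.0, [1.0, 0.0, 0.0])

# A slightly negative y2 at late time: y2' = 10 - 30 < 0, so y2 decreases further.
d_negative = rober(1.0e11, [0.0, -1.0e-3, 1.0])
```

The three derivatives sum to zero, which reflects the conservation of $y_1 + y_2 + y_3$.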

OREGO — the Oregonator, the famous model with a periodic solution describing
the Belousov-Zhabotinskii reaction (Field & Noyes 1974, see also Enright & Hull
1976)

$$\begin{aligned}
y_1' &= 77.27\bigl(y_2 + y_1(1 - 8.375 \cdot 10^{-6}\,y_1 - y_2)\bigr) \\
y_2' &= \frac{1}{77.27}\bigl(y_3 - (1 + y_1)\,y_2\bigr) \\
y_3' &= 0.161\,(y_1 - y_3) \\
y_1(0) &= 1, \quad y_2(0) = 2, \quad y_3(0) = 3, \qquad x_{out} = 30, 60, 90, \ldots, 360.
\end{aligned} \tag{10.3}$$

For pictures see Volume I, p. 119.

HIRES — this chemical reaction involving eight reactants was proposed by Schäfer
(1975) to explain “the growth and differentiation of plant tissue independent of
photosynthesis at high levels of irradiance by light”. It has been promoted as a test
example by Gottwald (1977). The corresponding equations are

$$\begin{aligned}
y_1' &= -1.71\,y_1 + 0.43\,y_2 + 8.32\,y_3 + 0.0007 \\
y_2' &= 1.71\,y_1 - 8.75\,y_2 \\
y_3' &= -10.03\,y_3 + 0.43\,y_4 + 0.035\,y_5 \\
y_4' &= 8.32\,y_2 + 1.71\,y_3 - 1.12\,y_4 \\
y_5' &= -1.745\,y_5 + 0.43\,y_6 + 0.43\,y_7 \\
y_6' &= -280\,y_6 y_8 + 0.69\,y_4 + 1.71\,y_5 - 0.43\,y_6 + 0.69\,y_7 \\
y_7' &= 280\,y_6 y_8 - 1.81\,y_7 \\
y_8' &= -y_7' \\
y_1(0) &= 1, \quad y_2(0) = y_3(0) = \ldots = y_7(0) = 0, \quad y_8(0) = 0.0057
\end{aligned} \tag{10.4}$$

and chosen output values are $x_{out} = 321.8122$ and $421.8122$.

E5 — is another chemical reaction problem, called “E5” in the collection by Enright,
Hull & Lindberg (1975). It is given by

$$\begin{aligned}
y_1' &= -A y_1 - B y_1 y_3, & y_1(0) &= 1.76 \cdot 10^{-3} \\
y_2' &= A y_1 - M C y_2 y_3, & y_2(0) &= 0 \\
y_3' &= A y_1 - B y_1 y_3 - M C y_2 y_3 + C y_4, & y_3(0) &= 0 \\
y_4' &= B y_1 y_3 - C y_4, & y_4(0) &= 0,
\end{aligned} \tag{10.5}$$

where $A = 7.89 \cdot 10^{-10}$, $B = 1.1 \cdot 10^7$, $C = 1.13 \cdot 10^3$, and $M = 10^6$. As we can
see from Fig. 10.1 the variables are badly scaled ($y_1 \approx 10^{-3}$ at the beginning, all
other components do not exceed the value $1.46 \cdot 10^{-10}$), and “... a scalar absolute
error tolerance is quite unsuitable” (Shampine 1981). The differential equation
possesses the invariant $y_2 - y_3 - y_4 = 0$, and it is recommended to use the relation
$y_3' = y_2' - y_4'$ in the function subroutine (because of possible cancellation of digits).

Originally the problem was posed on the interval $0 < x < 1000$, but Alexander
(1997) discovered that the solutions possess interesting properties on a much
longer interval. We follow this suggestion and consider output values at $x_{out} =
10, 10^3, 10^5, 10^7, \ldots, 10^{13}$.
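The recommendation to compute $y_3'$ from the invariant can be sketched as follows (Python is used only for illustration and the function name is an assumption):

```python
A, B, C, M = 7.89e-10, 1.1e7, 1.13e3, 1.0e6

def e5(x, y):
    """Right-hand side of problem E5; y3' is obtained from y3' = y2' - y4'."""
    y1, y2, y3, y4 = y
    d1 = -A * y1 - B * y1 * y3
    d2 = A * y1 - M * C * y2 * y3
    d4 = B * y1 * y3 - C * y4
    d3 = d2 - d4          # avoids summing four terms of very different size
    return [d1, d2, d3, d4]

d0 = e5(0.0, [1.76e-3, 0.0, 0.0, 0.0])
```

Written this way, only one subtraction enters $y_3'$, instead of the sum of four terms of widely different magnitude.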
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PLATE — this is a linear and non-autonomous example of medium stiffness and 
medium size. It describes the movement of a rectangular plate under the load of a
car passing across it:

$$\frac{\partial^2 u}{\partial t^2} + \omega \frac{\partial u}{\partial t} + \sigma \Delta\Delta u = f(x, y, t). \tag{10.6}$$

The plate $\Omega = \{(x, y)\,;\ 0 < x < 2,\ 0 < y < 4/3\}$ is discretized on a grid of $8 \times 5$
interior points $x_i = ih$, $y_j = jh$, $h = 2/9$ with initial and boundary conditions

$$u|_{\partial\Omega} = 0, \quad \Delta u|_{\partial\Omega} = 0, \quad u(x, y, 0) = 0, \quad \frac{\partial u}{\partial t}(x, y, 0) = 0. \tag{10.7}$$

The integration interval is $0 < t < 7$. The load $f(x, y, t)$ is idealized by the sum
of two Gaussian curves which move in the $x$-direction and which reside on “four
wheels”

$$f(x, y, t) = \begin{cases} 200\bigl(e^{-5(t-x-2)^2} + e^{-5(t-x-5)^2}\bigr) & \text{if } y = y_2 \text{ or } y_4, \\ 0 & \text{for all other } y. \end{cases}$$

The plate operator $\Delta\Delta$ is discretized via the standard “computational molecule”

              1
          2  -8   2
      1  -8  20  -8   1
          2  -8   2
              1

and the friction and stiffness parameters are chosen as $\omega = 1000$ and $\sigma = 100$.
The resulting system is then of dimension 80 with negative real as well as complex
eigenvalues ranging between $-500 < \mathrm{Re}\,\lambda < 0$ with maximal angle $\alpha \approx 71°$ (see
Definition 3.9).

BEAM — the elastic beam (1.10) of Sect. IV.1. We choose $n = 40$ in (1.10’) so
that the differential system is of dimension 80, and $0 < t < 5$ as integration interval.
The eigenvalues of the Jacobian are purely imaginary and vary between $-6400i$
and $+6400i$ (see Eq. (2.23)). The initial conditions (1.18) and (1.19) are chosen
such that the solution nevertheless appears to be smooth. However, a detailed numerical
study shows that the exact solution possesses high oscillations with period
$\approx 2\pi/6400$ and amplitude $\approx 10^{-6}$ (see Fig. 10.2).

Fig. 10.2. Third finite differences $\Delta^3 y_{80}/\Delta x^3$ of solutions
of the beam equation (1.10’) with $n = 40$ for $0 < x < 0.07$

Fig. 10.3. The cusp catastrophe with $N = 32$.


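CUSP — this is a combination of Zeeman’s “cusp catastrophe” model ($-\varepsilon \dot y =
y^3 + ay + b$) for the nerve impulse mechanism (Zeeman 1972) combined with the
van der Pol oscillator (see Fig. 10.3)

$$\begin{aligned}
\frac{\partial y}{\partial t} &= -\frac{1}{\varepsilon}(y^3 + ay + b) + \sigma \frac{\partial^2 y}{\partial x^2} \\
\frac{\partial a}{\partial t} &= b + 0.07\,v + \sigma \frac{\partial^2 a}{\partial x^2} \\
\frac{\partial b}{\partial t} &= (1 - a^2)\,b - a - 0.4\,y + 0.035\,v + \sigma \frac{\partial^2 b}{\partial x^2}
\end{aligned} \tag{10.8}$$

where

$$v = \frac{u}{u + 0.1}, \qquad u = (y - 0.7)(y - 1.3).$$

We put $\sigma = 1/144$ and make the problem stiff by choosing $\varepsilon = 10^{-4}$. We discretize
the diffusion terms by the method of lines

$$\begin{aligned}
y_i' &= -10^4 (y_i^3 + a_i y_i + b_i) + D(y_{i-1} - 2y_i + y_{i+1}) \\
a_i' &= b_i + 0.07\,v_i + D(a_{i-1} - 2a_i + a_{i+1}) \qquad\qquad i = 1, \ldots, N \\
b_i' &= (1 - a_i^2)\,b_i - a_i - 0.4\,y_i + 0.035\,v_i + D(b_{i-1} - 2b_i + b_{i+1})
\end{aligned} \tag{10.8’}$$

where

$$v_i = \frac{u_i}{u_i + 0.1}, \qquad u_i = (y_i - 0.7)(y_i - 1.3), \qquad D = \sigma N^2 = \frac{N^2}{144},$$

with periodic boundary conditions

$$y_0 := y_N, \quad a_0 := a_N, \quad b_0 := b_N, \qquad y_{N+1} := y_1, \quad a_{N+1} := a_1, \quad b_{N+1} := b_1,$$

and obtain a system of dimension $3 \cdot N = 96$. We take the initial values

$$y_i(0) = 0, \quad a_i(0) = -2\cos\Bigl(\frac{2i\pi}{N}\Bigr), \quad b_i(0) = 2\sin\Bigl(\frac{2i\pi}{N}\Bigr), \qquad i = 1, \ldots, N,$$

and $x_{out} = 1.1$.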
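The discretized system (10.8’) with its periodic boundary conditions can be coded along the following lines. This is a sketch only: the function name and the interleaved ordering $y_1, a_1, b_1, y_2, a_2, b_2, \ldots$ of the unknowns are assumptions of this example.

```python
def cusp_rhs(t, w, N=32, sigma=1.0 / 144.0, eps=1.0e-4):
    """Right-hand side of (10.8'); w holds (y_i, a_i, b_i) for i = 1..N, interleaved."""
    D = sigma * N * N
    dw = [0.0] * (3 * N)
    for i in range(N):
        im, ip = (i - 1) % N, (i + 1) % N          # periodic neighbours
        y, a, b = w[3 * i], w[3 * i + 1], w[3 * i + 2]
        u = (y - 0.7) * (y - 1.3)
        v = u / (u + 0.1)
        dw[3 * i] = -(y ** 3 + a * y + b) / eps + D * (w[3 * im] - 2 * y + w[3 * ip])
        dw[3 * i + 1] = b + 0.07 * v + D * (w[3 * im + 1] - 2 * a + w[3 * ip + 1])
        dw[3 * i + 2] = ((1 - a * a) * b - a - 0.4 * y + 0.035 * v
                         + D * (w[3 * im + 2] - 2 * b + w[3 * ip + 2]))
    return dw

rest = cusp_rhs(0.0, [0.0] * 96)

# A unit perturbation in y_1 is felt by y_N through the periodic boundary.
bump = [0.0] * 96
bump[0] = 1.0
coupled = cusp_rhs(0.0, bump)[3 * 31]
```

With this ordering each grid point contributes a $3 \times 3$ block to the Jacobian, and the periodic coupling produces the two corner blocks.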

BRUSS — this is the equation (1.6’) with $\alpha = 1/50$, the same initial conditions as
in Sect. IV.1, and integration interval $0 < t < 10$. But we now let $N = 500$ so that
(1.6’) becomes a system of 1000 differential equations with largest eigenvalue close
to $-20000$. The equations therefore become considerably stiff. The Jacobian of
this system is banded with upper and lower bandwidth 2 (if the solution components
are ordered as $u_1, v_1, u_2, v_2, u_3, v_3$, etc.).


KS — is the one-dimensional Kuramoto-Sivashinsky equation

$$\frac{\partial U}{\partial t} = -\frac{\partial^2 U}{\partial x^2} - \frac{\partial^4 U}{\partial x^4} - \frac{1}{2}\frac{\partial U^2}{\partial x} \tag{10.9}$$

with periodic boundary conditions $u(x + L, t) = u(x, t)$, taken from Collet, Eckmann,
Epstein & Stubbe (1993). We choose $L = 2\pi/q$, $q = 0.025$, and take as
initial condition

$$U(x, 0) = 16 \cdot \max(0, \eta_1, \eta_2, \eta_3, \eta_4),$$

$$\begin{aligned}
\eta_1 &= \min(x/L,\ 0.1 - x/L), \\
\eta_2 &= 20\,(x/L - 0.2)(0.3 - x/L), \\
\eta_3 &= \min(x/L - 0.6,\ 0.7 - x/L), \\
\eta_4 &= \min(x/L - 0.9,\ 1 - x/L).
\end{aligned}$$

The inverse heat equation term $-\partial^2 U/\partial x^2$ creates instability, which is stabilized
for the higher oscillations by the beam equation term $-\partial^4 U/\partial x^4$. The nonlinear
transport term $\partial U^2/\partial x$ couples the modes and ensures that the solution remains
bounded. All this creates wonderful chaos (see Fig. 10.4).

We solve Eq. (10.9) using the pseudo-spectral method, i.e., we consider the
Fourier coefficients

$$\hat U_j(t) = \frac{1}{L}\int_0^L U(x, t)\,e^{-iqjx}\,dx, \qquad U(x, t) = \sum_{j \in \mathbb{Z}} \hat U_j(t)\,e^{iqjx}, \tag{10.10}$$

so that (10.9) takes the form of an infinite dimensional ordinary differential equation

$$\hat U_j' = \bigl((qj)^2 - (qj)^4\bigr)\hat U_j - \frac{iqj}{2}\,(\hat U * \hat U)_j.$$

We truncate this system as follows: for a fixed $N$, say $N = 1024$, we consider the
$N$-periodic sequence $u(t) = \{u_j(t)\}$ solving the ordinary differential equation

$$u' = (d^2 - d^4) \cdot u - \frac{i d}{2} \cdot F_N\bigl((F_N^{-1} u)^2\bigr) \tag{10.11}$$

where $d$ denotes the $N$-periodic sequence given by $d_j = qj$ for $|j| < N/2$ and
$d_{N/2} = 0$, and the product of sequences in (10.11) is componentwise. The discrete
Fourier transform $F_N$ can be computed by FFT. From the fact that $U(x, t)$ is real
it follows that the sequence $u$ is hermitian, i.e., $u_{-j} = \bar u_j$. Hence, the routine
REALFT from Press, Flannery, Teukolsky & Vetterling (1986, 1989), Chapter 12, is
best suited for computing the right-hand side of (10.11). Since $d_0 = d_{N/2} = 0$, the
components $u_0(t)$ and $u_{N/2}(t)$ are constant and need not be integrated. We thus
are concerned with an ordinary differential equation of real dimension $N - 2 =
1022$. As initial values we take the discrete Fourier transform of $\{U(jL/N, 0)\}$
with the $(N/2)$th component put to zero. In our tests we solve the differential
equation (10.11) on the interval $0 < t < 100$ (see Fig. 10.4).

Fig. 10.4. Solution of Kuramoto-Sivashinsky equation

It can be seen from Fig. 10.5 that the Fourier modes tend to zero for $j \to \infty$,
behave chaotically, and, by computing their mean values over a long period, that
the modes for $qj \approx \sqrt{2}/2$ are dominant.

Fig. 10.5. Fourier modes for Kuramoto-Sivashinsky equation


BECKDO — the Becker-Döring model describes the dynamics of a system with a
large number of identical particles which can coagulate to form clusters. We let $y_k$
denote the expected number of $k$-particle clusters per unit volume. Assuming that
clusters can gain or lose only single particles, we are led to the system

$$\begin{aligned}
y_1' &= -J_1 - \sum_{k=1}^{N-1} J_k, \qquad y_N' = J_{N-1}, \\
y_k' &= J_{k-1} - J_k, \qquad k = 2, 3, \ldots, N - 1,
\end{aligned} \tag{10.12}$$

where $J_k = y_1 y_k - b_{k+1} y_{k+1}$ and $b_{k+1} = \exp\bigl(k^{2/3} - (k - 1)^{2/3}\bigr)$. For a detailed
description of this system we refer to the article by Carr, Duncan & Walshaw
(1995). This equation is especially interesting because of its metastability (extremely
slow variations in the solution over very long time intervals; see Fig. 10.6).

As initial condition we take

$$y_1(0) = \varrho, \qquad y_k(0) = 0 \quad \text{for } k = 2, \ldots, N \tag{10.13}$$

(no clusters at the beginning). It can be seen by differentiation that the density
(total number of particles per unit volume)

$$\sum_{k=1}^{N} k\,y_k \quad (= \varrho) \tag{10.14}$$

is an invariant of the system (10.12). Most numerical schemes (in particular Runge-Kutta
methods and multistep methods) preserve automatically such linear invariants
in the absence of round-off errors. Whenever the relation (10.14) is not satisfactorily
preserved, there is the possibility to re-establish it during the computations
by projections (see “differential equations with invariants”, Sect. VII.2). This precautionary
measure was not used in the subsequent numerical tests.

In order to be able to observe the metastable states of the system, the dimension
$N$ has to be sufficiently large. Following the experiments of Carr, Duncan &
Walshaw (1995) we take $N = 5000$ and $\varrho = 7.5$, and consider the solution on the
interval $0 < t < 10^{15}$. We compare the errors at $x_{out} = 1, 10, 10^2, 10^3, \ldots, 10^{15}$.

The Jacobian of this system is tri-diagonal with an additional non-zero first row 
and a non-zero first column. A Gershgorin test reveals that its eigenvalues can not 
go, except for the initial phase, beyond —10. Stiffness, in this example, is there¬ 
fore not created by large eigenvalues of J, but by the extremely long integration 
interval. 
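For illustration, the right-hand side of (10.12) and the linear invariant (10.14) can be checked with a few lines of Python. This is a sketch under the stated formulas; the function name and the small test dimension are assumptions of the example.

```python
import math

def beckdo_rhs(y):
    """Right-hand side of the Becker-Doring system (10.12); y[k-1] holds y_k."""
    N = len(y)
    J = []                                   # J[k-1] holds J_k for k = 1..N-1
    for k in range(1, N):
        b = math.exp(k ** (2.0 / 3.0) - (k - 1) ** (2.0 / 3.0))   # b_{k+1}
        J.append(y[0] * y[k - 1] - b * y[k])
    dy = [0.0] * N
    dy[0] = -J[0] - sum(J)
    for k in range(2, N):
        dy[k - 1] = J[k - 2] - J[k - 1]
    dy[N - 1] = J[N - 2]
    return dy

def density_rate(y):
    """Time derivative of the density sum_k k*y_k; zero up to round-off."""
    return sum(k * d for k, d in enumerate(beckdo_rhs(y), start=1))

start = [7.5] + [0.0] * 19                   # initial condition (10.13) with N = 20
generic = [1.0 / (k * k) for k in range(1, 21)]
```

Only $y_1$ couples to all other components, which is the origin of the non-zero first row and first column of the Jacobian mentioned above.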



Fig. 10.7. Solution of Brusselator in 2 dimensions 


BRUSS-2D — the two-dimensional Brusselator reaction-diffusion problem of Sect. II.10
in its discretized form (II.10.14), but this time we make the problem stiff by increasing
the coefficient $\alpha$ (which was 0.002) to $\alpha = 0.1$ and by increasing the
number of grid points to $N = 128$. This gives an ordinary differential equation of
dimension $2N^2 = 32768$. The initial conditions, chosen here as

$$u(x, y, 0) = 22 \cdot y(1 - y)^{3/2}, \qquad v(x, y, 0) = 27 \cdot x(1 - x)^{3/2}, \tag{10.16}$$

are quickly wiped out by the strong diffusion (see Fig. 10.7 for $t = 1$); we therefore
suppose that the inhomogeneity $f(x, y, t)$ defined by

$$f(x, y, t) = \begin{cases} 5 & \text{if } (x - 0.3)^2 + (y - 0.6)^2 < 0.1^2 \text{ and } t > 1.1, \\ 0 & \text{else} \end{cases}$$

models an extra addition of substance $u$ in a small disc. In order to be able to solve
the linear algebra comfortably by a double FFT routine we replace the Neumann
conditions of Sect. II.10 by periodic boundary conditions

$$u(x + 1, y, t) = u(x, y, t), \qquad u(x, y + 1, t) = u(x, y, t).$$

As output points we choose $x_{out} = 1.5$ and $11.5$.


Results and Discussion 

For each of these examples we have computed very carefully the exact solution at 
the specified output points. Then, the above codes have been applied with many 
different tolerances 


$$Tol = 10^{-2-m/4}, \qquad m = 0, 1, 2, \ldots, 32.$$

More precisely, we set the relative error tolerance to be $Rtol = Tol$ and the absolute
error tolerance $Atol = 10^{-6} \cdot Tol$ for the problems OREGO and ROBER, $Atol =
10^{-4} \cdot Tol$ for HIRES, $Atol = 10^{-3} \cdot Tol$ for PLATE and BECKDO, $Atol = 1.7 \cdot 10^{-24}$
for E5, and $Atol = Tol$ for all other problems. Several codes returned numerical results
which were considerably less precise than the required precision, while other
methods turned out to be more reliable. As a reasonable measure of efficiency we
have therefore chosen to compare

- the actual error (a norm taken over all components and all output points) 

- the computing time (of a SUN Sparc 20 Workstation) in seconds. 
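The tolerance settings described above can be collected in a small helper. The following is only a sketch; the function and dictionary names are hypothetical and not part of any of the codes.

```python
# Factors Atol / Tol as listed in the text; problems not listed use Atol = Tol.
ATOL_FACTOR = {"OREGO": 1e-6, "ROBER": 1e-6, "HIRES": 1e-4,
               "PLATE": 1e-3, "BECKDO": 1e-3}

def tolerances(problem, m):
    """Return (Rtol, Atol) for the tolerance index m = 0, 1, ..., 32."""
    tol = 10.0 ** (-2.0 - m / 4.0)
    if problem == "E5":
        return tol, 1.7e-24                  # fixed absolute tolerance for E5
    return tol, ATOL_FACTOR.get(problem, 1.0) * tol

grid = [tolerances("VDPOL", m)[0] for m in range(33)]
```

Every fourth index corresponds to an integer-exponent tolerance, so the grid runs from $10^{-2}$ down to $10^{-10}$.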

The obtained data are then displayed as a polygonal line in a “precision-work diagram”
in double logarithmic scales. The integer-exponent tolerances $10^{-2}$, $10^{-3}$,
$10^{-4}$, ... are displayed as enlarged symbols. The symbol for $Tol = 10^{-5}$ is specially
distinguished by its gray colour. The more this line is to the right, the higher
was the obtained precision; the higher this line is to the top, the slower was the code.
The “slope” of the curve expresses the (effective) order of the formula: lower order
methods are steeper than higher order methods. The results of the above codes on
the 12 test examples are displayed in Figs. 10.8 and 10.9.

VDPOL, ROBER, OREGO — are very stiff problems of small dimension. We see
from Fig. 10.8 that the Rosenbrock code RODAS is best for low tolerances ($10^{-2}$
to $10^{-5}$), whereas the extrapolation code SEULEX is superior for stringent tolerances.
Due to the cheapness of the function evaluations the multistep code LSODE
requires in general slightly more computing time than the one-step codes. We also
remark that for a given tolerance (the position of the gray symbol for $Tol = 10^{-5}$)
the code RADAU5 gives the most precise result, followed by RODAS, SEULEX, and
LSODE.

HIRES — this problem is less stiff and can also be solved by explicit methods.
The computing times for the explicit code DOPRI5 are initially perfectly horizontal.
This is, of course, no surprise, because the step size is restricted by stability.
The (explicit, but stabilized) Runge-Kutta-Chebyshev code RKC shows a considerable
improvement over DOPRI5 for low tolerances. The stiff codes are still more
efficient.


E5 — is a stiff and badly scaled problem, which is integrated over a very long time. 
Codes cannot work correctly, if the absolute tolerance Atol is too large. The codes 
RODAS (for low tolerances) and RADAU5 (for $Tol < 10^{-4}$) give the best results.
LSODE works safely only for $Tol < 10^{-5}$, whereas SEULEX has problems with
round-off errors at high precision. 


PLATE and BEAM — are both problems of the type $y'' = f(x, y, y')$, implemented
as the first order system $y' = v$, $v' = f(x, y, v)$. For stiff codes the linear systems
to be solved have a matrix of the form

$$\begin{pmatrix} \alpha I & I \\ B & C \end{pmatrix} \tag{10.17}$$

(where $I$ is the identity matrix). Using the option IWORK(9)=N/2 (where N is the
dimension of the first order system) our codes do the first N/2 elimination sweeps
analytically and the dimension of the linear system is halved. Without this option,
the computing times for the codes RADAU5, RODAS, and SEULEX would be larger
by a factor of about 3.0, 1.7, and 2.6, respectively (these numbers are for the BEAM
problem at $Tol = 10^{-5}$). We did not include here the results of LSODE, for which
we did not have an easy possibility for such a reduction. For the PLATE problem
we also exploited the banded structure of $\partial f/\partial y$ and $\partial f/\partial v$ by putting MLJAC=16
and MUJAC=16.

For both problems the explicit code DOPRI5 was applicable too. A curious phenomenon
arose for DOPRI5 at the PLATE problem: as expected, for low tolerance
requirements ($Tol > 10^{-5}$), the code appeared to be restricted by stability, gave
computing times independent of $Tol$ and issued the message “the problem seems
to be stiff”. But for more stringent tolerances the code was restricted by precision,
with computing times unexpectedly high above those of the implicit code RADAU5.
The analysis of Sect. IV.15 for the Prothero & Robinson problem (15.1) gives an
explanation for this fact. We see that stiff problems not only create loss of stability,
but also loss of precision for explicit integrators.

Fig. 10.8. Work-precision diagrams for problems of dimension 2 to 80

Fig. 10.9. Work-precision diagrams for problems of dimension 80 to 32768

Especially for the BEAM problem, a problem with expensive linear algebra, the
efficiency of the codes can be considerably increased by tuning the parameters. If,
for the integration with RADAU5, we put

    WORK(3)=0.1     (Jacobian less often recomputed)
    WORK(4)=0.3     (Newton iterations stopped earlier)
    WORK(5)=0.99    (Step size changed less often,
    WORK(6)=2.       decreasing number of LU-decompositions)

then the computing time decreases by a factor between 2 and 5. Fig. 10.9 shows
the spectacular improvement of this “tuned” run.


CUSP — the Jacobian of this problem is of the form

$$\begin{pmatrix}
A_1 & B_1 & & & D_1 \\
C_2 & A_2 & B_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & C_{N-1} & A_{N-1} & B_{N-1} \\
D_N & & & C_N & A_N
\end{pmatrix} \tag{10.18}$$

where $A_i, B_i, C_i, D_i$ are $3 \times 3$ matrices, and an efficient solution of the linear system
needs a special treatment (see Exercise 1). However the considered methods,
with the exception of the Rosenbrock methods, do not require an exact Jacobian.
Therefore, an easy possibility for a considerable reduction of computing time is
simply to use the codes in the banded version by putting ML=MU=3. The $D_1$ and
$D_N$ are neglected and we obtain the computing times displayed in Fig. 10.9. If
the Jacobian were treated as a full matrix, the computing times would increase by
a factor of 8.3, 6.6, and 4.8 for the codes RADAU5, SEULEX, and LSODE, respectively
(these numbers are for $Tol = 10^{-5}$). The explicit code RKC gives excellent
results for low precision, whereas the results of DOPRI5 (more than 30 seconds) are
outside of the picture.


BRUSS — for this one-dimensional reaction-diffusion problem the linear algebra
is done in the “banded” version with “analytical Jacobian”. The problem is very
stiff (large diffusion constant and small $\Delta x$) and an explicit method, such as DOPRI5,
would require close to 60000 steps of integration. The code RKC works well,
although less efficiently than the stiff integrators.

KS — the solution of this problem is sensitive with respect to changes in the initial
values, a phenomenon already encountered in the LRNZ problem of Sect. II.10.
Similarly as there, the precision increases only for $Tol$ beyond a certain threshold.
The Jacobian of this problem is full. Numerical experiments revealed that the codes
worked best when the Jacobian is replaced by a diagonal matrix with $(qj)^2 - (qj)^4$
in its $j$th entry. Rosenbrock methods, which require an exact Jacobian, are not
efficient here. The explicit codes RKC and DOPRI5 need too much computing time.

BECKDO — for this problem, the stiff codes (the only ones which work) require
the solution of linear systems of the form

$$\begin{pmatrix} u & v^T \\ w & T \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} \tag{10.19}$$

where $v, w, b$ are $(n - 1)$-dimensional vectors and $T$ is a tri-diagonal matrix.
Since the linear algebra routines are completely separated from the codes RADAU5,
RODAS and SEULEX, it is easy to replace these routines by a special program which
solves (10.19) efficiently as follows

$$x = (a - v^T T^{-1} b)/(u - v^T T^{-1} w), \qquad y = T^{-1} b - x\,T^{-1} w. \tag{10.20}$$

It is not necessary to alter the stiff integrator itself. 
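A possible realization of (10.20) is sketched below. The tridiagonal solver is a plain Thomas algorithm without pivoting, which is an assumption of this example (the routines actually used may differ); all function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve T z = rhs for tridiagonal T (no pivoting).

    lower[i] is T[i][i-1] (lower[0] unused), upper[i] is T[i][i+1] (upper[-1] unused).
    """
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z

def solve_bordered(u, v, w, tri, a, b):
    """Solve the bordered system (10.19) by formula (10.20)."""
    lower, diag, upper = tri
    Tb = solve_tridiagonal(lower, diag, upper, b)     # T^{-1} b
    Tw = solve_tridiagonal(lower, diag, upper, w)     # T^{-1} w
    x = (a - sum(p * q for p, q in zip(v, Tb))) / (u - sum(p * q for p, q in zip(v, Tw)))
    y = [tb - x * tw for tb, tw in zip(Tb, Tw)]
    return x, y

# Small example constructed so that the solution is x = 1, y = (1, 2, 3)
tri = ([0.0, 1.0, 1.0], [4.0, 4.0, 4.0], [1.0, 1.0, 0.0])
x, y = solve_bordered(5.0, [1.0, 2.0, 3.0], [1.0, 0.0, 1.0], tri, 19.0, [7.0, 12.0, 15.0])
```

The cost is two tridiagonal solves and two scalar products, i.e., linear in the dimension.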

Fig. 10.9 shows that, as usual, RODAS is best for low tolerances and RADAU5
is preferable for high precision. Not as usual is the fact that RODAS performs very
badly for stringent tolerances. We explain this by the fact that the linear system
(10.19) is sensitive to round-off errors, or, as Wilkinson would put it, delivers a
solution for a wrong Jacobian. Thus, the order of the Rosenbrock method drops
to 1.


BRUSS-2D — due to its large dimension (n = 2 • 128 2 = 32768), this problem 
makes no sense in full or even banded linear algebra. We therefore solved the 
linear equations (in the codes with separated linear algebra, see the correspond¬ 
ing remarks in the BECKDO problem) by FFT methods, taking into account only 
the (stiff) diffusion terms and neglecting the (in this problem non-stiff) reaction 
terms. The FFT codes used were those of Press, Flannery, Teukolsky & Vetterling 
(1986,1989) in the chapter on partial differential equations. A special advantage of 
the Radau method is here that the complex algebra, which is anyway used in FFT, 
crunches the complex eigenvalues of the Runge-Kutta matrix without further harm. 

For this problem, which is a typical parabolic partial differential equation with
non-stiff nonlinearities, we have made a detailed comparison of the performances
of the implicit code RADAU5, the “stabilized” explicit code RKC, and the explicit
code DOPRI5, as functions of the discretization parameter $\Delta x = \Delta y = 1/N$ and
the diffusion parameter $\alpha$ (see Eqs. (10.15) and (II.10.14)). The results (number
of function calls and computing times) are displayed in Table 10.1, where the best
performances are displayed in boldface characters. We can see how the Olympic flame
passes from DOPRI5, which is best for low stiffness ($\alpha N^2 < 1$), with increasing
stiffness first to RKC, and then (for $\alpha N^2 > 1000$) to the implicit RADAU5 code.
We also observe that the number of function evaluations is nearly independent of
the stiffness for RADAU5, behaves like $\mathrm{Const} \cdot \sqrt{\alpha} \cdot N$ for RKC, and like
$\mathrm{Const} \cdot \alpha \cdot N^2$ for DOPRI5.
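The two scaling laws can be checked against a few entries of Table 10.1. The sketch below uses only function-evaluation counts from the stiff end of the table (large $\alpha N^2$). The variable names are hypothetical, and the assignment of each count to its $(\alpha, N)$ cell assumes the table is read column by column.

```python
from math import sqrt

# (alpha, N, number of function evaluations), read from Table 10.1
dopri5 = [(1.0, 16, 42832), (1.0, 32, 171010), (1.0, 64, 683836), (0.1, 128, 273568)]
rkc = [(1.0, 16, 4013), (1.0, 32, 7565), (1.0, 64, 14631), (1.0, 128, 29022),
       (0.1, 256, 18911)]

# If the claimed laws hold, these quotients are roughly constant
dopri5_const = [nfcn / (alpha * N ** 2) for alpha, N, nfcn in dopri5]
rkc_const = [nfcn / (sqrt(alpha) * N) for alpha, N, nfcn in rkc]

dopri5_spread = max(dopri5_const) / min(dopri5_const)
rkc_spread = max(rkc_const) / min(rkc_const)
```

For these entries the DOPRI5 quotient varies by well under one percent and the RKC quotient by roughly ten percent, which is consistent with the stated behaviour.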


Table 10.1. Function evaluations / computing times at Tol = 10^{-5}

RADAU5             N = 16        N = 32         N = 64          N = 128          N = 256
α = 10^{-3}     3372/19.8     3233/84.9     3271/413.5     3290/2215.6     3261/14902.1
α = 10^{-2}     1286/7.7      1322/36.2     1295/167.4     1381/868.8      1380/6459.3
α = 10^{-1}     1150/6.8      1131/30.9     1227/172.3     1173/854.9      1204/5664.9
α = 1           1195/7.8      1199/33.0     1247/177.3     1242/945.9      1258/5961.2

RKC                N = 16        N = 32         N = 64          N = 128          N = 256
α = 10^{-3}     2367/4.7      2277/18.6     2249/76.3      2311/352.5      2911/1912.0
α = 10^{-2}     1661/3.2      1674/13.8     2078/70.4      3379/511.5      6259/4086.9
α = 10^{-1}     1899/3.6      2823/22.5     5047/176.8     9666/1446.2     18911/12312.2
α = 1           4013/7.2      7565/58.9     14631/503.4    29022/4328.8    -

DOPRI5             N = 16        N = 32         N = 64          N = 128          N = 256
α = 10^{-3}     976/2.0       1030/8.5      1408/48.5      3286/509.4      11464/7704.2
α = 10^{-2}     784/1.6       1894/15.4     6976/240.6     27478/4369.6    -
α = 10^{-1}     4366/9.0      17176/145.5   68446/2419.7   273568/43982.2  -
α = 1           42832/90.6    171010/1505.8 683836/24362.7 -               -


Comparisons Between Codes of the Same Type. Figs. 10.8 and 10.9, which are
a sort of “Final Competition of Wimbledon”, contain only one code from each
class of integration methods (Radau methods, Implicit Runge-Kutta, Rosenbrock,
and extrapolation methods). Following are some comparisons within each of these
classes.

Radau Methods. For a comparison of Radau methods of various orders (see also the
results of Reymond (1989) in the first edition), we have written a code RADAUP,
in which a method flag IWORK(11) = 3, 5, 7 selects between s = 3, 5, or 7 stages
(i.e., between orders p = 5, 9, or 13). For s = 3 the code is mathematically
equivalent to RADAU5 but, due to a different coding, slightly slower. We can see
in Fig. 10.10 how the higher order pays off for high precision, whereas for low
precision problems arise due to large step sizes and bad convergence of the
Newton iterations.

Implicit Runge-Kutta Methods. It has long been taken for granted that
only DIRK and SDIRK methods could be implemented efficiently. Our experience
shows that the diagonally implicit method SDIRK4, constructed in Section IV.6,
gives rather disappointing results (see Fig. 10.11). An exception is the BEAM
problem with its, microscopically, highly oscillatory solutions. Since the code SDIRK4
does not have the option for “second order” linear algebra, we have also applied
RADAU5 without this option. The computing times for RADAU5 are therefore not the same
as in Fig. 10.9.

Rosenbrock Methods. There is usually not much difference between the performance
of the different Rosenbrock methods (see Fig. 10.12). In spite of their larger
number of stages, the codes RODAS5 (order 5) and RODAS (order 4) often give
the best results. Among the 4th order “classical” Rosenbrock methods of Table 7.2
the best is in general “method 2” with its small error constant; it fails completely,
however, on the BEAM problem due to lack of A-stability. “Method 6” corresponds
to the choice of coefficients which gives an L-stable method.

Extrapolation Methods. The code SODEX, which is based on an $h^2$-extrapolation
of the semi-implicit midpoint rule, is clearly superior to SEULEX for low precision
(see Fig. 10.13). The opposite situation appears for more stringent tolerances; here
we observe an order reduction phenomenon, which is explained in Sect. VI.5 below.
We have also included in these tests the results of the code EULSIM by Deuflhard,
Novak & Poehle (poehle@sc.zib-berlin.de), which is another implementation of the
extrapolated semi-implicit Euler method, with a different stepsize sequence.

Chebyshev Methods. During the final realization of these experiments we have 
received a code DUMKA3 (written by A. Medovikov, nucrect@inm.ras.ru) which 
implements an extension of the optimal Chebyshev methods of Lebedev (see Sect. 
IV.2) to third order. This code is still in a very experimental stage, but the results, 
presented in Fig. 10.14, are very promising. 



Partitioning and Projection Methods 

Most codes for solving stiff systems ... spend most of their time 
solving systems of linear equations ... 

(Watkins & HansonSmith 1983) 

Further spectacular reductions of the work for the linear algebra are often possible. 
One of the oldest ideas is to partition a stiff system into a (hopefully) small stiff 
system and a large nonstiff part, 

$$ \begin{aligned} y_a' &= f_a(y_a, y_b) \qquad \text{(stiff)} \\ y_b' &= f_b(y_a, y_b) \qquad \text{(nonstiff)}, \end{aligned} \tag{10.21} $$

so that the two systems can be treated by two different methods, one implicit and
the other explicit (e.g. Hofer 1976). The theory of P-series in Sect. II.14 had its
origin in the study of the order properties of such methods. A difficulty of this
approach is, of course, to decide which equations should be the stiff ones. Further,
stiffness may affect subspaces which are not parallel to the coordinate axes. We
shall therefore turn our attention to procedures which do not adapt the underlying
numerical method to the partitioning, but only the linear algebra. An excellent
survey of the older literature on these methods is given by Söderlind (1981). The
following definition describes an especially promising class of problems:


Definition 10.1 (Björck 1983, 1984). The system $y' = f(x, y)$ is called separably
stiff at a position $x_0, y_0$ if the Jacobian $J = \frac{\partial f}{\partial y}(x_0, y_0)$ possesses $k < n$
eigenvalues $\lambda_1, \ldots, \lambda_k$ such that

$$ \min_{1 \le i \le k} |\lambda_i| \gg \max_{k+1 \le i \le n} |\lambda_i| . $$

The eigenvalues $\lambda_1, \ldots, \lambda_k$ are called the stiff eigenvalues and

$$ \mu = \min_{1 \le i \le k} |\lambda_i| \Big/ \max_{k+1 \le i \le n} |\lambda_i| \tag{10.22} $$

the relative separation. The space $D$ spanned by the stiff eigenvectors is called the
dominant invariant subspace.


For example, the Robertson problem (10.2) possesses only one stiff eigenvalue
(close to −2000), and is therefore separably stiff with $k = 1$. The CUSP problem
(10.8’) of dimension 96 has 32 large eigenvalues which range, except for transient
phases, between −20000 and −60000. All other eigenvalues satisfy approximately
$|\lambda| < 30$. This problem is, in fact, a singular perturbation problem (see
Sect. VI.1), and such problems are all separably stiff. The other large problems of
this section have eigenvalues scattered all around. A.R. Curtis’ study (1983) points
out that separably stiff problems are rather rare in practice.


The Method of Gear and Saad. Implicit methods such as (transformed) Runge-Kutta
or multistep formulas require the solution of a linear system (where we denote,
as usual in linear algebra, the unknown vector by $x$)

$$ Ax = b \qquad \text{where} \qquad A = \frac{1}{h\gamma} I - J \tag{10.23} $$

with residual $r = b - Ax$. We choose $k$ (usually) orthogonal vectors $q_1, \ldots, q_k$
in such a way that $\mathrm{span}\{q_1, \ldots, q_k\} = \tilde D$ is an approximation to the dominant
subspace $D$, and denote by $Q$ the $n \times k$ matrix formed by the columns $q_j$,

$$ Q = (q_1, \ldots, q_k). \tag{10.24} $$

There are now several possibilities for replacing the solution $x$ of (10.23) by an
approximate solution $\tilde x \in \tilde D$. One of the most natural is to require (Saad 1981,
Gear & Saad 1983; in fact, Galerkin 1915) that the residual of $\tilde x$,

$$ \tilde r = b - A\tilde x = A(x - \tilde x), \tag{10.25} $$

be orthogonal to $\tilde D$, i.e., that $Q^T(b - A\tilde x) = 0$. If we write $\tilde x$ in the basis of
(10.24) as $\tilde x = Q\tilde y$, this yields

$$ H\tilde y = Q^T b, \tag{10.26} $$

where

$$ H = Q^T A Q \qquad \text{or} \qquad QH = AQ, \tag{10.27} $$

which means that we have to solve a linear system of dimension $k$ with matrix
$H$. A particularly good choice for $\tilde D$ is a Krylov subspace spanned by an arbitrary
vector $r_0$ (usually the residual of a well chosen initial approximation $x_0$),

$$ \tilde D = \mathrm{span}\,\{r_0, Ar_0, A^2 r_0, \ldots, A^{k-1} r_0\}. \tag{10.28} $$

The vectors (10.28) constitute the sequence created by the well-known power method.
Therefore, in the case of a separably stiff system, as analyzed by D.J. Higham
(1989), the space $\tilde D$ approaches the space $D$ extremely well as soon as its
dimension is sufficiently high. In the Arnoldi process (Arnoldi 1951) the vectors of
(10.28) are successively orthonormalized (Gram-Schmidt) as

$$ q_1 = r_0 / \|r_0\| $$

$$ \hat q_2 = Aq_1 - h_{11} q_1, \qquad q_2 = \hat q_2 / h_{21} \quad \text{with} \quad h_{21} = \|\hat q_2\| $$

and so on, and we see that

$$ \begin{aligned} Aq_1 &= h_{21} q_2 + h_{11} q_1 \\ Aq_2 &= h_{32} q_3 + h_{22} q_2 + h_{12} q_1 \\ &\;\;\vdots \end{aligned} \tag{10.29} $$

which, compared to (10.28), shows that $H$ is Hessenberg. For $A$ symmetric, $H$ is
also symmetric, hence tridiagonal, so that the method is equivalent to the conjugate
gradient method.
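The Arnoldi process (10.29) and the Galerkin condition (10.26) can be put together in a short sketch. It assumes $x_0 = 0$ (so that $r_0 = b$), a small dense matrix stored as nested lists, and modified Gram-Schmidt. The function names and the test matrix are hypothetical. For $k = n$ the Krylov space is the whole space, so the Galerkin solution must then coincide with the exact one, which gives a simple check.

```python
from math import sqrt


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def matvec(A, v):
    return [dot(row, v) for row in A]


def arnoldi(A, r0, k):
    """Orthonormal Krylov vectors q_1..q_k (as rows of Q) and H = Q^T A Q (k x k)."""
    beta = sqrt(dot(r0, r0))
    Q = [[ri / beta for ri in r0]]
    H = [[0.0] * k for _ in range(k)]
    for j in range(k):
        w = matvec(A, Q[j])
        for i in range(j + 1):                    # modified Gram-Schmidt
            H[i][j] = dot(Q[i], w)
            w = [wi - H[i][j] * qi for wi, qi in zip(w, Q[i])]
        if j + 1 < k:
            h = sqrt(dot(w, w))
            if h < 1e-14:
                raise ValueError("Krylov space exhausted before dimension k")
            H[j + 1][j] = h
            Q.append([wi / h for wi in w])
    return Q, H


def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for the small system (10.26)."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            factor = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= factor * aug[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def galerkin_solve(A, b, k):
    """Approximate solution Q y of A x = b with Q^T (b - A Q y) = 0."""
    Q, H = arnoldi(A, b, k)
    y = solve_dense(H, [dot(q, b) for q in Q])
    return [sum(Q[j][i] * y[j] for j in range(k)) for i in range(len(b))]


def residual_norm(A, b, x):
    return sqrt(sum((bi - axi) ** 2 for bi, axi in zip(b, matvec(A, x))))


# Illustrative data: A = I/(h*gamma) - J with one stiff eigenvalue near -2000
h_gamma = 0.1
J = [[-2000.0, 1.0, 0.0, 0.0],
     [1.0, -1.0, 0.5, 0.0],
     [0.0, 0.5, -2.0, 1.0],
     [0.0, 0.0, 1.0, -0.5]]
A = [[(1.0 / h_gamma if i == j else 0.0) - J[i][j] for j in range(4)] for i in range(4)]
b = [1.0, 2.0, 3.0, 4.0]

residuals = [residual_norm(A, b, galerkin_solve(A, b, k)) for k in range(1, 5)]
```

In a production code the residual would be estimated from $H$ alone while the columns are built up, instead of being recomputed from scratch for every $k$ as is done here for clarity.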

Two features are important for this method: Firstly, the matrix A need never 
be computed nor stored. All that is needed are the matrix-vector multiplications in 
(10.29), which can be obtained from the “directional derivative” 

$$ Jv \approx [f(x, y + \delta v) - f(x, y)]/\delta. \tag{10.30} $$

Several people therefore call such methods “matrix-free”. Secondly, the dimension 
k does not have to be known: one simply computes one column of H after the other 
and periodically estimates the residual. As soon as this estimate is small enough 
(or k becomes too large) the algorithm stops. We also mention two variants of the 
method: 

1. (Gear & Saad 1983, p. 595). Before starting the computation of the Krylov
subspace, perform some initial iterations of the power method on the initial vector
$r_0$, using either the matrix $A$ or the matrix $J$. Lopez & Trigiante (1989) report
excellent numerical results for this procedure.

2. Incomplete Orthogonalization (Saad 1982). The new vector $Aq_j$ is only
orthogonalized against the previous $p$ vectors, where $p$ is some small integer. This
makes $H$ a banded matrix and saves computing time and memory. For symmetric
matrices, the ideal choice is of course $p = 2$; for increasingly unsymmetric
matrices $p$ is usually increased to 10 or 15.
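The “matrix-free” product (10.30) needs only one extra evaluation of $f$ per vector. The sketch below compares it with an analytic Jacobian for a small right-hand side of Robertson type. That test function, the heuristic choice of $\delta$, and all function names are assumptions made for illustration and are not taken from the codes discussed in the text.

```python
from math import sqrt


def f(x, y):
    """Illustrative right-hand side of Robertson type (three species)."""
    return [-0.04 * y[0] + 1.0e4 * y[1] * y[2],
            0.04 * y[0] - 1.0e4 * y[1] * y[2] - 3.0e7 * y[1] ** 2,
            3.0e7 * y[1] ** 2]


def jacobian(x, y):
    """Analytic Jacobian of f, used only to check the approximation."""
    return [[-0.04, 1.0e4 * y[2], 1.0e4 * y[1]],
            [0.04, -1.0e4 * y[2] - 6.0e7 * y[1], -1.0e4 * y[1]],
            [0.0, 6.0e7 * y[1], 0.0]]


def norm(v):
    return sqrt(sum(vi * vi for vi in v))


def jac_times_vector(f, x, y, v, f0=None):
    """Approximate J v by the one-sided difference quotient (10.30)."""
    eps = 2.0 ** -52
    delta = sqrt(eps) * (1.0 + norm(y)) / norm(v)   # heuristic step, an assumption
    if f0 is None:
        f0 = f(x, y)                                 # can be reused for many v
    shifted = f(x, [yi + delta * vi for yi, vi in zip(y, v)])
    return [(s - b) / delta for s, b in zip(shifted, f0)]


def a_times_vector(f, x, y, v, h_gamma, f0=None):
    """Matrix-free product A v with A = I/(h*gamma) - J as in (10.23)."""
    jv = jac_times_vector(f, x, y, v, f0)
    return [vi / h_gamma - jvi for vi, jvi in zip(v, jv)]


x0 = 0.0
y0 = [0.9, 2.0e-5, 0.1]
v = [1.0, 1.0, 1.0]

approx = jac_times_vector(f, x0, y0, v)
exact = [sum(jij * vj for jij, vj in zip(row, v)) for row in jacobian(x0, y0)]
rel_error = norm([p - q for p, q in zip(approx, exact)]) / norm(exact)
```

Because the quotient is only first-order accurate in $\delta$, the product carries a small relative error. This is usually acceptable inside an iterative linear solver whose own tolerance is much coarser.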

The EKBWH-Method (this tongue-twister stands for Enright, Kamel, Björck,
Watkins and HansonSmith). Here, the matrices $A$ (and $J$) in (10.23) are replaced
by approximations

$$ \tilde A = \frac{1}{h\gamma} I - \tilde J \tag{10.31} $$

where $\tilde J$ should approach $J$ sufficiently well and the matrix $\tilde A$ be relatively easy
to invert. $\tilde J$ is determined as follows: Complete (theoretically) the vectors (10.24)
to an orthogonal basis $(Q, \hat Q)$ of $\mathbb{R}^n$. In the new basis $J$ becomes

$$ \begin{pmatrix} Q^T \\ \hat Q^T \end{pmatrix} J \,(Q, \hat Q) = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix} \tag{10.32} $$

and we have

$$ Q^T J Q = T_{11}. \tag{10.33} $$

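If $\mathrm{span}\, Q = \tilde D$ approaches $D$, then $T_{11}$ will contain the stiff eigenvalues and $T_{21}$
will tend to zero. If $\tilde D = D$ exactly, then $T_{21} = 0$ and (10.32) is a block-Schur
decomposition of $J$. For separably stiff systems $\|T_{22}\|$ will become small compared
to $(h\gamma)^{-1}$ and we define

$$ \tilde J = (Q, \hat Q) \begin{pmatrix} T_{11} & T_{12} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} Q^T \\ \hat Q^T \end{pmatrix} = Q\,(T_{11} Q^T + T_{12} \hat Q^T) \overset{(10.32)}{=} Q Q^T J. $$

This shows $\tilde J$ to be the orthogonal projection of $J$ onto $\tilde D$. The inverse of $\tilde A$ is
computed by developing $(I - B)^{-1} = I + B + B^2 + \ldots$ as a geometric series

$$ \begin{aligned} \tilde A^{-1} &= h\gamma\,(I - h\gamma\, Q Q^T J)^{-1} \\ &= h\gamma\,\bigl(I + h\gamma\, Q Q^T J + h^2\gamma^2\, Q \underbrace{Q^T J Q}_{T_{11}} Q^T J + \ldots\bigr) \\ &= h\gamma\,\bigl(I + Q\,(h\gamma I + h^2\gamma^2 T_{11} + h^3\gamma^3 T_{11}^2 + \ldots)\, Q^T J\bigr) \\ &= h\gamma\,\Bigl(I + Q\,\bigl(\tfrac{1}{h\gamma} I - T_{11}\bigr)^{-1} Q^T J\Bigr) \end{aligned} \tag{10.34} $$

which only requires the solution of the “small” system with matrix $(I/h\gamma - T_{11})$
(the last expression is called the Sherman-Morrison-Woodbury formula).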
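The last line of (10.34) can be verified numerically in the simplest case $k = 1$, where $Q$ is a single unit vector $q$ and $T_{11}$ is a scalar. In the sketch below, $q$ comes from a few power iterations on a hypothetical $3 \times 3$ Jacobian. All names and numbers are illustrative assumptions.

```python
from math import sqrt


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def matvec(M, v):
    return [dot(row, v) for row in M]


def normalize(v):
    length = sqrt(dot(v, v))
    return [vi / length for vi in v]


# Hypothetical Jacobian with one dominant (stiff) eigenvalue near -2000
J = [[-2000.0, 3.0, 1.0],
     [2.0, -1.0, 0.5],
     [1.0, 0.3, -0.2]]
h_gamma = 0.05

# k = 1: approximate the dominant direction by a few power iterations on J
q = normalize([1.0, 1.0, 1.0])
for _ in range(5):
    q = normalize(matvec(J, q))

t11 = dot(q, matvec(J, q))              # T_11 = Q^T J Q of (10.33), here a scalar


def apply_a_tilde(v):
    """A~ v with A~ = I/(h*gamma) - Q Q^T J, cf. (10.31)."""
    qtjv = dot(q, matvec(J, v))
    return [vi / h_gamma - qi * qtjv for vi, qi in zip(v, q)]


def apply_a_tilde_inverse(v):
    """A~^{-1} v by the last line of (10.34); only a 1 x 1 system is solved."""
    qtjv = dot(q, matvec(J, v))
    small = qtjv / (1.0 / h_gamma - t11)
    return [h_gamma * (vi + qi * small) for vi, qi in zip(v, q)]


v = [1.0, -2.0, 0.5]
roundtrip = apply_a_tilde(apply_a_tilde_inverse(v))
roundtrip_error = max(abs(r - vi) for r, vi in zip(roundtrip, v))
```

The closed form is an exact inverse of $\tilde A$ for any unit vector $q$ with $1/(h\gamma) \ne T_{11}$. The quality of $q$ only decides how well $\tilde A$ approximates $A$.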

Choice of Q: 

— Björck (1983) computes the precise span of $D$ by Householder transforms
followed by block-QR iterations. For separably stiff systems the block $T_{21}$
converges to zero linearly with ratio $\mu^{-1}$, so that usually 2 or 3 iterations are sufficient.
A disadvantage of the method is that an estimate for the dimension $k$ of $D$ must
be known in advance.

— Enright & Kamel (1979) transform $J$ to Hessenberg form and stop the
transformations when $\|T_{21}\| + \|T_{22}\|$ becomes sufficiently small (note that $T_{21}$
is nonzero in its last column only). Thus the dimension $k$ can be discovered
dynamically. Enright & Kamel combine the Householder reflections with a pivoting
strategy and repeated row and column permutations in order to make $T_{22}$ small as
fast as possible. It was first observed numerically (by Carlsson) and then shown
theoretically (Söderlind 1981) that this pivoting strategy “needs some comments”:
if we start from (10.32), by knowing that



is Hessenberg in its first $k$ columns (with $h_{21} \ne 0$, $h_{32} \ne 0, \ldots$), and do the analysis
of formulas (10.29) backwards, we see that the space $\tilde D$ for the Enright & Kamel
method is a Krylov subspace created by $q_1$ (D.J. Higham 1989). Thus only the first
permutation influences the result.

— Watkins & HansonSmith (1983) start from an arbitrary $Q^{(0)}$ followed by
several steps of the block power method

$$ J Q^{(i)} = Q^{(i+1)} R^{(i+1)} \tag{10.35} $$

where $R^{(i+1)}$ re-orthogonalizes the vectors of the product $JQ^{(i)}$. A great advantage
of this procedure is that no large matrix needs to be computed or stored. The
formulas (10.35) as well as (10.34) only contain matrix-vector products which are
computed by (10.30). The disadvantage is that the dimension of the space must be
known.

Stopping Criteria. The above methods need a criterion on the quality of the
approximation $\tilde J$ to decide whether the dimension $k$ is sufficient. Suppose that we
solve the linear equation (10.23) by a modified Newton correction which uses $\tilde A$
as “approximate Jacobian”

$$ \tilde x = x_0 + \tilde A^{-1}(b - Ax_0), $$

then the convergence of this iteration is governed by the condition

$$ \varrho(I - \tilde A^{-1}A) = \varrho\bigl(\tilde A^{-1}(\tilde A - A)\bigr) = \varrho\bigl(\tilde A^{-1}(J - \tilde J)\bigr) < 1. \tag{10.36} $$


A reasonable condition is therefore that the spectral radius $\varrho$ of $\tilde A^{-1}(J - \tilde J)$ is
clearly smaller than 1. Let us compute this value for the Björck method ($T_{21} = 0$):
since the eigenvalues of a matrix $C$ are invariant under the similarity transformation
$T^{-1}CT$, we have

$$ \varrho\bigl(\tilde A^{-1}(J - \tilde J)\bigr) = \varrho\left( \begin{pmatrix} \times & \times \\ 0 & h\gamma I \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & T_{22} \end{pmatrix} \right) = \varrho \begin{pmatrix} 0 & \times \\ 0 & h\gamma T_{22} \end{pmatrix} = \varrho(h\gamma T_{22}). $$

In practice, a condition of the form

$$ \|h\gamma T_{22}\| < 1, \tag{10.37} $$

where $\|\cdot\|$ is usually the Frobenius norm $\sqrt{\sum_{ij} a_{ij}^2}$, ensures a reasonable rate of
convergence. For an analogous condition in the Enright-Kamel case see Exercise 3
below.


Exercises 


1. (The red-black reduction). The Jacobian matrix of the (periodic) cusp catastrophe
model (10.8’) is of the form

$$ \begin{pmatrix} A_1 & B_1 & & & C_1 \\ C_2 & A_2 & B_2 & & \\ & \ddots & \ddots & \ddots & \\ & & C_{2m-1} & A_{2m-1} & B_{2m-1} \\ B_{2m} & & & C_{2m} & A_{2m} \end{pmatrix} \tag{10.38} $$

where $A_i, B_i, C_i$ are $(3 \times 3)$-matrices. Write a solver which solves linear
equations with matrix (10.38) using the “red-black ordering reduction”.
This means that $A_1, A_3, A_5, \ldots$ are used as (matricial) pivots to eliminate
$C_2, C_4, \ldots, B_2, B_4, \ldots$ above and below by Gaussian block-elimination. Then
the resulting system is again of the same structure as (10.38) with halved
dimension. If the original system’s dimension contains $2^k$ as a factor, this
process can be iterated $k$ times. Study the increase of performance which this
algorithm allows for the RADAU5 and Rosenbrock codes on model (10.8’). The
algorithm is also highly parallelizable.


2. Show by numerical experiments that the circular nerve (10.8’) loses its limit 
cycle when the diffusion coefficient D becomes either too small (the message 
does not go across the water fall) or too large (the limit cycle then melts down 
across the origin). 


3. (Stopping criterion for Enright & Kamel method; D.J. Higham 1989). Suppose
that the matrix $J$ has been transformed to partial Hessenberg form (see
(10.32))

$$ \begin{pmatrix} Q^T \\ \hat Q^T \end{pmatrix} J \,(Q, \hat Q) = \begin{pmatrix} H & T_{12} \\ (0 \;\; b) & T_{22} \end{pmatrix} $$

(with diagonal blocks of dimensions $k$ and $n-k$), where $H$ is upper Hessenberg
and $b$ a column vector. Show that the criterion (10.36) then becomes

$$ \varrho(h\gamma B) < 1 $$

where

$$ B = \begin{pmatrix} 0 & -h\gamma \bar H^{-1} T_{12}\, (b \;\; T_{22}) \\ 0 & (b \;\; T_{22}) \end{pmatrix} $$

(with column blocks of widths $k-1$ and $1+n-k$) and $\bar H = (I - h\gamma H)$. Since
$\varrho(B)$ is the same as the spectral radius of its lower $1+n-k$ by $1+n-k$
principal submatrix, a sufficient condition for convergence is

$$ h\gamma \sqrt{\|T_{22}\|^2 + \|b\|^2 + \|y\|^2} < 1 $$

where $y^T$ is the $k$-th row of the matrix $-h\gamma \bar H^{-1} T_{12}\, (b \;\; T_{22})$.
He who loves practice without theory is like the sailor who boards
ship without a rudder and compass and never knows where he may 
be cast. (Leonardo da Vinci 1452-1519, 

cited from M. Kline, Mathematical Thought ... 1972, p. 224) 


The stability analysis of the preceding sections is based on the transformation of
the Jacobian $J \approx \partial f/\partial y$ to diagonal form (see Formulas (2.5), (2.6) of Sect. IV.2).
Especially for large-dimensional problems, however, the matrix which performs
this transformation may be badly conditioned and destroy all the nice estimates
which have been obtained.



( 11 . 1 ) 


( 11 . 2 ) 


This matrix has all eigenvalues at $-\lambda$ and the above spectral stability analysis
would indicate fast asymptotic convergence to zero. But neither the solution of
(11.1), which just represents a travelling wave, nor the solution of (11.2), if the
dimension becomes large, has this property. So our interest in this section is to
obtain rigorous bounds for the numerical solution (see (2.3))

$$ y_{m+1} = R(hA)\, y_m \tag{11.3} $$

in different norms of $\mathbb{R}^n$ or $\mathbb{C}^n$. Here $R(z)$ represents the stability function of the
method employed. We have from (11.3)

$$ \|y_{m+1}\| \le \|R(hA)\| \cdot \|y_m\| \tag{11.4} $$

(see Volume I, Sect. I.9, Formula (9.10)), and contractivity is assured if

$$ \|R(hA)\| \le 1. $$
Euclidean Norms (Theorem of von Neumann)

People in mathematics and science should be reminded that many 
of the things we take for granted today owe their birth to perhaps 
one of the most brilliant people of the twentieth century — John 
von Neumann. 

(John Impagliazzo, quoted from SIAM News September 1988) 

Let the considered norm be Euclidean with the corresponding scalar product
denoted by $(\cdot,\cdot)$. Then, for the solution of $y' = Ay$ we have

$$ \frac{d}{dx}\|y\|^2 = \frac{d}{dx}(y, y) = 2\,\mathrm{Re}\,(y, y') = 2\,\mathrm{Re}\,(y, Ay), \tag{11.5} $$

hence the solutions are decaying in this norm if

$$ \mathrm{Re}\,(y, Ay) \le 0 \qquad \text{for all } y \in \mathbb{C}^n. \tag{11.6} $$

This result is related to Theorem 10.6 of Sect. I.10, because

$$ \mathrm{Re}\,(y, Ay) \le \mu_2(A)\,\|y\|^2, \tag{11.7} $$

where $\mu_2(A)$ is the logarithmic norm of $A$ (Eq. (10.20) of Sect. I.10).

Theorem 11.2. Let the rational function $R(z)$ be bounded for $\mathrm{Re}\, z \le 0$ and assume
that the matrix $A$ satisfies (11.6). Then, in the matrix norm corresponding to
the scalar product we have

$$ \|R(A)\| \le \sup_{\mathrm{Re}\, z \le 0} |R(z)|. \tag{11.8} $$


Remark. This is a finite-dimensional version of a result of J. von Neumann (1951).
A short proof is given in Hairer, Bader & Lubich (1982). The idea of the following
proof is due to M. Crouzeix (unpublished).

Proof. a) Normal matrices can be transformed to diagonal form by a unitary
matrix $Q$ (see Exercise 3 of Section I.12). Hence, $A = QDQ^*$, where $D =
\mathrm{diag}\{\lambda_1, \ldots, \lambda_n\}$. In this case we have

$$ \|R(A)\| = \|Q R(D) Q^*\| = \|R(D)\| = \max_{i=1,\ldots,n} |R(\lambda_i)|, $$

and (11.8) follows from (11.6), because the eigenvalues of $A$ satisfy $\mathrm{Re}\,\lambda_i \le 0$.

b) For a general $A$ we consider the matrix function

$$ A(\omega) = \frac{\omega}{2}(A + A^*) + \frac{1}{2}(A - A^*). $$

We see from the identity

$$ (v, A(\omega)v) = \omega\,\mathrm{Re}\,(v, Av) + i\,\mathrm{Im}\,(v, Av) $$

that $A(\omega)$ satisfies (11.6) for all $\omega$ with $\mathrm{Re}\,\omega \ge 0$, so that also the eigenvalues of
$A(\omega)$ satisfy $\mathrm{Re}\,\lambda(\omega) \le 0$ for $\mathrm{Re}\,\omega \ge 0$. Therefore, the rational function

$$ \varphi(\omega) = (u, R(A(\omega))v) $$

($u$, $v$ fixed) has no poles in $\mathrm{Re}\,\omega \ge 0$. Using $A(1) = A$ we obtain from the
maximum principle that

$$ \begin{aligned} |(u, R(A)v)| = |\varphi(1)| &\le \sup_{y \in \mathbb{R}} |\varphi(iy)| \le \sup_{y \in \mathbb{R}} \|R(A(iy))\|\, \|u\|\, \|v\| \\ &\le \sup_{\mathrm{Re}\, z \le 0} |R(z)|\, \|u\|\, \|v\|. \end{aligned} \tag{11.9} $$

The last inequality of (11.9) follows from part (a), because $A(iy)$ is a normal
matrix (i.e., $A(iy)A(iy)^* = A(iy)^*A(iy)$). Formula (11.8) is now an immediate
consequence of (11.9) and of the fact that $\|C\| = \sup_{\|u\| \le 1, \|v\| \le 1} |(u, Cv)|$. □


Corollary 11.3. If the rational function $R(z)$ is A-stable, then the numerical
solution $y_{n+1} = R(hA)y_n$ is contractive in the Euclidean norm (i.e., $\|y_{n+1}\| \le
\|y_n\|$), whenever (11.6) is satisfied.

Proof. A-stability implies that $\max_{\mathrm{Re}\, z \le 0} |R(z)| \le 1$. □


Corollary 11.4. If a matrix $A$ satisfies $\mathrm{Re}\,(v, Av) \le \nu\|v\|^2$ for all $v \in \mathbb{C}^n$, then

$$ \|R(A)\| \le \sup_{\mathrm{Re}\, z \le \nu} |R(z)|. \tag{11.10} $$

Proof. Apply Theorem 11.2 to $\tilde R(z) = R(z + \nu)$ and $\tilde A = A - \nu I$. □
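A small numerical illustration of Corollary 11.4 for the implicit Euler method, $R(z) = 1/(1-z)$, for which $\sup_{\mathrm{Re}\, z \le \nu} |R(z)| = 1/(1-\nu)$ when $\nu < 1$. The non-normal $2 \times 2$ matrix is a hypothetical example chosen so that its symmetric part is negative definite. The sketch compares the prediction from the eigenvalues alone, the actual Euclidean norm of $R(hA)$, and the bound (11.10) with $\nu = h\mu_2(A)$.

```python
from math import sqrt

# Hypothetical non-normal matrix with eigenvalues -1 and -5
A = [[-1.0, 4.0],
     [0.0, -5.0]]


def spectral_norm_2x2(M):
    """Euclidean operator norm of a real 2 x 2 matrix via the eigenvalues of M^T M."""
    t = sum(m * m for row in M for m in row)              # trace of M^T M
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]           # det(M^T M) = det^2
    disc = max(t * t - 4.0 * det * det, 0.0)
    return sqrt(0.5 * (t + sqrt(disc)))


def implicit_euler_matrix(h):
    """R(hA) = (I - hA)^{-1} for the 2 x 2 matrix A above."""
    a, b = 1.0 - h * A[0][0], -h * A[0][1]
    c, d = -h * A[1][0], 1.0 - h * A[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Logarithmic norm mu_2(A): largest eigenvalue of the symmetric part (A + A^T)/2
s11, s12, s22 = A[0][0], 0.5 * (A[0][1] + A[1][0]), A[1][1]
mu2 = 0.5 * (s11 + s22) + sqrt(0.25 * (s11 - s22) ** 2 + s12 ** 2)

results = []
for h in [0.01, 0.1, 1.0, 10.0, 100.0]:
    norm_r = spectral_norm_2x2(implicit_euler_matrix(h))
    bound = 1.0 / (1.0 - h * mu2)                         # sup of |R| over Re z <= h*mu2
    spectral = max(1.0 / (1.0 + h), 1.0 / (1.0 + 5.0 * h))  # max |R(h*lambda_i)|
    results.append((h, spectral, norm_r, bound))
```

For this matrix the eigenvalue prediction underestimates the norm, while the bound based on $\mu_2(A)$ holds for every step size and stays below one.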


Error Growth Function for Linear Problems 


Guided by the above estimate, we define

$$ \varphi_R(x) := \sup_{\mathrm{Re}\, z \le x} |R(z)|. \tag{11.11} $$

This function is called error growth function (for linear problems). It is continuous
and monotonically increasing. If $R(z)$ is analytic in the half-plane $\mathrm{Re}\, z \le x$, the
maximum principle implies that

$$ \varphi_R(x) = \sup_{y \in \mathbb{R}} |R(x + iy)|. $$

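Examples.

1. Implicit Euler method:

$$ R(z) = \frac{1}{1 - z}, \qquad \varphi_R(x) = \begin{cases} R(x) & \text{if } -\infty < x < 1 \\ \infty & \text{if } 1 \le x. \end{cases} \tag{11.12} $$

2. The stability function of the $\theta$-method (or of a one-stage Rosenbrock method):

$$ R(z) = \frac{1 + (1 - \theta)z}{1 - \theta z}, \qquad \varphi_R(x) = \begin{cases} |R(\infty)| & \text{if } x \le \xi_0 \\ R(x) & \text{if } \xi_0 \le x < 1/\theta \\ \infty & \text{if } 1/\theta \le x, \end{cases} \tag{11.13} $$

where $\xi_0 = (1 - 2\theta)/(2\theta(1 - \theta))$ for $0 < \theta < 1$ and $\xi_0 = -\infty$ for $\theta \ge 1$.

3. The (0,2)-Padé approximation:

$$ R(z) = \frac{1}{1 - z + z^2/2}, \qquad \varphi_R(x) = \begin{cases} R(x) & \text{if } -\infty < x \le 0 \\ \dfrac{1}{1 - x} & \text{if } 0 \le x < 1 \\ \infty & \text{if } 1 \le x. \end{cases} \tag{11.14} $$

4. The (1,2)-Padé approximation:

$$ R(z) = \frac{1 + z/3}{1 - 2z/3 + z^2/6}, \qquad \varphi_R(x) = \begin{cases} |R(x)| & \text{if } -\infty < x \le \xi_0 \\ \dfrac{\sqrt{3\sqrt{12x^2 + 12x + 9} + 10x + 7}}{2(2 - x)} & \text{if } \xi_0 \le x < 2 \\ \infty & \text{if } 2 \le x, \end{cases} \tag{11.15} $$

where $\xi_0 = -6 - 3\sqrt{10}$.

5. The (2,2)-Padé approximation:

$$ R(z) = \frac{1 + z/2 + z^2/12}{1 - z/2 + z^2/12}, \qquad \varphi_R(x) = \begin{cases} 1 & \text{if } -\infty < x \le 0 \\ \dfrac{2x + \sqrt{9 + 3x^2}}{3 - x} & \text{if } 0 \le x < 3 \\ \infty & \text{if } 3 \le x. \end{cases} \tag{11.16} $$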

The next two theorems give some general results on the shape of $\varphi_R(x)$.

Theorem 11.5. Let $R(z)$ be an A-stable approximation to $e^z$ of exact order $p$,
i.e., $R(z) = e^z - Cz^{p+1} + O(z^{p+2})$ with $C \ne 0$. If additionally $|R(iy)| < 1$ for
$y \ne 0$ and $|R(\infty)| < 1$, then we have

a) if $p$ is odd

$$ \varphi_R(x) = e^x + O(x^{p+1}) \qquad \text{for } x \to 0; \tag{11.17} $$

b) if $p$ is even we have (11.17) only for $(-1)^{p/2} C x > 0$, otherwise

$$ \varphi_R(x) = e^x + O(x^{r+1}) \qquad \text{for } x \to 0 \tag{11.18} $$

for some positive rational number $r \le p/2$.

Proof. The assumptions imply that for $x \to 0$ the maximum of $\{|R(x + iy)|;\ y \in \mathbb{R}\}$
must be located near the origin. We further observe that it must lie within the order
star $A = \{z \in \mathbb{C};\ |R(z)| > |e^z|\}$. If $p$ is odd, the order star consists of $p + 1$
sectors near the origin (Lemma 4.3) and, asymptotically for $z \to 0$, all elements
of $A$ satisfy $|z| \le D|x|$, $D < \infty$. Therefore

$$ |R(z)| = e^x + O(|z|^{p+1}) = e^x + O(x^{p+1}) \qquad \text{for } x \to 0. $$

The same argument applies if $p$ is even and $(-1)^{p/2} C x > 0$. In the remaining case
($p$ even and $(-1)^{p/2} C x < 0$) the maximum of $\{|R(x + iy)|;\ y \in \mathbb{R}\}$ is attained
near the imaginary axis and a more detailed analysis is necessary (Hairer, Bader &
Lubich 1982). □


Theorem 11.6 (Hairer & Zennaro 1996). For an A-stable approximation to $e^z$ the
function $\varphi_R(x)$ is superexponential, i.e., it satisfies $\varphi_R(0) = 1$ and

$$ \varphi_R(x_1)\,\varphi_R(x_2) \le \varphi_R(x_1 + x_2) \tag{11.19} $$

for all $x_1, x_2$ having the same sign.

Proof. A-stability is equivalent to $\varphi_R(0) = 1$. It therefore remains to verify
(11.19). Let $x_1$ and $x_2$ be fixed (both $\le 0$ or both $\ge 0$) and assume $\varphi_R(x_1 + x_2) <
\infty$. The idea is to consider the rational function

$$ S(z) = R(a - z)\,R(z) $$

where $a \in \mathbb{C}$ is a parameter satisfying $\mathrm{Re}\, a \le x_1 + x_2$. Due to A-stability and
$\varphi_R(x_1 + x_2) < \infty$, $S(z)$ is analytic on the strip $0 \le \mathrm{Re}\, z \le x_1 + x_2$ (or $x_1 + x_2 \le
\mathrm{Re}\, z \le 0$), and its modulus is bounded by $\varphi_R(x_1 + x_2)$ on the border. By the
maximum principle we therefore have for all $z$ in the considered strip

$$ |R(a - z)\,R(z)| \le \varphi_R(x_1 + x_2). $$

We now choose $z$ on the line $\mathrm{Re}\, z = x_2$ in such a way that $|R(z)|$ becomes maximal;
then, we choose $a$ on the line $\mathrm{Re}\, a = x_1 + x_2$ (i.e., $\mathrm{Re}\,(a - z) = x_1$) such that
$|R(a - z)|$ becomes maximal (possibly one has to consider limits). This proves
(11.19). □


Property (11.19) has an interesting practical interpretation. Consider a numerical
solution $y_n$ obtained with variable step sizes. Repeated application of (11.4)
and Corollary 11.4 implies

$$ \|y_m\| \le \Bigl( \prod_{k=0}^{m-1} \varphi_R(h_k \mu) \Bigr) \|y_0\| \tag{11.20} $$

if the problem $y' = Ay$ satisfies (11.7) with $\mu = \mu_2(A)$. For $\mu < 0$ and for an
A-stable method all factors $\varphi_R(h_k\mu)$ are smaller than one. If in addition $|R(\infty)| < 1$,
these factors are close to one only for $h_k \to 0$. The inequality (11.19), written as

$$ \varphi_R(h_k\mu)\,\varphi_R(h_{k+1}\mu) \le \varphi_R\bigl((h_k + h_{k+1})\mu\bigr), $$

means that replacing two consecutive steps by one large step of size $h_k + h_{k+1}$
increases the upper bound (11.20). Therefore, after combining several consecutive
steps (if necessary), we may assume $h_k \ge h > 0$ for all $k$. This implies that $\|y_m\| \le
\varrho^m \|y_0\|$ with $\varrho = \varphi_R(h\mu) < 1$. Hence, for any mesh $x_0, x_1, \ldots$ with $x_m \to \infty$, we
have asymptotic stability, i.e., $\|y_m\| \to 0$ for $m \to \infty$. Under additional restrictions
on the step size, sharper bounds on $\|y_m\|$ can be obtained (Exercise 3).
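As a quick illustration of (11.19), not taken from the text, consider the implicit Euler method, for which $\varphi_R(x) = 1/(1-x)$ for $x < 1$ by (11.12). For $x_1, x_2$ of the same sign with $x_1 + x_2 < 1$ the product $x_1 x_2$ is non-negative, so the inequality can be read off directly:

```latex
\varphi_R(x_1)\,\varphi_R(x_2)
  = \frac{1}{(1-x_1)(1-x_2)}
  = \frac{1}{1-(x_1+x_2)+x_1x_2}
  \;\le\; \frac{1}{1-(x_1+x_2)}
  = \varphi_R(x_1+x_2).
% Step-size reading with x_1 = x_2 = h\mu, \mu < 0:
% two steps of size h give the factor 1/(1-h\mu)^2, which is smaller than
% the factor 1/(1-2h\mu) of a single step of size 2h.
```

With $\mu = -1$ and $h = 1$, for instance, two steps give the factor $1/4$ while one step of size 2 gives $1/3$, in agreement with the statement that combining steps increases the bound (11.20).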


Small Nonlinear Perturbations 

The above estimates, valid only for linear autonomous equations $y' = Jy$, can
be extended to problems with small nonlinear perturbations, so-called semi-linear
problems

$$ y' = Jy + g(x, y) \tag{11.21} $$

where

$$ (y, Jy) \le \mu \|y\|^2 \tag{11.22a} $$

$$ \|g(x, y) - g(x, z)\| \le L \|y - z\| \tag{11.22b} $$

with $L$ assumed to be small.

Here, in the presence of nonlinearities, stability properties are obtained by
estimating the distance of two neighbouring solutions $y(x)$ and $\hat y(x)$. Instead of
(11.5) we therefore have

$$ \frac{d}{dx}\|\hat y(x) - y(x)\|^2 = 2\,(\hat y' - y', \hat y - y) $$

which gives, after inserting (11.21) for $y'$ and $\hat y'$, using the Cauchy-Schwarz
inequality and the estimates (11.22)

$$ \frac{d}{dx}\|\hat y(x) - y(x)\|^2 \le 2(\mu + L)\, \|\hat y(x) - y(x)\|^2. \tag{11.23} $$

We thus have contractivity whenever $\mu + L \le 0$.

We now want to establish the same property for the numerical solutions. In 
principle, these estimates can be carried out for all methods of this chapter; how¬ 
ever, since the subsequent sections will deal with so many nice properties of im¬ 
plicit Runge-Kutta methods, we shall concentrate here on Rosenbrock methods. 

Example 11.7. Consider the 1-stage Rosenbrock method

$$ \begin{aligned} (I - \gamma hJ)\,k_1 &= h f(x_0, y_0) \\ y_1 &= y_0 + k_1 \end{aligned} \tag{11.24} $$

with $\gamma > 0$ as a free parameter. Its stability function is

$$ R(z) = \frac{1 + (1 - \gamma)z}{1 - \gamma z} $$

and we have A-stability for $\gamma \ge 1/2$. Application of (11.24) to (11.21) yields

$$ y_1 = R(hJ)\,y_0 + (I - \gamma hJ)^{-1} h\, g(x_0, y_0). \tag{11.25} $$

From von Neumann’s theorem (Corollary 11.4) we obtain $\|(I - \gamma hJ)^{-1}\| \le (1 -
\gamma h\mu)^{-1}$ and $\|R(hJ)\| \le \varphi_R(h\mu)$ with $\varphi_R$ given in (11.13). If we take a second
numerical solution $\hat y_1$, also defined by (11.25), its difference to $y_1$ can be estimated
by

$$ \|\hat y_1 - y_1\| \le \Bigl( \varphi_R(h\mu) + \frac{hL}{1 - \gamma h\mu} \Bigr) \|\hat y_0 - y_0\| $$

whenever $\xi_0 \le h\mu < 1/\gamma$ with $\xi_0$ given in (11.13). Therefore contractivity occurs
for $\mu + L \le 0$, as desired.
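The last step of Example 11.7 can be spelled out as follows. The sketch uses only (11.13) and the stability function of (11.24), under the stated restriction $\xi_0 \le h\mu < 1/\gamma$, where $\varphi_R(h\mu) = R(h\mu)$:

```latex
\varphi_R(h\mu) + \frac{hL}{1-\gamma h\mu}
  = \frac{1 + (1-\gamma)h\mu}{1-\gamma h\mu} + \frac{hL}{1-\gamma h\mu}
  = \frac{(1-\gamma h\mu) + h\mu + hL}{1-\gamma h\mu}
  = 1 + \frac{h(\mu + L)}{1-\gamma h\mu}.
% Since 1 - \gamma h\mu > 0 in this range, the factor is \le 1
% exactly when \mu + L \le 0.
```

The contractivity condition for the numerical solution therefore coincides with the condition $\mu + L \le 0$ obtained from (11.23) for the exact solution.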


For the general Rosenbrock method (7.4) applied to problem (11.21)

$$ \begin{aligned} k_i &= h\, g(x_0 + c_i h, u_i) + hJ y_0 + hJ \sum_{j=1}^{i} (\alpha_{ij} + \gamma_{ij})\, k_j \\ u_i &= y_0 + \sum_{j=1}^{i-1} \alpha_{ij} k_j, \qquad y_1 = y_0 + \sum_{i=1}^{s} b_i k_i \end{aligned} $$

we easily find the following analogue of the variation of constants formula.


Theorem 11.8. The numerical solution of a Rosenbrock method applied to (11.21)
can be written as

$$ \begin{aligned} y_1 &= R(hJ)\,y_0 + h \sum_{i=1}^{s} b_i(hJ)\, g(x_0 + c_i h, u_i) \\ u_i &= R_i(hJ)\,y_0 + h \sum_{j=1}^{i-1} a_{ij}(hJ)\, g(x_0 + c_j h, u_j), \qquad i = 1, \ldots, s. \end{aligned} \tag{11.26} $$

Here $R(z)$ is the stability function, $R_i(z)$ are the so-called internal stability
functions and $b_i(z)$, $a_{ij}(z)$ are rational functions whose only pole is $1/\gamma$ and which
satisfy $b_i(\infty) = 0$, $a_{ij}(\infty) = 0$. □


Remark. For many classes of linearly implicit methods (e.g., the methods of van 
der Houwen (1977), Friedli (1978), Strehmel & Weiner (1982), etc.), the numeri¬ 
cal solution can be expressed by (11.26) with certain rational functions. Thus the 
following analysis can be applied to these methods as well. 
We now take a second numerical solution $\hat y_0, \hat u_i, \hat y_1$ (again defined by (11.26)),
take the difference to $y_1$ and apply the triangle inequality. Using von Neumann’s
theorem (Corollary 11.4) the assumptions (11.22) then imply

$$ \begin{aligned} \|\hat y_1 - y_1\| &\le \varphi_R(h\mu)\, \|\hat y_0 - y_0\| + hL \sum_{i=1}^{s} \varphi_{b_i}(h\mu)\, \|\hat u_i - u_i\| \\ \|\hat u_i - u_i\| &\le \varphi_{R_i}(h\mu)\, \|\hat y_0 - y_0\| + hL \sum_{j=1}^{i-1} \varphi_{a_{ij}}(h\mu)\, \|\hat u_j - u_j\|. \end{aligned} \tag{11.27} $$

Inserting the second inequality of (11.27) repeatedly into the first one yields


Theorem 11.9. Under the assumption (11.22) the difference of two numerical
solutions of (7.4) can be estimated by

$$ \|\hat y_1 - y_1\| \le \bigl( \varphi_R(h\mu) + c\,hL \bigr)\, \|\hat y_0 - y_0\| \tag{11.28} $$

where $\varphi_R(x)$ is given by (11.11) ($R(z)$ is the stability function of (7.4)) and $c$ is
a constant depending smoothly on $hL$ and $h\mu$ but not on $\|J\|$ (which represents
the stiffness of the problem). □


This estimate shows numerical contractivity whenever $\varphi_R(h\mu) + c\,hL \le 1$. In
Theorem 11.5 we have shown under certain assumptions that $\varphi_R(x) = 1 + x +
o(x)$, so contractivity holds essentially for $\mu + cL \le 0$. In any case we have that
A-stability implies

$$ \|\hat y_1 - y_1\| \le (1 + hC^*)\, \|\hat y_0 - y_0\| $$

for $h\mu \le \mathrm{Const}$. Here, $C^*$ is a constant independent of the stiffness of (11.21).

Remark. Since the rational functions $b_i$ and $a_{ij}$ in (11.26) vanish at infinity, also
$(I - \gamma hJ)\,b_i(hJ)$ and $(I - \gamma hJ)\,a_{ij}(hJ)$ are uniformly bounded for $J$ satisfying
(11.22) and for $h\mu \le C < \gamma^{-1}$. Instead of the second condition of (11.22) we may
therefore require that

$$ \|(I - \gamma hJ)^{-1} h\,(g(x, y) - g(x, z))\| \le \ell\, \|y - z\|, \tag{11.29} $$

and the statement of Theorem 11.9 holds with $hL$ replaced by $\ell$. Observe that the
assumption (11.22) implies (11.29) with $\ell = hL/(1 - \gamma h\mu)$. However, in some
special situations the number $\ell$ may be significantly smaller than $hL$. Related
techniques are used by Hundsdorfer (1985) and Strehmel & Weiner (1987) to prove
contractivity and convergence for linearly implicit methods. Ostermann (1988)
applies these ideas to nonlinear singular perturbation problems, where $hL = O(h\varepsilon^{-1})$
with some very small $\varepsilon$ ($\varepsilon \ll h$), but $\ell$ can be bounded independently of $\varepsilon^{-1}$.
Contractivity in $\|\cdot\|_\infty$ and $\|\cdot\|_1$


The study of contractivity in general norms has been carried out mainly by Spijker
(1983, 1985) and his collaborators. Similar techniques of proof can be found in
Bolley & Crouzeix (1978), where a related problem (monotonicity) is treated.

The following theorem gives a condition which is necessary for contractivity
just for the special equation (11.2) and for one of the two norms $\|\cdot\|_\infty$ or $\|\cdot\|_1$.
Later, the same condition will also turn out to be sufficient for general problems
and all norms.


Theorem 11.10. Let $A$ be the $n$-dimensional matrix of (11.2) with fixed $\lambda > 0$.
For a rational function $R(z)$ satisfying $R(0) = 1$ we have

$$ \|R(hA)\|_\infty \le 1 \qquad \text{in all dimensions } n = 1, 2, \ldots \tag{11.30} $$

only if

$$ R^{(j)}(x) \ge 0 \qquad \text{for } x \in [-\lambda h, 0] \text{ and } j = 0, 1, 2, \ldots \tag{11.31} $$

(The same statement is true if $\|\cdot\|_\infty$ in (11.30) is replaced by $\|\cdot\|_1$.)


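Proof. We put $h = 1$ and write $A = -\lambda I + \lambda N$, where $N$ is a nilpotent matrix. In a
suitable norm, $\|N\|$ is arbitrarily small and therefore we have by Taylor expansion
and $N^n = 0$

$$ R(A) = \sum_{j=0}^{n-1} R^{(j)}(-\lambda)\, \frac{\lambda^j}{j!}\, N^j. $$

This means (e.g. for $n = 4$)

$$ R(A) = \begin{pmatrix} R(-\lambda) & \lambda R'(-\lambda) & \frac{\lambda^2}{2!} R''(-\lambda) & \frac{\lambda^3}{3!} R'''(-\lambda) \\ & R(-\lambda) & \lambda R'(-\lambda) & \frac{\lambda^2}{2!} R''(-\lambda) \\ & & R(-\lambda) & \lambda R'(-\lambda) \\ & & & R(-\lambda) \end{pmatrix}. $$

Application of Formula (I.9.11’) shows that $\|R(A)\|_\infty \le 1$ (or $\|R(A)\|_1 \le 1$) is
equivalent to

$$ \sum_{j=0}^{n-1} |R^{(j)}(-\lambda)|\, \frac{\lambda^j}{j!} \le 1. \tag{11.32} $$

If (11.32) is valid for all $n \ge 1$, the series

$$ \sum_{j \ge 0} R^{(j)}(-\lambda)\, \frac{\lambda^j}{j!} \tag{11.33} $$

is absolutely convergent, and therefore we have

$$ 1 = R(0) = \sum_{j \ge 0} R^{(j)}(-\lambda)\, \frac{\lambda^j}{j!} \le \sum_{j \ge 0} |R^{(j)}(-\lambda)|\, \frac{\lambda^j}{j!} \le 1 $$

implying $R^{(j)}(-\lambda) \ge 0$ for all $j \ge 0$. Since the Taylor expansion

$$ R^{(i)}(x) = \sum_{j \ge i} R^{(j)}(-\lambda)\, \frac{(x + \lambda)^{j-i}}{(j-i)!} $$

consists for $x \ge -\lambda$ only of non-negative terms, we have (11.31). □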


The next theorem shows that condition (11.31) is sufficient for contractivity
in arbitrary norms. It can readily be applied to the system (11.2), since its matrix
satisfies $\|A + \lambda I\|_\infty = \lambda$.

Theorem 11.11. Consider an arbitrary norm and let $A$ be such that for some
$\lambda > 0$,

$$\|A + \lambda I\| \le \lambda. \tag{11.34}$$

If the stability function of a method satisfies $R(0) = 1$ and

$$R^{(j)}(x) \ge 0 \quad \text{for } x \in [-\varrho, 0] \text{ and } j = 0, 1, 2, \ldots \tag{11.35}$$

then we have numerical contractivity $\|R(hA)\| \le 1$, whenever $h\lambda \le \varrho$.

Proof. We again put $h = 1$. Since for $0 \le \lambda \le \varrho$ we have $R^{(j)}(-\lambda) \ge 0$ for all $j$,
the function

$$R(z) = \sum_{j \ge 0} R^{(j)}(-\lambda)\,\frac{(z + \lambda)^j}{j!} \tag{11.36}$$

satisfies $|R(z)| \le R(-\lambda + r)$ for all complex $z$ in the disk $|z + \lambda| \le r$. This
property and (11.35) imply that no pole of $R(z)$ can lie in $|z + \lambda| \le \lambda$, so that the
radius of convergence of (11.36) is strictly larger than $\lambda$. Consequently we have
from (11.34)

$$R(A) = \sum_{j \ge 0} R^{(j)}(-\lambda)\,\frac{(A + \lambda I)^j}{j!}. \tag{11.37}$$

The triangle inequality applied to (11.37) yields the conclusion. □
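
As a small numerical illustration of Theorems 11.10 and 11.11, the following sketch evaluates $\|R(hA)\|_\infty$ for the explicit Euler stability function $R(z) = 1 + z$, which satisfies (11.35) with $\varrho = 1$. It assumes that the matrix of (11.2) has the form $A = \lambda(-I + N)$ with $N$ the upper shift matrix, as suggested by the proof of Theorem 11.10; the function name `euler_growth_norm` is chosen here only for illustration.

```python
import numpy as np

def euler_growth_norm(n, lam, h):
    """Max-norm of R(hA) = I + hA for A = lam * (-I + N).

    N is the upper shift matrix (ones on the first superdiagonal); this
    shape of A is an assumption made for the illustration.
    """
    N = np.diag(np.ones(n - 1), k=1)      # nilpotent: N**n = 0
    A = lam * (-np.eye(n) + N)
    return np.linalg.norm(np.eye(n) + h * A, ord=np.inf)

# The norm equals |1 - h*lam| + h*lam, so it stays at 1 up to h*lam = 1
# and exceeds 1 beyond that value.
for h_lam in (0.5, 1.0, 1.5):
    print(h_lam, euler_growth_norm(n=6, lam=1.0, h=h_lam))
```

Under these assumptions the computed norm does not exceed 1 as long as $h\lambda \le 1$, in agreement with the condition $h\lambda \le \varrho$ of Theorem 11.11.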


Study of the Threshold Factor 


Definition 11.12. The largest $\varrho$ satisfying (11.35) is called the threshold-factor of
$R(z)$.

Example 11.13. The implicit Euler method, for which

$$R^{(j)}(x) = \frac{j!}{(1 - x)^{j+1}}, \qquad j = 0, 1, 2, \ldots,$$

satisfies (11.35) for all $\varrho > 0$. It possesses a threshold-factor $\varrho = \infty$.

Example 11.14 (Threshold-factor for Padé approximations). The derivatives of the
polynomials

$$R_{k0}(z) = 1 + z + \frac{z^2}{2!} + \ldots + \frac{z^k}{k!}$$

are easily calculated; the most dangerous one is $1 + z$, therefore $\varrho = 1$ for all $k$.
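
The value $\varrho = 1$ for the polynomials $R_{k0}$ can be checked numerically. The sketch below is not taken from the text; it relies on the observation (the same Taylor argument as in the proof of Theorem 11.10) that a polynomial satisfies (11.35) on $[-\varrho, 0]$ as soon as all its derivatives are non-negative at the single point $-\varrho$, and it locates the largest such $\varrho$ by bisection. The helper names `derivatives_at` and `threshold_factor` are hypothetical.

```python
from math import factorial

def derivatives_at(coeffs, x):
    """Values R(x), R'(x), R''(x), ... of the polynomial sum(coeffs[i] * x**i)."""
    values = []
    c = list(coeffs)
    while c:
        values.append(sum(ci * x**i for i, ci in enumerate(c)))
        c = [i * ci for i, ci in enumerate(c)][1:]   # differentiate once
    return values

def threshold_factor(coeffs, upper=50.0, tol=1e-12):
    """Largest rho in [0, upper] with R^(j)(-rho) >= 0 for all j (bisection)."""
    def ok(rho):
        return all(v >= -1e-14 for v in derivatives_at(coeffs, -rho))
    if ok(upper):
        return upper
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo

for k in range(1, 5):
    taylor = [1.0 / factorial(i) for i in range(k + 1)]   # coefficients of R_k0
    print(k, round(threshold_factor(taylor), 6))
```

The bisection is only meaningful for polynomials; for rational functions with poles the derivatives have to be examined on the whole interval.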

The Padé approximations $R_{k1}(z)$ possess one simple pole only, so they can be
written in the form

$$R_{k1}(z) = \frac{c}{1 - z/(k+1)} + \text{polynomial in } z,$$

which has only a finite number of derivatives which can change sign (see Example
11.13). The numerical values obtained are shown in Table 11.1.

The functions $R_{k2}(z)$ possess no real pole (see Sect. IV.4). But the property
$|R(z)| \le R(-\varrho + r)$ for $|z + \varrho| \le r$ (see proof of Theorem 11.11) means that the
maximum of $|R(z)|$ on the circle with center $-\varrho$ and radius $r$ is assumed to the
right on the real axis. For increasing $r$, the first pole met by this circle must
therefore be real and to the right of $-\varrho$. This is not possible here and therefore the
approximations $R_{k2}(z)$ never satisfy property (11.35). This is indicated by an
asterisk (*) in Table 11.1.

All further values of Table 11.1 were computed using the decomposition of 
R(z) into partial fractions and are cited from Kraaijevanger (1986) and van de 
Griend & Kraaijevanger (1986). 


Table 11.1. Threshold-factors of Padé approximations

            k = 0    k = 1    k = 2    k = 3    k = 4    k = 5    k = 6
  j = 0       -        1        1        1        1        1        1
  j = 1       ∞        2      2.196    2.350    2.477    2.586    2.682
  j = 2       *        *        *        *        *        *        *
  j = 3     0.584    1.195    1.703    2.208    2.710    3.212    3.713
  j = 4       *        *        *        *        *        *        *
  j = 5     0.353    0.770    1.081    1.424    1.794    2.185    2.590

It is curious to observe that in this table the methods with the largest
threshold-factors are precisely those which are not A-stable. An exception is the implicit
Euler method ($k = 0$, $j = 1$) for which $\varrho = \infty$.

Absolutely Monotonic Functions

... one can define the function $e^x$ as the only function absolutely
monotonic on the whole negative half-axis which takes at the origin,
together with its first [sic] derivative, the value one.

(S. Bernstein 1928)

A thorough study of real functions satisfying (11.35) was begun by S. Bernstein
(1914) and continued by F. Hausdorff (1921). Such functions are called absolutely
monotonic in $[-\varrho, 0]$. Later, S. Bernstein (1928) gave the following
characterization of functions which are absolutely monotonic in $(-\infty, 0]$ (see also
D.V. Widder 1946).

Theorem 11.15 (Bernstein 1928). A necessary and sufficient condition that $R(x)$
be absolutely monotonic in $(-\infty, 0]$ is that

$$R(x) = \int_0^\infty e^{xt}\, d\alpha(t), \tag{11.38}$$

where $\alpha(t)$ is bounded and non-decreasing and the integral converges for
$-\infty < x \le 0$.

This is a hard result and the main key for the next two theorems. It does not
seem to permit an elementary and easy proof. We therefore refer to the original
literature, S. Bernstein (1928). For a more recent description see e.g. Widder
(1946), p. 160. From this result we immediately get the "limit case $\lambda \to \infty$" of
Theorem 11.11, which also holds for an arbitrary norm.

Theorem 11.16. Let $R(x)$ be absolutely monotonic in $(-\infty, 0]$, $R(0) = 1$, and $A$
a matrix with non-positive logarithmic norm $\mu(A) \le 0$. Then

$$\|R(A)\| \le 1.$$

Proof. By Theorem I.10.6 we have for the solution $y(x) = e^{Ax} y_0$ of $y' = Ay$ that
$\|y(x)\| \le \|y_0\|$, hence also $\|e^{Ax}\| \le 1$ for $x \ge 0$. The statement now follows from

$$\|R(A)\| = \Bigl\|\int_0^\infty e^{At}\, d\alpha(t)\Bigr\| \le \int_0^\infty \|e^{At}\|\, d\alpha(t) \le \int_0^\infty d\alpha(t) = R(0) = 1$$

since $\alpha(t)$ is non-decreasing. □


The following result proves that no Runge-Kutta method of order $p > 1$ can
have a stability function which is absolutely monotonic in $(-\infty, 0]$.

Theorem 11.17. If $R(x)$ is absolutely monotonic in $(-\infty, 0]$ and

$$R(x) = 1 + x + x^2/2 + O(x^3) \quad \text{for } x \to 0,$$

then $R(x) = e^x$.

Proof (Bolley & Crouzeix 1978). It follows from (11.38) that

$$R^{(j)}(0) = \int_0^\infty t^j\, d\alpha(t).$$

Since $R(0) = R'(0) = R''(0) = 1$, this yields

$$\int_0^\infty (1 - t)^2\, d\alpha(t) = 0.$$

Consequently, $\alpha(t)$ must be the Heaviside function ($\alpha(t) = 0$ for $t < 1$ and $\alpha(t) =
1$ for $t > 1$). Inserted into (11.38) this gives $R(x) = e^x$. □


Exercises 

1. Prove Formula (11.14). For given x, study the set of y -values for which 
\R(x + iy)\ attains its maximum. 

2. Show that the error growth function (11.11) for an A-stable $R(z)$ of order
$p \ge 1$ satisfies

$$\varphi_R(x) > e^x \quad \text{for all } x \ne 0.$$

Hint. You can study the order star on parallel lines $\{x + iy;\ y \in \mathbb{R}\}$ (Hairer,
Bader & Lubich 1982), or you can use the fact that $\varphi_R(x)$ is superexponential.

3. (Hairer & Zennaro 1996). Let $|R(\infty)| < 1$ and consider a mesh $x_0, x_1, \ldots$
with step sizes $h_k = x_{k+1} - x_k$ satisfying $h_{k+1} \le c\,h_k$ ($c \ge 1$). Prove the
existence of constants $C > 0$ and $\alpha > 0$ such that

$$\|y_m\| \le C\,(x_m - x_0)^{-\alpha}\,\|y_0\| \quad \text{for } m = 1, 2, \ldots.$$

4. (Kraaijevanger 1986). Let $R(z)$ be a polynomial of degree $s$ satisfying $R(z) =
e^z + O(z^{p+1})$. Then the threshold factor $\varrho$ (Definition 11.12) is restricted by

$$\varrho \le s - p + 1.$$

Hint. Justify the formula

$$R^{(p-1)}(z) = \sum_{j=0}^{s-p+1} \alpha_j \Bigl(1 + \frac{z}{\varrho}\Bigr)^j, \qquad \alpha_j \ge 0,$$

and deduce the result from $R^{(p-1)}(0) = R^{(p)}(0) = 1$.

5. Let $\varrho$ be the threshold factor of the rational function $R(z)$. Show that its
stability domain contains the disc $|z + \varrho| \le \varrho$.

Next we need a generalization of the notion of A-stability. The
most natural generalization would be to consider the case that x(t)
is a uniform-asymptotically stable solution ... in the sense of
Liapunov theory ... but this case seems to be a little too wide.

(G. Dahlquist 1963)

The theoretical analysis of the application of numerical methods 
on stiff nonlinear problems is still fairly incomplete. 

(G. Dahlquist 1975) 

Here we enter a new era, the study of stability and convergence for general
nonlinear systems. All the "crimes" and diverse omissions of which we have been
guilty in earlier sections, especially in Sect. IV.2, shall now be repaired.

Large parts of Dahlquist's (1963) paper deal with a generalization of A-stability
to nonlinear problems. His search for a sufficiently general class of nonlinear
systems was finally successful 12 years later. In his talk at the Dundee conference
of July 1975 he proposed to consider differential equations satisfying a one-sided
Lipschitz condition, and he presented some first results for multistep methods.
J.C. Butcher (1975) then extended (on the flight back from the conference) the
ideas to implicit Runge-Kutta methods and the concept of B-stability was born.

One-Sided Lipschitz Condition 

We consider the nonlinear differential equation

$$y' = f(x, y) \tag{12.1}$$

such that for the Euclidean norm the one-sided Lipschitz condition

$$\langle f(x,y) - f(x,z),\, y - z\rangle \le \nu\,\|y - z\|^2 \tag{12.2}$$

holds. The number $\nu$ is the one-sided Lipschitz constant of $f$. This definition is
motivated by the following result.

Lemma 12.1. Let $f(x, y)$ be continuous and satisfy (12.2). Then, for any two
solutions $y(x)$ and $z(x)$ of (12.1) we have

$$\|y(x) - z(x)\| \le \|y(x_0) - z(x_0)\| \cdot e^{\nu(x - x_0)} \quad \text{for } x \ge x_0.$$

Proof. Differentiation of $m(x) = \|y(x) - z(x)\|^2$ yields

$$m'(x) = 2\,\langle f(x, y(x)) - f(x, z(x)),\, y(x) - z(x)\rangle \le 2\nu\, m(x).$$

This differential inequality can be solved to give (see Theorem I.10.3)

$$m(x) \le m(x_0)\, e^{2\nu(x - x_0)} \quad \text{for } x \ge x_0,$$

which is equivalent to the statement. □

Remarks. a) In an open convex set, condition (12.2) is equivalent to $\mu(\partial f/\partial y) \le \nu$
(see Sect. I.10, Exercise 6), if $f$ is continuously differentiable. Lemma 12.1 then
becomes a special case of Theorem I.10.6.

b) For complex-valued $y$ and $f$ condition (12.2) has to be replaced by

$$\mathrm{Re}\,\langle f(x,y) - f(x,z),\, y - z\rangle \le \nu\,\|y - z\|^2, \qquad y, z \in \mathbb{C}^n, \tag{12.2'}$$

and Lemma 12.1 remains valid.
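
For a linear problem $y' = Jy$ with a constant real matrix $J$, Remark a) reduces the one-sided Lipschitz constant in the Euclidean norm to the logarithmic norm $\mu_2(J)$, the largest eigenvalue of the symmetric part $(J + J^T)/2$. The following sketch compares it with the classical Lipschitz constant $\|J\|_2$ for a stiff example; the matrix and the function name `log_norm_2` are chosen here purely for illustration.

```python
import numpy as np

def log_norm_2(J):
    """Euclidean logarithmic norm: largest eigenvalue of the symmetric part of J."""
    J = np.asarray(J, dtype=float)
    return float(np.linalg.eigvalsh(0.5 * (J + J.T)).max())

# Illustrative stiff matrix (an assumption, not an example from the text).
J = np.array([[-1000.0, 1.0],
              [0.0, -1.0]])

print("Lipschitz constant ||J||_2 :", np.linalg.norm(J, 2))
print("one-sided constant mu_2(J):", log_norm_2(J))
```

The Lipschitz constant is of size $10^3$, whereas the one-sided constant is negative, so Lemma 12.1 predicts that any two solutions approach each other.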


B -Stability and Algebraic Stability 


Whenever $\nu \le 0$ in (12.2) the distance between any two solutions of (12.1) is a
non-increasing function of $x$. The same property is then also desirable for the
numerical solutions. We consider here implicit Runge-Kutta methods

$$y_1 = y_0 + h \sum_{i=1}^{s} b_i\, f(x_0 + c_i h,\, g_i), \tag{12.3a}$$

$$g_i = y_0 + h \sum_{j=1}^{s} a_{ij}\, f(x_0 + c_j h,\, g_j), \qquad i = 1, \ldots, s. \tag{12.3b}$$

Definition 12.2 (Butcher 1975). A Runge-Kutta method is called B-stable, if the
contractivity condition

$$\langle f(x,y) - f(x,z),\, y - z\rangle \le 0 \tag{12.2''}$$

implies for all $h \ge 0$

$$\|y_1 - \hat y_1\| \le \|y_0 - \hat y_0\|.$$

Here, $y_1$ and $\hat y_1$ are the numerical approximations after one step starting with
initial values $y_0$ and $\hat y_0$, respectively.

Clearly, B-stability implies A-stability. This is seen by applying the above
definition to $y' = \lambda y$, $\lambda \in \mathbb{C}$ or, more precisely, to

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}' = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad \lambda = \alpha + i\beta. \tag{12.4}$$

Example 12.3. For the collocation methods based on Gaussian quadrature a simple
proof of B-stability is possible (Wanner 1976). We denote by $u(x)$ and $\hat u(x)$
the collocation polynomials (see Definition II.7.6) for the initial values $y_0$ and $\hat y_0$
and differentiate the function $m(x) = \|u(x) - \hat u(x)\|^2$. At the collocation points
$\xi_i = x_0 + c_i h$ we obtain

$$m'(\xi_i) = 2\,\langle f(\xi_i, u(\xi_i)) - f(\xi_i, \hat u(\xi_i)),\, u(\xi_i) - \hat u(\xi_i)\rangle \le 0.$$

The result then follows from the fact that Gaussian quadrature integrates the
polynomial $m'(x)$ (which is of degree $2s - 1$) exactly and that the weights $b_i$ are
positive:

$$\|y_1 - \hat y_1\|^2 = m(x_0 + h) = m(x_0) + \int_{x_0}^{x_0 + h} m'(x)\, dx
= m(x_0) + h \sum_{i=1}^{s} b_i\, m'(x_0 + c_i h) \le m(x_0) = \|y_0 - \hat y_0\|^2.$$

An algebraic criterion for B-stability was found independently by Burrage &
Butcher (1979) and Crouzeix (1979). The result is

Theorem 12.4. If the coefficients of a Runge-Kutta method (12.3) satisfy

i) $b_i \ge 0$ for $i = 1, \ldots, s$,

ii) $M = (m_{ij}) = (b_i a_{ij} + b_j a_{ji} - b_i b_j)_{i,j=1}^{s}$ is non-negative definite,

then the method is B-stable.

Definition 12.5. A Runge-Kutta method satisfying (i) and (ii) of Theorem 12.4 is
called algebraically stable.

Proof of Theorem 12.4. We introduce the differences

$$\Delta y_0 = y_0 - \hat y_0, \qquad \Delta y_1 = y_1 - \hat y_1, \qquad \Delta g_i = g_i - \hat g_i,$$

$$\Delta f_i = h\,\bigl(f(x_0 + c_i h, g_i) - f(x_0 + c_i h, \hat g_i)\bigr),$$

and subtract the Runge-Kutta formulas (12.3) for $y$ and $\hat y$:

$$\Delta y_1 = \Delta y_0 + \sum_{i=1}^{s} b_i\, \Delta f_i, \tag{12.5a}$$

$$\Delta g_i = \Delta y_0 + \sum_{j=1}^{s} a_{ij}\, \Delta f_j. \tag{12.5b}$$

Next we take the square of Formula (12.5a):

$$\|\Delta y_1\|^2 = \|\Delta y_0\|^2 + 2 \sum_{i=1}^{s} b_i \langle \Delta f_i, \Delta y_0\rangle + \sum_{i=1}^{s}\sum_{j=1}^{s} b_i b_j \langle \Delta f_i, \Delta f_j\rangle. \tag{12.6}$$

The main idea of the proof is now to compute $\Delta y_0$ from (12.5b) and insert this
into (12.6). This gives

$$\|\Delta y_1\|^2 = \|\Delta y_0\|^2 + 2 \sum_{i=1}^{s} b_i \langle \Delta f_i, \Delta g_i\rangle - \sum_{i=1}^{s}\sum_{j=1}^{s} m_{ij} \langle \Delta f_i, \Delta f_j\rangle. \tag{12.7}$$

The statement now follows from the fact that $\langle \Delta f_i, \Delta g_i\rangle \le 0$ by (12.2'') and that
$\sum_{i,j=1}^{s} m_{ij} \langle \Delta f_i, \Delta f_j\rangle \ge 0$ (see Exercise 2). □

Example 12.6. For the SDIRK method of Table 7.2 (Chapter II) the weights $b_i$ are
seen to be positive and the matrix $M$ becomes

$$M = (\gamma - 1/4) \cdot \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$

For $\gamma \ge 1/4$ this matrix is non-negative definite and we have B-stability. Exactly
the same condition was obtained by studying its A-stability (cf. (3.10)).
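
The two conditions of Theorem 12.4 are easy to test numerically. The sketch below forms $M = BA + A^T B - bb^T$ and checks its eigenvalues; it assumes that the two-stage SDIRK method has the coefficients $a_{11} = a_{22} = \gamma$, $a_{21} = 1 - 2\gamma$, $a_{12} = 0$, $b_1 = b_2 = 1/2$ (which reproduces the matrix $M$ of Example 12.6), and the function names are hypothetical.

```python
import numpy as np

def algebraic_stability_matrix(A, b):
    """M = B A + A^T B - b b^T with B = diag(b), cf. Theorem 12.4 (ii)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    B = np.diag(b)
    return B @ A + A.T @ B - np.outer(b, b)

def is_algebraically_stable(A, b, tol=1e-12):
    """Check b_i >= 0 and non-negative definiteness of M, up to a tolerance."""
    M = algebraic_stability_matrix(A, b)
    weights_ok = np.all(np.asarray(b, dtype=float) >= -tol)
    return bool(weights_ok and np.all(np.linalg.eigvalsh(M) >= -tol))

def sdirk2(gamma):
    """Two-stage SDIRK coefficients (assumed form, see the lead-in)."""
    return [[gamma, 0.0], [1.0 - 2.0 * gamma, gamma]], [0.5, 0.5]

for gamma in ((3 + 3**0.5) / 6, (3 - 3**0.5) / 6):
    A, b = sdirk2(gamma)
    print(round(gamma, 4), is_algebraically_stable(A, b))
```

With these coefficients the check succeeds for $\gamma = (3 + \sqrt{3})/6 \ge 1/4$ and fails for $\gamma = (3 - \sqrt{3})/6 < 1/4$, in line with Example 12.6.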


Some Algebraically Stable IRK Methods 

The first of these properties consists in the fact that all the A are
positive. (T.-J. Stieltjes 1884)


The general study of algebraic stability falls naturally into two steps: the positivity
of the quadrature weights and the non-negative definiteness of the matrix $M$.


Theorem 12.7. Consider a quadrature formula $(c_i, b_i)_{i=1}^{s}$ of order $p$.

a) If $p \ge 2s - 1$ then $b_i > 0$ for all $i$.

b) If $c_i$ are the zeros of (5.3) (Lobatto quadrature) then $b_i > 0$ for all $i$.


Proof (Stieltjes 1884). The first statement follows from the fact that for $p \ge 2s - 1$
polynomials of degree $2s - 2$ are integrated exactly, hence

$$b_i = \int_0^1 \prod_{j \ne i} \Bigl(\frac{x - c_j}{c_i - c_j}\Bigr)^2 dx > 0. \tag{12.8}$$

In the case of the Lobatto quadrature ($c_1 = 0$, $c_s = 1$ and $p = 2s - 2$) the factors
for the indices $j = 1$ and $j = s$ are taken without squaring and the same argument
applies. □


In order to verify condition (ii) of Theorem 12.4 we find it convenient to use
the W-transformation of Sect. IV.5 and to consider $W^T M W$ instead of $M$. In
vector notation ($b = (b_1, \ldots, b_s)^T$, $B = \mathrm{diag}(b_1, \ldots, b_s)$, $A = (a_{ij})$) we have

$$M = BA + A^T B - bb^T. \tag{12.9}$$

If we choose $W$ according to Lemma 5.12, then $W^T B W = I$ and, since $W^T b =
e_1 = (1, 0, \ldots, 0)^T$, condition (ii) becomes equivalent to

$$W^T M W = X + X^T - e_1 e_1^T \quad \text{is non-negative definite} \tag{12.10}$$

where $X = W^{-1} A W = W^T B A W$ as in Theorem 5.11.

Theorem 12.8. Suppose that a Runge-Kutta method with distinct $c_i$ and positive $b_i$
satisfies the simplifying assumptions $B(2s - 2)$, $C(s - 1)$, $D(s - 1)$ (see beginning
of Sect. IV.5). Then the method is algebraically stable if and only if $|R(\infty)| \le 1$
(where $R(z)$ denotes the stability function).

Proof. Since the order of the quadrature formula is $p \ge 2s - 2$, the matrix $W$ of
Lemma 5.12 is

$$W = W_G D, \qquad D = \mathrm{diag}(1, \ldots, 1, \alpha^{-1}), \tag{12.11}$$

where $W_G = (P_{j-1}(c_i))_{i,j=1}^{s}$ is as in (5.21), and $\alpha^2 = \sum_{i=1}^{s} b_i P_{s-1}^2(c_i) \ne 0$.
Using the relation (observe that $W^T B W = I$)

$$X = W^{-1} A W = D^{-1} W_G^{-1} A W_G D = D W_G^T B A (W_G^T B)^{-1} D^{-1}$$

and applying Lemma 5.7 with $\eta = s - 1$ and Lemma 5.8 with $\zeta = s - 1$, we obtain

$$X = \begin{pmatrix}
1/2 & -\xi_1 & & & \\
\xi_1 & 0 & \ddots & & \\
 & \ddots & \ddots & -\xi_{s-2} & \\
 & & \xi_{s-2} & 0 & -\alpha\xi_{s-1} \\
 & & & \alpha\xi_{s-1} & \beta
\end{pmatrix}.$$

If this matrix is inserted into (12.10) then, marvellous surprise, everything cancels
with the exception of $\beta$. Therefore, condition (ii) of Theorem 12.4 is equivalent to

$$\beta \ge 0.$$

Using the representation (5.31) of the stability function we obtain by developing
the determinants

$$|R(\infty)| = \frac{|\det(X - e_1 e_1^T)|}{|\det X|} = \frac{|\beta d_{s-1} - \alpha^2 \xi_{s-1}^2 d_{s-2}|}{|\beta d_{s-1} + \alpha^2 \xi_{s-1}^2 d_{s-2}|}, \tag{12.12}$$

where $d_k = k!/(2k)!$ is the determinant of the $k$-dimensional matrix $X_G$ of (5.13).
Since $\alpha^2 \xi_{s-1}^2 d_{s-2} > 0$, the expression (12.12) is bounded by 1 iff $\beta \ge 0$. This
proves the statement. □


Comparing these theorems with Table 5.13 yields 

Theorem 12.9. The methods Gauss, Radau IA, Radau IIA and Lobatto IIIC are 
algebraically stable and therefore also B -stable. □ 
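
B-stability can be observed directly in a small experiment. The sketch below applies the implicit midpoint rule (the Gauss method with $s = 1$) to the scalar problem $y' = -y^3$, which satisfies (12.2'') because $-(y^3 - z^3)(y - z) \le 0$. The test problem, the bisection solver for the stage equation and the function name are assumptions made for this illustration only.

```python
def implicit_midpoint_step(y0, h):
    """One step of the implicit midpoint rule for the scalar problem y' = -y**3."""
    # Stage equation g = y0 + (h/2) * f(g) with f(g) = -g**3, written as F(g) = 0.
    def F(g):
        return g + 0.5 * h * g**3 - y0
    # F is strictly increasing and changes sign between 0 and y0.
    lo, hi = min(0.0, y0), max(0.0, y0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    g = 0.5 * (lo + hi)
    return y0 + h * (-g**3)

# Two neighbouring initial values; the distance never grows, even for large h.
for h in (0.1, 1.0, 10.0, 100.0):
    d1 = abs(implicit_midpoint_step(2.0, h) - implicit_midpoint_step(1.0, h))
    print(h, d1 <= abs(2.0 - 1.0))
```

No step size restriction is needed for the contraction, which is the content of B-stability.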


AN -Stability 

A-stability theory is based on the autonomous linear equation $y' = \lambda y$, whereas
B-stability is based on general nonlinear systems $y' = f(x, y)$. The question arises
whether there is a reasonable stability theory between these two extremes. A natural
approach would be to study the scalar, linear, nonautonomous equation

$$y' = \lambda(x)\, y, \qquad \mathrm{Re}\,\lambda(x) \le 0, \tag{12.13}$$

where $\lambda(x)$ is an arbitrarily varying complex-valued function (Burrage & Butcher
1979, Scherer 1979). The somewhat surprising result of this subsection will be that
stability for (12.13) will, for most RK-methods, be equivalent to B-stability.

For the problem (12.13) the Runge-Kutta method (12.3) becomes (in vector
notation $g = (g_1, \ldots, g_s)^T$, $\mathbb{1} = (1, \ldots, 1)^T$)

$$g = \mathbb{1} y_0 + AZg, \qquad Z = \mathrm{diag}(z_1, \ldots, z_s), \qquad z_j = h\lambda(x_0 + c_j h). \tag{12.14}$$

Computing $g$ from (12.14) and inserting into (12.3a) gives

$$y_1 = K(Z)\, y_0, \qquad K(Z) = 1 + b^T Z (I - AZ)^{-1} \mathbb{1}. \tag{12.15}$$


Definition 12.10. A Runge-Kutta method is called AN-stable, if

$$|K(Z)| \le 1$$

for all $Z = \mathrm{diag}(z_1, \ldots, z_s)$ satisfying $\mathrm{Re}\, z_j \le 0$
and $z_j = z_k$ whenever $c_j = c_k$ ($j, k = 1, \ldots, s$).


Comparing (12.15) with (3.2) we find that

$$K(\mathrm{diag}(z, z, \ldots, z)) = R(z), \tag{12.16}$$

the usual stability function. Further, arguing as with (12.4), B-stability implies
AN-stability. Therefore we have:

Theorem 12.11. For Runge-Kutta methods it holds

B-stable ⟹ AN-stable ⟹ A-stable. □
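
Formula (12.15) translates directly into a few lines of code. The following sketch evaluates $K(Z)$ for given coefficients and uses the trapezoidal rule, written as a two-stage method with $A = \begin{pmatrix} 0 & 0 \\ 1/2 & 1/2 \end{pmatrix}$ and $b = (1/2, 1/2)^T$, as a test case; the function name `K` and the sample arguments are chosen here for illustration.

```python
import numpy as np

def K(A, b, z):
    """K(Z) = 1 + b^T Z (I - A Z)^{-1} 1 for Z = diag(z), cf. (12.15)."""
    A = np.asarray(A, dtype=complex)
    b = np.asarray(b, dtype=complex)
    Z = np.diag(np.asarray(z, dtype=complex))
    s = len(b)
    g = np.linalg.solve(np.eye(s) - A @ Z, np.ones(s))   # stages for y0 = 1
    return 1 + b @ Z @ g

A_trap = [[0.0, 0.0], [0.5, 0.5]]
b_trap = [0.5, 0.5]

z = -1.0 + 2.0j
print(K(A_trap, b_trap, [z, z]), (1 + z / 2) / (1 - z / 2))   # equal, cf. (12.16)
print(abs(K(A_trap, b_trap, [-10.0, 0.0])))                   # larger than 1
```

For equal arguments the function reproduces the stability function $R(z) = (1 + z/2)/(1 - z/2)$, while for $z_1 = -10$, $z_2 = 0$ its modulus is 4, so the bound $|K(Z)| \le 1$ is violated.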


For the trapezoidal rule $y_1 = y_0 + \frac{h}{2}\bigl(f(x_0, y_0) + f(x_1, y_1)\bigr)$ the function $K(Z)$
of (12.15) is given by

$$K(Z) = \frac{1 + z_1/2}{1 - z_2/2}. \tag{12.17}$$

Putting $z_2 = 0$ and $z_1 \to -\infty$ we see that this method is not AN-stable. More
generally we have the following result.

Theorem 12.12 (Scherer 1979). The Lobatto IIIA and Lobatto IIIB methods are
not AN-stable and therefore not B-stable.

Proof. As in Proposition 3.2 we find that

$$K(Z) = \frac{\det(I - (A - \mathbb{1} b^T) Z)}{\det(I - AZ)}. \tag{12.18}$$

By definition, the first row of $A$ and the last row of $A - \mathbb{1} b^T$ vanish for the
Lobatto IIIA methods (compare also the proof of Theorem 5.5). Therefore the
denominator of $K(Z)$ does not depend on $z_1$ and the numerator not on $z_s$. If we put
for example $z_2 = \ldots = z_s = 0$, the function $K(Z)$ is unbounded for $z_1 \to -\infty$.
This contradicts AN-stability.

For the Lobatto IIIB methods, one uses in a similar way that the last column of
$A$ and the first column of $A - \mathbb{1} b^T$ vanish. □

The following result shows, as mentioned above, that AN-stability is much
closer to B-stability than to A-stability.


Theorem 12.13 (Burrage & Butcher 1979). Suppose that

$$|K(Z)| \le 1 \quad \text{for all } Z = \mathrm{diag}(z_1, \ldots, z_s) \text{ with } \mathrm{Re}\, z_j \le 0 \text{ and } |z_j| \le \varepsilon \text{ for some } \varepsilon > 0, \tag{12.19}$$

then the method is algebraically stable (and hence also B-stable).


Proof. For $\Delta f_i := z_i \Delta g_i$ and $\Delta y_0 = 1$ the result of (12.5) is $\Delta y_1 = K(Z)$.
Taking care of the fact that $z_i$ need not be real, the computation of the proof of
Theorem 12.4 shows that

$$|K(Z)|^2 - 1 = 2 \sum_{i=1}^{s} b_i\, \mathrm{Re}\, z_i\, |g_i|^2 - \sum_{i,j=1}^{s} m_{ij}\, \bar z_i \bar g_i z_j g_j. \tag{12.20}$$

Here $g = (g_1, \ldots, g_s)^T$ is a solution of (12.14) with $y_0 = 1$.

To prove that $b_i \ge 0$, choose $z_i = -\varepsilon < 0$ and $z_j = 0$ for $j \ne i$. Assumption
(12.19) together with (12.20) implies

$$-2\varepsilon\, b_i\, |g_i|^2 - m_{ii}\, \varepsilon^2 |g_i|^2 \le 0. \tag{12.21}$$

For sufficiently small $\varepsilon$, $g_i$ is close to 1 and the second term in (12.21) is negligible
for $b_i \ne 0$. Therefore, $b_i$ must be non-negative.

To verify the second condition of algebraic stability we choose the purely
imaginary numbers $z_j = i\varepsilon\xi_j$ ($\xi_j \in \mathbb{R}$). Since again $g_i = 1 + O(\varepsilon)$ for $\varepsilon \to 0$, we have
from (12.20) that

$$-\varepsilon^2 \sum_{i,j=1}^{s} m_{ij}\, \xi_i \xi_j + O(\varepsilon^3) \le 0.$$

Therefore, $M = (m_{ij})$ has to be non-negative definite. □


Combining this result with those of Theorems 12.4 and 12.11 we obtain 

Corollary 12.14. For non-confluent Runge-Kutta methods (i.e., methods with all
$c_j$ distinct) the concepts of AN-stability, B-stability and algebraic stability are
equivalent. □


An equivalence result (between B- and algebraic stability) for confluent
Runge-Kutta methods is much more difficult to prove (see Theorem 12.18 below) and will
be our next goal. To this end we first have to discuss reducible methods.

Reducible Runge-Kutta Methods

For an RK-method (12.3) it may happen that for all differential equations (12.1)

i) some stages don't influence the numerical solution;

ii) several $g_i$ are identical.

In both situations the Runge-Kutta method can be simplified to an "equivalent" one
with fewer stages.

For an illustration of situation (i) consider the method of Table 12.1. Its
numerical solution is independent of $g_2$ and equivalent to the implicit Euler solution.
For the method of Table 12.2 one easily verifies that $g_1 = g_2$, whenever the system
(12.3b) possesses a unique solution. The method is thus equivalent to the implicit
mid-point rule.

  Table 12.1.                      Table 12.2.
  DJ-reducible method              S-reducible method

     1  |  1    0                    1/2 | 1/4  1/4
    1/2 | 1/2   0                    1/2 | 1/4  1/4
   -----+----------                 -----+----------
        |  1    0                        | 1/2  1/2
The situation (i) above can be made more precise as follows: 

Definition 12.15 (Dahlquist & Jeltsch 1979). A Runge-Kutta method is called
DJ-reducible, if for some non-empty index set $T \subset \{1, \ldots, s\}$,

$$b_j = 0 \ \text{ for } j \in T \quad \text{and} \quad a_{ij} = 0 \ \text{ for } i \notin T,\ j \in T. \tag{12.22}$$

Otherwise it is called DJ-irreducible.

Condition (12.22) implies that the stages $j \in T$ don't influence the numerical
solution. This is best seen by permuting the stages so that the elements of $T$ are the
last ones (Cooper 1985). Then the Runge-Kutta tableau becomes that of Table 12.3.

  Table 12.3. DJ-reducibility

    c_1 | A_11    0                  c_1 | A_11
    c_2 | A_21   A_22               -----+------
   -----+-------------                   | b_1^T
        | b_1^T   0
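
Condition (12.22) can be tested mechanically. The sketch below, which is not part of the original text, starts from all stages with $b_j = 0$ and repeatedly removes every stage that still feeds a stage outside the candidate set; what remains is the largest index set $T$ satisfying (12.22). Indices are 0-based and the function name `dj_reducible_set` is hypothetical.

```python
def dj_reducible_set(A, b, tol=0.0):
    """Largest index set T (0-based) satisfying (12.22); empty if DJ-irreducible."""
    s = len(b)
    T = {j for j in range(s) if abs(b[j]) <= tol}
    changed = True
    while changed:
        changed = False
        for j in sorted(T):
            # Stage j is used by some stage outside T, so it cannot be dropped.
            if any(abs(A[i][j]) > tol for i in range(s) if i not in T):
                T.discard(j)
                changed = True
    return T

# Table 12.1: the second stage (index 1) does not influence the result.
print(dj_reducible_set([[1.0, 0.0], [0.5, 0.0]], [1.0, 0.0]))
# Table 12.2: no zero weights, hence DJ-irreducible.
print(dj_reducible_set([[0.25, 0.25], [0.25, 0.25]], [0.5, 0.5]))
```

Deleting the rows and columns with indices in the returned set gives the reduced tableau of Table 12.3.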


An interesting property of DJ-irreducible and algebraically stable
Runge-Kutta methods was discovered by Dahlquist & Jeltsch (1979).

Theorem 12.16. A DJ-irreducible, algebraically stable Runge-Kutta method
satisfies

$$b_i > 0 \quad \text{for } i = 1, \ldots, s.$$

Proof. Suppose $b_j = 0$ for some index $j$. Then $m_{jj} = 0$ by definition of $M$. Since
$M$ is non-negative definite, all elements in the $j$th column of $M$ must vanish
(Exercise 11) so that $b_i a_{ij} = 0$ for all $i$. This implies (12.22) for the set $T =
\{j \mid b_j = 0\}$, a contradiction to DJ-irreducibility. □


An algebraic criterion for the situation (ii) was given for the first time (but
incompletely) by Stetter (1973, p. 127) and finally by Hundsdorfer & Spijker (1981),
see also Butcher (1987), p. 319, and Dekker & Verwer (1984), p. 108.

Definition 12.17. A Runge-Kutta method is S-reducible, if for some partition
$(S_1, \ldots, S_r)$ of $\{1, \ldots, s\}$ with $r < s$ we have for all $l$ and $m$

$$\sum_{k \in S_m} a_{ik} = \sum_{k \in S_m} a_{jk} \quad \text{if } i, j \in S_l. \tag{12.23}$$

Otherwise it is called S-irreducible. Methods which are neither DJ-reducible nor
S-reducible are called irreducible.

In order to understand condition (12.23) we assume that, after a certain
permutation of the stages, $l \in S_l$ for $l = 1, \ldots, r$. We then consider the $r$-stage method
with coefficients

$$c_i^* = c_i, \qquad a_{ij}^* = \sum_{k \in S_j} a_{ik}, \qquad b_j^* = \sum_{k \in S_j} b_k. \tag{12.24}$$

Application of this new method to (12.1) yields $g_1^*, \ldots, g_r^*$, $y_1^*$ and one easily
verifies that $g_i$ and $y_1$ defined by

$$g_i = g_l^* \ \text{ if } i \in S_l, \qquad y_1 = y_1^*,$$

are a solution of the original method (12.3). For the method of Table 12.2 we have
$S_1 = \{1, 2\}$. A further example of an S-reducible method is given in Table 12.4
of Sect. II.12 ($S_1 = \{1, 2, 3\}$ and $S_2 = \{4\}$).
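
The reduction (12.24) can be sketched as follows. The function below first verifies condition (12.23) for a given partition and then builds the coefficients of the reduced method, using the first index of each class as its representative; the name `s_reduce` and the 0-based index convention are assumptions made for this illustration.

```python
import numpy as np

def s_reduce(A, b, c, partition):
    """Reduced coefficients (12.24) for a partition given as lists of 0-based indices."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    s = len(b)
    # S[i, m] = sum of a_ik over k in the m-th class.
    S = np.array([[A[i, cls].sum() for cls in partition] for i in range(s)])
    for cls in partition:
        if not np.allclose(S[cls], S[cls[0]]):
            raise ValueError("condition (12.23) is violated for this partition")
    reps = [cls[0] for cls in partition]          # one representative per class
    A_red = S[reps]
    b_red = np.array([b[cls].sum() for cls in partition])
    c_red = c[reps]
    return A_red, b_red, c_red

# Table 12.2 with S_1 = {1, 2} (indices 0 and 1) reduces to the implicit midpoint rule.
A_red, b_red, c_red = s_reduce([[0.25, 0.25], [0.25, 0.25]], [0.5, 0.5], [0.5, 0.5], [[0, 1]])
print(A_red, b_red, c_red)
```

For Table 12.2 this yields the one-stage tableau with $a_{11}^* = 1/2$, $b_1^* = 1$, $c_1^* = 1/2$.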

The Equivalence Theorem for S -Irreducible Methods 

Theorem 12.18 (Hundsdorfer & Spijker 1981). For S-irreducible Runge-Kutta
methods,

B-stable ⟺ algebraically stable.

Proof. Because of Corollary 12.14, which covers nearly all cases of practical
importance (and which was much easier to prove), this theorem seems to be of
little practical interest. However, it is an interesting result which had been
conjectured by many people for many years, so we reproduce its proof, which also
includes the three Lemmas 12.19-12.21. The counterexample of Exercise 6 below
shows that S-irreducibility is a necessary hypothesis.

By Theorem 12.4 it is sufficient to prove that B-stability and S-irreducibility
imply algebraic stability. For this we take $s$ complex numbers $z_1, \ldots, z_s$ which
satisfy $\mathrm{Re}\, z_j < 0$ and $|z_j| \le \varepsilon$ for some sufficiently small $\varepsilon > 0$. We show that
there exists a continuous function $f : \mathbb{C} \to \mathbb{C}$ satisfying

$$\mathrm{Re}\,\langle f(u) - f(v),\, u - v\rangle \le 0 \quad \text{for all } u, v \in \mathbb{C}, \tag{12.25}$$

such that the Runge-Kutta solutions $y_1$, $g_i$ and $\hat y_1$, $\hat g_i$ corresponding to $y_0 = 0$,
$\hat y_0 = 1$, $h = 1$ satisfy

$$f(\hat g_i) - f(g_i) = z_i\,(\hat g_i - g_i). \tag{12.26}$$

This yields $\hat y_1 - y_1 = K(Z)$ with $K(Z)$ given by (12.15). B-stability then implies
$|K(Z)| \le 1$. By continuity of $K(Z)$ near the origin we then have $|K(Z)| \le 1$ for
all $z_j$ which satisfy $\mathrm{Re}\, z_j \le 0$ and $|z_j| \le \varepsilon$, so that Theorem 12.13 proves the
statement.

Construction of the function $f$: we denote by $\Delta g_i$ the solution of

$$\Delta g_i = 1 + \sum_{j=1}^{s} a_{ij} z_j \Delta g_j$$

(the solution exists uniquely if $|z_j| \le \varepsilon$ and $\varepsilon$ is sufficiently small). With $\xi$, $\eta$ given
by Lemma 12.19 (below) we define

$$g_i = t\eta_i, \qquad f(g_i) = t\xi_i,$$
$$\hat g_i = g_i + \Delta g_i, \qquad f(\hat g_i) = f(g_i) + z_i \Delta g_i. \tag{12.27}$$

This is well-defined for sufficiently large $t$ (to be fixed later), because the $\eta_i$ are
distinct. Clearly, $g_i$ and $\hat g_i$ represent a Runge-Kutta solution for $y_0 = 0$ and $\hat y_0 = 1$,
and (12.26) is satisfied by definition.

We next show that

$$\mathrm{Re}\,\langle f(u) - f(v),\, u - v\rangle < 0 \quad \text{if } u \ne v \tag{12.28}$$

is satisfied for $u, v \in D = \{g_1, \ldots, g_s, \hat g_1, \ldots, \hat g_s\}$. This follows from the
construction of $\xi$, $\eta$, if $u, v \in \{g_1, \ldots, g_s\}$. If $u = \hat g_i$ and $v = g_i$ this is a consequence of
(12.26). For the remaining case $u = \hat g_i$, $v \in D \setminus \{g_i, \hat g_i\}$ (so that $v = g_j$ or
$v = \hat g_j$ with $j \ne i$) we have

$$\langle f(u) - f(v),\, u - v\rangle = t^2 (\xi_i - \xi_j)(\eta_i - \eta_j) + O(t) \quad \text{for } t \to \infty,$$

so that (12.28) is satisfied, if $t$ is sufficiently large. Applying Lemma 12.20 below
we find a continuous function $f : \mathbb{C} \to \mathbb{C}$ that extends (12.27) and satisfies (12.25).

□


To complete the above proof we still need the following three lemmas: 

Lemma 12.19. Let $A$ be the coefficient matrix of an S-irreducible Runge-Kutta
method. Then there exist vectors $\xi \in \mathbb{R}^s$ and $\eta = A\xi$ such that

$$(\xi_i - \xi_j)(\eta_i - \eta_j) < 0 \quad \text{for } i \ne j. \tag{12.29}$$

Proof (see Butcher 1982). The first idea is to put

$$\xi = \mathbb{1} - \varepsilon A \mathbb{1} \quad \text{with } \mathbb{1} = (1, 1, \ldots, 1)^T,$$

so that $\eta$ becomes

$$\eta = A\xi = A\mathbb{1} - \varepsilon A^2 \mathbb{1}.$$

If $c_i \ne c_j$ for all $i \ne j$, then $\xi_i - \xi_j \ne 0$ and for $\varepsilon$ sufficiently small we have $\eta_i - \eta_j$
of opposite sign, thus (12.29) is true.

For a proof of the remaining cases, we shall construct recursively vectors
$v_0, v_1, v_2, \ldots$ and denote by $P_k$ the partition of $\{1, \ldots, s\}$ defined by the
equivalence relation

$$i \sim j \iff (v_q)_i = (v_q)_j \quad \text{for } q = 0, 1, \ldots, k. \tag{12.30}$$

For a given partition $P$ of $\{1, 2, \ldots, s\}$ we introduce the space

$$X(P) = \{v \in \mathbb{R}^s;\ (v)_i = (v)_j \text{ if } i \sim j \text{ with respect to } P\}.$$

With this notation, the method is S-irreducible if and only if

$$A\,X(P) \not\subset X(P) \tag{12.31}$$

for every partition other than $\{\{1\}, \{2\}, \ldots, \{s\}\}$.

We start with $v_0 = \mathbb{1}$ and $P_0 = \{\{1, \ldots, s\}\}$ and define

$$v_{k+1} = \begin{cases} A v_k & \text{if } A v_k \notin X(P_k) \\ \omega & \text{if } A v_k \in X(P_k) \end{cases}$$

where $\omega$ is an arbitrary vector of $X(P_k)$ satisfying $A\omega \notin X(P_k)$. Such a choice
is possible by (12.31). After a finite number of steps, say $m$, we arrive at $P_m =
\{\{1\}, \{2\}, \ldots, \{s\}\}$, because the number of components of $P_k$ is increasing, and
strictly increasing after every second step. Therefore all elements of the vector

$$\xi = v_0 - \varepsilon v_1 + \varepsilon^2 v_2 - \ldots + (-\varepsilon)^m v_m$$

are distinct (for sufficiently small $\varepsilon > 0$) and (12.29) is satisfied. □


Lemma 12.20 (Minty 1962). Let $u_1, \ldots, u_k$ and $f(u_1), \ldots, f(u_k)$ be elements of
$\mathbb{R}^n$ with

$$\langle f(u_i) - f(u_j),\, u_i - u_j\rangle < 0 \quad \text{for } i \ne j.$$

Then there exists a continuous extension $f : \mathbb{R}^n \to \mathbb{R}^n$ satisfying

$$\langle f(u) - f(v),\, u - v\rangle \le 0 \quad \text{for all } u, v \in \mathbb{R}^n.$$


Proof (Wakker 1985). Define

$$\gamma = \max_{i \ne j} \frac{\langle f(u_i) - f(u_j),\, u_i - u_j\rangle}{\|f(u_i) - f(u_j)\|^2} < 0$$

and put $g(u_i) = 2\gamma f(u_i) - u_i$, so that $\|g(u_i) - g(u_j)\| \le \|u_i - u_j\|$. An
application of Lemma 12.21 shows that there exists a continuous extension $g : \mathbb{R}^n \to \mathbb{R}^n$
satisfying $\|g(u) - g(v)\| \le \|u - v\|$ (i.e., $g$ is non-expansive). The function

$$f(u) = \frac{1}{2\gamma}\,\bigl(g(u) + u\bigr)$$

then satisfies the requirements. □


Lemma 12.21 (Kirszbraun 1934). Let $u_1, \ldots, u_k$ and $g(u_1), \ldots, g(u_k) \in \mathbb{R}^n$ be
such that

$$\|g(u_i) - g(u_j)\| \le \|u_i - u_j\| \quad \text{for } i, j = 1, \ldots, k. \tag{12.32}$$

Then there exists a continuous extension $g : \mathbb{R}^n \to \mathbb{R}^n$ such that

$$\|g(u) - g(v)\| \le \|u - v\| \quad \text{for all } u, v \in \mathbb{R}^n. \tag{12.33}$$

Proof. This was once a difficult result in set-theory. A particularly nice proof, of
which we give here a "dynamic" version, has been found by I.J. Schoenberg (1953).

Fig. 12.1. Construction of $g(p)$

a) The main problem is to construct for one given point $p$ the extension $g(p)$
such that (12.33) remains satisfied. We move the points $u_i$ into their images $g(u_i)$
by an affine map

$$u_i(\lambda) = u_i + \lambda\,(g(u_i) - u_i), \qquad 0 \le \lambda \le 1, \quad i = 1, \ldots, k. \tag{12.34}$$

We define $r_i = \|u_i - p\|$ and shrink, for each $\lambda$, the balls with center $u_i(\lambda)$ and
radius $r_i \mu$ until their intersection consists of one point only

$$\mu(\lambda) := \min\Bigl\{\mu\ ;\ \bigcap_{i=1}^{k} \{u\ ;\ \|u_i(\lambda) - u\| \le r_i \mu\} \ne \emptyset\Bigr\}. \tag{12.35}$$

This intersection point, denoted by $p(\lambda)$ (see Fig. 12.1), depends continuously
(except for a possible sudden decrease of $\mu$ if $\lambda = 0$) and piecewise differentiably on
$\lambda$. We shall show that $\mu(\lambda)$ is non-increasing, which means that $g(p) := p(1)$ is a
point satisfying (12.33).

We denote the vectors

$$R_i := u_i(\lambda) - p(\lambda), \tag{12.36}$$

and have from the hypothesis (12.32) that $\|R_i - R_j\|^2$ is non-increasing, hence
that $\langle R_i - R_j,\, dR_i - dR_j\rangle \le 0$ or

$$\langle R_i, dR_j\rangle + \langle R_j, dR_i\rangle \ge \langle R_i, dR_i\rangle + \langle R_j, dR_j\rangle. \tag{12.37}$$

As can be seen in Fig. 12.1, not all points $u_i(\lambda)$ are always "active" in (12.35),
i.e., $p(\lambda)$ lies on the boundary of the shrunken ball centered in $u_i(\lambda)$. While for
$\lambda = 0$ (for which $\|R_i\| = r_i \mu$) all four are active, at $\lambda = 1/2$ the active points
are $u_1(\lambda)$, $u_2(\lambda)$, $u_3(\lambda)$, and finally for $\lambda = 1$ we only have $u_1(\lambda)$ and $u_2(\lambda)$
active. We suppose, for a given $\lambda$, that $u_1(\lambda), \ldots, u_m(\lambda)$ ($m \le k$) are the active
points, which may sometimes require a proper renumbering. The crucial idea of
Schoenberg is the fact that $p(\lambda)$ lies in the convex hull of $u_1(\lambda), \ldots, u_m(\lambda)$, i.e.,
there are positive values $c_1(\lambda), \ldots, c_m(\lambda)$ with $\sum_{i=1}^{m} c_i R_i = 0$. This means that

$$\Bigl\langle \sum_i c_i R_i,\ \sum_j c_j\, dR_j \Bigr\rangle = 0.$$

We here apply (12.37) pairwise to $i, j$ and $j, i$, and obtain

$$0 = \Bigl\langle \sum_i c_i R_i,\ \sum_j c_j\, dR_j \Bigr\rangle \ge \sum_i \langle R_i, dR_i\rangle \Bigl(c_i \sum_j c_j\Bigr).$$

Since by construction (see (12.36)) all $\|R_i\|$ decrease or increase simultaneously
with $\mu$, and since all $c_i > 0$, we see that $d\mu \le 0$, i.e., $\mu$ is non-increasing.

b) The rest is now standard (Kirszbraun): we choose a countable dense
sequence of points $p_1, p_2, p_3, \ldots$ in $\mathbb{R}^n$ and extend $g$ gradually to these points, so
that (12.33) is always satisfied. By continuity (see (12.33)), our function is then
defined everywhere. This completes the proof of Lemma 12.21 and with it the proof
of Theorem 12.18. □

Nous ne connaissons pas d’exemples de methodes qui soient B- 
stables au sens de Butcher et qui ne soient pas B -stables suivant 
notre definition. (M. Crouzeix 1979) 

Remark. Burrage & Butcher (1979) distinguish between BN -stability (based on 
non-autonomous systems) and B -stability (based on autonomous systems). Since 
the differential equation constructed in the above proof (see (12.25)) is autonomous , 
both concepts are equivalent for irreducible methods. 
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Error Growth Function 

All the above theory deals only with contractivity when the one-sided Lipschitz
constant $\nu$ in (12.2) is zero (see Definition 12.2). The question arises whether we
can sharpen the estimate when it is known that $\nu < 0$, and whether we can obtain
estimates also in the case when (12.2) holds only for some $\nu > 0$.

Definition 12.22 (Burrage & Butcher 1979). Let $\nu$ be given and set $x = h\nu$, where
$h$ is the step size. We then denote by $\varphi_B(x)$ the smallest number for which the
estimate

$$\|y_1 - \hat y_1\| \le \varphi_B(x)\,\|y_0 - \hat y_0\| \tag{12.38}$$

holds for all problems satisfying

$$\mathrm{Re}\,\langle f(x,y) - f(x,z),\ y - z\rangle \le \nu\,\|y - z\|^2. \tag{12.39}$$

We call $\varphi_B(x)$ the error growth function of the method.

We consider here complex-valued functions $f : \mathbb{R} \times \mathbb{C}^n \to \mathbb{C}^n$. This is not
more general (any such system can be written in real form by considering real
and imaginary parts, see Eq. (12.4)), but it is more convenient when working with
problems $y' = A(x)y$, where $A(x)$ is complex-valued.

In the case of a linear nonautonomous problem $y' = A(x)y$, condition (12.39)
becomes $\mu(A(x)) \le \nu$ (where $\mu(\cdot)$ denotes the logarithmic norm; see Sect. I.10).
Putting $Z_i := hA(x_0 + c_i h)$, the difference of two numerical solutions becomes

$$y_1 - \hat y_1 = K(Z_1,\ldots,Z_s)\,(y_0 - \hat y_0),$$

where

$$K(Z_1,\ldots,Z_s) = I + (b^T \otimes I)\,Z\,\bigl(I \otimes I - (A \otimes I)Z\bigr)^{-1}(\mathbb{1} \otimes I), \tag{12.40}$$

and $Z$ is the block diagonal matrix with $Z_1,\ldots,Z_s$ as entries in the diagonal.
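Formula (12.40) can be evaluated directly with Kronecker products. The following is a minimal sketch, not code from the book: the function name `K_matrix` is our own, and the two-stage Radau IIA coefficients used in the check are assumed to be those of Table 5.5. For scalar arguments the result should reduce to the rational function of Example 12.28, and for $Z_1 = Z_2 = zI$ to the stability function $R(z)$.

```python
import numpy as np

def K_matrix(A, b, Zs):
    """Evaluate K(Z_1,...,Z_s) of (12.40); Zs is a list of s blocks of size n x n."""
    s = len(b)
    n = Zs[0].shape[0]
    I_n = np.eye(n)
    # Block diagonal matrix Z = diag(Z_1, ..., Z_s)
    Z = np.zeros((s * n, s * n), dtype=complex)
    for i, Zi in enumerate(Zs):
        Z[i * n:(i + 1) * n, i * n:(i + 1) * n] = Zi
    ones = np.ones((s, 1))
    bT = np.asarray(b, dtype=float).reshape(1, s)
    inner = np.eye(s * n) - np.kron(A, I_n) @ Z          # I (x) I - (A (x) I) Z
    stage = np.linalg.solve(inner, np.kron(ones, I_n))   # inner^{-1} (1 (x) I)
    return I_n + np.kron(bT, I_n) @ Z @ stage

# Two-stage Radau IIA (coefficients assumed as in Table 5.5)
A_radau = np.array([[5 / 12, -1 / 12], [3 / 4, 1 / 4]])
b_radau = np.array([3 / 4, 1 / 4])
```

With equal blocks the matrix $K$ is a multiple of the identity, which gives a quick consistency check of the implementation.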

Theorem 12.23. The error growth function of an implicit Runge-Kutta method
satisfies

$$\varphi_B(x) = \sup_{\mu(Z_1) \le x,\ldots,\mu(Z_s) \le x} \|K(Z_1,\ldots,Z_s)\|. \tag{12.41}$$

Proof. Upper Bound. The difference $\Delta y_1 = y_1 - \hat y_1$ of two Runge-Kutta solutions
satisfies (12.5). The assumption (12.39) implies that $\mathrm{Re}\,\langle \Delta f_i, \Delta g_i\rangle \le x\|\Delta g_i\|^2$.
We shall prove that there exist matrices $Z_i$ ($i = 1,\ldots,s$) with $\mu(Z_i) \le x$ such that
$\Delta f_i = Z_i \Delta g_i$. This implies $\Delta y_1 = K(Z_1,\ldots,Z_s)\Delta y_0$ and, as a consequence, that
the right-hand expression of Eq. (12.41) is an upper bound of $\varphi_B(x)$.

If $\Delta g_i = 0$ then $\Delta f_i = 0$ and we can take an arbitrary matrix satisfying $\mu(Z_i) \le x$.
Therefore, let us consider vectors $f$, $g$ (with $g \ne 0$) in $\mathbb{C}^n$ satisfying
$\mathrm{Re}\,\langle f, g\rangle \le x\|g\|^2$. We put $u_1 := g/\|g\|$, and complete it to an orthonormal basis
$u_1,\ldots,u_n$ of $\mathbb{C}^n$. Then we define the matrix $Z$ by

$$Zu_1 := f/\|g\|, \qquad Zu_i := x\,u_i - \langle u_i, f\rangle\, u_1/\|g\|, \quad i = 2,\ldots,n.$$

We have $Zg = f$, and one readily verifies that $\mathrm{Re}\,\langle Zv, v\rangle \le x\|v\|^2$ for all
$v = \sum_{i=1}^n \alpha_i u_i$.
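The construction of $Z$ in the upper-bound argument can be checked numerically. The sketch below is ours, not the book's: the names `build_Z` and `log_norm` are hypothetical, the orthonormal completion is obtained from a QR factorization of a random matrix, and the inner product is assumed to be $\langle a, b\rangle = b^* a$, so that $\langle u_i, f\rangle$ is `np.vdot(f, u_i)`.

```python
import numpy as np

def build_Z(f, g, x, seed=0):
    """Matrix Z with Z g = f and logarithmic norm mu(Z) <= x,
    assuming Re<f, g> <= x * ||g||^2 and g != 0."""
    n = g.size
    norm_g = np.linalg.norm(g)
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M[:, 0] = g
    U, _ = np.linalg.qr(M)        # first column is g/||g|| up to a unit phase
    U[:, 0] = g / norm_g          # fix the phase so that u_1 = g/||g|| exactly
    V = np.empty((n, n), dtype=complex)
    V[:, 0] = f / norm_g          # Z u_1 = f/||g||
    for i in range(1, n):
        # Z u_i = x u_i - <u_i, f> u_1/||g||
        V[:, i] = x * U[:, i] - np.vdot(f, U[:, i]) * U[:, 0] / norm_g
    return V @ U.conj().T         # Z is defined through its action on the basis

def log_norm(Z):
    """Euclidean logarithmic norm: largest eigenvalue of (Z + Z^*)/2."""
    return np.linalg.eigvalsh((Z + Z.conj().T) / 2).max()
```

In this basis the Hermitian part of $Z$ is diagonal with entries $\mathrm{Re}\,\langle f, g\rangle/\|g\|^2$ and $x$, which is why the bound $\mu(Z) \le x$ holds.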

Lower Bound. We first consider nonconfluent Runge-Kutta methods. For given
$Z_1,\ldots,Z_s$ with $\mu(Z_i) \le x$ let $A(t)$ be a continuous function satisfying
$hA(x_0 + c_i h) = Z_i$ and $\mu(hA(t)) \le x$ for all $t$ ($A(t)$ is, for example, obtained by linear
interpolation). Then we have $\Delta y_1 = K(Z_1,\ldots,Z_s)\Delta y_0$ and, consequently, also

$$\varphi_B(x) \ge \|K(Z_1,\ldots,Z_s)\| \quad\text{for all } Z_i \text{ with } \mu(Z_i) \le x.$$

For confluent methods the proof is more complicated. Without loss of generality
we can assume that the method is irreducible, because neither the value $\varphi_B(x)$
nor the right-hand expression of Eq. (12.41) changes when the method is replaced
by an equivalent one. The main observation is now that the Lemmata 12.20 and
12.21 are valid in arbitrary dimensions. Consider $Z_1,\ldots,Z_s$ with $\mu(Z_i) \le x$, such
that the linear system $\Delta g_i = \Delta y_0 + \sum_{j=1}^s a_{ij} Z_j \Delta g_j$ has a solution. Exactly as in
the proof of Theorem 12.18 we can construct a continuous function $f : \mathbb{C}^n \to \mathbb{C}^n$,
which satisfies (12.39) with $\nu = x$ (we put $h = 1$) and $f(g_i) - f(\hat g_i) = Z_i(g_i - \hat g_i)$.
This completes the proof of the theorem. □


For 1-stage methods ($s = 1$) the Theorem of von Neumann (Corollary 11.4)
implies that it is sufficient to consider scalar, complex-valued $z_1$ in Eq. (12.41).
Since $K(z) = R(z)$ in this case, we have

$$\varphi_B(x) = \varphi_R(x) \quad\text{for all 1-stage methods.} \tag{12.42}$$

For the moment it is not clear whether one can restrict the supremum in Eq. (12.41)
to scalar, complex-valued $z_i$ also for $s \ge 2$. This would require a generalization
of the Theorem of von Neumann to functions of more than one variable (Hairer &
Wanner 1996). We shall come back to this question later in this section.

Theorem 12.24 (Hairer & Zennaro 1996). For B-stable Runge-Kutta methods the
error growth function is superexponential, i.e., $\varphi_B(0) = 1$ and

$$\varphi_B(x_1)\,\varphi_B(x_2) \le \varphi_B(x_1 + x_2) \quad\text{for } x_1, x_2 \text{ having the same sign.}$$

Proof. The property $\varphi_B(0) = 1$ follows from Definition 12.5. For the proof of the
inequality we consider the rational function

$$S(z) = u_A^* K(A_1 - zI,\ldots,A_s - zI)\,v_A \cdot u_B^* K(B_1 + zI,\ldots,B_s + zI)\,v_B,$$

where the matrices $A_j$, $B_j$ satisfy $\mu(A_j) \le x_1 + x_2$ and $\mu(B_j) \le 0$, and $u_A$, $v_A$,
$u_B$, $v_B$ are arbitrary vectors of $\mathbb{C}^n$. Using the property $\mu(A_j - zI) = \mu(A_j) - \mathrm{Re}\,z$
and the fact that $\|C\| = \sup_{\|u\|=1,\|v\|=1} |u^* C v|$, the inequality is obtained
exactly as in the proof of Theorem 11.6. □
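As a small illustration (ours, not from the text), the superexponential property can be verified by hand for the implicit Euler method. We assume its stability function $R(z) = 1/(1-z)$, so that by (12.42) $\varphi_B(x) = \varphi_R(x) = 1/(1-x)$ for $x < 1$. For $x_1, x_2$ of the same sign with $x_1 + x_2 < 1$:

```latex
\varphi_B(x_1)\,\varphi_B(x_2)
  = \frac{1}{(1-x_1)(1-x_2)}
  = \frac{1}{1-(x_1+x_2)+x_1x_2}
  \le \frac{1}{1-(x_1+x_2)}
  = \varphi_B(x_1+x_2),
\qquad\text{since } x_1x_2 \ge 0 .
```

The inequality fails when $x_1 x_2 < 0$, which shows why the same-sign hypothesis is needed.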
The fact that $\varphi_B(x)$ is superexponential together with $\varphi_B(-\infty) = |R(\infty)|$
(see Exercise 8) allows us to draw the same conclusions on asymptotic stability of
numerical solutions as in Sect. IV.11.


Computation of $\varphi_B(x)$


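The idea is to search for the maximum of $\|\Delta y_1\|$ under the restriction (12.39).
More precisely, we consider the following inequality constrained optimization problem:

$$\|\Delta y_1\|^2 \to \max, \qquad \mathrm{Re}\,\langle \Delta f_i, \Delta g_i\rangle \le x\,\|\Delta g_i\|^2, \quad i = 1,\ldots,s. \tag{12.43}$$

Here $\Delta f_1,\ldots,\Delta f_s$ are regarded as independent variables in $\mathbb{C}^n$, $\Delta y_1$ and $\Delta g_i$
are defined by (12.5), and $\Delta y_0$ is considered as a parameter. A classical approach
for solving the optimization problem (12.43) is to introduce Lagrange multipliers
$d_1,\ldots,d_s$, and to consider the Lagrangian

$$\begin{aligned}
\mathcal{L}(\Delta f, D) &= \tfrac12\|\Delta y_1\|^2 - \sum_{i=1}^s d_i\bigl(\mathrm{Re}\,\langle \Delta f_i, \Delta g_i\rangle - x\|\Delta g_i\|^2\bigr) \\
&= -\tfrac12\,(\Delta y_0^*, \Delta f^*)\left(\begin{pmatrix} a & u^T \\ u & W \end{pmatrix} \otimes I\right)\begin{pmatrix} \Delta y_0 \\ \Delta f \end{pmatrix},
\end{aligned} \tag{12.44}$$

where $\Delta f = (\Delta f_1,\ldots,\Delta f_s)^T$, $D = \mathrm{diag}\,(d_1,\ldots,d_s)$, and

$$a = -1 - 2x\,\mathbb{1}^T D\,\mathbb{1}, \tag{12.45a}$$
$$u = D\mathbb{1} - b - 2x A^T D\,\mathbb{1}, \tag{12.45b}$$
$$W = DA + A^T D - bb^T - 2x A^T D A. \tag{12.45c}$$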

Theorem 12.25 (Burrage & Butcher 1980). If the matrix

$$\begin{pmatrix} a + \varphi^2 & u^T \\ u & W \end{pmatrix} \quad\text{is positive semi-definite} \tag{12.46}$$

for some $d_1 \ge 0,\ldots,d_s \ge 0$, then it holds $\|\Delta y_1\| \le \varphi\|\Delta y_0\|$ for all problems
satisfying (12.39) with $h\nu \le x$. Consequently, we have $\varphi_B(x) \le \varphi$.

Proof. Subtracting $\varphi^2\|\Delta y_0\|^2/2$ from both sides of (12.44) yields

$$\tfrac12\bigl(\|\Delta y_1\|^2 - \varphi^2\|\Delta y_0\|^2\bigr) - \sum_{i=1}^s d_i\bigl(\mathrm{Re}\,\langle \Delta f_i, \Delta g_i\rangle - x\|\Delta g_i\|^2\bigr) \le 0.$$

The statement then follows from $d_i \ge 0$ and $\mathrm{Re}\,\langle \Delta f_i, \Delta g_i\rangle \le x\|\Delta g_i\|^2$. □
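The matrix in (12.46) is easy to assemble from (12.45a)-(12.45c). The sketch below is an illustration under our own assumptions: the helper names are hypothetical, and the implicit Euler data ($A = (1)$, $b = (1)$, multiplier $d_1 = 1/(1-x)$, bound $\varphi = 1/(1-x)$) were worked out by hand for this example rather than taken from the text.

```python
import numpy as np

def burrage_butcher_matrix(A, b, d, x, phi):
    """Assemble the (s+1) x (s+1) matrix of (12.46) from (12.45a)-(12.45c)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    D = np.diag(np.asarray(d, dtype=float))
    one = np.ones(len(b))
    a = -1.0 - 2.0 * x * (one @ D @ one)                              # (12.45a)
    u = D @ one - b - 2.0 * x * (A.T @ D @ one)                       # (12.45b)
    W = D @ A + A.T @ D - np.outer(b, b) - 2.0 * x * (A.T @ D @ A)    # (12.45c)
    top = np.concatenate(([a + phi ** 2], u))
    return np.vstack((top, np.column_stack((u, W))))

def is_psd(M, tol=1e-12):
    """Positive semi-definiteness test via the smallest eigenvalue."""
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

def implicit_euler_matrix(x, scale=1.0):
    """Matrix (12.46) for implicit Euler with phi = scale/(1-x), d_1 = 1/(1-x)."""
    return burrage_butcher_matrix([[1.0]], [1.0], [1.0 / (1.0 - x)], x,
                                  scale / (1.0 - x))
```

For these data the matrix has rank one and is positive semi-definite; shrinking $\varphi$ below $1/(1-x)$ makes it indefinite for this choice of $d_1$.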
With the help of this theorem, Burrage & Butcher (1980) computed an upper
bound of $\varphi_B(x)$ for many 2-stage methods. It turned out that for all these 2-stage
methods $\varphi_B(x) = \varphi_K(x)$, where

$$\varphi_K(x) = \sup_{\mathrm{Re}\,z_1 \le x,\ldots,\mathrm{Re}\,z_s \le x} |K(z_1,\ldots,z_s)|. \tag{12.47}$$

There naturally arises the question: is it true that $\varphi_B(x) = \varphi_K(x)$ for all
Runge-Kutta methods? If we want to check the validity of $\varphi_B(x) = \varphi_K(x)$ for a given
Runge-Kutta method, we have to find non-negative Lagrange multipliers $d_i$ such
that (12.46) is satisfied. The following lemmas will be useful for this purpose.

We denote by $z_1^0,\ldots,z_s^0$ the values for which the supremum in Eq. (12.47)
is attained. By the maximum principle we have $z_j^0 = x + iy_j^0$ ($y_j^0 = \infty$ is admitted).
We further put $z^0 = (z_1^0,\ldots,z_s^0)$ and let $\partial_j K(z^0)$ be the derivative of
$K(z_1,\ldots,z_s)$ with respect to the $j$th argument, evaluated at $z^0$.

Lemma 12.26. Let $x$ be fixed with $\varphi_K(x) < \infty$. The condition (12.46) with
$\varphi = \varphi_K(x)$ then uniquely determines the Lagrange multipliers $d_1,\ldots,d_s$ (see
Eq. (12.53) below). They are real and positive.


Proof. Consider the identity (12.44) for the special case where $\Delta f_j$ is scalar,
$\Delta f_j = z_j \Delta g_j$, and hence $\Delta y_1 = K(z_1,\ldots,z_s)$. For $\mathrm{Re}\,z_j = x$ this identity becomes

$$|K(z_1,\ldots,z_s)|^2 - \varphi^2 = -(1, \Delta f^*)\begin{pmatrix} a + \varphi^2 & u^T \\ u & W \end{pmatrix}\begin{pmatrix} 1 \\ \Delta f \end{pmatrix}. \tag{12.48}$$

Putting $\varphi := \varphi_K(x)$ and $z_j := z_j^0$ (possibly one has to consider limits) the
left-hand expression of Eq. (12.48) vanishes. This together with assumption (12.46)
implies that $u + W\Delta f = 0$, i.e.,

$$D\mathbb{1} - b - 2xA^T D\mathbb{1} + (DA + A^T D - bb^T - 2xA^T DA)\,\Delta f = 0.$$

Collecting suitable terms, and using $\Delta f = Z_0 \Delta g$ and $\Delta g = \mathbb{1} + A\Delta f$, where
$Z_0 = \mathrm{diag}\,(z_1^0,\ldots,z_s^0)$, this relation becomes

$$D\,\Delta g = (I - A^T Z_0^*)^{-1} b \cdot K(z^0). \tag{12.49}$$

We shall show that all components of $\Delta g = (I - AZ_0)^{-1}\mathbb{1}$ are different from zero,
so that (12.49) uniquely determines the Lagrange multipliers $d_1,\ldots,d_s$.

Expanding $K(z_1,\ldots,z_s)$ into a Taylor series with respect to $z_j$, we obtain

$$K(z_1^0,\ldots,z_j,\ldots,z_s^0) = K(z^0)\bigl(1 + c\,(z_j - z_j^0) + O((z_j - z_j^0)^2)\bigr),$$

where $c = \partial_j K(z^0)/K(z^0)$. Since $|K(z_1^0,\ldots,z_j,\ldots,z_s^0)| \le |K(z^0)|$ for
$\mathrm{Re}\,z_j \le \mathrm{Re}\,z_j^0$, we have $c > 0$, and consequently also

$$\partial_j K(z^0) \ne 0, \qquad 0 < \partial_j K(z^0)/K(z^0) < \infty. \tag{12.50}$$

Differentiating $K(z_1,\ldots,z_s) = 1 + b^T Z(I - AZ)^{-1}\mathbb{1}$ with respect to $z_j$ yields

$$\partial_j K(z^0) = b^T (I - Z_0 A)^{-1} e_j\, e_j^T (I - AZ_0)^{-1}\mathbb{1}, \tag{12.51}$$

and we obtain from (12.50) that

$$b^T (I - Z_0 A)^{-1} e_j \ne 0, \qquad \Delta g_j = e_j^T (I - AZ_0)^{-1}\mathbb{1} \ne 0, \tag{12.52}$$

so that $d_1,\ldots,d_s$ are uniquely determined by (12.49). Dividing the $j$th component
of (12.49) by $\Delta g_j$, it follows from (12.51) that

$$d_j = \bigl|b^T (I - Z_0 A)^{-1} e_j\bigr|^2 \left(\frac{\partial_j K(z^0)}{K(z^0)}\right)^{-1}, \tag{12.53}$$

which is a strictly positive real number by (12.50) and (12.52).

In this proof we have implicitly assumed that all $z_j^0$ are finite. If $z_j^0 = x + i\infty$
for some $j$, one has to apply the standard transformation $\omega_j = x + 1/(z_j - x)$,
which maps the half-plane $\mathrm{Re}\,z_j \le x$ onto $\mathrm{Re}\,\omega_j \le x$, and $\infty$ into $x$. □


Lemma 12.27. If the matrix $W$ of Eq. (12.45c), with $d_1,\ldots,d_s$ given by Lemma
12.26, is positive semi-definite, then we have $\varphi_B(x) = \varphi_K(x)$.

Proof. It follows from

$$\begin{pmatrix} a + \varphi^2 & u^T \\ u & W \end{pmatrix}\begin{pmatrix} 1 \\ \Delta f \end{pmatrix} = 0 \tag{12.54}$$

(see Eq. (12.48)) and from $v^T W v \ge 0$ for all $v$ that the matrix in (12.54) is
positive semi-definite. The statement then follows from Theorem 12.25. □


With the above results it is possible to check, for a given Runge-Kutta method,
whether $\varphi_B(x) = \varphi_K(x)$ is satisfied. This can be done by the following algorithm:

• compute $\varphi = \varphi_K(x)$ of Eq. (12.47) either numerically or with the help of a formula
manipulation program;

• compute the Lagrange multipliers $d_1,\ldots,d_s$ from Lemma 12.26;

• check whether the matrix $W$ of Eq. (12.45c) is positive semi-definite. If this
is the case, it holds $\varphi_B(x) = \varphi_K(x)$ by Lemma 12.27.

Example 12.28. For the two-stage Radau IIA method (see Table 5.5) the function
$K(z_1, z_2)$ is given by

$$K(z_1, z_2) = \frac{1 + z_1/3}{1 - 5z_1/12 - z_2/4 + z_1 z_2/6}.$$

The maximum of $|K(z_1, z_2)|$ on the set $\mathrm{Re}\,z_i \le x$ is attained at

$$z_1^0 = \begin{cases} x + i\infty & \text{for } x \le \xi \\[4pt] x + ix\sqrt{\dfrac{45 - 42x + 8x^2}{9 + 18x - 8x^2}} & \text{for } \xi < x < 3/2 \end{cases}$$

$$z_2^0 = \begin{cases} x & \text{for } x \le \xi \\[4pt] x + ix\,\dfrac{\sqrt{(45 - 42x + 8x^2)(9 + 18x - 8x^2)}}{9 + 6x - 8x^2} & \text{for } \xi < x < 3/2 \end{cases}$$

(the value $\xi = (9 - 3\sqrt{17})/8$ is a root of $9 + 18x - 8x^2 = 0$) and it is given by

$$\varphi_K(x) = \begin{cases} \dfrac{4}{5 - 2x} & \text{if } x \le \xi \\[8pt] \dfrac{3 + 4x}{\sqrt{(3 - 2x)(3 + 4x - 2x^2)}} & \text{if } \xi \le x < 3/2. \end{cases}$$

The function $K(z_1, z_2)$ is not bounded on $\mathrm{Re}\,z_i \le x$ for $x \ge 3/2$. From the proof
of Lemma 12.26 we compute $d_1$ and $d_2$, and obtain

$$d_1 = \begin{cases} \dfrac{9}{(3 - x)(5 - 2x)} & \text{for } x \le \xi \\[8pt] \dfrac{(3 + 4x)^2}{4(3 + 4x - 2x^2)} & \text{for } \xi \le x \end{cases}
\qquad
d_2 = \begin{cases} \dfrac{2}{5 - 2x} & \text{for } x \le \xi \\[8pt] \dfrac{3 + 4x}{4(3 + 4x - 2x^2)} & \text{for } \xi \le x. \end{cases}$$

With these values one checks in a straightforward way that the matrix $W$ of Eq. (12.45c)
is positive semi-definite, so that $\varphi_B(x) = \varphi_K(x)$; see also Burrage & Butcher
(1980). Actually, the matrix $W$ is non-singular for $x < \xi$, and of rank one for
$\xi \le x < 3/2$.

A comparison with Eq. (11.15) shows that we do not obtain the same estimate
as for linear autonomous problems.
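The three steps of the algorithm can be replayed numerically for this example. The following is a rough sketch under our own assumptions: the closed-form expressions for $\varphi_K$, $z^0$ and $d_1, d_2$ are transcribed from Example 12.28, the Radau IIA coefficients are assumed to be those of Table 5.5, and all function names are ours. A coarse grid search on the lines $\mathrm{Re}\,z_i = x$ is used only as a plausibility check of the supremum.

```python
import numpy as np

A = np.array([[5 / 12, -1 / 12], [3 / 4, 1 / 4]])   # two-stage Radau IIA (assumed)
b = np.array([3 / 4, 1 / 4])
XI = (9 - 3 * np.sqrt(17)) / 8

def K(z1, z2):
    return (1 + z1 / 3) / (1 - 5 * z1 / 12 - z2 / 4 + z1 * z2 / 6)

def phi_K(x):
    if x <= XI:
        return 4 / (5 - 2 * x)
    return (3 + 4 * x) / np.sqrt((3 - 2 * x) * (3 + 4 * x - 2 * x * x))

def z0(x):
    """Maximizing arguments for XI < x < 3/2 (second branch only)."""
    p, q = 45 - 42 * x + 8 * x * x, 9 + 18 * x - 8 * x * x
    return (x + 1j * x * np.sqrt(p / q),
            x + 1j * x * np.sqrt(p * q) / (9 + 6 * x - 8 * x * x))

def multipliers(x):
    if x <= XI:
        return np.array([9 / ((3 - x) * (5 - 2 * x)), 2 / (5 - 2 * x)])
    q = 4 * (3 + 4 * x - 2 * x * x)
    return np.array([(3 + 4 * x) ** 2 / q, (3 + 4 * x) / q])

def W_matrix(x):
    D = np.diag(multipliers(x))
    return D @ A + A.T @ D - np.outer(b, b) - 2 * x * (A.T @ D @ A)   # (12.45c)

def grid_sup(x, ymax=10.0, m=1001):
    """Maximum of |K| over a finite grid on Re z_1 = Re z_2 = x."""
    y = np.linspace(-ymax, ymax, m)
    Y1, Y2 = np.meshgrid(y, y)
    return np.abs(K(x + 1j * Y1, x + 1j * Y2)).max()
```

Since the grid is finite, `grid_sup` can only approach $\varphi_K(x)$ from below; for $x \le \xi$ the supremum is reached only in the limit $y_1 \to \infty$.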


The above algorithm can easily be applied to other two-stage methods. We
thus obtain for the two-stage Gauss method

$$\varphi_B(x) = \begin{cases} 1 & \text{if } -\infty < x \le 0 \\[4pt] \dfrac{2x + \sqrt{9 + 3x^2}}{3 - x} & \text{if } 0 \le x < 3, \end{cases}$$

and for the two-stage Lobatto IIIC method

$$\varphi_B(x) = \begin{cases} \dfrac{1}{1 - x + x^2/2} & \text{if } -\infty < x \le 0 \\[8pt] \dfrac{1}{1 - x} & \text{if } 0 \le x < 1. \end{cases}$$

For methods with more than two stages, explicit formulas are difficult to obtain,
and one has to apply numerical methods for the computation of $z_j^0$ (supremum in
Eq. (12.47)).
Exercises

1. Prove, directly from Def. 12.2, that the implicit Euler method is B-stable.

2. Let $M$ be a symmetric $s \times s$-matrix and $\langle\cdot,\cdot\rangle$ the scalar product of $\mathbb{R}^n$. Then
$M$ is non-negative definite, if and only if

$$\sum_{i=1}^s \sum_{j=1}^s m_{ij}\,\langle u_i, u_j\rangle \ge 0 \quad\text{for all } u_i \in \mathbb{R}^n.$$

Hint. Use $M = Q^T D Q$ where $D$ is diagonal.

3. Give a simple proof for the B-stability of the Radau IIA methods by extending
the ideas of Example 12.3.

Hint. For the quadrature, based on the zeros of (5.2), we have

$$\int_0^1 \varphi(x)\,dx = \sum_{i=1}^s b_i\,\varphi(c_i) + C\varphi^{(2s-1)}(\xi), \qquad 0 < \xi < 1,$$

with $C < 0$ (see e.g. Abramowitz & Stegun (1964, Formula 25.4.31)).

4. (Dahlquist & Jeltsch 1987). Prove that Method I of Table 12.4 is S-reducible
with respect to the partition ({1},{2,3}). The reduced method II itself is DJ-reducible
and reduces to Method III.

For the initial value problem y f = f(y ), y(0) = 1, where f(y) —y 1 for y > 0 
and f(y ) = 0 for y < 0, and for h = 2, Methods I and III have unique solutions 
which are different. Explain this apparent contradiction. 

Table 12.4. Reduction of RK-methods 

0 0 0 

1/2 0 1/2 

1 0 

Method I Method II Method III 

5. Give a counterexample of an irreducible AN-stable but not algebraically stable,
and hence not B-stable method.

Hint. Start with any algebraically stable method with, say, two stages and
modify it as indicated in Table 12.5. Find conditions on the free parameters
$d$, $e$, $\alpha$ such that the two methods are identical for equations $y' = \lambda(x)y$. This
ensures AN-stability of the second method. Then play with the parameters to
destroy algebraic stability.

Table 12.5. Construction of AN-stable but not B-stable method

$$\begin{array}{c|ccc}
c_1 & a_{11} & a_{12}\alpha & a_{12}(1-\alpha) \\
c_2 & c_2 - d & d\alpha & d(1-\alpha) \\
c_2 & c_2 - e & e\alpha & e(1-\alpha) \\ \hline
 & b_1 & b_2\alpha & b_2(1-\alpha)
\end{array}
\qquad\qquad
\begin{array}{c|cc}
c_1 & a_{11} & a_{12} \\
c_2 & a_{21} & a_{22} \\ \hline
 & b_1 & b_2
\end{array}$$

6. Show that the method of Table 12.1 is DJ-reducible, but not S-reducible;
show that it is algebraically stable together with the reduced method.

Show that the method of Table 12.2 is S-reducible, but not DJ-reducible;
show that it is not algebraically stable, but that the reduced method is.


7. (Sandberg & Shichman 1968, Vanselow 1979, Hundsdorfer 1985). Prove that
Rosenbrock methods are not B-stable in the sense of Definition 12.2.

Hint. Apply the method to the scalar problem $y' = f(y)$, $y_0 = 1$ where $f(y)$
is a non-increasing function satisfying (for a small $\varepsilon$)

$$f(y) = \begin{cases} -y & \text{if } |y - 1| \ge 2\varepsilon \\ -1 & \text{if } |y - 1| \le \varepsilon. \end{cases}$$


8. (Hairer & Zennaro 1996). For irreducible, algebraically stable Runge-Kutta
methods the error growth function satisfies

$$\varphi_B(x) \le \frac{\sqrt{1 - 2x\gamma(1 - \varrho^2)} - 2x\gamma\varrho}{1 - 2x\gamma} \quad\text{for } x \le 0,$$

where $\varrho = |R(\infty)|$ ($R(z)$ is the stability function), $\gamma = \bigl(\sum_{j=1}^s b_j^{-1}\nu_j^2\bigr)^{-1}$
and $(\nu_1,\ldots,\nu_s)^T = \lim_{\varepsilon \to 0} b^T (A + \varepsilon I)^{-1}$.

Hint. From (12.7) we have $\|\Delta y_1\|^2 \le \|\Delta y_0\|^2 + 2x\sum_i b_i\|\Delta g_i\|^2$. Then, compute
$\Delta f_i$ from (12.5b) (if $A$ is invertible), insert it into (12.5a) and conclude
$\Delta y_1 = R(\infty)\Delta y_0 + \sum_{i,j} b_i\omega_{ij}\Delta g_j$, where $(\omega_{ij}) = A^{-1}$. The Cauchy-Schwarz
inequality yields $\sum_i b_i\|\Delta g_i\|^2 \ge \gamma\,(\|\Delta y_1\| - \varrho\|\Delta y_0\|)^2$ which,
inserted into the first estimate, gives a second degree inequality for $\|\Delta y_1\|$.


9. Prove that for the 3-stage Gauss method we have for $x > 0$
$\varphi_B(x) \ge (1 + x/2)/(1 - x/2)$.

Hint. Using (12.18), compute K(Z) for 27 — 00 , z 2 — x, z 3 ^ — 00 . 


10. If the matrix $W$ of Eq. (12.45c), with $d_1,\ldots,d_s$ given by Lemma 12.26, is
either non-singular or of rank $\le 1$, then it holds $\varphi_B(x) = \varphi_K(x)$.

Hint. Exploit the fact that the expression in Eq. (12.48) with $\varphi = \varphi_K(x)$ is
non-positive for all $z_j$ with $\mathrm{Re}\,z_j \le x$.

11. Show that for a non-negative definite symmetric matrix $M = (m_{ij})$ one has

$$|m_{ij}| \le \sqrt{m_{ii}\,m_{jj}}.$$



IV.13 Positive Quadrature Formulas
and B-Stable RK-Methods


Although the problem (of quadratures) is about two hundred years
old, although it has been the object of numerous investigations
by several geometers: Newton, Cotes, Gauss, Jacobi, Hermite,
Tchebychef, Christoffel, Heine, Radeau [sic], A. Markov,
T. Stitjes [sic], C. Posse, C. Andreev, N. Sonin and others, it
cannot, however, be considered as sufficiently exhausted.

(V. Steklov 1917)

We shall give a constructive characterization of all irreducible B-stable Runge-
Kutta methods (Theorem 13.15). Because of Theorem 12.16 we first have to study
quadrature formulas with positive weights.


Quadrature Formulas and Related Continued Fractions 


Steklov (1916) proved that a family of interpolatory quadrature formulas converges
for all Riemann integrable functions, if all weights of the formulas are positive (“It
must be remarked, however, that such general theorems cannot have any practical
value ...”). This theorem, rediscovered around 1922 by Fejér, initiated
an extensive search for quadrature formulas with positive weights. Fejér (1933,
“furthermore, I have obtained the following result in a very short way ...”)
found the result:


“If $P_s(z)$ are the Legendre polynomials normalized as in (13.4) and $c_1,\ldots,c_s$
are the zeros of $M(z) = P_s(z) + \alpha_1 P_{s-1}(z) + \alpha_2 P_{s-2}(z)$ with $\alpha_2 \le 0$, then the
weights $b_i$ are all positive”.


The theory of B-stable methods renewed the interest in positive quadrature
formulas and Burrage (1978) obtained the sharp bound

$$\alpha_2 < \frac{(s-1)^2}{4(2s-1)(2s-3)} \tag{13.1}$$

for the positivity of the $b_i$ in the above case. This is the same as condition (5.51) in
a different normalization. A short proof of this result (see “Lemma 18” of Nørsett
& Wanner 1981) then led to a complete characterization of positive quadrature
formulas by Sottas & Wanner (1982). An independent proof of an equivalent result
was found by Peherstorfer (1981). In what follows, we give a new approach using
continued fractions.

Consider a quadrature formula

$$\int_0^1 f(x)\,dx \approx \sum_{i=1}^s b_i f(c_i)$$

with distinct nodes $c_i$ and non-zero weights $b_i$. The main idea is to consider the
rational function

$$Q(z) = \sum_{j=1}^s \frac{b_j}{z - c_j} = \frac{N(z)}{M(z)} \tag{13.2}$$

where, as usual, $M(z) = (z - c_1)\cdots(z - c_s)$. We first express the order of the
quadrature formula in terms of the function $Q(z)$.


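Lemma 13.1. A quadrature formula is of order $p$ if and only if $Q(z)$, defined by
(13.2), satisfies

$$Q(z) = -\log\Bigl(1 - \frac1z\Bigr) + O\Bigl(\frac{1}{z^{p+1}}\Bigr) \quad\text{for } z \to \infty. \tag{13.3}$$

Proof. Inserting the geometric series for $(1 - c_j/z)^{-1}$ into (13.2) we obtain

$$Q(z) = \sum_{k \ge 1}\Bigl(\sum_{j=1}^s b_j c_j^{k-1}\Bigr)\frac{1}{z^k}.$$

Therefore (13.3) is equivalent to

$$\sum_{j=1}^s b_j c_j^{k-1} = \frac1k \quad\text{for } k = 1,\ldots,p. \qquad □$$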

We now study the case of the Gaussian quadrature formulas, where the function
(13.2) will be denoted by $Q_s^G(z) = N_s^G(z)/M_s^G(z)$; here the $c_i$ are the zeros
of the $s$-degree shifted Legendre polynomial

$$P_s(z) = \frac{s!}{(2s)!}\,\frac{d^s}{dz^s}\bigl(z^s (z-1)^s\bigr) \tag{13.4}$$

which is normalized so that the coefficient of $z^s$ is 1. The polynomials (13.4)
satisfy the recurrence relation (see Eq. (5.53) or Abramowitz & Stegun, p. 782)

$$P_{s+1}(z) = \bigl(z - \tfrac12\bigr)P_s(z) - \tau_s P_{s-1}(z), \qquad \tau_s = \frac{s^2}{4(4s^2 - 1)} \tag{13.5}$$

and $P_0(z) = 1$, $P_{-1}(z) = 0$. Since this quadrature formula is of optimal order $2s$,
it follows from (13.3) that

$$N_s^G(z) = -M_s^G(z)\log\Bigl(1 - \frac1z\Bigr) + O\Bigl(\frac{1}{z^{s+1}}\Bigr). \tag{13.6}$$

We now insert $M_s^G(z) = P_s(z)$ (see (13.2)) into (13.5) and multiply by
$\log(1 - 1/z)$ (which is $O(1/z)$ for $z \to \infty$). A comparison with (13.6) shows that the
polynomials $N_s^G(z)$ must also satisfy the recurrence formula (13.5) (with $N_0^G(z) = 0$,
$N_1^G(z) = 1$). It thus follows from elementary properties of continued fractions
(Exercise 1 or Perron (1913), page 4) that

$$Q_s^G(z) = \cfrac{1}{z - \frac12 - \cfrac{\tau_1}{z - \frac12 - \cfrac{\tau_2}{\;\ddots\; - \cfrac{\tau_{s-1}}{z - \frac12}}}} \tag{13.7}$$

For an arbitrary quadrature formula we have


Lemma 13.2. An irreducible rational function $Q(z) = N(z)/M(z)$ (with $\deg M = s$,
$\deg N = s - 1$) satisfies (13.3) with $p \ge 2(s - k)$, if and only if

$$Q(z) = \cfrac{1}{z - \frac12 - \cfrac{\tau_1}{z - \frac12 - \cfrac{\tau_2}{\;\ddots\; - \cfrac{\tau_{s-k-1}}{z - \frac12 - \cfrac{g(z)}{f(z)}}}}} \tag{13.7'}$$

with $\deg f = k$ and $\deg g \le k - 1$.

Proof. From Lemma 13.1 we know that $Q(z) = Q_s^G(z) + O(1/z^{2(s-k)+1})$. Therefore
the first $2(s - k)$ coefficients in the continued fraction expansions for $Q(z)$
and $Q_s^G(z)$ must be the same. □
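The recurrence (13.5), used for both $P_s$ and $N_s^G$, can be tried out numerically. The sketch below is ours (function names are hypothetical): it builds $P_s$ and $N_s^G$ by the recurrence, takes the nodes as zeros of $P_s$, and obtains the weights as residues of $Q = N/M$ at simple poles, i.e. $b_i = N(c_i)/M'(c_i)$, which is the partial-fraction form of (13.2). The result is compared with NumPy's Gauss-Legendre rule mapped to $[0,1]$ and with the moment conditions of Lemma 13.1.

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

def tau(k):
    """Recurrence coefficient tau_k of (13.5)."""
    return k * k / (4.0 * (4 * k * k - 1))

def gauss_P_and_N(s):
    """Return P_s and N_s^G (s >= 1), both generated by the recurrence (13.5)."""
    shift = Poly([-0.5, 1.0])                 # z - 1/2
    P_old, P = Poly([1.0]), shift             # P_0, P_1
    N_old, N = Poly([0.0]), Poly([1.0])       # N_0, N_1
    for k in range(1, s):
        P_old, P = P, shift * P - tau(k) * P_old
        N_old, N = N, shift * N - tau(k) * N_old
    return P, N

def nodes_and_weights(s):
    """Nodes c_i (zeros of P_s) and weights b_i = N(c_i)/P_s'(c_i)."""
    P, N = gauss_P_and_N(s)
    c = np.sort(np.real(P.roots()))
    return c, N(c) / P.deriv()(c)
```

For $s = 2$ this gives $N_2^G(z) = z - \tfrac12$ and both weights equal to $\tfrac12$, as expected for the two-point Gauss formula.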


Finally, let the following formula be explicitly emphasized
because of its frequent applications:

(O. Perron 1913, page 5)


Lemma 13.3. The functions $M(z)$ and $N(z)$ of Lemma 13.2 are related to $f(z)$
and $g(z)$ of (13.7') as follows:

$$\begin{aligned}
M(z) &= P_{s-k}(z)\,f(z) - P_{s-k-1}(z)\,g(z), \\
N(z) &= N_{s-k}^G(z)\,f(z) - N_{s-k-1}^G(z)\,g(z).
\end{aligned} \tag{13.8}$$

Proof. This follows from the recursion (13.30) and Exercise 1 below, if we put there
$b_0 = 0$, $b_1 = \ldots = b_{s-k} = z - 1/2$, $b_{s-k+1} = f(z)$ and $a_1 = 1$, $a_j = -\tau_{j-1}$
($j = 2,\ldots,s-k$), $a_{s-k+1} = -g(z)$. □

Solving the linear system (13.8) for $f(z)$ and $g(z)$ gives, with the use of
Exercise 2,

$$\begin{aligned}
f(z)\cdot\tau_1\cdots\tau_{s-k-1} &= N(z)\,P_{s-k-1}(z) - M(z)\,N_{s-k-1}^G(z), \\
g(z)\cdot\tau_1\cdots\tau_{s-k-1} &= N(z)\,P_{s-k}(z) - M(z)\,N_{s-k}^G(z).
\end{aligned} \tag{13.9}$$

Number of Positive Weights 

For a given rational function (13.2), the weights are determined by

$$b_i = \frac{N(c_i)}{M'(c_i)}. \tag{13.10}$$

But we want our theory to work also for confluent nodes for which $M'(c_i) = 0$.
Therefore we suppose that $c_1,\ldots,c_m$ ($m \le s$) are the real and distinct zeros of
$M(z)$ of multiplicities $l_1,\ldots,l_m$. Then we let

$$b_i = \frac{N(c_i)}{M^{(l_i)}(c_i)}, \qquad i = 1,\ldots,m. \tag{13.10'}$$

For $l_i = 1$ this is just (13.10); otherwise we are considering the weights for the
highest derivative of a Hermitian quadrature formula (see Exercise 3).

The main idea (following Sottas & Wanner 1982) is now to consider the path
$\gamma(t) = (f(t), g(t))$ in the $(f, g)$-plane, where $f$ and $g$ are the polynomials of (13.7').
For $t \to \pm\infty$ this path tends to infinity with horizontal limiting directions, since the
degree of $f$ is higher than that of $g$. Equation (13.8) tells us that for an irreducible
$Q(z)$ this path does not pass through the origin.


Definition 13.4. The rotation number $r$ of $\gamma$ is the integer for which $r\pi$ is the
total angle of rotation around the origin for the path $\gamma(t)$ ($-\infty < t < \infty$) measured
in the negative (clockwise) sense. Counter-clockwise rotations are negative.


An algebraic definition of $r$ is possible as

$$r = \sum_i \mathrm{sign}\bigl(f^{(l_i)}(t_i)\,g(t_i)\bigr),$$

where the summation is over all real zeros $t_i$ of $f(t)$ with odd multiplicity $l_i$.

Theorem 13.5 (Sottas & Wanner 1982). Let $Q(z) = N(z)/M(z)$ be an irreducible
rational function as in Lemma 13.2. Suppose that $c_1,\ldots,c_m$ are the (distinct)
real zeros of $M(z)$ with odd multiplicity and denote by $n_+$ (respectively $n_-$) the
number of positive (respectively negative) $b_i$. Further, let $r$ be the rotation number
of $\gamma = (f, g)$ (Definition 13.4). Then

$$n_+ - n_- = s - k + r. \tag{13.11}$$

Proof. The proof is by counting the number of crossings of the vectors
$\gamma(t) = (f(t), g(t))$ and $\beta(t) = (P_{s-k-1}(t), P_{s-k}(t))$, like the crossings of hands on a
Swiss cuckoo clock.

From (13.9) we see that when $t$ equals a zero $c_i$ of $M$, these two vectors
are parallel in the same sense ($N(c_i) > 0$) or in the opposite sense ($N(c_i) < 0$).
From (13.8) we observe that $M(t)$ is just the exterior product $\gamma(t) \times \beta(t)$. By
elementary geometry, and taking into account Formula (13.10'), we see that at
every zero $c_i$ with odd multiplicity we have

i) $b_i > 0$, if the crossing of $\gamma(t)$ with $\beta(t)$ is clockwise;

ii) $b_i < 0$, if this crossing is counter-clockwise.

Zeros of $M(t)$ with even multiplicity don't give rise to crossings.

Since the zeros of $P_{s-k}$ and $P_{s-k-1}$ interlace (see e.g. Theorem 3.3.2 of
Szegő 1939), the vector $\beta(t)$ turns counter-clockwise with a total angle of
$(s - k)\pi$ (see Fig. 13.1). The vector $\gamma(t)$ turns with a total angle $r\pi$ measured
clockwise (Definition 13.4). Since the limiting directions of $\gamma(t)$ and $\beta(t)$ are different
(horizontal for $\gamma(t)$ and vertical for $\beta(t)$), $\gamma(t)$ must cross $\beta(t)$, as $t$ increases
from $-\infty$ to $+\infty$, exactly $s - k + r$ times more often clockwise than
counter-clockwise. This gives Formula (13.11). □

Fig. 13.1. The path $(P_{s-k-1}(t), P_{s-k}(t))$ for $s - k = 7$


Corollary 13.6. Under the assumptions of Theorem 13.5, all zeros of $M(z)$ are
real and simple, and the $b_i$ are positive if and only if

$$r = k.$$

Proof. $r = k$ means by (13.11) that $n_+ - n_- = s$. Because of $n_- \ge 0$ and $n_+ \le s$,
this is equivalent to $n_+ = s$ and $n_- = 0$. □

Characterization of Positive Quadrature Formulas 

The following theorem gives a constructive characterization of all quadrature for¬ 
mulas with positive weights. 

Theorem 13.7. Let

$$\sigma_1 < \varrho_1 < \sigma_2 < \varrho_2 < \ldots < \varrho_{k-1} < \sigma_k$$

be arbitrary real numbers and $C$ a positive constant. Then, putting

$$f(z) = (z - \sigma_1)\cdots(z - \sigma_k), \qquad g(z) = C(z - \varrho_1)\cdots(z - \varrho_{k-1}), \tag{13.12}$$

computing $M(z)$, $N(z)$ from (13.8), taking $c_1,\ldots,c_s$ as the zeros of $M(z)$ and
$b_i$ from (13.10), one obtains all quadrature formulas with positive weights of order
$p \ge 2(s - k)$. If $C = \tau_{s-k}$ the order is $p \ge 2(s - k) + 1$.

Proof. The functions $f(z)$ and $g(z)$ are irreducible, so that also the fraction
$N(z)/M(z)$ is irreducible by (13.9). The statement now follows from Corollary 13.6,
since the polynomials (13.12) are all possible polynomials for which
$r = k$. The stated order properties follow from Lemma 13.2. □
Example 13.8. Let $c_1,\ldots,c_s$ be the zeros of

$$M(z) = P_s(z) + \alpha_1 P_{s-1}(z) + \alpha_2 P_{s-2}(z). \tag{13.13}$$

In order to study when the corresponding quadrature formula has positive weights,
we use (13.5) to write (13.13) as

$$M(z) = P_{s-1}(z)\bigl(z - \tfrac12 + \alpha_1\bigr) - P_{s-2}(z)\bigl(\tau_{s-1} - \alpha_2\bigr).$$

Consequently $f(z) = z - 1/2 + \alpha_1$, $g(z) = \tau_{s-1} - \alpha_2$ and Theorem 13.7 implies
that the zeros of $M(z)$ are real and the weights positive, if and only if $\alpha_2 < \tau_{s-1}$,
hence (13.1) is proved.
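The criterion $\alpha_2 < \tau_{s-1}$ can be observed numerically. The following is a small sketch under our own assumptions (all function names are ours): $M(z)$ is built from the recurrence (13.5), the nodes are its zeros, and the interpolatory weights are obtained from the moment equations $\sum_j b_j c_j^{k-1} = 1/k$, $k = 1,\ldots,s$. Parameter values are chosen slightly below and slightly above the threshold.

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

def tau(k):
    """Recurrence coefficient tau_k of (13.5)."""
    return k * k / (4.0 * (4 * k * k - 1))

def shifted_legendre(s):
    """List [P_0, ..., P_s] of monic shifted Legendre polynomials via (13.5)."""
    shift = Poly([-0.5, 1.0])
    P = [Poly([1.0]), shift]
    for k in range(1, s):
        P.append(shift * P[k] - tau(k) * P[k - 1])
    return P

def weights_for(s, alpha1, alpha2):
    """Nodes and interpolatory weights for M = P_s + a1 P_{s-1} + a2 P_{s-2}, s >= 2.
    Returns (None, None) if M has non-real zeros."""
    P = shifted_legendre(s)
    M = P[s] + alpha1 * P[s - 1] + alpha2 * P[s - 2]
    c = M.roots()
    if np.max(np.abs(np.imag(c))) > 1e-9:
        return None, None
    c = np.sort(np.real(c))
    V = np.vander(c, s, increasing=True).T     # V[k, j] = c_j**k
    rhs = 1.0 / np.arange(1, s + 1)            # moments of x**k on [0, 1]
    return c, np.linalg.solve(V, rhs)

def has_positive_weights(s, alpha1, alpha2):
    c, bw = weights_for(s, alpha1, alpha2)
    return bw is not None and bool(np.all(bw > 0))
```

For $s = 3$, $\alpha_1 = 0$ one can see the transition by hand: $M(z) = (z - \tfrac12)(z^2 - z + \tfrac1{10} + \alpha_2)$, and the middle weight changes sign exactly at $\alpha_2 = \tau_2 = 1/15$.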

For $k > 1$ the rotation number $r$ of $(f(t), g(t))$ can be computed with Sturm's
algorithm (Lemma 13.3 of Sect. I.13). Consider, for example,

$$\begin{aligned}
M(z) &= P_s(z) + \alpha_1 P_{s-1}(z) + \alpha_2 P_{s-2}(z) + \alpha_3 P_{s-3}(z) \\
&= P_{s-2}(z)\bigl[(z - \tfrac12)(z - \tfrac12 + \alpha_1) + \alpha_2 - \tau_{s-1}\bigr]
 - P_{s-3}(z)\bigl[\tau_{s-2}(z - \tfrac12 + \alpha_1) - \alpha_3\bigr].
\end{aligned}$$

Application of Lemma I.13.3 to the polynomials $f(z) = (z - \tfrac12)(z - \tfrac12 + \alpha_1) +
\alpha_2 - \tau_{s-1}$ and $g(z) = \tau_{s-2}(z - \tfrac12 + \alpha_1) - \alpha_3$ shows that the corresponding
quadrature formula has positive weights iff

$$\frac{\alpha_3}{\tau_{s-2}}\Bigl(\alpha_1 - \frac{\alpha_3}{\tau_{s-2}}\Bigr) - \alpha_2 + \tau_{s-1} > 0, \tag{13.14}$$

a result first found by Burrage (1978).


Necessary Conditions for Algebraic Stability 


We now turn our attention to algebraic stability. We again use the notation $B(p)$,
$C(\eta)$, $D(\xi)$ of Sect. IV.5.

Lemma 13.9 (Burrage 1982). Consider Runge-Kutta methods which satisfy $B(2)$
and the second condition for algebraic stability (i.e. $M$ non-negative). Then,

a) $C(k)$ implies $B(2k - 1)$;

b) $D(k)$ implies $B(2k - 1)$.


Proof. Instead of considering $M$, we work with the transformed matrix $\tilde M =
V^T M V$ where $V = (c_i^{j-1})_{i,j=1}^s$ is the Vandermonde matrix. The elements of $\tilde M$
are given by

$$\tilde m_{qr} = \sum_{i=1}^s b_i c_i^{q-1}\sum_{j=1}^s a_{ij} c_j^{r-1}
 + \sum_{j=1}^s b_j c_j^{r-1}\sum_{i=1}^s a_{ji} c_i^{q-1}
 - \sum_{i=1}^s b_i c_i^{q-1}\sum_{j=1}^s b_j c_j^{r-1}. \tag{13.15}$$

We further introduce

$$g_r = \sum_{j=1}^s b_j c_j^{r-1} - \frac1r,$$

so that $B(\nu)$ is equivalent to $g_r = 0$ ($r = 1,\ldots,\nu$). Then $C(k)$ simplifies (13.15)
to

$$\tilde m_{qr} = \frac{1}{qr}\bigl((q + r)\,g_{q+r} + 1 - (q g_q + 1)(r g_r + 1)\bigr), \qquad q \le k,\ r \le k.$$

Similarly, $D(k)$ implies

$$\tilde m_{qr} = -\frac{1}{qr}\bigl((q + r)\,g_{q+r} + q g_q \cdot r g_r\bigr), \qquad q \le k,\ r \le k.$$

We now start with the hypothesis $B(2)$, i.e., $B(2l)$ for $l = 1$. This means that
$g_1 = \ldots = g_{2l} = 0$, so that, in both cases, $\tilde m_{ll} = 0$. But if for a non-negative definite
matrix a diagonal element is zero, the whole corresponding column must also be
zero (see Exercise 11 of Sect. IV.12). This leads to $g_{l+q} = 0$ for $q = 1,\ldots,k$;
so we have $B(k + l)$. We then repeat the argument inductively until we arrive at
$B(2k - 1)$. □


Since $s$-stage collocation methods satisfy $B(s)$ and $C(s)$ (see Theorem 7.8
of Chapter II) we have

Corollary 13.10 (Burrage 1978). An $s$-stage algebraically stable collocation method
must be of order at least $2s - 1$. □


Because symmetric methods have even order this gives:

Corollary 13.11 (Ascher & Bader 1986). A symmetric algebraically stable collocation
scheme has to be at Gaussian points. □


The next result states the necessity of the simplifying assumption $C(k)$. Observe
that by Theorem 12.16 the weights $b_i$ of DJ-irreducible, algebraically stable
methods have to be positive.

Lemma 13.12. If a Runge-Kutta method of order $p \ge 2k + 1$ satisfies $b_i > 0$ for
$i = 1,\ldots,s$, then the condition $C(k)$ holds.

Proof (Dahlquist & Jeltsch (1979) attribute this idea to Butcher). The order
conditions (see Sect. II.2)

$$\sum_{i=1}^s b_i c_i^{2q} = \frac{1}{2q+1}, \qquad
\sum_{i,j=1}^s b_i c_i^{q} a_{ij} c_j^{q-1} = \frac{1}{(2q+1)q}, \qquad
\sum_{i,j,m=1}^s b_i a_{ij} c_j^{q-1} a_{im} c_m^{q-1} = \frac{1}{(2q+1)q^2}$$

imply that

$$\sum_{i=1}^s b_i\Bigl(\sum_{j=1}^s a_{ij} c_j^{q-1} - \frac{c_i^q}{q}\Bigr)^2 = 0$$

for $2q + 1 \le p$. Since the $b_i$ are positive, the individual terms of this sum must be
zero for $q \le k$. □


A simple consequence of this lemma are the following order barriers for
diagonally implicit DIRK ($a_{ij} = 0$ for $i < j$) and singly diagonally implicit SDIRK
($a_{ij} = 0$ for $i < j$ and $a_{ii} = \gamma$ for all $i$) methods.

Theorem 13.13 (Hairer 1980).

a) A DIRK method with all $b_i$ positive has order at most 6;

b) An SDIRK method with all $b_i$ positive has order at most 4;

c) An algebraically stable DIRK method has order at most 4.


Proof. a) Suppose the order is greater than 6 and let $i$ be the smallest index such
that $c_i \ne 0$. Then by Lemma 13.12

$$a_{ii} c_i = \frac{c_i^2}{2}, \qquad a_{ii} c_i^2 = \frac{c_i^3}{3},$$

contradicting $c_i \ne 0$.

b) As above, we arrive for order greater than 4 at

$$a_{ii} c_i = \frac{c_i^2}{2} \quad\text{or}\quad a_{ii} = \frac{c_i}{2}\ (\ne 0).$$

Since for SDIRK methods we have $a_{ii} = a_{11}$, this leads to $c_1 = a_{11} \ne 0$, hence
$i = 1$. Now $a_{11} = c_1/2$ contradicts $a_{11} = c_1$.

c) It is sufficient to consider DJ-irreducible methods, since the reduction process
(see Table 12.3) leaves the class of DIRK methods invariant. From Theorem 12.16
and Lemma 13.12 we obtain that algebraic stability and order greater
than 4 imply

$$a_{11} c_1 = \frac{c_1^2}{2} \quad\text{with } c_1 = a_{11},$$

and hence $a_{11} = 0$. Inserted into $m_{11}$ this yields $m_{11} = -b_1^2 < 0$, contradicting
the non-negativity of the matrix $M$. □
Similarly to Lemma 13.12 we have the following result for the second type of
simplifying assumptions.

Lemma 13.14. If a Runge-Kutta method of order $p \ge 2k + 1$ is algebraically stable
and satisfies $b_i > 0$ for all $i$, then the condition $D(k)$ holds.

Proof. The main idea is to use the W-transformation of Sect. IV.5 and to consider
$W^T M W$ instead of $M$ (see also the proof of Theorem 12.8). By Theorem 5.14
there exists a matrix $W$ satisfying $T(k, k)$ (see Definition 5.10). With the help of
Lemma 13.12 and Theorem 5.11a we obtain that the first $k$ diagonal elements of

$$W^T M W = (W^T B W)X + X^T (W^T B W)^T - e_1 e_1^T \tag{13.16}$$

are zero. Since $M$ and hence also $W^T M W$ is non-negative definite, the first $k$
columns and rows of $W^T M W$ have to vanish. Thus the matrix $(W^T B W)X$ must
be skew-symmetric in these regions (with exception of the first element). Because
of $C(k)$ the first $k$ columns and rows of $(W^T B W)X$ and $X$ are identical. Thus
the result follows from Theorem 5.11. □


Characterization of Algebraically Stable Methods 


Theorem 12.16, Lemma 13.12 and Lemma 13.14 imply that DJ-irreducible and
algebraically stable RK-methods of order $p \ge 2k + 1$ satisfy $b_i > 0$ for all $i$, and
the simplifying assumptions $C(k)$ and $D(k)$. These properties allow the following
constructive characterization of all irreducible B-stable RK-methods.


Theorem 13.15 (Hairer & Wanner 1981). Consider a $p$th order quadrature formula
with positive weights and let $W$ satisfy Property $T(k, k)$ of Definition 5.10
with $k = [(p - 1)/2]$. Then all $p$th order algebraically stable Runge-Kutta
methods corresponding to this quadrature formula are given by

$$A = W X W^{-1} \tag{13.17}$$

where

$$(W^T B W)X = \tfrac12\, e_1 e_1^T + \begin{pmatrix}
0 & -\xi_1 & & \\
\xi_1 & \ddots & \ddots & \\
 & \ddots & 0 & -\xi_k \\
 & & \xi_k & Q
\end{pmatrix} \tag{13.18}$$

and $Q$ is an arbitrary matrix of dimension $s - k$ for which $Q + Q^T$ is non-negative
definite. For $p$ even we have to require that $q_{11} = 0$.


Proof. Algebraic stability and the positivity of the weights $b_i$ imply $C(k)$ and
$D(k)$ with $k = [(p-1)/2]$. The matrix $A$ of such a method can be written as
(13.17) with $X$ given by (13.18). This follows from Theorem 5.11 and the fact
that multiplication with $W^T B W$ does not change the first $k$ columns and rows
of $X$. This method is algebraically stable iff $M$ (or $W^T M W$) is non-negative
definite. By (13.16) this means that $Q + Q^T$ is non-negative definite.

Conversely, any Runge-Kutta method given by (13.17), (13.18) with $Q + Q^T$
non-negative definite is algebraically stable and satisfies $C(k)$ and $D(k)$. Therefore
it follows from Theorem 5.1 in the case of odd $p = 2k+1$ that the Runge-Kutta
method is of order $p$.

If $p$ is even, say $p = 2k+2$, the situation is slightly more complicated. Because
of

$$q_{11} = \sum_{i,j=1}^{s} b_i\, p_k(c_i)\, a_{ij}\, p_k(c_j)$$

it follows from $B(2k+2)$, $C(k)$, $D(k)$ that the order condition (13.19) below
(with $\zeta = \eta = k$) is equivalent to $q_{11} = 0$. The stated order $p$ of the RK-method
now follows from Lemma 13.16. □


In the above proof we used the following modification of Theorem 5.1. 


Lemma 13.16. If the coefficients $b_i, c_i, a_{ij}$ of an RK-method satisfy

$$\sum_{i,j=1}^{s} b_i c_i^{\zeta} a_{ij} c_j^{\eta} = \frac{1}{(\eta+\zeta+2)(\eta+1)} \tag{13.19}$$

and $B(p)$, $C(\eta)$, $D(\zeta)$ with $p \le \eta+\zeta+2$ and $p \le 2\eta+2$, then the method is of
order $p$.


Proof. The reduction process with the help of $C(\eta)$ and $D(\zeta)$ as described in
Sect. II.7 (Volume I) reduces all trees to the bushy trees covered by $B(p)$. The
only exception is the tree corresponding to order condition (13.19). □


Example 13.17 (Three-stage B-stable SIRK methods). Choose a third order quadrature
formula with positive weights and let $W$ satisfy $W^T B W = I$. Then
(13.18) becomes

$$X = \begin{pmatrix} 1/2 & -\xi_1 & 0 \\ \xi_1 & a & b \\ 0 & c & d \end{pmatrix}, \qquad \xi_1 = \frac{1}{2\sqrt{3}}.$$

The method is B-stable if $X^T + X - e_1 e_1^T$ is non-negative, i.e. if

$$a \ge 0, \quad d \ge 0, \quad 4ad \ge (c+b)^2. \tag{13.20}$$

If we want this method to be singly-implicit, we must have for the characteristic
polynomial of $A$

$$\chi_A(z) = (1 - \gamma z)^3 = 1 - 3\gamma z + 3\gamma^2 z^2 - \gamma^3 z^3.$$

This means that (see (13.17))

$$\begin{aligned} \frac{1}{2} + a + d &= 3\gamma \\ \frac{a}{2} + \frac{1}{12} + \frac{d}{2} + ad - cb &= 3\gamma^2 \\ \frac{d}{12} + \frac{ad - cb}{2} &= \gamma^3. \end{aligned}$$

Some elementary algebra shows that these equations can be solved and the inequalities
(13.20) satisfied if $1/3 \le \gamma \le 1.06857902$, i.e., exactly if the corresponding
rational approximation is A-stable (cf. Table 6.3; see also Hairer & Wanner (1981),
where the analogous case with $s = p = 5$ is treated).
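The "elementary algebra" of Example 13.17 can be tried out numerically. The following is a minimal sketch, assuming $W^T B W = I$ as in the example; the function names are hypothetical and chosen only for this illustration. It solves the three polynomial equations for $d$, $a$ and the product $bc$, picks $b$ and $c$ with the smallest possible $(b+c)^2$, and then checks condition (13.20) through the eigenvalues of $X^T + X - e_1 e_1^T$.

```python
import numpy as np

XI1 = 1.0 / (2.0 * np.sqrt(3.0))

def sirk_x_matrix(gamma):
    """Return X of Example 13.17 with det(I - zX) = (1 - gamma*z)^3.

    Assumes W^T B W = I, so that X itself has the form shown in the example.
    """
    # d, a and the product b*c are fixed by the three polynomial equations.
    d = 12 * gamma**3 - 18 * gamma**2 + 9 * gamma - 1
    a = 3 * gamma - 0.5 - d
    bc = a * d + d / 6.0 - 2 * gamma**3
    # Only the product b*c is determined; pick b, c with the smallest (b + c)^2.
    if bc >= 0:
        b = c = np.sqrt(bc)
    else:
        b = np.sqrt(-bc)
        c = -b
    return np.array([[0.5, -XI1, 0.0],
                     [XI1, a, b],
                     [0.0, c, d]])

def char_poly_coefficients(X):
    """Coefficients (s1, s2, s3) of det(I - zX) = 1 - s1 z + s2 z^2 - s3 z^3."""
    s1 = np.trace(X)
    s2 = 0.5 * (s1**2 - np.trace(X @ X))
    s3 = np.linalg.det(X)
    return s1, s2, s3

def is_b_stable(X, tol=1e-12):
    """Check non-negativity of X^T + X - e1 e1^T, i.e. condition (13.20)."""
    e1 = np.zeros(3)
    e1[0] = 1.0
    sym = X.T + X - np.outer(e1, e1)
    return np.linalg.eigvalsh(sym).min() >= -tol

X = sirk_x_matrix(0.5)
print(char_poly_coefficients(X), is_b_stable(X))
```

For $\gamma = 1/2$ the coefficients agree with $3\gamma$, $3\gamma^2$, $\gamma^3$ and the stability test succeeds, while a value such as $\gamma = 0.2$, outside the interval quoted in the example, gives $a < 0$ and the test fails.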


The “Equivalence” of A - and B -Stability 


Many A-stable RK-methods are not B-stable (e.g., the trapezoidal rule, the Lobatto
IIIA and Lobatto IIIB methods; see Theorem 12.12). On the other hand
there is the famous result of Dahlquist (1978), saying that every A-stable one-leg
method is B-stable, which we shall prove in Sect. V.6. We have further seen in
Example 13.17 that for a certain class of A-stable methods there is always a
B-stable method with the same stability function. The general truth of this result was
conjectured for many years and is as follows:

Theorem 13.18 (Hairer & Türke 1984, Hairer 1986). Let $R(z) = P(z)/Q(z)$
($P(0) = Q(0) = 1$, $\deg P \le s$, $\deg Q = s$) be an irreducible, A-stable function
satisfying $R(z) - e^z = O(z^{p+1})$ for some $p \ge 1$. Then there exists an $s$-stage
B-stable Runge-Kutta method of order $p$ with $R(z)$ as stability function.


Proof. Since $R(z)$ is an approximation to $e^z$ of order $p$, it can be written in the
form

$$R(z) = \frac{1 + \frac12 \Psi(z)}{1 - \frac12 \Psi(z)}, \qquad \Psi(z) = \cfrac{z}{1 + \cfrac{\xi_1^2 z^2}{1 + \cdots + \cfrac{\xi_{k-1}^2 z^2}{1 + \xi_k^2 z \Psi_k(z)}}} \tag{13.21}$$

where $k = [(p-1)/2]$, $\xi_j^2 = 1/(4(4j^2-1))$ and $\Psi_k(z) = z g(z)/f(z)$ with $g(0) =
f(0) = 1$, $\deg f \le s-k$, $\deg g \le s-k-1$ (for $p$ even we have in addition $g'(0) =
f'(0)$). For the diagonal Padé approximation $R^G(z)$ of order $2s$ this follows from
Theorem 5.18 with $\nu = s-1$ and $\Psi_\nu = z$:

$$R^G(z) = \frac{1 + \frac12 \Psi^G(z)}{1 - \frac12 \Psi^G(z)}, \qquad \Psi^G(z) = \cfrac{z}{1 + \cfrac{\xi_1^2 z^2}{1 + \cdots + \cfrac{\xi_{s-1}^2 z^2}{1}}}. \tag{13.22}$$

For an arbitrary $R(z)$ (satisfying the assumptions of the theorem) this is then a consequence
of $R(z) = R^G(z) + O(z^{p+1})$, or equivalently $\Psi(z) = \Psi^G(z) + O(z^{p+1})$.

The function $R(z)$ of (13.21) is A-stable iff (Theorem 5.22)

$$\operatorname{Re} \Psi_k(z) < 0 \quad \text{for } \operatorname{Re} z < 0.$$

Therefore, the function $\chi(z) = -\Psi_k(-1/z)$ is positive (cf. Definition 5.19) and
by Lemma 13.19 below there exists an $(s-k)$-dimensional matrix $Q$ such that

$$\chi(z) = e_1^T (Q + zI)^{-1} e_1 \quad \text{and} \quad Q + Q^T \text{ non-negative definite}.$$

We now fix an arbitrary quadrature formula of order $p$ with positive weights
$b_i$ and (for the sake of simplicity) distinct nodes $c_i$. We let $W$ be a matrix
satisfying $W^T B W = I$ and Property $T(k,k)$ with $k = [(p-1)/2]$ (cf. Lemma
5.12), and define the Runge-Kutta coefficients $(a_{ij})$ by (13.17) and (13.18). This
Runge-Kutta method is algebraically stable, because $Q + Q^T$ is non-negative definite,
and of order $p$ (observe that $g'(0) = f'(0)$ implies that the upper left element
of $Q$ vanishes). Finally, it follows from Theorem 5.18 and $\Psi_k(z) = -\chi(-1/z) =
z e_1^T (I - zQ)^{-1} e_1$ that its stability function is $R(z)$. □

It remains to prove the following lemma. 

Lemma 13.19. Let $\chi(z) = \alpha(z)/\beta(z)$ be an irreducible rational function with real
polynomials

$$\alpha(z) = z^{n-1} + \alpha_1 z^{n-2} + \ldots, \qquad \beta(z) = z^n + \beta_1 z^{n-1} + \ldots . \tag{13.23}$$

Then $\chi(z)$ is a positive function iff there exists an $n$-dimensional real matrix $Q$,
such that

$$\chi(z) = e_1^T (Q + zI)^{-1} e_1 \quad \text{and} \quad Q + Q^T \text{ non-negative definite}. \tag{13.24}$$

Proof. a) The sufficiency follows from

$$\operatorname{Re} \chi(z) = q(z)^* \{ \operatorname{Re} z \cdot I + \tfrac12 (Q + Q^T) \} q(z)$$

with $q(z) = (Q + zI)^{-1} e_1$, since $Q + Q^T$ is non-negative definite.
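The identity used for the sufficiency can be checked numerically. The sketch below is only an illustration: the test matrix $Q$ is built (as an assumption of this example) as a positive semidefinite part plus a skew-symmetric part, so that $Q + Q^T$ is non-negative definite, and the helper names are hypothetical.

```python
import numpy as np

def make_q(n, seed=0):
    """Random real Q whose symmetric part is positive semidefinite."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((n, n))
    K = rng.standard_normal((n, n))
    return S @ S.T + (K - K.T)      # Q + Q^T = 2 S S^T

def chi(Q, z):
    """Return chi(z) = e1^T (Q + zI)^{-1} e1 together with q(z)."""
    n = Q.shape[0]
    e1 = np.zeros(n)
    e1[0] = 1.0
    q = np.linalg.solve(Q + z * np.eye(n), e1)
    return q[0], q

def re_chi_from_form(Q, z):
    """Evaluate q* { Re z I + (Q + Q^T)/2 } q, the right-hand side of the identity."""
    n = Q.shape[0]
    _, q = chi(Q, z)
    H = z.real * np.eye(n) + 0.5 * (Q + Q.T)
    return np.real(np.conj(q) @ H @ q)

Q = make_q(4)
for z in (0.1 + 2.0j, 1.0 - 3.0j, 5.0 + 0.0j):
    value, _ = chi(Q, z)
    print(z, value.real, re_chi_from_form(Q, z))
```

Both printed columns coincide up to rounding, and they are positive whenever $\operatorname{Re} z > 0$, which is the positivity of $\chi$.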

b) For the proof of necessity, the hard part, we use Lemma 6.8 of Sect. V.6
below. This lemma is the essential ingredient for Dahlquist's equivalence result
and will be proved in the chapter on multistep methods. It states that the positivity
of $\chi(z)$ is equivalent to the existence of real, symmetric and non-negative definite
matrices $A$ and $B$, such that for arbitrary $z, w \in \mathbb{C}$ ($\mathbf{z} = (z^{n-1}, \ldots, z, 1)^T$,
$\mathbf{w} = (w^{n-1}, \ldots, w, 1)^T$),

$$\alpha(z)\beta(w) + \alpha(w)\beta(z) = (z + w)\, \mathbf{z}^T A \mathbf{w} + \mathbf{z}^T B \mathbf{w}. \tag{13.25}$$

The matrix $A$ is positive definite, if $\alpha(z)$ and $\beta(z)$ are relatively prime.
Comparing the coefficients of $w^n$ in (13.25) we get

$$\alpha(z) = \mathbf{z}^T A e_1 \tag{13.26}$$

and observe that the first column of $A$ consists of the coefficients of $\alpha(z)$. For the
Cholesky decomposition of $A$, $A = U^T U$ ($U$ is an upper triangular matrix) we
thus have $U e_1 = e_1$. We next consider the possible computation of the matrix $Q$
from the relation

$$(Q + zI) U \mathbf{z} = \beta(z) \cdot e_1 \tag{13.27}$$

or equivalently

$$Q U \mathbf{z} = \beta(z) \cdot e_1 - z U \mathbf{z}. \tag{13.28}$$

The right-hand side of (13.28) is a known polynomial of degree $n-1$, since $U e_1 =
e_1$. Therefore, a comparison of the coefficients in (13.28) yields the matrix $QU$ and
hence also $Q$. It remains to prove that this matrix $Q$ satisfies (13.24).

Using (13.27), the formula $A e_1 = U^T U e_1 = U^T e_1$ and (13.26) we obtain

$$e_1^T (Q + zI)^{-1} e_1 \cdot \beta(z) = e_1^T U \mathbf{z} = e_1^T A \mathbf{z} = \alpha(z), \tag{13.29}$$

which verifies the first relation of (13.24). Further, from (13.27) and $\alpha(z) = e_1^T U \mathbf{z}$
we get

$$\mathbf{z}^T U^T (Q + wI) U \mathbf{w} = \alpha(z) \beta(w).$$

Inserting this formula and the analogous one (with $z$ and $w$ exchanged) into
(13.25) yields $0 = \mathbf{z}^T (B - U^T (Q + Q^T) U) \mathbf{w}$, so that $B = U^T (Q + Q^T) U$. This
verifies the second relation of (13.24), since $B$ is symmetric and non-negative definite. □


Exercises 


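1. (Perron (1913) attributes this result to Wallis, Arithmetica infinitorum 1655
and Euler 1737). Let the sequences $\{A_k\}$ and $\{B_k\}$ be given by

$$\begin{aligned} A_k &= b_k A_{k-1} + a_k A_{k-2}, & A_{-1} &= 1, & A_0 &= b_0 \\ B_k &= b_k B_{k-1} + a_k B_{k-2}, & B_{-1} &= 0, & B_0 &= 1 \end{aligned} \tag{13.30}$$

then

$$\frac{A_n}{B_n} = b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cdots + \cfrac{a_n}{b_n}}}. \tag{13.31}$$

Hint. Let $x = (x_0, x_1, \ldots, x_{n+1})^T$ be the solution of $Mx = (0, \ldots, 0, 1)^T$,
where

$$M = \begin{pmatrix} 1 & -b_0 & -a_1 & & \\ & 1 & -b_1 & -a_2 & \\ & & \ddots & \ddots & \ddots \\ & & & 1 & -b_n \\ & & & & 1 \end{pmatrix}.$$

One easily finds

$$\frac{x_0}{x_1} = b_0 + \frac{a_1}{x_1/x_2} = b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{x_2/x_3}} = \ldots$$

so that $x_0/x_1$ is equal to the right hand side of (13.31). The statement now
follows from the fact that

$$(A_{-1}, A_0, \ldots, A_n) M = (1, 0, \ldots, 0)$$

$$(B_{-1}, B_0, \ldots, B_n) M = (0, 1, 0, \ldots, 0),$$

implying $x_0 = A_n$ and $x_1 = B_n$.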
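The statement of Exercise 1 can be tried out with exact rational arithmetic before proving it. The following minimal sketch (function names are hypothetical) evaluates the three-term recursion (13.30) and compares $A_n/B_n$ with a direct backward evaluation of the continued fraction (13.31); the lists `a = [a_1, ..., a_n]` and `b = [b_0, ..., b_n]` are an input convention assumed only for this example.

```python
from fractions import Fraction

def convergent_by_recursion(a, b):
    """Return (A_n, B_n) from the recursion (13.30).

    a = [a_1, ..., a_n], b = [b_0, ..., b_n].
    """
    A_prev, A_cur = Fraction(1), Fraction(b[0])   # A_{-1}, A_0
    B_prev, B_cur = Fraction(0), Fraction(1)      # B_{-1}, B_0
    for a_k, b_k in zip(a, b[1:]):
        A_prev, A_cur = A_cur, b_k * A_cur + a_k * A_prev
        B_prev, B_cur = B_cur, b_k * B_cur + a_k * B_prev
    return A_cur, B_cur

def continued_fraction_backward(a, b):
    """Evaluate b_0 + a_1/(b_1 + a_2/(... + a_n/b_n)) from the innermost term."""
    value = Fraction(b[-1])
    for a_k, b_k in zip(reversed(a), reversed(b[:-1])):
        value = b_k + Fraction(a_k) / value
    return value

A_n, B_n = convergent_by_recursion([2, 3], [1, 4, 5])
print(A_n, B_n, continued_fraction_backward([2, 3], [1, 4, 5]))
```

The recursion has the practical advantage that appending a further pair $(a_{n+1}, b_{n+1})$ needs only one more step, whereas the backward evaluation has to start again from the innermost term.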

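2. Let $P_s(z)$ be the Legendre polynomial (13.4) and $N_s^G(z)$ defined by the recursion
(13.5) with $N_0^G(z) = 0$, $N_1^G(z) = 1$. Prove that

$$N_{s-k}^G(z) P_{s-k-1}(z) - N_{s-k-1}^G(z) P_{s-k}(z) = \tau_1 \cdot \tau_2 \cdot \ldots \cdot \tau_{s-k-1}.$$

Hint. Use the relation

$$\begin{pmatrix} N_m^G(z) & P_m(z) \\ N_{m-1}^G(z) & P_{m-1}(z) \end{pmatrix} = \begin{pmatrix} z - \frac12 & -\tau_{m-1} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} N_{m-1}^G(z) & P_{m-1}(z) \\ N_{m-2}^G(z) & P_{m-2}(z) \end{pmatrix}.$$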


3. Consider the Hermitian quadrature formula 

J^ f{x)dx=b 1 f(c 1 ) + af(c 2 )+(3 + 7 ^ ^ ■ (13.32) 

Replace f'(c 2 ) and /"(c 2 ) by finite divided differences based on /(c 2 — e ), 
/(c 2 ), f(c 2 + e) to obtain a quadrature formula 

f f(x)dx = b 1 f(c 1 ) + b 2 f(c 2 -e)+b 3 f(c 2 ) + bj(c 2 +e). (13.33) 

Jo 


a) Compute Q(z) for Formula (13.33) and obtain, by letting e -» 0, an expres¬ 
sion which generalizes (13.2) to Hermitian quadrature formulas. 

b) Compute the values of b x and b 2 (Z x = 1, l 2 = 3) of (13.10’). 


c) Show that n + — n_ (see Theorem 13.5) is the same for (13.32) and (13.33) 
with e sufficiently small. 


Results. 


a) 

b) 


Q(z) 


b i 


■ + ■ 


a 


+ 


(3 


+ • 


(z-c 2 ) 2 (z-C 2 ) 


b 1 =b 1 (sic!), b 2 = 7 / 3 !. 


4. The rational function $\chi(z) = \alpha(z)/\beta(z)$ with $\alpha(z) = z + \alpha_1$, $\beta(z) = z^2 +
\beta_1 z + \beta_2$ is positive, iff $\alpha_1 \ge 0$, $\beta_2 \ge 0$, $\beta_1 - \alpha_1 \ge 0$ (compare (5.48)).

a) Find real, symmetric and non-negative definite matrices A and B such that 
(13.25) holds. 

b) Show that these matrices are, in general, not unique. 

c) As in the proof of Lemma 13.19, compute the matrix Q such that (13.24) 
holds. 

Hint. Begin with the construction of $B$ by putting $w = -z$ in (13.25).

Up to now, we have assumed that the scheme admits a
solution. To prove its existence ...

(Crouzeix & Raviart 1980) 

Since contractivity without feasibility makes little sense ... 

(M.N. Spijker 1985) 


Since the Runge-Kutta methods studied in the foregoing sections are all implicit,
we have to ensure that the numerical solutions, for which we have derived so many
nice results, also really exist. The existence theory for implicit Runge-Kutta methods,
presented in Volume I (Theorem II.7.2), is for the non-stiff case only, where
$hL$ is small ($L$ the Lipschitz constant). This is not a reasonable assumption for the
stiff case.

We shall study the existence of a Runge-Kutta solution, defined implicitly by

$$g_i = y_0 + h \sum_{j=1}^{s} a_{ij} f(x_0 + c_j h, g_j), \tag{14.1a}$$

$$y_1 = y_0 + h \sum_{j=1}^{s} b_j f(x_0 + c_j h, g_j), \tag{14.1b}$$

for differential equations which satisfy the one-sided Lipschitz condition

$$\langle f(x, y) - f(x, z), y - z \rangle \le \nu \|y - z\|^2. \tag{14.2}$$


Existence 

It was first pointed out by Crouzeix & Raviart (1980) that the coercivity of the
Runge-Kutta matrix $A$ (or of its inverse) plays an important role for the proof of
existence.

Definition 14.1. We consider the inner product $\langle u, v \rangle_D = u^T D v$, where $D =
\operatorname{diag}(d_1, \ldots, d_s)$ with $d_i > 0$. We then denote by $\alpha_D(A^{-1})$ the largest number $\alpha$
such that

$$\langle u, A^{-1} u \rangle_D \ge \alpha \langle u, u \rangle_D \quad \text{for all } u \in \mathbb{R}^s. \tag{14.3}$$

We also set

$$\alpha_0(A^{-1}) = \sup_{D > 0} \alpha_D(A^{-1}). \tag{14.4}$$

The first existence results for the above problem were given by Crouzeix &
Raviart (1980), Dekker (1982) and Crouzeix, Hundsdorfer & Spijker (1983). Their
results can be summarized as follows:

Theorem 14.2. Let $f$ be continuously differentiable and satisfy (14.2). If the
Runge-Kutta matrix $A$ is invertible and

$$h\nu < \alpha_0(A^{-1}) \tag{14.5}$$

then the nonlinear system (14.1a) possesses a solution $(g_1, \ldots, g_s)$.

Proof. The original proofs are based on the “uniform monotonicity theorem” or on
similar results. We present here a more elementary version which, however, has
the disadvantage of requiring the differentiability hypothesis for $f$. The idea is to
consider the homotopy

$$g_i = y_0 + h \sum_{j=1}^{s} a_{ij} f(x_0 + c_j h, g_j) + (\tau - 1) h \sum_{j=1}^{s} a_{ij} f(x_0 + c_j h, y_0), \tag{14.6}$$

which is constructed in such a way that for $\tau = 0$ the system (14.6) has the solution
$g_i = y_0$, and for $\tau = 1$ it is equivalent to (14.1a). We consider $g_i$ as functions of $\tau$
and differentiate (14.6) with respect to this parameter. This gives

$$\dot g_i = h \sum_{j=1}^{s} a_{ij} \frac{\partial f}{\partial y}(x_0 + c_j h, g_j) \cdot \dot g_j + h \sum_{j=1}^{s} a_{ij} f(x_0 + c_j h, y_0)$$

or equivalently

$$(I - h (A \otimes I) \{f_y\}) \dot g = h (A \otimes I) f_0 \tag{14.7}$$

where we have used the notations

$$\dot g = (\dot g_1, \ldots, \dot g_s)^T, \qquad f_0 = (f(x_0 + c_1 h, y_0), \ldots, f(x_0 + c_s h, y_0))^T$$

(more precisely, $\dot g$ should be written as $(\dot g_1^T, \ldots, \dot g_s^T)^T$) and

$$\{f_y\} = \operatorname{blockdiag}\Big( \frac{\partial f}{\partial y}(x_0 + c_1 h, g_1), \ldots, \frac{\partial f}{\partial y}(x_0 + c_s h, g_s) \Big).$$

In order to show that $\dot g$ can be expressed as $\dot g = G(g)$ with a globally bounded
$G(g)$, we take a $D$ satisfying $h\nu < \alpha_D(A^{-1})$, multiply (14.7) by $\dot g^T (DA^{-1} \otimes I)$
and so obtain

$$\dot g^T (DA^{-1} \otimes I) \dot g - h\, \dot g^T (D \otimes I) \{f_y\} \dot g = h\, \dot g^T (D \otimes I) f_0. \tag{14.8}$$

We now estimate the three individual terms of this equation.

The estimate

$$\dot g^T (DA^{-1} \otimes I) \dot g \ge \alpha_D(A^{-1})\, |||\dot g|||_D^2, \tag{14.9}$$

where we have introduced the notation $|||g|||_D^2 = g^T (D \otimes I) g$, is (14.3) in the case
of scalar differential equations (absence of “$\otimes I$”). In the general case we must
apply the ideas of Exercise 1 of Sect. IV.12 to the matrix $\frac12 (DA^{-1} + (DA^{-1})^T) -
\alpha_D(A^{-1}) D$, which is non-negative definite by Definition 14.1. It follows from
(14.2) with $y = z + \varepsilon u$ that

$$\langle f(x, z + \varepsilon u) - f(x, z), \varepsilon u \rangle \le \nu \varepsilon^2 \|u\|^2.$$

Dividing by $\varepsilon^2$ and taking the limit $\varepsilon \to 0$ we obtain $\langle u, \frac{\partial f}{\partial y}(x, z) u \rangle \le \nu \|u\|^2$ for
all $(x, z)$ and all $u$. Consequently we also have

$$\dot g^T (D \otimes I) \{f_y\} \dot g \le \nu\, |||\dot g|||_D^2. \tag{14.10}$$

The right-hand term of (14.8) is bounded by $h\, |||\dot g|||_D \cdot |||f_0|||_D$ by the Cauchy-
Schwarz-Bunjakowski inequality.

Inserting these three estimates into (14.8) yields

$$(\alpha_D(A^{-1}) - h\nu)\, |||\dot g|||_D^2 \le h\, |||\dot g|||_D \cdot |||f_0|||_D.$$

This proves that $\dot g$ can be written as $\dot g = G(g)$ with

$$|||G(g)|||_D \le \frac{h\, |||f_0|||_D}{\alpha_D(A^{-1}) - h\nu}.$$

It now follows from Theorem 7.4 (Sect. I.7) that this differential equation with
initial values $g_i(0) = y_0$ possesses a solution for all $\tau$, in particular also for $\tau = 1$.
This proves the existence of a solution of (14.1a). □
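The homotopy argument is constructive, and a small numerical experiment may make it more concrete. The following is a minimal sketch under several assumptions that are not part of the text: a scalar autonomous test problem $y' = -10y - y^3$ (so that $\nu = -10$), the two-stage Radau IIA matrix, and the classical fourth order Runge-Kutta method for integrating (14.7) from $\tau = 0$ to $\tau = 1$. All function names are hypothetical.

```python
import numpy as np

A = np.array([[5/12, -1/12],
              [3/4,   1/4]])       # two-stage Radau IIA

def f(y):
    return -10.0 * y - y**3       # hypothetical scalar test problem, nu = -10

def f_y(y):
    return -10.0 - 3.0 * y**2

def g_dot(g, y0, h):
    """Right-hand side G(g) of (14.7); for a scalar ODE, A (x) I is just A."""
    s = len(g)
    f0 = np.full(s, f(y0))
    M = np.eye(s) - h * A @ np.diag(f_y(g))
    return np.linalg.solve(M, h * A @ f0)

def stages_by_homotopy(y0, h, steps=200):
    """Follow g(tau) from g(0) = y0 to tau = 1 with classical RK4."""
    g = np.full(A.shape[0], float(y0))
    dt = 1.0 / steps
    for _ in range(steps):
        k1 = g_dot(g, y0, h)
        k2 = g_dot(g + 0.5 * dt * k1, y0, h)
        k3 = g_dot(g + 0.5 * dt * k2, y0, h)
        k4 = g_dot(g + dt * k3, y0, h)
        g = g + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g

def residual(g, y0, h):
    """Defect of the stage equations (14.1a)."""
    return g - y0 - h * A @ f(g)

g = stages_by_homotopy(1.0, 1.0)
print(g, np.max(np.abs(residual(g, 1.0, 1.0))))
```

The printed residual should be small, which indicates that the end point of the path is (approximately) a solution of (14.1a). In practice one would polish it with a few Newton iterations; the path following is shown here only because it mirrors the proof.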


Remark. It has recently been shown by Kraaijevanger & Schneid (1991, Theorem 2.12)
that Condition (14.5) is “essentially optimal”.


A Counterexample 


After our discussion that Monday afternoon (October 1980) I 
went for a walk and I got the idea for the counterexample. 

(M.N. Spijker) 

The inequality in (14.5) is strict, therefore Theorem 14.2 (together with Exercise 1
below) does not yet answer the simple question: “does a B-stable method on a
contractive problem ($\nu = 0$) always admit a solution”. A first counterexample to
this statement has been given by Crouzeix, Hundsdorfer & Spijker (1983). An easy
idea for constructing another counterexample is to use the W-transformation (see
Sections IV.5 and IV.13) as follows:

We put $s = 4$ and take a quadrature formula with positive weights, say,

$$(c_i) = (0, 1/3, 2/3, 1), \qquad (b_i) = (1/8, 3/8, 3/8, 1/8).$$

We then construct a matrix $W$ satisfying property $T(1,1)$ according to Lemma 5.12.
This yields for the above quadrature formula

$$W = \begin{pmatrix} 1 & -\sqrt3 & \sqrt3 & -1 \\ 1 & -\sqrt3/3 & -\sqrt3/3 & 1 \\ 1 & \sqrt3/3 & -\sqrt3/3 & -1 \\ 1 & \sqrt3 & \sqrt3 & 1 \end{pmatrix}.$$

Finally, we put (with $\xi_1 = 1/(2\sqrt3)$)

$$A = W X W^{-1} \quad \text{with} \quad X = \begin{pmatrix} 1/2 & -\xi_1 & 0 & 0 \\ \xi_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -\beta \\ 0 & 0 & \beta & 0 \end{pmatrix}.$$

For $\beta = 1/(4\sqrt3)$ this gives nice rational coefficients for the RK-matrix, namely

$$A = \frac{1}{48} \begin{pmatrix} 3 & 0 & 3 & -6 \\ 6 & 9 & 0 & 1 \\ 5 & 18 & 9 & 0 \\ 12 & 15 & 18 & 3 \end{pmatrix}.$$

It follows from Theorem 13.15 that this method is algebraically stable and of order 4.
However, $\pm i\beta$ is an eigenvalue pair of $X$ and hence also of $A$.

We thus choose the differential equation

$$y' = Jy + f(x) \quad \text{with} \quad J = \begin{pmatrix} 0 & -\beta^{-1} \\ \beta^{-1} & 0 \end{pmatrix},$$

which satisfies (14.2) with $\nu = 0$ independent of the choice of $f(x)$. If we apply the
above method with $h = 1$ to this problem and initial values $x_0 = 0$, $y_0 = (0,0)^T$,
Eq. (14.1a) becomes equivalent to the linear system

$$(I - A \otimes J) g = (A \otimes I) f_0,$$

where $g = (g_1, \ldots, g_4)^T$ and $f_0 = (f(c_1), \ldots, f(c_4))^T$. The matrix $(I - A \otimes J)$
is singular because the eigenvalues of $I - A \otimes J$ are just $1 - \lambda\mu$ where $\lambda$ and $\mu$
are the eigenvalues of $A$ and $J$, respectively. However, $A$ is regular, therefore it is
possible to choose $f(x)$ in such a way that this equation does not have a solution.
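The ingredients of the counterexample are easy to verify numerically. The sketch below rebuilds $A = WXW^{-1}$, compares it with the rational matrix given above, checks that the algebraic stability matrix $M = BA + A^T B - bb^T$ vanishes (so it is in particular non-negative definite), and confirms that $I - A \otimes J$ is singular. The skew-symmetric $J$ with entries $\mp\beta^{-1}$ is an assumption of this illustration; only its eigenvalues $\pm i/\beta$ matter for the argument.

```python
import numpy as np

r3 = np.sqrt(3.0)
b = np.array([1.0, 3.0, 3.0, 1.0]) / 8.0
W = np.array([[1.0, -r3,      r3,     -1.0],
              [1.0, -r3 / 3, -r3 / 3,  1.0],
              [1.0,  r3 / 3, -r3 / 3, -1.0],
              [1.0,  r3,      r3,      1.0]])
xi1 = 1.0 / (2.0 * r3)
beta = 1.0 / (4.0 * r3)
X = np.array([[0.5, -xi1, 0.0,   0.0],
              [xi1,  0.0, 0.0,   0.0],
              [0.0,  0.0, 0.0, -beta],
              [0.0,  0.0, beta,  0.0]])

A = W @ X @ np.linalg.inv(W)                     # Runge-Kutta matrix (13.17)
B = np.diag(b)
M = B @ A + A.T @ B - np.outer(b, b)             # algebraic stability matrix

J = np.array([[0.0, -1.0 / beta],                # assumed skew-symmetric J,
              [1.0 / beta, 0.0]])                # eigenvalues +-i/beta
system = np.eye(8) - np.kron(A, J)               # matrix of the stage equations, h = 1
smallest_singular_value = np.linalg.svd(system, compute_uv=False)[-1]

print(np.round(48 * A, 10))
print(np.abs(M).max(), smallest_singular_value, np.linalg.det(A))
```

A vanishing smallest singular value together with a non-zero determinant of $A$ is exactly the situation exploited in the text: the right-hand side $(A \otimes I) f_0$ can be placed outside the range of the singular matrix.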


Influence of Perturbations and Uniqueness 


Our next problem is the question, how perturbations in the Runge-Kutta equations
influence the numerical solution. Research into this problem was initiated independently
by Frank, Schneid & Ueberhuber (preprint 1981, published 1985) and
Dekker (1982).

As above, we use the notations

$$\|u\|_D = \sqrt{u^T D u} = \sqrt{\langle u, u \rangle_D}, \qquad u \in \mathbb{R}^s,$$

$$|||g|||_D = \sqrt{g^T (D \otimes I) g}, \qquad g \in \mathbb{R}^{sn},$$

and $\|A\|_D$ for the corresponding matrix norm.

Theorem 14.3 (Dekker 1984). Let $g_i$ and $y_1$ be given by (14.1) and consider
perturbed values $\hat g_i$ and $\hat y_1$ satisfying

$$\hat g_i = y_0 + h \sum_{j=1}^{s} a_{ij} f(x_0 + c_j h, \hat g_j) + \delta_i \tag{14.11a}$$

$$\hat y_1 = y_0 + h \sum_{j=1}^{s} b_j f(x_0 + c_j h, \hat g_j). \tag{14.11b}$$

If the Runge-Kutta matrix $A$ is invertible, if the one-sided Lipschitz condition (14.2)
is satisfied, and $h\nu < \alpha_D(A^{-1})$ for some positive diagonal matrix $D$, then we have
the estimates

$$|||\hat g - g|||_D \le \frac{\|A^{-1}\|_D}{\alpha_D(A^{-1}) - h\nu}\, |||\delta|||_D \tag{14.12}$$

$$\|\hat y_1 - y_1\| \le \|b^T A^{-1}\|_D \Big( 1 + \frac{\|A^{-1}\|_D}{\alpha_D(A^{-1}) - h\nu} \Big) |||\delta|||_D \tag{14.13}$$

where $g = (g_1, \ldots, g_s)^T$, $\hat g = (\hat g_1, \ldots, \hat g_s)^T$, and $\delta = (\delta_1, \ldots, \delta_s)^T$.

Proof. With the notation $\Delta g = \hat g - g$ and

$$\Delta f = \big( f(x_0 + c_1 h, \hat g_1) - f(x_0 + c_1 h, g_1), \ldots, f(x_0 + c_s h, \hat g_s) - f(x_0 + c_s h, g_s) \big)^T$$

the difference of (14.11a) and (14.1a) can be written as

$$\Delta g = h (A \otimes I) \Delta f + \delta.$$

As in the proof of Theorem 14.2 we multiply this equation by $\Delta g^T (DA^{-1} \otimes I)$
and obtain

$$\Delta g^T (DA^{-1} \otimes I) \Delta g - h\, \Delta g^T (D \otimes I) \Delta f = \Delta g^T (DA^{-1} \otimes I) \delta. \tag{14.14}$$

This equation is very similar to Eq. (14.8) and we estimate it in the same way: since
$D$ is a diagonal matrix with positive entries, it follows from (14.2) that

$$\Delta g^T (D \otimes I) \Delta f \le \nu\, |||\Delta g|||_D^2. \tag{14.15}$$

Inserting (14.15) and (14.9) (with $\dot g$ replaced by $\Delta g$) into (14.14) we get

$$(\alpha_D(A^{-1}) - h\nu)\, |||\Delta g|||_D^2 \le |||\Delta g|||_D \cdot |||(A^{-1} \otimes I) \delta|||_D$$

which implies (14.12). The estimate (14.13) then follows immediately from

$$\hat y_1 - y_1 = h (b^T \otimes I) \Delta f = (b^T A^{-1} \otimes I)(\Delta g - \delta). \qquad □$$
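For a linear problem $y' = Jy$ the perturbed stage equations are linear, so the estimates of Theorem 14.3 can be compared with the exact effect of a perturbation. The following minimal sketch assumes the two-stage Radau IIA method, the diagonal matrix $D = BC^{-1}$, and a hypothetical $2 \times 2$ test matrix $J$; the function names are not taken from the text.

```python
import numpy as np

A = np.array([[5/12, -1/12], [3/4, 1/4]])    # two-stage Radau IIA
b = np.array([3/4, 1/4])
c = np.array([1/3, 1.0])
d = b / c                                     # diagonal of D = B C^{-1}

def alpha_D(A, d):
    """Smallest eigenvalue of the symmetric part of D^{1/2} A^{-1} D^{-1/2}."""
    sq = np.sqrt(d)
    T = (sq[:, None] * np.linalg.inv(A)) / sq[None, :]
    return np.linalg.eigvalsh((T + T.T) / 2).min()

def check_bounds(J, h, delta):
    """Return (|||dg|||_D, bound (14.12), ||dy||, bound (14.13)) for y' = J y."""
    s, n = A.shape[0], J.shape[0]
    nu = np.linalg.eigvalsh((J + J.T) / 2).max()          # one-sided Lipschitz constant
    sq = np.sqrt(d)
    Ainv = np.linalg.inv(A)
    norm_Ainv = np.linalg.norm((sq[:, None] * Ainv) / sq[None, :], 2)
    norm_bAinv = np.linalg.norm((b @ Ainv) / sq)
    factor = norm_Ainv / (alpha_D(A, d) - h * nu)
    DI = np.kron(np.diag(d), np.eye(n))
    norm_D = lambda v: np.sqrt(v @ DI @ v)
    dg = np.linalg.solve(np.eye(s * n) - h * np.kron(A, J), delta)   # exact stage change
    dy = h * np.kron(b[None, :], J) @ dg                              # change of y_1
    return (norm_D(dg), factor * norm_D(delta),
            np.linalg.norm(dy), norm_bAinv * (1 + factor) * norm_D(delta))

J = np.array([[-1.0, 100.0], [-100.0, -1.0]])    # hypothetical stiff oscillator, nu = -1
delta = np.random.default_rng(1).standard_normal(4)
print(check_bounds(J, 0.5, delta))
```

In each pair of printed numbers the first (the actual change) should not exceed the second (the bound), independently of the large off-diagonal entries of $J$.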

Putting $\delta = 0$ in Theorem 14.3 we get the following uniqueness result.


Theorem 14.4. Consider a differential equation satisfying (14.2). If the Runge-Kutta
matrix $A$ is invertible and $h\nu < \alpha_0(A^{-1})$, then the system (14.1a) possesses
at most one solution. □

Computation of $\alpha_0(A^{-1})$

... the determination of a suitable matrix D ... This task does
not seem easy at first glance ... (K. Dekker 1984)

The value $\alpha_D(A^{-1})$ of Definition 14.1 is the smallest eigenvalue of the symmetric
matrix $(D^{1/2} A^{-1} D^{-1/2} + (D^{1/2} A^{-1} D^{-1/2})^T)/2$. The computation of
$\alpha_0(A^{-1})$ is more difficult, because the optimal $D$ is not known in general.

An upper bound for $\alpha_0(A^{-1})$ is

$$\alpha_0(A^{-1}) \le \min_{i=1,\ldots,s} \omega_{ii} \tag{14.16}$$

where $\omega_{ij}$ are the entries of $A^{-1}$. This follows from (14.3) by putting $u = e_i$, the
$i$th unit vector.
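Both quantities are straightforward to compute for a given $D$. The sketch below (hypothetical function names) evaluates $\alpha_D(A^{-1})$ as the smallest eigenvalue of the symmetrized matrix and the upper bound (14.16), using the two-stage Gauss method as an example. The choice $D = B(C^{-1} - I)$ is taken from the proof of Theorem 14.5; $D = I$ is included for comparison.

```python
import numpy as np

def alpha_D(A, d):
    """alpha_D(A^{-1}) for D = diag(d): smallest eigenvalue of the
    symmetric part of D^{1/2} A^{-1} D^{-1/2}."""
    sq = np.sqrt(d)
    T = (sq[:, None] * np.linalg.inv(A)) / sq[None, :]
    return np.linalg.eigvalsh((T + T.T) / 2).min()

def upper_bound(A):
    """Right-hand side of (14.16): smallest diagonal entry of A^{-1}."""
    return np.diag(np.linalg.inv(A)).min()

r3 = np.sqrt(3.0)
A = np.array([[1/4, 1/4 - r3/6],
              [1/4 + r3/6, 1/4]])            # two-stage Gauss method
b = np.array([0.5, 0.5])
c = np.array([0.5 - r3/6, 0.5 + r3/6])

print(alpha_D(A, np.ones(2)))                # D = I
print(alpha_D(A, b * (1.0 / c - 1.0)))       # D = B (C^{-1} - I)
print(upper_bound(A))
```

For this method the second and third numbers agree (both are 3 up to rounding), so the special $D$ already attains the upper bound, whereas $D = I$ only gives the value 0.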

Lower bounds for $\alpha_0(A^{-1})$ were first given by Frank, Schneid & Ueberhuber
in 1981. Following are the exact values due to Dekker (1984), Dekker & Verwer
(1984, p. 55-164), and Dekker & Hairer (1985) (see also Liu & Kraaijevanger 1988 
and Kraaijevanger & Schneid 1991). 

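Theorem 14.5. For the methods of Sect. IV.5 we have:

Gauss          $\alpha_0(A^{-1}) = \min\limits_{i=1,\ldots,s} \dfrac{1}{2c_i(1-c_i)}$,

Radau IA       $\alpha_0(A^{-1}) = \begin{cases} 1 & \text{if } s = 1, \\ \dfrac{1}{2(1-c_2)} & \text{if } s > 1, \end{cases}$

Radau IIA      $\alpha_0(A^{-1}) = \begin{cases} 1 & \text{if } s = 1, \\ \dfrac{1}{2c_{s-1}} & \text{if } s > 1, \end{cases}$

Lobatto IIIC   $\alpha_0(A^{-1}) = \begin{cases} 1 & \text{if } s = 2, \\ 0 & \text{if } s > 2. \end{cases}$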

Proof. a) Gauss methods: written out in “symmetricized form”, estimate (14.3)
reads

$$\frac12 u^T \big( DA^{-1} + (DA^{-1})^T \big) u \ge \alpha\, u^T D u.$$

Evidently the sharpest estimates come out if $D$ is such that the left-hand matrix
is as “close to diagonal as possible”. After many numerical computations,
Dekker had the nice surprise that with the choice $D = B(C^{-1} - I)$, where $B =
\operatorname{diag}(b_1, \ldots, b_s)$ and $C = \operatorname{diag}(c_1, \ldots, c_s)$, the matrix

$$DA^{-1} + (DA^{-1})^T = BC^{-2} \tag{14.17}$$

becomes completely diagonal. Then the optimal $\alpha$ is simply obtained by testing
the unit vectors $u = e_k$, which gives

$$\alpha_D(A^{-1}) = \min_{k=1,\ldots,s} \frac{b_k c_k^{-2}}{2 b_k (c_k^{-1} - 1)} = \min_{k=1,\ldots,s} \frac{1}{2 c_k (1 - c_k)}.$$

It remains to prove (14.17): we verify the equivalent formula

$$V^T (A^T D + DA - A^T B C^{-2} A) V = 0 \tag{14.18}$$

where $V = (c_i^{j-1})$ is the Vandermonde matrix. The $(l, m)$-element of the matrix
(14.18) is

$$\sum_{i,j} b_i \Big( \frac{1}{c_i} - 1 \Big) a_{ij} \big( c_i^{l-1} c_j^{m-1} + c_i^{m-1} c_j^{l-1} \big) - \sum_{i,j,k} \frac{b_i}{c_i^2}\, a_{ik} c_k^{l-1} a_{ij} c_j^{m-1}. \tag{14.19}$$

With the help of the simplifying assumptions $C(s)$ and $B(2s)$ the expression
(14.19) can be seen to be zero.

b) For the Radau IA methods we take $D = B(I - C)$ and show that

$$DA^{-1} + (DA^{-1})^T = B + e_1 e_1^T. \tag{14.20}$$

The stated formula for $\alpha_0(A^{-1})$ then follows from $0 = c_1 < c_2 < \ldots < c_s$ and
from

$$\frac{b_1 + 1}{b_1} > \frac{1}{1 - c_2},$$

which is a simple consequence of $b_1 = 1/s^2$ (see Abramowitz & Stegun (1964),
Formula 25.4.31). For the verification of (14.20) one shows that $V^T (DA^{-1} +
(DA^{-1})^T - B - e_1 e_1^T) V = 0$. Helpful formulas for this verification are $A^{-1} V e_1 =
b_1^{-1} e_1$, $V^T e_1 = e_1$ and $A^{-1} V e_j = (j-1) V e_{j-1}$ for $j \ge 2$.

c) Similarly, the statement for the Radau IIA methods follows with $D = BC^{-1}$
from the identity

$$DA^{-1} + (DA^{-1})^T = BC^{-2} + e_s e_s^T.$$

d) As in part (b) one proves for the Lobatto IIIC methods that

$$BA^{-1} + (BA^{-1})^T = e_1 e_1^T + e_s e_s^T. \tag{14.21}$$

Since this matrix is diagonal, we obtain $\alpha_0(A^{-1}) = 1$ for $s = 2$ and $\alpha_0(A^{-1}) = 0$
for $s > 2$. □


For diagonally implicit Runge-Kutta methods we have the following result. 

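Theorem 14.6 (Montijano 1983). For a DIRK-method with positive $a_{ii}$ we have

$$\alpha_0(A^{-1}) = \min_{i=1,\ldots,s} \frac{1}{a_{ii}}. \tag{14.22}$$

Proof. With $D = \operatorname{diag}(1, \varepsilon^2, \varepsilon^4, \ldots, \varepsilon^{2s-2})$ we obtain

$$\frac12 \big( D^{1/2} A^{-1} D^{-1/2} + (D^{1/2} A^{-1} D^{-1/2})^T \big) = \operatorname{diag}(a_{11}^{-1}, \ldots, a_{ss}^{-1}) + O(\varepsilon),$$

so that $\alpha_0(A^{-1}) \ge \min_i a_{ii}^{-1} + O(\varepsilon)$. This inequality for $\varepsilon \to 0$ and (14.16) prove
the statement. □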
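The scaling argument of this proof can be observed numerically. The following minimal sketch uses a hypothetical two-stage lower triangular matrix (chosen only for illustration, not a method from the text) and evaluates $\alpha_D(A^{-1})$ for $D = \operatorname{diag}(1, \varepsilon^2)$ with decreasing $\varepsilon$.

```python
import numpy as np

def alpha_D(A, d):
    """Smallest eigenvalue of the symmetric part of D^{1/2} A^{-1} D^{-1/2}."""
    sq = np.sqrt(d)
    T = (sq[:, None] * np.linalg.inv(A)) / sq[None, :]
    return np.linalg.eigvalsh((T + T.T) / 2).min()

# Hypothetical DIRK matrix with positive diagonal; min 1/a_ii = 2.
A = np.array([[0.25, 0.0],
              [0.50, 0.5]])

def alpha_for_eps(eps):
    d = eps ** (2.0 * np.arange(A.shape[0]))   # D = diag(1, eps^2, ...)
    return alpha_D(A, d)

values = [alpha_for_eps(eps) for eps in (1.0, 0.1, 0.01)]
print(values, (1.0 / np.diag(A)).min())
```

The values increase towards $\min_i 1/a_{ii}$ as $\varepsilon$ decreases and never exceed it, in agreement with (14.16) and (14.22). The supremum in (14.4) is approached but, for this example, not attained by any single $D$ of this family.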
Methods with Singular A


For the Lobatto IIIA methods the first stage is explicit (the first row of $A$ vanishes)
and for the Lobatto IIIB methods the last stage is explicit (the last column of $A$
vanishes). For these methods the Runge-Kutta matrix is of the form

$$A = \begin{pmatrix} 0 & 0 \\ a & \tilde A \end{pmatrix} \quad \text{or} \quad A = \begin{pmatrix} \tilde A & 0 \\ a^T & 0 \end{pmatrix} \tag{14.23}$$

and we have the following variant of Theorem 14.2.


Theorem 14.7. Let $f$ be continuously differentiable and satisfy (14.2). If the
Runge-Kutta matrix is given by one of the matrices in (14.23) with invertible $\tilde A$,
then the assumption

$$h\nu < \alpha_0(\tilde A^{-1})$$

implies that the nonlinear system (14.1a) has a solution.

Proof. The explicit stage poses no problem for the existence of a solution. To
obtain the result we repeat the proof of Theorem 14.2 for the $s-1$ implicit stages
(i.e., $A$ is replaced by $\tilde A$ and the inhomogeneity in (14.6) may be different). □


An explicit formula for $\alpha_0(\tilde A^{-1})$ for the Lobatto IIIB methods has been given
by Dekker & Verwer (1984), and for the Lobatto IIIA methods by Liu, Dekker &
Spijker (1987). The result is

Theorem 14.8. We have for

Lobatto IIIA   $\alpha_0(\tilde A^{-1}) = \begin{cases} 2 & \text{if } s = 2, \\ 1/c_{s-1} & \text{if } s > 2, \end{cases}$

Lobatto IIIB   $\alpha_0(\tilde A^{-1}) = \begin{cases} 2 & \text{if } s = 2, \\ 1/(1-c_2) & \text{if } s > 2. \end{cases}$


Proof. For the Lobatto IIIA methods we put $D = BC^{-2}$ with the diagonal matrices
$B = \operatorname{diag}(b_2, \ldots, b_s)$ and $C = \operatorname{diag}(c_2, \ldots, c_s)$. As in part (a) of the proof of
Theorem 14.5 we get

$$D \tilde A^{-1} + (D \tilde A^{-1})^T = e_{s-1} e_{s-1}^T + 2BC^{-3}$$

which implies the formula for $\alpha_0(\tilde A^{-1})$, because $b_s = (s(s-1))^{-1}$ and $(1 +
2b_s) \ge 2b_s/c_{s-1}$ for $s > 2$.

For the Lobatto IIIB methods the choice $D = B(I - C)^2$ (with the matrices
$B = \operatorname{diag}(b_1, \ldots, b_{s-1})$, $C = \operatorname{diag}(c_1, \ldots, c_{s-1})$) leads to

$$D \tilde A^{-1} + (D \tilde A^{-1})^T = e_1 e_1^T + 2B(I - C).$$

This proves the second statement. □

Methods with explicit stages (such as Lobatto IIIA and IIIB) don't allow estimates
of the numerical solution in the presence of arbitrary perturbations. They are
usually not AN-stable and $K(Z)$ is not bounded (see Theorem 12.12). Nevertheless
we have the following uniqueness result.

Theorem 14.9. Consider a differential equation satisfying (14.2). If the Runge-Kutta
matrix is of the form (14.23) with invertible $\tilde A$ and if $h\nu < \alpha_0(\tilde A^{-1})$, then
the nonlinear system (14.1a) has at most one solution.

Proof. Suppose there exists a second solution $\hat g_i$ satisfying (14.11a) with $\delta_i = 0$.

a) If the first stage is explicit we have $\hat g_1 = g_1$. The difference of the two
Runge-Kutta formulas then yields

$$\Delta g = h (\tilde A \otimes I) \Delta f$$

with $\Delta g = (\hat g_i - g_i)_{i=2}^{s}$, $\Delta f = \big( f(x_0 + c_i h, \hat g_i) - f(x_0 + c_i h, g_i) \big)_{i=2}^{s}$. As in
the proof of Theorem 14.3 we then conclude that $\Delta g = 0$.

b) In the second case we can apply Theorem 14.3 to the first $s-1$ stages,
which yields uniqueness of $g_1, \ldots, g_{s-1}$. Clearly, $g_s$ also is unique, because the
last stage is explicit. □


Lobatto IIIC Methods 


For the Lobatto IIIC methods with $s \ge 3$ we have $\alpha_0(A^{-1}) = 0$ (see Theorem 14.5).
Since these methods are algebraically stable it is natural to ask whether the nonlinear
system (14.1a) also has a solution for differential equations satisfying (14.2)
with $\nu = 0$. A positive answer to this question has been given by Hundsdorfer
& Spijker (1987) for the case $s = 3$, and by Liu & Kraaijevanger (1988) for the
general case $s \ge 3$ (see Exercise 6 below; see also Kraaijevanger & Schneid 1991).


Exercises 

1. Prove that $\alpha_0(A) \ge 0$ for algebraically stable Runge-Kutta methods. Also,
$\alpha_0(A^{-1}) \ge 0$ if in addition the matrix $A$ is invertible.

2. Let $A$ be a real matrix. Show that $\alpha_0(A) \le \operatorname{Re} \lambda$, where $\lambda$ is an eigenvalue of
$A$.

3. (Hundsdorfer 1985, Cooper 1986). Prove that Theorem 14.2 remains valid for
singular $A$, if $h\nu < \alpha$ with $\alpha$ satisfying

$$\langle u, Au \rangle_D \ge \alpha \langle Au, Au \rangle_D \quad \text{for all } u \in \mathbb{R}^s.$$

Hint. Use the transformation $g = \mathbb{1} \otimes y_0 + h (A \otimes I) k$ and apply the ideas of the
proof of Theorem 14.2 to the homotopy

$$k_i = f\Big( x_0 + c_i h,\; y_0 + h \sum_{j=1}^{s} a_{ij} k_j \Big) + (\tau - 1) f(x_0 + c_i h, y_0).$$

4. (Barker, Berman & Plemmons 1978, Montijano 1983). Prove that for any two-stage
method the condition

$$a_{11} > 0, \quad a_{22} > 0, \quad \det(A) > 0 \tag{14.24}$$

is equivalent to $\alpha_0(A^{-1}) > 0$.

Remark. For a generalization of this result to three-stage methods see Kraaijevanger
(1991).

5. For the two-stage Radau IIA method we have $\alpha_0(A^{-1}) = 3/2$. Construct a
differential equation $y' = \lambda(x) y$ with $\operatorname{Re} \lambda(x) = 3/2 + \varepsilon$ ($\varepsilon > 0$ arbitrarily
small) such that the Runge-Kutta equations do not admit a unique solution for
all $h > 0$.

6. Prove that for the Lobatto IIIC methods (with $s \ge 3$) the matrix

$$I - (A \otimes I) J \quad \text{with} \quad J = \operatorname{blockdiag}(J_1, \ldots, J_s)$$

is non-singular, if $\mu_2(J_k) \le 0$. This implies that the Runge-Kutta equations
(14.1a) have a unique solution for all problems $y' = A(x) y + f(x)$ with
$\mu_2(A(x)) \le 0$.

Hint (Liu & Kraaijevanger 1988, Liu, Dekker & Spijker 1987). Let $v =
(v_1, \ldots, v_s)^T$ be a solution of $(I - (A \otimes I) J) v = 0$. With the help of (14.21)
show first that $v_1 = v_s = 0$. Then consider the $(s-2)$-dimensional submatrix
$\tilde A = (a_{ij})_{i,j=2}^{s-1}$ and prove $\alpha_0(\tilde A^{-1}) > 0$ by considering the diagonal matrix
$D = \operatorname{diag}\big( b_i (c_i^{-1} - 1)^2 \big)_{i=2}^{s-1}$.

7. Consider an algebraically stable Runge-Kutta method with invertible $A$ and
apply it to the differential equation $y' = (J(x) - \varepsilon I) y + f(x)$ where $\mu(J(x)) \le
0$ and $\varepsilon > 0$. Prove that the numerical solution $y_1(\varepsilon)$ converges to a limit for
$\varepsilon \to 0$, whereas the internal stages $g_i(\varepsilon)$ need not converge.

Hint. Expand the $g_i(\varepsilon)$ in a series $g_i(\varepsilon) = \varepsilon^{-1} g_i^{-1} + g_i^{0} + \varepsilon g_i^{1} + \ldots$ and
prove the implication

$$g = (A \otimes I) J g \quad \Longrightarrow \quad (b^T \otimes I) J g = 0$$

where $J = \operatorname{blockdiag}(J(x_0 + c_1 h), \ldots, J(x_0 + c_s h))$.



IV.15 B-Convergence 


In using A -stable one-step methods to solve large systems of stiff 
nonlinear differential equations, we have found that 

— (a) some A -stable methods give highly unstable solutions, and 

— (b) the accuracy of the solutions obtained when the equations 
are stiff often appears to be unrelated to the order of the method 
used. 

This has caused us to re-examine the form of stability required 
when stiff systems of equations are solved, and to question the 
relevance of the concept of (nonstiff) order of accuracy for stiff 
problems. (A. Prothero & A. Robinson 1974) 

Prothero & Robinson (1974) were the first to discover the order reduction of implicit
Runge-Kutta methods when applied to stiff differential equations. Frank,
Schneid & Ueberhuber (1981) then introduced the “concept of B-convergence”,
which furnishes global error estimates independent of the stiffness.


The Order Reduction Phenomenon 


For the study of the accuracy of Runge-Kutta methods applied to stiff differential
equations, Prothero & Robinson (1974) proposed considering the problem

$$y' = \lambda (y - \varphi(x)) + \varphi'(x), \qquad y(x_0) = \varphi(x_0), \qquad \operatorname{Re} \lambda \le 0. \tag{15.1}$$

This allows explicit formulas for the local and global errors and provides much new
insight.

Applying a Runge-Kutta method to (15.1) yields

$$\begin{aligned} g_i &= y_0 + h \sum_{j=1}^{s} a_{ij} \big\{ \lambda (g_j - \varphi(x_0 + c_j h)) + \varphi'(x_0 + c_j h) \big\} \\ y_1 &= y_0 + h \sum_{j=1}^{s} b_j \big\{ \lambda (g_j - \varphi(x_0 + c_j h)) + \varphi'(x_0 + c_j h) \big\}. \end{aligned} \tag{15.2}$$

If we replace here the $g_i$, $y_0$ and $y_1$ by the exact solution values $\varphi(x_0 + c_i h)$,
$\varphi(x_0)$ and $\varphi(x_0 + h)$, respectively, we obtain a defect which is given by

$$\begin{aligned} \varphi(x_0 + c_i h) &= \varphi(x_0) + h \sum_{j=1}^{s} a_{ij} \varphi'(x_0 + c_j h) + \Delta_{i,h}(x_0) \\ \varphi(x_0 + h) &= \varphi(x_0) + h \sum_{j=1}^{s} b_j \varphi'(x_0 + c_j h) + \Delta_{0,h}(x_0). \end{aligned} \tag{15.3}$$

Taylor series expansion of the functions in (15.3) shows that

$$\Delta_{0,h}(x_0) = O(h^{p+1}), \qquad \Delta_{i,h}(x_0) = O(h^{q+1}), \tag{15.4}$$

where $p$ is the order of the quadrature formula $(b_i, c_i)$ and $q$ is the largest number
such that the condition $C(q)$ (see Sect. IV.5), i.e.,

$$\sum_{j=1}^{s} a_{ij} c_j^{k-1} = \frac{c_i^k}{k} \quad \text{for } k = 1, \ldots, q \text{ and all } i, \tag{15.5}$$

holds. The minimum of $q$ and $p$ is often called the stage order of the Runge-Kutta
method. Subtracting (15.3) from (15.2) and eliminating the internal stages we get

$$y_1 - \varphi(x_0 + h) = R(z)(y_0 - \varphi(x_0)) - z b^T (I - zA)^{-1} \Delta_h(x_0) - \Delta_{0,h}(x_0) \tag{15.6}$$

where we have used the notation $z = \lambda h$, $R(z) = 1 + z b^T (I - zA)^{-1} \mathbb{1}$ for the
stability function and $\Delta_h(x) = (\Delta_{1,h}(x), \ldots, \Delta_{s,h}(x))^T$. We also denote the local
error, which we get from (15.6) on putting $y_0 = \varphi(x_0)$, by

$$\delta_h(x) = -z b^T (I - zA)^{-1} \Delta_h(x) - \Delta_{0,h}(x). \tag{15.7}$$

If we repeat the above calculation with $x_n$ instead of $x_0$ we obtain the recursion

$$y_{n+1} - \varphi(x_{n+1}) = R(z)(y_n - \varphi(x_n)) + \delta_h(x_n) \tag{15.8}$$

which leads to the following formula for the global error

$$y_{n+1} - \varphi(x_{n+1}) = R(z)^{n+1} (y_0 - \varphi(x_0)) + \sum_{j=0}^{n} R(z)^{n-j} \delta_h(x_j). \tag{15.9}$$

The classical (non-stiff) theory treats the case where $z = O(h)$ and in this situation
the global error behaves like $O(h^p)$. When solving stiff differential equations
one is interested in step sizes $h$ which are much larger than $|\lambda|^{-1}$. We therefore
study the global error (15.9) under the assumption that simultaneously $h \to 0$ and
$z = \lambda h \to \infty$. In Table 15.1 we collect the results for the Runge-Kutta methods
of Sect. IV.5. There in the last column (variable $h$) the symbols $h$ and $z$ have to
be interpreted as $h = \max h_i$ and $z = \lambda \min h_i$. We remark that Formulas (15.7) and
(15.8) (but not (15.9)) remain valid for variable $h$, if $z$ is replaced by $z_n = h_n \lambda$.

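Table 15.1. Error for (15.1) when $h \to 0$ and $z = h\lambda \to \infty$

| Method | local error | global error, constant $h$ | global error, variable $h$ |
|---|---|---|---|
| Gauss, $s$ odd | $h^{s+1}$ | $h^{s+1}$ | $h^{s}$ |
| Gauss, $s$ even | $h^{s+1}$ | $h^{s}$ | $h^{s}$ |
| Radau IA | $h^{s}$ | $h^{s}$ | $h^{s}$ |
| Radau IIA | $z^{-1}h^{s+1}$ | $z^{-1}h^{s+1}$ | $z^{-1}h^{s+1}$ |
| Lobatto IIIA, $s$ odd | $z^{-1}h^{s+1}$ | $z^{-1}h^{s}$ | $z^{-1}h^{s}$ |
| Lobatto IIIA, $s$ even | $z^{-1}h^{s+1}$ | $z^{-1}h^{s+1}$ | $z^{-1}h^{s}$ |
| Lobatto IIIB, $s$ odd | $zh^{s-1}$ | $zh^{s-2}$ | $zh^{s-2}$ |
| Lobatto IIIB, $s$ even | $zh^{s-1}$ | $zh^{s-1}$ | $zh^{s-2}$ |
| Lobatto IIIC | $z^{-1}h^{s}$ | $z^{-1}h^{s}$ | $z^{-1}h^{s}$ |

Verification of Table 15.1.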

Gauss. Since the Runge-Kutta matrix $A$ is invertible, we have $-z b^T (I - zA)^{-1} = b^T A^{-1} + O(z^{-1})$ and (15.4) inserted into (15.7) gives $\delta_h(x) = O(h^{s+1})$ (observe that $q = s$). It then follows from (15.8) (for constant and variable $h$) that the global error behaves like $O(h^s)$ because $|R(z)| \le 1$. For odd $s$ we have $R(\infty) = -1$ and the global error estimate can be improved in the case of constant step sizes. This follows from partial summation

$$ \sum_{j=0}^{n} \varrho^{\,n-j}\, \delta(x_j) = \frac{1 - \varrho^{\,n+1}}{1 - \varrho}\, \delta(x_0) + \sum_{j=1}^{n} \frac{1 - \varrho^{\,n-j+1}}{1 - \varrho}\, \bigl(\delta(x_j) - \delta(x_{j-1})\bigr) \tag{15.10} $$

of the sum in (15.9) and from the fact that $\delta_h(x_j) - \delta_h(x_{j-1}) = O(h^{s+2})$.

Radau IA. The local error estimate follows in the same way as for the Gauss methods. Since $R(z) = O(z^{-1})$ the error propagation in (15.8) is negligible and the local and global errors have the same asymptotic behaviour.

Radau IIA and Lobatto IIIC. These methods have $a_{si} = b_i$ for all $i$. Therefore the last internal stage is identical to the numerical solution and the local error can be written as

$$ \delta_h(x) = -e_s^T (I - zA)^{-1} \Delta_h(x). $$

Since $A$ is invertible this formula shows the presence of $z^{-1}$ in the local error. Again we have $R(\infty) = 0$, so that the global error is essentially equal to the local error.

Lobatto IIIA. The first stage is explicit, $g_1 = y_0$, and is done without introducing an error. Therefore $\Delta_{1,h}(x) = 0$ and (because of $a_{si} = b_i$) the local error has the form

$$ \delta_h(x) = -\tilde e_{s-1}^T (\tilde I - z\tilde A)^{-1} \tilde\Delta_h(x) $$

where $\tilde A = (a_{ij})_{i,j=2}^{s}$ and $\tilde\Delta_h = (\Delta_{2,h}, \ldots, \Delta_{s,h})^T$. The statements of Table 15.1 now follow as for the Gauss methods.

Lobatto IIIB. The matrix $A$ is singular (its last column vanishes), therefore the two "$z$" in (15.7) do not simply cancel for $z \to \infty$. A more detailed analysis (see Exercise 5 below) shows that the local error is not bounded if $z \to \infty$. Although A-stable, these methods are not suited for the solution of stiff problems. □
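The $z$-dependence claimed in Table 15.1 can be observed directly from (15.7). The sketch below is an illustration under assumptions, not part of the original text: it takes $\varphi = \sin$, $x = 0$, $h = 0.1$, the 2-stage Gauss and Radau IIA tableaux, and the hypothetical helper name `local_error`. For Gauss the local error should be essentially independent of $z$, for Radau IIA it should shrink like $z^{-1}$.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
GAUSS2 = (np.array([[1/4, 1/4 - SQ3/6], [1/4 + SQ3/6, 1/4]]),
          np.array([1/2, 1/2]), np.array([1/2 - SQ3/6, 1/2 + SQ3/6]))
RADAU2A = (np.array([[5/12, -1/12], [3/4, 1/4]]),
           np.array([3/4, 1/4]), np.array([1/3, 1.0]))

def local_error(tableau, z, h, x=0.0, phi=np.sin, dphi=np.cos):
    """delta_h(x) of (15.7); the defects of (15.3) are evaluated directly."""
    A, b, c = tableau
    s = len(b)
    slopes = dphi(x + c * h)
    Delta = phi(x + c * h) - phi(x) - h * (A @ slopes)   # Delta_{i,h}(x), i = 1..s
    Delta0 = phi(x + h) - phi(x) - h * (b @ slopes)      # Delta_{0,h}(x)
    return -z * (b @ np.linalg.solve(np.eye(s) - z * A, Delta)) - Delta0

h = 0.1
# Ratio of local errors when |z| grows by a factor 100.
ratio_gauss = local_error(GAUSS2, -1e4, h) / local_error(GAUSS2, -1e6, h)
ratio_radau = local_error(RADAU2A, -1e4, h) / local_error(RADAU2A, -1e6, h)
print("Gauss:", ratio_gauss, " Radau IIA:", ratio_radau)
```

A ratio near 1 for Gauss and near 100 for Radau IIA is what the entries $h^{s+1}$ and $z^{-1}h^{s+1}$ of the table predict.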


We observe from Table 15.1 that the order of convergence for problem (15.1) with large $\lambda$ is considerably smaller than the classical order. Further we see that methods satisfying $a_{si} = b_i$ (Radau IIA, Lobatto IIIA and Lobatto IIIC) give an asymptotically exact result for $z \to \infty$. Prothero & Robinson (1974) call such methods stiffly accurate. The importance of this condition will appear again when we treat singularly perturbed and differential-algebraic problems (Chapter VI).

The Local Error


Das besondere Schmerzenskind sind die Fehlerabschätzungen. [Error estimates are the particular problem child.]

(L. Collatz 1950) 

Our next aim is to extend the above results to general nonlinear differential equations $y' = f(x, y)$ satisfying a one-sided Lipschitz condition

$$ \langle f(x,y) - f(x,z),\, y - z \rangle \le \nu\, \|y - z\|^2. \tag{15.11} $$

The following analysis, begun by Frank, Schneid & Ueberhuber (1981), was elaborated by Frank, Schneid & Ueberhuber (1985) and Dekker & Verwer (1984). We again denote the local error by

$$ \delta_h(x) = y_1 - y(x + h), $$

where $y_1$ is the numerical solution with initial value $y_0 = y(x)$ on the exact solution.

Proposition 15.1. Consider a differential equation which satisfies (15.11). Assume that the Runge-Kutta matrix $A$ is invertible, $\alpha_0(A^{-1}) \ge 0$ (see Definition 14.1), and that the stage order is $q$.

a) If $\alpha_0(A^{-1}) > 0$ then

$$ \|\delta_h(x)\| \le C\, h^{q+1} \max_{\xi \in [x, x+h]} \|y^{(q+1)}(\xi)\| \qquad \text{for } h\nu \le \alpha < \alpha_0(A^{-1}). $$

b) If $\alpha_D(A^{-1}) = 0$ for some positive diagonal matrix $D$ and $\nu < 0$ then

$$ \|\delta_h(x)\| \le C \Bigl( h + \frac{1}{|\nu|} \Bigr) h^{q} \max_{\xi \in [x, x+h]} \|y^{(q+1)}(\xi)\|. $$

In both cases the constant $C$ depends only on the coefficients of the Runge-Kutta matrix and on $\alpha$ (for case (a)).

Remarks. a) The crucial fact in these estimates is that the right-hand side depends only on derivatives of the exact solution and not on the stiffness of the problem. These estimates are useful when a “smooth” solution of a stiff problem has to be approximated.

b) The hypothesis $\alpha_D(A^{-1}) = 0$ (see case (b)) is stronger than $\alpha_0(A^{-1}) = 0$ (see Exercise 4 below). For the Lobatto IIIC methods, for which $\alpha_0(A^{-1}) = 0$ ($s > 2$), we have $\alpha_D(A^{-1}) = 0$ with $D = B$ (see (14.21)). For stiffly accurate methods the estimate of part (b) can be improved by using (14.12) instead of (14.13).

c) In the estimates of the above proposition the maximum is taken over $\xi \in [x, x+h]$. In the case where $0 \le c_i \le 1$ is not satisfied, this interval must of course be correspondingly enlarged.

Proof. We put $\hat g_i = y(x_0 + c_i h)$, so that the relation (14.11a) is satisfied with

$$ \delta_i = y(x_0 + c_i h) - y(x_0) - h \sum_{j=1}^{s} a_{ij}\, y'(x_0 + c_j h). $$

Taylor expansion shows that

$$ \|\delta_i\| \le C_i\, h^{q+1} \max_{x \in [x_0, x_1]} \|y^{(q+1)}(x)\| $$

where $C_i = \bigl( |c_i|^{q+1} + (q+1) \sum_{j=1}^{s} |a_{ij}| \cdot |c_j|^{q} \bigr) / (q+1)!$ is a method-dependent constant. Similarly, the value $\hat y_1$ of (14.11b) satisfies

$$ y(x_0 + h) - \hat y_1 = y(x_0 + h) - y(x_0) - h \sum_{j=1}^{s} b_j\, y'(x_0 + c_j h) = O(h^{q+1}), \tag{15.12} $$

because the order of the quadrature formula $(b_i, c_i)$ is $\ge q$. Since

$$ \|\delta_h(x_0)\| \le \|y_1 - \hat y_1\| + \|\hat y_1 - y(x_0 + h)\| $$

the desired estimates follow from (14.13) of Theorem 14.3. □


Error Propagation 

At the end of Sect. IV.12 we derived for some particular Runge-Kutta methods sharp estimates of the form

$$ \|\hat y_1 - y_1\| \le \varphi_B(h\nu)\, \|\hat y_0 - y_0\|, \tag{15.13} $$

where $y_1$, $\hat y_1$ are the numerical solutions corresponding to $y_0$, $\hat y_0$, respectively, and where the differential equation satisfies (15.11). We give here a simple proof of a crude estimate of $\varphi_B(h\nu)$ which, however, will be sufficient to derive interesting convergence results.

Proposition 15.2 (Dekker & Verwer 1984). Suppose that the differential equation satisfies (15.11) and apply an algebraically stable Runge-Kutta method with invertible $A$ and $\alpha_0(A^{-1}) > 0$. Then for any $\alpha$ with $0 < \alpha < \alpha_0(A^{-1})$ there exists a constant $C > 0$ such that

$$ \|\hat y_1 - y_1\| \le (1 + C h \nu)\, \|\hat y_0 - y_0\| \qquad \text{for } 0 \le h\nu \le \alpha. $$

Proof. From (12.7) we have (using the notation of the proof of Theorem 12.4)

$$ \|\Delta y_1\|^2 = \|\Delta y_0\|^2 + 2 \sum_{i=1}^{s} b_i \langle \Delta f_i, \Delta g_i \rangle - \sum_{i=1}^{s} \sum_{j=1}^{s} m_{ij} \langle \Delta f_i, \Delta f_j \rangle. \tag{15.14} $$

By algebraic stability the last term in (15.14) is non-positive and can be neglected. Using (15.11) and the estimate (14.12) with $\delta_i = \hat y_0 - y_0$ we obtain

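$$ 2 \sum_{i=1}^{s} b_i \langle \Delta f_i, \Delta g_i \rangle \le 2 h \nu \sum_{i=1}^{s} b_i\, \|\Delta g_i\|^2 \le \frac{2 h \nu\, C_1}{(\alpha_0(A^{-1}) - h\nu)^2}\, \|\Delta y_0\|^2 $$

with a constant $C_1$ depending only on the method coefficients. Inserting this into (15.14) yields

$$ \|\Delta y_1\|^2 \le \Bigl( 1 + \frac{2 h \nu\, C_1}{(\alpha_0(A^{-1}) - h\nu)^2} \Bigr) \|\Delta y_0\|^2 $$

which proves the desired estimate. □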


B-Convergence for Variable Step Sizes

We are now in a position to present the main result of this section.

Theorem 15.3. Consider an algebraically stable Runge-Kutta method with invertible $A$ and stage order $q \le p$ and suppose that (15.11) holds.

a) If $0 < \alpha < \alpha_0(A^{-1})$ and $\nu > 0$ then the global error satisfies

$$ \|y_n - y(x_n)\| \le h^{q}\, \frac{e^{C_1 \nu (x_n - x_0)} - 1}{C_1 \nu}\, C_2 \max_{x \in [x_0, x_n]} \|y^{(q+1)}(x)\| \qquad \text{for } h\nu \le \alpha. $$

b) If $\alpha_0(A^{-1}) > 0$ and $\nu \le 0$ then

$$ \|y_n - y(x_n)\| \le h^{q}\, (x_n - x_0)\, C_2 \max_{x \in [x_0, x_n]} \|y^{(q+1)}(x)\| \qquad \text{for all } h > 0. $$

c) If $\alpha_D(A^{-1}) = 0$ for some positive diagonal matrix $D$ and $\nu < 0$ then

$$ \|y_n - y(x_n)\| \le h^{q-1}\, C \Bigl( h + \frac{1}{|\nu|} \Bigr) (x_n - x_0) \max_{x \in [x_0, x_n]} \|y^{(q+1)}(x)\|. $$

The constants $C_1$, $C_2$, $C$ depend only on the coefficients of the Runge-Kutta matrix. In the case of variable step sizes, $h$ has to be interpreted as $h = \max h_i$.

Proof. This convergence result is obtained in exactly the same way as that for non-stiff problems (Theorem II.3.6). For the transported errors $E_j$ (see Fig. II.3.2) we have the estimate (for $\nu \ge 0$)

$$ \|E_j\| \le e^{C \nu (x_n - x_j)}\, \|\delta_{h_{j-1}}(x_{j-1})\| \tag{15.15} $$

by Proposition 15.2, because $1 + C h \nu \le e^{C \nu h}$. We next insert the local error estimate of Proposition 15.1 into (15.15) and sum up the transported errors $E_j$. This yields the desired estimate for $\nu \ge 0$ because

$$ \sum_{j=1}^{n} h_{j-1}\, e^{C \nu (x_n - x_j)} \le \int_{x_0}^{x_n} e^{C \nu (x_n - x)}\, dx = \begin{cases} \bigl( e^{C \nu (x_n - x_0)} - 1 \bigr) / (C \nu) & \text{for } \nu > 0, \\ x_n - x_0 & \text{for } \nu = 0. \end{cases} $$

If $\nu < 0$ we have $\|E_j\| \le \|\delta_{h_{j-1}}(x_{j-1})\|$ by algebraic stability and the same arguments apply. □


Motivated by this result we define the order of B-convergence as follows:

Definition 15.4 (Frank, Schneid & Ueberhuber 1981). A Runge-Kutta method is called B-convergent of order $r$ for problems $y' = f(x, y)$ satisfying (15.11), if the global error admits an estimate

$$ \|y_n - y(x_n)\| \le h^{r}\, \gamma(x_n - x_0, \nu) \max_{j=1,\ldots,l}\ \max_{x \in [x_0, x_n]} \|y^{(j)}(x)\| \qquad \text{for } h\nu \le \alpha, \tag{15.16} $$

where $h = \max h_i$. Here $\gamma$ is a method-dependent function and $\alpha$ also depends only on the coefficients of the method.

As an application of the above theorem we have

Theorem 15.5. The Gauss and Radau IIA methods are B-convergent of order $s$ (number of stages). The Radau IA methods are B-convergent of order $s - 1$. The 2-stage Lobatto IIIC method is B-convergent of order 1. □
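Algebraic stability is the key hypothesis of Theorem 15.3. A minimal numerical check can be sketched as follows, assuming the standard definition from Sect. IV.12 (all $b_i \ge 0$ and $M = (b_i a_{ij} + b_j a_{ji} - b_i b_j)$ non-negative definite); the function name, the tolerance and the three sample tableaux are illustrative assumptions.

```python
import numpy as np

def is_algebraically_stable(A, b, tol=1e-12):
    """Check b_i >= 0 and M = BA + A^T B - b b^T >= 0, with B = diag(b)."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    B = np.diag(b)
    M = B @ A + A.T @ B - np.outer(b, b)          # symmetric by construction
    return bool(np.all(b >= -tol) and np.min(np.linalg.eigvalsh(M)) >= -tol)

SQ3 = np.sqrt(3.0)
gauss2 = is_algebraically_stable([[1/4, 1/4 - SQ3/6], [1/4 + SQ3/6, 1/4]], [1/2, 1/2])
radau2a = is_algebraically_stable([[5/12, -1/12], [3/4, 1/4]], [3/4, 1/4])
# Trapezoidal rule written as the 2-stage Lobatto IIIA method.
trapezoidal = is_algebraically_stable([[0.0, 0.0], [1/2, 1/2]], [1/2, 1/2])
print(gauss2, radau2a, trapezoidal)
```

For the trapezoidal rule the entry $m_{11} = -1/4$ is negative, so the check fails, whereas the 2-stage Gauss method gives $M = 0$.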


For the Lobatto IIIC methods with $s \ge 3$ stages ($\alpha_0(A^{-1}) = 0$ and $q = s - 1$) Theorem 15.3 shows B-convergence of order $s - 2$ if $\nu < 0$. This is not an optimal result. Spijker (1986) proved B-convergence of order $s - 3/2$ for $\nu < 0$ and constant step sizes. Schneid (1987) improved this result to $s - 1$. Recently, Dekker, Kraaijevanger & Schneid (1991) showed that these methods are B-convergent of order $s - 1$ for general step size sequences, if one allows the function $\gamma$ in Definition 15.4 to depend also on the ratio $\max h_i / \min h_i$.

The Lobatto IIIA and IIIB methods cannot be B-convergent since they are not algebraically stable. This will be the content of the next subsection.

B-Convergence Implies Algebraic Stability

In order to find necessary conditions for B-convergence we consider the problem

$$ y' = \lambda(x)\,(y - \varphi(x)) + \varphi'(x), \qquad \operatorname{Re} \lambda(x) \le \nu \tag{15.17} $$

with exact solution $\varphi(x) = x^{q+1}$. We apply a Runge-Kutta method with stage order $q$ and obtain for the global error $\varepsilon_n = y_n - \varphi(x_n)$ the simple recursion

$$ \varepsilon_{n+1} = K(Z_n)\, \varepsilon_n - L(Z_n)\, h_n^{q+1} \tag{15.18} $$

(cf. Eq. (15.8) of the beginning of this section, where the case $\lambda(x) = \lambda$ was treated). Here $Z_n = \operatorname{diag}\bigl( h_n \lambda(x_n + c_1 h_n), \ldots, h_n \lambda(x_n + c_s h_n) \bigr)$ and

$$ K(Z) = 1 + b^T Z (I - AZ)^{-1} \mathbb{1}, \qquad L(Z) = d_0 + b^T Z (I - AZ)^{-1} d. \tag{15.19} $$

The function $K(Z)$ was already encountered in Definition 12.10, when treating AN-stability. The vector $d = (d_1, \ldots, d_s)^T$ and $d_0$ in $L(Z)$ characterize the local error and are given by

$$ d_0 = 1 - (q+1) \sum_{j=1}^{s} b_j c_j^{q}, \qquad d_i = c_i^{q+1} - (q+1) \sum_{j=1}^{s} a_{ij} c_j^{q}. \tag{15.20} $$

Observe that by definition of the stage order we have either $d_0 \ne 0$ or $d \ne 0$ (or both). We are now in the position to prove

Theorem 15.6 (Dekker, Kraaijevanger & Schneid 1991). Consider a DJ-irreducible Runge-Kutta method which satisfies $0 \le c_1 < c_2 < \ldots < c_s \le 1$. If for some $r$, $l$ and $\nu < 0$, the global error satisfies the B-convergence estimate (15.16), then the method is algebraically stable.

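Proof. Suppose that the method is not algebraically stable. Then, by Theorem 12.13 and Lemma 15.7 below, there exists $Z = \operatorname{diag}(z_1, \ldots, z_s)$ with $\operatorname{Re} z_j < 0$ such that $(I - AZ)^{-1}$ exists and

$$ |K(Z)| > 1, \qquad L(Z) \ne 0. \tag{15.21} $$

We consider the interval $[0, (1 + \theta)/2]$ and for even $N$ the step size sequence $(h_n)_{n=0}^{N-1}$ given by

$$ h_n = 1/N \ \ (\text{for } n \text{ even}), \qquad h_n = \theta/N \ \ (\text{for } n \text{ odd}). $$

If $N$ is sufficiently large it is possible to define a function $\lambda(x)$ which satisfies $\operatorname{Re} \lambda(x) \le \nu$ and

$$ \lambda(x_n + c_i h_n) = \begin{cases} N z_i & \text{for } n \text{ even}, \\ N z_{s+1-i} & \text{for } n \text{ odd}. \end{cases} $$

Because of (15.18) the global error $\varepsilon_n = y_n - \varphi(x_n)$ for the problem (15.17) satisfies (with $h = 1/N$)

$$ \begin{aligned} \varepsilon_{n+1} &= K(Z)\, \varepsilon_n - h^{q+1} L(Z) && \text{for } n \text{ even}, \\ \varepsilon_{n+1} &= K(\hat Z)\, \varepsilon_n - (\theta h)^{q+1} L(\hat Z) && \text{for } n \text{ odd}, \end{aligned} $$

where $\hat Z = \operatorname{diag}(\theta z_s, \ldots, \theta z_1)$. Consequently we have

$$ \varepsilon_{2m+2} = K(\hat Z) K(Z)\, \varepsilon_{2m} - h^{q+1} \bigl( K(\hat Z) L(Z) + \theta^{q+1} L(\hat Z) \bigr) $$

and the error at $x_N = (1 + \theta)/2$ is given by

$$ \varepsilon_N = -h^{q+1} \bigl( K(\hat Z) L(Z) + \theta^{q+1} L(\hat Z) \bigr)\, \frac{(K(\hat Z) K(Z))^{N/2} - 1}{K(\hat Z) K(Z) - 1}. \tag{15.22} $$

If $\theta$ is sufficiently small, $K(\hat Z) \to 1$ and $L(\hat Z) \to d_0$, so that by (15.21)

$$ |K(\hat Z) K(Z)| > 1 \qquad \text{and} \qquad K(\hat Z) L(Z) + \theta^{q+1} L(\hat Z) \ne 0. $$

Therefore $|\varepsilon_N| \to \infty$ as $N \to \infty$ ($N$ even), which contradicts the estimate (15.16) of B-convergence. □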


To complete the above proof we give the following lemma: 

Lemma 15.7 (Dekker, Kraaijevanger & Schneid 1990). Consider a DJ-irreducible Runge-Kutta method and suppose

$$ b^T Z (I - AZ)^{-1} d = 0 \tag{15.23} $$

for all $Z = \operatorname{diag}(z_1, \ldots, z_s)$ with $I - AZ$ invertible; then $d = 0$.

Proof. We define

$$ T = \bigl\{ j \;\big|\; b_{i_1} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{k-1} i_k} = 0 \ \text{for all } k \text{ and } (i_1, \ldots, i_k) \text{ with } i_k = j \bigr\}. $$

Putting $k = 1$ we obtain $b_j = 0$ for $j \in T$. Further, if $i \notin T$ and $j \in T$ there exists $(i_1, \ldots, i_k)$ with $i_k = i$ such that

$$ b_{i_1} a_{i_1 i_2} \cdots a_{i_{k-1} i_k} \ne 0, \qquad b_{i_1} a_{i_1 i_2} \cdots a_{i_{k-1} i_k}\, a_{ij} = 0 $$

implying $a_{ij} = 0$. Therefore the method is DJ-reducible if $T \ne \emptyset$. For the proof of the statement it thus suffices to show that $d \ne 0$ implies $T \ne \emptyset$.

Replacing $(I - AZ)^{-1}$ by its geometric series, assumption (15.23) becomes equivalent to

$$ b^T Z (AZ)^{k-1} d = 0 \qquad \text{for all } k \text{ and } Z = \operatorname{diag}(z_1, \ldots, z_s). \tag{15.24} $$

Comparing the coefficient of $z_{i_1} \cdots z_{i_k}$ gives

$$ \sum b_{j_1} a_{j_1 j_2} \cdots a_{j_{k-1} j_k}\, d_{j_k} = 0, \tag{15.25} $$

where the summation is over all permutations $(j_1, \ldots, j_k)$ of $(i_1, \ldots, i_k)$. Suppose now that $d_j \ne 0$ for some index $j$. We shall prove by induction on $k$ that

$$ b_{i_1} a_{i_1 i_2} \cdots a_{i_{k-1} i_k} = 0 \qquad \text{for all } (i_1, \ldots, i_k) \text{ with } i_k = j, \tag{15.26} $$

so that $j \in T$ and consequently $T \ne \emptyset$.

For $k = 1$ this follows immediately from (15.25). In order to prove (15.26) for $k + 1$ we suppose, by contradiction, that $(i_1, \ldots, i_{k+1})$ with $i_{k+1} = j$ exists such that $b_{i_1} a_{i_1 i_2} \cdots a_{i_k i_{k+1}} \ne 0$. The relation (15.25) then implies the existence of a permutation $(j_1, \ldots, j_{k+1})$ of $(i_1, \ldots, i_{k+1})$ such that $b_{j_1} a_{j_1 j_2} \cdots a_{j_k j_{k+1}} \ne 0$, too. We now denote by $q$ the smallest index for which $i_q \ne j_q$. Then $i_q = j_r$ for some $r > q$ and

$$ b_{i_1} a_{i_1 i_2} \cdots a_{i_{q-1} i_q}\, a_{j_r j_{r+1}} \cdots a_{j_k j_{k+1}} \ne 0 \tag{15.27} $$

contradicts the induction hypothesis, because the expression in (15.27) contains at most $k$ factors. □


The Trapezoidal Rule 


The trapezoidal rule

$$ y_{k+1} = y_k + \frac{h_k}{2} \bigl( f(x_k, y_k) + f(x_{k+1}, y_{k+1}) \bigr) \tag{15.28} $$

is not algebraically stable. Therefore (Theorem 15.6) it cannot be B-convergent in the sense of Definition 15.4. Nevertheless it is possible to derive estimates (15.16), if we restrict ourselves to special step size sequences (constant, monotonic, ...). This was first proved by Stetter (unpublished) and investigated in detail by Kraaijevanger (1985). The result is

Theorem 15.8 (Kraaijevanger 1985). If the differential equation satisfies (15.11), then the global error of the trapezoidal rule permits for $h_j \nu \le \alpha < 2$ the estimate

$$ \|y_n - y(x_n)\| \le C \max_{x \in [x_0, x_n]} \|y^{(3)}(x)\| \sum_{k=0}^{n-1} \Bigl( \prod_{j=k+1}^{n-1} \max(1, h_j / h_{j-1}) \Bigr) h_k^3. $$

Proof. We denote by $\hat y_k = y(x_k)$ the exact solution at the grid points. From the Taylor expansion we then get

$$ \hat y_{k+1} = \hat y_k + \frac{h_k}{2} \bigl( f(x_k, \hat y_k) + f(x_{k+1}, \hat y_{k+1}) \bigr) + \delta_k \tag{15.29} $$

where

$$ \|\delta_k\| \le \frac{1}{12}\, h_k^3 \max_{x \in [x_k, x_{k+1}]} \|y^{(3)}(x)\|. \tag{15.30} $$

The main idea is now to introduce the intermediate values

$$ \begin{aligned} y_{k+1/2} &= y_k + \frac{h_k}{2} f(x_k, y_k) = y_{k+1} - \frac{h_k}{2} f(x_{k+1}, y_{k+1}), \\ \hat y_{k+1/2} &= \hat y_k + \frac{h_k}{2} f(x_k, \hat y_k) + \delta_k = \hat y_{k+1} - \frac{h_k}{2} f(x_{k+1}, \hat y_{k+1}). \end{aligned} \tag{15.31} $$

The transition $y_{k-1/2} \to y_{k+1/2}$

$$ y_{k+1/2} = y_{k-1/2} + \frac{1}{2} (h_{k-1} + h_k)\, f(x_k, y_k) $$

can then be interpreted as one step of the $\theta$-method

$$ y_{m+1} = y_m + h f\bigl( x_m + \theta h,\ y_m + \theta (y_{m+1} - y_m) \bigr) $$

with $\theta = h_{k-1} / (h_{k-1} + h_k)$ and step size $h = (h_{k-1} + h_k)/2$. A similar calculation shows that the same $\theta$-method maps $\hat y_{k-1/2}$ to $\hat y_{k+1/2} - \delta_k$. Therefore we have

$$ \|\hat y_{k+1/2} - \delta_k - y_{k+1/2}\| \le \varphi_B(h\nu)\, \|\hat y_{k-1/2} - y_{k-1/2}\|, $$

where the growth function $\varphi_B(h\nu)$ is given by (see Eqs. (12.42) and (11.13))

$$ \varphi_B(h\nu) = \max\Bigl\{ \frac{1-\theta}{\theta},\ \frac{1 + (1-\theta) h \nu}{1 - \theta h \nu} \Bigr\} = \max\Bigl\{ \frac{h_k}{h_{k-1}},\ \frac{1 + \frac{1}{2} h_k \nu}{1 - \frac{1}{2} h_{k-1} \nu} \Bigr\} =: \varphi_k. \tag{15.32} $$

By the triangle inequality we also get

$$ \|\hat y_{k+1/2} - y_{k+1/2}\| \le \varphi_k\, \|\hat y_{k-1/2} - y_{k-1/2}\| + \|\delta_k\|. \tag{15.33} $$

Further it follows from (15.31) with $k = 0$ and from $\hat y_0 = y_0$ that

$$ \|\hat y_{1/2} - y_{1/2}\| = \|\delta_0\|, \tag{15.34} $$

whereas the backward Euler steps $y_{n-1/2} \to y_n$ and $\hat y_{n-1/2} \to \hat y_n$ (see (15.31)) imply

$$ \|\hat y_n - y_n\| \le \frac{1}{1 - \frac{1}{2} h_{n-1} \nu}\, \|\hat y_{n-1/2} - y_{n-1/2}\| \tag{15.35} $$

again by Example 12.24 with $\theta = 1$. A combination of (15.33), (15.34) and (15.35) yields

$$ \|\hat y_n - y_n\| \le \frac{1}{1 - \frac{1}{2} h_{n-1} \nu} \sum_{k=0}^{n-1} \Bigl( \prod_{j=k+1}^{n-1} \varphi_j \Bigr) \|\delta_k\|. \tag{15.36} $$

For $\nu \le 0$ we have $\varphi_k \le \max(1, h_k / h_{k-1})$ and the statement follows if we insert (15.30) into (15.36). For $\nu > 0$ we use the estimate ($h_{k-1} \nu \le 1$)

$$ \frac{1 + \frac{1}{2} h_k \nu}{1 - \frac{1}{2} h_{k-1} \nu} = \frac{1 + \frac{1}{2} h_{k-1} \nu}{1 - \frac{1}{2} h_{k-1} \nu} \cdot \frac{1 + \frac{1}{2} h_k \nu}{1 + \frac{1}{2} h_{k-1} \nu} \le e^{2 h_{k-1} \nu} \cdot \max\Bigl( 1, \frac{h_k}{h_{k-1}} \Bigr) $$

so that the statement holds with $C = e^{2 \nu (x_n - x_0)} / 12$. □

Corollary 15.9. If the step size sequence $(h_k)_{k=0}^{n-1}$ is constant or monotonic, then for $h = \max_k h_k$

$$ \|y_n - y(x_n)\| \le C \max_{x \in [x_0, x_n]} \|y^{(3)}(x)\| \cdot h^2. $$

□
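The stiffness-independent $h^2$ bound of Corollary 15.9 can be observed on the scalar Prothero-Robinson problem. The sketch below is an illustration under assumptions: it takes $\varphi = \sin$ on $[0, 1]$ with constant steps, and the function name is hypothetical. For $\nu = \lambda \le 0$ the proof above gives the constant $C = 1/12$ (times the interval length 1 and $\max |\varphi'''| \le 1$), which is the bound compared against.

```python
import numpy as np

def trapezoidal_prothero_robinson(lam, h, x_end=1.0, phi=np.sin, dphi=np.cos):
    """Constant-step trapezoidal rule (15.28) for y' = lam*(y - phi) + phi', y(0) = phi(0).

    Returns the absolute global error at the last grid point.
    """
    n = int(round(x_end / h))
    x, y = 0.0, phi(0.0)
    for _ in range(n):
        f_old = lam * (y - phi(x)) + dphi(x)
        x_new = x + h
        # The problem is linear in y, so the implicit equation is solved in closed form.
        y = (y + 0.5 * h * (f_old - lam * phi(x_new) + dphi(x_new))) / (1.0 - 0.5 * h * lam)
        x = x_new
    return abs(y - phi(x))

h = 0.01
errors = {lam: trapezoidal_prothero_robinson(lam, h) for lam in (-1.0, -1e3, -1e6)}
bound = h**2 / 12          # (x_n - x_0) * max|phi'''| * h^2 / 12 with both factors <= 1
for lam, err in errors.items():
    print(f"lambda = {lam:9.1e}   error = {err:.3e}   bound = {bound:.3e}")
```

The errors stay below the same bound for all three values of $\lambda$, which is the point of the estimate: it depends on derivatives of the solution only, not on the stiffness.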


Order Reduction for Rosenbrock Methods 


Obviously, Rosenbrock methods (Definition 7.1) cannot be B-convergent in the sense of Definition 15.4 (see also Exercise 7 of Sect. IV.12). Nevertheless it is interesting to study their behaviour on stiff problems such as the Prothero & Robinson model (15.1). Since this equation is non-autonomous we have to use the formulation (7.4a). A straightforward calculation shows that the global error $\varepsilon_n = y_n - \varphi(x_n)$ satisfies the recursion

$$ \varepsilon_{n+1} = R(z)\, \varepsilon_n + \delta_h(x_n) \tag{15.37} $$

where $R(z)$ is the stability function (7.14) and the local error is given by

$$ \delta_h(x) = \varphi(x) - \varphi(x + h) + b^T (I - zB)^{-1} \Delta \tag{15.38} $$

with $B = (\alpha_{ij} + \gamma_{ij})$, $b = (b_1, \ldots, b_s)^T$, $\Delta = (\Delta_1, \ldots, \Delta_s)^T$ and

$$ \Delta_i = z \bigl( \varphi(x) - \varphi(x + \alpha_i h) - \gamma_i h \varphi'(x) \bigr) + h \varphi'(x + \alpha_i h) + \gamma_i h^2 \varphi''(x). $$

Taylor expansion gives the following result.

Lemma 15.10. The local error $\delta_h(x)$ of a Rosenbrock method applied to (15.1) satisfies for $h \to 0$ and $z = \lambda h \to \infty$

$$ \delta_h(x) = \Bigl( \sum_{i,j} b_i\, \omega_{ij}\, \alpha_j^2 - 1 \Bigr) \frac{h^2}{2}\, \varphi''(x) + O(h^3) + O\Bigl( \frac{h^2}{z} \Bigr), $$

where $\omega_{ij}$ are the entries of $B^{-1}$. □


Remarks. a) Unless the Rosenbrock method satisfies the new order condition

$$ \sum_{i,j=1}^{s} b_i\, \omega_{ij}\, \alpha_j^2 = 1, \tag{15.39} $$

the local error and the global error (if $|R(\infty)| < 1$) are of size $O(h^2)$. Since none of the classical Rosenbrock methods of Sect. IV.7 satisfies (15.39), their order of convergence is only 2 for the problem (15.1) if $\lambda$ is very large.

b) A convenient way to satisfy (15.39) is to require

$$ \alpha_{si} + \gamma_{si} = b_i \ \ (i = 1, \ldots, s) \qquad \text{and} \qquad \alpha_s = 1. \tag{15.40} $$

This is the analogue of the condition $a_{si} = b_i$ for Runge-Kutta methods. It implies not only (15.39) but even

$$ \delta_h(x) = O\Bigl( \frac{h^2}{z} \Bigr), $$

so that such methods yield asymptotically exact results for $z \to \infty$.

c) A deeper understanding of Eq. (15.39) will be possible when studying the error of Rosenbrock methods for singular perturbation and differential-algebraic problems (Chapter VI). We shall construct there methods satisfying (15.40).

d) Scholz (1989) writes the local error $\delta_h(x)$ in the form

$$ \delta_h(x) = \sum_{j \ge 2} C_j(z)\, h^{j}\, \varphi^{(j)}(x) \tag{15.41} $$

and investigates the possibility of having $C_j(z) \equiv 0$ for $j = 2$ (and also $j > 2$). Hundsdorfer (1986) and Strehmel & Weiner (1987) extend the above analysis to semi-linear problems (11.21) which satisfy (11.22). Their results are rather technical but allow the construction of “B-convergent” methods of order $p > 1$.


Exercises 

1. Prove that the stage order of an SDIRK method is at most 1, that of a DIRK 
method at most 2. 

2. Consider a Runge-Kutta method with $0 \le c_1 < \ldots < c_s \le 1$ which has stage order $q$. Prove that the method cannot be B-convergent (for variable step sizes) of order $q + 1$.

Hint. Use Formula (15.22) and prove that

$$ \frac{K(\hat Z) L(Z) + \theta^{q+1} L(\hat Z)}{K(\hat Z) K(Z) - 1} \tag{15.42} $$

cannot be uniformly bounded for

$$ Z = \operatorname{diag}(z_1, \ldots, z_s), \qquad \hat Z = \operatorname{diag}(\hat z_1, \ldots, \hat z_s) $$

with $\operatorname{Re} z_j < 0$, $\operatorname{Re} \hat z_j < 0$ (in the case $c_1 = 0$ and $c_s = 1$ one has to prove this under the restriction $\hat z_1 = \theta z_s$, $\hat z_s = \theta z_1$). For this consider values $z_j$, $\hat z_j$ close to the origin.

3. (Burrage & Hundsdorfer 1987). Assume $c_i - c_j$ is not an integer for $1 \le i < j \le s$, and the order of B-convergence (for constant step sizes) of a Runge-Kutta method is $q + 1$ ($q$ denotes the stage order). Then $d_0 = 0$ and all components of $d = (d_1, \ldots, d_s)^T$ are equal (see (15.20) for the definition of $d_j$).

Hint. Study the uniform boundedness of the function $L(Z) / (K(Z) - 1)$.

4. (Kraaijevanger). Show that for

$$ A^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix} \tag{15.43} $$

we have $\alpha_0(A^{-1}) = 0$, but there exists no positive diagonal matrix $D$ such that $\alpha_D(A^{-1}) = 0$. For more insight see “Corollary 2.15” of Kraaijevanger & Schneid (1991).

5. Prove that for the Lobatto IIIB methods, with

$$ A = \begin{pmatrix} \tilde A & 0 \\ a^T & 0 \end{pmatrix}, $$

the dominant term of the local error (15.7) is (for $h \to 0$ and $z = h\lambda \to \infty$)

$$ z\, b_s \bigl( a^T \tilde A^{-1} \tilde c^{\,q+1} - 1 \bigr) \frac{h^{q+1}}{(q+1)!}\, \varphi^{(q+1)}(x). $$

Here $q = s - 2$ is the stage order and $\tilde c = (c_1, \ldots, c_{s-1})^T$. Show further that

$$ a^T \tilde A^{-1} \tilde c^{\,k} = 1 \qquad \text{for } k = 1, 2, \ldots, q, \tag{15.44} $$

$$ a^T \tilde A^{-1} \tilde c^{\,k} \ne 1 \qquad \text{for } k = q + 1. \tag{15.45} $$

Hint. Equation (15.44) follows from $C(q)$. Show (15.45) by supposing that $a^T \tilde A^{-1} \tilde c^{\,q+1} = 1$ which together with (15.44) implies that

$$ \sum_{i=1}^{s-1} d_i\, p(c_i) = p(1) \qquad \text{where } d^T = a^T \tilde A^{-1} $$

for every polynomial of $\deg p \le q + 1 = s - 1$ satisfying $p(0) = 0$. Arrive at a contradiction with

$$ p(x) = (x - c_1)(x - c_2) \cdot \ldots \cdot (x - c_{s-1}). $$



Chapter V. Multistep Methods for Stiff Problems 


Multistep methods (BDF) were the first numerical methods to be proposed for stiff 
differential equations (Curtiss & Hirschfelder 1952) and since Gear’s book (1971) 
computer codes based on these methods have been the most prominent and most 
widely used for all stiff computations. 

This chapter introduces the linear stability theory for multistep methods in Sect. V.1, and arrives at the famous theorem of Dahlquist which says that A-stable multistep methods cannot have high order. Attempts to circumvent this barrier proceed mainly in two directions: either study methods with slightly weaker stability requirements (Sect. V.2) or introduce new classes of methods (Sect. V.3). Order star theory on Riemann surfaces (Sect. V.4) then helps to extend Dahlquist’s barrier to generalized methods and to explain various properties of stability domains. Section V.5 presents numerical experiments with several codes based on the methods introduced.

Since all the foregoing stability theory is based uniquely on linear autonomous problems $y' = Ay$, the question arises of its validity for general nonlinear problems. This leads to the concepts of G-stability for multistep methods (Sect. V.6) and algebraic stability for general linear methods (Sect. V.9).

Another important subject is convergence estimates for $h \to 0$ which are independent of the stiffness (the analogue of B-convergence in Sect. IV.15). We describe various techniques for obtaining such estimates in Sections V.7 (for linear problems) as well as V.6 and V.8 (for nonlinear problems). These techniques are: use of G-stability, the Kreiss matrix theorem, the multiplier technique and, last but not least, a discrete variation of constants formula.

A general $k$-step multistep method is of the form

$$ \alpha_k y_{m+k} + \alpha_{k-1} y_{m+k-1} + \ldots + \alpha_0 y_m = h \bigl( \beta_k f_{m+k} + \ldots + \beta_0 f_m \bigr). \tag{1.1} $$

For this method, we can do the same stability analysis as in Sect. IV.2 for Euler’s method. This means that we apply it to the linearized and autonomous system

$$ y' = Jy \tag{1.2} $$

(see (IV.2.2’)); this gives

$$ \alpha_k y_{m+k} + \ldots + \alpha_0 y_m = hJ \bigl( \beta_k y_{m+k} + \ldots + \beta_0 y_m \bigr). \tag{1.3} $$

We again introduce a new basis for the vectors $y_{m+j}$ consisting of the eigenvectors of $J$. Then for the coefficients of $y_{m+j}$, with respect to an eigenvector $v$ of $J$, we obtain exactly the same recurrence equation as (1.3), with $J$ replaced by the corresponding eigenvalue $\lambda$. This gives¹

$$ (\alpha_k - \mu \beta_k)\, y_{m+k} + \ldots + (\alpha_0 - \mu \beta_0)\, y_m = 0, \qquad \mu = h\lambda \tag{1.4} $$

and is the same as Method (1.1) applied to Dahlquist’s test equation

$$ y' = \lambda y. \tag{1.5} $$


The Stability Region 

The difference equation (1.4) is solved using Lagrange’s method (see Volume I, Sect. III.3): we set $y_j = \zeta^j$, divide by $\zeta^m$ and obtain the characteristic equation

$$ (\alpha_k - \mu \beta_k)\, \zeta^k + \ldots + (\alpha_0 - \mu \beta_0) = \varrho(\zeta) - \mu\, \sigma(\zeta) = 0 \tag{1.6} $$

which depends on the complex parameter $\mu$. The polynomials $\varrho(\zeta)$ and $\sigma(\zeta)$ are our old friends from (III.2.4). The difference equation (1.4) has stable solutions (for arbitrary starting values) iff all roots of (1.6) are $\le 1$ in modulus. In addition, multiple roots must be strictly smaller than 1 (see Volume I, Sect. III.3, Exercise 1).

¹ In contrast to Chapter IV, where the product $h\lambda$ was denoted throughout by $z$, we write $h\lambda = \mu$ here, since in multistep theory (Sect. III.3) $z$ denotes the Cayley transform of $\zeta$.

Definition 1.1. The set

$$ S = \bigl\{ \mu \in \mathbb{C} \ ;\ \text{all roots } \zeta_j(\mu) \text{ of (1.6) satisfy } |\zeta_j(\mu)| \le 1,\ \text{multiple roots satisfy } |\zeta_j(\mu)| < 1 \bigr\} $$

is called the stability domain or stability region or region of absolute stability of Method (1.1). We have A-stability if $S \supset \mathbb{C}^-$.

It is sometimes desirable to consider $S$ as a subset of the compactified complex plane $\overline{\mathbb{C}}$. In this case, for $\mu \to \infty$, the roots of Eq. (1.6) tend to those of $\sigma(\zeta) = 0$.

For $\mu = 0$, Eq. (1.6) becomes $\varrho(\zeta) = 0$. Thus the usual stability (in the sense of Definition III.3.2) is equivalent to $0 \in S$.

Theorem 1.2. All numerical solutions of Method (1.1) are bounded for the linearized equation (1.2) with a diagonalizable matrix $J$ iff $h\lambda \in S$ for all eigenvalues $\lambda$ of $J$. □




We explain the computation of the stability domain for a particular example, the explicit Adams method of order 4 (see Sect. III.1, Eq. (1.5)),

$$ y_{m+4} = y_{m+3} + h \Bigl( \frac{55}{24} f_{m+3} - \frac{59}{24} f_{m+2} + \frac{37}{24} f_{m+1} - \frac{9}{24} f_m \Bigr), $$

for which Eq. (1.6) becomes

$$ \zeta^4 - \Bigl( 1 + \frac{55}{24} \mu \Bigr) \zeta^3 + \frac{59}{24} \mu\, \zeta^2 - \frac{37}{24} \mu\, \zeta + \frac{9}{24} \mu = 0. \tag{1.8} $$

In Fig. 1.1 we display the complicated behavior of the roots of this equation. We choose the $\mu$ values as the dots surrounding the white horse, and plot the corresponding 4 roots $\zeta_1, \zeta_2, \zeta_3, \zeta_4$ in the $\zeta$-plane, which can be observed to emerge from the roots 1, 0, 0, 0 of the $\varrho$-polynomial.

Complex mappings are conformal, i.e., preserve angles and orientation. The angle of rotation and the magnification of a complex map is (locally) determined by its derivative. Differentiating (1.8) with respect to $\mu$ and putting $\mu = 0$, $\zeta = 1$ gives

$$ \varrho'(1) \cdot \zeta_1'(0) - \sigma(1) = 0, $$

hence $\zeta_1'(0) = 1$ (because of the consistency conditions $\varrho'(1) \ne 0$, $\sigma(1) = \varrho'(1)$, see Volume I, Eq. (III.2.6)). This explains the fact that the map $\mu \mapsto \zeta_1$ is close to $1 + \mu$ in the neighbourhood of $\mu = 0$, and $\zeta_1(\mu)$ moves inside the unit disc when $\mu$ moves inside $\mathbb{C}^-$.


The Root Locus Curve. The key for computing $S$ is the fact that the inverse map $\zeta \mapsto \mu$, since (1.8) is linear in $\mu$, can easily be computed and is one-valued:

$$ \mu = \frac{\varrho(\zeta)}{\sigma(\zeta)} = \frac{\zeta^4 - \zeta^3}{\frac{55}{24} \zeta^3 - \frac{59}{24} \zeta^2 + \frac{37}{24} \zeta - \frac{9}{24}}. \tag{1.9} $$

The outside of the unit circle in the $\zeta$-plane mapped back into the $\mu$-plane by this formula (see the zodiac of gray horses in Fig. 1.1) produces the forbidden $\mu$-values, for which at least one root $\zeta_j(\mu)$ generates instability. The image of the boundary curve of the unit circle $\zeta = e^{i\theta}$, $0 \le \theta \le 2\pi$, is called the root locus curve. It must be considered as an oriented curve and the stability region, whenever it is not empty, must lie to the left of it.

Fig. 1.1. Plot of the stability function (1.8) with root locus curve

We conclude that the stability domain of Adams4 is precisely the small diamond shaped region surrounded by the root locus curve in the positive direction located between the origin and the point $\mu = 2 \cdot 24 / (-55 - 59 - 37 - 9) = -0.3$.
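The claim about the point $\mu = -0.3$ can be checked numerically, both from the root locus formula (1.9) and from the roots of (1.8). The following is a minimal sketch with illustrative, assumed names (`max_root_modulus`, `root_locus`); it uses `numpy.roots` and `numpy.polyval` with coefficients listed from the highest power down.

```python
import numpy as np

RHO = np.array([1.0, -1.0, 0.0, 0.0, 0.0])             # zeta^4 - zeta^3
SIGMA = np.array([0.0, 55.0, -59.0, 37.0, -9.0]) / 24   # (55 zeta^3 - 59 zeta^2 + 37 zeta - 9)/24

def max_root_modulus(mu):
    """Largest modulus among the roots of rho(zeta) - mu*sigma(zeta) = 0, i.e. of (1.8)."""
    return np.max(np.abs(np.roots(RHO - mu * SIGMA)))

def root_locus(theta):
    """mu = rho(zeta)/sigma(zeta) on the unit circle zeta = exp(i*theta), see (1.9)."""
    zeta = np.exp(1j * np.asarray(theta))
    return np.polyval(RHO, zeta) / np.polyval(SIGMA, zeta)

mu_left = root_locus(np.pi)          # image of zeta = -1
print("mu(zeta = -1) =", mu_left)
print("max |zeta| at mu = -0.29:", max_root_modulus(-0.29))
print("max |zeta| at mu = -0.31:", max_root_modulus(-0.31))
```

Just inside the interval all four roots lie in the unit disc; just beyond $\mu = -0.3$ the root near $\zeta = -1$ has left it.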

Adams Methods 

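The explicit Adams methods (III.1.5) applied to $y' = \lambda y$ give

$$ y_{n+1} = y_n + \mu \sum_{j=0}^{k-1} \gamma_j \nabla^j y_n, \qquad \gamma_0 = 1,\ \gamma_1 = \frac{1}{2},\ \gamma_2 = \frac{5}{12},\ \gamma_3 = \frac{3}{8},\ \ldots \tag{1.10} $$

or, after putting $y_n = \zeta^n$ and dividing by $\zeta^n$,

$$ \zeta - 1 = \mu \Bigl( \gamma_0 + \gamma_1 \Bigl( 1 - \frac{1}{\zeta} \Bigr) + \gamma_2 \Bigl( 1 - \frac{2}{\zeta} + \frac{1}{\zeta^2} \Bigr) + \ldots \Bigr). $$

Hence the root locus curve becomes

$$ \mu = \frac{\zeta - 1}{\sum_{j=0}^{k-1} \gamma_j (1 - 1/\zeta)^j}, \qquad \zeta = e^{i\theta}. \tag{1.10'} $$

For $k = 1$ we again obtain the circle of Euler’s method, centred at $-1$. These curves are plotted in Fig. 1.2 for $k = 2, 3, \ldots, 6$ and show stability domains of rapidly decreasing sizes. These methods are thus surely not appropriate for stiff problems.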
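Formula (1.10') translates directly into a few lines of code. The sketch below is illustrative only (the function name is an assumption, and only the coefficients $\gamma_0, \ldots, \gamma_3$ quoted in (1.10) are used, so $k \le 4$); it evaluates the root locus curve and its leftmost real point, the image of $\zeta = -1$.

```python
import numpy as np

GAMMA = [1.0, 1/2, 5/12, 3/8]     # gamma_0 .. gamma_3 from (1.10)

def adams_root_locus(k, theta):
    """mu(theta) of (1.10') for the k-step explicit Adams method, k <= 4."""
    zeta = np.exp(1j * np.asarray(theta, dtype=float))
    w = 1.0 - 1.0 / zeta                                  # symbol of the backward difference
    denom = sum(GAMMA[j] * w**j for j in range(k))
    return (zeta - 1.0) / denom

theta = np.linspace(0.0, 2.0 * np.pi, 201)
euler_curve = adams_root_locus(1, theta)                  # should be the circle |mu + 1| = 1
left_points = [adams_root_locus(k, np.pi).real for k in (1, 2, 3, 4)]
print(left_points)
```

The printed values $-2, -1, -6/11, -3/10$ show how quickly the real stability interval shrinks with $k$.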



Fig. 1.2. Stability domains for explicit Adams methods

Fig. 1.3. Stability domains of implicit Adams methods, compared to those [of the explicit methods]


The implicit Adams methods (III.1.8) lead to

$$ y_{n+1} = y_n + \mu \sum_{j=0}^{k} \gamma_j^* \nabla^j y_{n+1}, \qquad \gamma_0^* = 1,\ \gamma_1^* = -\frac{1}{2},\ \gamma_2^* = -\frac{1}{12},\ \ldots \tag{1.11} $$

Here we put $y_n = \zeta^n$ and divide by $\zeta^{n+1}$. This gives

$$ \mu = \frac{1 - 1/\zeta}{\sum_{j=0}^{k} \gamma_j^* (1 - 1/\zeta)^j}, \qquad \zeta = e^{i\theta}. \tag{1.11'} $$

For $k = 1$ this is the implicit trapezoidal rule and is A-stable. For $k = 2, 3, \ldots, 6$ the stability domains, though much larger than those of the explicit methods, do not cover $\mathbb{C}^-$ (see Fig. 1.3). Hence these methods are not A-stable.

Predictor-Corrector Schemes

The inadequacy of the theory incorporating the effect of the corrector equation only for predictor-corrector methods was first discovered through experimental computations on the prototype linear equation

$y' = f(x, y) = -100y + 100, \quad y(0) = 0,$

(...) Very poor correlation of actual errors with the errors expected on the basis of the properties of the corrector equation alone was obtained. This motivated the development of the theory

(P.E. Chase 1962)


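As we have seen in Sect. III.1, the classical way of computing $y_{n+1}$ from the implicit equations (III.1.8) is to use the result $y_{n+1}^*$ of the explicit Adams method as a predictor in $\beta_k f(x_{n+1}, y_{n+1})$. This destroys a good deal of the stability properties of the method (Chase 1962). The stability analysis changes as follows: the predictor formula

$$ y_{n+1}^* = y_n + \mu \bigl( \gamma_0 y_n + \gamma_1 (y_n - y_{n-1}) + \gamma_2 (y_n - 2 y_{n-1} + y_{n-2}) + \ldots \bigr) \tag{1.12} $$

must be inserted into the corrector formula

$$ \begin{aligned} y_{n+1} = y_n + \mu \bigl( &\gamma_0^* y_{n+1}^* + \gamma_1^* (y_{n+1}^* - y_n) + \gamma_2^* (y_{n+1}^* - 2 y_n + y_{n-1}) \\ &+ \gamma_3^* (y_{n+1}^* - 3 y_n + 3 y_{n-1} - y_{n-2}) + \ldots \bigr). \end{aligned} \tag{1.13} $$

Since there is a $\mu$ in (1.12) and in (1.13), we obtain this time, by putting $y_n = \zeta^n$ and dividing by $\zeta^n$, a quadratic equation for $\mu$,

$$ A \mu^2 + B \mu + C = 0, \tag{1.14} $$

$$ A = \Bigl( \sum_{j=0}^{k} \gamma_j^* \Bigr) \Bigl( \sum_{j=0}^{k-1} \gamma_j \Bigl( 1 - \frac{1}{\zeta} \Bigr)^j \Bigr), \qquad B = (1 - \zeta) \sum_{j=0}^{k} \gamma_j^* + \zeta \sum_{j=0}^{k} \gamma_j^* \Bigl( 1 - \frac{1}{\zeta} \Bigr)^j, \qquad C = 1 - \zeta. $$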


For each ( = e ie , Eq. (1.14) has two roots. These give rise to two root locus curves 
which determine the stability domain. These curves are represented in Fig. 1.4 and 
compared to those of the original implicit methods. It can be seen that we loose 
a lot of stability. In particular, for k = 1 the trapezoidal rule becomes an explicit 
second order Runge Kutta method and the A -stability is destroyed. 
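
The quadratic (1.14) is easy to evaluate numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the Adams coefficients $\gamma_j$, $\gamma_j^*$ are the standard values from Sect. III.1 (typed in here for $j \le 4$), and $A$, $B$, $C$ are obtained by inserting $y_n = \zeta^n$ into (1.12) and (1.13).

```python
import cmath
from fractions import Fraction as F

# Standard Adams coefficients for j = 0..4 (explicit: GAMMA, implicit: GAMMA_STAR).
GAMMA = [F(1), F(1, 2), F(5, 12), F(3, 8), F(251, 720)]
GAMMA_STAR = [F(1), F(-1, 2), F(-1, 12), F(-1, 24), F(-19, 720)]


def pece_quadratic(zeta, k):
    """Coefficients A, B, C of A*mu^2 + B*mu + C = 0 for the k-step PECE scheme."""
    w = 1 - 1 / zeta
    sum_star = float(sum(GAMMA_STAR[: k + 1]))
    pred = sum(float(GAMMA[j]) * w**j for j in range(k))
    corr = sum(float(GAMMA_STAR[j]) * w**j for j in range(k + 1))
    return sum_star * pred, (1 - zeta) * sum_star + zeta * corr, 1 - zeta


def pece_root_locus(theta, k):
    """Both roots mu of the quadratic for zeta = exp(i*theta), 1 <= k <= 4."""
    a, b, c = pece_quadratic(cmath.exp(1j * theta), k)
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


if __name__ == "__main__":
    # For k = 1 every root must satisfy zeta = 1 + mu + mu^2/2,
    # the stability function of an explicit two-stage Runge-Kutta method.
    zeta = cmath.exp(1j * 1.0)
    for mu in pece_root_locus(1.0, 1):
        print(abs(1 + mu + mu * mu / 2 - zeta))
```

Sweeping `theta` over $[0, 2\pi]$ and plotting both roots should reproduce the qualitative shape of the PECE curves in Fig. 1.4.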

While Chase (1962) studied real eigenvalues only, the general complex case
has been stated by Crane & Klopfenstein (1965) and, with beautiful figures, by
Krogh (1966). All three papers also searched for procedures with increased
stability domains. This research was brought to perfection by Stetter (1968).

Fig. 1.4. Stability domains for PECE compared to original implicit methods

Nyström Methods

Thus we see that Milne’s method will not handle so simple an
equation as $y' = -y$, $y(0) = 1$ ... (R.W. Hamming 1959)

... Milne’s method has a number of virtues not possessed by 
its principal rival, the Runge-Kutta method, which are especially 
important when the order of the system of equations is fairly high 
(N=10 to 30 or more) ... (R.W. Hamming 1959) 

The explicit Nyström method (III.1.13) for k = 1 and 2 is the “explicit midpoint
rule”

$$ y_{n+1} = y_{n-1} + 2h f_n \tag{1.15} $$

and leads to the root locus curve

$$ \mu = \frac{e^{i\theta} - e^{-i\theta}}{2} = i \sin\theta. \tag{1.15'} $$

This curve moves up and down the imaginary axis between $\pm i$ and leaves as
stability domain just the interval $(-i, +i)$ (see Fig. 1.5). All eigenvalues in the interior
of the negative half plane lead to instabilities. This is caused by the second root $-1$
of $\varrho(\zeta)$ which moves out of the unit circle when $\mu$ goes West. This famous
phenomenon is called the “weak instability” of the midpoint rule and was the “entry
point” of Dahlquist’s stability-career (Dahlquist 1951). The graphs of Fig. III.9.2
nicely show the (weak) instability of the numerical solution.
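
The weak instability can be observed directly. The sketch below is only an illustration with hypothetical function names: it computes the two roots of the characteristic equation $\zeta^2 - 2\mu\zeta - 1 = 0$ of (1.15) and applies the midpoint rule to $y' = \lambda y$ with an exact starting value $y_1$.

```python
import cmath
import math


def midpoint_roots(mu):
    """Roots of zeta^2 - 2*mu*zeta - 1 = 0 (principal root first)."""
    s = cmath.sqrt(mu * mu + 1)
    return mu + s, mu - s


def midpoint_solve(lam, h, n_steps):
    """Explicit midpoint rule for y' = lam*y, y(0) = 1, with exact y_1."""
    ys = [1.0, math.exp(lam * h)]
    for _ in range(n_steps - 1):
        ys.append(ys[-2] + 2 * h * lam * ys[-1])
    return ys


if __name__ == "__main__":
    principal, parasitic = midpoint_roots(-0.1)
    print(abs(principal), abs(parasitic))   # the parasitic root has modulus > 1
    ys = midpoint_solve(-1.0, 0.1, 400)     # integrate y' = -y up to x = 40
    print(ys[-1])                           # large, although exp(-40) is tiny
```

For $\mu = -0.1$ the parasitic root is close to $-1.105$, so the component it carries grows and alternates in sign, no matter how accurately $y_1$ is chosen.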

The implicit Milne-Simpson method (III.1.15) for k = 2 and 3 is

$$ y_{n+1} = y_{n-1} + h\Bigl(\tfrac{1}{3} f_{n+1} + \tfrac{4}{3} f_n + \tfrac{1}{3} f_{n-1}\Bigr) \tag{1.16} $$

and has the root locus curve

$$ \mu = \frac{e^{i\theta} - e^{-i\theta}}{\tfrac{1}{3} e^{i\theta} + \tfrac{4}{3} + \tfrac{1}{3} e^{-i\theta}} = 3i\,\frac{\sin\theta}{\cos\theta + 2}, \tag{1.16'} $$

which moves up and down the imaginary axis between $\pm i\sqrt{3}$. Thus its behaviour
is similar to the explicit Nyström method with a slightly larger stability interval.

The higher order Nyström and Milne-Simpson methods have root locus curves
which are oriented the wrong way round (see Fig. 1.5). Their stability domains
therefore reduce to the smallest possible set (for stable methods): just the origin.



Fig. 1.5. Root locus curves for Nyström and Milne methods

For k = 1 we have the implicit Euler method with stability domain
$S = \{\mu \,;\, |\mu - 1| \ge 1\}$. For k = 2 the root locus curve (see Fig. 1.6) has
$\operatorname{Re}(\mu) = \tfrac{3}{2} - 2\cos\theta + \tfrac{1}{2}\cos 2\theta$, which is $\ge 0$ for all $\theta$.
Therefore the method is A-stable and of order 2. However, for k = 3, 4, 5 and 6, we see
that the methods lose more and more stability on a part of the imaginary axis.
For $k \ge 7$, as we know, the formulas are unstable anyway, even at the origin.
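
These statements can be checked numerically. The sketch below assumes the standard BDF root locus curve $\mu = \sum_{j=1}^{k} \tfrac{1}{j}(1 - 1/\zeta)^j$, $\zeta = e^{i\theta}$, which is not reproduced in this excerpt; the function names are hypothetical.

```python
import cmath
import math


def bdf_root_locus(theta, k):
    """mu(theta) = sum_{j=1..k} (1 - exp(-i*theta))^j / j (assumed BDF root locus)."""
    w = 1 - cmath.exp(-1j * theta)
    return sum(w**j / j for j in range(1, k + 1))


def min_real_part(k, n=3600):
    """Smallest real part of the root locus curve on a uniform grid in theta."""
    return min(bdf_root_locus(2 * math.pi * m / n, k).real for m in range(n + 1))


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, min_real_part(k))
```

For k = 1 and k = 2 the minimum is zero up to rounding, in agreement with A-stability. For k = 3 one finds $\operatorname{Re}\mu = \tfrac{1}{3}(1 - \cos\theta)^2 (1 - 4\cos\theta)$, whose minimum $-1/12$ is attained at $\theta = \pi/3$.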


The Second Dahlquist Barrier 

I searched for a long time, finally Professor Lax showed me the 
Riesz-Herglotz theorem and I knew that I had my theorem. 

(G. Dahlquist 1979) 


Theorem 1.3. If the multistep method (1.1) is A-stable, then

$$ \operatorname{Re}\Bigl(\frac{\varrho(\zeta)}{\sigma(\zeta)}\Bigr) > 0 \quad\text{for } |\zeta| > 1. \tag{1.18} $$

For irreducible methods the converse is also true: (1.18) implies A-stability.


Proof. If the method is A-stable then all roots of (1.6) must satisfy $|\zeta| \le 1$
whenever $\operatorname{Re}\mu \le 0$. The logically equivalent statement ($\operatorname{Re}\mu > 0$ whenever $|\zeta| > 1$)
yields (1.18) since by (1.6) $\mu = \varrho(\zeta)/\sigma(\zeta)$.

Suppose now that (1.18) holds and that the method is irreducible. Fix a $\mu_0$
with $\operatorname{Re}\mu_0 \le 0$ and let $\zeta_0$ be a root of (1.6). We then have $\sigma(\zeta_0) \ne 0$ (otherwise
the method would be reducible). Hence $\mu_0 = \varrho(\zeta_0)/\sigma(\zeta_0)$ and it follows from
(1.18) that $|\zeta_0| \le 1$. We still have to show that $\zeta_0$ is a simple root if $|\zeta_0| = 1$.
By a continuity argument it follows from (1.18) that $|\zeta_0| = 1$ and $\operatorname{Re}\mu_0 < 0$ are
contradictory. Therefore, it remains to prove that for $\operatorname{Re}\mu_0 = 0$ a root satisfying
$|\zeta_0| = 1$ must be simple. In a neighbourhood of such a root we have

$$ \frac{\varrho(\zeta)}{\sigma(\zeta)} - \mu_0 = C_1(\zeta - \zeta_0) + C_2(\zeta - \zeta_0)^2 + \ldots $$

and (1.18) implies that $C_1 \ne 0$. This, however, is only possible if $\zeta_0$ is a simple
root of (1.6). □


In all the above examples we have not yet seen an A-stable multistep formula
of order $p \ge 3$. The following famous theorem explains this observation.

Theorem 1.4 (Dahlquist 1963). An A-stable multistep method must be of order
$p \le 2$. If the order is 2, then the error constant satisfies

$$ C \le -\frac{1}{12}. \tag{1.19} $$

The trapezoidal rule is the only A-stable method of order 2 with $C = -1/12$.

Proof. Dahlquist’s first proof of this theorem is difficult. More elementary versions
emerged in Widlund (1967), in lecture notes of W. Liniger (Univ. of Neuchâtel
1971) and in the book of Grigorieff (1977, vol. 2, p. 218).

We start by recalling some formulas from Volume I: Eq. (ii) of Theorem III.2.4
and Eq. (III.2.7) yield

$$ \varrho(e^h) - h\sigma(e^h) = C_{p+1} h^{p+1} + \ldots \quad\text{for } h \to 0. \tag{1.20} $$

From the consistency conditions (III.2.6) we have

$$ \varrho(e^h) = \varrho(1 + h + \ldots) = \varrho(1) + \varrho'(1) h + \ldots = \sigma(1) h + \ldots . $$

We divide (1.20) by $h\varrho(e^h)$ and obtain

$$ \frac{1}{h} - \frac{\sigma(e^h)}{\varrho(e^h)} = C h^{p-1} + \ldots \quad\text{for } h \to 0 \tag{1.21} $$

where $C$ is the error constant (III.2.13). With $\zeta = e^h$ this becomes

$$ \frac{1}{\log\zeta} - \frac{\sigma(\zeta)}{\varrho(\zeta)} = C(\zeta - 1)^{p-1} + \ldots \quad\text{for } \zeta \to 1. \tag{1.22} $$

In this formula we put p = 2. Whenever the method is of higher order, we have
$C = 0$. When the order of the method is one, we have nothing to prove. The
same formula for the trapezoidal rule, for which $\varrho_T(\zeta) = \zeta - 1$, $\sigma_T(\zeta) = \tfrac{1}{2}(\zeta + 1)$,
becomes by series expansion (or by using Table III.2.1)

$$ \frac{1}{\log\zeta} - \frac{\sigma_T(\zeta)}{\varrho_T(\zeta)} = -\frac{1}{12}(\zeta - 1) + \ldots \quad\text{for } \zeta \to 1. \tag{1.23} $$

The idea is now to subtract the two formulas and obtain

$$ d(\zeta) := \frac{\sigma(\zeta)}{\varrho(\zeta)} - \frac{\sigma_T(\zeta)}{\varrho_T(\zeta)} = \Bigl(-C - \frac{1}{12}\Bigr)(\zeta - 1) + \ldots \quad\text{for } \zeta \to 1. \tag{1.24} $$

From (1.18) we have that

$$ \operatorname{Re}\Bigl(\frac{\varrho(\zeta)}{\sigma(\zeta)}\Bigr) > 0 \quad\text{or equivalently}\quad \operatorname{Re}\Bigl(\frac{\sigma(\zeta)}{\varrho(\zeta)}\Bigr) > 0 \quad\text{for } |\zeta| > 1. \tag{1.25} $$

The point here is that for the trapezoidal rule this Re (...) is zero for $|\zeta| = 1$ since
this method has precisely $\mathbb{C}^-$ as stability domain. Hence from (1.24) we obtain

$$ \lim_{\zeta \to \zeta_0,\ |\zeta| > 1} \operatorname{Re} d(\zeta) \ge 0 \quad\text{for } |\zeta_0| = 1. \tag{1.26} $$

The poles of $d(\zeta)$ are the roots of $\varrho(\zeta)$, which, by stability, are not allowed outside
the unit circle. Thus, by the maximum principle, (1.26) remains true everywhere
outside the unit circle. Choosing then $\zeta = 1 + \varepsilon$ with $\operatorname{Re}\varepsilon > 0$ and $|\varepsilon|$ small, we
see from (1.24) that either $-C - \tfrac{1}{12} > 0$ or $d(\zeta) \equiv 0$. This concludes the proof. □
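
Formula (1.22) with p = 2 also gives a quick numerical estimate of the error constant. The following sketch (hypothetical helper name; the two-step BDF polynomials $\varrho(\zeta) = \tfrac{1}{2}(\zeta - 1)(3\zeta - 1)$, $\sigma(\zeta) = \zeta^2$ are the standard ones and are an assumption here) evaluates the left-hand side of (1.22) close to $\zeta = 1$.

```python
import math


def order2_error_constant(rho, sigma, eps=2.0**-10):
    """Estimate C from 1/log(zeta) - sigma(zeta)/rho(zeta) ~ C*(zeta - 1), zeta = 1 + eps."""
    zeta = 1.0 + eps
    return (1.0 / math.log(zeta) - sigma(zeta) / rho(zeta)) / eps


# Trapezoidal rule: rho_T = zeta - 1, sigma_T = (zeta + 1)/2.
c_trapezoidal = order2_error_constant(lambda z: z - 1, lambda z: (z + 1) / 2)

# Two-step BDF, with rho written in factored form to limit cancellation.
c_bdf2 = order2_error_constant(lambda z: (z - 1) * (3 * z - 1) / 2, lambda z: z * z)

if __name__ == "__main__":
    print(c_trapezoidal, c_bdf2)
```

The estimates are close to $-1/12$ and $-1/3$ respectively, so the two-step BDF method satisfies (1.19) with strict inequality, as the theorem requires of any A-stable second order method other than the trapezoidal rule.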
Exercises

1. The Milne-Simpson methods for k = 4 and 5 satisfy $\operatorname{Re}(\varrho(\zeta)/\sigma(\zeta)) \ge 0$ for
$|\zeta| = 1$. Since their order is higher than 2, this seems to be in contradiction
with the above proof of Theorem 1.4. Explain.

2. For the explicit midpoint rule (1.15), do the endpoints $\pm i$ of the stability region
belong to S? Study the (possible) stability of this method applied with h = 1
to $u' = v$, $v' = -u$.

3. Compute for the explicit and implicit Adams methods the largest $A_0 \in \mathbb{R}$ such
that the real interval $[-A_0, 0]$ lies in S. Show that for the k-step explicit
Adams methods we have $A_0 = 2/u_k$ with $u_k = \sum_{j=0}^{k-1} 2^j \gamma_j$ ($u_1 = 1$, $u_2 = 2$,
$u_3 = 11/3$, $u_4 = 20/3$, $u_5 = 551/45$, ...). The use of generating functions
(see Sect. III.1) allows us to show that

$$ \sum_{j=1}^{\infty} u_j t^j = \Bigl(-1 + \frac{1}{1-t}\Bigr) \cdot \frac{-2t}{(1 - 2t)\log(1 - 2t)}, $$

a series with convergence radius 1/2. This explains why these stability
domains decrease so rapidly.

Hint. Just set $\theta = \pi$ in the root locus curve.
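
The listed values of $u_k$ can be reproduced in exact arithmetic. This is only a sketch with hypothetical function names; it assumes the standard recurrence $\gamma_m + \tfrac{1}{2}\gamma_{m-1} + \ldots + \tfrac{1}{m+1}\gamma_0 = 1$ for the explicit Adams coefficients from Sect. III.1.

```python
from fractions import Fraction as F


def adams_gammas(n):
    """First n explicit Adams coefficients gamma_0, ..., gamma_{n-1}."""
    g = []
    for m in range(n):
        g.append(F(1) - sum(g[j] * F(1, m + 1 - j) for j in range(m)))
    return g


def u(k):
    """u_k = sum_{j=0}^{k-1} 2^j * gamma_j, so that A_0 = 2/u_k."""
    return sum(2**j * gj for j, gj in enumerate(adams_gammas(k)))


if __name__ == "__main__":
    for k in range(1, 8):
        print(k, u(k), float(2 / u(k)))
```

Printing a few more values shows $u_k$ roughly doubling from one $k$ to the next, consistent with the convergence radius 1/2 of the generating function.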

4. Prove that the stability region of the k-step implicit Adams methods is of finite
size for every $k \ge 2$.

Hint. Show that $(-1)^k \sigma(-1) < 0$, so that $\sigma$ has a real negative root, smaller
than $-1$.

5. a) Show that all 2-step methods of order 2 are given by

$$ \varrho(\zeta) = (\zeta - 1)(\alpha\zeta + 1 - \alpha) $$
$$ \sigma(\zeta) = (\zeta - 1)^2 \beta + (\zeta - 1)\alpha + (\zeta + 1)/2 $$

(which are irreducible for $\alpha \ne 2\beta$).

b) The method is stable at 0 iff $\alpha \ge 1/2$.

c) The method is stable at $\infty$ iff

$$ \alpha \ge 1/2 \quad\text{and}\quad \beta > \alpha/2. \tag{1.27} $$

Apply the Schur-Cohn criterion (Sect. III.3, Exercise 4).

d) The method is A-stable iff (1.27) holds.

Hint.

$$ \frac{\sigma(\zeta)}{\varrho(\zeta)} = \frac{1}{2} \cdot \frac{\zeta + 1}{\zeta - 1} + \Bigl(\beta - \frac{\alpha}{2}\Bigr) \frac{\zeta - 1}{\alpha\zeta + 1 - \alpha}. $$


We are not attempting to disprove Dahlquist’s theorems but are
trying to get round the conditions they impose ...

(J. Cash 1979)


Dahlquist’s condition $p \le 2$ for the order of an A-stable linear multistep method is
a severe restriction for efficient practical calculations of high precision. There are
only two ways of “breaking” this barrier:

• either weaken the condition;

• or strengthen the method.

These two points will occupy our attention in this and in the following section.


A(α)-Stability and Stiff Stability

It is the purpose of this note to show that a slightly different
stability requirement permits methods of higher accuracy.

(O. Widlund 1967)

The angle α is only one of a number of parameters which have
been proposed for measuring the extent of the stability region.
But it is probably the best such measure ...

(Skeel & Kong 1977)

Many important classes of practical problems do not require stability on the entire
left half-plane $\mathbb{C}^-$. Further, for eigenvalues on the imaginary axis, the solutions are
often highly oscillatory and one is then forced anyhow to restrict the step size “to
the highest frequency present in order to represent the signal” (Gear 1971, p. 214).

Definition 2.1 (Widlund 1967). A convergent linear multistep method is
$A(\alpha)$-stable, $0 < \alpha < \pi/2$, if

$$ S \supset S_\alpha = \{\mu \,;\, |\arg(-\mu)| < \alpha,\ \mu \ne 0\}. \tag{2.1} $$

A method is $A(0)$-stable if it is $A(\alpha)$-stable for some (sufficiently small) $\alpha > 0$.

Similarly, Gear (1971) required in his famous concept of “stiff stability” that

$$ S \supset \{\mu \,;\, \operatorname{Re}\mu < -D\} \tag{2.2} $$

for some $D > 0$ and that the method be “accurate” in a rectangle $-D \le \operatorname{Re}\mu \le a$,
$-\theta \le \operatorname{Im}\mu \le \theta$ for some $a > 0$ and $\theta$ about $\pi/5$. Many subsequent writers
didn’t like the inaccurate meaning of “accurate” in this definition and replaced it
by something else. For example Jeltsch (1976) required that in addition to (2.2),

$$ |\zeta_1(\mu)| > |\zeta_i(\mu)|, \quad i = 2, \ldots, k \quad\text{in } |\operatorname{Re}\mu| \le a,\ |\operatorname{Im}\mu| \le \theta, \tag{2.3} $$

where $\zeta_1(\mu)$ is the analytic continuation of the principal root $\zeta_1(0) = 1$ of (1.6).
Also, the rectangle given by

$$ |\operatorname{Im}\mu| \le \theta, \qquad -D \le \operatorname{Re}\mu \le -a $$

should belong to S.

Other concepts are A 0 -stable (Cryer 1973) if 



= 1 for — co < x < 0 

(2.4) 

and Å-stable (a joke of O. Nevanlinna 1979) if

$$ (-\infty, 0] \subset S. \tag{2.5} $$

Of course, we have

$$ A(0)\text{-stable} \;\Longrightarrow\; A_0\text{-stable} \;\Longrightarrow\; \text{Å-stable} \tag{2.6} $$

but neither implication is reversible (Exercise 3; see also “Theorem 1” of Jeltsch
1976).

The BDF methods (1.18) satisfy (2.1) for A(α)-stability and (2.2) for stiff
stability with the values

    k      1       2       3        4        5        6
    α     90°     90°    86.03°   73.35°   51.84°   17.84°
    D      0       0     0.083    0.667    2.327    6.075
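
The entries for k = 3 can be reproduced from the root locus curve. The sketch below again assumes the standard BDF root locus $\mu(\theta) = \sum_{j=1}^{k} \tfrac{1}{j}(1 - e^{-i\theta})^j$ (not shown in this excerpt) and takes, as a working assumption, the whole curve to be the boundary of S; the function names are hypothetical.

```python
import cmath
import math


def bdf_root_locus(theta, k):
    """Assumed BDF root locus curve mu(theta) for zeta = exp(i*theta)."""
    w = 1 - cmath.exp(-1j * theta)
    return sum(w**j / j for j in range(1, k + 1))


def alpha_and_D(k, n=20000):
    """Angle alpha (in degrees) and abscissa D read off the root locus curve."""
    alpha, D = 90.0, 0.0
    for m in range(1, n):
        mu = bdf_root_locus(math.pi * m / n, k)   # upper half suffices by symmetry
        if mu.real < 0:
            angle = math.degrees(math.atan2(abs(mu.imag), -mu.real))
            alpha = min(alpha, angle)
            D = max(D, -mu.real)
    return alpha, D


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, alpha_and_D(k))
```

For k = 3 this returns approximately 86.03° and 0.0833, matching the table. For larger k the curve may contain parts that are not boundary points of S, so the same scan should be read with more care there.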


High Order A(α)-Stable Methods

Dill and Gear ... and Jain and Srivastava ... have used computers
to construct stiffly stable methods of orders eight and eleven,
respectively, but were unable to construct higher order stiffly stable
methods. Even though we have shown here that $A_0$-stable
methods of arbitrarily high order exist, we conjecture that A(0)-stable
linear multistep methods of higher order, of order greater
than 20 say, do not exist. (Cryer 1973)

Widlund (1967) showed that for every $\alpha < \pi/2$, $\alpha$ arbitrarily close to $\pi/2$, there
exist A(α)-stable multistep methods of order $p = k$ for p = 3 and p = 4. It is now
an interesting question whether such methods also exist for higher orders. Well,
the answer consists of good news and bad news.

First the good news. The conjecture of Cryer (see quotation) was quickly
disproved by combining Cryer’s $A_0$-stable methods with the result of Jeltsch (1976)
which says that certain $A_0$-stable methods are also A(α)-stable. The following
theorem shows that $\alpha$ can even be chosen arbitrarily close to $\pi/2$:

Theorem 2.2 (Grigorieff & Schroll 1978). Let $\alpha < \pi/2$ be given. Then for every
$k \in \mathbb{N}$ there exists an A(α)-stable linear k-step method of order $p = k$.

Proof. For p = k = 2 the two-step BDF method which is A-stable, and hence
$A(\alpha_2)$-stable for every $\alpha_2 < \pi/2$, does the job. For k arbitrary, we intercalate
k − 2 values between $\alpha$ and $\pi/2$,

$$ \alpha < \alpha_{k-1} < \alpha_{k-2} < \ldots < \alpha_3 < \alpha_2 < \frac{\pi}{2} \tag{2.8} $$

and extend the method step by step with the help of Lemma 2.3. □


Lemma 2.3. Suppose an A(α)-stable k-step method of order p is given with

$$ \varrho(\zeta) \ne 0 \quad\text{if } |\zeta| = 1,\ \zeta \ne 1 \tag{2.9a} $$

$$ \sigma(\zeta) \ne 0 \quad\text{if } |\zeta| = 1. \tag{2.9b} $$

Then for every $\tilde\alpha < \alpha$ there exists an $A(\tilde\alpha)$-stable (k + 1)-step method of order
p + 1 which also satisfies (2.9).


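The proof follows very closely the ideas of Jeltsch & Nevanlinna (1982): Let $\varrho(\zeta)$
and $\sigma(\zeta)$ represent the given k-step method with order condition

$$ \frac{\varrho(\zeta)}{\log\zeta} - \sigma(\zeta) = C_{p+1}(\zeta - 1)^p + O\bigl((\zeta - 1)^{p+1}\bigr). \tag{2.10} $$

If we multiply $\varrho$ and $\sigma$ by $(\zeta - 1)$ we formally increase the order by 1 and at the
same time leave the root locus curve unchanged. Everything seems to be proved.
However, the new $\varrho$-polynomial would have a double root at $\zeta = 1$ and would thus
lead to an unstable method. We therefore choose $\varepsilon > 0$ and multiply (2.10) by
$(\zeta - 1 + \varepsilon)$, which moves the root slightly inside the unit circle. We then obtain a
new method of order p + 1 if we put

$$ \tilde\varrho(\zeta) = \varrho(\zeta)(\zeta - 1 + \varepsilon) $$
$$ \tilde\sigma(\zeta) = \sigma(\zeta)(\zeta - 1 + \varepsilon) + \varepsilon\, C_{p+1}(\zeta - 1)^p. \tag{2.11} $$

Since p = k + 2 is excluded (by Theorem III.3.9 methods with p = k + 2 are
symmetric and violate Hypothesis (2.9a)), both polynomials $\tilde\varrho$ and $\tilde\sigma$ are of degree
$\le k + 1$. Now the formula

$$ \frac{\tilde\sigma(\zeta)}{\tilde\varrho(\zeta)} - \frac{\sigma(\zeta)}{\varrho(\zeta)} = \frac{\varepsilon\, C_{p+1}(\zeta - 1)^p}{\varrho(\zeta)(\zeta - 1 + \varepsilon)} \tag{2.12} $$

allows us to compare, for $\varepsilon$ small, the root locus curves of the two methods. The
fact that we are working with $\sigma(e^{i\theta})/\varrho(e^{i\theta}) = 1/\mu$ instead of $\mu = \varrho(e^{i\theta})/\sigma(e^{i\theta})$
does not matter, because the transformation $\mu \mapsto 1/\mu$ maps the sector of
Definition 2.1 onto itself. Because of Hypothesis (2.9a), 1 is the only (simple) root of
$\varrho(\zeta)$ on the unit circle, therefore

$$ \Bigl|\frac{\tilde\sigma(\zeta)}{\tilde\varrho(\zeta)} - \frac{\sigma(\zeta)}{\varrho(\zeta)}\Bigr| \le \varepsilon\, C\, \frac{|\zeta - 1|^{p-1}}{|\zeta - 1 + \varepsilon|} \quad\text{for } \zeta = e^{i\theta}. \tag{2.13} $$

A small obstacle still separates us from “endless pleasure, endless love, Semele
enjoys above”: the denominator $|\zeta - 1 + \varepsilon|$, which becomes small for $\varepsilon \to 0$ and
$\theta \to 0$. For p > 1, this “small” denominator is simply balanced by one of the factors
$|\zeta - 1|$ from the numerator and we have

$$ \Bigl|\frac{\tilde\sigma(\zeta)}{\tilde\varrho(\zeta)} - \frac{\sigma(\zeta)}{\varrho(\zeta)}\Bigr| \le C \cdot \varepsilon \tag{2.14} $$

which means uniform convergence of $\tilde\sigma(\zeta)/\tilde\varrho(\zeta)$ to $\sigma(\zeta)/\varrho(\zeta)$ if $\varepsilon \to 0$.
Since $\sigma(\zeta)/\varrho(\zeta)$ is bounded away from the origin (Hypothesis (2.9b)), this also
means uniform convergence of the angles.

This is already sufficient to prove Theorem 2.2, where we always have $p \ge 2$.
However, Lemma 2.3 remains valid for p = 1 too: the critical region is when $\theta \to 0$,
in which case $|\sigma(e^{i\theta})/\varrho(e^{i\theta})|$ and $|\tilde\sigma(e^{i\theta})/\tilde\varrho(e^{i\theta})|$ tend to infinity like Const$/\theta$.
Instead of (2.14) we have for p = 1

$$ \Bigl|\frac{\tilde\sigma(\zeta)}{\tilde\varrho(\zeta)} - \frac{\sigma(\zeta)}{\varrho(\zeta)}\Bigr| \le \frac{\varepsilon\, C}{|\zeta - 1 + \varepsilon|} = O\Bigl(\frac{\varepsilon}{\theta}\Bigr). $$

Thus the angle (seen from the origin) between $\tilde\sigma(\zeta)/\tilde\varrho(\zeta)$ and $\sigma(\zeta)/\varrho(\zeta)$ is $O(\varepsilon)$.

□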
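
The construction in this proof can be tried out on a concrete method. The sketch below is an illustration under stated assumptions, not part of the original argument: it starts from the two-step BDF method (assumed polynomials $\varrho = \tfrac{1}{2}(\zeta - 1)(3\zeta - 1)$, $\sigma = \zeta^2$, for which expanding the order condition at $\zeta = 1$ gives $C_3 = -1/3$), multiplies by $(\zeta - 1 + \varepsilon)$ and adds the correction term to $\sigma$; the helper names are hypothetical.

```python
import math


def extend_method(rho, sigma, c_next, p, eps):
    """Return (rho_new, sigma_new): rho and sigma multiplied by (zeta - 1 + eps),
    with eps * c_next * (zeta - 1)^p added to sigma_new to gain one order."""
    def rho_new(z):
        return rho(z) * (z - 1 + eps)

    def sigma_new(z):
        return sigma(z) * (z - 1 + eps) + eps * c_next * (z - 1) ** p

    return rho_new, sigma_new


def defect(rho, sigma, h):
    """rho(e^h)/h - sigma(e^h), which behaves like h^p for a method of order p."""
    z = math.exp(h)
    return rho(z) / h - sigma(z)


def rho_bdf2(z):
    return (z - 1) * (3 * z - 1) / 2


def sigma_bdf2(z):
    return z * z


rho3, sigma3 = extend_method(rho_bdf2, sigma_bdf2, c_next=-1.0 / 3.0, p=2, eps=0.1)

if __name__ == "__main__":
    # Halving h divides the defect by about 2^p: 4 for BDF2, 8 for the new method.
    print(defect(rho_bdf2, sigma_bdf2, 0.02) / defect(rho_bdf2, sigma_bdf2, 0.01))
    print(defect(rho3, sigma3, 0.02) / defect(rho3, sigma3, 0.01))
```

The roots of the new $\varrho$-polynomial are $1$, $1/3$ and $1 - \varepsilon$, so the extended three-step method remains stable at the origin for $0 < \varepsilon < 2$.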


Approximating Low Order Methods with High Order Ones

The above proof of Lemma 2.3 actually shows more than angle-boundedness of
the root locus curve, namely uniform convergence of the root locus curve of a high
order method to that of a lower order one. This leads to the following theorem of
Jeltsch & Nevanlinna (1982):

Theorem 2.4. Let a linear stable k-step method of order p and stability domain
S be given which satisfies (2.9a). Then to any closed set $\Omega \subset \operatorname{Int} S \subset \overline{\mathbb{C}}$ and any
$K \in \mathbb{N}$ there exists a linear (k + K)-step method of order p + K whose stability
domain $\tilde S$ satisfies

$$ \tilde S \supset \Omega. $$

Moreover if the first method is explicit, the higher-order method is also explicit.

Proof. The proof is similar to that of Lemma 2.3. Instead of the sequence (2.8) we
use a sequence of embedded closed and open subsets between $\Omega$ and S (Urysohn’s
Lemma). Hypothesis (2.9b) is ruled out by passing to the compactified topology of
$\overline{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$. □

Remark. No method with non-empty Int S of practical interest violates
Hypothesis (2.9a). Nevertheless, Theorem 2.4 remains valid without this hypothesis, but
the proof becomes more complicated (see “Lemma 3.6” of Jeltsch & Nevanlinna
1982).

A Disc Theorem

Another weakening of A-stability is to require stability for

$$ D_r = \{\mu \,;\, |\mu + r| \le r\}, \tag{2.15} $$

which is a disc of radius r in $\mathbb{C}^-$ tangent to the imaginary axis at the origin.
Theorems about stability in $D_r$ are stronger than theorems about A(α)-stability
for eigenvalues close to the origin. The following result is, again, due to Jeltsch &
Nevanlinna (1982):

Theorem 2.5. Let a linear k-step method of order p be given with $S \supset D_r$. Then
for any $\tilde r < r$ and any $K \in \mathbb{N}$ there exists a linear (k + K)-step method of order
p + K whose stability domain $\tilde S$ satisfies $\tilde S \supset D_{\tilde r}$.

Proof. The map $\mu \mapsto 1/\mu$ used in the proof of Lemma 2.3 maps the exterior of $D_r$
onto the half-plane

$$ \Bigl\{\mu \in \mathbb{C} \,;\, \operatorname{Re}\mu > -\frac{1}{2r}\Bigr\}. \tag{2.16} $$

Therefore the uniform convergence established in (2.14) also covers the new
situation if p > 1. The case p = 1, however, needs a more careful study and we refer to
the original paper of Jeltsch & Nevanlinna (1982, pp. 277-279). □


Accuracy Barriers for Linear Multistep Methods


Now here is the “bad news”: high order A(α)-stable methods, for α close to
$\pi/2$, cannot be of practical use, or in other words: “the second Dahlquist barrier
cannot be broken”. The reason is simply that high order alone is not sufficient for
high accuracy, because the methods then have enormous error constants. Jeltsch &
Nevanlinna (1982) give an impressive staccato (from “Theorem 4.1” to “Lemma
4.15”) of lower bounds for error constants and Peano kernels of methods having
large stability domains. The Peano kernels, the most serious measures for the error,
are defined by the formulas (see (III.2.15) and (III.2.3) of Volume I)

$$ L(x) = h^{q+1} \int_{-\infty}^{\infty} \widetilde K_q(-s)\, y^{(q+1)}(x + sh)\, ds \tag{2.17} $$

$$ \phantom{L(x)} = \sum_{j=0}^{k} \bigl(\alpha_j\, y(x + jh) - h \beta_j\, y'(x + jh)\bigr). \tag{2.18} $$

The kernels $\widetilde K_q(-s) = K_q(s)$ are zero outside the interval $0 \le s \le k$ and are
piecewise polynomials given by complicated formulas (see (III.2.16)) which appear
not very attractive to work with.

However, the formulas simplify if we use the Fourier transform which, for a
function f(x), is defined by

$$ \hat f(\xi) = \int_{-\infty}^{\infty} e^{-ix\xi} f(x)\, dx. \tag{2.19} $$

We obtain $\hat L$ from (2.17) by insertion of the definitions, several integrations by
parts and transformations of double integrals:

$$ \hat L(\xi) = h^{q+1}\, \hat{\widetilde K}_q(h\xi) \cdot \widehat{y^{(q+1)}}(\xi) \tag{2.20} $$

$$ \phantom{\hat L(\xi)} = \hat{\widetilde K}_q(h\xi)\,(ih\xi)^{q+1}\, \hat y(\xi), \tag{2.21} $$

and from (2.18)

$$ \hat L(\xi) = \bigl(\varrho(e^{ih\xi}) - ih\xi\, \sigma(e^{ih\xi})\bigr)\, \hat y(\xi). \tag{2.22} $$

Thus (2.20) and (2.22) give

$$ \hat K_q(-\xi) = \hat{\widetilde K}_q(\xi) = \bigl(\varrho(e^{i\xi}) - i\xi\, \sigma(e^{i\xi})\bigr)(i\xi)^{-(q+1)}, \tag{2.23} $$

a nice formula, involving the polynomials $\varrho$ and $\sigma$, with which we are better
acquainted.

What about the usefulness of $\hat K_q$ for error estimates? Well, it is the Parseval
identity (Exercise 4)

$$ \|f\|_{L^2(-\infty,\infty)} = \frac{1}{\sqrt{2\pi}}\, \|\hat f\|_{L^2(-\infty,\infty)} \tag{2.24} $$

which allows us to obtain the $L^2$-estimate for the error

$$ \|L\|_{L^2(-\infty,\infty)} \le h^{q+1}\, \|\hat K_q\|_{L^\infty} \cdot \|y^{(q+1)}\|_{L^2}, \tag{2.25} $$

as follows:

$$ \|L\|^2_{L^2(-\infty,\infty)} = \frac{1}{2\pi}\, \|\hat L\|^2_{L^2(-\infty,\infty)} \qquad\text{(from (2.24))} $$

$$ = \frac{h^{2q+2}}{2\pi} \int_{-\infty}^{\infty} |\hat{\widetilde K}_q(h\xi)|^2 \cdot |\widehat{y^{(q+1)}}(\xi)|^2\, d\xi \qquad\text{(from (2.20))} $$

$$ \le \frac{h^{2q+2}}{2\pi}\, \max_\xi |\hat{\widetilde K}_q(\xi)|^2 \cdot \int_{-\infty}^{\infty} |\widehat{y^{(q+1)}}(\xi)|^2\, d\xi \qquad\text{(estimation)} $$

$$ = \frac{h^{2q+2}}{2\pi}\, \|\hat{\widetilde K}_q\|^2_{L^\infty} \cdot \|\widehat{y^{(q+1)}}\|^2_{L^2} \qquad\text{(definitions)} $$

$$ = h^{2q+2}\, \|\hat K_q\|^2_{L^\infty} \cdot \|y^{(q+1)}\|^2_{L^2}. \qquad\text{(from (2.23), (2.24))} $$

In order that the obtained estimates (2.25) for L express the actual errors of the
numerical solution, we adopt throughout this section the normalization $\sigma(1) = 1$
(cf. Eq. (III.2.13)).

And here is the theorem which tells us that linear multistep methods of order
p > 2 and “large” stability domain cannot be precise:

Theorem 2.6 (Jeltsch & Nevanlinna 1982). Consider k-step methods of order p > 2,
normalized by $\sigma(1) = 1$, for which the disc $D_r$ of (2.15) is in the stability domain
S. Then there exists a constant C > 0 (depending on k, p, q; but independent of
r) such that the Fourier transform of the Peano kernel $K_q$ ($q \le p$) satisfies

$$ \|\hat K_q\|_{L^\infty} \ge C \Bigl(\frac{r}{3}\Bigr)^{p-2}. \tag{2.26} $$

The proof of Jeltsch & Nevanlinna is in two steps:

a) The stability requirement forces some coefficients $a_j$ of R(z) to be large
(Lemma 2.7 below), where as in (III.3.17)

$$ R(z) = \Bigl(\frac{z - 1}{2}\Bigr)^k \varrho\Bigl(\frac{z + 1}{z - 1}\Bigr) = \sum_{j=0}^{k} a_j z^j, \tag{2.27} $$

$$ S(z) = \Bigl(\frac{z - 1}{2}\Bigr)^k \sigma\Bigl(\frac{z + 1}{z - 1}\Bigr) = \sum_{j=0}^{k} b_j z^j. \tag{2.28} $$

b) $\|\hat K_q\|_{L^\infty}$ can be bounded from below by $\max_j a_j$ (Lemma 2.8).

Lemma 2.7. If $D_r \subset S$ and p > 2 then

$$ a_{k-j} \ge \Bigl(\frac{r}{3}\Bigr)^{j-1} a_{k-1} = \Bigl(\frac{r}{3}\Bigr)^{j-1} 2^{1-k} \quad\text{for } j = 2, \ldots, p - 1. \tag{2.29} $$


Proof. Stability in $D_r$ means that for $\mu \in D_r$ all roots of $\varrho(\zeta) - \mu\sigma(\zeta) = 0$ lie in
$|\zeta| \le 1$. Hence

$$ \varrho(\zeta)/\sigma(\zeta) \notin D_r \quad\text{for } |\zeta| > 1. \tag{2.30} $$

Applying the Graeco-Roman transformation $\zeta = (z + 1)/(z - 1)$ and using (2.16)
this means that

$$ \operatorname{Re}\frac{S(z)}{R(z)} > -\frac{1}{2r} \quad\text{for } \operatorname{Re} z > 0 \tag{2.31} $$

or

$$ \operatorname{Re}\frac{2r S(z) + R(z)}{R(z)} > 0 \quad\text{for } \operatorname{Re} z > 0. \tag{2.32} $$

Next, we must consider the order conditions (Lemma III.3.7 and Exercise 9 of
Sect. III.3)

$$ S(z) = R(z)\Bigl(\frac{z}{2} - \frac{1}{6z} - \frac{2}{45 z^3} - \ldots\Bigr) + O(z^{k-p}) \quad\text{for } z \to \infty. \tag{2.33} $$

This shows that $R(z) = O(z^{k-1})$, $S(z) = O(z^k)$, but $2S(z) - zR(z) = O(z^{k-1})$.
Thus we subtract $rz$ from (2.32) in order to lower the degree of the numerator. The
resulting function again satisfies

$$ \operatorname{Re}\frac{r\bigl(2S(z) - zR(z)\bigr) + R(z)}{R(z)} > 0 \quad\text{for } \operatorname{Re} z > 0 \tag{2.34} $$

because of $\operatorname{Re}(rz) = 0$ on $z = iy$ and the maximum principle (an idea similar to
that of Lemma IV.5.21). The function (2.34) can therefore have no zeros in $\mathbb{C}^+$
(since by Taylor expansion all arguments of a function appear in a complex
neighbourhood of a zero). Therefore the numerator of (2.34) must have non-negative
coefficients (cf. the proof of Lemma III.3.6). Multiplying out (2.33) and (2.34) we
obtain for the coefficient of $z^{k-j}$ ($j \le p - 1$):

$$ 0 \le r\Bigl(-\frac{1}{3} a_{k-j+1} - \frac{4}{45} a_{k-j+3} - \ldots\Bigr) + a_{k-j} $$

or by simplifying (cf. Lemmas III.3.8 and III.3.6)

$$ \frac{r}{3}\, a_{k-j+1} \le a_{k-j}. $$

Using $a_{k-1} = 2^{1-k} \varrho'(1) = 2^{1-k}$ (see Lemma III.3.6), this leads to (2.29). □


Lemma 2.8. There exists C > 0 (depending on k, p and q with q = 0, 1, ..., p)
with the following property: if $0 \in S$, then

$$ \|\hat K_q\|_{L^\infty} \ge C \cdot \max_j a_j. \tag{2.35} $$


Proof. We set $\xi = -i \log\zeta$ in Eq. (2.23) so that the maximum must be
taken over the set $|\zeta| = 1$. Then we introduce $\zeta = (z + 1)/(z - 1)$ and take the
maximum over the imaginary axis. This gives with (2.27) and (2.28)




R(it) 

(**)* Vlog ^ 


■ S(it) 


2 it 
it — 1 


log 


it 1 
it — 1 


m 


m 


We now insert, for |f | > 1 , Eqs. (III.3.19), (III.3.21) and (III.3.22) to obtain 


(2.36) 


$(f)| 



+ 


<*t 

0 itf + 1 


+ 


C?2 

(: it) k + 2 


(2.37) 


where $P_k$ is a polynomial of degree k and subdegree p (see Lemma III.3.7),
determined by the method. Since we want our estimates to be true for all methods,
we treat $P_k$ as an arbitrary polynomial. Separating real and imaginary parts and
substituting $1/t = s$ gives

$$ |\Phi(t)|^2 = \bigl|Q_{k-1}(s) + d_1 s^{k+1} - d_3 s^{k+3} + - \ldots\bigr|^2 + \bigl|Q_k(s) + d_2 s^{k+2} - d_4 s^{k+4} + - \ldots\bigr|^2 = |\Phi_1(t)|^2 + |\Phi_2(t)|^2 \tag{2.38} $$

where $Q_{k-1}(s)$ and $Q_k(s)$ are arbitrary (even or odd) polynomials of subdegree
p and degree k − 1 and k, respectively. Both terms are minorized separately, e.g.
for the first we write

$$ |\Phi_1(t)| \ge \bigl|Q_{k-1}(s) + d_1 s^{k+1}\bigr| - \bigl|d_3 s^{k+3} - d_5 s^{k+5} + - \ldots\bigr|. \tag{2.39} $$

Since $\mu_1 > \mu_3 > \mu_5 > \ldots > 0$ (Exercise 6 below) and $a_j \ge 0$ we have from (III.3.22)

$$ d_1 \le d_3 \le d_5 \le \ldots \le 0 \quad\text{and}\quad d_2 \le d_4 \le d_6 \le \ldots \le 0. \tag{2.40} $$

Therefore, the second term in (2.39) is majorized by the alternating series argument
for $0 \le s \le 1$ as

$$ \bigl|d_3 s^{k+3} - d_5 s^{k+5} + - \ldots\bigr| \le |d_3|\, s^{k+3} \le |d_1|\, s^{k+3}. $$

Since $Q_{k-1}(s)$ is an arbitrary polynomial, we can replace it by $|d_1|\, Q_{k-1}(s)$ so
that $|d_1|$ becomes a common factor of the whole expression

$$ |\Phi_1(t)| \ge |d_1|\bigl(|Q_{k-1}(s) + s^{k+1}| - s^{k+3}\bigr). \tag{2.41} $$

This suggests that we define the constants 




(2.42) 


where the inf is taken over all polynomials Q k -i{s) = c k _ 1 s k ~ 1 + c k _ 3 s k ~ 3 + 
c k _ 6 s k ~~ 5 + ... respectively Q k (s) = c k s k + c k _ 2 s k ~ 2 + c k _ 4 s k ~^ + ... of sub¬ 
degree p. The last two factors represent 4/(£) of (2.36). Since s k + x dominates 
5 fc + 3 f or sma ii j) l an d p) 2 are positive constants (see Exercise 8 ). We then 
have from (2.38) and (2.36) 


$$ \|\hat K_q\|_{L^\infty} \ge \sqrt{d_1^2 D_1^2 + d_2^2 D_2^2}. \tag{2.43} $$

Since both $d_1$ and $d_2$ are sums of $a_j$ with negative coefficients (see (III.3.22)
and Lemma III.3.8), $\|\hat K_q\|_{L^\infty}$ must be large if one of the coefficients $a_j$ is large.

□


This concludes the proof of Theorem 2.6 which, by the way, also proves
Theorem 1.4 again. □

Exercises

1. Show that no explicit method can be A(0)-stable.

2. Show that $\beta_k/\alpha_k > 0$ is a necessary condition for an A(α)-stable linear k-step
method.

3. a) Show that the method

$$ y_{n+2} - y_{n+1} = \frac{h}{4}\bigl(f_{n+2} + 2 f_{n+1} + f_n\bigr) $$

has a stability domain bounded by a parabola. It is therefore $A_0$-stable, but not
A(0)-stable (Cryer 1973).

b) Find a “deformation” of the 5th order BDF scheme

$$ \sum_{j=1}^{5} \frac{1}{j} \nabla^j y_{n+1} + \beta \nabla^6 y_{n+1} = h f_{n+1} $$

with $\beta \approx 0.232\ldots$ which is Å-stable, but not $A_0$-stable.

c) Find a method which is $A_0$-stable, but not stable at infinity.

Hint for (c). If you “lift up your heads, o ye gates” (just a few lines, not to
heaven), the answer is easy to find.

4. (Parseval 1799). Prove the identity (2.24).

Hint. Insert the definitions into

$$ \int_{-\infty}^{\infty} \hat f(\xi)\, \overline{\hat f(\xi)}\, d\xi $$

to get a triple integral. Two of these integrals then disappear with the Fourier
inversion formula.

Remark. You may be astonished to see that Parseval’s identity is older than
Fourier series and Fourier transforms. Well, Parseval’s identity was originally a
formula between an infinite sum and an integral, which was later re-interpreted
and generalized to become what it is today.

5. Substitute $\xi = \pi$ in Formula (2.23) to obtain an easy minorization for $\|\hat K_q\|_{L^\infty}$.
Then compute for the methods defined in the proof of Lemma 2.3 (normalized
by $\sigma(1) = 1$) the value $\sigma(-1)$ for $\varepsilon$ small. This then shows that $\hat K_q$ becomes
very large.

6. Use the formula (see the proof of Lemma III.3.8)

$$ \mu_{2j+1} = \int_{-1}^{1} x^{2j} \Bigl(\Bigl(\log\frac{1 + x}{1 - x}\Bigr)^2 + \pi^2\Bigr)^{-1} dx $$

to show that $\mu_1 > \mu_3 > \mu_5 > \ldots > 0$.

7. Show that for q = p Eq. (2.23) becomes, by substituting $i\xi = h$ and letting
$h \to 0$ in Eq. (1.20), $\hat K_p(0) = C_{p+1}$, where $C_{p+1}$ is, for $\sigma(1) = 1$, the error
constant.

Formula (2.36) then provides, for p = k and $t \to \infty$, lower bounds for the error
constant (see “Theorem 4.5” of Jeltsch & Nevanlinna 1982).

8. For p = k + 1, the polynomials $Q_{k-1}$ and $Q_k$ in (2.42) vanish identically,
because the subdegree must be p. Compute in this case the constants $D_1$ and
$D_2$. It is also easy to compute them for p = k − 1. In the general case the
optimal solution satisfies a sort of “Tchebysheff alternative”.

Results.

Case p = k + 1 (Q = 0):

              D_1                                 D_2
         p = 3   p = 4   p = 5   p = 6       p = 3   p = 4   p = 5   p = 6
         k = 2   k = 3   k = 4   k = 5       k = 2   k = 3   k = 4   k = 5
  q = 0  0.4742  0.5695  0.7020  0.8813      0.3607  0.4501  0.5706  0.7319
  q = 1  0.3876  0.4435  0.5298  0.6505      0.2754  0.3347  0.4163  0.5263
  q = 2  0.3524  0.3659  0.4152  0.4933      0.2205  0.2570  0.3108  0.3852
  q = 3  0.5000  0.3381  0.3459  0.3891      0.1935  0.2075  0.2400  0.2888
  q = 4          0.5000  0.3251  0.3275              0.1849  0.1956  0.2244
  q = 5                  0.5000  0.3131                      0.1770  0.1845
  q = 6                          0.5000                              0.1698


Case p = k − 1 (one free constant in Q):

              D_1                                 D_2
         p = 3   p = 4   p = 5   p = 6       p = 3   p = 4   p = 5   p = 6
         k = 4   k = 5   k = 6   k = 7       k = 4   k = 5   k = 6   k = 7
  q = 0  0.0511  0.0362  0.0262  0.0193      0.0195  0.0142  0.0104  0.0077
  q = 1  0.0727  0.0499  0.0353  0.0256      0.0269  0.0191  0.0138  0.0101
  q = 2  0.1100  0.0709  0.0486  0.0344      0.0384  0.0263  0.0186  0.0135
  q = 3  0.2031  0.1070  0.0691  0.0474      0.0583  0.0374  0.0256  0.0181
  q = 4          0.1962  0.1041  0.0673              0.0567  0.0365  0.0250
  q = 5                  0.1894  0.1012                      0.0552  0.0356
  q = 6                          0.1828                              0.0537


Case p = k − 3 (two free constants in Q):

              D_1                                 D_2
         p = 3   p = 4   p = 5   p = 6       p = 3   p = 4   p = 5   p = 6
         k = 6   k = 7   k = 8   k = 9       k = 6   k = 7   k = 8   k = 9
  q = 0  0.0030  0.0014  0.0007  0.0003      0.0007  0.0004  0.0002  0.0001
  q = 1  0.0066  0.0029  0.0014  0.0007      0.0015  0.0007  0.0003  0.0002
  q = 2  0.0160  0.0066  0.0029  0.0014      0.0034  0.0015  0.0007  0.0003
  q = 3  0.0457  0.0158  0.0065  0.0029      0.0082  0.0034  0.0015  0.0007
  q = 4          0.0448  0.0156  0.0064              0.0081  0.0033  0.0015
  q = 5                  0.0439  0.0154                      0.0080  0.0033
  q = 6                          0.0431                              0.0079


The Dahlquist bound of two on the order of A-stable multistep
methods was the imperative to propound ... weaker stability
properties, ... An alternative approach for circumventing Dahlquist’s
bound is to modify the class of methods, rather than the property.

(T.A. Bickart & W.B. Rubin 1974)


The search for higher order A-stable multistep methods is carried out in two main
directions:

• Use higher derivatives of the solutions;

• Throw in additional stages, off-step points, super-future points and the like,
which leads into the large field of general linear methods.


Second Derivative Multistep Methods of Enright

Hermite’s formulas are rediscovered and republished every four
years. (P.J. Davis 1963)

Differentiation of a differential equation

$$ y' = f(x, y) \tag{3.1} $$

with respect to x gives the second derivative of the solution

$$ y'' = f_x + f_y \cdot f =: g(x, y), \tag{3.2} $$

which we shall denote by g. Now a straightforward generalization of both
multistep formulas (1.1) and, say, the Taylor series method (see I.8.13)

$$ y_{n+1} = y_n + h f_n + \frac{h^2}{2} g_n $$

can be written in the form

$$ \sum_{i=0}^{k} \alpha_i y_{n+i} = h \sum_{i=0}^{k} \beta_i f_{n+i} + h^2 \sum_{i=0}^{k} \gamma_i g_{n+i} \tag{3.3} $$

where the $\alpha_i$, $\beta_i$, $\gamma_i$ are parameters which must be chosen appropriately. Most of
the theory of linear multistep methods (Sect. III.2) generalizes without difficulty.
Taylor expansion similar to (III.2.5) shows that method (3.3) is of order p if and
only if

$$ \sum_{i=0}^{k} \alpha_i i^q = q \sum_{i=0}^{k} \beta_i i^{q-1} + q(q - 1) \sum_{i=0}^{k} \gamma_i i^{q-2} \tag{3.4} $$

for $0 \le q \le p$. The first two of these formulas are identical to (III.2.6), i.e., to

$$ \varrho(1) = 0, \qquad \varrho'(1) = \sigma(1). \tag{3.5} $$

The error constant (see Eq. (III.2.13) and Exercise 2 of Sect. III.4) is given by

$$ C = \frac{1}{\sigma(1)\,(p + 1)!} \Bigl(\sum_{i=0}^{k} \alpha_i i^{p+1} - (p + 1) \sum_{i=0}^{k} \beta_i i^p - (p + 1)p \sum_{i=0}^{k} \gamma_i i^{p-1}\Bigr). \tag{3.6} $$

A search for a good choice of the free parameters $\alpha_i$, $\beta_i$, $\gamma_i$ was undertaken by
Enright (1974) with the following ideas:

(i) Set $\alpha_k = 1$, $\alpha_{k-1} = -1$, $\alpha_{k-2} = \ldots = \alpha_0 = 0$ to ensure reasonable stability
in a neighbourhood of the origin as in the standard Adams formulas;

(ii) Set $\gamma_k \ne 0$, $\gamma_{k-1} = \ldots = \gamma_0 = 0$ to ensure stability at infinity as in the BDF
formulas;

(iii) Determine the remaining k + 2 coefficients $\gamma_k$, $\beta_k$, $\beta_{k-1}$, ..., $\beta_0$ from
Equations (3.4) for q = 1, 2, ..., k + 2 (q = 0 is satisfied with (i)) to ensure a
reasonably high order.

The result is a class of k-step formulas of order k + 2, which are of the form

$$ y_{n+1} = y_n + h \sum_{j=0}^{k} \beta_j f_{n+j-k+1} + h^2 \gamma_k g_{n+1}. \tag{3.7} $$

The first few of these methods are

$$ k = 1: \quad y_{n+1} = y_n + h\Bigl(\frac{2}{3} f_{n+1} + \frac{1}{3} f_n\Bigr) - \frac{1}{6} h^2 g_{n+1} $$

$$ k = 2: \quad y_{n+1} = y_n + h\Bigl(\frac{29}{48} f_{n+1} + \frac{5}{12} f_n - \frac{1}{48} f_{n-1}\Bigr) - \frac{1}{8} h^2 g_{n+1} $$

$$ k = 3: \quad y_{n+1} = y_n + h\Bigl(\frac{307}{540} f_{n+1} + \frac{19}{40} f_n - \frac{1}{20} f_{n-1} + \frac{7}{1080} f_{n-2}\Bigr) - \frac{19}{180} h^2 g_{n+1} $$

$$ k = 4: \quad y_{n+1} = y_n + h\Bigl(\frac{3133}{5760} f_{n+1} + \frac{47}{90} f_n - \frac{41}{480} f_{n-1} + \frac{1}{45} f_{n-2} - \frac{17}{5760} f_{n-3}\Bigr) - \frac{3}{32} h^2 g_{n+1} $$

For a general expression, see Eq. (3.12) below and Exercise 1.
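
Steps (i) to (iii) amount to a small linear system, which can be solved in exact rational arithmetic. The following sketch is not Enright's own procedure, only a direct transcription of the order conditions (3.4) and of the error constant (3.6); the function names are hypothetical.

```python
import math
from fractions import Fraction as F


def solve(a, b):
    """Gauss-Jordan elimination over the rationals for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[-1] for row in m]


def enright(k):
    """Return ([beta_0, ..., beta_k], gamma_k) from (3.4) for q = 1, ..., k + 2,
    with alpha_k = 1, alpha_{k-1} = -1 and all other alpha_i, gamma_i equal to 0."""
    rows, rhs = [], []
    for q in range(1, k + 3):
        row = [F(q * i ** (q - 1)) for i in range(k + 1)]
        row.append(F(q * (q - 1) * k ** (q - 2)) if q >= 2 else F(0))
        rows.append(row)
        rhs.append(F(k**q - (k - 1) ** q))
    x = solve(rows, rhs)
    return x[:-1], x[-1]


def error_constant(k):
    """Error constant (3.6) for order p = k + 2; sigma(1) = 1 holds by (3.4), q = 1."""
    betas, gamma = enright(k)
    p = k + 2
    s = F(k ** (p + 1) - (k - 1) ** (p + 1))
    s -= (p + 1) * sum(b * i**p for i, b in enumerate(betas))
    s -= (p + 1) * p * gamma * k ** (p - 1)
    return s / math.factorial(p + 1)


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, enright(k), float(error_constant(k)))
```

Here `betas[k]` multiplies $f_{n+1}$ and `betas[0]` the oldest value $f_{n-k+1}$. For k = 1 and k = 2 the error constants come out as 1/72 and 7/1440, i.e. about 0.01389 and 0.00486.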

The stability analysis for second derivative methods is again done by linearizing
and leads to

$$ y' = \lambda y \quad \text{for which} \quad y'' = \lambda^2 y. \tag{3.8} $$

This, inserted into (3.3), gives as the characteristic equation

$$ \sum_{i=0}^{k} \bigl(\alpha_i - \mu\beta_i - \mu^2\gamma_i\bigr)\,\zeta^i = 0, \qquad \mu = h\lambda \tag{3.9} $$

instead of (1.6). Equation (3.9) is, for $\zeta = e^{i\theta}$, a quadratic equation in $\mu$ which gives
rise to two root locus curves which, together, describe the stability domain. The
Enright methods (3.7) turn out to be A-stable for $k = 1$ and 2 (hence for $p = 3$
and 4) and are stiffly stable for $k = 3, 4, 5, 6$ and 7. The corresponding values $\alpha$
(for $A(\alpha)$-stability), $D$ and the error constants $C$ are given in Table 3.1. Pictures
are shown in Fig. 3.1.
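The root locus idea can be illustrated numerically. The following is a rough sketch, assuming the Enright coefficients for $k = 1, 2, 3$ as hard-coded floating point numbers and an equidistant sampling of $\theta$; the names `ENRIGHT`, `root_locus` and `leftmost_real_part` are hypothetical. For each $\zeta = e^{i\theta}$ it solves the quadratic (3.9) for $\mu$ and records the smallest real part encountered.

```python
import cmath
import math

# (beta_0..beta_k, gamma_k) in the indexing of (3.3), for k = 1, 2, 3.
ENRIGHT = {
    1: ([1/3, 2/3], -1/6),
    2: ([-1/48, 5/12, 29/48], -1/8),
    3: ([7/1080, -1/20, 19/40, 307/540], -19/180),
}

def root_locus(k, theta):
    """Return both roots mu of (3.9) for zeta = exp(i*theta)."""
    beta, gamma = ENRIGHT[k]
    zeta = cmath.exp(1j * theta)
    a = -gamma * zeta ** k                                   # coefficient of mu^2
    b = -sum(bi * zeta ** i for i, bi in enumerate(beta))    # coefficient of mu
    c = zeta ** k - zeta ** (k - 1)                          # alpha_k = 1, alpha_{k-1} = -1
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

def leftmost_real_part(k, samples=4000):
    """Smallest real part of mu over both root locus curves."""
    thetas = (2 * math.pi * j / samples for j in range(samples))
    return min(mu.real for t in thetas for mu in root_locus(k, t))
```

If the curves never enter the left half-plane, as one expects for $k = 1$ and $k = 2$, the returned value is non-negative up to rounding; for $k = 3$ it should be slightly negative, of the size of the value $D$ in Table 3.1.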


Table 3.1. Stability characteristics and error constants for Enright methods

k     1         2         3         4         5         6         7
p     3         4         5         6         7         8         9
α     90°       90°       87.88°    82.03°    73.10°    59.95°    37.61°
D     0.        0.        0.103     0.526     1.339     2.728     5.182
C     0.01389   0.00486   0.00236   0.00136   0.00086   0.00059   0.00042



Fig. 3.1. Stability domains of Enright methods 


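Dense Output for Enright Methods. We have seen in Sect. III.1 that Newton's
interpolation formula, based on the data $x_{n+1}, x_n, \ldots, x_{n-k+1}$,

• when integrated from $x_n$ to $x_{n+1}$, leads to the implicit Adams methods;

• when differentiated at $x_{n+1}$, leads to the BDF methods.

It is natural to apply the same idea to Hermite interpolation (Addison 1979): guided
by much previous experience (see above) we choose the data points

$$ x_{n+1}\ \text{(double node)}, \qquad x_n, x_{n-1}, \ldots, x_{n-k+1}\ \text{(simple nodes)}. \tag{3.10} $$

This gives the following scheme of divided differences

$$
\begin{array}{r|llll}
s = 1 & f_1 & & & \\
 & & h f_1' & & \\
s = 1 & f_1 & & h f_1' - \nabla f_1 & \\
 & & \nabla f_1 & & \frac{1}{2!}\bigl(h f_1' - \nabla f_1 - \frac{1}{2}\nabla^2 f_1\bigr) \\
s = 0 & f_0 & & \frac{1}{2!}\nabla^2 f_1 & \\
 & & \nabla f_0 & & \\
s = -1 & f_{-1} & & &
\end{array}
$$

where $x = x_n + sh$. For these "confluent" data, Newton's interpolation formula
becomes

$$
\begin{aligned}
f(x_n + sh) = {} & f_1 + (s-1)\,h f_1' + (s-1)^2 \bigl(h f_1' - \nabla f_1\bigr)
 + (s-1)^2 s\, \frac{h f_1' - \nabla f_1 - \frac{1}{2}\nabla^2 f_1}{2!} \\
 & + (s-1)^2 s (s+1)\, \frac{h f_1' - \nabla f_1 - \frac{1}{2}\nabla^2 f_1 - \frac{1}{3}\nabla^3 f_1}{3!} + \ldots
\end{aligned}
\tag{3.11}
$$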

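We now interpret $f$ as the derivative $f(x, y(x))$ of the solution, so that $f'$ becomes
the second derivative. Integrating Formula (3.11) from $x_n$ to $x_{n+1}$ we obtain

$$ y_{n+1} = y_n + h f_{n+1} - h \sum_{j=1}^{k} \frac{\nabla^j f_{n+1}}{j} \Bigl(\sum_{i=j}^{k} \nu_i\Bigr) + h^2 g_{n+1} \Bigl(\sum_{i=0}^{k} \nu_i\Bigr) \tag{3.12} $$

where

$$ \nu_j = \frac{1}{j!} \int_0^1 (s-1)^2\, s (s+1) \cdots (s+j-2)\, ds = (-1)^j \int_0^1 (s-1) \binom{1-s}{j}\, ds \tag{3.13} $$

(the first expression applies for $j \ge 1$).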
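The coefficients $\nu_j$ of (3.13) are integrals of polynomials and can therefore be evaluated exactly. The sketch below is one possible way to do so, assuming rational arithmetic and ascending coefficient lists; the function names are illustrative only. The optional argument `theta` replaces the upper integration limit 1, which is what a dense output formula needs.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def nu(j, theta=Fraction(1)):
    """nu_j(theta) = (-1)^j * integral_0^theta (s-1)*binom(1-s, j) ds."""
    if j == 0:
        poly = [Fraction(-1), Fraction(1)]                # s - 1
    else:
        poly = [Fraction(1), Fraction(-2), Fraction(1)]   # (s - 1)^2
        for m in range(j - 1):                            # times s(s+1)...(s+j-2)
            poly = poly_mul(poly, [Fraction(m), Fraction(1)])
        poly = [c / factorial(j) for c in poly]
    theta = Fraction(theta)
    # Integrate term by term from 0 to theta.
    return sum(c * theta ** (i + 1) / (i + 1) for i, c in enumerate(poly))

def enright_gamma(k):
    """Coefficient of h^2 g_{n+1} in (3.12): the sum nu_0 + ... + nu_k."""
    return sum(nu(j) for j in range(k + 1))
```

For instance, `enright_gamma(2)` reproduces the factor $-\tfrac{1}{8}$ of the two-step formula in (3.7).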


Table 3.2. Coefficients for Enright methods

j      0      1     2      3       4         5         6            7
ν_j   -1/2    1/3   1/24   7/360   17/1440   41/5040   731/120960   8563/1814400


The first few values of $\nu_j$ are given in Table 3.2 and Eq. (3.12) is seen to be
identical with (3.7). Dense output, of course, is obtained by integrating (3.11) from
$x_n$ to $x_n + \theta h$:

$$ y(x_n + \theta h) \approx y_n + \theta h f_{n+1} - h \sum_{j=1}^{k} \frac{\nabla^j f_{n+1}}{j} \Bigl(\sum_{i=j}^{k} \nu_i(\theta)\Bigr) + h^2 g_{n+1} \Bigl(\sum_{i=0}^{k} \nu_i(\theta)\Bigr) $$

where

$$ \nu_j(\theta) = (-1)^j \int_0^{\theta} (s-1) \binom{1-s}{j}\, ds. $$

Second Derivative BDF Methods


If we are interested in a "second derivative" analogue of the BDF methods, we
replace all $f$'s by $y$'s in (3.11) and differentiate twice at $x_{n+1}$. This, on setting
$y''(x_{n+1}) = g_{n+1}$, results in the methods

$$ \sum_{j=1}^{k} \Bigl(\sum_{i=j}^{k} \frac{1}{i}\Bigr) \frac{\nabla^j y_{n+1}}{j} = \Bigl(\sum_{i=1}^{k} \frac{1}{i}\Bigr) h f_{n+1} - \frac{h^2}{2}\, g_{n+1} \tag{3.14} $$

which we call "Second derivative BDF methods" (SDBDF; the reader is cautioned
against confusion: Cash (1981) uses this expression for the class of "Enright methods").
Analyzing the stability of these methods leads to the parameters of Table 3.3.
The root locus curves are drawn in Fig. 3.2.

In complete analogy to the behaviour of implicit Adams compared to BDF 
methods, the second derivative BDF methods have larger error constants than the 
Enright methods, but allow stiffly stable methods of higher order. 


Table 3.3. Stability characteristics and error constants for SDBDF methods

k     1       2       3       4        5        6        7        8        9        10
p     2       3       4       5        6        7        8        9        10       11
α     90°     90°     90°     89.36°   86.35°   80.82°   72.53°   60.71°   43.39°   12.34°
D     0.      0.      0.      0.015    0.128    0.401    0.886    1.646    2.770    4.373
C     .1667   .0556   .0273   .0160    .0104    .0073    .0054    .0041    .0032    .0026



Fig. 3.2. Root locus curves of SDBDF methods

Blended Multistep Methods


The original motivation for blended methods goes as follows (Skeel & Kong 1977):
We know that Adams methods

$$ -y_{n+1} + y_n + h\bigl(\beta_k f_{n+1} + \beta_{k-1} f_n + \ldots + \beta_0 f_{n-k+1}\bigr) = 0 \qquad (\mathrm{AMF}^{(k+1)}) $$

are a very good choice for nonstiff problems, and that BDF methods

$$ -\bigl(\alpha_k y_{n+1} + \alpha_{k-1} y_n + \ldots + \alpha_0 y_{n-k+1}\bigr) + h f_{n+1} = 0 \qquad (\mathrm{BDF}^{(k)}) $$

are a very good choice for stiff problems. Nonstiff problems are characterized by
the fact that $-h\,\partial f/\partial y$ is small, while stiff problems are characterized by large
$-h\,\partial f/\partial y$ (at first this makes sense only for scalar equations; but it works as well
for systems of equations if we descend into the eigenspaces of the Jacobian matrix
$\partial f/\partial y = J$). The idea is now to use a weighted mean ("blend", a term suggested
by C.W. Gear) of the two methods such as

$$ \{\mathrm{AMF}^{(k+1)}\} - \gamma^{(k)} h J\, \{\mathrm{BDF}^{(k)}\} = 0 \tag{3.15} $$

where $\gamma^{(k)}$ is a free parameter. The factor $-hJ$, when small or large, just puts the
weight at the right place, as required by the above motivation. Taylor expansion
shows that Eq. (3.15) is for all $\gamma^{(k)}$ of order $p = k+1$ (the factor "$h$" in the second
term saves one order), even if $J$ differs from $\partial f/\partial y$. This method is thus a
multistep analogue to the W-methods discussed in Sect. IV.7.


Example. We put $k = 2$ in (3.15) and insert the values from Sect. III.1, Formulas
(III.1.8") and (III.1.22"):

$$ -y_{n+1} + y_n + h\Bigl(\tfrac{5}{12} f_{n+1} + \tfrac{8}{12} f_n - \tfrac{1}{12} f_{n-1}\Bigr) - \gamma^{(2)} h J \Bigl(-\tfrac{3}{2} y_{n+1} + 2 y_n - \tfrac{1}{2} y_{n-1} + h f_{n+1}\Bigr) = 0. \tag{3.16} $$

If we now suppose that our differential equation is linear and autonomous, $y' = Jy$,
then $J y_{n+i} = f_{n+i}$ and the equation simplifies. Two special choices for $\gamma^{(2)}$ are
then interesting:

a) $\gamma^{(2)} = 1/6$: In this case the $f_{n-1}$ cancels with $J y_{n-1}$ and Eq. (3.16) becomes
the $(k-1)$-step Enright formula of order $k+1$;

b) $\gamma^{(2)} = 1/8$: This is a "superconvergence point" for linear equations and we
obtain the $k$-step Enright formula of order $k+2$.
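Both special cases can be verified by collecting coefficients. The following small sketch assumes the linear autonomous problem $y' = Jy$ of the example, so that $hJ y_m = h f_m$ and $hJ \cdot h f_{n+1} = h^2 g_{n+1}$; the names `ADAMS`, `BDF` and `blended_k2` are made up for this illustration.

```python
from fractions import Fraction as F

# Coefficients of h*f_{n+1}, h*f_n, h*f_{n-1} in the Adams part of (3.16).
ADAMS = [F(5, 12), F(8, 12), F(-1, 12)]
# Coefficients of y_{n+1}, y_n, y_{n-1} in the BDF part of (3.16).
BDF = [F(-3, 2), F(2), F(-1, 2)]

def blended_k2(gamma):
    """Rewrite (3.16) as y_{n+1} = y_n + h*sum(beta*f) + g2*h^2*g_{n+1}.

    Returns (beta, g2) with beta ordered as f_{n+1}, f_n, f_{n-1}.
    """
    # hJ*y_m turns into h*f_m, so the BDF weights join the f-coefficients.
    beta = [a - gamma * b for a, b in zip(ADAMS, BDF)]
    # -gamma*hJ*(h f_{n+1}) contributes -gamma*h^2*g_{n+1}.
    g2 = -gamma
    return beta, g2
```

With `gamma = F(1, 6)` the weight of $f_{n-1}$ vanishes and the one-step Enright formula appears; with `gamma = F(1, 8)` one obtains the two-step Enright formula.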

Both properties generalize to arbitrary $k$; in the first case we have to put $\gamma^{(k)} = -k\gamma_k^*$,
where the $\gamma_k^*$ are the values of Table III.1.2, and in the second case we
use $\gamma^{(k)} = -\sum_{i=0}^{k} \nu_i$ as in (3.12). Blended methods therefore share the excellent
stability properties of the Enright methods and seem, at the same time, easier to
implement. A third possibility is to choose $\gamma^{(k)}$ in order to maximize the angle $\alpha$
for $A(\alpha)$-stability. The root-locus-curve equation for general $\gamma^{(k)}$ becomes

$$ \mu^2 \gamma^{(k)} + \mu \Bigl( -\sum_{j=0}^{k} \gamma_j^* (1 - e^{-i\theta})^j - \gamma^{(k)} \sum_{j=1}^{k} \frac{1}{j} (1 - e^{-i\theta})^j \Bigr) + (1 - e^{-i\theta}) = 0. $$

Skeel & Kong (1977) have carefully computed the optimal $\gamma^{(k)}$ (see Table 3.4, the
imprecise values for the "Enright column" have been corrected) and arrived thereby
at stiffly stable methods up to order 12.

Table 3.4. Values for $\gamma^{(k)}$ and corresponding angles $\alpha$ for blended methods

k    p    γ^(k) = -kγ*_k   α        γ^(k)_opt          α for γ^(k) = γ^(k)_opt
1    2    .5               90°      [0, +∞)            90°
2    3    .1666667         90°      [.125, +∞)         90°
3    4    .125             90°      [.12189, .68379]   90°
4    5    .1055556         87.88°   .1284997           89.42°
5    6    .09375           82.03°   .1087264           86.97°
6    7    .08561508        73.10°   .0962596           82.94°
7    8    .07957176        59.95°   .08754864          77.43°
8    9    .07485229        37.61°   .08105624          70.22°
9    10   .07103299        -        .07599875          60.68°
10   11   .06785850        -        .07192937          47.63°
11   12   .06516462        -        .06857226          28.68°


Extended Multistep Methods of Cash


The second possibility for circumventing Dahlquist's barrier, instead of adding
higher derivatives, is to add further stages, additional nodes, or off-step points.
This leads into the huge desert ("A fable of K. Burrage") of general linear methods
which have been discussed in Sect. III.8. Pioneering results for stiff differential
equations are the "composite multistep methods" of Sloate & Bickart (1973),
Bickart & Rubin (1974), the "hybrid" methods of England (1982), and the "extended"
BDF methods of Cash (1980). We shall present the basic ideas for the
latter in some detail. In order to increase stability of the BDF methods, we extend
them by adding a "super-future" point:

$$ \sum_{j=0}^{k} \hat\alpha_j y_{n+j} = h \hat\beta_k f_{n+k} + h \hat\beta_{k+1} f_{n+k+1}, \tag{3.17} $$

where the coefficients are obtained by solving $\sum_j \hat\alpha_j j^q = q \sum_j \hat\beta_j j^{q-1}$ for
$q = 0, 1, \ldots, k+1$ with the normalization $\hat\alpha_k = 1$. Formula (3.17) is then used as
follows (see Fig. 3.3):

(i) Suppose that the solution values $y_n, y_{n+1}, \ldots, y_{n+k-1}$ are available.
Compute $\bar y_{n+k}$ as the solution of the conventional BDF formula

$$ \bar y_{n+k} + \sum_{j=0}^{k-1} \alpha_j y_{n+j} = h \beta_k \bar f_{n+k}, \qquad \alpha_k = 1, \quad \bar f_{n+k} = f(x_{n+k}, \bar y_{n+k}); \tag{3.17i} $$

(ii) Compute $\bar y_{n+k+1}$ as the solution of the same BDF formula advanced by one
step (using $\bar y_{n+k}$ for $y_{n+k}$)

$$ \bar y_{n+k+1} + \alpha_{k-1} \bar y_{n+k} + \sum_{j=0}^{k-2} \alpha_j y_{n+j+1} = h \beta_k \bar f_{n+k+1} \tag{3.17ii} $$

and set $\bar f_{n+k+1} = f(x_{n+k+1}, \bar y_{n+k+1})$;

(iii) Discard $\bar y_{n+k}$, insert $\bar f_{n+k+1}$ into (3.17) and solve for a new $y_{n+k}$ which
serves as the final numerical solution of the method.

The advance of the numerical solution by one step thus requires the solution of
three nonlinear systems of dimension $n$. In stage (i) and stage (iii) we have excellent
initial approximations: the super future point of the previous step and the value
$\bar y_{n+k}$, respectively.
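The three stages can be sketched for the simplest case. The snippet below is a minimal illustration, assuming $k = 1$ and the scalar test problem $y' = \lambda y$, for which every implicit stage can be solved in closed form. For $k = 1$ the conditions stated for (3.17) give $\hat\alpha_0 = -1$, $\hat\beta_1 = 3/2$, $\hat\beta_2 = -1/2$, and stages (i), (ii) use implicit Euler (BDF1). The names `ebdf1_step` and `global_error` are hypothetical.

```python
import math

def ebdf1_step(y, h, lam):
    """One step of the scheme (i)-(iii) with k = 1 for y' = lam*y."""
    mu = h * lam
    y_bar1 = y / (1 - mu)           # stage (i): BDF1 predictor for y_{n+1}
    y_bar2 = y_bar1 / (1 - mu)      # stage (ii): BDF1 advanced to the super-future point
    f_bar2 = lam * y_bar2           # derivative at the super-future point
    # stage (iii): y_new - y = h*(3/2*lam*y_new - 1/2*f_bar2), solved for y_new
    return (y - 0.5 * h * f_bar2) / (1 - 1.5 * mu)

def global_error(steps, lam=-1.0, t_end=1.0):
    """Error at t_end when integrating y(0) = 1 with the given number of steps."""
    h = t_end / steps
    y = 1.0
    for _ in range(steps):
        y = ebdf1_step(y, h, lam)
    return abs(y - math.exp(lam * t_end))
```

Halving the step size should reduce `global_error` by a factor close to 4, in line with the order $k + 1 = 2$ that one expects from the extended formula.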



Lemma 3.1 (Cash 1980). If Formula (3.17) is of order $k+1$ and the BDF methods
used in (3.17i) and (3.17ii) are of order $k$, then the whole predictor-corrector
algorithm (i)-(iii) is of order $k+1$.

Proof. Suppose that $y_n, \ldots, y_{n+k-1}$ are on the exact solution (Fig. 3.3). Then a
simple calculation (as in the proof of Lemma III.2.2, see also Eq. (III.2.7)) shows
that

$$ y(x_{n+k}) - \bar y_{n+k} = C_1 h^{k+1} y^{(k+1)}(x_{n+k}) + \mathcal{O}(h^{k+2}) \tag{3.18} $$

$$ y(x_{n+k+1}) - \bar y_{n+k+1} = C_1 \Bigl(1 - \frac{\alpha_{k-1}}{\alpha_k}\Bigr) h^{k+1} y^{(k+1)}(x_{n+k}) + \mathcal{O}(h^{k+2}) \tag{3.19} $$

where $C_1$ depends on the BDF method used. If now $C_2 h^{k+2} y^{(k+2)}(x_{n+k})$ is the
defect of Eq. (3.17) (for the exact solution), replacing $h f(x_{n+k+1}, y(x_{n+k+1}))$ by
$h f(x_{n+k+1}, \bar y_{n+k+1})$ adds $h \hat\beta_{k+1}\, \partial f/\partial y$ times the expression obtained in (3.19) to this error and we
obtain

$$ y(x_{n+k}) - y_{n+k} = h^{k+2} \Bigl( C_2\, y^{(k+2)} + \hat\beta_{k+1} C_1 \Bigl(1 - \frac{\alpha_{k-1}}{\alpha_k}\Bigr) \frac{\partial f}{\partial y} \cdot y^{(k+1)} \Bigr)(x_{n+k}) + \mathcal{O}(h^{k+3}). \tag{3.20} $$

The method is thus of order $k+1$. Like Runge-Kutta methods, but unlike linear
multistep methods, the principal error term is composed of several "elementary
differentials". □


Modified EBDF Methods. A disadvantage of the above algorithm is that stages (i)
and (ii) represent nonlinear systems with the same Jacobian $I - h\beta_k J$, but stage
(iii) has a different Jacobian $I - h\hat\beta_k J$. This requires an extra LU-decomposition.
The idea is to modify Eq. (3.17) for stage (iii) as follows (Cash 1983):

$$ \sum_{j=0}^{k} \hat\alpha_j y_{n+j} = h \beta_k f_{n+k} + h (\hat\beta_k - \beta_k) \bar f_{n+k} + h \hat\beta_{k+1} \bar f_{n+k+1}. \tag{3.17.mod} $$

This just adds an extra $h^{k+2}$-term to the above proof and does not alter the order
of the method. It allows the same Jacobian to be used in the Newton iteration for
all three stages, and, possibly, to preserve it over several steps as well.

Stability Analysis. We insert $h f_j = \mu y_j$ in (3.17.mod), (3.17i) and (3.17ii), set
$y_n = 1$, $y_{n+1} = \zeta, \ldots, y_{n+k-1} = \zeta^{k-1}$ and compute, following the algorithm (i),
(ii), (iii), the solution $y_{n+k} =: \zeta^k$. This gives the characteristic equation

$$ A\mu^3 + B\mu^2 + C\mu + D = 0 \tag{3.21} $$

where

$$
\begin{aligned}
A &= \beta_k^3 \zeta^k \\
B &= -2\beta_k^2 \zeta^k + \beta_k(\hat\beta_k - \beta_k) R + \beta_k \hat\beta_{k+1} S - \beta_k^2 T \\
C &= \beta_k \zeta^k + (\alpha_{k-1}\hat\beta_{k+1} - \hat\beta_k + \beta_k) R - \hat\beta_{k+1} S + 2\beta_k T \\
D &= -T
\end{aligned}
\tag{3.22}
$$

$$ R = \sum_{j=0}^{k-1} \alpha_j \zeta^j, \qquad S = \sum_{j=0}^{k-2} \alpha_j \zeta^{j+1}, \qquad T = \sum_{j=0}^{k} \hat\alpha_j \zeta^j. $$

Inserting $\zeta = e^{i\theta}$, Equation (3.21) gives us three roots $\mu_i(\theta)$, $i = 1, 2, 3$, which describe
the stability domain. These, computed by Cardano's formula, are displayed
in Fig. 3.4. The corresponding stability characteristics are given in Table 3.5. The
methods are A-stable for $p \le 4$ and are stiffly stable for orders up to 9.

Table 3.5. Stability measures for Cash's modified EBDF methods

k     1     2     3     4        5        6        7        8
p     2     3     4     5        6        7        8        9
α     90°   90°   90°   88.36°   83.07°   74.48°   61.98°   42.87°
D     0.    0.    0.    0.040    0.246    0.684    1.402    2.432
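The stability computation can be replayed for the simplest member of the family. The sketch below assumes $k = 1$, where the BDF formula is implicit Euler ($\alpha_0 = -1$, $\beta_1 = 1$) and the extended formula has $\hat\alpha_0 = -1$, $\hat\beta_1 = 3/2$, $\hat\beta_2 = -1/2$; function names are illustrative. It runs the modified algorithm on $y' = \lambda y$ and evaluates the left-hand side of the cubic (3.21), which should vanish for the computed $\zeta$.

```python
def modified_ebdf1_amplification(mu):
    """zeta = y_{n+1}/y_n from stages (i), (ii) and (3.17.mod) with k = 1."""
    beta = 1.0                            # beta_k of BDF1
    beta_hat, beta_hat_next = 1.5, -0.5   # hatted coefficients of (3.17), k = 1
    y_bar1 = 1.0 / (1 - beta * mu)        # stage (i), starting from y_n = 1
    y_bar2 = y_bar1 / (1 - beta * mu)     # stage (ii)
    rhs = 1.0 + mu * (beta_hat - beta) * y_bar1 + mu * beta_hat_next * y_bar2
    return rhs / (1 - beta * mu)          # stage (iii) uses the same factor 1 - beta*mu

def cubic_residual(mu, zeta):
    """Left-hand side of (3.21) for k = 1: R = alpha_0 = -1, S = 0, T = zeta - 1."""
    beta, beta_hat, beta_hat_next, alpha_prev = 1.0, 1.5, -0.5, -1.0
    R, S, T = -1.0, 0.0, zeta - 1.0
    A = beta ** 3 * zeta
    B = (-2 * beta ** 2 * zeta + beta * (beta_hat - beta) * R
         + beta * beta_hat_next * S - beta ** 2 * T)
    C = (beta * zeta + (alpha_prev * beta_hat_next - beta_hat + beta) * R
         - beta_hat_next * S + 2 * beta * T)
    D = -T
    return A * mu ** 3 + B * mu ** 2 + C * mu + D
```

On the imaginary axis the modulus of `modified_ebdf1_amplification(1j * y)` stays bounded by 1, which is consistent with the entry $\alpha = 90°$ for $k = 1$ in Table 3.5.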



Multistep Collocation Methods 

... a theorem of great antiquity ... the simple theorem of poly¬ 
nomial interpolation upon which much practical numerical anal¬ 
ysis rests ... 

(P.J. Davis, Interp. and Approx., Chapter II, 1963) 

There are essentially two possibilities to extend the idea of collocation, which is
so successful in the Runge-Kutta case (see Sect. II.7, Formulas (II.7.16)), into the
multistep scene:

a) In a Nordsieck type manner: with given $y_n, h y_n', \ldots$ compute $y_{n+1}$,
$h y_{n+1}'$, $h^2 y_{n+1}''/2, \ldots$ The result is a spline function which approximates the solution
globally. Butcher's generalized singly-implicit methods (Butcher 1981) are
of this type. Extensive studies of these methods are due to Mülthei (1982).

b) In a multistep manner: with given $y_n, y_{n-1}, \ldots, y_{n-k+1}$ compute $y_{n+1}$,
then discard, as usual, the last point $y_{n-k+1}$ and continue. This possibility was
first proposed and analysed by Guillou & Soulé (1969). It is also the subject of a
paper by Lie & Nørsett (1989) and will retain our attention here in more detail. In
evident generalization of Definition II.7.6, the method is defined as follows:

Definition 3.2. Let $s$ real numbers $c_1, \ldots, c_s$ (typically between 0 and 1) be given
and $k$ solution values $y_n, y_{n-1}, \ldots, y_{n-k+1}$. Then define the corresponding collocation
polynomial $u(x)$ of degree $s + k - 1$ by (see Fig. 3.5)

$$ u(x_j) = y_j, \qquad j = n-k+1, \ldots, n \tag{3.23a} $$

$$ u'(x_n + c_i h) = f\bigl(x_n + c_i h,\, u(x_n + c_i h)\bigr), \qquad i = 1, \ldots, s. \tag{3.23b} $$

The numerical solution is then

$$ y_{n+1} := u(x_{n+1}). \tag{3.23c} $$



If we suppose the derivatives $u'(x_n + c_i h)$ are known, Eqs. (3.23a) and (3.23b)
constitute a Hermite interpolation problem with incomplete data: the function values
at $x_n + c_i h$ are missing. We therefore have no nice formulas and reduce the
problem to a linear algebraic equation. We introduce the dimensionless coordinate
$t = (x - x_n)/h$, $x = x_n + th$, nodes $t_1 = -k+1, \ldots, t_{k-1} = -1$, $t_k = 0$ and
define polynomials $\varphi_i(t)$ ($i = 1, \ldots, k$) of degree $s + k - 1$ by

$$
\begin{aligned}
\varphi_i(t_j) &= \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} && j = 1, \ldots, k \\
\varphi_i'(c_j) &= 0 && j = 1, \ldots, s
\end{aligned}
\tag{3.24}
$$

and polynomials $\psi_i(t)$ ($i = 1, \ldots, s$) by

$$
\begin{aligned}
\psi_i(t_j) &= 0 && j = 1, \ldots, k \\
\psi_i'(c_j) &= \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} && j = 1, \ldots, s.
\end{aligned}
\tag{3.25}
$$

This makes these polynomials a (generalized) Lagrange basis and the polynomial
$u(x)$ is readily written as

$$ u(x_n + th) = \sum_{j=1}^{k} \varphi_j(t)\, y_{n-k+j} + h \sum_{j=1}^{s} \psi_j(t)\, u'(x_n + c_j h). \tag{3.26} $$

Formulas (3.24) and (3.25) do not always have a solution (Exercise 4 below). A
convenient way of computing them is indicated in Exercise 5. Putting $t = c_i$ in
(3.26), writing $u(x_n + c_i h) = v_i$ and inserting the collocation condition (3.23b)
we obtain

$$ v_i = \sum_{j=1}^{k} \varphi_j(c_i)\, y_{n-k+j} + h \sum_{j=1}^{s} \psi_j(c_i)\, f(x_n + c_j h, v_j) \tag{3.27a} $$

$$ y_{n+1} = \sum_{j=1}^{k} \varphi_j(1)\, y_{n-k+j} + h \sum_{j=1}^{s} \psi_j(1)\, f(x_n + c_j h, v_j) \tag{3.27b} $$

a general linear method as defined in (III.8.7).


Theorem 3.3. The collocation method (3.23) is equivalent to the general linear
method

$$
\begin{aligned}
v_i &= \sum_{j=1}^{k} a_{ij}\, y_{n-k+j} + h \sum_{j=1}^{s} b_{ij}\, f(x_n + c_j h, v_j) \qquad i = 1, \ldots, s \\
y_{n+1} &= \sum_{j=1}^{k} a_{k+1,j}\, y_{n-k+j} + h \sum_{j=1}^{s} b_{k+1,j}\, f(x_n + c_j h, v_j)
\end{aligned}
\tag{3.28}
$$

where

$$ a_{ij} = \varphi_j(c_i), \qquad b_{ij} = \psi_j(c_i), \qquad a_{k+1,j} = \varphi_j(1), \qquad b_{k+1,j} = \psi_j(1) \tag{3.29} $$

and $\varphi_j$, $\psi_j$ are polynomials defined by (3.24) and (3.25). Formula (3.26)
provides a continuous output. □


A straightforward extension of the proof of Theorem II.7.9, again using the
Gröbner & Alekseev formula (I.14.18), yields

Theorem 3.4 (Guillou & Soulé 1969). If the quadrature formula (3.27b) is exact
for polynomials $g(t)$ of degree $< s + k + r$, i.e., $\sum_{j=1}^{k} \varphi_j(1) = 1$ and

$$ \sum_{j=1}^{k} \varphi_j(1) \int_{j-k}^{1} g(t)\, dt = \sum_{i=1}^{s} \psi_i(1)\, g(c_i), $$

then the multistep collocation method (3.28) also has order $s + k + r$. □
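The basis polynomials of (3.24) and (3.25) are determined by a linear system in the monomial coefficients, so the weights $\varphi_j(1)$ and $\psi_i(1)$ of (3.27b) can be computed directly. The sketch below is one possible realization in exact arithmetic; `solve_exact` and `collocation_weights` are illustrative names, and the monomial basis is chosen for brevity, not for numerical robustness.

```python
from fractions import Fraction

def solve_exact(matrix, rhs):
    """Gauss-Jordan elimination over Fractions; returns None for a singular matrix."""
    n = len(matrix)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def collocation_weights(k, c):
    """Return ([phi_j(1)], [psi_i(1)]) for nodes t_j = j - k and collocation points c.

    Returns None when the interpolation problem (3.24)/(3.25) is singular.
    """
    c = [Fraction(ci) for ci in c]
    size = k + len(c)                                   # number of monomial coefficients
    nodes = [Fraction(j - k) for j in range(1, k + 1)]
    rows = [[t ** m for m in range(size)] for t in nodes]                    # p(t_j)
    rows += [[m * ci ** (m - 1) if m else Fraction(0) for m in range(size)]  # p'(c_i)
             for ci in c]
    weights = []
    for r in range(size):
        unit = [Fraction(int(i == r)) for i in range(size)]
        coeffs = solve_exact(rows, unit)
        if coeffs is None:
            return None
        weights.append(sum(coeffs))                     # basis polynomial evaluated at t = 1
    return weights[:k], weights[k:]
```

As a plausibility check, `collocation_weights(2, [1])` gives the weights $-\tfrac13, \tfrac43$ and $\tfrac23$, i.e. the two-step BDF formula, while the data of Exercise 4a (`k = 2`, `c = [Fraction(-1, 2)]`) are reported as singular.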





Methods of “Radau” Type 

Nous allons maintenant etudier une classe de formules qui gene¬ 
ralise les formules ordinaires de Gauss, Radau et Lobatto. 

(Guillou & Soule 1969) 

An interesting question is now how to choose the nodes $c_i$ in order to obtain the
highest possible order. Using an elegant idea of Krylov (1959) (see the last chapter
of his book on integration), Guillou & Soulé (1969) and Lie & Nørsett (1989) constructed
such methods of maximal order $p = 2s + k - 1$. Unfortunately, these methods
are not stiffly stable and therefore of no use for stiff problems. Consequently,
we fix $c_s = 1$ to achieve stability at infinity and try to determine $c_1, \ldots, c_{s-1}$ so
that the order becomes $p = 2s + k - 2$. Because of Theorem 3.4, it is sufficient to
consider quadrature problems.

And now to Krylov's idea for integrals, adapted to our situation. We fill in the
gaps in the data for Hermite interpolation, i.e., we suppose that the function values
$u_i = u(x_n + c_i h)$ ($i = 1, \ldots, s-1$) are known and we extend our Lagrange basis
accordingly: firstly, we add polynomials $\chi_1(t), \ldots, \chi_{s-1}(t)$ of degree $2s + k - 2$
which must satisfy

$$ \chi_i(t_j) = 0 \qquad j = 1, \ldots, k \tag{3.30a} $$

$$ \chi_i'(c_j) = 0 \qquad j = 1, \ldots, s \tag{3.30b} $$

$$ \chi_i(c_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \qquad j = 1, \ldots, s-1 \tag{3.30c} $$

(Caution: the last condition is not for $j = s$, because $c_s$ is not a free node). Secondly,
the polynomials $\varphi_i(t)$ and $\psi_i(t)$ are replaced by $\tilde\varphi_i(t)$, $\tilde\psi_i(t)$ of degree
$2s + k - 2$ which, in addition to (3.24) and (3.25), must satisfy

$$ \tilde\varphi_i(c_j) = 0 \quad \text{and} \quad \tilde\psi_i(c_j) = 0 \qquad j = 1, \ldots, s-1. \tag{3.31} $$

Then Eq. (3.26) is replaced by

$$ u(x_n + th) = \sum_{j=1}^{k} \tilde\varphi_j(t)\, y_{n-k+j} + \sum_{j=1}^{s-1} \chi_j(t)\, u_j + h \sum_{j=1}^{s} \tilde\psi_j(t)\, u'(x_n + c_j h), \tag{3.32} $$

and (3.27b) becomes the integration formula

$$ y_{n+1} = \sum_{j=1}^{k} \tilde\varphi_j(1)\, y_{n-k+j} + \sum_{j=1}^{s-1} \chi_j(1)\, u_j + h \sum_{j=1}^{s} \tilde\psi_j(1)\, u'(x_n + c_j h) \tag{3.33} $$

which is of order $2s + k - 2$. If now, by a miracle, all coefficients

$$ \chi_j(1) = 0 \qquad (j = 1, \ldots, s-1) \tag{3.34} $$

were zero, then the quadrature Formula (3.27b) would become equal to (3.33),
since by uniqueness the remaining coefficients $\tilde\varphi_j(1)$ and $\tilde\psi_j(1)$ must also be equal
to $\varphi_j(1)$ and $\psi_j(1)$.

Theorem 3.5. If the collocation points $c_1, \ldots, c_{s-1}$ (with $c_s = 1$) are chosen such
that the polynomials $\varphi_i(t)$, $\psi_i(t)$ of (3.24), (3.25) exist uniquely and that (3.34) is
true, then the collocation method (3.28) is of highest possible order $2s + k - 2$. □


Computation of the Nodes. Equation (3.34) together with the conditions (3.30)
allow us to write the polynomials $\chi_i(t)$ in the simple form

$$ \chi_i(t) = C \prod_{j=1}^{k} (t - t_j) \prod_{\substack{j=1 \\ j \ne i}}^{s} (t - c_j)^2 \tag{3.35} $$

where $C$ is determined by $\chi_i(c_i) = 1$. This then satisfies all derivative requirements
(3.30b), except at $c_i$. $\chi_i'(c_i)$ is readily computed from (3.35) by taking
logarithms and the conditions $\chi_i'(c_i) = 0$ give

$$ \sum_{\substack{j=1 \\ j \ne i}}^{s} \frac{2}{c_i - c_j} + \sum_{j=1}^{k} \frac{1}{c_i - t_j} = 0, \qquad i = 1, \ldots, s-1. \tag{3.36} $$

Example. For the case $s = 3$, Eqs. (3.36) become ($c_3 = 1$)

$$
\begin{aligned}
\frac{2}{c_1 - c_2} + \frac{2}{c_1 - 1} + \sum_{j=1}^{k} \frac{1}{c_1 - t_j} &= 0 \\
\frac{2}{c_2 - c_1} + \frac{2}{c_2 - 1} + \sum_{j=1}^{k} \frac{1}{c_2 - t_j} &= 0.
\end{aligned}
\tag{3.37}
$$

These two equations can easily be solved for $c_2$ and $c_1$ respectively, and lead to
the curves displayed for $k = 3$ and $k = 4$ in Fig. 3.6. We see that a huge number
of solutions is possible (precisely $\binom{s+k-2}{s-1}$; Krylov imagined charged electrical
particles in equilibrium to prove their existence), but most of these lead to totally
unstable and therefore useless methods (in the sense of Sect. III.3). Thus the only
choice which we retain are the rightmost solutions $c_i$ with $0 < c_1, c_2 < 1$, shown
in Table 3.6 below. In addition, as Krylov has shown (see Krylov (1959), English
translation 1962, p. 329) this choice leads to the smallest error constant (for once,
stability and small error are not in conflict!)
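The node equations for $s = 3$ can be checked against tabulated values. The following sketch simply evaluates the two left-hand sides of (3.37) for step nodes $t_j = j - k$; the function name and the dictionary are illustrative, and the node values are the first three rows of Table 3.6.

```python
def node_residuals(k, c1, c2):
    """Left-hand sides of (3.37) for s = 3, c_3 = 1 and step nodes t_j = j - k."""
    step_nodes = [j - k for j in range(1, k + 1)]
    r1 = 2 / (c1 - c2) + 2 / (c1 - 1) + sum(1 / (c1 - t) for t in step_nodes)
    r2 = 2 / (c2 - c1) + 2 / (c2 - 1) + sum(1 / (c2 - t) for t in step_nodes)
    return r1, r2

# k: (c1, c2), copied from Table 3.6.
TABLE_3_6 = {
    1: (0.155051025721682, 0.644948974278318),
    2: (0.177891722985607, 0.673235257220651),
    3: (0.192169638937766, 0.689317969824851),
}
```

Both residuals should be zero up to the accuracy of the tabulated digits. For $k = 1$ the nodes coincide with $(4 \mp \sqrt{6})/10$, the nodes of the three-stage Radau IIA method.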


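Fig. 3.6. Solutions of (3.37). × unstable, □ stable

Stability of the Radau-type Methods. The stability analysis of the Radau methods
is done by inserting $y' = \lambda y$ into (3.28). Since $c_s = 1$ we have $y_{n+1} = v_s$ and
thus obtain (for $s = 3$ and $k = 3$) the characteristic equation

$$
\begin{pmatrix}
1 - \mu b_{11} & -\mu b_{12} & -\mu b_{13} \\
-\mu b_{21} & 1 - \mu b_{22} & -\mu b_{23} \\
-\mu b_{31} & -\mu b_{32} & 1 - \mu b_{33}
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ \zeta^3 \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\begin{pmatrix} 1 \\ \zeta \\ \zeta^2 \end{pmatrix}
$$

or

$$
\zeta^3 = (0, 0, 1)
\begin{pmatrix}
1 - \mu b_{11} & -\mu b_{12} & -\mu b_{13} \\
-\mu b_{21} & 1 - \mu b_{22} & -\mu b_{23} \\
-\mu b_{31} & -\mu b_{32} & 1 - \mu b_{33}
\end{pmatrix}^{-1}
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\begin{pmatrix} 1 \\ \zeta \\ \zeta^2 \end{pmatrix}
\tag{3.38}
$$

which, when multiplied by $\det(I - \mu B)$, becomes a polynomial of degree 3 in $\mu$.
For a general multistep collocation method (3.28) we obtain in this way

$$ q_k(\mu)\,\zeta^k + q_{k-1}(\mu)\,\zeta^{k-1} + \ldots + q_0(\mu) = 0 $$

where $q_k(\mu) = \det(I - \mu B)$ and all $q_j(\mu)$ are polynomials of degree at most $s$.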

The root locus curves of Fig. 3.7 were again obtained by Cardano's formula.
Coefficients and stability measures are given in Table 3.6. The methods for $k = 1, 2$
(orders $p = 5$ and 6) are A-stable. The subsequent methods have surprisingly
large $\alpha$-values for very high orders (up to $p \approx 20$), which makes this class very
promising.


Table 3.6. Coefficients and stability measures for multistep Radau methods (s = 3)

k    p    c1                  c2                  c3   α        D
1    5    0.155051025721682   0.644948974278318   1.   90°      0.000
2    6    0.177891722985607   0.673235257220651   1.   90°      0.000
3    7    0.192169638937766   0.689317969824851   1.   89.73°   0.016
4    8    0.202814874040288   0.700407719104611   1.   89.13°   0.084
5    9    0.211395456069620   0.708798418188500   1.   88.61°   0.178
6    10   0.218626151232186   0.715507419158199   1.   88.14°   0.278
7    11   0.224897548200883   0.721072684914921   1.   87.70°   0.376
8    12   0.230448266933707   0.725812172023161   1.   87.28°   0.467
9    13   0.235435607740434   0.729928926504599   1.   86.89°   0.555
10   14   0.239969169367303   0.733560240031675   1.   86.51°   0.649
11   15   0.244128606044551   0.736803122952198   1.   86.14°   0.763
12   16   0.247973766491964   0.739728565298052   1.   85.79°   0.917
13   17   0.251550844436705   0.742390019356757   1.   85.44°   1.135
14   18   0.254896295040291   0.744828697795402   1.   85.07°   1.462
15   19   0.258039429919700   0.747077018862741   1.   84.68°   1.995
16   20   0.261004194709515   0.749160923778290   1.   84.23°   3.037

Fig. 3.7. Root locus curves for multistep Radau methods (s = 3)


Exercises


1. Show that the coefficients $\nu_j$ in (3.13) for the Enright methods can be computed
recursively by

$$ \nu_j = -\frac{1}{(j+1)(j+2)} - \sum_{k=0}^{j-1} \nu_k\, S_{j+1-k} \qquad \text{where} \qquad S_l = \sum_{k=1}^{l} \frac{1}{k\,(l+1-k)}. $$

Hint. See the proof of Eq. (III.1.7). The generating function $G(t) = \sum_{j \ge 0} \nu_j t^j$
becomes here $\int_0^1 (s-1)(1-t)^{1-s}\, ds$.


2. The Enright Formulas are stiffly stable for $k \le 7$ and are not stiffly stable, as
one can easily inspect, e.g. by a computer plot, for $k = 8$, $k = 9$, ... and so
on. Hence, everybody agrees that they are not stiffly stable for any $k > 7$.
However, no rigorous proof has been found for this, as for instance the proof
of Theorem III.3.4. Why don't you try to find one?

3. Prove that the second derivative BDF methods (3.14) are unstable (in the sense
of Sect. III.3) for $k \ge 11$.


4. a) Show that for $k = 2$, $t_1 = -1$, $t_2 = 0$, $s = 1$, $c_1 = -1/2$ neither equations
(3.24) nor equations (3.25) possess a solution.

b) Show that (3.24) and (3.25) always admit unique solutions if all $c_i$ are distinct
and satisfy $c_i > 0$.

Hint for (b). If $\varphi_i$ (or $\psi_i$) are written as $\sum_{l=1}^{k+s} a_l t^{l-1}$, then (3.24) and (3.25)
become linear systems with the same matrix and different right-hand sides.
The corresponding homogeneous system then possesses a non-zero solution iff
the interpolation problem

$$
\begin{aligned}
p(t_j) &= 0 && j = 1, \ldots, k \\
p'(c_j) &= 0 && j = 1, \ldots, s
\end{aligned}
\tag{3.39}
$$

has a non-zero solution. Since $p'(t)$ has at most $k + s - 2$ real zeros and since
(Rolle's theorem) each interval $(t_i, t_{i+1})$ must contain at least one of these,
there can be at most $s - 1$ zeros beyond $t_k = 0$.


5. A convenient way of computing the polynomials (3.24), (3.25) (written here
for the case $s = 3$) is to put

$$ \varphi_i(t) = (a_1 + a_2 t + a_3 t^2 + a_4 t^3) \prod_{\substack{l=1 \\ l \ne i}}^{k} (t - t_l). \tag{3.40} $$

Show that Eqs. (3.24) then become the following linear
system

$$
\begin{aligned}
a_1 + t_i a_2 + t_i^2 a_3 + t_i^3 a_4 &= 1/r_i \\
s_j a_1 + (s_j c_j + 1) a_2 + (s_j c_j^2 + 2 c_j) a_3 + (s_j c_j^3 + 3 c_j^2) a_4 &= 0, \qquad j = 1, 2, 3
\end{aligned}
\tag{3.41}
$$

where $r_i = \prod_{l=1,\, l \ne i}^{k} (t_i - t_l)$ and $s_j = \sum_{l=1,\, l \ne i}^{k} \dfrac{1}{c_j - t_l}$. Secondly, for

$$ \psi_i(t) = (a_1 + a_2 t + a_3 t^2) \prod_{l=1}^{k} (t - t_l) \tag{3.42} $$

Eq. (3.25) becomes

$$ s_j a_1 + (s_j c_j + 1) a_2 + (s_j c_j^2 + 2 c_j) a_3 = \begin{cases} 0 & \text{if } j \ne i \\ 1/r_j & \text{if } j = i \end{cases} \qquad j = 1, 2, 3 $$

where $r_j = \prod_{l=1}^{k} (c_j - t_l)$ and $s_j = \sum_{l=1}^{k} \dfrac{1}{c_j - t_l}$.

6. Generalize the proof and the result of Theorem IV.3.10 to multistep collocation
methods.

Hint. Instead of KM(x ) in (IV.3.26) we have to insert a linear combination 

Y,t=i a £^e( x ) where M e (x) = M(x) • , M(x) = ^ Yl* =1 (x ~ c i ) an d 

a 1 ,..., a k are arbitrary. Instead of (IV.3.27) we then obtain 


k 


“W=E a <E 

1=1 3= 0 




(3.43) 


Putting a; = t 1) t 2 ,..., t k , and = y { gives an overdetermined sys¬ 
tem for a 1 ,..., a k which has a solution only if a certain determinant is zero. 
Setting y 1 = 1 , y 2 = C» 2/3 = C 2 > • • • there leads to the characteristic equation 


det 


Y, a j =o M i ) {h)n 8 - i 


ve;= 0 Mi 0) (4 + iV s - j ' 


c 

EEo M fc j) (*fc+i)^“ j C fc y 


as a generalization of (IV.3.22,23). Tedious expansions of this determinant 
into powers of ( and /i (with many coefficients equal to zero) then leads to an 
explicit expression (see Theorem 7 of Lie 1990). 


7. Prove that the 2-step 2-stage collocation method with $c_2 = 1$ is A-stable iff
$c_1 > (\sqrt{17} - 1)/8$.

Hint. a) Show that the characteristic equation is $q_2(\mu)\zeta^2 + q_1(\mu)\zeta + q_0(\mu) = 0$,
where

$$
\begin{aligned}
q_2(\mu) &= -(9c_1 + 5) + \mu(3c_1^2 + 7c_1 + 2) - \mu^2\, 2c_1(c_1 + 1) \\
q_1(\mu) &= 12c_1 + 4 - \mu\, 4(c_1^2 - 1) \\
q_0(\mu) &= -3c_1 + 1 + \mu\, c_1(c_1 - 1).
\end{aligned}
\tag{3.44}
$$

b) Apply Schur's criterion (1918) to the polynomial (3.44) with $\mu = it$, $t \in \mathbb{R}$.

Schur's criterion. Let $a(\zeta) = a_k \zeta^k + a_{k-1} \zeta^{k-1} + \ldots + a_0$ ($a_k \ne 0$) be a
polynomial with complex coefficients and set

$$ a^*(\zeta) = \bar a_0 \zeta^k + \bar a_1 \zeta^{k-1} + \ldots + \bar a_k. $$

Then, all zeros of $a(\zeta)$ lie inside the unit circle, iff

i) $|a_0| < |a_k|$

ii) the zeros of $\zeta^{-1}\bigl(a^*(0)\,a(\zeta) - a(0)\,a^*(\zeta)\bigr)$, a polynomial of degree $k - 1$,
are all inside the unit circle.
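Schur's criterion lends itself to a recursive test, since condition ii) refers to a polynomial of lower degree. The following sketch applies the reduction repeatedly until a constant remains; the function name and the ascending ordering of the coefficient list are choices made here for illustration.

```python
def schur_inside_unit_circle(coeffs):
    """Schur's criterion for a(z) = coeffs[0] + coeffs[1]*z + ... + coeffs[k]*z^k.

    Returns True iff all zeros lie strictly inside the unit circle.
    """
    a = [complex(c) for c in coeffs]
    while len(a) > 1:
        k = len(a) - 1
        # Condition i): |a_0| < |a_k|.
        if not abs(a[0]) < abs(a[k]):
            return False
        # Condition ii): coefficients of (a*(0)*a(z) - a(0)*a*(z)) / z,
        # where a*(0) = conj(a_k) and the z^m coefficient of a* is conj(a_{k-m}).
        a = [a[k].conjugate() * a[m] - a[0] * a[k - m].conjugate()
             for m in range(1, k + 1)]
    return True
```

For a stability study one would call this for the coefficients $(q_0(\mu), q_1(\mu), q_2(\mu))$ of (3.44) along $\mu = it$; near $t = 0$ the principal root has modulus close to 1, so floating point results there have to be interpreted with care.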


8. Prove that $c_1 = (\sqrt{17} - 1)/8$ is a super-convergence point for the 2-step 2-stage
collocation methods with $c_2 = 1$.



V.4 Order Stars on Riemann Surfaces 


Riemann ist der Mann der glanzenden Intuition. Durch seine umfassende 
Genialitat iiberragt er alle seine Zeitgenossen ... Im Auftreten schiichtem, 
ja ungeschickt, musste sich der junge Dozent, zu dem wir Nachgeborenen 
wie zu einem Heiligen aufblicken, mancherlei Neckereien von seinen Kol- 
legen gefallen lassen. 

(F. Klein, Entwicklung der Mathematik im 19. Jhd., p. 246, 247) 


We have seen in the foregoing sections that the highest possible order of A-stable
linear multistep methods is two; furthermore, the second derivative Enright methods
as well as the SDBDF methods were seen to be A-stable for $p \le 4$; the three-stage
Radau multistep methods were A-stable for $p \le 6$. In this section we shall
see that these observations are special cases of a general principle, the so-called
"Daniel-Moore conjecture" which says that the order of an A-stable multistep
method involving either $s$ derivatives or $s$ implicit stages satisfies $p \le 2s$. Before
proceeding to its proof, we should become familiar with Riemann surfaces.


Riemann Surfaces 


Fur manche Untersuchungen, namentlich fur die Untersuchung algebrais- 
cher und Abel’scher Functionen ist es vortheilhaft, die Verzweigungsart 
einer mehrwerthigen Function in folgender Weise geometrisch darzustellen. 
Man denke sich in der Or,y)-Ebene eine andere mit ihr zusammenfall- 
ende Flache (oder auf der Ebene einen unendlich diinnen Korper) ausge- 
breitet, welche sich so weit und nur so weit erstreckt, als die Function 
gegeben ist. Bei Fortsetzung dieser Function wird also diese Flache eben- 
falls weiter ausgedehnt werden. In einem Theile der Ebene, fur welchen 
zwei oder mehrere Fortsetzungen der Function vorhanden sind, wird die 
Flache doppelt oder mehrfach sein; sie wird dort aus zwei oder mehreren 
Blattem bestehen, deren jedes einen Zweig der Function vertritt. Um einen 
Verzweigungspunkt der Function herum wird sich ein Blatt der Flache in 
ein anderes fortsetzen, so dass in der Umgebung eines solchen Punktes die 
Flache als eine Schraubenflache mit einer in diesem Punkte auf der (x,y)- 
Ebene senkrechten Axe und unendlich kleiner Hohe des Schraubenganges 
betrachtet werden kann. Wenn die Function nach mehreren Umlaufen des 
z um den Verzweigungswerth ihren vorigen Werth wieder erhalt (wie z.B. 
(z — a) m / n , wenn m, n relative Primzahlen sind, nach n Umlaufen von z 
um a), muss man dann freilich annehmen, dass sich das oberste Blatt der 
Flache durch die iibrigen hindurch in das unterste fortsetzt. 

Die mehrwerthige Function hat für jeden Punkt einer solchen ihre Verzweigungsart
darstellenden Fläche nur einen bestimmten Werth und kann daher
als eine völlig bestimmte Function des Orts in dieser Fläche angesehen werden.
(B. Riemann 1857)

We take as example the BDF method (III.1.22") for $k = 2$ which has the characteristic
equation

$$ \Bigl(\frac{3}{2} - \mu\Bigr) \zeta^2 - 2\zeta + \frac{1}{2} = 0. \tag{4.1} $$

This quadratic equation expresses $\zeta$ as a function of $\mu$, both are complex variables.
It is immediately solved to yield

$$ \zeta_{1,2} = \frac{2 \pm \sqrt{1 + 2\mu}}{3 - 2\mu} \tag{4.2} $$

which defines a two-valued function, i.e., to each $\mu \in \mathbb{C}$ we have two solutions $\zeta$.
These two solutions are displayed in Fig. 4.1 (we have plotted the level curves of
$|\zeta_{1,2}(\mu)|$; the region with $|\zeta_i(\mu)| > 1$ is in white).



Fig. 4.1. The two solutions of the BDF2 characteristic equation (left panel: $\zeta_1 = \frac{2 + \sqrt{1+2\mu}}{3 - 2\mu}$, right panel: $\zeta_2 = \frac{2 - \sqrt{1+2\mu}}{3 - 2\mu}$)

Fig. 4.2. Three dimensional view of the map (4.4)

We observe two essential facts. First, there is a pole of $\zeta_1$, but not of $\zeta_2$, at the
point $\mu = 3/2$. This is due to the factor $(3/2 - \mu)$ in (4.1) which represents the
implicit stage of the method. Second, we observe a curious discontinuity on the
negative real axis left of the point $-1/2$, a phenomenon first observed in a famous
paper of Puiseux (1850) ("... a encore cet inconvénient, que u devient alors une
fonction discontinue ..."). It has its reason in the complex square root $\sqrt{1 + 2\mu}$
which, while $1 + 2\mu$ performs a revolution around the origin, only does half a
revolution and exchanges the two roots. We cannot therefore speak in a natural
way of the two complex functions $\zeta_1(\mu)$ and $\zeta_2(\mu)$. And here comes the great idea
of Riemann (1857): Instead of varying $\mu$ in the complex plane $\mathbb{C}$, we imagine it
varying in a double sheet of (in Riemann's words: infinitely close) complex planes
$\mathbb{C} \cup \mathbb{C}$. The $\mu$'s in the upper sheet are mapped to $\zeta_1$, the $\mu$'s in the lower sheet
are mapped to $\zeta_2$. The double-valued function then becomes single-valued. At the
"cut", left of the point $-1/2$, the two roots $\zeta_1$ and $\zeta_2$ are interchanged, so we must
imagine that the upper sheet for $\zeta_1$ continues into the lower sheet for $\zeta_2$ (shaded
in Fig. 4.1) and vice-versa. If we denote the manifold obtained in this way by $M$,
then the map

$$ M \to \mathbb{C}, \qquad \mu \mapsto \zeta \tag{4.3} $$

becomes an everywhere continuous and holomorphic map (with the exception of
the pole). $M$ is then called the Riemann surface of the algebraic function $\mu \mapsto \zeta$.
A three-dimensional view of the map 



(4.4) 


is represented in Fig. 4.2. 

More General Methods. Most methods of Sect. V.3 are so-called multistep
Runge-Kutta methods defined by the formulas

$$y_{n+k} = \sum_{j=1}^{k} \alpha_j y_{n+j-1} + h \sum_{j=1}^{s} b_j f(x_n + c_j h, v_j^{(n)}) \tag{4.5a}$$

$$v_i^{(n)} = \sum_{j=1}^{k} a_{ij} y_{n+j-1} + h \sum_{j=1}^{s} b_{ij} f(x_n + c_j h, v_j^{(n)}). \tag{4.5b}$$

This is the subclass of general linear methods (Example III.8.5) for which the
external stages represent the solution $y(x)$ on an equidistant grid. The bulk
of the numerical work in applying the above method lies in the implicit stages
(4.5b).

For the stability analysis we set as now usual $f(x,y) = \lambda y$,
$h\lambda = \mu$ and $(y_n, y_{n+1}, \ldots, y_{n+k}) = (1, \zeta, \ldots, \zeta^k)$.
Equation (4.5b) then becomes in vector notation (using
$\boldsymbol{\zeta} = (1, \zeta, \ldots, \zeta^{k-1})^T$, $A = (a_{ij})$ and
$B = (b_{ij})$)

$$v = (I - \mu B)^{-1} A \boldsymbol{\zeta}, \tag{4.6}$$

which is rational in $\mu$ with denominator $\det(I - \mu B)$. Inserting this
into (4.5a) and multiplying with this denominator we obtain a characteristic
equation of the form

$$Q(\mu, \zeta) \equiv q_k(\mu)\zeta^k + q_{k-1}(\mu)\zeta^{k-1} + \ldots + q_0(\mu) = 0 \tag{4.7}$$

where $q_k(\mu) = \det(I - \mu B)$ and all $q_j(\mu)$ are polynomials in $\mu$
of degree $\le s$.

Multiderivative multistep methods, on the other hand, may be written as
(M. Reimer 1967, R. Jeltsch 1976)

$$\sum_{j=0}^{s} h^j \sum_{i=0}^{k} \alpha_{ij} D^j y_{n+i} = 0 \tag{4.8}$$

where the computation of higher derivatives $D^j y$ is done by Eq. (II.12.3).
For the equation $y' = \lambda y$ we have $D^j y = \lambda^j y$ and inserting
this into (4.8) together with
$(y_n, y_{n+1}, \ldots, y_{n+k}) = (1, \zeta, \ldots, \zeta^k)$ we obtain at
once a characteristic equation of the form (4.7). Here, the degree $s$ of the
polynomials $q_j(\mu)$ is equal to the order of the highest derivative taken.
The bulk of numerical work for evaluating (4.8) is the determination of
$y_{n+k}$ from an implicit equation containing $y_{n+k}$, $Dy_{n+k}$, ...,
$D^s y_{n+k}$. If the last of these derivatives is present (i.e., if
$\alpha_{ks} \ne 0$), then the degree of $q_k(\mu)$ in (4.7) will be $s$.

The Riemann surface $M$ of (4.7) will consist of $k$ sheets, one for each of
the $k$ roots $\zeta_j$. The branch points are values of $\mu$ for which two or
several roots of (4.7) coalesce to an $m$-fold root. These are the roots of a
certain “discriminant” (see any classical book on Algebra, e.g., the famous
“Weber”, Vol. I, § 50); hence for irreducible $Q(\mu, \zeta)$ there are only a
finite number of such points. The movement of the coalescing roots $\zeta_j$,
when $\mu$ surrounds such a branch point, has been carefully studied by
Puiseux: They usually form what Puiseux calls a “système circulaire”, i.e.,
they are cyclically permuted at each revolution like the values of the complex
function $\sqrt[m]{z}$ near the origin. The Riemann surface must then follow
these “monodromies” and must be cut along certain lines and rejoined
appropriately. The location of these cuts is not unique.

Fig. 4.3. Different cuts for (4.9) (Hurwitz & Courant 1925)

Different possibilities for cutting the Riemann surface of, say, the function

$$\zeta^2 - (1 - \mu^4) = 0 \tag{4.9}$$

with branch points at $\pm 1$ and $\pm i$, are shown in a classical figure
reproduced from the book of Hurwitz & Courant, second edition 1925, p. 360
(Fig. 4.3).
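The exchange of roots around a branch point can be observed numerically. The sketch below (our own illustration; the function name `continue_root`, the loop radius and the step count are assumptions, not taken from the book) follows one root of $\zeta^2 = 1 - \mu^4$ continuously while $\mu$ runs around the branch point $\mu = 1$ on a circle that encloses no other branch point.

```python
import cmath
import math

def continue_root(center, radius, turns=1, steps=4000):
    """Follow one root of zeta^2 = 1 - mu^4 continuously while mu runs
    `turns` times counter-clockwise around the circle |mu - center| = radius.
    Returns the root values at the start and at the end of the path."""
    start = cmath.sqrt(1 - (center + radius) ** 4)
    zeta = start
    for n in range(1, turns * steps + 1):
        mu = center + radius * cmath.exp(2j * math.pi * n / steps)
        candidate = cmath.sqrt(1 - mu**4)
        # Of the two square roots, keep the one closer to the previous value.
        if abs(candidate - zeta) > abs(candidate + zeta):
            candidate = -candidate
        zeta = candidate
    return start, zeta

start, after_one_turn = continue_root(1.0, 0.5, turns=1)
_, after_two_turns = continue_root(1.0, 0.5, turns=2)
print(abs(after_one_turn + start), abs(after_two_turns - start))
```

After one revolution the root arrives at the negative of its starting value (it has moved to the other sheet); after two revolutions it is back where it started, which is the cyclic permutation of two roots.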


Poles Representing Numerical Work 

Only 85 miles (geog.) from the Pole, but it’s going to be a stiff
pull both ways apparently; still we do make progress, which is
something. (R.F. Scott, January 10, 1912; first mention of
interrelation between poles and stiffness in the literature)


We have just seen that the degree $s$ of $q_k(\mu)$ in (4.7) expresses the
numerical work (either the number of implicit stages or the number of
derivatives for the implicit solution). Now $q_k(\mu)$ will possess $s$ zeros
$\mu_1, \mu_2, \ldots, \mu_s$. What happens if $\mu$ approaches one of these
zeros? The polynomial (4.7) of degree $k$ (with $k$ roots
$\zeta_1(\mu), \ldots, \zeta_k(\mu)$) suddenly becomes a polynomial of degree
$k - 1$ with only $k - 1$ roots. Where does the last one go? Well, by Vieta’s
Theorem, it must go to infinity. In order to compute its asymptotic behaviour,
suppose $q_k(\mu_0) = 0$, $q_k'(\mu_0) \ne 0$, $q_{k-1}(\mu_0) \ne 0$ and that
$\zeta$ is large. Then all terms $q_{k-2}(\mu)\zeta^{k-2}, \ldots, q_0(\mu)$
are dominated by $q_{k-1}(\mu)\zeta^{k-1}$ and may be neglected. It results
that

$$\zeta(\mu) \approx -\frac{q_{k-1}(\mu_0)}{q_k'(\mu_0)} \cdot \frac{1}{\mu - \mu_0} \qquad \text{as } \mu \to \mu_0, \tag{4.10}$$

hence the algebraic function $\zeta(\mu)$ possesses a pole on one of its
sheets. If $\mu_0$ is a multiple root of $q_k(\mu) = 0$, the corresponding pole
will be multiple too.
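For the BDF2 example of the beginning of this section the pole asymptotics can be checked directly. In the sketch below (our illustration; the function names are hypothetical) we use $q_2(\mu) = 3/2 - \mu$ and $q_1(\mu) = -2$, so that the leading term $-q_{k-1}(\mu_0)/q_k'(\mu_0) \cdot 1/(\mu - \mu_0)$ becomes $-2/(\mu - 3/2)$, and compare it with the exact root that carries the pole.

```python
import cmath

def zeta1(mu):
    """BDF2 root carrying the pole at mu = 3/2."""
    return (2 + cmath.sqrt(1 + 2 * mu)) / (3 - 2 * mu)

def zeta2(mu):
    """The other BDF2 root, which stays finite at mu = 3/2."""
    return (2 - cmath.sqrt(1 + 2 * mu)) / (3 - 2 * mu)

def pole_approximation(mu, mu0=1.5):
    """Leading term -q_1(mu0) / q_2'(mu0) / (mu - mu0) with
    q_2(mu) = 3/2 - mu and q_1(mu) = -2."""
    q1_at_mu0 = -2.0
    dq2_at_mu0 = -1.0
    return -q1_at_mu0 / dq2_at_mu0 / (mu - mu0)

# The ratio exact / approximate tends to 1 as mu approaches 3/2.
ratios = [zeta1(1.5 + eps) / pole_approximation(1.5 + eps)
          for eps in (1e-2, 1e-3 * 1j, -1e-4)]
finite_value = zeta2(1.5 + 1e-6)
print(ratios, finite_value)
```

The second root tends to a finite value near $\mu = 3/2$, in agreement with the observation that only one of the two sheets carries the pole.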

It is also possible that the pole in question coincides with a branch point.
This happens when in addition to $q_k(\mu_0) = 0$ also $q_{k-1}(\mu_0) = 0$. In
this case two roots $\zeta_j(\mu)$ tend to infinity, but more slowly, like
$\pm C(\mu - \mu_0)^{-1/2}$ (Exercise 1). We therefore count both “half-poles”
together as one pole again. If $\gamma$ is a boundary curve of a neighbourhood
$V$ of $\mu_0$ (which around this branch point surrounds $\mu_0$ twice before
closing up), the argument of $\zeta(\mu)$ makes just one clockwise revolution
on this path. Fig. 4.4 illustrates this fact with an example.

Recapitulating we may state: 


Lemma 4.1. The Riemann surface for the characteristic equation of a multistep
Runge-Kutta method with $s$ implicit stages per step (or a multiderivative
multistep method with $s$ implicit derivative evaluations) includes at most $s$
poles of the algebraic function $\zeta(\mu)$. □

Fig. 4.4. Behaviour of roots of $\mu\zeta^2 + 2\mu\zeta + 2 - \mu = 0$ near the origin $\mu = 0$

We shall see below that Lemma 4.1 remains true for the whole class of general
linear methods, but for the moment we are “impatient et joyeux d’aller au
combat” (Asterix Legionnaire, pp. 29 and 30). The argument principle also
remains valid on Riemann surfaces and we state it as follows:

“On the left, isn’t it ?” — “Right.” 

“On the right ?” — “Left, leeeft!” 

(John Cleese in “Clockwise”) 

Lemma 4.2. Suppose that a domain $F \subset M$ contains no zeros of
$\zeta(\mu)$ and that its boundary consists of closed loops
$\gamma_1, \ldots, \gamma_m$. Then the number of poles of $\zeta(\mu)$
contained in $F$ is equal to the total number of clockwise revolutions of
$\arg(\zeta(\mu))$ along $\gamma_1, \ldots, \gamma_m$, each passed through in
that direction which leaves $F$ to the left of $\gamma_j$.

The proof is by cutting $F$ into a thousand pieces, each of which is
homeomorphic to a disc in $\mathbb{C}$, and by adding up all revolution numbers
which cancel along the cuts, because the adjacent edges are traversed in
opposite directions. □
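The counting rule of Lemma 4.2 is easy to try out on a small disc around the BDF2 pole $\mu = 3/2$, where each sheet can be treated as an ordinary piece of the complex plane. The following sketch (our illustration; the function `clockwise_revolutions` and the chosen circle are assumptions) accumulates the change of $\arg \zeta(\mu)$ along the circle, traversed so that the disc lies to the left.

```python
import cmath
import math

def zeta1(mu):
    """BDF2 root with the pole at mu = 3/2."""
    return (2 + cmath.sqrt(1 + 2 * mu)) / (3 - 2 * mu)

def zeta2(mu):
    """BDF2 root without pole."""
    return (2 - cmath.sqrt(1 + 2 * mu)) / (3 - 2 * mu)

def clockwise_revolutions(func, center, radius, steps=2000):
    """Count clockwise revolutions of arg(func(mu)) while mu runs once
    counter-clockwise around |mu - center| = radius (interior on the left)."""
    total = 0.0
    previous = func(center + radius)
    for n in range(1, steps + 1):
        current = func(center + radius * cmath.exp(2j * math.pi * n / steps))
        total += cmath.phase(current / previous)  # small increment of the argument
        previous = current
    return -round(total / (2 * math.pi))

poles_on_sheet_1 = clockwise_revolutions(zeta1, 1.5, 0.5)
poles_on_sheet_2 = clockwise_revolutions(zeta2, 1.5, 0.5)
print(poles_on_sheet_1, poles_on_sheet_2)
```

Neither root vanishes inside the disc, so the two revolution counts are the numbers of poles on the respective sheets.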


Order and Order Stars 

... denn das Klare und leicht Faßliche zieht uns an, das Verwickelte
schreckt uns ab. (D. Hilbert, Paris 1900)

Guided by the ideas of Sect. IV.4, we now compare the absolute values of the
characteristic roots $|\zeta_1|$ and $|\zeta_2|$ for the BDF2 scheme (4.2) with
the exponential function $|e^\mu| = e^{\operatorname{Re}\mu}$, hence we define
(Wanner, Hairer & Nørsett 1978)

$$A_j = \{\mu \in \mathbb{C} \,;\ |\zeta_j(\mu)| > |e^\mu|\}, \qquad j = 1, 2. \tag{4.11}$$

Fig. 4.5. The order star (4.11) for BDF2

These sets, on precisely the same scale as in Fig. 4.1, are represented in
Fig. 4.5.

The sets $A_j$ continue across the cuts in the same way as do the roots; it is
therefore natural to embed them into the Riemann surface $M$ and define

$$A = \{\mu \in M \,;\ |\zeta(\mu)| > |e^{\pi(\mu)}|\} \tag{4.12}$$

where $\pi : M \to \mathbb{C}$ is the natural projection.

Fig. 4.5 shows clearly an order star with three sectors for $\zeta_1(\mu)$, but
none for $\zeta_2(\mu)$, and we guess that this has to do with the order of the
method, which is two. Lemma 4.3 below will extend Lemma IV.4.3 to multistep
methods.
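The number of sectors can be counted without drawing the picture. The sketch below (our illustration; the radius 0.1 and the sample count are arbitrary choices, and the helper names are hypothetical) walks once around a small circle about the origin and counts how often the comparison $|\zeta_j(\mu)| > |e^\mu|$ changes its truth value; every sector of the star contributes two such changes.

```python
import cmath
import math

def bdf2_roots(mu):
    """The two BDF2 characteristic roots (principal square root)."""
    root = cmath.sqrt(1 + 2 * mu)
    return (2 + root) / (3 - 2 * mu), (2 - root) / (3 - 2 * mu)

def sign_changes(root_index, radius=0.1, samples=3600):
    """Number of changes of the test |zeta_j(mu)| > |e^mu| along |mu| = radius."""
    inside = []
    for n in range(samples):
        mu = radius * cmath.exp(2j * math.pi * (n + 0.5) / samples)
        zeta = bdf2_roots(mu)[root_index]
        inside.append(abs(zeta) > abs(cmath.exp(mu)))
    # Compare each sample with its predecessor (cyclically).
    return sum(inside[n] != inside[n - 1] for n in range(samples))

sectors_principal = sign_changes(0) // 2  # each sector has two edges
sectors_second = sign_changes(1) // 2
print(sectors_principal, sectors_second)
```

The circle stays away from the cut left of $-1/2$, so the principal square root labels the two sheets consistently here.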

By putting $h = 0$ in (4.5) (hence $\mu = 0$ in (4.7)), and
$y_n = y_{n+1} = \ldots = y_{n+k-1} = 1$ (hence $\zeta = 1$ in (4.7)), we must
have by consistency that $y_{n+k} = 1$ too, i.e., that $Q(0,1) = 0$. This
corresponds to the formula $\rho(1) = 0$ in the multistep case (see (III.2.6)).
But for $h = 0$ the difference equation (4.5a) is stable only if $\zeta = 1$ is
a simple root of the polynomial equation $Q(0, \zeta) = 0$. Hence we must have

$$Q(0,1) = 0, \qquad \frac{\partial Q}{\partial \zeta}(0,1) \ne 0. \tag{4.13}$$

The analytic continuation $\zeta_1(\mu)$ of this root in the neighbourhood of
the origin (as far as it is not embarrassed with branch points) will be called
the principal root, the corresponding surface the principal sheet of $M$.

Lemma 4.3. For stable multistep Runge-Kutta (or multiderivative) methods of
order $p$ the set $A$ possesses a star of $p+1$ sectors on the principal sheet
in the neighbourhood of the origin.

Proof. We fix $\lambda \in \mathbb{C}$, set $y' = \lambda y$ and take for
$y_0, \ldots, y_{k-1}$ exact initial values $1, e^{\mu}, \ldots, e^{(k-1)\mu}$.
The order of the method then tells us that the local error (see Fig. III.2.1),
i.e., the difference between $e^{k\mu}$ and the numerical solution $y_k$
computed from (4.5a), must be $C \cdot h^{p+1}$ for $h \to 0$, hence
$C\lambda^{-p-1}\mu^{p+1}$ for $\mu \to 0$. Thus, Formula (4.5) with all $y_j$
replaced by $e^{j\mu}$ will lead to

$$Q(\mu, e^\mu) = C\mu^{p+1} + O(\mu^{p+2}). \tag{4.14}$$

We subtract (4.14) from (4.7), choose for $\zeta(\mu)$ the principal root
$\zeta_1(\mu)$ (for which $e^\mu - \zeta_1(\mu)$ is small for $|\mu|$ small)
and linearize. This gives

$$\frac{\partial Q}{\partial \zeta}(0,1) \cdot \bigl(e^\mu - \zeta_1(\mu)\bigr) = C\mu^{p+1} + \ldots$$

and by dividing through by the non-zero constant (4.13)

$$e^\mu - \zeta_1(\mu) = C \cdot \mu^{p+1} + O(\mu^{p+2}) \qquad \text{for } \mu \to 0. \tag{4.15}$$

The rest of the proof now goes exactly analogously to that of Lemma IV.4.3.
There is also not much difference in the case of multiderivative methods. □

The constant $C$ of (4.15) is called the error constant of the method. This
is consistent with Formula (III.2.6) and (III.2.13) for multistep methods and
with (IV.3.5) for Runge-Kutta methods.
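The error constant can be estimated numerically from the difference between $e^\mu$ and the principal root. The sketch below does this for BDF2 (order $p = 2$); it is our own illustration, and the value $-1/3$ it approaches is what a short Taylor expansion of the BDF2 characteristic equation gives with the normalization used in this section, not a number quoted from the text.

```python
import cmath

def principal_root(mu):
    """Principal BDF2 root, equal to 1 at mu = 0."""
    return (2 + cmath.sqrt(1 + 2 * mu)) / (3 - 2 * mu)

def error_constant_estimate(mu, order=2):
    """(e^mu - zeta_1(mu)) / mu^(p+1), which tends to C as mu -> 0."""
    return (cmath.exp(mu) - principal_root(mu)) / mu ** (order + 1)

estimates = [error_constant_estimate(mu) for mu in (1e-1, 1e-2, 1e-3)]
for value in estimates:
    print(value)
```

Much smaller values of $\mu$ should be avoided in floating point arithmetic, because the numerator is then dominated by cancellation.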

The stability domain of multistep Runge-Kutta methods as well as their
A-stability is defined in the same way as for multistep methods (see
Definition 1.1). One has only to interpret $\zeta_1(\mu), \ldots, \zeta_k(\mu)$
as the roots of (4.7).


The “Daniel and Moore Conjecture” 

It is conjectured here that no A -stable method of the form of 
Eq. 5-6 can be of order greater than 2 J + 2 and that, of those 
A -stable methods of order 2 J + 2, the smallest error constant is 
exhibited by the Hermite method ... 

(Daniel & Moore 1970, p. 80) 

At the time when no simple proof for Dahlquist’s second barrier was known, a
proof of its generalization, the Daniel & Moore conjecture, seemed quite
hopeless. Y. Genin (1974) constructed A-stable multistep multiderivative
methods with astonishingly high “order” contradicting the conjecture.
R. Jeltsch (1976) later cleared up the mystery by showing that Genin’s methods
had 1 as multiple root of $\rho(\zeta)$ and hence the “effective” order was
lower. The conjecture was finally proved in 1978 with the help of order stars:

Theorem 4.4. The highest order of an A-stable $s$-stage Runge-Kutta (or
$s$-derivative) multistep method is $2s$. For the A-stable methods of order
$2s$ the error constant satisfies

$$(-1)^s C \ge \frac{s!\, s!}{(2s)!\,(2s+1)!}. \tag{4.16}$$
Proof. By A-stability, we have for all roots $|\zeta_j(iy)| \le 1$ along the
imaginary axis; hence the order star $A$ is nowhere allowed to cross the
imaginary axis. We consider $A^+ := A \cap \pi^{-1}(\mathbb{C}^+)$, the part of
the order star which lies above $\mathbb{C}^+$. As in Lemma IV.4.4, $A^+$ must
be finite on all sheets of $M$. The boundary of $A^+$ may consist of several
closed curves. As in Lemma IV.4.5, the argument of $\zeta(\mu)/e^\mu$ is
steadily increasing along $\partial A^+$. Since at the origin we have a star
with $p+1$ sectors (Lemma 4.3), of which at least $[\frac{p+1}{2}]$ lie in
$\mathbb{C}^+$, the boundary curves of $A^+$ must visit the origin at least
$[\frac{p+1}{2}]$ times. Hence the total rotation number is at least
$[\frac{p+1}{2}]$ and from Lemmas 4.1 and 4.2 we conclude that

$$\Bigl[\frac{p+1}{2}\Bigr] \le s. \tag{4.17}$$

This implies that $p \le 2s$ and the first assertion is proved.

We now need a new idea for the part concerning the error constant. The
following reasoning will help: the star $A$ expresses the fact that the surface
$|\zeta(\mu)/e^\mu|$ goes up and down around the origin like Montaigne’s ruff.
There, the error constant has to do with the height of these waves. So if we
want to compare different error constants we must compare $|\zeta(\mu)/e^\mu|$
to $|R(\mu)/e^\mu|$, where $R(\mu)$ is the characteristic function of a second
method. By dividing the two expressions, $e^\mu$ cancels and we define

$$B = \Bigl\{ \mu \in M \,;\ \Bigl| \frac{\zeta(\mu)}{R(\pi(\mu))} \Bigr| > 1 \Bigr\}, \tag{4.18}$$

called the relative order star. For $R(z)$ we choose the diagonal Padé
approximation $R_{ss}(z)$ with $s$ zeros and $s$ poles (see (IV.3.30)). By
subtracting (IV.3.31) (with $j = k = s$) from (4.15) (where it is now supposed
that $p = 2s$) we obtain

$$R_{ss}(\mu) - \zeta_1(\mu) = \underbrace{\Bigl(C - (-1)^s \frac{s!\, s!}{(2s)!\,(2s+1)!}\Bigr)}_{\widetilde{C}}\, \mu^{2s+1} + \ldots . \tag{4.19}$$

It is known that $|R_{ss}(iy)| = 1$ for all $y \in \mathbb{R}$ and that all
zeros of $R_{ss}(z)$ lie in $\mathbb{C}^-$ (Theorem IV.4.12). Therefore the set
$B$ in (4.18) cannot cross the imaginary axis (as before) and the quotient
$|\zeta(\mu)/R(\pi(\mu))|$ has no other poles above $\mathbb{C}^+$ than those
of $\zeta(\mu)$, of which, we know, there are at most $s$. Therefore the
sectors of the relative order star $B$ must exhibit the same colours as those
of the classical order star $A$ for diagonal Padé (see Fig. IV.4.2). Otherwise
an extra pole would be needed. We conclude that the error constants must have
the same sign (see Lemma IV.4.3), hence (see (IV.3.31))
$(-1)^s \widetilde{C} > 0$, which leads to (4.16).

Equality $\widetilde{C} = 0$ would produce an order star $B$ of even higher
order which is impossible with $s$ poles, unless the two methods are
identical. □
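The bound on the error constant is simple to evaluate. The sketch below (our illustration) computes $s!\,s!/((2s)!\,(2s+1)!)$ and checks it for $s = 1$ against two second-order methods. The error constants used, $C = -1/12$ for the trapezoidal rule and $C = -1/3$ for BDF2, are our own values obtained by expanding $e^\mu - \zeta_1(\mu)$ and should be read as assumptions of this example.

```python
from math import factorial

def pade_bound(s):
    """The quantity s! s! / ((2s)! (2s+1)!) bounding (-1)^s C from below."""
    return factorial(s) ** 2 / (factorial(2 * s) * factorial(2 * s + 1))

# s = 1, p = 2: trapezoidal rule (assumed C = -1/12) and BDF2 (assumed C = -1/3).
bound = pade_bound(1)
trapezoidal_ok = (-1) ** 1 * (-1 / 12) >= bound - 1e-15  # equality case
bdf2_ok = (-1) ** 1 * (-1 / 3) >= bound

print(bound, trapezoidal_ok, bdf2_ok, pade_bound(2))
```

The trapezoidal rule is the diagonal Padé approximation with $s = 1$ and attains the bound; BDF2 satisfies it with room to spare.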
Remarks. a) The first half is in fact superfluous, since the inequality (4.16)
implies that the $2s$-th order error constant $C \ne 0$, hence necessarily
$p \le 2s$. It has been retained for its beauty and simplicity, and for readers
who do not want to study the second half.

b) The proof never uses the full hypothesis of A-stability; the only property
used is stability on the imaginary axis (I-stability, see (IV.3.6)). Thus
Theorem 4.4 allows the following sharpening, which then extends Theorem IV.4.7
to multistep methods:

Theorem 4.5. Suppose that an I-stable $s$-stage Runge-Kutta (or $s$-derivative)
multistep method possesses a characteristic function $\zeta(\mu)$ with $s_1$
poles in $\mathbb{C}^+$. Then

$$p \le 2s_1 \tag{4.20}$$

and the error constant for all such I-stable methods of order $p = 2s_1$
satisfies

$$(-1)^{s_1} C \ge \frac{s_1!\, s_1!}{(2s_1)!\,(2s_1+1)!}. \tag{4.21}$$

□

Another interpretation of this theorem is the following result (compare with
Theorem IV.4.8), which in the case $s = 1$ is due to R. Jeltsch (1978).

Theorem 4.6. Suppose that an I-stable method with $s$ poles satisfies
$p \ge 2s - 1$. Then it is A-stable.

Proof. If only $s - 1$ poles were in $\mathbb{C}^+$, we would have
$p \le 2s - 2$, a contradiction. Hence all poles of $\zeta(\mu)$ are in
$\mathbb{C}^+$ and A-stability follows from the maximum principle. □


Methods with Property C 

It is now tempting to extend the proof of Theorem 4.4 to any method other than
the diagonal Padé method. But this meets with an essential difficulty in
defining (4.18) if $R(\mu)$ is a multistep method defined on another Riemann
surface, since then the definition of $B$ makes no sense. The following
observation will help: The second part of the proof of Theorem 4.4 only took
place in $\mathbb{C}^+$, which was the instability domain of the “comparison
method”. This leads to

Definition 4.7 (Jeltsch & Nevanlinna 1982). Let a method be given with
characteristic polynomial (4.7) satisfying (4.13) and denote its stability
domain by $S_R$. We say that this method has Property C if the principal sheet
includes no branch points outside of $\pi^{-1}(S_R)$ (with $\infty$ included if
$S_R$ is bounded), and the principal root $R_1(\mu)$ produces the whole
instability of the method, i.e.,

$$A_R := \partial S_R = \{\mu \in \mathbb{C} \,;\ |R_1(\mu)| = 1\}. \tag{4.22}$$

Examples. All one-step methods have Property C, of course. Linear multistep
methods whose root locus curve is simply closed have Property C too. In this
situation all roots except $R_1(\mu)$ have modulus smaller than one for all
$\mu \notin \pi^{-1}(S_R)$. Thus the principal sheet cannot have a branch point
there. The explicit 4th order Adams method analyzed in Fig. 1.1 does not have
Property C. The implicit Adams methods (see Fig. 1.3) have Property C for
$k < 5$. Also, the 4th order implicit Milne-Simpson method (1.16) has
Property C.

Definition 4.7 allows us to replace $R_{ss}(\mu)$ in the proof of Theorem 4.4
by $R_1(\mu)$, $\mathbb{C}^+$ by the exterior of $S_R$, the imaginary axis by
$A_R$ and to obtain the following theorem (Jeltsch and Nevanlinna the 5th of
April, 1979 at 5 a.m. in Champaign; G.W. the 5th of April, 1979 at 4.30 a.m. in
Urbana. How was this coincidence possible? E-mail was not yet in general use at
that time; was it Psi-mail?)


Theorem 4.8. Let a method with characteristic function $R(\mu)$, stability
domain $S_R$ and order $p_R$ possess Property C. If another method with
characteristic function $\zeta(\mu)$, stability domain $S_\zeta$ and order
$p_\zeta$ is more stable than $R$, i.e., if

$$S_\zeta \supset S_R, \tag{4.23}$$

then

$$p \le 2s \tag{4.24}$$

where

$$p = \min(p_R, p_\zeta) \tag{4.25}$$

and $s$ is the number of poles of $\zeta(\mu)$, each counted with its
multiplicity, which are not poles of the principal root $R_1(\mu)$ of
$R(\mu)$. □

... and tried to optimize the stability boundary. Despite many
efforts we were not able to exceed $\sqrt{3}$, the stability boundary of
the Milne-Simpson method ... (K. Dekker 1981)

As an illustration of Theorem 4.8 we ask for the largest stability interval on
the imaginary axis $I_r = [-ir, ir] \subset \mathbb{C}$ of a 3rd order
multistep method (for hyperbolic equations). Since we have $s = 1$ for linear
multistep methods, $p = 3$ contradicts (4.24) and we obtain from Theorem 4.8 by
using for $R(\mu)$ the Milne-Simpson method (1.16):

Theorem 4.9 (Dekker 1981, Jeltsch & Nevanlinna 1982). If a linear multistep
method of order $p \ge 3$ is stable on $I_r$, then $r \le \sqrt{3}$. □
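The stability boundary $\sqrt{3} \approx 1.732$ of the comparison method can be observed numerically. The sketch below is our own illustration and assumes that the Milne-Simpson method referred to is the usual formula $y_{n+2} - y_n = \frac{h}{3}(f_{n+2} + 4f_{n+1} + f_n)$, whose characteristic equation is $(1 - \mu/3)\zeta^2 - (4\mu/3)\zeta - (1 + \mu/3) = 0$.

```python
import cmath

def milne_simpson_roots(mu):
    """Roots of (1 - mu/3) z^2 - (4 mu/3) z - (1 + mu/3) = 0
    (assumed form of the Milne-Simpson characteristic equation)."""
    a, b, c = 1 - mu / 3, -4 * mu / 3, -(1 + mu / 3)
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

def max_modulus(y):
    """Largest root modulus at the point mu = i*y of the imaginary axis."""
    return max(abs(z) for z in milne_simpson_roots(1j * y))

inside_interval = max_modulus(1.7)   # just below sqrt(3)
outside_interval = max_modulus(1.8)  # just above sqrt(3)
print(inside_interval, outside_interval)
```

Below $\sqrt{3}$ both roots lie on the unit circle; above it one root leaves the unit disc.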


The second part of Theorem 4.4 also allows an extension; the essential
ingredient for its proof has been the sign of the error constant for the
diagonal Padé approximation.

Theorem 4.10. Consider a method with characteristic equation (4.7) satisfying
(4.13) and let $p$ denote its order and $C$ its error constant. Suppose

a) the method possesses Property C,

b) the principal root $R_1(\mu)$ possesses $s$ poles,

c) $\operatorname{sign}(C) = (-1)^s$,

d) $p \ge 2s - 1$.

Then this method is “optimal” in the sense that every other method with $s$
poles which is stable on $A_R$ of (4.22) has either lower order or, for the
same order, a larger (in absolute value) error constant. □

Examples. The diagonal and first sub-diagonal Padé approximations satisfy the
above hypotheses (see Eq. (IV.3.30)). Also I-stable linear multistep methods
with Property C can be applied.

Remark 4.11. Property C allows the extension of Theorem IV.4.17 of Jeltsch &
Nevanlinna to explicit multistep methods. Thus explicit methods with comparable
numerical work cannot have including stability domains. Exercise 4 below shows
that Property C is a necessary condition. Remember that explicit methods have
all their poles at infinity.


General Linear Methods 


The large class of general linear methods (Example III.8.5) written in obvious
matrix notation

$$v_n = \widetilde{A} u_n + h \widetilde{B} f(v_n) \tag{4.26a}$$

$$u_{n+1} = A u_n + h B f(v_n) \tag{4.26b}$$

seems to allow much more freedom to break the Daniel & Moore conjecture. This
is not the case as we shall see in the sequel.

The bulk of numerical work for solving (4.26) is represented by the implicit
stages (4.26a) and hence depends on the structure of the matrix
$\widetilde{B}$. Inserting $y' = \lambda y$ leads to

$$u_{n+1} = S(\mu) u_n \tag{4.27}$$

where

$$S(\mu) = A + \mu B (I - \mu \widetilde{B})^{-1} \widetilde{A}. \tag{4.28}$$

The stability of the numerical method (4.27) is thus governed by the
eigenvalues of the matrix $S(\mu)$. The elements of this matrix are seen to be
rational functions in $\mu$.
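As a small worked instance, the sketch below evaluates the matrix $S(\mu)$ for the trapezoidal rule written with two stages, $v_1 = y_n$ and $v_2 = y_n + \frac{h}{2}(f(v_1) + f(v_2))$, and $y_{n+1} = v_2$. This is our own illustration: the names `At`, `Bt` stand for the stage matrices of (4.26a), `A`, `B` for the update matrices of (4.26b), and the 2×2 system is solved by Cramer's rule to stay within the standard library.

```python
AT = [1.0, 1.0]                  # stage matrix "At" (a column, since k = 1)
BT = [[0.0, 0.0], [0.5, 0.5]]    # stage matrix "Bt"
A = 1.0                          # update matrix "A" (1 x 1)
B = [0.5, 0.5]                   # update matrix "B" (a row)

def det_I_minus_mu_Bt(mu):
    """det(I - mu*Bt); its zeros are the candidate poles of S(mu)."""
    return (1 - mu * BT[0][0]) * (1 - mu * BT[1][1]) - (mu * BT[0][1]) * (mu * BT[1][0])

def stability_function(mu):
    """S(mu) = A + mu * B (I - mu*Bt)^{-1} At, a scalar in this example."""
    m11, m12 = 1 - mu * BT[0][0], -mu * BT[0][1]
    m21, m22 = -mu * BT[1][0], 1 - mu * BT[1][1]
    det = m11 * m22 - m12 * m21
    # Solve (I - mu*Bt) v = At by Cramer's rule.
    v1 = (AT[0] * m22 - m12 * AT[1]) / det
    v2 = (m11 * AT[1] - m21 * AT[0]) / det
    return A + mu * (B[0] * v1 + B[1] * v2)

mu = -0.7 + 0.2j
print(stability_function(mu), (1 + mu / 2) / (1 - mu / 2), det_I_minus_mu_Bt(2.0))
```

The result coincides with the familiar stability function $(1 + \mu/2)/(1 - \mu/2)$, and the determinant vanishes at $\mu = 2$, the reciprocal of the non-zero eigenvalue $1/2$ of the stage matrix.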
Lemma 4.12. If the characteristic polynomial of $S(\mu)$ is multiplied by
$\det(I - \mu\widetilde{B})$ then it becomes polynomial in $\mu$:

$$\det(\zeta I - S(\mu)) \cdot \det(I - \mu\widetilde{B}) = q_k(\mu)\zeta^k + q_{k-1}(\mu)\zeta^{k-1} + \ldots + q_0(\mu) =: Q(\mu, \zeta) \tag{4.29}$$

where $q_0, \ldots, q_k$ are polynomials of degree $\le s$ and
$q_k(\mu) = \det(I - \mu\widetilde{B})$.

Proof. Suppose first that $\widetilde{B}$ is diagonalizable as

$$T^{-1}\widetilde{B}T = \operatorname{diag}(\beta_1, \ldots, \beta_s) \tag{4.30}$$

so that from (4.28)

$$S(\mu) = A + BT \operatorname{diag}(w_1, \ldots, w_s)\, T^{-1}\widetilde{A} = A + \sum_{i=1}^{s} w_i d_i c_i^T \tag{4.31}$$

where

$$w_i = \frac{\mu}{1 - \mu\beta_i}, \qquad d_i = i\text{-th column of } BT, \qquad c_i^T = i\text{-th row of } T^{-1}\widetilde{A}. \tag{4.32}$$

We write the matrix $\zeta I - S(\mu)$ in terms of its column vectors

$$\bigl(\zeta e_1 - a_1 - w_1 c_{11} d_1 - w_2 c_{12} d_2 - \ldots,\ \ \zeta e_2 - a_2 - w_1 c_{21} d_1 - w_2 c_{22} d_2 - \ldots,\ \ \ldots\bigr).$$

Its determinant, the characteristic polynomial of $S(\mu)$, is computed using
the multilinearity of det and considering $w_1, \ldots, w_s$ as scalars. All
terms containing one of the $w_j$ to any power higher than 1 cancel, because
the corresponding factor is a determinant with two or more identical columns.
Thus, if $\det(\zeta I - S(\mu))$ is multiplied by
$\prod_{i=1}^{s} (1 - \mu\beta_i) = \det(I - \mu\widetilde{B})$ it becomes a
polynomial of the form (4.29).

A non-diagonalizable matrix $\widetilde{B}$ is considered as the limit of
diagonalizable matrices. The coefficients of the polynomial $Q(\mu, \zeta)$
depend continuously on $\widetilde{B}$. □

We conclude that Lemma 4.1 again remains valid for general linear methods.
The $s$ poles on the Riemann surface for the algebraic function
$Q(\mu, \zeta) = 0$ are located at the positions
$\mu = 1/\beta_1, \ldots, \mu = 1/\beta_s$ where $\beta_i$ are the eigenvalues
of the matrix $\widetilde{B}$.

We next have to investigate the order conditions, i.e., the analogue of
Lemma 4.3. Recall that general linear methods must be equipped with a starting
procedure (see Eq. (III.8.4a)) which for the differential equation
$y' = \lambda y$ will be of the form $u_0 = \psi(\mu) \cdot y_0$ with
$\psi(0) \ne 0$. Here $\mu = h\lambda$ and $\psi(\mu)$ is a $k$-vector of
polynomials or rational functions of $\mu$. Then the diagram of Fig. III.8.1
becomes the one sketched in Fig. 4.6.

The order condition (see Formula (III.8.16) of Lemma III.8.11) then gives:

Lemma 4.13. If the general linear method (4.26) is of order $p$ then

$$(e^\mu I - S(\mu))\psi(\mu) = O(\mu^{p}) \qquad \text{for } \mu \to 0 \tag{4.33a}$$

$$E(e^\mu I - S(\mu))\psi(\mu) = O(\mu^{p+1}) \qquad \text{for } \mu \to 0 \tag{4.33b}$$

where $E$ is defined in (III.8.12) and $S(\mu)$ is given in (4.28). □

Formula (4.33) tells us, roughly, that $\psi(\mu)$ is an approximate
eigenvector of $S(\mu)$ with eigenvalue $e^\mu$. We shall now see how this
information can be turned into order conditions for the correct eigenvalues of
$S(\mu)$.

Definition 4.14. Let $\ell$ be the number of principal sheets of (4.29), i.e.,
the multiplicity of 1 as eigenvalue of $S(0)$ (which, by stability, must then
be a simple root of the minimal polynomial). $\ell$ is also the dimension of
$I$ in (III.8.12) and the rank of $E$.

Theorem 4.15. Suppose that there exists $\psi(\mu)$ with $\psi(0) \ne 0$ such
that the general linear method satisfies the conditions (4.33) for order
$p \ge 1$. Then the $\ell$-fold eigenvalue 1 of $S(0)$ continues into $\ell$
eigenvalues $\zeta_j(\mu)$ of $S(\mu)$ which satisfy

$$e^\mu - \zeta_j(\mu) = O(\mu^{p_j+1}), \qquad \mu \to 0, \tag{4.34}$$

with

$$p_j \ge 0, \qquad \sum_{j=1}^{\ell} p_j \ge p. \tag{4.35}$$

/ 1 + /X f M 2 \ 

V3 M +ii M 2 ) 


Examples, a) The matrix 
S{fi) = 


(4.36) 
has $\ell = 2$ so that $E = I$ in (4.33b). There is a vector $\psi(\mu)$
(non-vanishing for $\mu = 0$) such that

$$(e^\mu I - S(\mu))\psi(\mu) = O(\mu^6),$$

i.e., $p = 5$. The eigenvalues


satisfy 

P 2 = 0. 


Cl ,2 (/^ ) — 

e >l -CiM = 0{n 6 ), 


17M 13m 2 A , 20m [. M , 9m 2 

T + ~Q~) ± -r\l 1 ~2 + w 

ev — £ 2 (/i) — 0(/i) , which is (4.34) with = 5, 


b) The matrix 


S(m) = 


1 + 2 m + 
M 



(4.37) 


satisfies (4.33) with £ — 2, p = 4. Its eigenvalues Ci 2 (a0 — 1 + T + T 2 /% 
(4.34) with p 1 = p 2 =2. 
c) The example 

1 + 2^ — p + \ 

p 1 ; 



fulfil 


(4.38) 


has ^ = 2, p = 1 in (4.33). Its eigenvalues Ci ,2 (m) = 1 + /i ± satisfy (4.34) 
with p 1 =p 2 = l/2. This example shows that the p ■ in (4.34) need not be integers. 


Proof of Theorem 4.15. We introduce the matrix

$$\widetilde{S}(\mu) = e^\mu I - S(\mu) \tag{4.39}$$

which has the same eigenvectors as $S(\mu)$ and the corresponding eigenvalues

$$\widetilde{\zeta}_j(\mu) = e^\mu - \zeta_j(\mu). \tag{4.40}$$

Formulas (4.34) and (4.35) now say simply that

$$\prod_{j=1}^{\ell} \widetilde{\zeta}_j(\mu) = O(\mu^{p+\ell}), \qquad \mu \to 0. \tag{4.41}$$

Since the product of the eigenvalues is, as we know, the determinant of the
matrix, we look for information about $\det \widetilde{S}(\mu)$.

After a suitable change of coordinates (via the transformation matrix $T$ of
(III.8.12)) we suppose the matrix $S = S(0)$ in Jordan canonical form. We then
separate blocks of dimensions $\ell$ and $k - \ell$ so that

$$E = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \qquad S(\mu) = \begin{pmatrix} I + O(\mu) & O(\mu) \\ O(\mu) & O(1) \end{pmatrix}, \tag{4.42}$$

$$\widetilde{S}(\mu) = \begin{pmatrix} \widetilde{S}_{11}(\mu) & \widetilde{S}_{12}(\mu) \\ \widetilde{S}_{21}(\mu) & \widetilde{S}_{22}(\mu) \end{pmatrix} = \begin{pmatrix} O(\mu) & O(\mu) \\ O(\mu) & O(1) \end{pmatrix} \tag{4.43}$$

where it is important to notice that $\widetilde{S}_{22}(0)$ is invertible;
this is because $E$ collects all eigenvalues equal to 1, thus $S_{22}(0)$ has
no eigenvalues equal to 1 and $\widetilde{S}_{22}(0)$ has none equal to zero.
Conditions (4.33) now read

$$\begin{pmatrix} \widetilde{S}_{11}(\mu) & \widetilde{S}_{12}(\mu) \\ \widetilde{S}_{21}(\mu) & \widetilde{S}_{22}(\mu) \end{pmatrix} \begin{pmatrix} \psi_1(\mu) \\ \psi_2(\mu) \end{pmatrix} = \begin{pmatrix} O(\mu^{p+1}) \\ O(\mu^{p}) \end{pmatrix}. \tag{4.44}$$

Putting $\mu = 0$ in (4.44) we get $\psi_2(0) = 0$. The assumption
$\psi(0) \ne 0$ thus implies that at least one component of $\psi_1(0)$, say
the $j$-th component $\psi_{1j}(0)$, does not vanish. Cramer’s rule then yields

$$\det \widetilde{S}(\mu) \cdot \psi_{1j}(\mu) = \det T(\mu), \tag{4.45}$$

where $T(\mu)$ is obtained from $\widetilde{S}(\mu)$ by replacing its $j$-th
column by the right-hand side of (4.44). One easily sees that
$\det T(\mu) = O(\mu^{p+\ell})$ (take out a factor $\mu$ from each of the first
$\ell$ lines and a factor $\mu^p$ from the $j$-th column). Because of
$\psi_{1j}(0) \ne 0$ this implies
$\det \widetilde{S}(\mu) = O(\mu^{p+\ell})$. We have thus proved (4.41) (hence
(4.34) and (4.35)), because
$\widetilde{\zeta}_{\ell+1}, \ldots, \widetilde{\zeta}_k$ do not converge to
zero for $\mu \to 0$. □


The next lemma excludes fractional orders for A-stable methods:

Lemma 4.16. For I-stable general linear methods the orders $p_j$ in (4.34)
must be integers.

Proof. Divide (4.34) by $e^\mu$, let

$$\frac{\zeta_j(\mu)}{e^\mu} = 1 - C\mu^{m/r} + \ldots \tag{4.46}$$

where $p_j + 1 = m/r$, and suppose that $r > 1$ and $m$, $r$ are relatively
prime. Since $e^\mu - \zeta_j(\mu)$ are the eigenvalues of the matrix (4.39),
hence the roots of an analytic equation, the presence of a root $\mu^{m/r}$
involves the occurrence of all branches $\mu^{m/r} \cdot e^{2i\pi j/r}$
($j = 0, 1, \ldots, r - 1$). For $\mu = \pm iy = e^{\pm i\pi/2} y$
($y \in \mathbb{R}$ small), inserted into (4.46), we thus obtain $2r$ different
values

$$1 - C y^{m/r} e^{\pm i m\pi/2r} e^{2i\pi j/r} + \ldots, \qquad j = 0, 1, \ldots, r - 1$$

which form a regular $2r$-Mercedes star; hence whatever the argument of $C$ is,
there are values of $C(\pm iy)^{m/r} e^{2i\pi j/r}$ (for some
$0 \le j \le r - 1$) with negative real part, such that from (4.46)
$|\zeta_j(\pm iy)| > 1$. This is a contradiction to I-stability. □


And here is the “Daniel-Moore conjecture” for general linear methods:

Theorem 4.17. Let the characteristic function $Q(\mu, \zeta)$ of an I-stable
general linear method possess $s$ poles in $\mathbb{C}^+$. Then

$$p \le 2s. \tag{4.47}$$



V.4 Order Stars on Riemann Surfaces 


295 


Proof. Again we denote by $A^+ = A \cap \pi^{-1}(\mathbb{C}^+)$ the part of the
order star lying above $\mathbb{C}^+$. By I-stability $A^+$ does not intersect
the imaginary axis $\pi^{-1}(i\mathbb{R})$ on any sheet.

By Theorem 4.15 the boundary curves $\gamma_m$ of $A^+$ visit the origin on the
different principal sheets at least $[\frac{p_j+1}{2}]$ times
($j = 1, \ldots, \ell$) (see (4.17)), where the $p_j$ are integers by
Lemma 4.16. Thus by Lemma 4.2

$$\sum_{j=1}^{\ell} \Bigl[\frac{p_j+1}{2}\Bigr] \le s. \tag{4.48}$$

Multiplying this by 2, using $p_j \le 2[\frac{p_j+1}{2}]$ and (4.35), we get
$p \le 2s$. □


Dual Order Stars 

Why not interchange the role of the two variables $\zeta$ and $\mu$ ... ?
(J. Butcher, June 27, 1989, in West Park Hall, Dundee, at midsummernight)

A-stability implies that for all solutions $\zeta_j(\mu)$ of
$Q(\mu, \zeta) = 0$ we have

$$\operatorname{Re}\mu \le 0 \implies |\zeta_j(\mu)| \le 1. \tag{4.49}$$

This is logically equivalent to: For all solutions $\mu_j(\zeta)$ of
$Q(\mu, \zeta) = 0$ we have

$$|\zeta| \ge 1 \implies \operatorname{Re}\mu_j(\zeta) \ge 0 \tag{4.50}$$

(in fact, pure logic gives us “$>$” on both sides; the “$\ge$” then follow by
continuity). Further the order condition (4.15) becomes, by passing to inverse
functions for the principal root,

$$\log\zeta - \mu_1(\zeta) = -C(\zeta - 1)^{p+1} + \ldots . \tag{4.51}$$

Thus order star theory can be very much dualized by the replacements

a) $\mu$ ⟷ $\zeta$
b) $0$ ⟷ $1$
c) Imag. axis ⟷ Unit circle
d) Re ⟷ $\log|\cdot|$
e) Im ⟷ Arg
f) exp ⟷ log          (4.52)

The analogue of the star defined in (4.12) becomes

$$A = \{\zeta \,;\ \operatorname{Re}\mu(\zeta) < \operatorname{Re}(\log\zeta)\} = \{\zeta \,;\ \operatorname{Re}\mu(\zeta) < \log|\zeta|\} \tag{4.53}$$

and the analogue of the relative order star (4.18) becomes

$$B = \{\zeta \,;\ \operatorname{Re}\mu(\zeta) < \operatorname{Re}\mu_R(\zeta)\}. \tag{4.54}$$

Fig. 4.7. Dual order stars (4.53) for BDF methods (panels: BDF2 (zoom), BDF3 (zoom))

For the special case of the trapezoidal rule this is

$$B = \Bigl\{\zeta \,;\ \operatorname{Re}\mu(\zeta) < \operatorname{Re}\Bigl(2\,\frac{\zeta - 1}{\zeta + 1}\Bigr)\Bigr\}. \tag{4.55}$$

The set $A$ is displayed in Fig. 4.7 for the BDF2 and BDF3 methods. It explains
once again why A-stable methods of order $> 2s$ are not possible (see
Exercise 5).
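The dual formulation of A-stability ($|\zeta| \ge 1$ implies $\operatorname{Re}\mu(\zeta) \ge 0$) lends itself to a direct numerical test, because $\mu$ is an explicit function of $\zeta$ for linear multistep methods. The sketch below is our own illustration: the BDF2 expression follows from solving its characteristic equation for $\mu$, while the BDF3 coefficients $11/6, -3, 3/2, -1/3$ are the standard ones and are an assumption here, since they are not given in this section.

```python
import cmath
import math

def mu_bdf2(zeta):
    """mu as a function of zeta for BDF2: mu = 3/2 - 2/zeta + 1/(2 zeta^2)."""
    w = 1 / zeta
    return 1.5 - 2 * w + 0.5 * w**2

def mu_bdf3(zeta):
    """Same for BDF3 (assumed standard coefficients 11/6, -3, 3/2, -1/3)."""
    w = 1 / zeta
    return 11 / 6 - 3 * w + 1.5 * w**2 - w**3 / 3

def min_real_part(mu_of_zeta, radius, samples=720):
    """Smallest value of Re mu(zeta) on the circle |zeta| = radius."""
    return min(mu_of_zeta(radius * cmath.exp(2j * math.pi * n / samples)).real
               for n in range(samples))

bdf2_min = min_real_part(mu_bdf2, 1.01)
bdf3_min = min_real_part(mu_bdf3, 1.01)
print(bdf2_min, bdf3_min)
```

On a circle slightly outside the unit circle the real part stays positive for BDF2 but becomes negative for BDF3, which is the dual way of seeing that BDF3 is not A-stable.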

Still another possibility is to replace (4.50) by the obviously equivalent
condition

$$|\zeta| \ge 1 \implies \operatorname{Re}\frac{1}{\mu_j(\zeta)} \ge 0 \tag{4.56}$$

in which case order condition (4.51) becomes

$$\frac{1}{\log\zeta} - \frac{1}{\mu_1(\zeta)} = C(\zeta - 1)^{p-1} + \ldots \tag{4.57}$$

since $\log\zeta$ as well as $\mu_1(\zeta)$ are
$(\zeta - 1) + O((\zeta - 1)^2)$. The order stars now become analogously

$$A = \Bigl\{\zeta \,;\ \operatorname{Re}\frac{1}{\mu(\zeta)} < \operatorname{Re}\frac{1}{\log\zeta}\Bigr\} \tag{4.58}$$

and

$$B = \Bigl\{\zeta \,;\ \operatorname{Re}\frac{1}{\mu(\zeta)} < \operatorname{Re}\frac{1}{\mu_R(\zeta)}\Bigr\}. \tag{4.59}$$

A special advantage of these last definitions is that for linear multistep
methods $1/\mu = \sigma(\zeta)/\rho(\zeta)$, hence the poles of the functions
involved are the zeros of $\rho(\zeta)$, which play a role in the definition of
ordinary stability (Sect. III.3). This can be used to obtain a geometric proof
of the first Dahlquist barrier (Theorem III.3.5), inspired by the paper
Iserles & Nørsett (1984) (see Exercise 6).

Also, the proof for Dahlquist’s second barrier of Sect. V.1 (Theorem 1.4) can
be seen to be nothing else but a study of $B$ of (4.59) where $\mu_R(\zeta)$
represents the trapezoidal rule.


Exercises 

1. Analyze the behaviour of the characteristic roots of (4.7) in the
neighbourhood of a pole which coincides with a branch point, i.e., solve (4.7)
asymptotically for $\zeta$ large in the case

$$q_k(\mu_0) = 0, \quad q_k'(\mu_0) \ne 0, \quad q_{k-1}(\mu_0) = 0, \quad q_{k-2}(\mu_0) \ne 0.$$

Show that these roots behave like $\pm C(\mu - \mu_0)^{-1/2}$.

2. Compute the approximate eigenvectors $\psi(\mu)$ such that

$$(e^\mu I - S(\mu))\psi(\mu) = O(\mu^{p+1})$$

for the matrices $S(\mu)$ given in (4.36), (4.37), (4.38). Show that the stated
orders are optimal.

3. Explain with the help of order stars, why the 2-step 2-stage collocation
method with $c_2 = 1$ (see Exercise 7 of Sect. V.3) loses A-stability exactly
when $c_1$ crosses the superconvergence point (Exercise 8 of Sect. V.3).

4. Modify the coefficient $\beta$ in the method

$$y_{n+1} = y_n + h\Bigl(f_n + \frac{1}{2}\nabla f_n + \frac{5}{12}\nabla^2 f_n + \beta\nabla^3 f_n\Bigr),$$

which for $\beta = 3/8$ is the Adams method of order 4, in such a way that the
stability domain becomes strictly larger. This example shows that the multistep
version of Theorem IV.4.17 of Jeltsch & Nevanlinna requires the hypothesis of
“Property C”.




A Uooifjj g 


Fig. 4.8. Dual order stars (4.58) and (4.59) for 

<?a(0 = (C-1)(C + 1) 5 . q(0 = C 6 - i. 

<T fl (0 = (251C 6 +2736C 5 +6957C 4 + 10352C 3 +6957C 2 +2736C+ 251)/945 
<t(0 = (41C 6 +216C 5 +27C 4 +272C 3 +27C 2 +216C+41)/140 
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5. Prove the Daniel & Moore conjecture with the help of the order star A from 
(4.53). 

Hint. The set A is not allowed to cross the unit circle and along the borderlines 
of A the imaginary part of log ( — must steadily decrease (consult (4.52) 
and the proof of Lemma IV.4.5). Hence a borderline starting and ending at the 
origin must either pass through a pole (which is not outside the unit circle) or 
cross the negative real axis in the upward direction (where Im (log () increases 
by 27r). Since then the set A must be to the left, this is only possible once on 
each sheet. 
6. Prove the first Dahlquist barrier by order stars, i.e., prove that stable linear
multistep methods satisfy $p \le k+2$ ($k$ even) and $p \le k+1$ ($k$ odd). Prove
also that for methods with optimal order the smallest error constant is assumed
by the method with

$$\varrho_R(\zeta) = (\zeta-1)(\zeta+1)^{\kappa-1} \tag{4.60}$$

where $\kappa = k$ (if $k$ is even) and $\kappa = k-1$ (if $k$ is odd).

Hint. Study the order stars (4.58) (with pi = p R ) and (4.59) where p R = < t r / 
with from (4.60) (see Fig. 4.8 for the case k = 6, p = 8, £>(C) = £ 6 — 1). 
You must show that the two order stars in the vicinity of ( = 1 have the same 
colours. The following observations will help: 

i) The stars in the vicinity of ( = -1 (produced by the pole l/(( + l) fc-1 ) 
have opposite colours; 

ii) By stability all poles of 

^ (C) = Re G^cTE5c)’ J ° ,0 = R *(^(o-^o) 

lie on or inside the unit circle; 

iii) The boundary curves of A and B cannot cross the unit circle arbitrarily
often, since $d_A(e^{i\varphi})$ and $d_B(e^{i\varphi})$ are trigonometric polynomials.

iv) Study the behaviour of A and B at infinity. 

7. Prove the second Dahlquist barrier for linear multistep methods with the help 
of the order star (4.55). 

8. Compute on a computer for an implicit multistep method of order 3 the order
star B of (4.18), where $R(\mu)$ is the maximal root of the Milne-Simpson
method (1.17). Understand at once the validity of Theorem 4.9.
V.5 Experiments with Multistep Codes

... we know that theory is unable to predict much of what happens
in practice at present and software writers need to discover
the way ahead by numerical experiment ...

(J.R. Cash, in Aiken 1985)

A comparison of different codes is a notoriously difficult and
inexact area ... but there are some clear conclusions that can ...

(J.R. Cash 1983)

This section presents numerical results of multistep codes on precisely the same
problems as in Sect. IV.10. These are, in increasing dimension, VDPOL (the van der
Pol equation (IV.10.1)), ROBER (the famous Robertson problem (IV.10.2)), OREGO
(the Oregonator (IV.10.3)), HIRES (the physiological problem (IV.10.4)), E5 (the
badly scaled chemical reaction (IV.10.5)), PLATE ((IV.10.6), a car moving on a
plate, the only linear and nonautonomous problem), BEAM (the nonlinear elastic
beam equation (IV.1.10’) with $n = 40$), CUSP (the cusp catastrophe (IV.10.8)),
BRUSS (the brusselator (IV.1.6’) with one-dimensional diffusion $\alpha = 1/50$ and
$n = 500$), and KS (the one-dimensional Kuramoto-Sivashinsky equation (IV.10.11)
with $n = 1022$). We have not included here the problems BECKDO and BRUSS-2D,
since they require a special treatment of the linear algebra routines.

As in Sect. IV.10, the codes have been applied with tolerances

$$Rtol = 10^{-2-m/4}, \qquad m = 0, 1, 2, \ldots$$

and $Atol = Rtol$ (with the exceptions $Atol = 10^{-6}\cdot Rtol$ for OREGO and ROBER,
$Atol = 10^{-4}\cdot Rtol$ for HIRES, $Atol = 10^{-3}\cdot Rtol$ for PLATE, and $Atol = 1.7\cdot 10^{-24}$
for E5). The numerical precisions obtained compared to the CPU times (where
all codes are compiled with the same optimization options) are then displayed in
Figs. 5.1 and 5.2, again with the symbols representing the required precision
$Rtol = 10^{-5}$ displayed in grey tone.
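The tolerance schedule above can be tabulated with a few lines of Python. This is only a sketch of the formula $Rtol = 10^{-2-m/4}$ together with the problem-dependent factors $Atol/Rtol$ quoted in the text; the function name `tolerance_grid`, the dictionary `ATOL_FACTOR` and the upper limit `m_max` are illustrative assumptions, not part of any of the codes tested. E5 is left out of the table because its $Atol$ is a fixed number rather than a multiple of $Rtol$.

```python
def tolerance_grid(m_max, atol_factor=1.0):
    """Return the pairs (Rtol, Atol) for Rtol = 10**(-2 - m/4), m = 0..m_max."""
    pairs = []
    for m in range(m_max + 1):
        rtol = 10.0 ** (-2.0 - m / 4.0)
        pairs.append((rtol, atol_factor * rtol))
    return pairs


# Ratios Atol/Rtol quoted in the text; problems not listed here use the ratio 1.
ATOL_FACTOR = {"OREGO": 1e-6, "ROBER": 1e-6, "HIRES": 1e-4, "PLATE": 1e-3}

# m = 12 corresponds to Rtol = 10**(-5), the tolerance marked in grey.
grid = tolerance_grid(12, ATOL_FACTOR["HIRES"])
for rtol, atol in grid:
    print(f"Rtol = {rtol:.3e}   Atol = {atol:.3e}")
```

Every fourth value of $m$ lowers the tolerance by one decade, so the grid places four points per decade on the work-precision diagrams.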


The Codes Used 

LSODE: the “Livermore Solver” of Hindmarsh (1980, 1983). Since we are
dealing with stiff equations, we use the “stiff” method flags MF = 21, 22, 24 or 25,
so that the code is based on the Nordsieck representation of the fixed step size
BDF methods (see Sections III.6 and III.7). This code emerged from a long
development starting with Gear’s DIFSUB in 1971. Its exemplary user interface and
ease of application have been a model for much subsequent ODE software (including
ours). Most problems were computed with analytical Jacobian and full linear
algebra (MF = 21), with the exception of BRUSS and KS (analytical banded
Jacobian, MF = 24), BEAM (numerical full Jacobian, MF = 22), and CUSP (numerical
banded Jacobian, MF = 25).
Fig. 5.1. Work-precision diagrams for problems of dimension 2 to 80
(panels include VDPOL and OREGO; one axis shows the error)

Fig. 5.2. Work-precision diagrams for problems of dimension 80 to 1022


For E5, the code worked correctly only for $Tol \le 10^{-5}$; for PLATE it was
necessary to have $Tol \le 10^{-7}$. For the BEAM problem, which has eigenvalues on the
imaginary axis, it was necessary to restrict the maximal order to 2 because of the
lack of A-stability of the higher order BDF methods. The disastrous effect of the
allowance of orders higher than 2 can be seen in Fig. 5.3.
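The loss of A-stability beyond order 2 can be checked numerically with the root locus curve $\mu(\theta) = \varrho(e^{i\theta})/\sigma(e^{i\theta})$. The sketch below assumes the usual backward-difference form of the $k$-step BDF method, $\sum_{j=1}^{k} \frac1j \nabla^j y_{n+1} = h f_{n+1}$, for which $\mu(\theta) = \sum_{j=1}^{k} \frac1j (1 - e^{-i\theta})^j$; the function names are illustrative. If the curve enters the left half-plane, part of the imaginary axis region relevant for BEAM lies outside the stability domain.

```python
import cmath
import math


def bdf_root_locus(k, theta):
    """Point mu(theta) = rho(z)/sigma(z), z = exp(i*theta), of the k-step BDF method.

    Uses the backward-difference form sum_{j=1}^k (1/j) * (1 - 1/z)**j.
    """
    w = 1.0 - cmath.exp(-1j * theta)
    return sum(w ** j / j for j in range(1, k + 1))


def min_real_part(k, samples=2000):
    """Smallest real part of the root locus curve on a uniform grid of angles."""
    return min(
        bdf_root_locus(k, 2.0 * math.pi * n / samples).real
        for n in range(samples)
    )


for k in range(1, 7):
    print(k, min_real_part(k))
```

For $k = 1, 2$ the minimum is zero up to rounding, so the curve stays in the closed right half-plane; for $k = 3, \ldots, 6$ it becomes negative, which is the defect that forces the order restriction on BEAM.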

DEBDF: this is Shampine & Watts’s driver for a modification of the code LSODE
and is included in the “DEPAC” family (Shampine & Watts 1979). As is to be
expected, it behaves nearly identically to LSODE (see Figs. 5.1 and 5.2). It also
requires a restriction of the order for the BEAM problem (see Fig. 5.3).

VODE: the “Variable-coefficient Ordinary Differential Equation solver” of
Brown, Byrne & Hindmarsh (1989). It is based on the EPISODE and EPISODEB
packages (see Sect. III.7) which use BDF methods on a nonuniform grid (Byrne
& Hindmarsh 1975). The user interface is very similar to that of LSODE; the code
again allows selection between full or banded linear algebra and between analytical
or numerical Jacobian. The numerical results of VODE (see Figs. 5.1 and 5.2) are
very similar for the large problems to those of LSODE and DEBDF; the code is,
however, considerably slower on the small problems. For problem E5 this code
required a tolerance restriction ($Rtol \le 10^{-5}$). On the PLATE problem, this code
was by far the best. On the BEAM problem, one has to restrict the maximal order
to two (Fig. 5.3).

Fig. 5.3. Performance of LSODE, DEBDF and VODE on the BEAM problem
with restricted maximal order

Fig. 5.4. Performance of SECDER, compared to LSODE and VODE

Fig. 5.5. Performance of MEBDF, compared to LSODE and VODE
(for the BEAM problem with restricted maximal order)

SPRINT: this package, written by M. Berzins (see Berzins & Furzeland 1985),
which has been incorporated into the NAG library (“subchapter D02N”), contains
several modules for the step integrator, one of which is SBLEND. This allows us to
study the effect of the blended multistep methods (3.15) of Skeel & Kong (1977). It
can be seen from Table 3.4 that these methods are A-stable for orders up to 4. We
therefore expect them to be much better on the oscillatory BEAM problem. As can
be observed in Fig. 5.2 (as well as in Fig. IV.10.8), this code gives excellent results
for this problem. An observation of the grey points for $Tol = 10^{-5}$ (Figs. 5.1 and
5.2) shows that the code gives better values than the other multistep codes for the
same given tolerance. From time to time, it is fairly slow (e.g., in the PLATE and
KS problems).

SECDER: this code, written in 1979 by C.A. Addison (see Addison 1979),
implements the SECond DERivative multistep methods (3.7) of Enright. The high
order of the methods accompanied with good stability leads us to expect good
performance at high tolerances. This has proved to be true (see Fig. 5.4) for OREGO,
HIRES and PLATE; however, for the latter it is very slow. We have not used it on
the large problems since it has no built-in banded algebra and requires an analytic
Jacobian.

MEBDF: this code by Cash & Considine (1992) implements the modified extended
BDF methods (see Eq. (3.17.mod) and Table 3.5). Its good performance is
shown on selected examples in Fig. 5.5. For the BEAM problem, the code works
well if the maximal order is limited to 4.

LADAMS: this is the “Livermore Adams” code, i.e., LSODE with method flag
MF = 10, included to demonstrate the performance of an explicit multistep method
on large and/or mildly stiff problems. One can see that it has its chance on several
large problems (PLATE, BEAM). It is, when compared to DOPRI5 in Fig. IV.10.8, a
good deal slower when $f$-evaluations are cheap (CUSP), but not on BEAM.

The codes LSODE, DEBDF, VODE and MEBDF can be obtained by sending an 
electronic mail (e.g., “send lsode.f from odepack”) to “netlib@research.att.com”. 


Exercises 

1. Do your own experiments and draw your own conclusions for the above
problems. The authors will be happy to provide you with drivers and function
subroutines.
V.6 One-Leg Methods and G-Stability

... the error analysis is simpler to formulate for one-leg methods
than for linear multistep methods. (G. Dahlquist 1975)


The first stability results for nonlinear differential equations and multistep
methods are fairly old (Liniger 1956, Dahlquist 1963), older than similar studies for
Runge-Kutta methods. The great breakthrough occurred in 1975 (at the Dundee
conference) when Dahlquist proposed considering nonlinear problems

$$y' = f(x,y) \tag{6.1}$$

which satisfy a one-sided Lipschitz condition

$$\langle f(x,y) - f(x,z),\, y - z\rangle \le \nu\,\|y - z\|^2 \tag{6.2}$$

or, if the functions are complex-valued,

$$\mathrm{Re}\,\langle f(x,y) - f(x,z),\, y - z\rangle \le \nu\,\|y - z\|^2 \tag{6.2’}$$

(see Sect. IV.12). He also found that the study of nonlinear stability for general
multistep methods is simplified if a related class of methods, the so-called
one-leg (multistep) methods, is considered.


One-Leg (Multistep) Methods 

... the somewhat crazy name one-leg methods ... 

(G. Dahlquist 1983) 

I am absolutely unable to translate “one-leg” into French
... uni-jambiste? (M. Crouzeix, in 1987)

My lord, cranes have but one thigh and one leg
(Boccaccio, Decameron 1353; quotation suggested by M. Crouzeix)

Suppose that a linear $k$-step method

$$\sum_{i=0}^{k} \alpha_i y_{m+i} = h \sum_{i=0}^{k} \beta_i f(x_{m+i}, y_{m+i}) \tag{6.3}$$

is given, and that the generating polynomials

$$\varrho(\zeta) = \sum_{i=0}^{k} \alpha_i \zeta^i, \qquad \sigma(\zeta) = \sum_{i=0}^{k} \beta_i \zeta^i \tag{6.4}$$

have real coefficients and no common divisor (see Sect. III.2). We also assume
throughout the normalization

$$\sigma(1) = 1. \tag{6.5}$$

Then the associated one-leg method is defined by

$$\sum_{i=0}^{k} \alpha_i y_{m+i} = h f\Bigl(\sum_{i=0}^{k} \beta_i x_{m+i},\ \sum_{i=0}^{k} \beta_i y_{m+i}\Bigr). \tag{6.6}$$

In this new method, the derivative $f$ is evaluated at one point only, which makes it
easier to analyze.

It is, of course, interesting to know how the solutions of the one-leg method
(6.6) are related to those of its “multistep twin” (6.3). If the differential equation
is linear and autonomous, $y' = Ay$, then both formulas, (6.3) and (6.6), are
identical. For the BDF schemes (1.18) there is in any case only one $f$-value in the
multistep version, hence the equations (6.3) and (6.6) are the same. For general
methods and general nonlinear equations, however, the formulas are not identical,
but the solutions are related by certain transformations (see Exercise 3). We
consider, as an example, the trapezoidal rule, which is a two-leg method,

$$y_{m+1} - y_m = \frac{h}{2}\bigl(f(x_m, y_m) + f(x_{m+1}, y_{m+1})\bigr). \tag{6.7}$$

The corresponding one-leg method is the implicit midpoint rule,

$$y_{m+1} - y_m = h f\Bigl(\frac{x_{m+1} + x_m}{2},\ \frac{y_{m+1} + y_m}{2}\Bigr). \tag{6.8}$$

If $\{y_m\}$ is a solution of the one-leg formula (6.8), then

$$\hat y_m = \tfrac12 (y_m + y_{m+1}), \qquad \hat x_m = \tfrac12 (x_m + x_{m+1})$$

satisfies (6.7). On the other hand, if $\{\hat y_m, \hat x_m\}$ satisfy (6.7), then

$$y_m = \hat y_m - \frac{h}{2} f(\hat x_m, \hat y_m), \qquad x_m = \hat x_m - \frac{h}{2}$$

is a solution of (6.8). This relationship has already been extensively exploited in
the proof of Theorem IV.15.8.
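The correspondence between the implicit midpoint rule (6.8) and the trapezoidal rule (6.7) can be observed numerically. The following sketch integrates a scalar test problem with the midpoint rule, forms the averaged sequence, and measures how well it satisfies the trapezoidal rule. The right-hand side `f`, the step size and the starting value are arbitrary choices made for illustration only, and the Newton iteration uses the derivative of this particular `f`.

```python
import math


def f(x, y):
    """Hypothetical nonlinear test right-hand side, chosen only for illustration."""
    return -y ** 3 + math.cos(x)


def midpoint_step(x, y, h):
    """One step of the implicit midpoint rule (6.8), solved by Newton iteration."""
    xm = x + 0.5 * h
    ym = y  # unknown: the midpoint value (y_m + y_{m+1}) / 2
    for _ in range(50):
        g = ym - y - 0.5 * h * f(xm, ym)
        dg = 1.0 + 1.5 * h * ym ** 2  # derivative of g for this particular f
        step = g / dg
        ym -= step
        if abs(step) < 1e-15:
            break
    return 2.0 * ym - y


h, x0, y0, n = 0.1, 0.0, 1.0, 20
xs = [x0 + m * h for m in range(n + 1)]
ys = [y0]
for m in range(n):
    ys.append(midpoint_step(xs[m], ys[m], h))

# Transformed sequence: averages of neighbouring one-leg values.
xh = [0.5 * (xs[m] + xs[m + 1]) for m in range(n)]
yh = [0.5 * (ys[m] + ys[m + 1]) for m in range(n)]

# Residual of the trapezoidal rule (6.7) along the transformed sequence.
residual = max(
    abs(yh[m + 1] - yh[m] - 0.5 * h * (f(xh[m], yh[m]) + f(xh[m + 1], yh[m + 1])))
    for m in range(n - 1)
)
print(residual)
```

The residual is at the level of the nonlinear solver tolerance, and the inverse map $y_m = \hat y_m - \frac h2 f(\hat x_m, \hat y_m)$ recovers the one-leg values from the averaged ones.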


Existence and Uniqueness 

We suppose $\alpha_k \ne 0$ (as always) and $\beta_k \ne 0$ (otherwise the method is explicit). In
the case of multistep methods, we write (6.3) in the form

$$y - \eta - h\,\frac{\beta_k}{\alpha_k}\, f(x,y) = 0, \tag{6.9}$$

where $x$ is given, $\eta$ is a vector composed of known quantities, and $y = y_{m+k}$ is the
unknown vector. The one-leg formula (6.6) can also be brought to the form (6.9)
by the transformation $y = \beta_k y_{m+k} + \ldots + \beta_0 y_m$, so that all subsequent results
on existence and uniqueness will be valid for multistep and one-leg methods. To
obtain existence results for Eq. (6.9), we replace $h\beta_k/\alpha_k$ by a new “step size” $h$
and obtain nothing else but implicit Euler. All theorems for implicit Runge-Kutta
methods (Theorems 14.2, 14.3, and 14.4 of Sect. IV.14) are immediately applicable
and give

Theorem 6.1 (Dahlquist 1975). Let $f$ be continuously differentiable and satisfy
(6.2). If

$$h\nu < \frac{\alpha_k}{\beta_k} \tag{6.10}$$

then the nonlinear equation (6.9) has a unique solution $y$. □


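Theorem 6.2. Let $y$ be given by (6.9) and consider a perturbed value $\hat y$ satisfying

$$\hat y - \eta - h\,\frac{\beta_k}{\alpha_k}\, f(x,\hat y) = \delta. \tag{6.11}$$

Under the assumption (6.10) we then have

$$\|\hat y - y\| \le \frac{1}{1 - (\beta_k/\alpha_k)\,h\nu}\,\|\delta\|. \tag{6.12}$$

□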


Remark. Theorems IV.14.2, IV.14.3 and IV.14.4 are for much more general
methods than just the implicit Euler needed here. The reader who is not interested in
the more general case can rewrite the proofs of Sect. IV.14 nearly word for word.
Since there is now only one implicit stage, all tensor products disappear and the
formulas, but not the ideas of the proof, simplify considerably.


G-Stability


If the differential equation satisfies the one-sided Lipschitz condition (6.2) (or
(6.2’)) with $\nu = 0$, then the exact solutions are contractive (Lemma IV.12.1). We
shall investigate here which one-leg (multistep) methods then also have
contractive solutions. Since the numerical value $y_{m+k}$ depends on all of
$y_{m+k-1}, \ldots, y_m$, it makes no sense to require
$\|y_{m+k} - \hat y_{m+k}\| \le \|y_{m+k-1} - \hat y_{m+k-1}\|$ as in the
one-step case (Definition IV.12.2). We have to consider the method as a mapping
$\mathbb{R}^{n\cdot k} \to \mathbb{R}^{n\cdot k}$. For this we introduce the notation

$$Y_m = (y_{m+k-1}, \ldots, y_m)^T \tag{6.13}$$

and consider inner product norms on $\mathbb{R}^{n\cdot k}$

$$\|Y_m\|_G^2 = \sum_{i=1}^{k}\sum_{j=1}^{k} g_{ij}\,\langle y_{m+i-1},\, y_{m+j-1}\rangle, \tag{6.14}$$

where $\langle\cdot,\cdot\rangle$ is the inner product on $\mathbb{R}^n$ used in (6.2) and the $k$-dimensional matrix

$$G = (g_{ij})_{i,j=1,\ldots,k}$$

is assumed to be real, symmetric and positive definite.

Definition 6.3 (Dahlquist 1975). The one-leg method (6.6) is called G-stable, if
there exists a real, symmetric and positive definite matrix $G$, such that for two
numerical solutions $\{y_m\}$ and $\{\hat y_m\}$ we have

$$\|Y_{m+1} - \hat Y_{m+1}\|_G \le \|Y_m - \hat Y_m\|_G \tag{6.15}$$

for all step sizes $h > 0$ and for all differential equations satisfying (6.2) or (6.2’)
with $\nu = 0$.

Since $y' = \lambda y$, $\mathrm{Re}\,\lambda \le 0$, satisfies (6.2’) with $\nu = 0$, we immediately get

Theorem 6.4. G-stability implies A-stability. □


Example 6.5. Consider the 2-step BDF method

$$\tfrac32\, y_{m+2} - 2\, y_{m+1} + \tfrac12\, y_m = h f(x_{m+2}, y_{m+2}). \tag{6.16}$$

We take a second numerical solution $\{\hat y_m\}$ and denote its difference to $\{y_m\}$ by
$\Delta y_m = \hat y_m - y_m$. If we insert (6.16) into our assumption (6.2’)

$$\mathrm{Re}\,\langle f(x_{m+2}, \hat y_{m+2}) - f(x_{m+2}, y_{m+2}),\ \hat y_{m+2} - y_{m+2}\rangle \le 0$$

we obtain

$$E := \mathrm{Re}\,\bigl\langle \tfrac32\,\Delta y_{m+2} - 2\,\Delta y_{m+1} + \tfrac12\,\Delta y_m,\ \Delta y_{m+2}\bigr\rangle \le 0. \tag{6.17}$$

The main idea is now to subtract from this inequality a well-chosen quadratic term
$\|a_2\Delta y_{m+2} + a_1\Delta y_{m+1} + a_0\Delta y_m\|^2$ in order to bring it to the form required by
(6.15). With $\Delta Y_m = (\Delta y_{m+1}, \Delta y_m)^T$ this means that

$$E = \|\Delta Y_{m+1}\|_G^2 - \|\Delta Y_m\|_G^2 + \|a_2\Delta y_{m+2} + a_1\Delta y_{m+1} + a_0\Delta y_m\|^2 \tag{6.18}$$

with a positive definite matrix

$$G = \begin{pmatrix} g_{22} & g_{21} \\ g_{21} & g_{11} \end{pmatrix}.$$

Multiplying out and comparing the coefficients of $\mathrm{Re}\,\langle\Delta y_i, \Delta y_j\rangle$ in (6.17) and
(6.18) gives the six relations

$$\tfrac32 = g_{22} + a_2^2, \qquad 0 = g_{11} - g_{22} + a_1^2, \qquad 0 = -g_{11} + a_0^2, \tag{6.19a}$$

$$-2 = 2g_{21} + 2a_2a_1, \qquad \tfrac12 = 2a_2a_0, \qquad 0 = -2g_{21} + 2a_1a_0. \tag{6.19b}$$

Adding all six equations gives $0 = (a_0 + a_1 + a_2)^2$, so that $a_0 + a_1 + a_2 = 0$. This
relation together with (6.19b) determines the $a_i$ as $a_0 = \pm 1/2$, $a_1 = \mp 1$,
$a_2 = \pm 1/2$. Inserting this into (6.19) yields the positive definite matrix

$$G = \frac14 \begin{pmatrix} 5 & -2 \\ -2 & 1 \end{pmatrix}. \tag{6.20}$$

Since $E \le 0$ by (6.17), it follows from (6.18) that the 2-step BDF method is
G-stable.
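The contractivity asserted by G-stability can be observed on a small experiment. The sketch below applies the 2-step BDF method to the scalar problem $y' = -y^3$, which satisfies (6.2) with $\nu = 0$, for two different sets of starting values and records the squared G-norm of the difference, using the matrix (6.20). The test equation, the step size, the starting values and the bisection solver are assumptions made for illustration.

```python
def bdf2_step(y0, y1, h):
    """Solve 3/2*y2 - 2*y1 + 1/2*y0 = h*f(y2) for f(y) = -y**3 by bisection."""
    c = 2.0 * y1 - 0.5 * y0

    def g(y):
        return 1.5 * y + h * y ** 3 - c  # strictly increasing in y

    lo, hi = -abs(c) - 1.0, abs(c) + 1.0  # bracket with g(lo) < 0 < g(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def g_norm_sq(d_new, d_old):
    """||(d_new, d_old)||_G^2 with G = (1/4) [[5, -2], [-2, 1]] from (6.20)."""
    return (5.0 * d_new ** 2 - 4.0 * d_new * d_old + d_old ** 2) / 4.0


def g_norm_history(start_a, start_b, h, steps):
    """Squared G-norms of the difference of two BDF2 solutions, step by step."""
    a, b = list(start_a), list(start_b)  # each holds (y_m, y_{m+1})
    history = []
    for _ in range(steps):
        history.append(g_norm_sq(a[1] - b[1], a[0] - b[0]))
        a = [a[1], bdf2_step(a[0], a[1], h)]
        b = [b[1], bdf2_step(b[0], b[1], h)]
    return history


history = g_norm_history((1.0, 0.8), (-0.5, 1.5), h=0.5, steps=30)
print(history[0], history[-1])
```

The recorded sequence never increases, in agreement with (6.15). A plain Euclidean norm of the pair of differences need not share this property, which is why the weighting by $G$ matters.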


An Algebraic Criterion 

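The algebraic structures of the foregoing computations become much more visible
if we formally replace in (6.17) and (6.18) all

$$\langle \Delta y_{m+i},\, \Delta y_{m+j}\rangle \longrightarrow \zeta^i \omega^j$$

and use

$$2\,\mathrm{Re}\,\langle \Delta y_{m+i},\, \Delta y_{m+j}\rangle = \langle \Delta y_{m+i},\, \Delta y_{m+j}\rangle + \langle \Delta y_{m+j},\, \Delta y_{m+i}\rangle.$$

This yields

$$E = \tfrac12\bigl(\varrho(\zeta)\sigma(\omega) + \varrho(\omega)\sigma(\zeta)\bigr) \tag{6.17’}$$

$$E = (\zeta\omega - 1)\sum_{i,j=1}^{k} g_{ij}\,\zeta^{i-1}\omega^{j-1} + \Bigl(\sum_{i=0}^{k} a_i\zeta^i\Bigr)\Bigl(\sum_{j=0}^{k} a_j\omega^j\Bigr). \tag{6.18’}$$

We can now formulate an algebraic criterion which, in a different notation, already
appears in Dahlquist (1975).

Theorem 6.6 (Baiocchi & Crouzeix 1989). Consider a method $(\varrho, \sigma)$. If there
exists a real, symmetric and positive definite matrix $G$ and real numbers $a_0, \ldots, a_k$,
such that

$$\tfrac12\bigl(\varrho(\zeta)\sigma(\omega) + \varrho(\omega)\sigma(\zeta)\bigr) = (\zeta\omega - 1)\sum_{i,j=1}^{k} g_{ij}\,\zeta^{i-1}\omega^{j-1} + \Bigl(\sum_{i=0}^{k} a_i\zeta^i\Bigr)\Bigl(\sum_{j=0}^{k} a_j\omega^j\Bigr), \tag{G}$$

then the corresponding one-leg method is G-stable.

Remark. The factor 1/2 on the left-hand side of (G) is of no significance and
can be replaced by any other positive constant, leading to another scaling of the
coefficients $g_{ij}$ and $a_i$.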
Proof. We just replace $\zeta^i\omega^j$ by $\langle \Delta y_{m+i}, \Delta y_{m+j}\rangle$ in Eq. (G) and obtain

$$\mathrm{Re}\,\Bigl\langle \sum_{i=0}^{k} \alpha_i\,\Delta y_{m+i},\ \sum_{j=0}^{k} \beta_j\,\Delta y_{m+j}\Bigr\rangle = \|\Delta Y_{m+1}\|_G^2 - \|\Delta Y_m\|_G^2 + \Bigl\|\sum_{i=0}^{k} a_i\,\Delta y_{m+i}\Bigr\|^2. \tag{6.21}$$

We then insert (6.6) and use (6.2’) with $\nu = 0$ and obtain the desired estimate

$$\|\Delta Y_{m+1}\|_G \le \|\Delta Y_m\|_G. \qquad □$$


An interesting question is now for which methods $(\varrho, \sigma)$ Condition (6.21) is
satisfied. By Theorem 6.4 the method is necessarily A-stable. Is this also
sufficient?


The Equivalence of A-Stability and G-Stability


Dahlquist struggled for three years to get the answer, which is

Theorem 6.7 (Dahlquist 1978). If $\varrho$ and $\sigma$ have no common divisor, then the
method $(\varrho, \sigma)$ is A-stable if and only if the corresponding one-leg method is
G-stable.

Proof. We follow here the presentation of Baiocchi & Crouzeix (1989). Recall first
that A-stability of the method $(\varrho, \sigma)$ implies

$$\mathrm{Re}\,\bigl(\varrho(\zeta)\,\sigma(\bar\zeta)\bigr) > 0 \quad \text{for } |\zeta| > 1 \tag{A}$$

(see Sect. V.1). Because of Theorems 6.4 and 6.6 it is sufficient to prove that
condition (A) implies the existence of a real, symmetric and positive definite matrix $G$
and real numbers $a_0, \ldots, a_k$ such that Property (G) holds. The proof is in three
steps:

a) computation of $a_0, \ldots, a_k$;

b) computation of $G$;

c) show that $G$ is positive definite.

a) The term containing the $g_{ij}$'s in (G) disappears if we put $\omega = 1/\zeta$. We
therefore consider the function

$$E(\zeta) = \tfrac12\bigl(\varrho(\zeta)\sigma(1/\zeta) + \varrho(1/\zeta)\sigma(\zeta)\bigr), \tag{6.22}$$
which is of the form

$$E(\zeta) = c_r\Bigl(\zeta^r + \frac{1}{\zeta^r}\Bigr) + c_{r-1}\Bigl(\zeta^{r-1} + \frac{1}{\zeta^{r-1}}\Bigr) + \ldots + c_1\Bigl(\zeta + \frac{1}{\zeta}\Bigr) + c_0 = \frac{c_r}{\zeta^r}\prod_{j=1}^{2r}(\zeta - \zeta_j) \tag{6.23}$$

with some $r \le k$. Since $E(\zeta) = E(1/\zeta)$, for each root $\zeta_j$ of the polynomial
$\zeta^r E(\zeta)$ the inverse $1/\zeta_j$ is also a root with the same multiplicity. Therefore there
are as many roots inside the unit circle as there are outside. As to the roots on the
unit circle, Condition (A) tells us that $E(\zeta) = \mathrm{Re}\,\varrho(\zeta)\sigma(\bar\zeta) \ge 0$ on the unit circle.
Therefore, all roots on the unit circle must have even multiplicity; half of them we
declare “inside” and half of them we declare “outside”. The clever idea is now to
collect all roots “outside” the unit circle into a product, so that

$$E(\zeta) = \frac{c_r}{\zeta^r}\prod_{\zeta_j\ \text{outside}}(\zeta - \zeta_j)\prod_{\zeta_j\ \text{inside}}(\zeta - \zeta_j)
= \frac{c_r}{\zeta^r}\prod_{\zeta_j\ \text{outside}}(\zeta - \zeta_j)\prod_{\zeta_j\ \text{outside}}\Bigl(\zeta - \frac{1}{\zeta_j}\Bigr)
= K\prod_{\zeta_j\ \text{outside}}(\zeta - \zeta_j)\prod_{\zeta_j\ \text{outside}}\Bigl(\frac{1}{\zeta} - \zeta_j\Bigr) \tag{6.24}$$

where $K$ is a constant. But this constant must be non-negative, as can be seen thus:
by Condition (A), $E(\zeta)$ is non-negative on the unit circle. The same is true for the
function divided by $K$, since each factor $(e^{i\theta} - \zeta_j)$ from the first product has a
complex conjugate brother $(e^{-i\theta} - \bar\zeta_j)$ in the second. Therefore $E(\zeta)$ in (6.24)
can be factored as

$$E(\zeta) = a(\zeta)\cdot a(1/\zeta) \tag{6.25}$$

where

$$a(\zeta) = \sqrt{K}\prod_{\zeta_j\ \text{outside}}(\zeta - \zeta_j) =: \sum_{i=0}^{k} a_i\zeta^i \tag{6.26}$$

and step (a) is done.

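b) It follows from (6.22) and (6.25) that the polynomial

$$P(\zeta,\omega) = \tfrac12\bigl(\varrho(\zeta)\sigma(\omega) + \varrho(\omega)\sigma(\zeta)\bigr) - a(\zeta)a(\omega) \tag{6.27}$$

vanishes when $\zeta\omega - 1 = 0$. It can therefore be written as

$$P(\zeta,\omega) = (\zeta\omega - 1)\sum_{i,j=1}^{k} g_{ij}\,\zeta^{i-1}\omega^{j-1}. \tag{6.28}$$

The coefficients $g_{ij}$ are real and satisfy $g_{ij} = g_{ji}$, because $P(\zeta,\omega) = P(\omega,\zeta)$.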
c) Looking at (6.28), it appears at first sight a difficult task to prove positive
definiteness for the matrix $G = (g_{ij})$ defined there. The crucial idea is the
following: choose $k$ (at first arbitrary) complex numbers $\zeta_1, \ldots, \zeta_k$ and replace in (6.28)
$\zeta \mapsto \zeta_q$, $\omega \mapsto \bar\zeta_r$, which gives together with (6.27)

$$b_{qr} := \sum_{i,j=1}^{k} g_{ij}\,\zeta_q^{i-1}\,\bar\zeta_r^{\,j-1}
= \frac{1}{1 - \zeta_q\bar\zeta_r}\Bigl\{-\tfrac12\bigl(\varrho(\zeta_q)\sigma(\bar\zeta_r) + \varrho(\bar\zeta_r)\sigma(\zeta_q)\bigr) + a(\zeta_q)\,a(\bar\zeta_r)\Bigr\}. \tag{6.29}$$

Here the $b_{qr}$ are the elements of the matrix

$$B = V^* G V$$

where $V = (\zeta_j^{i-1})$ is a Vandermonde matrix. Thus, we now have to prove that $B$
is positive definite, which appears much easier. First, we develop

$$\frac{1}{1 - \zeta_q\bar\zeta_r} = 1 + \zeta_q\bar\zeta_r + \zeta_q^2\bar\zeta_r^{\,2} + \zeta_q^3\bar\zeta_r^{\,3} + \ldots \tag{6.30a}$$

which converges if

$$|\zeta_q| < 1, \qquad q = 1, 2, \ldots, k. \tag{6.30b}$$

Next, we require that for all $q$

$$\varrho(\zeta_q) + \lambda\,\sigma(\zeta_q) = 0 \quad \text{for some } \lambda > 0. \tag{6.31}$$

With the exception of a finite number of $\lambda$'s, the $k$ roots of Eq. (6.31) are all
different. A-stability (assumption (A)) implies (6.30b), because $-\lambda$ lies in the
interior of the stability domain. Inserting (6.31) and (6.30a) into (6.29) gives, for
an arbitrary non-zero vector $v = (v_1, \ldots, v_k)$,

$$v^* B v = \sum_{q,r=1}^{k} v_q\, b_{qr}\, \bar v_r
= \sum_{m=0}^{\infty}\Bigl\{\Bigl|\sum_{q=1}^{k} v_q\,\zeta_q^m\, a(\zeta_q)\Bigr|^2 + \lambda\,\Bigl|\sum_{q=1}^{k} v_q\,\zeta_q^m\,\sigma(\zeta_q)\Bigr|^2\Bigr\},$$

which looks rather positive. This expression cannot be zero for $v \ne 0$, because
it follows from (6.31) that $\sigma(\zeta_q) \ne 0$ for all $q$, otherwise $\varrho$ and $\sigma$ would have a
common factor. Therefore $v^*Bv > 0$, thus the matrix $B$, and consequently the
matrix $G$, is positive definite. □

It is worth noting that the above proof provides constructive formulas for the
matrix $G$. As an illustration, we again consider the 2-step BDF method (6.16) with
generating polynomials

$$\varrho(\zeta) = \tfrac32\zeta^2 - 2\zeta + \tfrac12, \qquad \sigma(\zeta) = \zeta^2.$$

The function $E(\zeta)$ (Formula (6.22)) becomes

$$E(\zeta) = \tfrac14\Bigl(\zeta^2 + \frac{1}{\zeta^2}\Bigr) - \Bigl(\zeta + \frac{1}{\zeta}\Bigr) + \tfrac32 = \tfrac14(\zeta - 1)^2\Bigl(\frac{1}{\zeta} - 1\Bigr)^2$$

so that $a(\zeta) = \tfrac12(\zeta - 1)^2$. Inserting this into (6.27) gives

$$P(\zeta,\omega) = (\zeta\omega - 1)\bigl(\tfrac54\zeta\omega - \tfrac12\zeta - \tfrac12\omega + \tfrac14\bigr),$$

so that $g_{22} = 5/4$, $g_{12} = g_{21} = -1/2$, $g_{11} = 1/4$ is the same as (6.20).
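Step (b) of the proof is a polynomial division by $\zeta\omega - 1$, which reduces to a simple recursion on coefficients: comparing the coefficients of $\zeta^i\omega^j$ in (6.28) gives $P_{ij} = g_{ij} - g_{i+1,j+1}$, with $g_{ij} = 0$ outside the range $1 \le i, j \le k$. The sketch below carries this out in exact rational arithmetic, assuming that the polynomial $a(\zeta)$ from step (a) is already known; the function names and the list-of-coefficients representation (lowest degree first) are illustrative choices.

```python
from fractions import Fraction as F


def poly_outer(p, q):
    """Coefficient table of p(zeta)*q(omega): entry [i][j] multiplies zeta**i * omega**j."""
    return [[pi * qj for qj in q] for pi in p]


def g_matrix(rho, sigma, a):
    """Recover (g_ij) from P = (zeta*omega - 1) * sum g_ij zeta**(i-1) omega**(j-1)."""
    k = len(rho) - 1
    rs, sr, aa = poly_outer(rho, sigma), poly_outer(sigma, rho), poly_outer(a, a)
    P = [[(rs[i][j] + sr[i][j]) / 2 - aa[i][j] for j in range(k + 1)]
         for i in range(k + 1)]
    # Coefficient comparison: P[i][j] = g[i][j] - g[i+1][j+1],
    # where g[i][j] = 0 whenever i or j lies outside 1..k.
    g = [[F(0)] * (k + 2) for _ in range(k + 2)]
    for i in range(k + 1):
        for j in range(k + 1):
            g[i + 1][j + 1] = g[i][j] - P[i][j]
    # The division is exact only if all entries with an index k+1 vanish.
    assert all(g[k + 1][j] == 0 and g[j][k + 1] == 0 for j in range(k + 2))
    return [[g[i][j] for j in range(1, k + 1)] for i in range(1, k + 1)]


rho = [F(1, 2), F(-2), F(3, 2)]   # 3/2 zeta^2 - 2 zeta + 1/2
sigma = [F(0), F(0), F(1)]        # zeta^2
a = [F(1, 2), F(-1), F(1, 2)]     # (zeta - 1)^2 / 2
G = g_matrix(rho, sigma, a)       # rows and columns indexed 1..k
print(G)
```

For the 2-step BDF data the routine returns $g_{11} = 1/4$, $g_{12} = g_{21} = -1/2$, $g_{22} = 5/4$. The internal assertion fails if the supplied $a(\zeta)$ does not satisfy (6.25), since then $P$ is not divisible by $\zeta\omega - 1$.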


A Criterion for Positive Functions 


In the proof of Lemma IV.13.19 we have used the following criterion for positive
functions, which is an immediate consequence of the above equivalence result.

Lemma 6.8. Let $\chi(z) = \alpha(z)/\beta(z)$ be an irreducible rational function with real
polynomials $\alpha(z)$ of degree $\le k-1$ and $\beta(z)$ of degree $k$. Then $\chi(z)$ is a positive
function, i.e.,

$$\mathrm{Re}\,\chi(z) > 0 \quad \text{for } \mathrm{Re}\,z > 0, \tag{6.32}$$

if and only if there exist a real, symmetric and positive definite matrix $A$ and a real,
symmetric and non-negative definite matrix $B$, such that

$$\alpha(z)\beta(w) + \alpha(w)\beta(z) = (z + w)\sum_{i,j=1}^{k} a_{ij}\,z^{i-1}w^{j-1} + \sum_{i,j=1}^{k} b_{ij}\,z^{i-1}w^{j-1}. \tag{6.33}$$


Proof. The “if”-part follows immediately by putting $w = \bar z$ in (6.33). For the “only
if”-part we consider the transformations

$$\zeta = \frac{z+1}{z-1}, \quad z = \frac{\zeta+1}{\zeta-1} \qquad \text{and} \qquad \omega = \frac{w+1}{w-1}, \quad w = \frac{\omega+1}{\omega-1} \tag{6.34}$$

and introduce the polynomials

$$\varrho(\zeta) = \Bigl(\frac{\zeta-1}{2}\Bigr)^{k}\alpha\Bigl(\frac{\zeta+1}{\zeta-1}\Bigr), \qquad \sigma(\zeta) = \Bigl(\frac{\zeta-1}{2}\Bigr)^{k}\beta\Bigl(\frac{\zeta+1}{\zeta-1}\Bigr).$$

As the transformation (6.34) maps $|\zeta| > 1$ onto the half plane $\mathrm{Re}\,z > 0$,
Condition (6.32) is equivalent to Assumption (A). Therefore, Theorem 6.7 implies the
existence of a real, symmetric and positive definite matrix $G$ and of real numbers
$a_0, \ldots, a_k$ such that

$$\tfrac12\bigl(\varrho(\zeta)\sigma(\omega) + \varrho(\omega)\sigma(\zeta)\bigr) = (\zeta\omega - 1)\sum_{i,j=1}^{k} g_{ij}\,\zeta^{i-1}\omega^{j-1} + \Bigl(\sum_{i=0}^{k} a_i\zeta^i\Bigr)\Bigl(\sum_{j=0}^{k} a_j\omega^j\Bigr).$$

Backsubstitution of the old variables yields

$$\tfrac12\bigl(\alpha(z)\beta(w) + \alpha(w)\beta(z)\bigr)
= 2(z + w)\sum_{i,j=1}^{k} g_{ij}\,(z+1)^{i-1}(z-1)^{k-i}(w+1)^{j-1}(w-1)^{k-j}
+ \Bigl(\sum_{i=0}^{k} a_i\,(z+1)^{i}(z-1)^{k-i}\Bigr)\Bigl(\sum_{j=0}^{k} a_j\,(w+1)^{j}(w-1)^{k-j}\Bigr). \tag{6.35}$$

Rearranging into powers of $z$ and $w$ gives Eq. (6.33). Since the polynomials
$(z+1)^{i-1}(z-1)^{k-i}$ for $i = 1, \ldots, k$ are linearly independent, the resulting matrix $A$
is positive definite. The coefficient of $z^k w^k$ in the second term of the right-hand
side of (6.35) must vanish, because the degree of $\alpha(z)$ is at most $k-1$. We remark
that the matrix $B$ of this construction is only of rank 1. □


Error Bounds for One-Leg Methods 

We shall apply the stability results of this section to derive bounds for the global
error of one-leg methods. For a differential equation (6.1) with exact (smooth)
solution $y(x)$ it is natural to define the discretization error of (6.6) as

$$\delta_{OL}(x) = \sum_{i=0}^{k} \alpha_i\, y(x + ih) - h f\Bigl(x + \beta h,\ \sum_{i=0}^{k} \beta_i\, y(x + ih)\Bigr) \tag{6.36}$$

with $\beta = \sigma'(1) = \sum_i i\beta_i$. For the BDF methods we have $\sum_i \beta_i\, y(x + ih) = y(x + \beta h)$,
so that (6.36) equals

$$\delta_D(x) = \sum_{i=0}^{k} \alpha_i\, y(x + ih) - h\, y'(x + \beta h), \tag{6.37}$$

the so-called differentiation error of the method. For methods which do not satisfy
$\sum_i \beta_i\, y(x + ih) = y(x + \beta h)$, the right-hand side of (6.36) may become very large
for stiff problems, even if the derivatives of the solution are bounded by a constant
of moderate size. In this case, the expression (6.36) is not a suitable quantity for
error estimates. Dahlquist (1983) proposed considering in addition to $\delta_D(x)$ also
the interpolation error

$$\delta_I(x) = \sum_{i=0}^{k} \beta_i\, y(x + ih) - y(x + \beta h). \tag{6.38}$$

For nonstiff problems (with bounded derivatives of $f$) these two error expressions
are related to $\delta_{OL}(x)$ by

$$\delta_{OL}(x) = \delta_D(x) - h\,\frac{\partial f}{\partial y}\bigl(x + \beta h,\, y(x + \beta h)\bigr)\,\delta_I(x) + \mathcal{O}\bigl(h\,\|\delta_I(x)\|^2\bigr).$$
Taylor expansion of (6.37) and (6.38) shows that

$$\delta_D(x) = \mathcal{O}(h^{p_D+1}), \qquad \delta_I(x) = \mathcal{O}(h^{p_I+1}), \tag{6.39}$$

where the optimal orders $p_D$ and $p_I$ are determined by certain algebraic
conditions (see Exercise 1a). From $\beta = \sigma'(1)$ we always have $p_I \ge 1$ and from the
consistency conditions it follows that $p_D \ge 1$. However, the orders $p_D$ and $p_I$
may be significantly smaller than the order of the corresponding multistep method
(Exercise 1). The constants in the $\mathcal{O}(\ldots)$-terms of (6.39) depend only on bounds
for a certain derivative of the solution, but not on the stiffness of the problem.
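The orders $p_D$ and $p_I$ can be computed mechanically. Expanding (6.37) and (6.38) in Taylor series around $x$ shows that $\delta_D(x) = \mathcal{O}(h^{p+1})$ when $\sum_i \alpha_i i^q = q\beta^{q-1}$ for $q = 0, \ldots, p$, and $\delta_I(x) = \mathcal{O}(h^{p+1})$ when $\sum_i \beta_i i^q = \beta^q$ for $q = 0, \ldots, p$. The sketch below checks these conditions in exact arithmetic; the function name and the cap `q_max` (needed because for BDF methods the interpolation conditions hold for every $q$) are assumptions of this illustration.

```python
from fractions import Fraction as F


def one_leg_orders(alpha, beta, q_max=12):
    """Return (p_D, p_I) for a one-leg method normalized by sigma(1) = 1.

    p_D: largest p with sum_i alpha_i * i**q == q * b**(q-1) for q = 0..p,
    p_I: largest p with sum_i beta_i  * i**q == b**q         for q = 0..p,
    where b = sigma'(1) = sum_i i * beta_i.  Both orders are capped at q_max.
    """
    b = sum(i * bi for i, bi in enumerate(beta))

    def order(coeffs, rhs):
        p = -1
        for q in range(q_max + 1):
            lhs = sum(c * F(i) ** q for i, c in enumerate(coeffs))
            if lhs != rhs(q):
                break
            p = q
        return p

    p_d = order(alpha, lambda q: q * b ** (q - 1) if q > 0 else F(0))
    p_i = order(beta, lambda q: b ** q)
    return p_d, p_i


# 2-step BDF: rho = 3/2 z^2 - 2 z + 1/2, sigma = z^2
print(one_leg_orders([F(1, 2), F(-2), F(3, 2)], [F(0), F(0), F(1)]))
# implicit midpoint rule: rho = z - 1, sigma = (z + 1)/2
print(one_leg_orders([F(-1), F(1)], [F(1, 2), F(1, 2)]))
```

For the 2-step BDF method this gives $p_D = 2$ with $p_I$ limited only by the cap, and for the implicit midpoint rule $p_D = 2$, $p_I = 1$.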

Using $\delta_D(x)$ and $\delta_I(x)$ it is possible to interpret the exact solution of (6.1) as
the solution of the following perturbed one-leg formula

$$\sum_{i=0}^{k} \alpha_i\, y(x + ih) - \delta_D(x) = h f\Bigl(x + \beta h,\ \sum_{i=0}^{k} \beta_i\, y(x + ih) - \delta_I(x)\Bigr). \tag{6.40}$$

The next lemma, which extends results of Dahlquist (1975) and of Nevanlinna
(1976), investigates the influence of perturbations to the solution of a one-leg
method.


Lemma 6.9. Consider, in addition to the one-leg method (6.6), the perturbed
formula

$$\sum_{i=0}^{k} \alpha_i\, \hat y_{m+i} - \delta_m = h f\Bigl(x_m + \beta h,\ \sum_{i=0}^{k} \beta_i\, \hat y_{m+i} - \varepsilon_m\Bigr). \tag{6.41}$$

Suppose that the condition (6.2’) holds for the differential equation (6.1) and that
the method is G-stable. Then the differences

$$\Delta y_j = \hat y_j - y_j, \qquad \Delta Y_m = (\Delta y_{m+k-1}, \ldots, \Delta y_m)^T$$

satisfy in the norm (6.14)

$$\|\Delta Y_{m+1}\|_G \le (1 + c\,h\nu)\,\|\Delta Y_m\|_G + C\bigl(\|\delta_m\| + \|\varepsilon_m\|\bigr) \quad \text{for } 0 < h\nu \le Const.$$

The constants $c$, $C$, and $Const$ depend only on the method, not on the differential
equation. If $\nu \le 0$ we have

$$\|\Delta Y_{m+1}\|_G \le \|\Delta Y_m\|_G + C\bigl(\|\delta_m\| + \|\varepsilon_m\|\bigr) \quad \text{for all } h > 0.$$

Proof. We shall make the additional assumption that $f$ is continuously
differentiable. A direct proof without this assumption is possible, but leads to a quadratic
inequality for $\|\Delta Y_{m+1}\|_G$.

The idea is to subtract (6.6) from (6.41) and to use

$$f\Bigl(x_m + \beta h,\ \sum_i \beta_i \hat y_{m+i} - \varepsilon_m\Bigr) - f\Bigl(x_m + \beta h,\ \sum_i \beta_i y_{m+i}\Bigr) = J_m\Bigl(\sum_i \beta_i\,\Delta y_{m+i} - \varepsilon_m\Bigr)$$

where

$$J_m = \int_0^1 \frac{\partial f}{\partial y}\Bigl(x_m + \beta h,\ t\sum_i \beta_i y_{m+i} + (1 - t)\Bigl(\sum_i \beta_i \hat y_{m+i} - \varepsilon_m\Bigr)\Bigr)\,dt.$$

This yields

$$\sum_{i=0}^{k} \alpha_i\,\Delta y_{m+i} = h J_m \sum_{i=0}^{k} \beta_i\,\Delta y_{m+i} + \delta_m - h J_m \varepsilon_m.$$

Computing $\Delta y_{m+k}$ from this relation gives

$$\Delta y_{m+k} = \Delta z_{m+k} + (\alpha_k - \beta_k h J_m)^{-1}(\delta_m - h J_m \varepsilon_m) \tag{6.42}$$

where $\Delta z_{m+k}$ is defined by

$$\sum_{i=0}^{k} \alpha_i\,\Delta z_{m+i} = h J_m \sum_{i=0}^{k} \beta_i\,\Delta z_{m+i} \tag{6.43}$$

and $\Delta z_j = \Delta y_j$ for $j < m + k$. By our assumption (6.2’) the matrix $J_m$
satisfies the one-sided Lipschitz condition $\mathrm{Re}\,\langle J_m u, u\rangle \le \nu\|u\|^2$ (see Exercise 6 of
Sect. I.10). Taking the scalar product of (6.43) with $\sum_i \beta_i\,\Delta z_{m+i}$ and using (6.21)
we thus obtain in the notation of (6.13)

$$\|\Delta Z_{m+1}\|_G^2 - \|\Delta Z_m\|_G^2 \le c_0\, h\nu\,\Bigl\|\sum_i \beta_i\,\Delta z_{m+i}\Bigr\|^2 \le c_1\, h\nu\,\bigl(\|\Delta Z_{m+1}\|_G + \|\Delta Z_m\|_G\bigr)^2$$

(the second inequality is only valid for $\nu \ge 0$; for negative values of $\nu$ we replace
$\nu$ by 0 in (6.2’)). A division by $\|\Delta Z_{m+1}\|_G + \|\Delta Z_m\|_G$ then leads to the estimate

$$\|\Delta Z_{m+1}\|_G \le (1 + c\,h\nu)\,\|\Delta Z_m\|_G. \tag{6.44}$$

With the help of von Neumann’s theorem (Sect. IV.11) the second term of (6.42)
can be bounded by $Const\,(\|\delta_m\| + \|\varepsilon_m\|)$. Inserting this and (6.44) into (6.42)
yields the desired estimate. □


The above lemma allows us to derive a convergence result for one-leg methods,
which is related to B-convergence for Runge-Kutta methods.

Theorem 6.10. Consider a G-stable one-leg method with differentiation order
$p_D \ge p$ and interpolation order $p_I \ge p - 1$. Suppose that the differential equation
satisfies the one-sided Lipschitz condition (6.2’). Then there exists $C_0 > 0$ such
that for $h\nu \le C_0$

$$\|y_m - y(x_m)\| \le C \max_{0 \le j < k} \|y_j - y(x_j)\| + M h^p. \tag{6.45}$$

The constant $C$ depends on the method and, for $\nu > 0$, on the length $x_m - x_0$
of the integration interval; the constant $M$ depends in addition on bounds for the
$p$-th and $(p+1)$-th derivative of the exact solution.
Proof. A direct application of Lemma 6.9 to Eq. (6.40) yields the desired error
bounds only for $p_I \ge p$. Following Hundsdorfer & Steininger (1991) we therefore
introduce $\hat y(x) = y(x) - \delta_I(x)$, so that (6.40) becomes

$$\sum_{i=0}^{k} \alpha_i\,\hat y(x + ih) - \hat\delta(x) = h f\Bigl(x + \beta h,\ \sum_{i=0}^{k} \beta_i\,\hat y(x + ih) - \hat\varepsilon(x)\Bigr), \tag{6.46}$$

where

$$\hat\delta(x) = \delta_D(x) - \sum_{i=0}^{k} \alpha_i\,\delta_I(x + ih), \qquad \hat\varepsilon(x) = \delta_I(x) - \sum_{i=0}^{k} \beta_i\,\delta_I(x + ih). \tag{6.47}$$

Using $\varrho(1) = 0$ and $\sigma(1) = 1$, Taylor expansion of these functions yields

$$\|\hat\delta(x)\| + \|\hat\varepsilon(x)\| \le C_1 h^p \int_x^{x+kh} \|y^{(p+1)}(\xi)\|\,d\xi.$$

We thus can apply Lemma 6.9 to (6.46) and obtain

$$\|\Delta Y_{m+1}\|_G \le (1 + c\,h\nu)\,\|\Delta Y_m\|_G + M_1 h^{p+1}$$

where $\Delta y_j = \hat y(x_j) - y_j$. Using $(1 + c\,h\nu)^j \le \exp(c\nu(x_j - x_0))$, a simple
induction argument gives

$$\|\Delta Y_{m+1}\|_G \le C\,\|\Delta Y_0\|_G + M h^p.$$

The statement now follows from the equivalence of norms

$$c_1\,\|\Delta Y_0\|_G \le \max_{0 \le j < k} \|\Delta y_j\| \le c_2\,\|\Delta Y_0\|_G,$$

from the estimate $\|y_m - y(x_m)\| \le \|y_m - \hat y(x_m)\| + \|\delta_I(x_m)\|$, and from the fact
that $\|\delta_I(x_m)\| = \mathcal{O}(h^p)$. □


Convergence of A-Stable Multistep Methods 

An interesting equivalence relation between one-leg and linear multistep methods is
presented in Dahlquist (1975) (see Exercise 3). This allows us to translate the above
convergence result into a corresponding one for multistep methods (Hundsdorfer
& Steininger 1991). A different and more direct approach will be presented in
Sect. V.8 (Theorem 8.2).

We consider the linear multistep method

$$\sum_{i=0}^{k} \alpha_i\, y_{m+i} = h \sum_{i=0}^{k} \beta_i\, f(x_{m+i}, y_{m+i}). \tag{6.48}$$
We let $\hat x_m = x_m - \beta h$, so that $\sum_{i=0}^{k} \beta_i \hat x_{m+i} = x_m$, and, in view of Eq. (6.54), we
define $\{\hat y_0, \hat y_1, \ldots, \hat y_{2k-1}\}$ as the solution of the linear system

$$\sum_{i=0}^{k} \beta_i\,\hat y_{j+i} = y_j, \qquad \sum_{i=0}^{k} \alpha_i\,\hat y_{j+i} = h f(x_j, y_j), \qquad j = 0, \ldots, k-1. \tag{6.49}$$

This system is uniquely solvable, because the polynomials $\varrho(\zeta)$ and $\sigma(\zeta)$ are
relatively prime. With these starting values we define $\{\hat y_m\}$ as solution of the one-leg
relation (for $m \ge k$)

$$\sum_{i=0}^{k} \alpha_i\,\hat y_{m+i} = h f\Bigl(\sum_{i=0}^{k} \beta_i\,\hat x_{m+i},\ \sum_{i=0}^{k} \beta_i\,\hat y_{m+i}\Bigr). \tag{6.50}$$

By the second relation of (6.49), Eq. (6.50) holds for all $m \ge 0$. Consequently
(Exercise 3a) the expression $\sum_{i=0}^{k} \beta_i\,\hat y_{m+i}$ is a solution of the multistep method
(6.48). Because of (6.49) and the uniqueness of the numerical solution this gives

$$\sum_{i=0}^{k} \beta_i\,\hat y_{m+i} = y_m \quad \text{for all } m \ge 0. \tag{6.51}$$

This relation leads to a proof of the following result.


Theorem 6.11. Consider an A-stable linear multistep method of order $p$. Suppose
the differential equation satisfies (6.2’). Then there exists $C_0 > 0$ such that for
$h\nu \le C_0$,

$$\|y_m - y(x_m)\| \le C\Bigl(\max_{0 \le j < k} \|y_j - y(x_j)\| + h \max_{0 \le j < k} \|f(x_j, y_j) - y'(x_j)\|\Bigr) + M h^p.$$

The constants $C$ and $M$ are as in Theorem 6.10.


Proof. By Theorem 6.7, A-stability implies G-stability of the corresponding one-leg
method. Further, Taylor expansion of (6.37) and (6.38) shows that $p_D \ge \min(p, 2)$
and $p_I \ge 1$. Since $p \le 2$ by Dahlquist’s second barrier, all assumptions
of Theorem 6.10 are verified. The one-leg solution $\{y_m\}$ thus satisfies (6.45). In
order to estimate $\|y_j - y(x_j)\|$ for $j < k$ we subtract the definitions of $\delta_D(x)$ and
$\delta_I(x)$ from (6.49) and obtain

$$\sum_{i=0}^{k} \beta_i \big(y_{j+i} - y(x_{j+i})\big) = \hat y_j - y(\hat x_j) - \delta_I(x_j)$$

$$\sum_{i=0}^{k} \alpha_i \big(y_{j+i} - y(x_{j+i})\big) = h f(\hat x_j,\hat y_j) - h y'(\hat x_j) - \delta_D(x_j).$$

Solving these relations for $y_j - y(x_j)$ yields

$$\max_{0\le j<k} \|y_j - y(x_j)\| \le C_0 \max_{0\le j<k} \Big(\|\hat y_j - y(\hat x_j)\| + h\,\|f(\hat x_j,\hat y_j) - y'(\hat x_j)\|\Big) + M_0 h^p.$$

This proves the statement, because by (6.51)

$$\|\hat y_m - y(\hat x_m)\| \le \sum_{i=0}^{k} |\beta_i|\,\|y_{m+i} - y(x_{m+i})\| + \|\delta_I(x_m)\|. \qquad □$$
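The relation (6.51) between a one-leg method and its multistep twin can be checked numerically in the simplest case. The following is a minimal sketch, not part of the original text: it assumes the pair implicit midpoint rule (one-leg) and trapezoidal rule (multistep), for which $\beta = 1/2$, and a hypothetical linear test problem $y' = \lambda(y - \cos x) - \sin x$ chosen so that the implicit stage can be solved in closed form. The names `LAM`, `implicit_midpoint` and `trapezoidal_residuals` are illustrative only.

```python
import math

LAM = -50.0  # hypothetical stiffness parameter of the test problem


def f(x, y):
    """Right-hand side y' = LAM*(y - cos x) - sin x (exact solution y = cos x)."""
    return LAM * (y - math.cos(x)) - math.sin(x)


def implicit_midpoint(x0, y0, h, n):
    """One-leg method y1 - y0 = h*f(x0 + h/2, (y0 + y1)/2).

    f is linear in y, so the implicit relation is solved in closed form.
    """
    ys = [y0]
    for m in range(n):
        xh = x0 + (m + 0.5) * h
        g = -LAM * math.cos(xh) - math.sin(xh)  # y-independent part of f
        y_old = ys[-1]
        ys.append(((1 + 0.5 * h * LAM) * y_old + h * g) / (1 - 0.5 * h * LAM))
    return ys


def trapezoidal_residuals(x0, ys, h):
    """Residuals of the trapezoidal rule for yhat_m = (y_m + y_{m+1})/2."""
    yhat = [(a + b) / 2 for a, b in zip(ys, ys[1:])]
    xhat = [x0 + (m + 0.5) * h for m in range(len(yhat))]
    return [
        yhat[m + 1] - yhat[m]
        - 0.5 * h * (f(xhat[m], yhat[m]) + f(xhat[m + 1], yhat[m + 1]))
        for m in range(len(yhat) - 1)
    ]


ys = implicit_midpoint(0.0, 1.0, 0.1, 40)
max_residual = max(abs(r) for r in trapezoidal_residuals(0.0, ys, 0.1))
print(max_residual)
```

Up to rounding, the averaged values $\hat y_m = (y_m + y_{m+1})/2$ on the shifted grid $\hat x_m = x_m + h/2$ satisfy the trapezoidal rule, which is the statement of Exercise 3a for this particular pair.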


Exercises 

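1. a) Prove that the one-leg method (6.6) satisfies (6.39) iff

$$\sum_{i=0}^{k} \alpha_i i^q = q\Big(\sum_{i=0}^{k} \beta_i i\Big)^{q-1} \quad \text{for } q = 0,1,\ldots,p_D \tag{6.52}$$

$$\sum_{i=0}^{k} \beta_i i^q = \Big(\sum_{i=0}^{k} \beta_i i\Big)^{q} \quad \text{for } q = 0,\ldots,p_I. \tag{6.53}$$

Compare this result with Theorem III.2.4.

b) Compute the orders $p_D$ and $p_I$ for the Adams methods.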

2. a) Show that the one-leg method (6.6) can be written in the form of a general 
linear method (Sect. III.8). 

b) Prove that the order of convergence $p$ of this method is given by

$$p = \min(p_D, p_I + 1)$$

with $p_D$, $p_I$ defined in (6.39).

c) The order of a one-leg method is never larger than the order of the corre¬ 
sponding multistep method. 

3. (Dahlquist 1975). 

a) Let $\{y_m\}$ and $\{x_m = x_0 + mh\}$ satisfy the (one-leg) difference relation
(6.6); then

$$\hat y_m = \sum_{j=0}^{k} \beta_j y_{m+j}, \qquad \hat x_m = \sum_{j=0}^{k} \beta_j x_{m+j} \tag{6.54}$$

satisfy the (linear multistep) difference relation (6.3).

b) Conversely, let

$$P(\zeta) = \sum_{j=0}^{k-1} a_j \zeta^j, \qquad Q(\zeta) = \sum_{j=0}^{k-1} b_j \zeta^j$$

be such that $P(\zeta)\sigma(\zeta) - Q(\zeta)\varrho(\zeta) = \zeta^l$ for some integer $l$ ($0 \le l \le k$), then

$$y_{m+l} = \sum_{j=0}^{k-1} a_j \hat y_{m+j} - h \sum_{j=0}^{k-1} b_j f(\hat x_{m+j}, \hat y_{m+j})$$

$$x_{m+l} = \sum_{j=0}^{k-1} a_j \hat x_{m+j} - h \sum_{j=0}^{k-1} b_j$$

satisfy (6.6), whenever $\{\hat y_m\}$ and $\{\hat x_m\}$ are a solution of (6.3).

Hint for a). Multiply (6.6) by $\beta_j$, replace $m$ by $m + j$, sum from $j = 0$ to
$j = k$, and interchange the summations.


4. One-leg collocation methods (Dahlquist 1983). 

a) For a given $\beta$ there exists a unique $k$-step one-leg method with $p_D = k$ and
$p_I = k$.

b) This one-leg method is of order p = k + 1 iff 


i=0 


c) Discuss numerically the zero-stability of these methods. 


5. (proposed by M. Crouzeix). a) Let $R(z) = P(z)/Q(z)$ be an irreducible rational
function where $\deg P \le k$, $\deg Q \le k$. Show that $R(z)$ is A-stable, if
and only if there exist polynomials $\alpha_i(z)$, $\beta(z)$ with real coefficients and with
$\deg \alpha_i \le k - 1$, $\deg \beta \le k$, such that

$$Q(z)Q(w) - P(z)P(w) = -(z + w) \sum_{i=1}^{k} \alpha_i(z)\alpha_i(w) + \beta(z)\beta(w). \tag{6.55}$$

b) Use this characterization to give a new proof of von Neumann’s theorem
(Corollary IV.11.3).

Hint. Part (a) can be proved along the lines of the proofs of Theorem 6.7 and
Lemma 6.8. Remark that (6.55) reduces to the E-polynomial (IV.3.8) if $z = iy$
and $w = -iy$. For the proof of (b), deduce from (6.55) the identity

$$\|Q(A)u\|^2 - \|P(A)u\|^2 = -2\sum_{i=1}^{k} \mathrm{Re}\,\langle \alpha_i(A)u, A\alpha_i(A)u\rangle + \|\beta(A)u\|^2.$$
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Theorems 6.10 and 6.11 give satisfactory convergence results for G -stable one-leg 
methods and A-stable multistep methods. But there are only few such methods 
and their highest order is two (Theorem 1.4). It is therefore interesting to relax the 
requirement of A-stability and to investigate higher-order multistep and one-leg 
methods. This section is devoted to linear stiff problems, while Sect. V.8 will treat 
non-linear problems. 

We shall describe two different approaches for convergence results. One is 
with the help of the discrete variation of constants formula and shall be given at the 
end of this section (see Lemma 7.9 and Theorem 7.10 below). The other possibility 
is based on a formulation as a one-step method and on the use of the Kreiss matrix 
theorem. 


Difference Equations for the Global Error 


Most of the difficulties can already be seen by studying the one-dimensional prob¬ 
lem of Prothero and Robinson 

$$y' = \lambda y + g(x), \qquad y(x_0) = y_0. \tag{7.1}$$

We assume $\mathrm{Re}\,\lambda \le 0$ and the solution $y(x)$ to be smooth in the sense that sufficiently
many derivatives are bounded independently of the stiffness parameter $\lambda$.
Applying a linear multistep method to (7.1) yields

$$\sum_{i=0}^{k} \alpha_i y_{m+i} = h\lambda \sum_{i=0}^{k} \beta_i y_{m+i} + h \sum_{i=0}^{k} \beta_i g(x_{m+i}). \tag{7.2}$$

The global error

$$e_m = y_m - y(x_m) \tag{7.3}$$

is seen to satisfy the difference relation

$$\sum_{i=0}^{k} (\alpha_i - h\lambda\beta_i) e_{m+i} = -\delta_{LM}(x_m) \tag{7.4}$$

where

$$\delta_{LM}(x) = \sum_{i=0}^{k} \alpha_i y(x + ih) - h \sum_{i=0}^{k} \beta_i y'(x + ih) \tag{7.5}$$

(to be compared with Formula III.2.3). We observe that the right-hand side of (7.4)
is independent of the stiffness (i.e., of $\lambda$). Further, if the classical order of the
method is $p$, then $\delta_{LM}(x) = \mathcal{O}(h^{p+1})$.

If we apply the method in its one-leg version, we obtain

$$\sum_{i=0}^{k} \alpha_i y_{m+i} = h\lambda \sum_{i=0}^{k} \beta_i y_{m+i} + h\, g(x_m + \beta h), \tag{7.6}$$

where $\sum_i \beta_i = 1$ and $\sum_i i\beta_i = \beta$. In this case the global error $e_m = y_m - y(x_m)$
satisfies

$$\sum_{i=0}^{k} (\alpha_i - h\lambda\beta_i) e_{m+i} = h\lambda\,\delta_I(x_m) - \delta_D(x_m) \tag{7.7}$$

with $\delta_D(x)$ and $\delta_I(x)$ defined in (6.37) and (6.38), respectively. Unless $\delta_I(x) \equiv 0$
(which is the case for the BDF methods), Eq. (7.7) is disappointing, because its
right-hand side becomes large in the stiff case ($h\lambda \to \infty$).

In order to overcome this difficulty, Dahlquist (1983) proposes that one consider
instead of $e_m = y_m - y(x_m)$ the quantities

$$e^*_m = \sum_{i=0}^{k} \beta_i y_{m+i} - y(x_m + \beta h) \tag{7.8}$$

(“... a more adequate measure of the global error than the customary one ...”,
Dahlquist 1983). Replacing $m$ by $m+j$ in (7.6), multiplying by $\beta_j$ and summing
up gives the error formula

$$\sum_{i=0}^{k} (\alpha_i - h\lambda\beta_i) e^*_{m+i} = -\delta_{LM}(x_m + \beta h) \tag{7.9}$$

with $\delta_{LM}(x)$ of (7.5). This difference relation now has the same strength as (7.4).

It has been pointed out by Hundsdorfer & Steininger (1991) that we usually
get better error estimates for one-leg methods by considering $\hat e_m = e_m + \delta_I(x_m)$.
We then have

$$\sum_{i=0}^{k} (\alpha_i - h\lambda\beta_i) \hat e_{m+i} = h\lambda\,\varepsilon(x_m) - \hat\delta(x_m) \tag{7.10}$$

with $\varepsilon(x)$ and $\hat\delta(x)$ given by (6.47). Observe that $\varepsilon(x) = \mathcal{O}(h^{p_I+2})$ and
$\hat\delta(x) = \mathcal{O}(h^{\min(p_D+1,\,p_I+2)})$.
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Formulation as a One-Step Method. The error relations (7.4), (7.7), (7.9), and 
(7.10) are all of the form 

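$$\sum_{i=0}^{k} (\alpha_i - h\lambda\beta_i) e_{m+i} = \delta_h(x_m). \tag{7.11}$$

In order to estimate $e_m$ it is convenient to introduce, as in Sect. III.4, the vector

$$E_m = (e_{m+k-1}, \ldots, e_{m+1}, e_m)^T, \tag{7.12}$$

the companion matrix

$$C(\mu) = \begin{pmatrix} c_{k-1}(\mu) & \cdots & c_1(\mu) & c_0(\mu) \\ 1 & & & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}, \tag{7.13}$$

$$c_j(\mu) = -\frac{\alpha_j - \mu\beta_j}{\alpha_k - \mu\beta_k}, \qquad \mu = h\lambda, \tag{7.14}$$

and the vector $\Delta_m = \big(\delta_h(x_m)/(\alpha_k - \mu\beta_k), 0, \ldots, 0\big)^T$.
Then, Eq. (7.11) becomes

$$E_{m+1} = C(h\lambda)E_m + \Delta_m, \tag{7.15}$$

which leads to

$$E_{m+1} = C(h\lambda)^{m+1}E_0 + \sum_{j=0}^{m} C(h\lambda)^{m-j}\Delta_j. \tag{7.16}$$

To estimate $E_{m+1}$ we have to bound the powers of $C(h\lambda)$ uniformly in $h\lambda$. This
is the subject of the next subsection.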
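To make the one-step formulation concrete, the following sketch builds a companion matrix for given coefficients $\alpha_i$, $\beta_i$ and monitors the size of its powers. It is an illustration under stated assumptions, not the book's code: the 2-step BDF method is used as the example, the first row is taken as $-(\alpha_j - \mu\beta_j)/(\alpha_k - \mu\beta_k)$, the row-sum norm is chosen for convenience, and the helper names `companion`, `matmul` and `max_power_norm` are hypothetical.

```python
def companion(alpha, beta, mu):
    """Companion matrix of sum_i (alpha_i - mu*beta_i) e_{m+i} = rhs."""
    k = len(alpha) - 1
    lead = alpha[k] - mu * beta[k]
    # first row holds the coefficients for e_{m+k-1}, ..., e_m
    first_row = [-(alpha[j] - mu * beta[j]) / lead for j in range(k - 1, -1, -1)]
    rows = [first_row]
    for i in range(k - 1):
        rows.append([1.0 if j == i else 0.0 for j in range(k)])
    return rows


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def max_power_norm(c, n_max):
    """Largest row-sum norm of c**n for 0 <= n <= n_max."""
    n = len(c)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    best = 0.0
    for _ in range(n_max + 1):
        best = max(best, max(sum(abs(x) for x in row) for row in power))
        power = matmul(power, c)
    return best


# 2-step BDF: rho(z) = 3/2 z^2 - 2 z + 1/2, sigma(z) = z^2
ALPHA = [0.5, -2.0, 1.5]
BETA = [0.0, 0.0, 1.0]

# mu in the stability region: powers stay bounded
bounds = {mu: max_power_norm(companion(ALPHA, BETA, mu), 200)
          for mu in (0.0, -1.0, -100.0)}
# mu = 1 lies outside the stability region of BDF2: powers grow
growth = max_power_norm(companion(ALPHA, BETA, 1.0), 50)
print(bounds, growth)
```

For the three sample values of $\mu$ with $\mathrm{Re}\,\mu \le 0$ the powers remain bounded, whereas for $\mu = 1$ they grow geometrically; a uniform bound over a whole set of $\mu$ is what the next subsection provides.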

The Kreiss Matrix Theorem 

Als Fakultatsopponent fiir meine Stockholmer Dissertation brachte 
Dr. G. Dahlquist die Frage der Stabilitatsdefinition zur Sprache. 

(H.-O. Kreiss 1962) 

The following Theorem of Kreiss (1962) is a powerful tool for proving uniform 
power boundedness of an arbitrary family of matrices. 

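Theorem 7.1 (Kreiss 1962). Let $\mathcal{F}$ be a family of $k \times k$ matrices $A$. Then the
“power condition”

$$\|A^n\| \le M \quad \text{for } n = 0,1,2,\ldots \text{ and } A \in \mathcal{F} \tag{P}$$

is equivalent to the “resolvent condition”

$$\|(A - zI)^{-1}\| \le \frac{C}{|z| - 1} \quad \text{for } |z| > 1 \text{ and } A \in \mathcal{F}. \tag{R}$$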
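The two conditions can be compared numerically for a single matrix. The sketch below is only an illustration with assumed data: it takes a hypothetical $2 \times 2$ matrix with a double eigenvalue $1/2$, computes $M = \max_n \|A^n\|$ in the spectral norm, and estimates the resolvent constant from below by sampling $(|z| - 1)\,\|(A - zI)^{-1}\|$ on a few circles. The sampled value is a lower bound for the true constant in (R), so it cannot exceed $M$.

```python
import cmath
import math


def norm2(m):
    """Spectral norm of a 2x2 (complex) matrix from its singular values."""
    (a, b), (c, d) = m
    t = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det2 = abs(a * d - b * c) ** 2
    disc = max(t * t - 4.0 * det2, 0.0)
    return math.sqrt(0.5 * (t + math.sqrt(disc)))


def matmul(x, y):
    return [[sum(x[i][l] * y[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def power_constant(a, n_max=200):
    """max over 0 <= n <= n_max of ||a**n||."""
    p = [[1.0, 0.0], [0.0, 1.0]]
    best = 0.0
    for _ in range(n_max + 1):
        best = max(best, norm2(p))
        p = matmul(p, a)
    return best


def resolvent_constant(a, radii, n_angles=360):
    """Sampled lower bound for sup (|z|-1) * ||(a - z I)^{-1}|| over |z| > 1."""
    best = 0.0
    for r in radii:
        for j in range(n_angles):
            z = cmath.rect(r, 2.0 * math.pi * j / n_angles)
            shifted = [[a[0][0] - z, a[0][1]], [a[1][0], a[1][1] - z]]
            best = max(best, (r - 1.0) * norm2(inverse(shifted)))
    return best


A = [[0.5, 1.0], [0.0, 0.5]]   # hypothetical example matrix
M = power_constant(A)
C_est = resolvent_constant(A, [1.001, 1.01, 1.1, 1.5, 2.0, 5.0, 10.0, 100.0])
print(M, C_est, 2 * math.e * 2 * C_est)
```

In this example the sampled resolvent constant stays below $M$, and $M$ is far below $2ekC$ with $k = 2$, the bound obtained in the proof below.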
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Remark. The difficult step is to prove that (R) implies (P) . Several mathemati¬ 
cians contributed to a better understanding of this result (Richtmyer & Morton 
1967, Tadmor 1981). LeVeque & Trefethen (1984) have given a marvellous ver¬ 
sion of the proof; the best we can do is to copy it nearly word for word: 

Proof. Necessity. If (P) is true, the eigenvalues of $A$ lie within the closed unit
disk and therefore $(A - zI)^{-1}$ exists for $|z| > 1$. Moreover,

$$\|(A - zI)^{-1}\| = \Big\|\sum_{n=0}^{\infty} A^n z^{-n-1}\Big\| \le M \sum_{n=0}^{\infty} |z|^{-n-1} = \frac{M}{|z| - 1}, \tag{7.17}$$

so that (R) holds with $C = M$.

Sufficiency. Assume condition (R), so that all eigenvalues of $A$ lie inside the
closed unit disk. The matrix $A^n$ can then be written in terms of the resolvent by
means of a Cauchy integral (see Exercise 1)

$$A^n = \frac{1}{2\pi i} \int_\Gamma z^n (zI - A)^{-1}\,dz, \tag{7.18}$$

where the contour of integration is, for example, a circle of radius $\varrho > 1$ centred at
the origin. Let $u$ and $v$ be arbitrary unit vectors, i.e., $\|u\| = \|v\| = 1$. Then,

$$v^*A^n u = \frac{1}{2\pi i} \int_\Gamma z^n q(z)\,dz \quad \text{with} \quad q(z) = v^*(zI - A)^{-1}u.$$

Integration by parts gives

$$v^*A^n u = \frac{-1}{2\pi i (n+1)} \int_\Gamma z^{n+1} q'(z)\,dz.$$

Now fix as contour of integration the circle of radius $\varrho = 1 + 1/(n+1)$. On this
path one has $|z^{n+1}| \le e$, and therefore

$$|v^*A^n u| \le \frac{e}{2\pi(n+1)} \int_\Gamma |q'(z)|\,|dz|. \tag{7.19}$$

By Cramer’s rule, $q(z)$ is a rational function of degree $k$. Applying Lemma 7.2
below, the integral in (7.19) is bounded by $4\pi k$ times the supremum of $|q(z)|$ on
$\Gamma$, and by (R) this supremum is at most $(n+1)C$. Hence

$$|v^*A^n u| \le 2ekC.$$

Since $\|A^n\|$ is the supremum of $|v^*A^n u|$ over all unit vectors $u$ and $v$, this proves
the estimate (P) with $M = 2ekC$. □


The above proof used the following lemma, which relates the arc length of a
rational function on a circle to its maximum value. For the case of a polynomial
of degree $k$ the result is a corollary of Bernstein’s inequality
$\sup_{|z|=1} |q'(z)| \le k \sup_{|z|=1} |q(z)|$ (see e.g., Marden 1966).
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Lemma 7.2. Let q(z) = p(z) / r(z) be a rational function with deg p<k, deg r <k 
and suppose that no poles lie on the circle $\Gamma : |z| = \varrho$. Then

$$\int_\Gamma |q'(z)|\,|dz| \le 4\pi k \sup_{|z| = \varrho} |q(z)|. \tag{7.20}$$


Proof. Replacing $q(z)$ by $q(\varrho z)$ we may assume without loss of generality that
$\varrho = 1$. With the parametrization $e^{it}$ of $\Gamma$ we introduce

$$\gamma(t) = q(e^{it}), \qquad \gamma'(t) = i e^{it} q'(e^{it})$$

so that

$$|q'(e^{it})| = |\gamma'(t)| = \gamma'(t)e^{-ig(t)} \quad \text{with} \quad g(t) = \arg(\gamma'(t)).$$

Integration by parts now yields

$$\int_\Gamma |q'(z)|\,|dz| = \int_0^{2\pi} |q'(e^{it})|\,dt = \int_0^{2\pi} \gamma'(t)e^{-ig(t)}\,dt
= i\int_0^{2\pi} \gamma(t)g'(t)e^{-ig(t)}\,dt \le \sup_t |\gamma(t)| \cdot \int_0^{2\pi} |g'(t)|\,dt.$$

It remains to prove that the total variation of $g$, i.e., $\mathrm{TV}[g] = \int_0^{2\pi} |g'(t)|\,dt$, can be
bounded by $4\pi k$. To prove this, note that $zq'(z)$ is a rational function of degree
$(2k, 2k)$ and can be written as a product

$$zq'(z) = \prod_{j=1}^{2k} \frac{a_j z + b_j}{c_j z + d_j}.$$

This implies for $z = e^{it}$

$$g(t) = \arg\big(izq'(z)\big) = \frac{\pi}{2} + \sum_{j=1}^{2k} \arg\Big(\frac{a_j z + b_j}{c_j z + d_j}\Big).$$

Since the Möbius transformation $(az + b)/(cz + d)$ maps the unit circle to some
other circle, the total variation of $\arg((az + b)/(cz + d))$ is at most $2\pi$. Consequently,

$$\mathrm{TV}[g] \le \sum_{j=1}^{2k} \mathrm{TV}\Big[\arg\Big(\frac{a_j z + b_j}{c_j z + d_j}\Big)\Big] \le 4\pi k. \qquad □$$


Remark. It has been conjectured by LeVeque & Trefethen (1984) that the bound
(7.20) is valid with a factor $2\pi$ instead of $4\pi$. This conjecture has been proved to
be true by Spijker (1991).
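The arc-length bound (7.20) is easy to test numerically on the unit circle. The following sketch is an illustration only: the two rational functions $q(z) = 1/(z-2)$ (degree $k = 1$) and $q(z) = z^3$ (degree $k = 3$) are assumed examples, and the helper `arc_length_and_sup` is a hypothetical name. It approximates $\int_\Gamma |q'(z)|\,|dz|$ by the periodic trapezoidal rule and compares it with $4\pi k \sup|q|$ and with the sharper $2\pi k \sup|q|$.

```python
import cmath
import math


def arc_length_and_sup(q, dq, n=4096):
    """Trapezoidal approximations of int |q'(z)| |dz| and sup |q(z)| on |z| = 1."""
    total = 0.0
    sup = 0.0
    for j in range(n):
        z = cmath.exp(2j * math.pi * j / n)
        total += abs(dq(z))          # |dz| = dt on the unit circle
        sup = max(sup, abs(q(z)))
    return total * 2.0 * math.pi / n, sup


# q(z) = 1/(z - 2): degree k = 1, pole outside the unit circle
length1, sup1 = arc_length_and_sup(lambda z: 1 / (z - 2),
                                   lambda z: -1 / (z - 2) ** 2)

# q(z) = z^3: degree k = 3, attains the sharper bound 2*pi*k*sup|q|
length3, sup3 = arc_length_and_sup(lambda z: z ** 3,
                                   lambda z: 3 * z ** 2)

print(length1, 4 * math.pi * 1 * sup1)
print(length3, 2 * math.pi * 3 * sup3)
```

For $q(z) = 1/(z-2)$ the arc length equals $2\pi/3$, well inside the bound, while the monomial $z^3$ shows that the factor $2\pi k$ cannot be improved.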
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Some Applications of the Kreiss Matrix Theorem 


Following Dahlquist, Mingyou & LeVeque (1983) we now obtain some results on
the uniform power boundedness of the matrix $C(\mu)$, defined in (7.13), with the
help of the Kreiss matrix theorem. Similar results were found independently by
Crouzeix & Raviart (1980) and Gekeler (1979, 1984).

Lemma 7.3. Let $S \subset \overline{\mathbb{C}}$ denote the stability region of a method $(\varrho,\sigma)$. If $S$ is
closed in $\overline{\mathbb{C}}$, then there exists a constant $M$ such that

$$\|C(\mu)^n\| \le M \quad \text{for } \mu \in S \text{ and } n = 0,1,2,\ldots.$$


Proof. Because of Theorem 7.1 it is sufficient to prove that

$$\|(C(\mu) - zI)^{-1}\| \le \frac{C}{|z| - 1} \quad \text{for } \mu \in S \text{ and } |z| > 1.$$

To show this, we make use of the inequality (Kato (1960), see Exercise 2)

$$\|(C(\mu) - zI)^{-1}\| \le \frac{(\|C(\mu)\| + |z|)^{k-1}}{|\det(C(\mu) - zI)|}.$$

$\|C(\mu)\|$ is uniformly bounded for $\mu \in S$. Therefore it suffices to show that

$$\varphi(\mu) = \inf_{|z|>1} \frac{|\det(C(\mu) - zI)|}{|z|^{k-1}(|z| - 1)} \tag{7.21}$$

is bounded away from zero for all $\mu \in S$. For $|z| \to \infty$ the expression in (7.21)
tends to 1 and so poses no problem. Further, observe that

$$|\det(C(\mu) - zI)| = \prod_{j=1}^{k} |z - \zeta_j(\mu)| \tag{7.22}$$

where $\zeta_j(\mu)$ are the eigenvalues of $C(\mu)$, i.e., the roots of

$$\sum_{i=0}^{k} (\alpha_i - \mu\beta_i)\zeta^i = 0. \tag{7.23}$$

By definition of the stability region $S$, the values $\zeta_j(\mu)$ lie, for $\mu \in S$, inside
the closed unit disc and those on the unit circle are well separated. Therefore,
for fixed $\mu_0 \in S$, only one of the $\zeta_j(\mu_0)$ can be close to a $z$ with $|z| > 1$. The
corresponding factor in (7.22) will be minorized by $|z| - 1$, the other factors are
bounded away from zero. By continuity of the $\zeta_j(\mu)$, the same holds for all $\mu \in S$
in a sufficiently small neighbourhood $V(\mu_0)$ of $\mu_0$. Hence $\varphi(\mu) \ge a > 0$ for
$\mu \in V(\mu_0) \cap S$. Since $S$ is closed (compact in $\overline{\mathbb{C}}$) it is covered by a finite number
of $V(\mu_0)$. Consequently $\varphi(\mu) \ge a > 0$ for all $\mu \in S$, which completes the proof
of the lemma. □



V.7 Convergence for Linear Problems 




Remark. The hypothesis “$S$ is closed in $\overline{\mathbb{C}}$” is usually satisfied. For methods
which do not satisfy this hypothesis (see e.g., Exercise 2 of Sect. V.1 or Dahlquist,
Mingyou & LeVeque (1981)) the above lemma remains valid on closed subsets
$D \subset S \subset \overline{\mathbb{C}}$.

The estimate of this lemma can be improved, if we consider closed sets $D$
lying in the interior of $S$.

Lemma 7.4. Let $S$ be the stability region of a method $(\varrho,\sigma)$. If $D \subset \mathrm{Int}\,S$ is
closed in $\overline{\mathbb{C}}$, then there exist constants $M$ and $\kappa$ ($0 < \kappa < 1$) such that

$$\|C(\mu)^n\| \le M\kappa^n \quad \text{for } \mu \in D \text{ and } n = 0,1,2,\ldots.$$

Proof. If $\mu$ lies in the interior of $S$, all roots of (7.23) satisfy $|\zeta_j(\mu)| < 1$ (maximum
principle). Since $D$ is closed, this implies the existence of $\varepsilon > 0$ such that

$$D \subset S_\varepsilon = \{\mu \in \overline{\mathbb{C}}\,;\ |\zeta_j(\mu)| \le 1 - 2\varepsilon,\ j = 1,\ldots,k\}.$$

We now consider $R(\mu) = \kappa^{-1}C(\mu)$ with $\kappa = 1 - \varepsilon$. The eigenvalues of $R(\mu)$
satisfy $|\kappa^{-1}\zeta_j(\mu)| \le (1 - 2\varepsilon)/(1 - \varepsilon) < 1 - \varepsilon$ for $\mu \in S_\varepsilon$. As in the proof of
Lemma 7.3 (more easily, because $R(\mu)$ has no eigenvalues of modulus 1) we conclude
that $R(\mu)$ is uniformly power bounded for $\mu \in S_\varepsilon$. This implies the statement. □


Since the origin is never in the interior of $S$, we add the following estimate for
its neighbourhood:

Lemma 7.5. Suppose that the method $(\varrho,\sigma)$ is consistent and strictly stable (see
Sect. III.9, Assumption A1). Then there exists a neighbourhood $V$ of 0 and constants
$M$ and $a$ such that

$$\|C(\mu)^n\| \le M e^{n(\mathrm{Re}\,\mu + a|\mu|^2)} \quad \text{for } \mu \in V \text{ and } n = 0,1,2,\ldots.$$

Proof. Since the method is strictly stable there exists a compact neighbourhood $V$
of 0, in which $|\zeta_j(\mu)| < |\zeta_1(\mu)|$ for $j = 2,\ldots,k$ ($\zeta_j(\mu)$ are the roots of (7.23)).
The matrix $R(\mu) = \zeta_1(\mu)^{-1}C(\mu)$ then has a simple eigenvalue 1 and all other
eigenvalues are strictly smaller than 1 in modulus. As in the proof of Lemma 7.3 we obtain
$\|R(\mu)^n\| \le M$ and consequently $\|C(\mu)^n\| \le M|\zeta_1(\mu)|^n$ for $\mu \in V$. The stated
estimate now follows from $\zeta_1(\mu) = e^\mu + \mathcal{O}(\mu^2)$. □
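The behaviour of the roots of (7.23) near the origin can be inspected directly for a concrete method. The sketch below assumes the 2-step BDF method, for which (7.23) is the quadratic $(3/2 - \mu)\zeta^2 - 2\zeta + 1/2 = 0$; the function name `bdf2_roots` and the sample values of $\mu$ are illustrative choices. It compares the principal root $\zeta_1(\mu)$ with $e^\mu$ and checks that the second root is smaller in modulus.

```python
import cmath


def bdf2_roots(mu):
    """Roots of (3/2 - mu) z^2 - 2 z + 1/2 = 0, principal root first."""
    s = cmath.sqrt(1 + 2 * mu)
    return (2 + s) / (3 - 2 * mu), (2 - s) / (3 - 2 * mu)


samples = [0.01, -0.01, -0.1, 0.1j, -0.05 + 0.05j]
report = []
for mu in samples:
    z1, z2 = bdf2_roots(mu)
    # deviation of the principal root from exp(mu), and both moduli
    report.append((mu, abs(z1 - cmath.exp(mu)), abs(z1), abs(z2)))

for row in report:
    print(row)
```

For this second order method the deviation $|\zeta_1(\mu) - e^\mu|$ behaves like $|\mu|^3/3$, which is more than the $\mathcal{O}(\mu^2)$ needed in the proof, and the parasitic root stays near $1/3$.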
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Global Error for Prothero and Robinson Problem 

The above lemmas permit us to continue our analysis of Eq. (7.16). Whenever we
consider $\lambda$ and $h$ such that their product $\lambda h$ varies in a closed subset of $S$, it
follows that

$$\|E_{m+1}\| \le M\Big(\|E_0\| + \sum_{j=0}^{m} \|\Delta_j\|\Big) \tag{7.24}$$

(Lemma 7.3). If $h\lambda$ varies in a closed subset of the interior of $S$, we have the
better estimate

$$\|E_{m+1}\| \le M\Big(\kappa^{m+1}\|E_0\| + \sum_{j=0}^{m} \kappa^{m-j}\|\Delta_j\|\Big) \quad \text{with some } \kappa < 1 \tag{7.25}$$

(Lemma 7.4). The resulting asymptotic estimates for the global errors $e_m = y_m - y(x_m)$
for $mh \le \mathrm{Const}$ are presented in Table 7.1 ($p$ denotes the classical order,
$p_D$ the differentiation order and $p_I$ the interpolation order of Sect. V.6). We assume
that the initial values are exact and that simultaneously $h\lambda \to \infty$ and $h \to 0$.
This is the most interesting situation because any reasonable method for stiff problems
should integrate the equation with step sizes $h$ such that $h\lambda$ is large. We
distinguish two cases:

(A) the half-ray $\{h\lambda\,;\ h > 0,\ |h\lambda| \ge c\} \cup \{\infty\}$ lies in $S$ (Lemma 7.3 is applicable,
i.e., Eq. (7.24)).

(B) $\infty$ is an interior point of $S$ (estimate (7.25) is applicable; the global error
$\|E_m\|$ is essentially equal to the last term in the sum of (7.25)).

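Table 7.1. Error for (7.1) when $h\lambda \to \infty$ and $h \to 0$

Method       error    (A)                                                     (B)
multistep    $e_m$    $\mathcal{O}(|\lambda|^{-1}h^{p-1})$                    $\mathcal{O}(|\lambda|^{-1}h^{p})$
one-leg      $e_m$    $\mathcal{O}(h^{p_I+1} + |\lambda|^{-1}h^{p_D-1})$      $\mathcal{O}(h^{p_I+1} + |\lambda|^{-1}h^{p_D})$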


We remark that the global error of the multistep method contains a factor $|\lambda|^{-1}$,
so that the error decreases if $|\lambda|$ increases (“the stiffer the better”). The estimate in
case (A) for one-leg methods is obtained by the use of Recursion (7.10).
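The contrast between the two rows of Table 7.1 can be observed in a small experiment. The sketch below is an illustration under assumptions that are not in the text: the trapezoidal rule is used as the multistep method and the implicit midpoint rule as its one-leg version (so $p = 2$, $p_D = 2$, $p_I = 1$, case (A)), the exact solution is taken as $\varphi(x) = \cos x$, and the names `phi`, `g` and `max_error` are hypothetical.

```python
import math


def phi(x):
    """Assumed smooth exact solution."""
    return math.cos(x)


def g(x, lam):
    """Forcing term such that y = phi(x) solves y' = lam*y + g(x)."""
    return -math.sin(x) - lam * math.cos(x)


def max_error(method, lam, h=0.1, n=50):
    """Largest global error over n steps, started with the exact value at x = 0."""
    y, worst = phi(0.0), 0.0
    z = h * lam
    for m in range(n):
        x = m * h
        if method == "trapezoidal":       # linear multistep version
            rhs = 0.5 * h * (g(x, lam) + g(x + h, lam))
        else:                             # one-leg version (implicit midpoint)
            rhs = h * g(x + 0.5 * h, lam)
        y = ((1 + 0.5 * z) * y + rhs) / (1 - 0.5 * z)
        worst = max(worst, abs(y - phi(x + h)))
    return worst


err_trap_mild = max_error("trapezoidal", -1.0e3)
err_trap_stiff = max_error("trapezoidal", -1.0e6)
err_mid_stiff = max_error("midpoint", -1.0e6)
print(err_trap_mild, err_trap_stiff, err_mid_stiff)
```

With these choices the trapezoidal error shrinks as $|\lambda|$ grows, whereas the midpoint error remains of size $h^2$ independently of $\lambda$, in agreement with the $h^{p_I+1}$ term of the one-leg row.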
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Convergence for Linear Systems with Constant Coefficients 

The extension of the above results to linear systems

$$y' = Ay + g(x), \qquad y(x_0) = y_0 \tag{7.26}$$

is straightforward, if we assume that the matrix $A$ is diagonalizable. The following
results have been derived by Crouzeix & Raviart (1980).

Theorem 7.6. Suppose that the multistep method $(\varrho,\sigma)$ is of order $p$, $A(\alpha)$-stable
and stable at infinity. If the matrix $A$ of (7.26) is diagonalizable (i.e.,
$T^{-1}AT = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$) with eigenvalues satisfying

$$|\arg(-\lambda_i)| \le \alpha \quad \text{for } i = 1,\ldots,n,$$

then there exists a constant $M$ (depending only on the method) such that for all
$h > 0$ the global error satisfies

$$\|y(x_m) - y_m\| \le M \cdot \|T\| \cdot \|T^{-1}\| \Big(\max_{0\le j<k} \|y(x_j) - y_j\| + h^p \int_{x_0}^{x_m} \|y^{(p+1)}(\xi)\|\,d\xi\Big).$$


Proof. The transformation $y = Tz$ decouples the system (7.26) into $n$ scalar equations

$$z_i' = \lambda_i z_i + (T^{-1}g)_i(x). \tag{7.27}$$

Since this transformation leaves the numerical solution invariant, it suffices to consider
Eq. (7.27). Lemma 7.3 yields the power boundedness

$$\|C(h\lambda_i)^m\| \le M_0 \quad \text{for } h > 0,\ i = 1,\ldots,n \text{ and } m \ge 0. \tag{7.28}$$

The discretization error $\delta_{LM}(x)$ (Eq. (7.5)) can be written as

$$\delta_{LM}(x) = h^{p+1} \int_0^k K_p(s)\,z_i^{(p+1)}(x + sh)\,ds, \tag{7.29}$$

where $K_p(s)$ is the Peano-kernel of the multistep method (Theorem III.2.8). By
$A(\alpha)$-stability we have $\alpha_k \cdot \beta_k > 0$, so that $|\alpha_k - h\lambda_i\beta_k|^{-1} \le |\alpha_k|^{-1}$. This together
with (7.29) implies that

$$\|\Delta_j\| \le C h^p \int_{x_j}^{x_{j+k}} |z_i^{(p+1)}(\xi)|\,d\xi, \tag{7.30}$$

where $C$ depends only on the method. The estimates (7.28) and (7.30) inserted
into (7.16) yield a bound for the global error of (7.27), which, by backsubstitution
into the original variables, proves the statement. □


Because of its exponentially decaying term, the following estimate is especially 
useful in the case when large time intervals are considered (or when the starting 
values do not lie on the exact solution). 
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Theorem 7.7. Let the multistep method (g, cr) be of order p> 1, A (a) -stable and 
strictly stable at zero and at infinity (i.e., $\sigma(\zeta) = 0$ implies $|\zeta| < 1$). If the matrix
$A$ of (7.26) is diagonalizable ($T^{-1}AT = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$) with eigenvalues $\lambda_i$
satisfying

$$|\arg(-\lambda_i)| \le \gamma < \alpha, \qquad \mathrm{Re}\,\lambda_i \le -\lambda < 0$$

then, for given $h_0 > 0$, there exist constants $M$ and $\nu > 0$ such that for $0 < h \le h_0$

$$\|y(x_m) - y_m\| \le M \cdot \|T\| \cdot \|T^{-1}\| \cdot \Big(e^{-\nu(x_m - x_0)} \cdot \max_{0\le j<k} \|y(x_j) - y_j\| + h^p \int_{x_0}^{x_m} \|y^{(p+1)}(\xi)\|\,d\xi\Big).$$

Remark. The constants $M$ and $\nu$ may depend on $\gamma$, $\lambda$, $h_0$ and on the method,
but they are independent of the eigenvalues $\lambda_i$ and of the length $x_m - x_0$ of the
integration interval.

Proof. By Lemma 7.5 there exists an $r > 0$ such that

$$\|C(h\lambda_i)^m\| \le M_0 e^{-mh\lambda/2} \quad \text{for } |h\lambda_i| \le r \tag{7.31}$$

(observe that $|\mu| \le \mathrm{Const} \cdot |\mathrm{Re}\,\mu|$, if $|\arg(-\mu)| \le \gamma < \pi/2$). Since

$$D = \{\mu\,;\ |\arg(-\mu)| \le \gamma,\ |\mu| \ge r\} \cup \{\infty\}$$

lies in the interior of the stability region $S$, it follows from Lemma 7.4 that

$$\|C(h\lambda_i)^m\| \le M_1 \varrho^m \quad \text{for } |h\lambda_i| \ge r \tag{7.32}$$

with some $\varrho < 1$. Combining the estimates (7.31) and (7.32) we get

$$\|C(h\lambda_i)^m\| \le M e^{-mh\nu} \quad \text{for } 0 < h \le h_0, \tag{7.33}$$

where $M = \max(M_0, M_1)$ and $\nu = \min(\lambda/2, -\ln\varrho/h_0)$. Using (7.33) instead
of (7.28) and $mh = x_m - x_0$, the statement now follows as in the proof of Theorem 7.6. □


Matrix Valued Theorem of von Neumann 

An interesting contractivity result is obtained by the following matrix valued version
of a theorem of von Neumann (Theorem IV.11.2).

We consider the Euclidean scalar product $\langle\cdot,\cdot\rangle$ on $\mathbb{R}^n$, the norm $\|\cdot\|_G$ on $\mathbb{R}^k$
which is defined by a symmetric, positive definite matrix $G$, and

$$|||u|||_G^2 = \sum_{i=1}^{k}\sum_{j=1}^{k} g_{ij}\langle u_i, u_j\rangle \quad \text{for } u = (u_1,\ldots,u_k)^T \in \mathbb{R}^{nk}. \tag{7.34}$$
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The corresponding operator norms are denoted by the same symbols. 

Theorem 7.8 (O. Nevanlinna 1985). Let $C(\mu) = (c_{ij}(\mu))_{i,j=1}^{k}$ be a matrix whose
elements are rational functions of $\mu$. If

$$\|C(\mu)\|_G \le 1 \quad \text{for } \mathrm{Re}\,\mu \le 0, \tag{7.35}$$

then

$$|||C(A)|||_G \le 1 \tag{7.36}$$

for all matrices $A$ such that

$$\mathrm{Re}\,\langle y, Ay\rangle \le 0 \quad \text{for } y \in \mathbb{C}^n. \tag{7.37}$$

Remark. If $C(\mu)$ is the companion matrix of a G-stable method $(\varrho,\sigma)$, the result
follows from Theorem 6.7 and Exercise 3 below (“It would be interesting to have
a more operator-theoretical proof of this.” Dahlquist & Söderlind 1982).

Proof. This is a straightforward extension of Crouzeix’s proof of Theorem IV.11.2.
We first suppose that $A$ is normal, so that $A = QDQ^*$ with a unitary matrix $Q$ and
a diagonal matrix $D = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$. In this case we have

$$|||C(A)|||_G = |||(I \otimes Q)C(D)(I \otimes Q^*)|||_G = |||C(D)|||_G. \tag{7.38}$$

With the permutation matrix $P = (I \otimes e_1, \ldots, I \otimes e_n)$ (where $I$ is the $k$-dimensional
identity and $e_j$ is the $n$-dimensional $j$-th unit vector) the matrix $C(D)$ is
transformed to block-diagonal form according to

$$P^*C(D)P = \mathrm{blockdiag}\,(C(\lambda_1), \ldots, C(\lambda_n)).$$

We further have $P^*(G \otimes I)P = I \otimes G$. This implies that

$$P^*C(D)^*(G \otimes I)C(D)P = \mathrm{blockdiag}\,(C(\lambda_1)^*GC(\lambda_1), \ldots)$$

and hence also

$$|||C(D)|||_G = \max_{i=1,\ldots,n} \|C(\lambda_i)\|_G. \tag{7.39}$$

The statement now follows from (7.38) and (7.39), because $\mathrm{Re}\,\lambda_i \le 0$ by (7.37).

For a general $A$ we consider $A(\omega) = \frac{\omega}{2}(A + A^*) + \frac{1}{2}(A - A^*)$ and define the
rational function

$$\varphi(\omega) = \langle u, C(A(\omega))v\rangle_G = u^*(G \otimes I)C(A(\omega))v.$$

The statement of the theorem can then be deduced exactly as in the proof of Theorem IV.11.2. □


This theorem can be used to derive convergence results for differential equations
(7.26) with $A$ satisfying (7.37). Indeed, if the method $(\varrho,\sigma)$ is A-stable,
the companion matrix (7.13) satisfies $\|C(\mu)\|_G \le 1$ for $\mathrm{Re}\,\mu \le 0$ in some suitable
norm (Exercise 3). The above theorem then implies $|||C(hA)|||_G \le 1$ and Formula
(7.16) with $\lambda$ replaced by $A$ yields the estimate

$$|||E_{m+1}|||_G \le |||E_0|||_G + \sum_{j=0}^{m} |||\Delta_j|||_G. \tag{7.40}$$

This proves convergence, because $\Delta_j$ can be bounded as in (7.30).


Discrete Variation of Constants Formula 

A second approach to convergence results of linear multistep methods is by the use
of a discrete variation of constants formula. This is an extension of the classical
proofs for nonstiff problems (Dahlquist 1956, Henrici 1962) to the case $\mu \ne 0$. It
has been developed by Crouzeix & Raviart (1976), and more recently by Lubich
(1988, 1991).

We consider the error equation (cf. (7.13))

$$\sum_{i=0}^{k} (\alpha_i - \mu\beta_i)e_{m+i} = d_{m+k} \quad \text{for } m \ge 0, \tag{7.41}$$

and extend this relation to negative $m$ by putting $e_j = 0$ (for $j < 0$) and by defining
$d_0, \ldots, d_{k-1}$ according to (7.41). The main idea is now to introduce the generating
power series

$$e(\zeta) = \sum_{j\ge 0} e_j\zeta^j, \qquad d(\zeta) = \sum_{j\ge 0} d_j\zeta^j,$$

so that (7.41) becomes the $m$-th coefficient of the identity

$$\big(\varrho(\zeta^{-1}) - \mu\sigma(\zeta^{-1})\big)e(\zeta) = \zeta^{-k}d(\zeta). \tag{7.42}$$

This gives

$$e(\zeta) = \big(\varrho(\zeta^{-1}) - \mu\sigma(\zeta^{-1})\big)^{-1}\zeta^{-k}d(\zeta) = r(\zeta,\mu)d(\zeta) \tag{7.43}$$

and allows us to compute $e_m$ easily in terms of the $d_j$ as

$$e_m = \sum_{j=0}^{m} r_{m-j}(\mu)\,d_j. \tag{7.43'}$$

Here $r_j(\mu)$ are the coefficients of the discrete resolvent

$$r(\zeta,\mu) = \big(\delta(\zeta) - \mu\big)^{-1}\frac{\zeta^{-k}}{\sigma(\zeta^{-1})} = \sum_{j\ge 0} r_j(\mu)\zeta^j \tag{7.44}$$

where

$$\delta(\zeta) = \frac{\varrho(\zeta^{-1})}{\sigma(\zeta^{-1})} = \frac{\alpha_0\zeta^k + \ldots + \alpha_{k-1}\zeta + \alpha_k}{\beta_0\zeta^k + \ldots + \beta_{k-1}\zeta + \beta_k}. \tag{7.45}$$

Since $(\varrho(\zeta^{-1}) - \mu\sigma(\zeta^{-1}))\,r(\zeta,\mu) = \zeta^{-k}$, the coefficients $r_j(\mu)$ can be interpreted
as the numerical solution $y_j$ of the multistep method applied to the homogeneous
equation $y' = \mu y$ with step size $h = 1$, and with starting values
$y_{-k+1} = \ldots = y_{-1} = 0$, $y_0 = (\alpha_k - \mu\beta_k)^{-1}$.
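The representation (7.43') can be verified on a small example. The sketch below makes several assumptions for illustration: the 2-step BDF method, the value $\mu = -0.5$, and an arbitrary right-hand side sequence $d_j$; the function names `resolvent_coefficients` and `solve_error_recursion` are hypothetical. The coefficients $r_j(\mu)$ are generated as the numerical solution of $y' = \mu y$ with $h = 1$ and the starting values given above, and the convolution is compared with a direct solution of the recursion (7.41).

```python
def resolvent_coefficients(alpha, beta, mu, n):
    """r_0..r_n: multistep solution of y' = mu*y with h = 1, started from
    y_{-k+1} = ... = y_{-1} = 0 and y_0 = 1/(alpha_k - mu*beta_k)."""
    k = len(alpha) - 1
    c = [alpha[i] - mu * beta[i] for i in range(k + 1)]
    r = [1.0 / c[k]]
    for j in range(1, n + 1):
        acc = sum(c[i] * r[j - k + i] for i in range(k) if j - k + i >= 0)
        r.append(-acc / c[k])
    return r


def solve_error_recursion(alpha, beta, mu, d):
    """e_0..e_n from sum_i (alpha_i - mu*beta_i) e_{m+i} = d_{m+k},
    with e_j = 0 for j < 0 (so the first equations define e_0, e_1, ...)."""
    k = len(alpha) - 1
    c = [alpha[i] - mu * beta[i] for i in range(k + 1)]
    e = []
    for j in range(len(d)):
        acc = sum(c[i] * e[j - k + i] for i in range(k) if j - k + i >= 0)
        e.append((d[j] - acc) / c[k])
    return e


ALPHA = [0.5, -2.0, 1.5]   # 2-step BDF
BETA = [0.0, 0.0, 1.0]
mu = -0.5
d = [(-1) ** j / (1.0 + j) for j in range(30)]   # arbitrary test data

r = resolvent_coefficients(ALPHA, BETA, mu, len(d) - 1)
direct = solve_error_recursion(ALPHA, BETA, mu, d)
convolved = [sum(r[m - j] * d[j] for j in range(m + 1)) for m in range(len(d))]
max_diff = max(abs(a - b) for a, b in zip(direct, convolved))
print(max_diff)
```

Both computations agree up to rounding, so bounds on the $r_j(\mu)$ translate directly into bounds on the error $e_m$.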

Formula (7.43’) can be used to estimate e m , if appropriate bounds for the 
coefficients r J (/i) of the discrete resolvent are known. Such bounds are given in 
the following lemma. 


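Lemma 7.9. Let $S \subset \overline{\mathbb{C}}$ denote the stability region of the multistep method.

a) If $S$ is closed in $\overline{\mathbb{C}}$ then

$$|r_j(\mu)| \le \frac{M}{1 + |\mu|} \quad \text{for } \mu \in S \text{ and } j = 0,1,2,\ldots$$

b) If $D \subset \mathrm{Int}\,S$ is closed in $\overline{\mathbb{C}}$ then there exists a constant $\kappa$ ($0 < \kappa < 1$) such
that

$$|r_j(\mu)| \le \frac{M\kappa^j}{1 + |\mu|} \quad \text{for } \mu \in D \text{ and } j = 0,1,2,\ldots$$

c) If the method is strictly stable then there exists a neighbourhood $V$ of 0 such
that

$$|r_j(\mu)| \le M e^{j(\mathrm{Re}\,\mu + a|\mu|^2)} \quad \text{for } \mu \in V \text{ and } j = 0,1,2,\ldots$$

The constants $M$, $\kappa$, and $a$ are independent of $j$ and $\mu$.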


Proof. The estimates for $|r_j(\mu)|$ in (a), (b), and (c) can easily be deduced from
Lemmas 7.3, 7.4, and 7.5 because $r_j(\mu)$ is the numerical solution for the problem
$y' = \mu y$ with step size $h = 1$ and starting values $y_{-k+1} = \ldots = y_{-1} = 0$,
$y_0 = (\alpha_k - \mu\beta_k)^{-1}$.

As noted by Crouzeix & Raviart (1976) and Lubich (1988) the estimates of
Lemma 7.9 can be proved directly, without any use of the Kreiss matrix theorem.
We illustrate these ideas by proving statement (b) (for a proof of statement (a) see
Exercise 4).

By definition of the stability region the function $\zeta^k(\varrho(\zeta^{-1}) - \mu\sigma(\zeta^{-1}))$ does
not vanish for $|\zeta| \le 1$ if $\mu \in \mathrm{Int}\,S$. Therefore there exists a $\kappa$ ($0 < \kappa < 1$) such
that $\zeta^k(\varrho(\zeta^{-1}) - \mu\sigma(\zeta^{-1}))$ has no zeros in the disk $|\zeta| \le 1/\kappa$. Hence, for $\mu \in D$

$$\sup_{|\zeta| \le 1/\kappa} \big|\big(\varrho(\zeta^{-1}) - \mu\sigma(\zeta^{-1})\big)^{-1}\zeta^{-k}\big| \le \frac{M}{1 + |\mu|},$$

and Cauchy’s integral formula

$$r_j(\mu) = \frac{1}{2\pi i} \int_{|\zeta| = 1/\kappa} r(\zeta,\mu)\,\zeta^{-j-1}\,d\zeta \tag{7.46}$$

yields the desired estimate. □
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The use of the discrete resolvent allows elegant convergence proofs for linear 
multistep methods. We shall demonstrate this for the linear problem (7.26) where
the matrix $A$ satisfies

$$\|(sI - A)^{-1}\| \le \frac{M}{1 + |s|} \quad \text{for } |\arg(s - c)| \le \pi - \alpha' \tag{7.47}$$

with some $c \in \mathbb{R}$. This is a common assumption in the theory of holomorphic
semigroups for parabolic problems (see e.g., Kato (1966) or Pazy (1983)). If all
eigenvalues $\lambda_i$ of $A$ satisfy $|\arg(\lambda_i - c) - \pi| < \alpha'$ then Condition (7.47) is satisfied
with a constant $M$ depending on the matrix $A$ (Exercise 2). The following
theorem, which was communicated to us by Ch. Lubich, is an improvement of
results of Crouzeix & Raviart (1976).

Theorem 7.10. Let the multistep method be of order $p \ge 1$, $A(\alpha)$-stable and
strictly stable at zero and at infinity. If the matrix $A$ of (7.26) satisfies (7.47) with
$\alpha' < \alpha$, then there exist constants $C$, $h_0$, and $\gamma$ ($\gamma$ of the same sign as $c$ in (7.47)),
which depend only on $M$, $c$, $\alpha'$ and the method, such that for $h \le h_0$ the global
error satisfies

$$\|y(x_m) - y_m\| \le C\Big(e^{\gamma x_m} \max_{0\le j<k} \|y(x_j) - y_j\| + h^p \int_{x_0}^{x_m} \|y^{(p+1)}(\xi)\|\,d\xi\Big).$$

Moreover, if $c < 0$, then $h_0$ can be chosen arbitrarily.

Proof. The global error $e_m = y(x_m) - y_m$ satisfies

$$\sum_{i=0}^{k} (\alpha_i I - hA\beta_i)e_{m+i} = d_{m+k}$$

where

$$\|d_{m+k}\| \le C h^p \int_{x_m}^{x_{m+k}} \|y^{(p+1)}(\xi)\|\,d\xi, \qquad m \ge 0 \tag{7.48}$$

and $d_0, \ldots, d_{k-1}$ are linear combinations of the $e_j$ and $hAe_j$ with $j < k$. We
split these expressions into

$$d_l = d'_l + hA\,d''_l \quad \text{for } l < k,$$

so that $d'_l$ and $d''_l$ are linear combinations of the $e_j$ ($j < k$) only. We also put
$d'_l = d_l$ and $d''_l = 0$ for $l \ge k$. The analysis at the beginning of this subsection
(Eq. (7.43)) then shows that

$$e(\zeta) = r(\zeta, hA)\,d'(\zeta) + r(\zeta, hA)\,hA\,d''(\zeta), \tag{7.49}$$

where as in the scalar case

$$r(\zeta, hA) = \big(\delta(\zeta)I - hA\big)^{-1}\frac{\zeta^{-k}}{\sigma(\zeta^{-1})} = \sum_{j\ge 0} r_j(hA)\zeta^j. \tag{7.50}$$

We now apply Lemma 7.11 below with $\Phi(s) = (sI - A)^{-1}$. By assumption the
estimate (7.57) holds with $\beta = 1$ so that

$$\|r_j(hA)\| \le C e^{\gamma jh}. \tag{7.51}$$

The second term in (7.49) can be written as

$$r(\zeta, hA)\,hA\,(\delta(\zeta))^{-1}\,\delta(\zeta)\,d''(\zeta) = r'(\zeta, hA)\,\tilde d(\zeta) \tag{7.52}$$

where

$$r'(\zeta, hA) = \big(\delta(\zeta)I - hA\big)^{-1}hA\,(\delta(\zeta))^{-1}\frac{\zeta^{-k}}{\sigma(\zeta^{-1})} = \sum_{j\ge 0} r'_j(hA)\zeta^j,$$

$$\tilde d(\zeta) = \delta(\zeta)\,d''(\zeta) = \sum_{j\ge 0} \tilde d_j\zeta^j.$$

We apply Lemma 7.11 again, this time to

$$\Phi(s) = (sI - A)^{-1}As^{-1} = (sI - A)^{-1} - s^{-1}I. \tag{7.53}$$

Condition (7.57) is satisfied with $\beta = 1$ so that

$$\|r'_j(hA)\| \le C e^{\gamma jh}. \tag{7.54}$$

The coefficients $\delta_j$ of $\delta(\zeta)$ are exponentially decaying because all zeros of $\sigma(\zeta)$
lie in $|\zeta| < 1$. Consequently, we have

$$\|\tilde d_j\| \le \kappa^j C \max_{0\le l<k} \|e_l\| \tag{7.55}$$

with some $\kappa < 1$. The coefficient of $\zeta^m$ in (7.49) gives

$$e_m = \sum_{j=0}^{m} r_{m-j}(hA)\,d'_j + \sum_{j=0}^{m} r'_{m-j}(hA)\,\tilde d_j.$$

Inserting the estimates (7.48), (7.51), (7.54), and (7.55) proves the statement. □


We still have to prove the estimates for $r_j(hA)$ and $r'_j(hA)$. For this we let
$\Phi(s)$ be some analytic (scalar-, vector-, or matrix-valued) function and consider
the coefficients of

$$\Phi\big(\delta(\zeta)/h\big) \cdot \frac{\zeta^{-k}}{\sigma(\zeta^{-1})} = h \sum_{j\ge 0} \varphi_j(h)\zeta^j. \tag{7.56}$$

We then have the following result.

Lemma 7.11 (Lubich 1991). Assume that the multistep method is $A(\alpha)$-stable
and strictly stable at zero and at infinity. Further suppose that $\Phi(s)$ is analytic in
a sector $|\arg(s - c)| < \pi - \alpha'$ with $\alpha' < \alpha$, $c \in \mathbb{R}$ and there satisfies

$$\|\Phi(s)\| \le M \cdot |s|^{-\beta} \quad \text{for some } \beta > 0. \tag{7.57}$$

Then the coefficients $\varphi_j(h)$ of (7.56) are bounded for $h \le h_0$ (sufficiently small)
by

$$\|\varphi_j(h)\| \le C \cdot (jh)^{\beta-1}e^{\gamma jh} \quad \text{for } j \ge 1, \tag{7.58}$$

and for $j = 0$ the same bound holds as for $j = 1$. The constants $C$, $\gamma$, and $h_0$
depend only on $\alpha'$, $c$, $M$, $\beta$, and the multistep method. Moreover, if $c < 0$, then
also $\gamma < 0$, and the result holds for arbitrary $h_0$.


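Proof. By $A(\alpha)$-stability we have $\beta_k/\alpha_k > 0$, so that $\delta(0)/h$ lies in the region of
analyticity of $\Phi$ for $h \le h_0$. Cauchy’s integral formula thus gives

$$\Phi\big(\delta(\zeta)/h\big) = \frac{1}{2\pi i} \int_\Gamma \big(\delta(\zeta)/h - \lambda\big)^{-1}\Phi(\lambda)\,d\lambda \tag{7.59}$$

where $\Gamma$ is a suitable contour from “$\infty \cdot e^{-i(\pi-\alpha')}$” to “$\infty \cdot e^{i(\pi-\alpha')}$” within the
sector of analyticity of $\Phi$ and does not meet the origin (see Fig. 7.1; observe that
$\Phi(s)$ decays sufficiently rapidly at infinity). Multiplying (7.59) by $\zeta^{-k}/\sigma(\zeta^{-1})$
and comparing coefficients of equal powers of $\zeta$ yields the representation

$$\varphi_j(h) = \frac{1}{2\pi i} \int_\Gamma r_j(h\lambda)\,\Phi(\lambda)\,d\lambda, \qquad j \ge 0, \tag{7.60}$$

which is a discrete analogue of the Laplace inversion formula. We next substitute
$\omega = jh\lambda$ (for $j = 0$ we put $\omega = h\lambda$) so that with $\Gamma_j = jh \cdot \Gamma$ Eq. (7.60) becomes

$$\varphi_j(h) = \frac{1}{2\pi i} \int_{\Gamma_j} r_j\Big(\frac{\omega}{j}\Big)\,\Phi\Big(\frac{\omega}{jh}\Big)\,\frac{d\omega}{jh}, \qquad j \ge 1, \tag{7.61}$$

and the use of (7.57) yields

$$\|\varphi_j(h)\| \le \frac{M}{2\pi}\,(jh)^{\beta-1} \int_{\Gamma_j} \Big|r_j\Big(\frac{\omega}{j}\Big)\Big| \cdot |\omega|^{-\beta}\,|d\omega|. \tag{7.62}$$

Fig. 7.1. Contour $\Gamma$ in Formula (7.59)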


We still have to show that the integral in (7.62) is bounded by C • . For this 

we split it into two parts: the first one corresponds to those lo such that lj / j lies in 
a closed subset of the interior of the stability domain of the method. There we can 
use Lemma 7.9b so that the corresponding part of the integral in (7.62) is bounded 
by 

for h < h 0 . 


j-K j / 
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For the remaining part, the argument ujjj = h\ of r ■ in (7.62) lies, for suf¬ 
ficiently small h 0 , in a neighbourhood V of the origin, where the estimate of 
Lemma 7.9c holds. For jh > 1 we thus obtain the bound 

because Rea; = jhRe A, \u\ 2 / j <jh• Const and \u\ > |A| is bounded away from 
the origin. For small jh the contour Tj comes arbitrarily close to the origin so 
that a more refined estimate is required. The idea is to replace the corresponding 
part of r - (in (7.61) and hence also in (7.62)) by an equivalent contour which is 
independent of jh E [fo, 1], has a positive distance to the origin and remains in the 
neighbourhood V . The corresponding integral is thus bounded by some constant. 

□ 


Remark 7.12. In Lemma 7.11 it is sufficient to require the analyticity of $\Phi(s)$
and the estimate (7.57) in a sector $|\arg(s - c)| < \pi - \alpha'$, where some compact
neighbourhood of the origin is removed. We just have to take the contour $\Gamma$ in
(7.59) so that it lies outside this compact neighbourhood of 0. In this situation, the
constant $\gamma$ may be positive also if $c < 0$.


Exercises 


1. Prove the Cauchy integral formula (7.18) in the case where all eigenvalues $\lambda$
of $A$ satisfy $|\lambda| \le 1$ and the contour of integration is the circle $|z| = \varrho$ with
$\varrho > 1$.

Hint. Integrate the identity

$$z^n(zI - A)^{-1} = \sum_{j=0}^{\infty} A^j z^{n-j-1}.$$


2. (Kato 1960). For a non-singular $k \times k$-matrix $B$ show that in the Euclidean
norm

$$\|B^{-1}\| \le \frac{\|B\|^{k-1}}{|\det B|}.$$

Hint. Use the singular value decomposition of $B$, i.e., $B = U^T \Lambda V$, where $U$
and $V$ are orthogonal and $\Lambda = \operatorname{diag}(\sigma_1, \ldots, \sigma_k)$ with $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_k > 0$.


3. A method $(\varrho, \sigma)$ is called A-contractive in the norm $\|\cdot\|_G$ (Nevanlinna &
Liniger 1978-79, Dahlquist & Söderlind 1982), if

$$\|C(\mu)\|_G \le 1 \quad\text{for } \operatorname{Re}\mu \le 0,$$

where $C(\mu)$ is the companion matrix (7.13).

a) Prove that a method $(\varrho, \sigma)$ is A-contractive for some positive definite matrix
$G$, if and only if it is A-stable.

b) Compute the contractivity region

$$\{\mu \in \mathbb{C}\,;\ \|C(\mu)\|_G \le 1\}$$

for the 2-step BDF method with $G$ given in (6.20). Observe that it is strictly
smaller than the stability domain.

Result. The contractivity region is $\{\mu \in \mathbb{C}\,;\ \operatorname{Re}\mu \le 0\}$.


4. Give a direct proof for the statement of Lemma 7.9a.

Hint. Observe that

$$r(\zeta, \mu) = \frac{1}{\alpha_k - \mu\beta_k}\prod_{i=1}^{k}\frac{1}{1 - \zeta\cdot\zeta_i(\mu)} \tag{7.63}$$

where $\zeta_1(\mu), \ldots, \zeta_k(\mu)$ are the zeros of $\varrho(\zeta) - \mu\sigma(\zeta)$. If $\mu_0 \in \operatorname{Int} S$ then
there exists a neighbourhood $U$ of $\mu_0$ such that $|\zeta_i(\mu)| \le a < 1$ for all $i$ and
$\mu \in U$. Hence the coefficients $r_j(\mu)$ are bounded. For $\mu_0 \in \partial S$ we have
$|\zeta_i(\mu_0)| = 1$ for, say, $i = 1, \ldots, \ell$ with $1 \le \ell \le k$. These $\ell$ zeros are simple for
all $\mu$ in a sufficiently small neighbourhood $U$ of $\mu_0$ and the other zeros satisfy
$|\zeta_i(\mu)| \le a < 1$ for $\mu \in U \cap S$. A partial fraction decomposition

$$r(\zeta, \mu) = \sum_{i=1}^{\ell}\frac{c_i(\mu)}{1 - \zeta\cdot\zeta_i(\mu)} + s(\zeta, \mu)$$

shows that

$$r_j(\mu) = \sum_{i=1}^{\ell} c_i(\mu)\,\bigl(\zeta_i(\mu)\bigr)^j + s_j(\mu) \tag{7.64}$$

where $s_j(\mu)$ are the coefficients of $s(\zeta, \mu)$. Since the function $s(\zeta, \mu)$ is
uniformly bounded for $|\zeta| \le 1$ and $\mu \in U \cap S$, it follows from Cauchy's integral
formula with integration along $|\zeta| = 1$ that $s_j(\mu)$ is bounded. The statement
thus follows from (7.64) and the fact that a finite set of the family $\{U\}_{\mu_0 \in S}$
covers $S$ (Heine-Borel).
In Sect. V.6 we have seen a convergence result for one-leg methods (Theorem 6.10)
applied to nonlinear problems satisfying a one-sided Lipschitz condition. An
extension to linear multistep methods has been given in Theorem 6.11. A different
and direct proof of this result will be the first goal of this section. Unfortunately,
such a result is valid only for A-stable methods (whose order cannot exceed two).
The subsequent parts of this section are then devoted to convergence results for
nonlinear problems, where the assumptions on the method are relaxed (e.g.,
$A(\alpha)$-stability), but the class of problems considered is restricted. We shall present two
different theories: the multiplier technique of Nevanlinna & Odeh (1981) and
Lubich's perturbation approach via the discrete variation of constants formula (Lubich
1991).

Problems Satisfying a One-Sided Lipschitz Condition 

Suppose that the differential equation $y' = f(x, y)$ satisfies

$$\operatorname{Re}\langle f(x, y) - f(x, z),\, y - z\rangle \le \nu\,\|y - z\|^2 \tag{8.1}$$

for some inner product. We consider the linear multistep method

$$\sum_{i=0}^{k}\alpha_i y_{m+i} = h\sum_{i=0}^{k}\beta_i f(x_{m+i}, y_{m+i}) \tag{8.2}$$

together with its perturbed formula

$$\sum_{i=0}^{k}\alpha_i \hat y_{m+i} = h\sum_{i=0}^{k}\beta_i f(x_{m+i}, \hat y_{m+i}) + d_{m+k}. \tag{8.3}$$

The perturbations $d_{m+k}$ can be interpreted as the influence of round-off, as the
error due to the iterative solution of the nonlinear equation, or as the local
discretization error (compare Eq. (7.5)). Taking the difference of (8.3) and (8.2) we
obtain (for $m \ge 0$)

$$\sum_{i=0}^{k}\alpha_i \Delta y_{m+i} = h\sum_{i=0}^{k}\beta_i \Delta f_{m+i} + d_{m+k}, \tag{8.4}$$
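where we have introduced the notation

$$\Delta y_j = \hat y_j - y_j, \qquad \Delta f_j = f(x_j, \hat y_j) - f(x_j, y_j). \tag{8.5}$$

The one-sided Lipschitz condition cannot be used directly, because several $\Delta f_j$
appear in (8.4) (in contrast to one-leg methods). In order to express one $\Delta f_m$ in
terms of the $\Delta y_j$ only, we introduce the formal power series

$$\Delta y(\zeta) = \sum_{j\ge 0}\Delta y_j\,\zeta^j, \qquad \Delta f(\zeta) = \sum_{j\ge 0}\Delta f_j\,\zeta^j, \qquad d(\zeta) = \sum_{j\ge 0} d_j\,\zeta^j.$$

It is convenient to assume that $\Delta y_j = 0$, $\Delta f_j = 0$, $d_j = 0$ for negative indices and
that $d_0, \ldots, d_{k-1}$ are defined by Eq. (8.4) with $m \in \{-k, \ldots, -1\}$. Then Eq. (8.4)
just compares the coefficient of $\zeta^m$ in the identity

$$\varrho(\zeta^{-1})\,\Delta y(\zeta) = h\,\sigma(\zeta^{-1})\,\Delta f(\zeta) + \zeta^{-k} d(\zeta). \tag{8.4'}$$

Dividing (8.4') by $\sigma(\zeta^{-1})$ and comparing the coefficients of $\zeta^m$ yields

$$\sum_{j=0}^{m}\delta_{m-j}\,\Delta y_j = h\,\Delta f_m + \tilde d_m, \tag{8.6}$$

where

$$\delta(\zeta) = \frac{\varrho(\zeta^{-1})}{\sigma(\zeta^{-1})} = \sum_{j\ge 0}\delta_j\,\zeta^j \tag{8.7}$$

as in (7.45) and

$$\frac{\zeta^{-k}}{\sigma(\zeta^{-1})}\,d(\zeta) = \tilde d(\zeta) = \sum_{j\ge 0}\tilde d_j\,\zeta^j. \tag{8.8}$$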
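The coefficients $\delta_j$ of (8.7) can be generated by a formal power series division. The following lines are a minimal sketch of this computation; the function name `delta_coefficients` and the convention of passing the coefficients $\alpha_0, \ldots, \alpha_k$ and $\beta_0, \ldots, \beta_k$ as lists are assumptions made for the illustration, and $\beta_k \ne 0$ is required.

```python
from fractions import Fraction


def delta_coefficients(rho, sigma, n_terms):
    """First n_terms coefficients of delta(zeta) = rho(1/zeta) / sigma(1/zeta).

    rho   = [alpha_0, ..., alpha_k], sigma = [beta_0, ..., beta_k] (beta_k != 0).
    Multiplying numerator and denominator by zeta^k gives two polynomials in
    zeta whose coefficient lists are the reversed input lists.
    """
    k = len(rho) - 1
    num = [Fraction(c) for c in reversed(rho)]
    den = [Fraction(c) for c in reversed(sigma)]
    coeffs = []
    for j in range(n_terms):
        # Coefficient comparison in num(zeta) = den(zeta) * delta(zeta).
        acc = num[j] if j <= k else Fraction(0)
        for i in range(1, min(j, k) + 1):
            acc -= den[i] * coeffs[j - i]
        coeffs.append(acc / den[0])
    return coeffs


# Trapezoidal rule: rho(zeta) = zeta - 1, sigma(zeta) = (zeta + 1)/2.
trapezoidal = delta_coefficients([-1, 1], [Fraction(1, 2), Fraction(1, 2)], 4)
# 2-step BDF: rho(zeta) = (3/2) zeta^2 - 2 zeta + 1/2, sigma(zeta) = zeta^2.
bdf2 = delta_coefficients([Fraction(1, 2), -2, Fraction(3, 2)], [0, 0, 1], 4)
```

For the trapezoidal rule this reproduces the expansion of $2(1-\zeta)/(1+\zeta)$, while for the 2-step BDF method the series terminates and $\delta(\zeta)$ is the polynomial $\tfrac32 - 2\zeta + \tfrac12\zeta^2$.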

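In (8.6) $\Delta f_m$ is now isolated as desired and we can take the scalar product of (8.6)
with $\Delta y_m$. We then exploit the assumption (8.1) and obtain

$$\operatorname{Re}\Bigl\langle \sum_{j=0}^{m}\delta_{m-j}\,\Delta y_j,\ \Delta y_m\Bigr\rangle \le h\nu\,\|\Delta y_m\|^2 + \operatorname{Re}\langle \tilde d_m, \Delta y_m\rangle. \tag{8.9}$$

This allows us to prove the following estimate.

Lemma 8.1. Let $\{\Delta y_j\}$ and $\{\Delta f_j\}$ satisfy (8.6) with $\delta_j$ given by (8.7). If

$$\operatorname{Re}\langle \Delta f_m, \Delta y_m\rangle \le \nu\,\|\Delta y_m\|^2, \qquad m \ge 0,$$

and the method is A-stable, then there exist constants $C$ and $C_0 > 0$ such that for
$mh \le x_{\mathrm{end}} - x_0$ and $h\nu \le C_0$,

$$\|\Delta y_m\| \le C\sum_{j=0}^{m}\|\tilde d_j\|.$$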
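Proof. We first reformulate the left-hand side of (8.9). For this we introduce $\{\Delta z_j\}$
by the relation

$$\sum_{i=0}^{k}\beta_i\,\Delta z_{m+i} = \Delta y_m, \qquad m \ge 0 \tag{8.10}$$

and assume that $\Delta z_j = 0$ for $j < k$. With $\Delta z(\zeta) = \sum_j \Delta z_j\,\zeta^j$ this means that
$\sigma(\zeta^{-1})\,\Delta z(\zeta) = \Delta y(\zeta)$. Consequently we also have

$$\delta(\zeta)\,\Delta y(\zeta) = \varrho(\zeta^{-1})\,\Delta z(\zeta),$$

which is equivalent to

$$\sum_{j=0}^{m}\delta_{m-j}\,\Delta y_j = \sum_{i=0}^{k}\alpha_i\,\Delta z_{m+i}. \tag{8.11}$$

Inserting (8.11) and (8.10) into (8.9) yields

$$\operatorname{Re}\Bigl\langle \sum_{i=0}^{k}\alpha_i\,\Delta z_{m+i},\ \sum_{i=0}^{k}\beta_i\,\Delta z_{m+i}\Bigr\rangle \le h\nu\,\Bigl\|\sum_{i=0}^{k}\beta_i\,\Delta z_{m+i}\Bigr\|^2 + \operatorname{Re}\Bigl\langle \tilde d_m,\ \sum_{i=0}^{k}\beta_i\,\Delta z_{m+i}\Bigr\rangle.$$

By Theorem 6.7 the method $(\varrho, \sigma)$ is G-stable, so that Eq. (6.21) can be applied.
As in the proof of Lemma 6.9 this yields for $\Delta Z_m = (\Delta z_{m+k-1}, \ldots, \Delta z_m)^T$ and
$\nu \ge 0$

$$\|\Delta Z_{m+1}\|_G \le (1 + C_1 h\nu)\,\|\Delta Z_m\|_G + C_2\,\|\tilde d_m\|$$

(if $\nu < 0$ replace $\nu$ by $\nu = 0$). But this implies

$$\|\Delta Z_{m+1}\|_G \le C_3\Bigl(\|\Delta Z_0\|_G + \sum_{j=0}^{m}\|\tilde d_j\|\Bigr).$$

By definition of $\Delta z_j$ we have $\Delta Z_0 = 0$. The statement now follows from the fact
that $\|\Delta y_m\| \le C_4\bigl(\|\Delta Z_{m+1}\|_G + \|\Delta Z_m\|_G\bigr)$. □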

This lemma allows a direct proof for the convergence of A-stable multistep
methods which are strictly stable at infinity (compare Theorem 6.11).

Theorem 8.2. Consider an A-stable multistep method of order $p$ which is strictly
stable at infinity. Suppose that the differential equation satisfies (8.1). Then there
exists $C_0 > 0$ such that for $h\nu \le C_0$

$$\|y_m - y(x_m)\| \le C\Bigl(\max_{0\le j<k}\|y_j - y(x_j)\| + h\max_{0\le j<k}\|f(x_j, y_j) - y'(x_j)\|\Bigr) + M h^p.$$

The constant $C$ depends on the method and, for $\nu > 0$, on the length $x_m - x_0$
of the integration interval; the constant $M$ depends in addition on bounds for the
$(p+1)$-th derivative of the exact solution.
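As a small numerical illustration of such a convergence statement, the following sketch applies the 2-step BDF method (A-stable, order 2, with $\sigma(\zeta) = \zeta^2$) to the scalar test equation $y' = \lambda y$, $y(0) = 1$. The choice of test problem, the exact starting values and the function names are assumptions made for the example; for this linear problem the implicit relation can be solved in closed form.

```python
import math


def bdf2_linear(lam, h, n_steps):
    """2-step BDF for y' = lam*y, y(0) = 1, started with exact y_0 and y_1."""
    y_prev, y_curr = 1.0, math.exp(lam * h)
    for _ in range(n_steps - 1):
        # (3/2) y_{m+2} - 2 y_{m+1} + (1/2) y_m = h * lam * y_{m+2}
        y_next = (2.0 * y_curr - 0.5 * y_prev) / (1.5 - h * lam)
        y_prev, y_curr = y_curr, y_next
    return y_curr


def error_at_one(lam, n_steps):
    """Global error at x = 1 for step size h = 1/n_steps."""
    h = 1.0 / n_steps
    return abs(bdf2_linear(lam, h, n_steps) - math.exp(lam))


# Halving the step size should divide the error by roughly 2^p = 4.
ratio = error_at_one(-1.0, 100) / error_at_one(-1.0, 200)
```

The observed ratio close to 4 is what an $\mathcal{O}(h^2)$ error bound predicts; it is an experiment on one test problem, not a substitute for the proof.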
Proof. We put $\hat y_m = y(x_m)$ in (8.3). The perturbations thus become the local
truncation errors $d_{m+k} = \delta_{LM}(x_m)$, where

$$\delta_{LM}(x) = \sum_{i=0}^{k}\alpha_i\,y(x + ih) - h\sum_{i=0}^{k}\beta_i\,y'(x + ih). \tag{8.12}$$

If the zeros of $\sigma(\zeta)$ all lie in the open unit disk, the coefficients of $\zeta^{-k}/\sigma(\zeta^{-1})$
are absolutely summable and by (8.8) we have

$$\sum_{j=0}^{m}\|\tilde d_j\| \le C\sum_{j=0}^{m}\|d_j\|.$$

The statement then follows from Lemma 8.1, from $\|\delta_{LM}(x)\| \le M h^{p+1}$, and
from the fact that $d_0, \ldots, d_{k-1}$ are linear combinations of the $y_j - y(x_j)$ and
$h\bigl(f(x_j, y_j) - y'(x_j)\bigr)$ for $j < k$. □


Multiplier Technique 


... the best of all multipliers would be $\{1, -\eta\}$ with a very small
$\eta > 0$; ... (Nevanlinna & Odeh 1981)

The above convergence proof is based on Eq. (8.6) and on the A-stability of the
multistep method. How can we modify this proof in order to get convergence
results also for methods which are not A-stable? This can be done by the so-called
"multiplier technique", introduced by Nevanlinna & Odeh (1981) and based on
previous ideas of Nevanlinna (1977) and Odeh & Liniger (1977).

The main idea is the following: instead of taking the scalar product of the identity
(8.6) with $\Delta y_m$, we take it with

$$\sum_{j=0}^{m}\mu_{m-j}\,\Delta y_j$$

where $\{\mu_j\}$ are the coefficients of a rational function (the multiplier)

$$\mu(\zeta) = \frac{\eta(\zeta^{-1})}{\tau(\zeta^{-1})} = \sum_{j\ge 0}\mu_j\,\zeta^j \tag{8.13}$$

($\eta$ and $\tau$ are polynomials). We obtain

$$\operatorname{Re}\Bigl\langle \sum_{j=0}^{m}\delta_{m-j}\,\Delta y_j,\ \sum_{j=0}^{m}\mu_{m-j}\,\Delta y_j\Bigr\rangle = \operatorname{Re}\Bigl\langle h\,\Delta f_m,\ \sum_{j=0}^{m}\mu_{m-j}\,\Delta y_j\Bigr\rangle + \operatorname{Re}\Bigl\langle \tilde d_m,\ \sum_{j=0}^{m}\mu_{m-j}\,\Delta y_j\Bigr\rangle. \tag{8.14}$$
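Our next aim is to introduce new variables $\Delta z_j$ such that the left-hand side of
(8.14) becomes

$$\operatorname{Re}\Bigl\langle \sum_{j=0}^{m}\delta_{m-j}\,\Delta y_j,\ \sum_{j=0}^{m}\mu_{m-j}\,\Delta y_j\Bigr\rangle = \operatorname{Re}\Bigl\langle \sum_{i=0}^{\tilde k}\tilde\alpha_i\,\Delta z_{m+i},\ \sum_{i=0}^{\tilde k}\tilde\beta_i\,\Delta z_{m+i}\Bigr\rangle. \tag{8.15}$$

Denoting

$$\tilde\varrho(\zeta) = \sum_{i=0}^{\tilde k}\tilde\alpha_i\,\zeta^i, \qquad \tilde\sigma(\zeta) = \sum_{i=0}^{\tilde k}\tilde\beta_i\,\zeta^i, \tag{8.16}$$

the identity (8.15) certainly holds, if

$$\varrho(\zeta^{-1})\,\Delta y(\zeta) = \sigma(\zeta^{-1})\,\tilde\varrho(\zeta^{-1})\,\Delta z(\zeta), \qquad \eta(\zeta^{-1})\,\Delta y(\zeta) = \tau(\zeta^{-1})\,\tilde\sigma(\zeta^{-1})\,\Delta z(\zeta). \tag{8.17}$$

Dividing these two relations motivates the following definition of the new
generating polynomials

$$\tilde\varrho(\zeta) = \varrho(\zeta)\,\tau(\zeta)/\chi(\zeta), \qquad \tilde\sigma(\zeta) = \sigma(\zeta)\,\eta(\zeta)/\chi(\zeta). \tag{8.18}$$

Here $\chi(\zeta)$ denotes the greatest common divisor of $\varrho(\zeta)\tau(\zeta)$ and $\sigma(\zeta)\eta(\zeta)$. If we
define $\Delta z_j = 0$ for $j < 0$ and the remaining $\Delta z_j$ by

$$\chi(\zeta^{-1})\,\Delta y(\zeta) = \sigma(\zeta^{-1})\,\tau(\zeta^{-1})\,\Delta z(\zeta), \tag{8.19}$$

the identity (8.15) holds for all $m$. Suppose now that the multistep method $(\tilde\varrho, \tilde\sigma)$
is A-stable, then the left-hand side of (8.14) can be minorized by the G-stability
estimate (6.21) and we shall be able to derive convergence results. This motivates
the following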


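Definition 8.3. The rational function $\mu(\zeta)$ of (8.13) is called a multiplier for $(\varrho, \sigma)$
if $\mu(\zeta) \ne \varrho(\zeta^{-1})/\sigma(\zeta^{-1})$ and if the method $(\tilde\varrho, \tilde\sigma)$, given by (8.18), is A-stable,
i.e., if

$$\operatorname{Re}\Bigl(\frac{1}{\mu(\zeta^{-1})}\cdot\frac{\varrho(\zeta)}{\sigma(\zeta)}\Bigr) > 0 \quad\text{for } |\zeta| > 1. \tag{8.20}$$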


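A continuation of the above analysis yields the following convergence result.

Lemma 8.4. Let $\{\Delta y_j\}$ and $\{\Delta f_j\}$ satisfy (8.6) with $\delta_j$ given by (8.7). If

$$\sum_{m=0}^{N}\sum_{j=0}^{m}\mu_{m-j}\operatorname{Re}\langle \Delta f_m, \Delta y_j\rangle \le 0 \quad\text{for all } N \ge 0$$

and if $\mu(\zeta)$ is a multiplier for the method, then there exists a constant $C$ such that
for $mh \le x_{\mathrm{end}} - x_0$

$$\|\Delta y_m\| \le C\sum_{j=0}^{m}\|\tilde d_j\|.$$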
Proof. Inserting (8.15) into (8.14) and using the estimate (6.21) for the A-stable
method $(\tilde\varrho, \tilde\sigma)$ yields for $\Delta Z_m = (\Delta z_{m+\tilde k-1}, \ldots, \Delta z_m)^T$

$$\|\Delta Z_{m+1}\|_G^2 - \|\Delta Z_m\|_G^2 \le h\sum_{j=0}^{m}\mu_{m-j}\operatorname{Re}\langle \Delta f_m, \Delta y_j\rangle + \|\tilde d_m\|\cdot\Bigl\|\sum_{i=0}^{\tilde k}\tilde\beta_i\,\Delta z_{m+i}\Bigr\|.$$

Summing up this inequality from $m = 0$ to $m = N$ gives

$$\|\Delta Z_{N+1}\|_G^2 \le C_1\sum_{m=0}^{N}\|\tilde d_m\|\cdot\bigl(\|\Delta Z_{m+1}\|_G + \|\Delta Z_m\|_G\bigr),$$

because $\Delta Z_0 = 0$ by (8.19). This also implies

$$\max_{N\le M}\|\Delta Z_{N+1}\|_G^2 \le 2C_1\sum_{m=0}^{M}\|\tilde d_m\|\cdot\max_{m\le M}\|\Delta Z_{m+1}\|_G. \tag{8.21}$$

A division by $\max_{N\le M}\|\Delta Z_{N+1}\|_G$ yields the desired estimate, because $\Delta y_M$ is
a linear combination of the elements of $\Delta Z_{M+1}$. □


The proof of Theorem 8.2 applied to the A-stable method $(\tilde\varrho, \tilde\sigma)$ now yields:

Theorem 8.5 (Nevanlinna & Odeh 1981). Consider a linear multistep method (8.2)
of order $p$, which is strictly stable at infinity and has a multiplier $\mu(\zeta)$. Suppose
that the differential equation satisfies

$$\sum_{m=0}^{N}\sum_{j=0}^{m}\mu_{m-j}\operatorname{Re}\langle f(x_m, u_m) - f(x_m, v_m),\ u_j - v_j\rangle \le 0 \tag{8.22}$$

for all $N \ge 0$ and for all sequences $\{u_j\}$ and $\{v_j\}$. Then we have

$$\|y_m - y(x_m)\| \le C\Bigl(\max_{0\le j<k}\|y_j - y(x_j)\| + h\max_{0\le j<k}\|f(x_j, y_j) - y'(x_j)\|\Bigr) + M h^p,$$

where the constants $C$ and $M$ are as in Theorem 8.2. □


In the next two subsections we shall study the existence and construction of
multipliers, and try to better understand the condition (8.22).

Construction of Multipliers. Obviously $\mu(\zeta) \equiv 1$ is a multiplier iff the method
itself is A-stable. Moreover, the limit $|\zeta| \to \infty$ in (8.20) shows that $\mu(0)$ must
have the same sign as $\alpha_k/\beta_k$ (which we always assume to be positive). Therefore,
the simplest (and most important) nontrivial multiplier has the form

$$\mu(\zeta) = 1 - \eta\,\zeta. \tag{8.23}$$

Suppose now that the method $(\varrho, \sigma)$ is stable at infinity. The maximum principle
for harmonic functions then implies that (8.23) is a multiplier for $(\varrho, \sigma)$ iff $|\eta| < 1$
and

$$\operatorname{Re}\Bigl(\bigl(1 - \eta\,e^{it}\bigr)\,\frac{\varrho(e^{it})}{\sigma(e^{it})}\Bigr) \ge 0 \quad\text{for all } t \in \mathbb{R}.$$


This condition motivates the study of 


7 W = 



(8.24) 


which is called the modified root-locus curve by Nevanlinna & Odeh (1981). We
then have:


Criterion 8.6. Consider a method which is stable at infinity. The function (8.23) is
a multiplier for $(\varrho, \sigma)$ iff $|\eta| < 1$ and the modified root-locus curve lies to the right
of the straight line through the origin with slope $-1/\eta$.



Fig. 8.1. Modified root-locus curve for BDF schemes 


Fig. 8.1 shows the modified root-locus curves for the BDF schemes for $2 \le k \le 6$.
The optimal values for $\eta$ are given in Table 8.1.


Table 8.1. Multiplier for BDF schemes

    k    η         arccos η    A(α)-stable
    2    0         π/2         π/2
    3    0.0836    85.20°      86.03°
    4    0.2878    73.27°      73.35°
    5    0.8160    35.32°      51.84°
    6    5.0130    -           17.84°
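The condition preceding Criterion 8.6 can be checked numerically. The sketch below samples $\operatorname{Re}\bigl((1 - \eta e^{it})\,\varrho(e^{it})/\sigma(e^{it})\bigr)$ on a grid in $t$ and locates the smallest admissible $\eta$ by bisection. It assumes the standard representation $\varrho(\zeta)/\sigma(\zeta) = \sum_{j=1}^{k}\frac1j(1 - \zeta^{-1})^j$ of the $k$-step BDF method; the function names, the grid size and the tolerance are choices made for the illustration, and a sampled test is of course weaker than the criterion itself.

```python
import cmath
import math


def bdf_quotient(k, t):
    """rho(zeta)/sigma(zeta) of the k-step BDF method at zeta = exp(i t)."""
    w = 1.0 - cmath.exp(-1j * t)
    return sum(w**j / j for j in range(1, k + 1))


def min_multiplier_condition(k, eta, n_grid=4000):
    """Smallest sampled value of Re((1 - eta e^{it}) rho(e^{it})/sigma(e^{it}))."""
    return min(
        ((1.0 - eta * cmath.exp(1j * t)) * bdf_quotient(k, t)).real
        for t in (2.0 * math.pi * i / n_grid for i in range(n_grid))
    )


def smallest_eta(k, lo=0.0, hi=0.99, tol=1e-6):
    """Bisection for the smallest eta passing the sampled test.

    Assumes that the test fails at lo and passes at hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if min_multiplier_condition(k, mid) >= -1e-12:
            hi = mid
        else:
            lo = mid
    return hi


eta_bdf3 = smallest_eta(3)
```

For $k = 3$ the value found this way agrees with the entry 0.0836 of Table 8.1 to the printed accuracy; the call `smallest_eta(3)` relies on the bracketing assumption stated in the docstring, which should be rechecked before using the routine for other methods.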


Proposition 8.7. If $\mu(\zeta)$ is a multiplier for $(\varrho, \sigma)$ and we have

$$|\arg\mu(\zeta)| \le \frac{\pi}{2} - \alpha \quad\text{for } |\zeta| \le 1 \tag{8.25}$$

then the method is $A(\alpha)$-stable.

Proof. Condition (8.20) together with (8.25) implies that

$$\Bigl|\arg\Bigl(-\frac{\varrho(\zeta)}{\sigma(\zeta)}\Bigr)\Bigr| > \alpha \quad\text{for } |\zeta| > 1.$$

But this condition implies $A(\alpha)$-stability. □


A simple calculation shows that the multiplier (8.23) satisfies (8.25) with
$\alpha = \arccos\eta$. For the BDF schemes we have included these values in Table 8.1 together
with the $\alpha$-values for linear stability.


Multipliers and Nonlinearities 

We still have to investigate the problem under what conditions on the multiplier
$\mu(\zeta)$ and on the function $f(x, y)$ one has (8.22) for all sequences $\{u_j\}$ and $\{v_j\}$.
To get an idea of the nature of (8.22) we first look, following Nevanlinna & Odeh
(1981), at the linear problem $y' = Ay$.

Proposition 8.8. If the multiplier $\mu(\zeta)$ satisfies (8.25) and if the range of the matrix
$A$ lies in the sector $|\arg\langle Au, u\rangle - \pi| \le \alpha$ for all $u \in \mathbb{C}^n$, then we have

$$\sum_{m=0}^{N}\sum_{j=0}^{m}\mu_{m-j}\operatorname{Re}\langle Au_m, u_j\rangle \le 0 \tag{8.26}$$

for all $N \ge 0$ and all sequences $\{u_j\}$.

Proof. A direct computation shows that the expression in (8.26) equals

$$\operatorname{Re}\Bigl(\frac{1}{2\pi}\int_0^{2\pi}\mu(e^{it})\,\langle A\hat u_N(t), \hat u_N(t)\rangle\,dt\Bigr) \tag{8.27}$$

where

$$\hat u_N(t) = \sum_{j=0}^{N} e^{-ijt}\,u_j$$

denotes the Fourier transform of $(u_0, u_1, \ldots, u_N)$. The assumptions on $\mu(\zeta)$ and
on $A$ imply that the integrand in (8.27) has non-positive real part. This proves
(8.26). □


Problems which satisfy (8.22) for some multiplier $\mu(\zeta)$ must also satisfy the
one-sided Lipschitz condition (8.1) with $\nu = 0$ (this is seen by putting $N = 0$ in
(8.22)). A class of nonlinear problems, for which (8.22) holds, is given by the
following perturbation result.

Proposition 8.9. Let $f(x, y) = -Ay + Ag(x, y)$ where $A$ is a symmetric and
positive semi-definite matrix. With $\|u\|_A^2 = u^T A u$ suppose that

$$\|g(x, y) - g(x, z)\|_A \le L\,\|y - z\|_A. \tag{8.28}$$

Then Condition (8.22) holds if

$$L\cdot\max_{|\zeta|=1}|\mu(\zeta)| \le \min_{|\zeta|=1}\operatorname{Re}\mu(\zeta). \tag{8.29}$$

Remark. For the multiplier (8.23) Condition (8.29) is equivalent to
$L\cdot(1 + \eta) \le (1 - \eta)$.
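The two extremal quantities in (8.29) are easy to evaluate for the multiplier (8.23). The following sketch samples $\mu(\zeta) = 1 - \eta\zeta$ on the unit circle; the function names and the grid size are assumptions made for the example.

```python
import cmath
import math


def multiplier_bounds(eta, n_grid=3600):
    """Return (max |mu|, min Re mu) for mu(zeta) = 1 - eta*zeta on |zeta| = 1."""
    values = [
        1.0 - eta * cmath.exp(2j * math.pi * i / n_grid) for i in range(n_grid)
    ]
    return max(abs(v) for v in values), min(v.real for v in values)


def largest_lipschitz_constant(eta):
    """Largest L allowed by (8.29), i.e. (min Re mu) / (max |mu|)."""
    big_m, m0 = multiplier_bounds(eta)
    return m0 / big_m


big_m, m0 = multiplier_bounds(0.0836)
l_max = largest_lipschitz_constant(0.0836)
```

With $0 \le \eta < 1$ the maximum of $|\mu|$ is attained at $\zeta = -1$ and the minimum of $\operatorname{Re}\mu$ at $\zeta = 1$, so that the admissible Lipschitz constants are those with $L \le (1-\eta)/(1+\eta)$.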


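Proof. As in the proof of Proposition 8.8 we get for $w_j = u_j - v_j$

$$-\sum_{m=0}^{N}\sum_{j=0}^{m}\mu_{m-j}\operatorname{Re}\langle Aw_m, w_j\rangle = -\operatorname{Re}\Bigl(\frac{1}{2\pi}\int_0^{2\pi}\mu(e^{it})\,\langle A\hat w_N(t), \hat w_N(t)\rangle\,dt\Bigr) \le -\frac{m_0}{2\pi}\int_0^{2\pi}\|\hat w_N(t)\|_A^2\,dt = -m_0\sum_{j=0}^{N}\|w_j\|_A^2 \tag{8.30}$$

where $m_0 = \min_t \operatorname{Re}\mu(e^{it})$. On the other hand, the inequality of Cauchy-Schwarz
gives

$$\sum_{m=0}^{N}\operatorname{Re}\Bigl\langle A\bigl(g(x_m, u_m) - g(x_m, v_m)\bigr),\ \sum_{j=0}^{m}\mu_{m-j}(u_j - v_j)\Bigr\rangle \le \Bigl(\sum_{m=0}^{N}\|g(x_m, u_m) - g(x_m, v_m)\|_A^2\Bigr)^{1/2}\Bigl(\sum_{m=0}^{N}\Bigl\|\sum_{j=0}^{m}\mu_{m-j}(u_j - v_j)\Bigr\|_A^2\Bigr)^{1/2}. \tag{8.31}$$

The last term in (8.31) can be estimated as (for the moment put $w_j = 0$ for $j > N$)

$$\sum_{m=0}^{N}\Bigl\|\sum_{j=0}^{m}\mu_{m-j}w_j\Bigr\|_A^2 \le \sum_{m\ge 0}\Bigl\|\sum_{j=0}^{m}\mu_{m-j}w_j\Bigr\|_A^2 = \frac{1}{2\pi}\int_0^{2\pi}\bigl\|\mu(e^{-it})\,\hat w_N(t)\bigr\|_A^2\,dt \le M^2\sum_{j=0}^{N}\|w_j\|_A^2$$

where $M = \max_t |\mu(e^{-it})|$. These estimates together with (8.28) show that the
expression in (8.22) is majorized by

$$(L\cdot M - m_0)\sum_{j=0}^{N}\|u_j - v_j\|_A^2.$$

This is non-positive if (8.29) holds. □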
Discrete Variation of Constants and Perturbations

We now turn our attention to the perturbation approach of Lubich (1991), which
extends the ideas of Sect. V.7 (discrete variation of constants) to nonlinear problems.
For this we consider nonlinear differential equations written in the form

$$y' = Ay + g(x, y). \tag{8.32}$$

Inserting this equation into Formulas (8.2), (8.3), and (8.4) we get

$$\sum_{i=0}^{k}(\alpha_i I - hA\beta_i)\,\Delta y_{m+i} = h\,\Delta g_{m+k} + d_{m+k}, \tag{8.33}$$

where

$$\Delta g_{m+k} = \sum_{i=0}^{k}\beta_i\bigl(g(x_{m+i}, \hat y_{m+i}) - g(x_{m+i}, y_{m+i})\bigr) \tag{8.34}$$

for $m \ge 0$. We further put $\Delta g_j = 0$ for $j < k$. Recall that $d_j$ (for $j \ge k$) are
usually the local truncation errors and $d_0, \ldots, d_{k-1}$ are defined by (8.33) with
$m \in \{-1, \ldots, -k\}$. The differences $\Delta y_j$ are then the global errors of the method.
If we introduce the formal power series

$$\Delta y(\zeta) = \sum_{j\ge 0}\Delta y_j\,\zeta^j, \qquad \Delta g(\zeta) = \sum_{j\ge 0}\Delta g_j\,\zeta^j, \qquad d(\zeta) = \sum_{j\ge 0} d_j\,\zeta^j,$$

then the recursion (8.33) can be written as

$$\Delta y(\zeta) = r(\zeta, hA)\,\bigl(h\,\Delta g(\zeta) + d(\zeta)\bigr). \tag{8.35}$$

The resolvent $r(\zeta, hA)$ was introduced in (7.44) and (7.50). The coefficient of $\zeta^m$
in (8.35) then yields

$$\Delta y_m = h\sum_{j=0}^{m} r_{m-j}(hA)\,\Delta g_j + \sum_{j=0}^{m} r_{m-j}(hA)\,d_j. \tag{8.36}$$

The second sum on the right-hand side of (8.36) can be estimated as in Sect. V.7.
In order to estimate the first term we have to combine estimates for $r_j(hA)$ with a
Lipschitz condition for $g(x, y)$. This will lead to a Gronwall-type inequality, whose
resolution gives the desired estimates for $\Delta y_m$. Let us illustrate this procedure in
a simple situation.
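The passage from the recursion (8.33) to the convolution formula (8.36) can be followed by hand for the implicit Euler method applied to a scalar problem, where $hA$ reduces to a number $\mu$. In this case the recursion reads $(1-\mu)\Delta y_{m+1} - \Delta y_m = \text{rhs}_{m+1}$ and the resolvent coefficients are $r_j(\mu) = (1-\mu)^{-(j+1)}$. This closed form is worked out here for illustration only; the function names and the sample data are assumptions of the sketch.

```python
def implicit_euler_resolvent(mu, n_terms):
    """Coefficients r_j(mu) = (1 - mu)^(-(j+1)) for the implicit Euler method."""
    return [(1.0 - mu) ** (-(j + 1)) for j in range(n_terms)]


def solve_recursion(mu, rhs):
    """Solve (1 - mu)*e_m - e_{m-1} = rhs_m step by step, with e_{-1} = 0."""
    errors, prev = [], 0.0
    for value in rhs:
        prev = (prev + value) / (1.0 - mu)
        errors.append(prev)
    return errors


def solve_by_convolution(mu, rhs):
    """Discrete variation of constants: e_m = sum_j r_{m-j}(mu) * rhs_j."""
    r = implicit_euler_resolvent(mu, len(rhs))
    return [
        sum(r[m - j] * rhs[j] for j in range(m + 1)) for m in range(len(rhs))
    ]


rhs = [1.0, 0.5, -0.25, 2.0]
stepwise = solve_recursion(-0.5, rhs)
convolved = solve_by_convolution(-0.5, rhs)
```

Both routines return the same sequence; the convolution form is the one that allows bounds on $r_j$ to be turned into bounds on the errors.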

Theorem 8.10. Let the multistep method and the matrix $A$ satisfy the assumptions
of Theorem 7.10. If the nonlinearity $g(x, y)$ satisfies

$$\|g(x, y) - g(x, z)\| \le L\,\|y - z\| \tag{8.37}$$

then there exist constants $C$, $h_0$ and $\gamma$ as in Theorem 7.10, and $\Lambda$ ($h_0$ and $\Lambda$
depend on $L$) such that

$$\|y(x_m) - y_m\| \le C e^{\Lambda x_m}\Bigl(\max_{0\le j<k}\|y(x_j) - y_j\| + h^p\int_{x_0}^{x_m}\|y^{(p+1)}(\xi)\|\,d\xi\Bigr).$$

Proof. It follows from the proof of Theorem 7.10 and from (8.36) that 

m m 

HAyJI <hLC 1 Y,e« m - i)h \\*y j \\ + C 2 Y i e* m - i) % (8-38) 

j —0 j =o 

where (with 0 < n < 1 ) 

= C 0 max || || + h* jT" !|t/ C ^ + 1 ) (OII^) • 

Application of Exercise 1 to the sequence || Ay m ||} yields the statement of 

the theorem. □ 


Lubich (1991) has shown how the above estimates can be improved to obtain
convergence results for singularly perturbed problems (see Sect. VI.2) and for
discretized nonlinear parabolic equations, as we shall see in the sequel.


Convergence for Nonlinear Parabolic Problems


We consider the initial value problem

$$y' + Ay = g(t, y), \qquad y(0) = y_0 \tag{8.39}$$

obtained by space discretization of a parabolic differential equation. The matrix $A$
is assumed to satisfy for some $\alpha' \in (0, \pi/2)$

$$\|(sI + A)^{-1}\| \le \frac{M}{1 + |s|} \quad\text{for } |\arg s| \le \pi - \alpha' \tag{8.40}$$

(compare (7.47)). In order to motivate our assumptions on $g(t, y)$ we begin with
two examples.
two examples. 


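Burgers' Equation. For this problem (Burgers 1948)

$$u_t + u\,u_x = \mu\,u_{xx} \qquad\text{or}\qquad u_t + \Bigl(\frac{u^2}{2}\Bigr)_x = \mu\,u_{xx}, \qquad \mu > 0,$$

we consider the discretization

$$\dot u_i + \frac{u_{i+1}^2 - u_{i-1}^2}{4\Delta x} = \mu\,\frac{u_{i+1} - 2u_i + u_{i-1}}{(\Delta x)^2}.$$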
It is of the form (8.39) with

$$A = \frac{\mu}{(\Delta x)^2}\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 2 \end{pmatrix}, \qquad g(t, y) = -\frac{1}{4\Delta x}\begin{pmatrix} y_2^2 - y_0^2 \\ y_3^2 - y_1^2 \\ \vdots \\ y_{n+1}^2 - y_{n-1}^2 \end{pmatrix} \tag{8.41}$$

where $\mu > 0$ is a given constant, $\Delta x = 1/(n+1)$ and, due to homogeneous
boundary conditions, $y_0 = y_{n+1} = 0$. In this situation we work with the scaled norm (on
$\mathbb{R}^n$)

$$\|y\| = \sqrt{\Delta x\sum_{i=1}^{n}|y_i|^2}, \tag{8.42}$$

which tends to that of $L^2(0, 1)$ for $n \to \infty$. As the eigenvalues of the symmetric
matrix $A$ are real and positive, condition (8.40) is verified for every $\alpha' > 0$,
uniformly in $\Delta x$.

The presence of the denominator $\Delta x$ in $g(t, y)$ of (8.41) does not allow a
Lipschitz condition (8.37) uniformly in $\Delta x > 0$ (not even in a neighbourhood of
the exact solution). However, using the energy norm $\|A^{1/2}u\|$, which already
contains the factor $1/\Delta x$, we show that

$$\|g(t, y) - g(t, z)\| \le \mu^{-1}\,r\cdot\|A^{1/2}(y - z)\| \quad\text{for } \|A^{1/2}y\| + \|A^{1/2}z\| \le r. \tag{8.43}$$

For the proof of this relation we consider the bilinear map $b : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$,
whose $i$th component is defined by

$$b_i(u, v) = -(4\Delta x)^{-1}(u_{i+1} + u_{i-1})(v_{i+1} - v_{i-1})$$

(again we put $u_0 = v_0 = u_{n+1} = v_{n+1} = 0$). Then

$$g(t, y) - g(t, z) = b(y, y) - b(z, z) = b(y, y - z) + b(y - z, z), \tag{8.44}$$

and we need an estimate for $\|b(u, v)\|$. Using

$$|(u_{i+1} + u_{i-1})(v_{i+1} - v_{i-1})| \le 2\cdot|v_{i+1} - v_{i-1}|\cdot\max_j |u_j|,$$

and the estimates of Exercise 3 we obtain

$$\|b(u, v)\| \le \|u\|_\infty\cdot\|Dv\| \le \mu^{-1}\cdot\|A^{1/2}u\|\cdot\|A^{1/2}v\|, \tag{8.45}$$

where

$$D = \frac{1}{2\Delta x}\begin{pmatrix} 0 & 1 & & & \\ -1 & 0 & 1 & & \\ & -1 & 0 & \ddots & \\ & & \ddots & \ddots & 1 \\ & & & -1 & 0 \end{pmatrix} \tag{8.46}$$

represents the first central difference operator. The estimate (8.45) applied to (8.44)
proves (8.43).
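The splitting of $g(t, y) - g(t, z)$ into two values of a bilinear map is purely algebraic and can be checked on random data. The sketch below assumes the sign convention $g_i(y) = -(y_{i+1}^2 - y_{i-1}^2)/(4\Delta x)$ with zero boundary values; the helper names `pad`, `b` and `g` and the random test vectors are choices made for the illustration.

```python
import random


def pad(u):
    """Attach the homogeneous boundary values u_0 = u_{n+1} = 0."""
    return [0.0] + list(u) + [0.0]


def b(u, v, dx):
    """Bilinear map b_i(u, v) = -(u_{i+1} + u_{i-1})(v_{i+1} - v_{i-1}) / (4 dx)."""
    up, vp = pad(u), pad(v)
    return [
        -(up[i + 1] + up[i - 1]) * (vp[i + 1] - vp[i - 1]) / (4.0 * dx)
        for i in range(1, len(u) + 1)
    ]


def g(y, dx):
    """Nonlinearity of the discretized Burgers equation (assumed sign convention)."""
    yp = pad(y)
    return [
        -(yp[i + 1] ** 2 - yp[i - 1] ** 2) / (4.0 * dx)
        for i in range(1, len(y) + 1)
    ]


random.seed(0)
n = 12
dx = 1.0 / (n + 1)
y = [random.uniform(-1.0, 1.0) for _ in range(n)]
z = [random.uniform(-1.0, 1.0) for _ in range(n)]
diff = [a - c for a, c in zip(y, z)]
lhs = [a - c for a, c in zip(g(y, dx), g(z, dx))]
rhs = [a + c for a, c in zip(b(y, diff, dx), b(diff, z, dx))]
```

The vectors `lhs` and `rhs` agree up to round-off, which is the identity used to reduce the Lipschitz estimate for $g$ to a bound on $\|b(u, v)\|$.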
Incompressible Navier-Stokes Equation. The motion of a viscous incompressible
fluid in a domain $\Omega \subset \mathbb{R}^d$ is governed by the equations of Navier (1823) and Stokes
(1845)

$$\frac{\partial u}{\partial t} + \sum_{i=1}^{d} u_i\,\frac{\partial u}{\partial x_i} = \Delta u - \operatorname{grad} p, \qquad \operatorname{div} u = 0 \tag{8.47}$$

where $u = (u_1, \ldots, u_d)^T$. We denote by $P$ the orthogonal projection from $L^2(\Omega)^d$
onto $X$, where $X$ is the subspace of functions with $\operatorname{div} u = 0$ (more precisely:
the closure of the set of smooth functions with vanishing divergence and support
contained in $\Omega$). If we apply $P$ to Eq. (8.47), $\operatorname{grad} p$ is eliminated and we obtain

$$\frac{\partial u}{\partial t} = P\Delta u - P\Bigl(\sum_{i=1}^{d} u_i\,\frac{\partial u}{\partial x_i}\Bigr). \tag{8.48}$$

These equations are now precisely of the form (8.39), where $A = -P\Delta$ (or some
discretization of it) and $g(t, y)$ is the nonlinear term on the right-hand side of (8.48).
Lipschitz estimates for this nonlinear term have been obtained by Sobolevskii (1959) and Fujita
& Kato (1964). They are of the form

$$\|g(t, u) - g(t, v)\|_{\beta-\gamma} \le \ell(r)\cdot\|u - v\|_\beta \quad\text{for } \|u\|_\beta + \|v\|_\beta \le r \tag{8.49}$$

where $\|\cdot\|_\beta$ denotes the norm

$$\|v\|_\beta = \|A^\beta v\|. \tag{8.50}$$

In particular, for $d = 3$, condition (8.49) is true for $\beta = 1/2$, $\gamma > 3/4$ as well as
for $\beta = \gamma > 3/4$ (Fujita & Kato 1964, pp. 272-273).

Motivated by these examples we consider the initial value problem (8.39) on
$\mathbb{R}^n$, where $A$ is supposed to satisfy (8.40) for some $\alpha' \in (0, \pi/2)$ and the
nonlinearity $g(t, y)$ is assumed to satisfy the Lipschitz condition (8.49).

Application of a linear multistep method to (8.39) yields

$$\sum_{i=0}^{k}\alpha_i\,y_{m+i} + h\sum_{i=0}^{k}\beta_i\,A\,y_{m+i} = h\sum_{i=0}^{k}\beta_i\,g(t_{m+i}, y_{m+i}). \tag{8.51}$$

Instead of comparing the numerical solution $\{y_m\}$ with the analytic solution $y(t)$
of (8.39), it is more interesting to compare it with the exact solution of the original
partial differential equation. We therefore denote by $\eta(t)$ a projection of the
solution of the PDE into the finite-dimensional space under consideration. In this way
we obtain

$$\eta' + A\eta = g(t, \eta) + s(t)$$

where $s(t)$ is the spatial discretization error.
Theorem 8.11 (Lubich 1991). Consider the problem (8.39) with $A$ and $g(t, y)$
satisfying (8.40) and (8.49) with $\gamma < 1$, respectively. Assume that the multistep
method is of order $p$, $A(\alpha)$-stable for some $\alpha > \alpha'$, and strictly stable at infinity.
Then, the full discretization error is bounded by

$$\|y_m - \eta(t_m)\|_\beta \le C\Bigl(\max_{0\le j<k}\|y_j - \eta(t_j)\|_\beta + h^p\int_0^{t_m}\|\eta^{(p+1)}(t)\|_\beta\,dt + \|A^{-1}s(0)\|_\beta + \int_0^{t_m}\|A^{-1}s'(t)\|_\beta\,dt\Bigr). \tag{8.52}$$

The estimate holds for $t_m = mh \le T$ provided that $h \le h_0$ and the expression in
brackets on the right-hand side is bounded by $\varepsilon$, where $h_0$ and $\varepsilon$ are sufficiently
small. The constants $C$, $h_0$ and $\varepsilon$ depend on $\max_{0\le t\le T}\|\eta(t)\|_\beta$ and $M$ of
(8.40), but are otherwise independent of $A$ and the dimension of the system, and
independent of $m$ and $h$.

Proof. a) The projected solution $\eta(t)$ of the PDE, inserted into (8.51), gives

$$\sum_{i=0}^{k}\alpha_i\,\eta(t_{m+i}) + h\sum_{i=0}^{k}\beta_i\,A\,\eta(t_{m+i}) = h\sum_{i=0}^{k}\beta_i\bigl(g(t_{m+i}, \eta(t_{m+i})) + s(t_{m+i})\bigr) + d_{m+k}$$

where

$$\|d_{m+k}\|_\beta \le C_0\,h^p\int_{t_m}^{t_{m+k}}\|\eta^{(p+1)}(t)\|_\beta\,dt, \qquad m \ge 0. \tag{8.53}$$

The same analysis which was necessary for (8.36) now gives for the error
$\Delta y_m = \eta(t_m) - y_m$ the relation

$$\Delta y_m = h\sum_{j=0}^{m} r_{m-j}(-hA)\,\Delta g_j + h\sum_{j=0}^{m} r_{m-j}(-hA)\,\Delta s_j + \sum_{j=0}^{m} r_{m-j}(-hA)\,d_j. \tag{8.54}$$

As in (8.34) the quantities $\Delta g_j$ and $\Delta s_j$ are defined by

$$\Delta g_{m+k} = \sum_{i=0}^{k}\beta_i\bigl(g(t_{m+i}, \eta(t_{m+i})) - g(t_{m+i}, y_{m+i})\bigr), \qquad \Delta s_{m+k} = \sum_{i=0}^{k}\beta_i\,s(t_{m+i})$$

for $m \ge 0$, and $\Delta g_j = 0$, $\Delta s_j = 0$ for $j < k$. The values $d_0, \ldots, d_{k-1}$ are defined
as usual (see their definition before (8.4')). The following three parts of the proof
treat the three terms in the right-hand side of (8.54) separately.

b) The Lipschitz condition (8.49) can be written as

$$\|A^{-\gamma}\bigl(g(t, y) - g(t, z)\bigr)\|_\beta \le \ell(r)\cdot\|y - z\|_\beta \quad\text{for } \|y\|_\beta + \|z\|_\beta \le r.$$

We put $\varrho = \max_{0\le t\le T}\|\eta(t)\|_\beta$ and assume that for $hm \le T$ the numerical solution
$y_m$ exists and is bounded by $\|y_m\|_\beta \le \varrho + 1$ (this will be verified recursively in part
(f) of the proof) so that

$$\|A^{-\gamma}\Delta g_{m+k}\|_\beta \le \ell(2\varrho + 1)\cdot\sum_{i=0}^{k}|\beta_i|\cdot\|\Delta y_{m+i}\|_\beta. \tag{8.55}$$

Consequently we have to find an estimate for $\|r_{m-j}(-hA)A^\gamma\|_\beta$ (for the matrix
norm corresponding to the vector norm $\|\cdot\|_\beta$; see Sect. I.9). We note that
$\|r_{m-j}(-hA)A^\gamma\|_\beta = \|A^\gamma r_{m-j}(-hA)\|$ and recall that $A^\gamma r_j(-hA)$ is the
coefficient of $\zeta^j$ in the series for

$$A^\gamma r(\zeta, -hA) = A^\gamma\bigl(\delta(\zeta)I + hA\bigr)^{-1}.$$

In order to apply Lemma 7.11 we have to estimate $\Phi(s) = A^\gamma(sI + A)^{-1}$. If $A$
can be transformed to diagonal form with an orthogonal matrix (as is the case for
(8.41)), we have for $|\arg s| \le \pi - \alpha'$ ($0 < \alpha' < \alpha$)

$$\|A^\gamma(sI + A)^{-1}\| \le \sup_{a>0}\frac{a^\gamma}{|s + a|} \le M_1\cdot|s|^{\gamma-1}.$$

For the general case we refer the reader to Henry (1981, pp. 26-28). Application of
Lemma 7.11 (see also Remark 7.12) yields

$$\|r_j(-hA)A^\gamma\|_\beta \le C_1\,h^{-\gamma}(j + 1)^{-\gamma} \quad\text{for } j \ge 0.$$

Together with the Lipschitz condition (8.55) this gives with $L = C_1\cdot\ell(2\varrho + 1)$

$$h\,\Bigl\|\sum_{j=0}^{m} r_{m-j}(-hA)\,\Delta g_j\Bigr\|_\beta \le h^{1-\gamma}L\sum_{j=0}^{m}(m - j + 1)^{-\gamma}\,\|\Delta y_j\|_\beta. \tag{8.56}$$

c) The second term in (8.54) is the coefficient of $\zeta^m$ in

$$h\,r(\zeta, -hA)\,\Delta s(\zeta) = \tilde r(\zeta)\,\Delta\tilde s(\zeta)$$

where we have introduced

$$\tilde r(\zeta) = \bigl(\delta(\zeta)I + hA\bigr)^{-1} hA\,\delta(\zeta)^{-1} = \sum_{j\ge 0}\tilde r_j\,\zeta^j, \qquad \Delta\tilde s(\zeta) = \delta(\zeta)\,A^{-1}\Delta s(\zeta) = \sum_{j\ge 0}\Delta\tilde s_j\,\zeta^j.$$

In order to estimate $\|\tilde r_j\|_\beta$ (matrix norm) we note that $\|\tilde r_j\|_\beta = \|\tilde r_j\|$. In view
of an application of Lemma 7.11 we have to consider $\Phi(s) = (sI + A)^{-1}As^{-1} =
s^{-1}I - (sI + A)^{-1}$ which, because of (8.40), is bounded by $(M + 1)/|s|$.
Lemma 7.11 thus yields $\|\tilde r_j\|_\beta \le C_2$. Further we have

$$\Delta\tilde s(\zeta) = \frac{\delta(\zeta)}{1 - \zeta}\Bigl(A^{-1}\Delta s_k\,\zeta^k + \sum_{j>k}A^{-1}(\Delta s_j - \Delta s_{j-1})\,\zeta^j\Bigr)$$

where the coefficients of $\delta(\zeta)/(1 - \zeta)$ are absolutely summable, because the zeros
of $\sigma(\zeta)$ lie all inside $|\zeta| < 1$. Combining all these estimates we get

$$h\,\Bigl\|\sum_{j=0}^{m} r_{m-j}(-hA)\,\Delta s_j\Bigr\|_\beta \le C_3\Bigl(\|A^{-1}\Delta s_k\|_\beta + \sum_{j=k+1}^{m}\|A^{-1}(\Delta s_j - \Delta s_{j-1})\|_\beta\Bigr) \le C_4\Bigl(\|A^{-1}s(0)\|_\beta + \int_0^{t_m}\|A^{-1}s'(t)\|_\beta\,dt\Bigr). \tag{8.57}$$

d) The last term in (8.54) can be estimated in the same way as the corresponding
term in the proof of Theorem 7.10. We just have to take the norm (8.50) and
get

$$\Bigl\|\sum_{j=0}^{m} r_{m-j}(-hA)\,d_j\Bigr\|_\beta \le C_5\Bigl(\max_{0\le j<k}\|y_j - \eta(t_j)\|_\beta + h^p\int_0^{t_m}\|\eta^{(p+1)}(t)\|_\beta\,dt\Bigr). \tag{8.58}$$

e) Inserting (8.56), (8.57), and (8.58) into (8.54) gives

$$\|\Delta y_m\|_\beta \le h^{1-\gamma}L\sum_{j=0}^{m}(m - j + 1)^{-\gamma}\,\|\Delta y_j\|_\beta + C_6\,\varepsilon_m \tag{8.59}$$

where $C_6 = \max(C_4, C_5)$ and $\varepsilon_m$ denotes the expression in brackets on the
right-hand side of (8.52). For $h \le h_0$ and $h^{1-\gamma}L < 1$ this Gronwall-type inequality can
be solved (Exercise 2) and gives $\|\Delta y_m\|_\beta \le C_7\,\varepsilon_m$, the desired result.

f) We now justify recursively our assumption $\|y_m\|_\beta \le \varrho + 1$ used in (b).
Suppose that $\|y_j\|_\beta \le \varrho + 1$ for $j = 0, 1, \ldots, m - 1$, then it follows from $h^{1-\gamma}L < 1$
and the contraction mapping theorem that a unique solution $y_m$ of (8.54) exists.
This solution verifies $\|y_m\|_\beta \le \|\eta(t_m)\|_\beta + \|\Delta y_m\|_\beta \le \varrho + 1$ if $\varepsilon$ is small enough,
more precisely, if $C_7\,\varepsilon \le 1$. □


Remark. A different approach to convergence results of multistep methods for
nonlinear parabolic equations is given by Le Roux (1980). A corresponding theorem
for Runge-Kutta methods is proved in Lubich & Ostermann (1993).

Exercises 

1. Let $L > 0$ and consider two sequences $\{u_j\}$ and $\{\varepsilon_j\}$ of nonnegative numbers
which satisfy

$$u_m \le hL\sum_{j=0}^{m} u_j + \sum_{j=0}^{m}\varepsilon_j \quad\text{for } m \ge 0.$$

Prove that for $hL \le 1 - C^{-1}$

$$u_m \le C e^{LCmh}\sum_{j=0}^{m}\varepsilon_j.$$

Hint. Show by induction that $v_m \le h\Lambda\sum_{j=0}^{m-1} v_j + M$ implies
$v_m \le M(1 + h\Lambda)^m \le M e^{\Lambda mh}$.
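The bound in the hint is attained when the inequality is replaced by an equality, which makes it easy to inspect numerically. The sketch below builds this extremal sequence; the function name and the parameter values are assumptions made for the illustration.

```python
import math


def gronwall_extremal(h, lam, big_m, n):
    """Sequence with v_m = h*lam*sum_{j<m} v_j + big_m for m = 0, ..., n."""
    v = []
    for _ in range(n + 1):
        v.append(h * lam * sum(v) + big_m)
    return v


h, lam, big_m, n = 0.01, 2.0, 1.0, 50
v = gronwall_extremal(h, lam, big_m, n)
closed_form = [big_m * (1.0 + h * lam) ** m for m in range(n + 1)]
exponential = [big_m * math.exp(lam * m * h) for m in range(n + 1)]
```

Subtracting two consecutive relations gives $v_m = (1 + h\Lambda)v_{m-1}$, so the extremal sequence coincides with $M(1 + h\Lambda)^m$ and stays below $Me^{\Lambda mh}$.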

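2. Consider the inequality (8.59) with $\gamma < 1$, $L > 0$, $\varepsilon_m \ge 0$ and $h > 0$. Under
the assumptions $h \le h_0$ and $h^{1-\gamma}L < 1$ prove that there exists a constant $C$
such that $\|\Delta y_m\|_\beta \le C\varepsilon_m$ for $mh \le T$.

Hint. Move the term $h^{1-\gamma}L\,\|\Delta y_m\|_\beta$ to the left and divide the inequality by
$(1 - h^{1-\gamma}L)$. This yields

$$\|\Delta y_m\|_\beta \le h^{1-\gamma}L\sum_{j=0}^{m-1}(m - j)^{-\gamma}\,\|\Delta y_j\|_\beta + \varepsilon \quad\text{for } m \ge 0$$

(with modified constants $L$ and $\varepsilon$). Show that $\|\Delta y_m\|_\beta \le \varepsilon\,u(mh)$, where $u(x)$ is the solution of

$$u(x) = 1 + L\int_0^x (x - t)^{-\gamma}\,u(t)\,dt. \tag{8.60}$$

Estimate the solution of (8.60) (see Henry 1981, pp. 188-190).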


3. Let $A$ and $D$ be the matrices of (8.41) (suppose $\mu = 1$) and (8.46). Prove that
for all $u \in \mathbb{R}^n$

a) $\|u\|_\infty \le \|A^{1/2}u\|$, b) $\|Du\| \le \|A^{1/2}u\|$,

where $\|u\|_\infty = \max_i |u_i|$ and $\|\cdot\|$ is the norm of (8.42).

Hint. a) Let $u_0 = 0$ and apply the inequality of Cauchy-Bunyakovski-Schwarz
to $u_i = \sum_{j=1}^{i}(u_j - u_{j-1})$. This gives

$$|u_i|^2 \le i\sum_{j=1}^{i}(u_j - u_{j-1})^2 \le \Delta x\cdot u^T A u.$$

b) The inequality $u^T A u \ge (Du)^T Du$ is a consequence of the algebraic identity
($u_0 = u_{n+1} = 0$)

$$4\sum_{i=1}^{n}\bigl(2u_i^2 - u_i u_{i+1} - u_i u_{i-1}\bigr) - \sum_{i=1}^{n}(u_{i+1} - u_{i-1})^2 = \sum_{i=1}^{n}(u_{i+1} - 2u_i + u_{i-1})^2 + 2u_1^2 + 2u_n^2.$$
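Both inequalities of this exercise can be tried out on random vectors before proving them. The sketch below assumes $\mu = 1$, uses $\|A^{1/2}u\|^2 = \Delta x\cdot u^TAu = \frac{1}{\Delta x}\sum_{j=1}^{n+1}(u_j - u_{j-1})^2$ with $u_0 = u_{n+1} = 0$, and evaluates $\|Du\|$ in the scaled norm (8.42); the function names and the sample size are assumptions of the example.

```python
import random


def energy_norm_sq(u, dx):
    """||A^{1/2} u||^2 in the scaled norm, for mu = 1 and zero boundary values."""
    up = [0.0] + list(u) + [0.0]
    return sum((up[j] - up[j - 1]) ** 2 for j in range(1, len(up))) / dx


def central_diff_norm_sq(u, dx):
    """||D u||^2 in the scaled norm, D being the central difference operator."""
    up = [0.0] + list(u) + [0.0]
    return dx * sum(
        ((up[i + 1] - up[i - 1]) / (2.0 * dx)) ** 2 for i in range(1, len(u) + 1)
    )


random.seed(1)
n = 20
dx = 1.0 / (n + 1)
checks = []
for _ in range(100):
    u = [random.uniform(-1.0, 1.0) for _ in range(n)]
    energy = energy_norm_sq(u, dx)
    max_sq = max(abs(x) for x in u) ** 2
    checks.append((max_sq <= energy + 1e-12,
                   central_diff_norm_sq(u, dx) <= energy + 1e-12))
```

Such an experiment only supports the claim on the sampled vectors; the hints above show why both inequalities hold for every $u \in \mathbb{R}^n$.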



V.9 Algebraic Stability of General Linear Methods 

General linear methods were originally introduced as a means of 
unifying and generalizing existing theories for traditional meth¬ 
ods. (J.C. Butcher 1987) 


In Sections IV.12 and V.6 we have studied the nonlinear stability of Runge-Kutta
methods (B-stability) and of one-leg methods (G-stability). It is natural to ask
whether these theories can be combined within the class of general linear methods.
This work was initiated by Burrage & Butcher (1980).

We consider the differential equation $y' = f(x, y)$ where $y$ and $f$ are
complex-valued vectors and we assume the one-sided Lipschitz condition

$$\operatorname{Re}\langle f(x, y) - f(x, z),\, y - z\rangle \le \nu\,\|y - z\|^2. \tag{9.1}$$

General linear methods are defined by (see Example 8.5 of Sect. III.8)

$$u_i^{(n+1)} = \sum_{j=1}^{k} a_{ij}\,u_j^{(n)} + h\sum_{j=1}^{s} b_{ij}\,f\bigl(x_n + c_j h, v_j^{(n)}\bigr), \qquad i = 1, \ldots, k \tag{9.2a}$$

$$v_i^{(n)} = \sum_{j=1}^{k}\tilde a_{ij}\,u_j^{(n)} + h\sum_{j=1}^{s}\tilde b_{ij}\,f\bigl(x_n + c_j h, v_j^{(n)}\bigr), \qquad i = 1, \ldots, s. \tag{9.2b}$$

Here, $u_n = (u_1^{(n)}, \ldots, u_k^{(n)})^T$ contains the necessary information from the
previous step. The internal stages $(v_1^{(n)}, \ldots, v_s^{(n)})$, defined by (9.2b), serve for the
computation of $u_{n+1}$ in (9.2a).

G -Stability 

As in Sect. V.6, we consider inner product norms

$$\|u_n\|_G^2 = \sum_{i=1}^{k}\sum_{j=1}^{k} g_{ij}\,\langle u_i^{(n)}, u_j^{(n)}\rangle \tag{9.3}$$

where $G = (g_{ij})$ is a real, symmetric and positive definite matrix.

Definition 9.1. The general linear method (9.2) is called G-stable, if there
exists a real, symmetric and positive definite matrix $G$, such that for two numerical
solutions $\{u_n\}$ and $\{\hat u_n\}$,

$$\|u_{n+1} - \hat u_{n+1}\|_G \le \|u_n - \hat u_n\|_G \tag{9.4}$$

for all step sizes $h > 0$ and for all differential equations satisfying (9.1) with $\nu = 0$.
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For Runge-Kutta methods (where k — 1 and apart from a scaling factor G = 
(1)) this definition reduces to B -stability as introduced in Definition IV. 12.2. For 
one-leg methods (where 5 = 1 and u n — (y n+A ,_ 1? ..., y n ) T ) it is equivalent to 
Definition 6.3. 

Many methods can be written in different ways as general linear methods and the above definition of G-stability may depend on the particular formulation. For example, the trapezoidal rule

\[ y_{n+1} = y_n + \frac{h}{2} \bigl( f(x_n, y_n) + f(x_{n+1}, y_{n+1}) \bigr) \]

can be considered as a Runge-Kutta method (with $u_n = y_n$). In this case it is not G-stable (because it is not B-stable, see Theorem IV.12.12). However, if we let $u_n = (y_n, h y'_n)$ where $y'_n = f(x_n, y_n)$, then the trapezoidal rule satisfies (9.4) with

\[ G = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1/4 \end{pmatrix}. \tag{9.5} \]

This follows from the fact that whenever $\{y_n\}$ is the solution obtained by the trapezoidal rule, then $z_n = y_n + \frac{h}{2} y'_n$ is a solution of the implicit midpoint rule, which is known to be B-stable (see Example IV.12.3 or Theorem IV.12.9). Therefore

\[ \bigl\| (y_{n+1} + \tfrac{h}{2} y'_{n+1}) - (\hat y_{n+1} + \tfrac{h}{2} \hat y'_{n+1}) \bigr\| \le \bigl\| (y_n + \tfrac{h}{2} y'_n) - (\hat y_n + \tfrac{h}{2} \hat y'_n) \bigr\| \]

which proves the statement. The matrix $G$ in (9.5) is singular and thus not strictly positive definite. Burrage & Butcher (1980), however, admit non-zero non-negative definite matrices $G$ in their definition of G-stability (which they call monotonicity). Therefore the trapezoidal rule is G-stable in their definition.
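The relation between the trapezoidal rule and the implicit midpoint rule can be checked numerically. The following is only a sketch under assumptions that are not in the text: the scalar test problem $f(y) = -y^3$ (which satisfies (9.1) with $\nu = 0$), the step size, the starting values and all function names are illustrative choices.

```python
def f(y):
    # Dissipative scalar right-hand side (assumed test problem):
    # (f(y) - f(z)) * (y - z) <= 0, i.e. (9.1) with nu = 0.
    return -y ** 3

def df(y):
    return -3.0 * y ** 2

def trapezoidal_step(y0, h, iters=50):
    """Solve y1 = y0 + h/2 * (f(y0) + f(y1)) for y1 by Newton's method."""
    y1 = y0
    for _ in range(iters):
        residual = y1 - y0 - 0.5 * h * (f(y0) + f(y1))
        y1 -= residual / (1.0 - 0.5 * h * df(y1))
    return y1

def shifted(y, h):
    """z = y + h/2 * y' with y' = f(y)."""
    return y + 0.5 * h * f(y)

h = 0.5
y0, yhat0 = 1.0, 1.5
y1, yhat1 = trapezoidal_step(y0, h), trapezoidal_step(yhat0, h)
z0, z1 = shifted(y0, h), shifted(y1, h)
zhat0, zhat1 = shifted(yhat0, h), shifted(yhat1, h)

# z_n should obey the implicit midpoint rule z1 = z0 + h f((z0 + z1)/2).
midpoint_residual = z1 - z0 - h * f(0.5 * (z0 + z1))

# For u = (y, h y') the G-norm of (9.5) equals |y + h y'/2| = |z|.
g_norm_before = abs(z0 - zhat0)
g_norm_after = abs(z1 - zhat1)
print(midpoint_residual, g_norm_before, g_norm_after)
```

The printed residual should be at rounding level, and the G-norm of the difference should not increase over the step.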


Algebraic Stability 


In addition to (9.2) we consider a second numerical solution (marked with hats) produced by the same method using different starting values. We denote the differences by

\[ \Delta u_i^{(n)} = u_i^{(n)} - \hat u_i^{(n)}, \qquad \Delta v_i^{(n)} = v_i^{(n)} - \hat v_i^{(n)}, \qquad \Delta u_n = u_n - \hat u_n, \]

\[ \Delta f_i^{(n)} = h \bigl( f(x_n + c_i h, v_i^{(n)}) - f(x_n + c_i h, \hat v_i^{(n)}) \bigr). \]

The following lemma states an identity which will be essential in the study of G-stability.


Lemma 9.2 (Burrage & Butcher 1980). Let $G$ be a real, symmetric matrix and $D = \mathrm{diag}(d_1,\ldots,d_s)$ be a real diagonal matrix. The difference of two solutions of (9.2) then satisfies

\[ \|\Delta u_{n+1}\|_G^2 - \|\Delta u_n\|_G^2 = 2 \sum_{i=1}^{s} d_i\, \mathrm{Re}\, \langle \Delta f_i^{(n)}, \Delta v_i^{(n)} \rangle - \sum_{i,j=1}^{s+k} m_{ij} \langle w_i, w_j \rangle \]

where $(w_1,\ldots,w_{s+k}) = (\Delta u_1^{(n)},\ldots,\Delta u_k^{(n)}, \Delta f_1^{(n)},\ldots,\Delta f_s^{(n)})$ and the matrix $M = (m_{ij})$ is given by

\[ M = \begin{pmatrix} G - A^T G A & \tilde A^T D - A^T G B \\ D \tilde A - B^T G A & D \tilde B + \tilde B^T D - B^T G B \end{pmatrix}. \tag{9.6} \]


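Proof. We consider the identity

\[ \|\Delta u_{n+1}\|_G^2 - \|\Delta u_n\|_G^2 - 2 \sum_{i=1}^{s} d_i\, \mathrm{Re}\, \langle \Delta f_i^{(n)}, \Delta v_i^{(n)} \rangle \]
\[ = \sum_{i,j=1}^{k} g_{ij} \langle \Delta u_i^{(n+1)}, \Delta u_j^{(n+1)} \rangle - \sum_{i,j=1}^{k} g_{ij} \langle \Delta u_i^{(n)}, \Delta u_j^{(n)} \rangle - \sum_{i=1}^{s} d_i \langle \Delta f_i^{(n)}, \Delta v_i^{(n)} \rangle - \sum_{i=1}^{s} d_i \langle \Delta v_i^{(n)}, \Delta f_i^{(n)} \rangle \]

and insert the formulas (9.2). Since the factor $h$ is contained in $\Delta f_i^{(n)}$, this gives

\[ \sum_{i,j=1}^{k} g_{ij} \Bigl\langle \sum_{\ell=1}^{k} a_{i\ell} \Delta u_\ell^{(n)} + \sum_{\ell=1}^{s} b_{i\ell} \Delta f_\ell^{(n)},\ \sum_{\ell=1}^{k} a_{j\ell} \Delta u_\ell^{(n)} + \sum_{\ell=1}^{s} b_{j\ell} \Delta f_\ell^{(n)} \Bigr\rangle - \sum_{i,j=1}^{k} g_{ij} \langle \Delta u_i^{(n)}, \Delta u_j^{(n)} \rangle \]
\[ - \sum_{i=1}^{s} d_i \Bigl\langle \Delta f_i^{(n)},\ \sum_{\ell=1}^{k} \tilde a_{i\ell} \Delta u_\ell^{(n)} + \sum_{\ell=1}^{s} \tilde b_{i\ell} \Delta f_\ell^{(n)} \Bigr\rangle - \sum_{i=1}^{s} d_i \Bigl\langle \sum_{\ell=1}^{k} \tilde a_{i\ell} \Delta u_\ell^{(n)} + \sum_{\ell=1}^{s} \tilde b_{i\ell} \Delta f_\ell^{(n)},\ \Delta f_i^{(n)} \Bigr\rangle . \]

Multiplying out and collecting suitable terms proves the statement. □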
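Because the identity of Lemma 9.2 is purely algebraic, it can be sanity-checked with random real data. The sketch below assumes scalar components (so that $\langle a, b \rangle = ab$) and uses arbitrary random matrices. The names `At`, `Bt` stand for $\tilde A$, $\tilde B$ and are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
k, s = 3, 2

A = rng.standard_normal((k, k))
B = rng.standard_normal((k, s))
At = rng.standard_normal((s, k))      # plays the role of \tilde A
Bt = rng.standard_normal((s, s))      # plays the role of \tilde B
G = rng.standard_normal((k, k))
G = G + G.T                           # real symmetric, not necessarily definite
D = np.diag(rng.standard_normal(s))   # real diagonal

# Matrix M of (9.6).
M = np.block([
    [G - A.T @ G @ A,      At.T @ D - A.T @ G @ B],
    [D @ At - B.T @ G @ A, D @ Bt + Bt.T @ D - B.T @ G @ B],
])

du = rng.standard_normal(k)           # Delta u_n
dfv = rng.standard_normal(s)          # Delta f (already contains the factor h)
dv = At @ du + Bt @ dfv               # difference of (9.2b)
du_next = A @ du + B @ dfv            # difference of (9.2a)
w = np.concatenate([du, dfv])

lhs = du_next @ G @ du_next - du @ G @ du
rhs = 2.0 * np.sum(np.diag(D) * dfv * dv) - w @ M @ w
print(lhs, rhs)
```

Both printed numbers should agree up to rounding, whatever seed is used.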


Definition 9.3. The general linear method (9.2) is called algebraically stable, if there exist a real, symmetric and positive definite matrix $G$ and a real non-negative definite diagonal matrix $D$, such that the matrix $M$ of (9.6) is non-negative definite.


An immediate consequence of our assumption (9.1) with $\nu = 0$ and of Lemma 9.2 is the following result.

Theorem 9.4. Algebraic stability implies G-stability. □ 
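As an illustration of Definition 9.3, a Runge-Kutta method can be written as a general linear method with $k = 1$, $A = (1)$, $\tilde A = (1,\ldots,1)^T$, $B = b^T$ and $\tilde B$ equal to the Runge-Kutta matrix. The sketch below assumes the two-stage Radau IIA coefficients and the choice $G = (1)$, $D = \mathrm{diag}(b)$. The helper name `stability_matrix` is hypothetical.

```python
import numpy as np

def stability_matrix(A, B, At, Bt, G, D):
    """Assemble the matrix M of (9.6); At, Bt stand for \\tilde A, \\tilde B."""
    return np.block([
        [G - A.T @ G @ A,      At.T @ D - A.T @ G @ B],
        [D @ At - B.T @ G @ A, D @ Bt + Bt.T @ D - B.T @ G @ B],
    ])

# Two-stage Radau IIA method (order 3), used here as an assumed example.
a = np.array([[5 / 12, -1 / 12],
              [3 / 4,   1 / 4]])
b = np.array([3 / 4, 1 / 4])

A = np.array([[1.0]])        # k = 1
At = np.ones((2, 1))
B = b.reshape(1, 2)
Bt = a
G = np.array([[1.0]])
D = np.diag(b)

M = stability_matrix(A, B, At, Bt, G, D)
eigs = np.linalg.eigvalsh(M)
print(eigs)
```

For this choice the first row and column of $M$ vanish and the remaining block is the classical matrix $(b_i a_{ij} + b_j a_{ji} - b_i b_j)$, so the computed eigenvalues should be non-negative up to rounding.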


For a given method it may be difficult to find matrices $D$ and $G$ such that $M$ of (9.6) is non-negative definite. The following lemma shows some useful relations, which hold if the method is assumed to be preconsistent, i.e., if there exists a vector $\xi_0$ such that

\[ A \xi_0 = \xi_0, \qquad \tilde A \xi_0 = \mathbb{1} \tag{9.7} \]

(cf. Eq. (8.25) of Sect. III.8).

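Lemma 9.5. If a general linear method is preconsistent and algebraically stable, then the matrices $D$ and $G$ satisfy

i) $(d_1,\ldots,d_s)^T = D \mathbb{1} = B^T G \xi_0$,

ii) $(I - A^T) G \xi_0 = 0$, i.e., $G \xi_0$ is a left-eigenvector of $A$ corresponding to the eigenvalue 1.

Proof. i) Let $\eta \in \mathbb{R}^s$ and $\varepsilon \in \mathbb{R}$ be arbitrary. The non-negativity of $M$, given by (9.6), implies

\[ (\xi_0^T, \varepsilon \eta^T)\, M \begin{pmatrix} \xi_0 \\ \varepsilon \eta \end{pmatrix} \ge 0 \]

so that

\[ \xi_0^T (G - A^T G A) \xi_0 + 2 \varepsilon \eta^T (D \tilde A - B^T G A) \xi_0 + \varepsilon^2 \eta^T (D \tilde B + \tilde B^T D - B^T G B) \eta \ge 0. \]

Since the $\varepsilon$-independent term vanishes (due to $A \xi_0 = \xi_0$), the coefficient of $\varepsilon$ must be zero and since this holds for all $\eta$, the result follows.

ii) A similar argument applied to

\[ (\xi_0 + \varepsilon \xi_1)^T (G - A^T G A) (\xi_0 + \varepsilon \xi_1) \ge 0 \qquad \text{for all } \xi_1 \in \mathbb{R}^k,\ \varepsilon \in \mathbb{R} \]

implies the second statement. □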


AN-Stability and Equivalence Results

It is interesting to study in which situation algebraic stability is also necessary for G-stability. For this we consider the differential equation

\[ y' = \lambda(x) y \qquad \text{with} \qquad \mathrm{Re}\, \lambda(x) \le 0. \]

If we apply the general linear method (9.2) to this problem, we obtain

\[ u_{n+1} = S(Z) u_n \tag{9.8} \]

where $Z = \mathrm{diag}(z_1,\ldots,z_s)$, $z_j = h \lambda(x_n + c_j h)$ and

\[ S(Z) = A + B Z (I - \tilde B Z)^{-1} \tilde A. \tag{9.9} \]

In the sequel we assume that the abscissae $c_j$ are related to the other coefficients of the method by (see also Remark III.8.17)

\[ (c_1,\ldots,c_s)^T = c = \tilde A \xi_1 + \tilde B \mathbb{1}, \tag{9.10} \]

where $\xi_1 \in \mathbb{R}^k$ is the second coefficient vector of the exact value function $z(x,h) = y(x) \xi_0 + h y'(x) \xi_1 + O(h^2)$. This means that the internal stages approximate the exact solution as $v_j^{(n)} = y(x_n + c_j h) + O(h^2)$.

Definition 9.6. A general linear method is called AN-stable, if there exists a real, symmetric and positive definite matrix $G$ such that

\[ \|S(Z) u\|_G \le \|u\|_G \]

for all $Z = \mathrm{diag}(z_1,\ldots,z_s)$ satisfying $\mathrm{Re}\, z_j \le 0$ ($j = 1,\ldots,s$) and $z_j = z_k$ whenever $c_j = c_k$.
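The matrix $S(Z)$ of (9.9) is easy to evaluate, and Definition 9.6 can be probed by random sampling. The sketch below again assumes the two-stage Radau IIA method written as a general linear method with $k = 1$, for which the condition reduces to $|S(Z)| \le 1$. The sampling distribution and all names are illustrative assumptions, and a sampling test can only fail to find a counterexample, not prove AN-stability.

```python
import numpy as np

def S(Z, A, B, At, Bt):
    """S(Z) = A + B Z (I - Bt Z)^{-1} At, cf. (9.9)."""
    s = Bt.shape[0]
    return A + B @ Z @ np.linalg.solve(np.eye(s) - Bt @ Z, At)

# Two-stage Radau IIA as a general linear method with k = 1 (assumed example).
a = np.array([[5 / 12, -1 / 12], [3 / 4, 1 / 4]])
b = np.array([3 / 4, 1 / 4])
A, B, At, Bt = np.array([[1.0]]), b.reshape(1, 2), np.ones((2, 1)), a

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(2000):
    # Independent z_1, z_2 with Re z_j <= 0 (the abscissae 1/3 and 1 are distinct).
    z = -rng.exponential(5.0, size=2) + 1j * rng.normal(0.0, 5.0, size=2)
    worst = max(worst, abs(S(np.diag(z), A, B, At, Bt)[0, 0]))

# With Z = z I the scalar S(Z) is the stability function R(z) of the method.
R_minus_one = S(-1.0 * np.eye(2), A, B, At, Bt)[0, 0]
print(worst, R_minus_one)
```

The value at $Z = -I$ can be compared with the known stability function $R(z) = (1 + z/3)/(1 - 2z/3 + z^2/6)$ of this method.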


Other possible definitions of AN-stability are given in Butcher (1987). For example, if the condition $\|S(Z) u\|_G \le \|u\|_G$ is replaced by the power-boundedness of the matrix $S(Z)$, the method is called weakly AN-stable. This definition, however, does not allow the values $z_j = h \lambda(x_n + c_j h)$ to change at each step. Another modification is to consider arbitrary norms (instead of inner product norms only) in the definition of AN-stability. Butcher (1987) has shown that this does not lead to a larger class of AN-stable methods, but makes the analysis much more difficult.

We are now interested in the relations between the various stability definitions: the implications

\[ \text{algebraically stable} \implies \text{G-stable} \implies \text{AN-stable} \implies \text{A-stable} \]

are either trivial or follow from Theorem 9.4. We also know that A-stability does not, in general, imply AN-stability (see e.g., Theorem IV.12.12). The following result shows that the other two implications are (nearly always) reversible.

Theorem 9.7 (Butcher 1987). For preconsistent and non-confluent general linear methods (i.e., methods with distinct $c_j$) we have

\[ \text{algebraically stable} \iff \text{G-stable} \iff \text{AN-stable}. \]


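Proof. It is sufficient to prove that AN-stability implies algebraic stability. For this we take the matrix $G$, whose existence is known by the definition of AN-stability, and show that the matrices $D$ and $M$, given by Lemma 9.5i and (9.6), are non-negative definite.

In order to prove $d_j \ge 0$ we put $z_j = -\varepsilon$ ($\varepsilon > 0$) and $z_k = 0$ for $k \ne j$. We further let $\Delta u_n = \xi_0$ (the preconsistency vector of (9.7)) and $\Delta f_\ell^{(n)} = z_\ell \Delta v_\ell^{(n)}$, so that $\Delta u_{n+1} = S(Z) \xi_0$ and $\Delta v_\ell^{(n)} = 1 + O(\varepsilon)$. Using

\[ M \begin{pmatrix} \xi_0 \\ 0 \end{pmatrix} = 0, \tag{9.11} \]

which is a consequence of Lemma 9.5, the identity of Lemma 9.2 yields

\[ \|S(Z) \xi_0\|_G^2 - \|\xi_0\|_G^2 = -2 \varepsilon d_j + O(\varepsilon^2). \]

Since the left-hand side of this equation is non-positive by AN-stability, we obtain $d_j \ge 0$.

We next put $z_\ell = i \varepsilon \eta_\ell$ where $\eta = (\eta_1,\ldots,\eta_s)^T \in \mathbb{R}^s$ is arbitrary and $\varepsilon$ is a small real parameter. We further put $\Delta u_n = \xi_0 + i \varepsilon \mu$ with $\mu \in \mathbb{R}^k$ and $\Delta f_\ell^{(n)} = z_\ell \Delta v_\ell^{(n)}$. This again implies $\Delta v_\ell^{(n)} = 1 + O(\varepsilon)$. The identity of Lemma 9.2 together with (9.11) gives

\[ \|S(Z) \Delta u_n\|_G^2 - \|\Delta u_n\|_G^2 = - \bigl( \xi_0^T - i \varepsilon \mu^T,\ -i \varepsilon \eta^T + O(\varepsilon^2) \bigr)\, M \begin{pmatrix} \xi_0 + i \varepsilon \mu \\ i \varepsilon \eta + O(\varepsilon^2) \end{pmatrix} = - \varepsilon^2 \begin{pmatrix} \mu \\ \eta \end{pmatrix}^T M \begin{pmatrix} \mu \\ \eta \end{pmatrix} + O(\varepsilon^3). \]

Since this relation holds for all $\mu$ and $\eta$, the matrix $M$ has to be non-negative definite. □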


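Example 9.8. Let us investigate the G-stability of multistep collocation methods as introduced in Sect. V.3. We consider here the case $k = 2$ and $s = 2$, and fix one collocation point at $c_2 = 1$. The method is then given by

\[ \begin{pmatrix} y_{n+1} \\ y_n \end{pmatrix} = \underbrace{\begin{pmatrix} 1 - \varphi(1) & \varphi(1) \\ 1 & 0 \end{pmatrix}}_{A} \begin{pmatrix} y_n \\ y_{n-1} \end{pmatrix} + h \underbrace{\begin{pmatrix} \psi_1(1) & \psi_2(1) \\ 0 & 0 \end{pmatrix}}_{B} \begin{pmatrix} f(x_n + c_1 h, v_1) \\ f(x_n + h, v_2) \end{pmatrix} \]

\[ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 - \varphi(c_1) & \varphi(c_1) \\ 1 - \varphi(1) & \varphi(1) \end{pmatrix}}_{\tilde A} \begin{pmatrix} y_n \\ y_{n-1} \end{pmatrix} + h \underbrace{\begin{pmatrix} \psi_1(c_1) & \psi_2(c_1) \\ \psi_1(1) & \psi_2(1) \end{pmatrix}}_{\tilde B} \begin{pmatrix} f(x_n + c_1 h, v_1) \\ f(x_n + h, v_2) \end{pmatrix} \tag{9.12} \]

where

\[ \varphi(x) = - \frac{6}{5 + 9 c_1} \Bigl( \frac{x^3}{3} - (1 + c_1) \frac{x^2}{2} + x c_1 \Bigr), \]

\[ \psi_1(x) = \frac{x (x + 1)}{(1 - c_1)(5 + 9 c_1)}\, (5 - 3x), \]

\[ \psi_2(x) = \frac{x (x + 1)}{(1 - c_1)(5 + 9 c_1)} \bigl( (2 c_1 + 1) x - c_1 (3 c_1 + 2) \bigr). \]

We know from Exercise V.3.7 that the method is A-stable if and only if $c_1 \ge (\sqrt{17} - 1)/8$. For the study of its G-stability we assume that after an appropriate scaling of $G$, $g_{11} = 1$. By Lemma 9.5ii the matrix $G$ must then be of the form (recall that $\xi_0 = (1, 1)^T$)

\[ G = \begin{pmatrix} 1 & \gamma - 1 \\ \gamma - 1 & (\varphi(1) - 1) \gamma + 1 \end{pmatrix}. \tag{9.13} \]

A necessary condition for $G$ to be positive definite is that $\det G > 0$. For $c_1 > 0$ this is equivalent to

\[ 0 < \gamma < \frac{6 (1 + c_1)}{5 + 9 c_1}. \tag{9.14} \]

Next we use Lemma 9.5i which implies that

\[ d_1 = \gamma \psi_1(1), \qquad d_2 = \gamma \psi_2(1). \tag{9.15} \]

Inserting (9.13) and (9.15) into the matrix $M$ of (9.6) yields for its lower right block

\[ \begin{pmatrix} \psi_1(1) & 0 \\ 0 & \psi_2(1) \end{pmatrix} \begin{pmatrix} 2 \gamma \chi_1 - 1 & (\chi_2 + 1) \gamma - 1 \\ (\chi_2 + 1) \gamma - 1 & 2 \gamma - 1 \end{pmatrix} \begin{pmatrix} \psi_1(1) & 0 \\ 0 & \psi_2(1) \end{pmatrix} \tag{9.16} \]

where

\[ \chi_1 = \frac{\psi_1(c_1)}{\psi_1(1)} = \frac{c_1}{4} (c_1 + 1)(5 - 3 c_1), \qquad \chi_2 = \frac{\psi_2(c_1)}{\psi_2(1)} = \frac{c_1^2 (c_1 + 1)^2}{2 (3 c_1^2 - 1)}. \]

A direct computation (see Exercise 2) shows that this 2×2 matrix cannot be non-negative definite for $c_1 \ge (\sqrt{17} - 1)/8$ and $\gamma$ satisfying (9.14). Consequently the considered methods are never G-stable.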
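The conclusion of Example 9.8 can be explored numerically by assembling the lower right block $D \tilde B + \tilde B^T D - B^T G B$ of (9.6) directly from the coefficient functions $\varphi$, $\psi_1$, $\psi_2$. The sketch below is an illustration for the single value $c_1 = 0.5$ (an assumed sample satisfying the A-stability condition). The function name and the grid in $\gamma$ are hypothetical choices.

```python
import numpy as np

def lower_right_block(c1, gamma):
    """Lower right block of M in (9.6) for the method (9.12)."""
    den = (1.0 - c1) * (5.0 + 9.0 * c1)

    def phi(x):
        return -6.0 / (5.0 + 9.0 * c1) * (x**3 / 3 - (1 + c1) * x**2 / 2 + c1 * x)

    def psi1(x):
        return x * (x + 1) * (5 - 3 * x) / den

    def psi2(x):
        return x * (x + 1) * ((2 * c1 + 1) * x - c1 * (3 * c1 + 2)) / den

    B = np.array([[psi1(1.0), psi2(1.0)], [0.0, 0.0]])
    Bt = np.array([[psi1(c1), psi2(c1)], [psi1(1.0), psi2(1.0)]])
    G = np.array([[1.0, gamma - 1.0],
                  [gamma - 1.0, (phi(1.0) - 1.0) * gamma + 1.0]])   # (9.13)
    D = np.diag(gamma * np.array([psi1(1.0), psi2(1.0)]))           # (9.15)
    return D @ Bt + Bt.T @ D - B.T @ G @ B

c1 = 0.5
gamma_max = 6.0 * (1.0 + c1) / (5.0 + 9.0 * c1)                     # bound (9.14)
gammas = np.linspace(0.0, gamma_max, 201)[1:-1]
min_eigs = [np.linalg.eigvalsh(lower_right_block(c1, g)).min() for g in gammas]
print(max(min_eigs))
```

A negative printed value means that, on this grid, the block always has a negative eigenvalue, in agreement with the statement of the example for this particular $c_1$.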

In the next subsections we shall show how high-order algebraically stable general linear methods can be constructed.


Multistep Runge-Kutta Methods 


An interesting extension of multistep collocation methods are the so-called multistep Runge-Kutta methods. They are defined by the formulas

\[ y_{n+1} = \sum_{j=1}^{k} \alpha_j y_{n+1-j} + h \sum_{j=1}^{s} b_j f(x_n + c_j h, v_j^{(n)}), \]
\[ v_i^{(n)} = \sum_{j=1}^{k} \tilde a_{ij} y_{n+1-j} + h \sum_{j=1}^{s} \tilde b_{ij} f(x_n + c_j h, v_j^{(n)}). \tag{9.17} \]

They obviously form a subclass of the general linear methods (9.2). This is seen by putting $u_n = (y_n, y_{n-1}, \ldots, y_{n-k+1})^T$ so that the exact value function is

\[ z(x,h) = \bigl( y(x), y(x - h), \ldots, y(x - (k-1)h) \bigr)^T. \]

Further, the matrices $A$ and $B$ have the special form

\[ A = \begin{pmatrix} \alpha_1 & \cdots & \cdots & \alpha_k \\ 1 & & & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} b_1 & \cdots & b_s \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}. \tag{9.18} \]

The order conditions for such methods were derived in Theorem III.8.14. It follows from this theorem that the method (9.17) is of order $p$, iff

\[ 1 = \sum_{j=1}^{k} \alpha_j (1 - j)^{\varrho(t)} + \sum_{j=1}^{s} b_j v'_j(t) \qquad \text{for } t \in T,\ \varrho(t) \le p. \tag{9.19} \]

The values $v'_j(t)$ are given recursively by

\[ v_i(t) = \sum_{j=1}^{k} \tilde a_{ij} (1 - j)^{\varrho(t)} + \sum_{j=1}^{s} \tilde b_{ij} v'_j(t). \tag{9.20} \]

Recall from Corollary II.12.7 that

\[ v'(\emptyset) = 0, \qquad v'(\tau) = 1. \tag{9.21} \]

The order conditions (9.19) constitute a system of nonlinear equations in the coefficients of the method. Without any preparation, solving them may be difficult. We therefore introduce additional assumptions which simplify the construction of multistep Runge-Kutta methods.


Simplifying Assumptions 

The conditions $B(p)$, $C(\eta)$ and $D(\xi)$ of Sect. IV.5 were useful for the construction of high-order implicit Runge-Kutta methods. Burrage (1988) showed how these simplifying assumptions can be extended to general linear methods. In the sequel we specialize his approach to multistep Runge-Kutta methods. We consider the assumptions

\[ B(p): \quad q \sum_{j=1}^{s} b_j c_j^{q-1} + \sum_{j=1}^{k} \alpha_j (1 - j)^q = 1, \qquad q = 1,\ldots,p \]

\[ C(\eta): \quad q \sum_{j=1}^{s} \tilde b_{ij} c_j^{q-1} + \sum_{j=1}^{k} \tilde a_{ij} (1 - j)^q = c_i^q, \qquad q = 1,\ldots,\eta, \text{ all } i \]

\[ D_A(\xi): \quad q \sum_{i=1}^{s} b_i c_i^{q-1} \tilde a_{ij} = \alpha_j \bigl( 1 - (1 - j)^q \bigr), \qquad q = 1,\ldots,\xi, \text{ all } j \]

\[ D_B(\xi): \quad q \sum_{i=1}^{s} b_i c_i^{q-1} \tilde b_{ij} = b_j (1 - c_j^q), \qquad q = 1,\ldots,\xi, \text{ all } j \]

Condition $B(p)$ is equivalent to the order conditions (9.19) for bushy trees. Condition $C(\eta)$ means that $v_j(t)$, defined by (9.20), satisfies

\[ v_j(t) = c_j^{\varrho(t)} \qquad \text{for } \varrho(t) \le \eta. \tag{9.22} \]

We remark that the preconsistency condition (9.7) with $\xi_0 = (1,\ldots,1)^T$,

\[ \sum_{j=1}^{k} \alpha_j = 1, \qquad \sum_{j=1}^{k} \tilde a_{ij} = 1 \quad \text{for } i = 1,\ldots,s, \tag{9.23} \]

is obtained by putting $q = 0$ in $B(p)$ and $C(\eta)$. The condition $D(\xi)$ for Runge-Kutta methods splits into $D_A(\xi)$ and $D_B(\xi)$. However, under certain assumptions one of these conditions is automatically satisfied.

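Lemma 9.9. Suppose that the coefficients $c_1,\ldots,c_s$ of a multistep Runge-Kutta method are distinct and $b_i \ne 0$. Then,

i) $B(\xi + k - 1)$, $C(k - 1)$, $D_B(\xi)$ $\implies$ $D_A(\xi)$,

ii) $B(\xi + s)$, $C(s)$, $D_A(\xi)$ $\implies$ $D_B(\xi)$,

iii) $B(\eta + s)$, $D_A(s)$, $D_B(s)$ $\implies$ $C(\eta)$.

Proof. The first two implications are a consequence of the identity

\[ \sum_{j=1}^{k} \Bigl( q \sum_{i=1}^{s} b_i c_i^{q-1} \tilde a_{ij} - \alpha_j \bigl( 1 - (1 - j)^q \bigr) \Bigr) (1 - j)^{\ell} = - \ell \sum_{j=1}^{s} \Bigl( q \sum_{i=1}^{s} b_i c_i^{q-1} \tilde b_{ij} - b_j (1 - c_j^q) \Bigr) c_j^{\ell - 1} \]

which holds under the assumptions $C(\ell)$ and $B(q + \ell)$. The last implication can be proved similarly. □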


The fundamental theorem, which generalizes Theorem IV.5.1, is

Theorem 9.10 (Burrage 1988). If the coefficients of a multistep Runge-Kutta method (9.17) satisfy the simplifying assumptions $B(p)$, $C(\eta)$, $D_A(\xi)$, $D_B(\xi)$ with $p \le \eta + \xi + 1$ and $p \le 2\eta + 2$, then the method is of order $p$.

Proof. The conditions $C(\eta)$ and $D_A(\xi)$, $D_B(\xi)$ allow the reduction of order conditions of trees as sketched in Fig. 7.1 and Fig. 7.2 of Sect. II.7, respectively. Under the restrictions $p \le \eta + \xi + 1$ and $p \le 2\eta + 2$ all order conditions reduce to those for bushy trees which are satisfied by $B(p)$. □


Remember that we are searching for high-order algebraically stable methods. Due to the Daniel-Moore conjecture (Theorem V.4.4) the order is restricted by $p \le 2s$. It is therefore natural to look for methods satisfying $B(2s)$, $C(s)$ and $D_A(s)$, $D_B(s)$. They will be of order $2s$ by Theorem 9.10 and are an extension of the Runge-Kutta methods based on Gauss quadrature. Let us begin by studying the condition $B(2s)$.


Quadrature Formulas 

Because of (9.23) condition $B(p)$ of the preceding subsection is equivalent to

\[ \sum_{j=1}^{s} b_j f(c_j) = \sum_{j=1}^{k} \alpha_j \int_{1-j}^{1} f(x)\, dx, \qquad \deg f \le p - 1, \tag{9.24} \]

where $f$ stands for a polynomial of degree at most $p - 1$. For the construction of such quadrature formulas it is useful to consider the bilinear form

\[ \langle f, g \rangle = \sum_{j=1}^{k} \alpha_j \int_{1-j}^{1} f(x) g(x)\, dx = \int_{1-k}^{1} \omega(x) f(x) g(x)\, dx, \tag{9.25} \]

where $\omega(x)$ is the step-function sketched in Fig. 9.1. Under the assumption

\[ \alpha_k \ge 0, \quad \alpha_k + \alpha_{k-1} \ge 0, \quad \ldots, \quad \alpha_k + \ldots + \alpha_2 \ge 0, \quad \alpha_k + \ldots + \alpha_1 = 1, \tag{9.26} \]

$\omega(x)$ is non-negative and (9.25) becomes an inner product on the space of real polynomials. We call the quadrature formula (9.24) interpolatory if $B(s)$ holds. This implies that

\[ b_i = \int_{1-k}^{1} \omega(x) \ell_i(x)\, dx, \qquad \ell_i(x) = \prod_{j \ne i} \frac{x - c_j}{c_i - c_j}. \tag{9.27} \]

Fig. 9.1. Weight function for the inner product (9.25)


The following results on Gaussian quadrature and orthogonal polynomials are classical.

Lemma 9.11. Let $M(x) = (x - c_1) \cdot \ldots \cdot (x - c_s)$. An interpolatory quadrature formula satisfies $B(s + m)$ if and only if

\[ \sum_{j=1}^{k} \alpha_j \int_{1-j}^{1} M(x) x^{q-1}\, dx = 0 \qquad \text{for } q = 1,\ldots,m. \]
□


Let $p_s(x)$ be the polynomial of degree $s$ which is orthogonal with respect to (9.25) to all polynomials of degree $s - 1$. Lemma 9.11 then states that a quadrature formula (9.24) is of order $2s$ iff $M(x)$ is a scalar multiple of $p_s(x)$. The polynomials $p_s(x)$ which depend on $\alpha_1,\ldots,\alpha_k$ via the bilinear form (9.25) can be computed from a standard three term recursion

\[ p_0(x) = 1, \qquad p_1(x) = x - \beta_0, \qquad p_{s+1}(x) = (x - \beta_s) p_s(x) - \gamma_s p_{s-1}(x) \tag{9.28} \]

where

\[ \beta_s = \frac{\langle x p_s, p_s \rangle}{\langle p_s, p_s \rangle}, \qquad \gamma_s = \frac{\langle p_s, p_s \rangle}{\langle p_{s-1}, p_{s-1} \rangle}. \tag{9.29} \]

Obviously this is only possible if $\langle p_j, p_j \rangle \ne 0$ for $j = 1,\ldots,s$. This is certainly the case under the assumption (9.26).


Lemma 9.12. If $\alpha_1,\ldots,\alpha_k$ satisfy (9.26) then all zeros of $p_s(x)$ are real, simple and lie in the open interval $(1 - k, 1)$. □


For the construction of algebraically stable methods, quadrature formulas with positive weights will be of particular interest. Sufficient conditions for this property are given in the following theorem.

Theorem 9.13. If the quadrature formula (9.24) is of order $p \ge 2s - 1$ and if $\alpha_1,\ldots,\alpha_k$ satisfy (9.26), then

\[ b_i > 0 \qquad \text{for } i = 1,\ldots,s. \]
□


Algebraically Stable Methods of Order 2s 

... the analysis of the algebraic stability properties of multivalue 
methods ... is not as difficult as was generally thought ... 

(Burrage 1987) 

Following Burrage (1987) we consider the following class of multistep Runge-Kutta methods.

Definition 9.14. Let $\alpha_1,\ldots,\alpha_k$ with $\sum_j \alpha_j = 1$, $\alpha_k \ne 0$ be given such that the zeros $c_1,\ldots,c_s$ of $p_s(x)$ (Formula (9.28)) are real and simple. We then denote by $E(\alpha_1,\ldots,\alpha_k)$ the multistep Runge-Kutta method (9.17) whose coefficients are given by

\[ b_i = \sum_{j=1}^{k} \alpha_j \int_{1-j}^{1} \ell_i(x)\, dx, \qquad i = 1,\ldots,s, \]

\[ \tilde a_{ij} = \frac{\alpha_j}{b_i} \int_{1-j}^{1} \ell_i(x)\, dx, \qquad i = 1,\ldots,s;\ j = 1,\ldots,k, \]

\[ \tilde b_{ij} = \frac{b_j}{b_i} \int_{c_j}^{1} \ell_i(x)\, dx, \qquad i = 1,\ldots,s;\ j = 1,\ldots,s, \]

where $\ell_i(x)$ is the function of (9.27).

The definitions of $c_i$ and $b_i$ imply $B(2s)$ by Lemma 9.11. The formulas for $\tilde a_{ij}$ and $\tilde b_{ij}$ are equivalent to $D_A(s)$ and $D_B(s)$, respectively. Lemma 9.9iii thus implies $C(s)$ and Theorem 9.10 finally proves that the considered methods are of order $2s$. The following theorem gives sufficient conditions for the algebraic stability of these methods.
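The construction of Definition 9.14 can be traced step by step in code: the recursion (9.28), (9.29) for $p_s$, its zeros $c_i$, the Lagrange polynomials (9.27), and then the coefficient formulas. The sketch below is a hedged illustration. The helper names (`integral`, `inner`, `build_E`) and the sample weights $\alpha = (0.7, 0.3)$ are assumptions, and `At`, `Bt` stand for $\tilde A$, $\tilde B$.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def integral(poly, lo, hi=1.0):
    F = poly.integ()
    return F(hi) - F(lo)

def inner(f, g, alpha):
    """Bilinear form (9.25): sum_j alpha_j * int_{1-j}^{1} f g dx."""
    return sum(a * integral(f * g, 1.0 - j) for j, a in enumerate(alpha, start=1))

def build_E(alpha, s):
    """Coefficients c, b, At, Bt of E(alpha_1, ..., alpha_k) per Definition 9.14."""
    x, one = P([0.0, 1.0]), P([1.0])
    p_prev, p = one, x - inner(x, one, alpha) / inner(one, one, alpha)
    for _ in range(s - 1):                       # recursion (9.28), (9.29)
        beta = inner(x * p, p, alpha) / inner(p, p, alpha)
        gamma = inner(p, p, alpha) / inner(p_prev, p_prev, alpha)
        p_prev, p = p, (x - beta) * p - p_prev * gamma
    c = np.sort(p.roots().real)
    ell = []
    for i in range(s):                           # Lagrange polynomials (9.27)
        li = one
        for j in range(s):
            if j != i:
                li = li * (x - c[j]) / (c[i] - c[j])
        ell.append(li)
    k = len(alpha)
    b = np.array([inner(ell[i], one, alpha) for i in range(s)])
    At = np.array([[alpha[j] * integral(ell[i], -float(j)) / b[i]
                    for j in range(k)] for i in range(s)])
    Bt = np.array([[b[j] * integral(ell[i], c[j]) / b[i]
                    for j in range(s)] for i in range(s)])
    return c, b, At, Bt

alpha, s = np.array([0.7, 0.3]), 2
c, b, At, Bt = build_E(alpha, s)
nodes = 1.0 - np.arange(1, len(alpha) + 1)       # the points 1 - j: 0, -1, ...

res_B = max(abs(q * b @ c**(q - 1) + alpha @ nodes**q - 1.0)
            for q in range(1, 2 * s + 1))
res_C = max(np.max(np.abs(q * Bt @ c**(q - 1) + At @ nodes**q - c**q))
            for q in range(1, s + 1))
res_DA = max(np.max(np.abs(q * (b * c**(q - 1)) @ At - alpha * (1.0 - nodes**q)))
             for q in range(1, s + 1))
res_DB = max(np.max(np.abs(q * (b * c**(q - 1)) @ Bt - b * (1.0 - c**q)))
             for q in range(1, s + 1))
print(c, b, res_B, res_C, res_DA, res_DB)
```

The four residuals measure how well $B(2s)$, $C(s)$, $D_A(s)$ and $D_B(s)$ hold for the constructed coefficients. They should all be at rounding level, and the weights $b_i$ should be positive as stated in Theorem 9.13.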

Theorem 9.15 (Burrage 1987). If $\alpha_j \ge 0$ for $j = 1,\ldots,k$, then the method $E(\alpha_1,\ldots,\alpha_k)$ is G-stable with

\[ G = \mathrm{diag}(1,\ \alpha_2 + \ldots + \alpha_k,\ \ldots,\ \alpha_{k-1} + \alpha_k,\ \alpha_k). \tag{9.30} \]

Proof. For multistep Runge-Kutta methods the preconsistency vector is given by $\xi_0 = (1, 1, \ldots, 1)^T$. With the matrix $G$ of (9.30) it therefore follows from Lemma 9.5 that

\[ d_i = b_i \qquad \text{for } i = 1,\ldots,s. \tag{9.31} \]

By Theorem 9.13 this implies $d_i > 0$ so that the first condition for algebraic stability is satisfied. In order to verify that the matrix $M$ of (9.6) is non-negative definite, we transform it by a suitable matrix. We put

\[ V = (c_i^{j-1})_{i,j=1,\ldots,s} \qquad \text{and} \qquad \alpha = (\alpha_1,\ldots,\alpha_k)^T. \tag{9.32} \]

A straightforward calculation using the simplifying assumptions $D_A(s)$, $D_B(s)$ and $B(2s)$ shows that

\[ \begin{pmatrix} I & 0 \\ 0 & V^T \end{pmatrix} M \begin{pmatrix} I & 0 \\ 0 & V \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & W^T \end{pmatrix} \hat M \begin{pmatrix} I & 0 \\ 0 & W \end{pmatrix} \tag{9.33} \]

where

\[ W = \Bigl( \tfrac{1}{j} \bigl( 1 - (1 - i)^j \bigr) \Bigr)_{i=1,\ldots,k;\ j=1,\ldots,s} \]

and the $2k \times 2k$ matrix $\hat M$ is given by

\[ \hat M = \begin{pmatrix} Z & Z \\ Z & Z \end{pmatrix}, \qquad Z = \mathrm{diag}(\alpha_1,\ldots,\alpha_k) - \alpha \alpha^T. \tag{9.34} \]

Since $\alpha_j \ge 0$ and $\sum_j \alpha_j = 1$ it follows from the Cauchy-Schwarz inequality that

\[ \Bigl( \sum_{j=1}^{k} \alpha_j x_j \Bigr)^2 \le \sum_{j=1}^{k} \alpha_j x_j^2 \qquad \text{for all real } x_1,\ldots,x_k. \]

Therefore the matrix $Z$, and hence also $M$, are non-negative definite matrices. This completes the proof of the theorem. □
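One ingredient of this proof is simple to verify in isolation: for the matrix $A$ of (9.18) and the diagonal $G$ of (9.30), the upper left block $G - A^T G A$ of (9.6) equals $Z = \mathrm{diag}(\alpha) - \alpha \alpha^T$. The sketch below checks only this block, for randomly chosen non-negative weights. The choice $k = 4$ and the helper name `companion_A` are illustrative assumptions.

```python
import numpy as np

def companion_A(alpha):
    """Matrix A of (9.18): first row alpha, ones on the subdiagonal."""
    k = len(alpha)
    A = np.zeros((k, k))
    A[0, :] = alpha
    A[1:, :-1] = np.eye(k - 1)
    return A

rng = np.random.default_rng(2)
alpha = rng.random(4)
alpha /= alpha.sum()                       # alpha_j >= 0 with sum 1

A = companion_A(alpha)
# G = diag(1, alpha_2 + ... + alpha_k, ..., alpha_k), cf. (9.30).
G = np.diag(np.cumsum(alpha[::-1])[::-1])
Z = np.diag(alpha) - np.outer(alpha, alpha)

block = G - A.T @ G @ A
z_eigs = np.linalg.eigvalsh(Z)
print(np.max(np.abs(block - Z)), z_eigs.min())
```

The first printed number should be at rounding level, and the smallest eigenvalue of $Z$ should be zero up to rounding (the vector of ones lies in the kernel of $Z$).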


One can ask what are the advantages of the methods $E(\alpha_1,\ldots,\alpha_k)$ with $k > 1$ over the $s$-stage Gauss Runge-Kutta methods of order $2s$. All these methods have the same order and are algebraically stable for $\alpha_j \ge 0$.

• The Gauss methods have a stability function whose value at infinity satisfies $|R(\infty)| = 1$. In contrast, the new methods allow the spectral radius $\varrho(S(\infty))$ to be smaller than 1, which improves stability at infinity. For example, numerical investigations of the case $s = 2$, $k = 2$ show that $\varrho(S(\infty))$ has the minimal value $\sqrt{2} - 1 \approx 0.41421$ for $\alpha_1 = 12\sqrt{2} - 16$ and $\alpha_2 = 1 - \alpha_1$ (see Exercise 7). There are some indications that L-stable methods do not exist: if we could find methods with an internal stage, say $v_s^{(n)}$, equal to $y_{n+1}$, then the method would be L-stable. Unfortunately, this would imply $c_s = 1$, which is in contradiction to Lemma 9.12 and to $\alpha_j \ge 0$.

• The eigenvalues of the Runge-Kutta matrix of the Gauss methods are complex (with the exception of one real eigenvalue, if $s$ is odd). Can we hope that, for a suitable choice of $\alpha_j \ge 0$, all eigenvalues of $\tilde B$ become real? Numerical computations for $s = 2$ and $k = 2$ indicate that this is not possible.
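The numerical investigation mentioned in the first item can be reproduced in a few lines for $s = k = 2$. The sketch below computes $p_2$ from the moments of the bilinear form (9.25) and then evaluates $S(\infty) = A - B \tilde B^{-1} \tilde A$ through the closed form stated in Exercise 6. The function name and the comparison values $0.9$ and $0.99$ are illustrative assumptions.

```python
import numpy as np

def rho_S_inf(alpha1):
    """Spectral radius of S(infinity) for E(alpha1, 1 - alpha1), s = k = 2."""
    alpha = np.array([alpha1, 1.0 - alpha1])
    # Moments m_n = <x^n, 1> = sum_j alpha_j * int_{1-j}^{1} x^n dx.
    m = [sum(a * (1.0 - (1.0 - j) ** (n + 1)) / (n + 1)
             for j, a in enumerate(alpha, start=1)) for n in range(4)]
    # p_2(x) = x^2 + a x + b, orthogonal to 1 and x with respect to (9.25).
    a, b = np.linalg.solve([[m[1], m[0]], [m[2], m[1]]], [-m[2], -m[3]])
    c = np.sort(np.roots([1.0, a, b]).real)
    # Closed form of S(infinity) from Exercise 6 (uses D_A(s) and D_B(s)).
    A = np.array([[alpha[0], alpha[1]], [1.0, 0.0]])
    E = np.array([[1.0, 1.0], [0.0, 0.0]])
    C = np.array([[1 - c[0], 1 - c[1]], [1 - c[0] ** 2, 1 - c[1] ** 2]])
    W = np.array([[alpha[0], 2 * alpha[1]], [alpha[0], 0.0]])
    S = A - E @ np.linalg.solve(C, W)
    return max(abs(np.linalg.eigvals(S)))

alpha_opt = 12.0 * np.sqrt(2.0) - 16.0
rho_opt = rho_S_inf(alpha_opt)
rho_left, rho_right = rho_S_inf(0.9), rho_S_inf(0.99)
print(rho_opt, rho_left, rho_right)
```

At $\alpha_1 = 12\sqrt{2} - 16$ the computed value should match $\sqrt{2} - 1$, and the two neighbouring samples should give larger spectral radii.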


B-Convergence


Many results of Sections IV.14 and IV.15 have a straightforward extension to general linear methods. The following theorem corresponds to Theorems IV.14.2, IV.14.3, and IV.14.4 and is proved in the same way:

Theorem 9.16. Let $f$ be continuously differentiable and satisfy (9.1). If the matrix $\tilde B$ of method (9.2) is invertible and if

\[ h \nu < \alpha_0(\tilde B^{-1}), \]

then the nonlinear system (9.2b) has a unique solution. □


The next results give estimates of the local and global errors. We formulate these results only for multistep Runge-Kutta methods, because in this case the definitions of $C(\eta)$ and $B(p)$ are already available. In analogy to Runge-Kutta methods we say that method (9.17) has stage order $q$, if $C(q)$ and $B(q)$ are satisfied. Recall that for the definition of the local error

\[ \delta_h(x) = y_1 - y(x + h) \]

one assumes that $y_i = y(x + ih)$ for $i = 1 - k, \ldots, 0$ lie on the exact solution.

Theorem 9.17. Suppose that the differential equation satisfies (9.1). If the matrix $\tilde B$ is invertible, if $\alpha_0(\tilde B^{-1}) > 0$ and if the stage order is $q$, then the local error of method (9.17) satisfies

\[ \|\delta_h(x)\| \le C h^{q+1} \max_{\xi \in [x - (k-1)h,\, x + h]} \|y^{(q+1)}(\xi)\| \qquad \text{for } h\nu \le \alpha < \alpha_0(\tilde B^{-1}) \]

where $C$ depends only on the coefficients of the method and on $\alpha$. □


This result, which corresponds to Proposition IV.15.1, is of particular interest for multistep collocation methods, for which the stage order $q = s + k - 1$ is maximal. The global error allows the following estimate, which extends Theorem IV.15.3.

Theorem 9.18. Suppose, in addition to the assumptions of Theorem 9.17, that the method (9.17) is algebraically stable.

a) If $\nu > 0$ then the global error satisfies for $h\nu \le \alpha < \alpha_0(\tilde B^{-1})$

\[ \|y_n - y(x_n)\| \le h^q\, \frac{e^{C_1 \nu (x_n - x_0)} - 1}{C_1 \nu}\, C_2 \max_{x \in [x_0, x_n]} \|y^{(q+1)}(x)\|. \]

b) If $\nu \le 0$ then (for all $h > 0$)

\[ \|y_n - y(x_n)\| \le h^q (x_n - x_0)\, C_2 \max_{x \in [x_0, x_n]} \|y^{(q+1)}(x)\|. \]

The constants $C_1$ and $C_2$ depend only on the coefficients of the method and (for case a) on $\alpha$. □


In contrast to the results of Sect. IV.15 the above theorem holds only for a constant step size implementation.

Exercises


1. Show that for Runge-Kutta methods, where $A = (1)$, $\tilde A = \mathbb{1}$, both definitions of algebraic stability (IV.12.5 and V.9.3) are the same.

2. Prove in detail the statement of Example 9.8, that the 2-step 2-stage collocation methods with $c_2 = 1$ (and $c_1 \ne 1$) are not G-stable.

Hint. The non-negativity of the matrix (9.16) implies $\gamma \ge 1/2$ and by considering its determinant,

\[ \gamma \bigl( 4 \chi_1 - (1 + \chi_2)^2 \bigr) \ge 2 (\chi_1 - \chi_2). \]

This inequality contradicts (9.14).

3. If a multistep Runge-Kutta method with distinct $c_i$ and $c_i > 0$ satisfies the assumptions $B(s + k + \xi)$ and $C(s + k - 1)$, then it also satisfies $D_B(\xi)$.

Hint. Show that

\[ \sum_{j=1}^{k} \Bigl( q \sum_{i=1}^{s} b_i c_i^{q-1} \tilde a_{ij} - \alpha_j \bigl( 1 - (1 - j)^q \bigr) \Bigr) \bigl( r(1) - r(1 - j) \bigr) = 0 \]

for all polynomials $r(x)$ of degree $\le s + k - 1$ which satisfy $r(c_1) = \ldots = r(c_s) = 0$. For given $j$, construct such a polynomial which also satisfies

\[ r(1) - r(1 - j) = 1, \qquad r(1) - r(1 - i) = 0 \quad \text{for } i = 1,\ldots,k \text{ and } i \ne j. \]

4. Disprove the conjecture of Burrage (1988) that for every $k$ and $s$ there exist zero-stable multistep Runge-Kutta methods of order $2s + k - 1$.

Hint. Consider the case $s = 1$ so that these methods are equivalent to one-leg methods and consult a result of Dahlquist (1983).

5. (Burrage 1988). Show that there exists a zero-stable multistep Runge-Kutta method with $s = 2$ and $k = 2$ which is of order 5.

Result. $c_{1,2} = (\sqrt{7} \pm \sqrt{2})/5$.

6. (Stability at infinity). If a multistep Runge-Kutta method satisfies $D_A(s)$ and $D_B(s)$ then we have, e.g., for $s = 2$ and $k = 2$,

\[ S(\infty) = \begin{pmatrix} \alpha_1 & \alpha_2 \\ 1 & 0 \end{pmatrix} - \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 - c_1 & 1 - c_2 \\ 1 - c_1^2 & 1 - c_2^2 \end{pmatrix}^{-1} \begin{pmatrix} \alpha_1 & 2\alpha_2 \\ \alpha_1 & 0 \end{pmatrix}. \]

Formulate this result also for general $s$ and $k$.

7. Verify that for the method $E(\alpha_1, \alpha_2)$ with $0 < \alpha_1 < 1$, $\alpha_2 = 1 - \alpha_1$, the spectral radius $\varrho(S(\infty))$ is minimal for $\alpha_1 = 12\sqrt{2} - 16$.



Chapter VI. Singular Perturbation Problems and Index 1 Problems


Singular perturbation problems (SPP) form a special class of problems containing a parameter $\varepsilon$. When this parameter is small, the corresponding differential equation is stiff; when $\varepsilon$ tends to zero, the differential equation becomes differential algebraic. This chapter investigates the numerical solution of such singular perturbation problems. This allows us to understand many phenomena observed for very stiff problems. Much insight is obtained by studying the limit case $\varepsilon = 0$ (“the reduced system” or “problem of index 1”) which is usually much easier to analyze.

We start by considering the limit case $\varepsilon = 0$. Two numerical approaches, the $\varepsilon$-embedding method and the state space form method, are investigated in Sect. VI.1. We then analyze multistep methods in Sect. VI.2, Runge-Kutta methods in Sect. VI.3, Rosenbrock methods in Sect. VI.4 and extrapolation methods in Sect. VI.5. Convergence is studied for singular perturbation problems and for semi-explicit differential-algebraic systems of “index 1”.


VI.1 Solving Index 1 Problems


Singular perturbation problems (SPP) have several origins in applied mathematics. One comes from fluid dynamics and results in linear boundary value problems containing a small parameter $\varepsilon$ (the coefficient of viscosity) such that for $\varepsilon \to 0$ the differential equation loses the highest derivative (see Exercise 1 below). Others originate in the study of nonlinear oscillations with large parameters (van der Pol 1926, Dorodnicyn 1947) or in the study of chemical kinetics with slow and fast reactions (see e.g., Example (IV.1.4)).


Asymptotic Solution of van der Pol’s Equation 

The classical paper of Dorodnicyn (1947) studied the van der Pol Equation (IV.1.5') with large $\mu$, i.e., with small $\varepsilon$. The investigation becomes a little easier if we use Liénard's coordinates (see Exercise I.16.8). In Eq. (IV.1.5'), written here as

\[ \varepsilon z'' + (z^2 - 1) z' + z = 0, \tag{1.1} \]

we insert the identity

\[ \varepsilon z'' + (z^2 - 1) z' = \frac{d}{dx} \Bigl( \underbrace{\varepsilon z' + \bigl( \tfrac{z^3}{3} - z \bigr)}_{:=\, y} \Bigr) \]

so that (1.1) becomes

\[ y' = -z =: f(y,z), \qquad \varepsilon z' = y - \bigl( \tfrac{z^3}{3} - z \bigr) =: g(y,z). \tag{1.2} \]

Fig. 1.1 shows solutions of Eq. (1.2) with $\varepsilon = 0.03$ in the $(y,z)$-plane. One observes rapid movements towards the manifold $M$ defined by $y = z^3/3 - z$, close to which the solution becomes smooth. In order to approximate the solution for very small $\varepsilon$, we set $\varepsilon = 0$ in (1.2) and obtain the so-called reduced system

\[ y' = -z = f(y,z), \qquad 0 = y - \bigl( \tfrac{z^3}{3} - z \bigr) = g(y,z). \tag{1.2'} \]

Fig. 1.1. Solutions of SPP (1.2)        Fig. 1.2. Reduced problem (1.2')


While (1.2) has no analytic solution, (1.2') can easily be solved to give

\[ y' = -z = (z^2 - 1) z' \qquad \text{or} \qquad \ln |z| - \frac{z^2}{2} = x + C. \tag{1.3} \]

Equation (1.2') is called a differential algebraic equation (DAE), since it combines a differential equation (first line) with an algebraic equation (second line). Such a problem only makes sense if the initial values are consistent, i.e., lie on the manifold $M$. The points of $M$ with coordinates $y = \pm 2/3$, $z = \mp 1$ are of special interest (Fig. 1.2): at these points the partial derivative $g_z = \partial g / \partial z$ vanishes and the defining manifold is no longer “transversal” to the direction of the fast movement. Here the solutions of (1.2') cease to exist, while the solutions of the full problem (1.2) for $\varepsilon \to 0$ jump with “infinite” speed to the opposite manifold. For $-1 < z < 1$ the manifold $M$ is unstable for the solution of (1.2) (here $g_z > 0$), otherwise $M$ is stable ($g_z < 0$).

We demonstrate the power of the reduced equation by answering the question: what is the period $T$ of the limit cycle solution of van der Pol's equation for $\varepsilon \to 0$? Fig. 1.2 shows that the asymptotic value of $T$ is just twice the time which $z(x)$ of (1.3) needs to advance from $z = -2$ to $z = -1$, i.e.,

\[ T = 3 - 2 \ln 2. \tag{1.4} \]

This is the first term of Dorodnicyn's asymptotic formula. We also see that $z(x)$ reaches its largest values (i.e., crosses the Poincaré cut $z' = 0$, see Fig. I.16.2) at $z = \pm 2$. We thus have the curious result that the limit cycle of van der Pol's equation (1.1) has the same asymptotic initial value $z = 2$ and $z' = 0$ for $\varepsilon \to 0$ and for $\varepsilon \to \infty$ (see Eq. (I.16.10)).
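The value (1.4) can be confirmed by numerical quadrature. From (1.3) one has $(z^2 - 1) z' = -z$ on the manifold, hence $dx = (1/z - z)\, dz$, and the time spent between two values of $z$ is the integral of $1/z - z$. The sketch below uses a composite Simpson rule. The function name and the number of subintervals are illustrative choices.

```python
import math

def slow_time(z_start, z_end, n=2000):
    """Composite Simpson approximation of int (1/z - z) dz from z_start to z_end.

    By (1.3), this is the time the reduced solution needs to move along
    the manifold from z_start to z_end (n must be even).
    """
    h = (z_end - z_start) / n
    total = 0.0
    for i in range(n + 1):
        z = z_start + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * (1.0 / z - z)
    return total * h / 3.0

T = 2.0 * slow_time(-2.0, -1.0)
print(T, 3.0 - 2.0 * math.log(2.0))
```

Both printed numbers should agree to many digits, since the integrand is smooth on $[-2, -1]$.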


The ε-Embedding Method for Problems of Index 1

We now want to study the behaviour of the numerical solution for $\varepsilon \to 0$. This will give us insight into many phenomena encountered for very stiff equations and also suggest advantageous numerical procedures for stiff and differential-algebraic equations. Let an arbitrary singular perturbation problem be given,

\[ y' = f(y,z) \tag{1.5a} \]
\[ \varepsilon z' = g(y,z), \tag{1.5b} \]

where $y$ and $z$ are vectors; suppose that $f$ and $g$ are sufficiently often differentiable vector functions of the same dimensions as $y$ and $z$, respectively. The corresponding reduced equation is the DAE

\[ y' = f(y,z) \tag{1.6a} \]
\[ 0 = g(y,z), \tag{1.6b} \]

whose initial values are consistent if $0 = g(y_0, z_0)$. A general assumption of the present chapter will be that the Jacobian

\[ g_z(y,z) \ \text{is invertible} \tag{1.7} \]

in a neighbourhood of the solution of (1.6). Equation (1.6b) then possesses a locally unique solution $z = G(y)$ (“Implicit Function Theorem”) which inserted into (1.6a) gives

\[ y' = f(y, G(y)), \tag{1.8} \]

the so-called “state space form”, an ordinary differential system. Under the assumption (1.7), Eq. (1.6) is said to be a differential-algebraic equation of index 1.

An interesting approach for solving (1.6) is to apply some numerical method to the SPP (1.5) and to put $\varepsilon = 0$ in the resulting formulas. Let us illustrate this approach for Runge-Kutta methods. Applied to the system (1.5) we obtain

\[ Y_{ni} = y_n + h \sum_{j=1}^{s} a_{ij} f(Y_{nj}, Z_{nj}) \tag{1.9a} \]
\[ \varepsilon Z_{ni} = \varepsilon z_n + h \sum_{j=1}^{s} a_{ij} g(Y_{nj}, Z_{nj}) \tag{1.9b} \]
\[ y_{n+1} = y_n + h \sum_{i=1}^{s} b_i f(Y_{ni}, Z_{ni}) \tag{1.9c} \]
\[ \varepsilon z_{n+1} = \varepsilon z_n + h \sum_{i=1}^{s} b_i g(Y_{ni}, Z_{ni}). \tag{1.9d} \]

We now suppose that the RK matrix $(a_{ij})$ is invertible and obtain from (1.9b)

\[ h\, g(Y_{ni}, Z_{ni}) = \varepsilon \sum_{j=1}^{s} \omega_{ij} (Z_{nj} - z_n), \tag{1.10} \]

where the $\omega_{ij}$ are the elements of the inverse of $(a_{ij})$. Inserting this into (1.9d) makes the definition of $z_{n+1}$ independent of $\varepsilon$. We thus put without more ado $\varepsilon = 0$ and obtain

\[ Y_{ni} = y_n + h \sum_{j=1}^{s} a_{ij} f(Y_{nj}, Z_{nj}) \tag{1.11a} \]
\[ 0 = g(Y_{ni}, Z_{ni}) \tag{1.11b} \]
\[ y_{n+1} = y_n + h \sum_{i=1}^{s} b_i f(Y_{ni}, Z_{ni}) \tag{1.11c} \]
\[ z_{n+1} = \Bigl( 1 - \sum_{i,j=1}^{s} b_i \omega_{ij} \Bigr) z_n + \sum_{i,j=1}^{s} b_i \omega_{ij} Z_{nj}. \tag{1.11d} \]

Here

\[ 1 - \sum_{i,j=1}^{s} b_i \omega_{ij} = R(\infty) \tag{1.11e} \]

(see Eq. (IV.3.15)), where $R(z)$ is the stability function of the method.
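For the simplest case, the implicit Euler method ($s = 1$, $a_{11} = b_1 = 1$, so that $\omega_{11} = 1$ and $R(\infty) = 0$), the formulas (1.11) applied to the reduced van der Pol problem (1.2') can be sketched as follows. The step size, the interval, the Newton iteration and all names are illustrative assumptions, and the invariant (1.3) serves only as a rough accuracy check for a first-order method.

```python
import math

def g(y, z):
    # Algebraic constraint of (1.2'): 0 = y - (z^3/3 - z).
    return y - (z ** 3 / 3.0 - z)

def implicit_euler_dae(y, z, h, steps):
    """Formulas (1.11) for s = 1, a_11 = b_1 = 1, with f(y, z) = -z."""
    for _ in range(steps):
        Z = z                                   # Newton start: previous value
        for _ in range(20):
            # (1.11a) gives Y = y - h Z; insert it into (1.11b): g(Y, Z) = 0.
            F = (y - h * Z) - (Z ** 3 / 3.0 - Z)
            dF = -h - Z ** 2 + 1.0
            Z -= F / dF
        # (1.11c): y_{n+1} = y_n + h f(Y, Z); (1.11d) with R(inf) = 0: z_{n+1} = Z.
        y, z = y - h * Z, Z
    return y, z

z0 = -2.0
y0 = z0 ** 3 / 3.0 - z0                         # consistent initial value on M
h, steps = 1.0e-3, 500                          # integrate up to x = 0.5
y_end, z_end = implicit_euler_dae(y0, z0, h, steps)

C = math.log(2.0) - 2.0                         # constant in (1.3) for z(0) = -2
invariant_error = math.log(abs(z_end)) - z_end ** 2 / 2.0 - (h * steps + C)
print(y_end, z_end, g(y_end, z_end), invariant_error)
```

Since $R(\infty) = 0$ here, the numerical solution stays on the manifold $g(y,z) = 0$ up to the tolerance of the Newton iteration, while the error in the invariant of (1.3) should be small and of first order in $h$.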


State Space Form Method 

The numerical solution $(y_{n+1}, z_{n+1})$ of the above approach will usually not lie on the manifold $g(y,z) = 0$. However, this can easily be repaired by replacing (1.11d) by the condition

\[ 0 = g(y_{n+1}, z_{n+1}). \tag{1.12} \]

Then, we do not only have $Z_{nj} = G(Y_{nj})$ (see (1.11b)), but also $z_{n+1} = G(y_{n+1})$. In this case the method (1.11a-c), (1.12) is identical to the solution of the state space form (1.8) with the same Runge-Kutta method. This will be called the state space form method. The whole situation is summarized in the following diagram:

    SPP (1.5)   --- ε → 0 --->   DAE (1.6)   --- z = G(y) --->   ODE (1.8)
        |                            |                               |
        | RK                         | ε-embedding method            | RK
        v                            v                               v
    Sol. (1.9)  --- ε → 0 --->   Sol. (1.11)                     Sol. (1.12)

    (state space form method: DAE (1.6), via ODE (1.8), to Sol. (1.12))

Of special importance here are stiffly accurate methods, i.e., methods which satisfy

\[ a_{si} = b_i \qquad \text{for } i = 1,\ldots,s. \tag{1.13} \]

This means that $y_{n+1} = Y_{ns}$, $z_{n+1} = Z_{ns}$ and (1.12) is satisfied anyway. Hence for stiffly accurate methods the $\varepsilon$-embedding method and the state space form method are identical. For this reason, Griepentrog & März (1986) denote such methods IRK(DAE).

Both approaches have their own merits. Theoretical results for the $\varepsilon$-embedding method yield insight into the method when applied to singular perturbation problems. Moreover, this approach can easily be extended to more general situations, where the algebraic relation is not explicitly separated from the differential equation (see below). The state space form method, on the other hand, has the advantage that it is not restricted to implicit methods. Applying an explicit Runge-Kutta method or a multistep method to Eq. (1.8) is certainly a method of choice for semi-explicit index 1 equations. No new theory is necessary in this case.


A Transistor Amplifier 

... to draw attention to a curious fact, namely the extraordinarily large number of famous mathematicians who come from Königsberg ...: Kant 1724, Richelot 1808, Hesse 1811, Kirchhoff 1824, Carl Neumann 1832, Clebsch 1833, Hilbert 1862.

(F. Klein, Entw. der Math., p. 159)

Very often, differential-algebraic problems arising in practice are not at once in the
semi-explicit form (1.6), but rather in the form $Mu' = \varphi(u)$ where $M$ is a constant
singular matrix.

As an example we compute the amplifier of Fig. 1.3, where $U_e(t)$ is the entry
voltage, $U_b = 6$ the operating voltage, $U_i(t)$ ($i = 1,2,3,4,5$) the voltages at the
nodes 1, 2, 3, 4, 5, and $U_5(t)$ the output voltage. The current through a resistor
satisfies $I = U/R$ (Ohm 1827), the current through a capacitor $I = C \cdot dU/dt$,
where $R$ and $C$ are constants and $U$ the voltage. The transistor acts as amplifier
in that the current from node 4 to node 3 is 99 times larger than that from node
2 to node 3 and depends on the voltage difference $U_3 - U_2$ in a nonlinear way.
Kirchhoff's law (a Königsberg discovery) says that the sum of currents entering a
node vanishes. This law applied to the 5 nodes of Fig. 1.3 leads to the following
equations:
Fig. 1.3. A transistor amplifier


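$$\begin{aligned}
\text{node 1:}\quad & \frac{U_e(t)}{R_0} - \frac{U_1}{R_0} + C_1(U_2' - U_1') = 0 \\
\text{node 2:}\quad & \frac{U_b}{R_2} - U_2\Bigl(\frac{1}{R_1} + \frac{1}{R_2}\Bigr) + C_1(U_1' - U_2') - 0.01\, f(U_2 - U_3) = 0 \\
\text{node 3:}\quad & f(U_2 - U_3) - \frac{U_3}{R_3} - C_2 U_3' = 0 \\
\text{node 4:}\quad & \frac{U_b}{R_4} - \frac{U_4}{R_4} + C_3(U_5' - U_4') - 0.99\, f(U_2 - U_3) = 0 \\
\text{node 5:}\quad & -\frac{U_5}{R_5} + C_3(U_4' - U_5') = 0.
\end{aligned} \tag{1.14}$$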

As constants we adopt the values reported (for a similar problem) by Rentrop,
Roche & Steinebach (1989)

$$f(U) = 10^{-6}\Bigl(\exp\Bigl(\frac{U}{0.026}\Bigr) - 1\Bigr)$$
$$R_0 = 1000, \qquad R_1 = \ldots = R_5 = 9000$$
$$C_k = k \cdot 10^{-6}, \qquad k = 1,2,3,$$

and the initial signal is chosen as

$$U_e(t) = 0.4 \cdot \sin(200\pi t). \tag{1.15}$$

Equations (1.14) are of the form $Mu' = \varphi(u)$ where

$$M = \begin{pmatrix}
-C_1 & C_1 & & & \\
C_1 & -C_1 & & & \\
 & & -C_2 & & \\
 & & & -C_3 & C_3 \\
 & & & C_3 & -C_3
\end{pmatrix}$$

is obviously a singular matrix of rank 3. The sum of the first two and of the last
two equations leads directly to two algebraic equations. Introducing e.g.,

$$U_1 - U_2 = y_1, \quad U_3 = y_2, \quad U_4 - U_5 = y_3, \quad U_1 = z_1, \quad U_4 = z_2,$$

transforms equations (1.14) to the form (1.6). Consistent initial values must thus
satisfy $\varphi_1(u) + \varphi_2(u) = 0$ and $\varphi_4(u) + \varphi_5(u) = 0$. If we put $U_2(0) = U_3(0)$, we
have $f(U_2(0) - U_3(0)) = 0$. Since $U_e(0) = 0$, we then easily find consistent initial
values, e.g., as

$$U_1(0) = 0, \quad U_2(0) = U_3(0) = \frac{U_b R_1}{R_1 + R_2}, \quad U_4(0) = U_b, \quad U_5(0) = 0. \tag{1.16}$$
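
The consistency conditions can be checked numerically. The sketch below is an illustration only: it encodes the derivative-free parts of the node equations (1.14) as we read them, with the constants quoted above, and verifies that the two sums in which the capacitor terms cancel (nodes 1+2 and nodes 4+5) vanish for the initial values (1.16). All function names are hypothetical.

```python
import math

U_B = 6.0                      # operating voltage U_b
R0 = 1000.0
R = [9000.0] * 5               # R_1, ..., R_5


def f(u):
    """Transistor characteristic f(U) = 1e-6 * (exp(U / 0.026) - 1)."""
    return 1e-6 * (math.exp(u / 0.026) - 1.0)


def U_e(t):
    """Entry voltage (1.15)."""
    return 0.4 * math.sin(200.0 * math.pi * t)


def static_currents(t, U):
    """Derivative-free parts of the five node equations (1.14)."""
    U1, U2, U3, U4, U5 = U
    R1, R2, R3, R4, R5 = R
    fd = f(U2 - U3)
    return [
        (U_e(t) - U1) / R0,
        U_B / R2 - U2 * (1.0 / R1 + 1.0 / R2) - 0.01 * fd,
        fd - U3 / R3,
        U_B / R4 - U4 / R4 - 0.99 * fd,
        -U5 / R5,
    ]


def algebraic_residuals(t, U):
    """Sums of nodes 1+2 and 4+5, in which the capacitor currents cancel."""
    s = static_currents(t, U)
    return s[0] + s[1], s[3] + s[4]


def initial_values():
    """Consistent initial values (1.16)."""
    U23 = U_B * R[0] / (R[0] + R[1])
    return [0.0, U23, U23, U_B, 0.0]


if __name__ == "__main__":
    print(initial_values(), algebraic_residuals(0.0, initial_values()))
```

Since only the two algebraic combinations are tested, the sign convention chosen for $\varphi$ does not matter for this check.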


Problems of the Form $Mu' = \varphi(u)$

Numerical methods for problems of the form

$$Mu' = \varphi(u), \tag{1.17}$$

where $M$ is a constant matrix, can be derived as follows: we assume that $M$
is regular, apply an ODE method to $u' = M^{-1}\varphi(u)$ and multiply the resulting
formulas by $M$. For Runge-Kutta methods we obtain in this way

$$M(U_{ni} - u_n) = h \sum_{j=1}^{s} a_{ij}\, \varphi(U_{nj}) \tag{1.18a}$$

$$u_{n+1} = \Bigl(1 - \sum_{i,j=1}^{s} b_i \omega_{ij}\Bigr) u_n + \sum_{i,j=1}^{s} b_i \omega_{ij}\, U_{nj} \tag{1.18b}$$

where again $(\omega_{ij})$ is the inverse of $(a_{ij})$. The second formula was obtained from

$$M(u_{n+1} - u_n) = h \sum_{i=1}^{s} b_i\, \varphi(U_{ni}) \tag{1.18c}$$

in exactly the same way as above (see (1.10)).

Formulas (1.18) also make sense formally when $M$ is a singular matrix. In this
case, problem (1.17) is mathematically equivalent to a semi-explicit system (1.6)
and method (1.18) corresponds to method (1.11). This can be seen as follows: we
decompose the matrix $M$ (e.g., by Gaussian elimination with total pivoting) as

$$M = S \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} T, \tag{1.19}$$

where $S$ and $T$ are invertible matrices and the dimension of $I$ represents the rank
of $M$. Inserting this into (1.17), multiplying by $S^{-1}$, and using the transformed
variables

$$Tu = \begin{pmatrix} y \\ z \end{pmatrix} \tag{1.20}$$

gives

$$\begin{pmatrix} y' \\ 0 \end{pmatrix} = S^{-1} \varphi\Bigl(T^{-1} \begin{pmatrix} y \\ z \end{pmatrix}\Bigr), \tag{1.21}$$

a problem of type (1.6). An initial value $u_0$ is consistent if $\varphi(u_0)$ lies in the range
of the matrix $M$.

Similarly, if (1.19) is inserted into (1.18), and the variables

$$TU_{nj} = \begin{pmatrix} Y_{nj} \\ Z_{nj} \end{pmatrix}, \qquad Tu_n = \begin{pmatrix} y_n \\ z_n \end{pmatrix} \tag{1.22}$$

are introduced, Eq. (1.18b) (for $z_{n+1}$) and Eq. (1.18c) (for $y_{n+1}$) lead precisely
to equations (1.11). This means that the diagram

    Problem (1.17)  --- Transf. (1.20) --->  Problem (1.6)
         |                                       |
         | Meth. (1.18)                          | Meth. (1.11)          (1.23)
         v                                       v
       {u_n}        --- Transf. (1.22) --->  {y_n}, {z_n}

commutes. An important consequence of this commutativity is that all results for
semi-explicit systems (1.6) and the $\varepsilon$-embedding method (1.11) (existence of a
numerical solution, convergence, asymptotic expansions, ...) also apply to implicit
problems (1.17) with singular $M$ and method (1.18).
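
As a small illustration of method (1.18) with a singular matrix, the following sketch applies the implicit Euler method ($s = 1$, $a_{11} = b_1 = 1$, so that (1.18a) reads $M(U - u_n) = h\varphi(U)$ and (1.18b) gives $u_{n+1} = U$) to a hypothetical linear problem with the rank 1 matrix $M$ below. The matrices $M$, $B$, the vector $c$ and all function names are assumptions made for this example only. No transformation to semi-explicit form is carried out, yet the computed value satisfies the consistency condition $\varphi(u) \in \operatorname{range}(M)$.

```python
def solve2(a, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det
    x1 = (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det
    return [x0, x1]


M = [[1.0, -1.0], [-1.0, 1.0]]   # singular, rank 1 (hypothetical example)
B = [[-1.0, 0.0], [0.0, -2.0]]   # phi(u) = B u + c
c = [1.0, 0.0]


def phi(u):
    return [B[i][0] * u[0] + B[i][1] * u[1] + c[i] for i in range(2)]


def implicit_euler_step(u, h):
    """Method (1.18) with s = 1, a11 = b1 = 1: solve (M - h B) U = M u + h c."""
    lhs = [[M[i][j] - h * B[i][j] for j in range(2)] for i in range(2)]
    rhs = [M[i][0] * u[0] + M[i][1] * u[1] + h * c[i] for i in range(2)]
    return solve2(lhs, rhs)


def consistency_defect(u):
    """phi(u) lies in range(M) = span{(1, -1)} exactly when phi_1 + phi_2 = 0."""
    p = phi(u)
    return p[0] + p[1]


if __name__ == "__main__":
    u = [0.0, 0.5]               # consistent: phi(u) = (1, -1)
    for _ in range(5):
        u = implicit_euler_step(u, 0.1)
        print(u, consistency_defect(u))
```

The linear system is solvable because $M - hB$ is regular for $h > 0$ in this example, although $M$ itself is not.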

All codes, such as RADAU5, which have an option for implicit differential
equations (1.17) can thus be applied directly. This has been done for problem
(1.14) with initial values (1.16), integration interval $0 \le x \le 0.2$, and $Tol = 10^{-4}$.
The code computed the solution $U_5(t)$ displayed in Fig. 1.4 in 556 (accepted) steps.
The comparison with the entry voltage $U_e(t)$ shows that our amplifier is working.
See also Hairer, Lubich & Roche (1989), p. 108-111 for a more elaborate example.



Fig. 1.4. Computed solution of amplifier problem (1.14) 
Convergence of Runge-Kutta Methods

If the method is stiffly accurate, the numerical solutions (1.11) are equivalent to
those of the ordinary equation (1.8). Therefore the convergence of the solutions is
described by Theorems II.3.4 and II.3.6 as

$$y_n - y(x_n) = O(h^p), \qquad z_n - z(x_n) = O(h^p), \tag{1.24}$$

where $p$ is the classical order of the method (the second formula follows from a
Lipschitz condition for $G$). For general methods, the estimate (1.24) remains valid
for $y_n$, because (1.11a,b,c) are independent of $z_n$ and do not change if (1.11d) is
replaced by (1.12). Thus we only have to prove a convergence result for $z_n$. An
essential ingredient of the following theorem is the stage order $q$ of the method,
i.e., condition $C(q)$ of Sect. II.7 or IV.5.

Theorem 1.1. Suppose that the system (1.6) satisfies (1.7) in a neighbourhood of
the exact solution $(y(x), z(x))$ and assume the initial values are consistent. Consider
a Runge-Kutta method of order $p$, stage order $q$ and with invertible matrix
$A$. Then the numerical solution of (1.11a-d) has global error

$$z_n - z(x_n) = O(h^r) \quad\text{for}\quad x_n - x_0 = nh \le Const, \tag{1.25}$$

where

a) $r = p$ for stiffly accurate methods,

b) $r = \min(p, q+1)$ if the stability function satisfies $-1 \le R(\infty) < 1$,

c) $r = \min(p-1, q)$ if $R(\infty) = +1$.

d) If $|R(\infty)| > 1$, the numerical solution diverges.

Proof. Part (a) has already been discussed. For the remaining cases we proceed as
follows: we first observe that Condition $C(q)$ and order $p$ imply

$$z(x_n + c_i h) = z(x_n) + h \sum_{j=1}^{s} a_{ij}\, z'(x_n + c_j h) + O(h^{q+1}) \tag{1.26a}$$

$$z(x_{n+1}) = z(x_n) + h \sum_{i=1}^{s} b_i\, z'(x_n + c_i h) + O(h^{p+1}). \tag{1.26b}$$

Since $A$ is invertible we can compute $z'(x_n + c_i h)$ from (1.26a) and insert it into
(1.26b). This gives

$$z(x_{n+1}) = \rho\, z(x_n) + b^T A^{-1} \widehat Z_n + O(h^{p+1}) + O(h^{q+1}) \tag{1.27}$$

where $\rho = 1 - b^T A^{-1} \mathbb{1} = R(\infty)$ and $\widehat Z_n = (z(x_n + c_1 h), \ldots, z(x_n + c_s h))^T$. We
then denote the global error by $\Delta z_n = z_n - z(x_n)$, and $\Delta Z_n = Z_n - \widehat Z_n$.
Subtracting (1.27) from (1.11d) yields

$$\Delta z_{n+1} = \rho\, \Delta z_n + b^T A^{-1} \Delta Z_n + O(h^{p+1}) + O(h^{q+1}). \tag{1.28}$$
Our next aim is to estimate $\Delta Z_n$. For this we have to consider the $y$-component
of the system. Due to (1.11a-c) the values $y_n$, $Y_{ni}$ are those of the Runge-Kutta
method applied to (1.8). It thus follows from Theorem II.8.1 that $y_n - y(x_n) =
e_p(x_n) h^p + O(h^{p+1})$. Since Eq. (1.26a) also holds with $z(x)$ replaced by $y(x)$,
we can subtract this formula from (1.11a) and so obtain

$$Y_{ni} - y(x_n + c_i h) = y_n - y(x_n) + h \sum_{j=1}^{s} a_{ij} \Bigl( f\bigl(Y_{nj}, G(Y_{nj})\bigr) - f\bigl(y(x_n + c_j h), G(y(x_n + c_j h))\bigr) \Bigr) + O(h^{q+1}).$$

This implies that

$$Y_{ni} - y(x_n + c_i h) = O(h^\nu) \quad\text{with}\quad \nu = \min(p, q+1).$$

Because of (1.11b) we get

$$Z_{ni} - z(x_n + c_i h) = G(Y_{ni}) - G(y(x_n + c_i h)) = O(h^\nu)$$

and Eq. (1.28) becomes

$$\Delta z_{n+1} = \rho\, \Delta z_n + \delta_{n+1} \quad\text{where}\quad \delta_{n+1} = O(h^\nu). \tag{1.29}$$

Repeated insertion of this formula gives

$$\Delta z_n = \sum_{i=1}^{n} \rho^{n-i} \delta_i \tag{1.30}$$

because $\Delta z_0 = 0$. This proves the statement for $\rho \ne -1$. For the case $\rho = -1$ the
error $\Delta z_n$ is a sum of differences $\delta_{j+1} - \delta_j$. Since $\delta_{n+1}$ is actually of the form
$\delta_{n+1} = d(x_n) h^\nu + O(h^{\nu+1})$ we have $\delta_{j+1} - \delta_j = O(h^{\nu+1})$ and the statement also
follows in this situation. □


The order reduction in the z -component (for non stiffly accurate methods) was 
first studied by Petzold (1986) in a more general context. 
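
The case distinction of Theorem 1.1 is easy to evaluate for a given tableau. The following sketch is a hedged illustration: it computes $R(\infty) = 1 - b^T A^{-1}\mathbb{1}$ by Gaussian elimination and returns the predicted order $r$ of the $z$-component. Case (b) is taken to include $R(\infty) = -1$, as the separate treatment of $\rho = -1$ in the proof suggests; the function names and the tolerance are assumptions of this example, and the orders $p$, $q$ must be supplied by the user.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    a = [list(row) + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def r_infinity(A, b):
    """R(inf) = 1 - b^T A^{-1} 1 for an invertible coefficient matrix A."""
    w = solve(A, [1.0] * len(b))
    return 1.0 - sum(bi * wi for bi, wi in zip(b, w))


def z_order(A, b, p, q, tol=1e-12):
    """Order r of the z-component following Theorem 1.1 (None means divergence)."""
    if all(abs(asi - bi) <= tol for asi, bi in zip(A[-1], b)):
        return p                      # (a) stiffly accurate
    R = r_infinity(A, b)
    if abs(R - 1.0) <= tol:
        return min(p - 1, q)          # (c) R(inf) = +1
    if -1.0 - tol <= R < 1.0:
        return min(p, q + 1)          # (b) -1 <= R(inf) < 1
    return None                       # (d) |R(inf)| > 1


if __name__ == "__main__":
    s3 = 3.0 ** 0.5
    radau2 = ([[5 / 12, -1 / 12], [3 / 4, 1 / 4]], [3 / 4, 1 / 4])
    gauss2 = ([[1 / 4, 1 / 4 - s3 / 6], [1 / 4 + s3 / 6, 1 / 4]], [1 / 2, 1 / 2])
    midpoint = ([[0.5]], [1.0])
    print(z_order(*radau2, p=3, q=2))
    print(z_order(*gauss2, p=4, q=2))
    print(z_order(*midpoint, p=2, q=1))
```

For the two-stage Gauss method the prediction illustrates the order reduction in the $z$-component: although $p = 4$, the algebraic variable converges only with order $\min(p-1, q) = 2$.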

Exercises 

1. Compute the solutions of the boundary value problems

$$\varepsilon y'' + y' + y = 1 \quad\text{respectively}\quad \varepsilon y'' - y' + y = 1 \tag{1.31}$$
$$y(0) = y(1) = 0, \quad\text{for}\quad \varepsilon > 0.$$

Observe that the solutions possess, for $\varepsilon \to 0$, a "boundary layer" on one of the
two sides of $[0,1]$ and that the limit solutions for $\varepsilon = 0$ satisfy

$$y' + y = 1 \quad\text{respectively}\quad -y' + y = 1$$

with one of the two boundary conditions being lost.



VI.2 Multistep Methods


The aim of this section is to study convergence of multistep methods when applied
to singular perturbation problems (Runge-Kutta methods will be treated in
Sect. VI.3). We are interested in estimates that hold uniformly for $\varepsilon \to 0$. The
results of the previous chapters cannot be applied. Since the Lipschitz constant of the
singular perturbation problem (1.5) is of size $O(\varepsilon^{-1})$, the estimates of Sect. III.4
are useless. Also the one-sided Lipschitz constant is in general $O(\varepsilon^{-1})$, so that the
convergence results of Sect. V.8 cannot be applied either. Let us start by considering
the reduced problem.


Methods for Index 1 Problems 

A multistep method applied to the system $y' = f(y,z)$, $\varepsilon z' = g(y,z)$ gives

$$\sum_{i=0}^{k} \alpha_i y_{n+i} = h \sum_{i=0}^{k} \beta_i f(y_{n+i}, z_{n+i}) \tag{2.1a}$$

$$\varepsilon \sum_{i=0}^{k} \alpha_i z_{n+i} = h \sum_{i=0}^{k} \beta_i g(y_{n+i}, z_{n+i}). \tag{2.1b}$$

By putting $\varepsilon = 0$ we obtain ($\varepsilon$-embedding method)

$$\sum_{i=0}^{k} \alpha_i y_{n+i} = h \sum_{i=0}^{k} \beta_i f(y_{n+i}, z_{n+i}) \tag{2.2a}$$

$$0 = \sum_{i=0}^{k} \beta_i g(y_{n+i}, z_{n+i}) \tag{2.2b}$$

which allows us to apply a multistep method to the differential-algebraic system
(1.6). This approach was first proposed (for the BDF methods) by Gear (1971).

Theorem 2.1. Suppose that the system (1.6) satisfies (1.7). Consider a multistep
method of order $p$ which is stable at the origin and at infinity (0 and $\infty$ are in
the stability region) and suppose that the error of the starting values $y_j$, $z_j$ for
$j = 0,\ldots,k-1$ is $O(h^p)$. Then the global error of (2.2) satisfies

$$y_n - y(x_n) = O(h^p), \qquad z_n - z(x_n) = O(h^p)$$

for $x_n - x_0 = nh \le Const$.

Proof. Formula (2.2b) is a stable recursion for $\delta_n = g(y_n, z_n)$, because $\infty$ lies in
the stability region of the method. This together with the assumption on the starting
values implies that $\delta_n = O(h^p)$ for all $n \ge 0$. By the Implicit Function Theorem
$g(y_n, z_n) = \delta_n$ can be solved for $z_n$ and yields

$$z_n = G(y_n) + O(h^p) \tag{2.3}$$

with $G(y)$ as in (1.8). Inserting (2.3) into (2.2a) gives the multistep formula for
the differential equation (1.8) with an $O(h^{p+1})$ perturbation. The statement then
follows from the convergence proof of Sect. III.4. □


For the implicit index 1 problem (1.17) the multistep method becomes

$$M \sum_{i=0}^{k} \alpha_i u_{n+i} = h \sum_{i=0}^{k} \beta_i \varphi(u_{n+i}) \tag{2.4}$$

and convergence without any order reduction for methods satisfying the hypotheses
of Theorem 2.1 follows from the transformation (1.20) and the diagram (1.23).

The state space form approach is also possible for multistep methods. We just
have to replace (2.2b) by

$$g(y_{n+k}, z_{n+k}) = 0. \tag{2.2c}$$

Method (2.2a,c) is equivalent to the solution of (1.8) by the above multistep method.
Hence, we have convergence as for nonstiff ordinary differential equations. The
assumption "$\infty \in S$" is no longer necessary and even explicit methods can be
applied.
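
The role of stability at infinity can be illustrated by iterating the recursion (2.2b) for the defects $\delta_n = g(y_n, z_n)$ alone. The sketch below is a minimal illustration with hypothetical function names; it only uses the coefficients $\beta_i$ of the method and prescribed starting defects. For BDF methods ($\beta_i = 0$ for $i < k$) the defect vanishes after the starting values, for the trapezoidal rule it oscillates without decaying, and for the two-step Adams-Moulton method (where $\sigma$ has a root outside the unit disc) it grows geometrically.

```python
def defect_recursion(beta, start, n_steps):
    """Iterate sum_i beta[i] * delta[n+i] = 0, i.e. (2.2b) with delta_n = g(y_n, z_n).

    beta  : coefficients beta_0, ..., beta_k of the multistep method
    start : the k starting defects delta_0, ..., delta_{k-1}
    """
    k = len(beta) - 1
    if len(start) != k:
        raise ValueError("need exactly k starting values")
    delta = list(start)
    for n in range(n_steps):
        s = sum(beta[i] * delta[n + i] for i in range(k))
        delta.append(-s / beta[k])
    return delta


if __name__ == "__main__":
    bdf2 = [0.0, 0.0, 1.0]                    # only beta_k is nonzero
    trapezoidal = [0.5, 0.5]
    adams_moulton2 = [-1 / 12, 8 / 12, 5 / 12]
    print(defect_recursion(bdf2, [1e-3, 1e-3], 5))
    print(defect_recursion(trapezoidal, [1e-3], 5))
    print(defect_recursion(adams_moulton2, [1e-3, 1e-3], 5))
```

With the state space form variant (2.2c) the defect of each new value is zero by construction, whatever the coefficients $\beta_i$ are.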


Convergence for Singular Perturbation Problems 

The error propagation has been studied by Söderlind & Dahlquist (1981) using
G-stability estimates. Convergence results were first obtained by Lötstedt (1985) for
BDF methods. The following convergence result by Lubich (1991), based on the
smoothness of the exact solution and thus uniform in $\varepsilon$ as long as we stay away
from transient phases, gives optimal error bounds for arbitrary multistep methods.
The Jacobian of the system (1.5) is of the form

$$\begin{pmatrix} f_y & f_z \\ \varepsilon^{-1} g_y & \varepsilon^{-1} g_z \end{pmatrix}$$

and its dominant eigenvalues are seen to be close to $\varepsilon^{-1}\lambda$ where $\lambda$ represents the
eigenvalues of $g_z$. For reasons of stability we assume throughout this subsection
that the eigenvalues of $g_z$ have negative real part. More precisely, we assume that

$$\text{the eigenvalues } \lambda \text{ of } g_z(y,z) \text{ lie in } |\arg \lambda - \pi| < \alpha \tag{2.5}$$

for $(y,z)$ in a neighbourhood of the considered solution. We then have the following
result for method (2.1a,b):
Theorem 2.2 (Lubich 1991). Suppose that the multistep method is of order $p$,
$A(\alpha)$-stable and strictly stable at infinity. If the problem (1.5) satisfies (2.5), then
the error is bounded for $h \ge \varepsilon$ and $nh \le \bar x - x_0$ by

$$\|y_n - y(x_n)\| + \|z_n - z(x_n)\| \le C \Bigl( \max_{0 \le j < k} \|y_j - y(x_j)\| + h^p \int_{x_0}^{x_n} \|y^{(p+1)}(x)\|\, dx$$
$$\qquad\qquad + (h + \rho^n) \max_{0 \le j < k} \|z_j - z(x_j)\| + \varepsilon h^p \max_{x_0 \le x \le x_n} \|z^{(p+1)}(x)\| \Bigr)$$

with $0 < \rho < 1$. This estimate holds for $h \le h_0$ ($h_0$ sufficiently small, but
independent of $\varepsilon$), and provided that the starting values are in a sufficiently small,
$h$- and $\varepsilon$-independent neighbourhood of the exact solution. The constants $C$ and $\rho$
are independent of $\varepsilon$ and $h$.

Proof. The proof is divided into several parts: in part (a) we shall derive recursive
estimates for the global error, these will be solved in part (b); part (c) proves an
inequality which is needed in (a).

a) First we insert the exact solution of (1.5) into the method (2.1) and so obtain

$$\sum_{i=0}^{k} \alpha_i\, y(x_{n+i}) = h \sum_{i=0}^{k} \beta_i\, f\bigl(y(x_{n+i}), z(x_{n+i})\bigr) + d_{n+k} \tag{2.6a}$$

$$\sum_{i=0}^{k} \alpha_i\, z(x_{n+i}) = \frac{h}{\varepsilon} \sum_{i=0}^{k} \beta_i\, g\bigl(y(x_{n+i}), z(x_{n+i})\bigr) + e_{n+k}, \tag{2.6b}$$

where the perturbations $d_{n+k}$, $e_{n+k}$ can be estimated (for $n \ge 0$) as

$$\|d_{n+k}\| \le C_1 h^p \int_{x_n}^{x_{n+k}} \|y^{(p+1)}(x)\|\, dx \tag{2.7a}$$

$$\|e_{n+k}\| \le C_2 h^{p+1} \max_{x_n \le x \le x_{n+k}} \|z^{(p+1)}(x)\|. \tag{2.7b}$$

We then denote the global errors by $\Delta y_n = y_n - y(x_n)$, $\Delta z_n = z_n - z(x_n)$ and
introduce the differences

$$\Delta f_{n+k} = \sum_{i=0}^{k} \beta_i \Bigl( f(y_{n+i}, z_{n+i}) - f\bigl(y(x_{n+i}), z(x_{n+i})\bigr) \Bigr), \quad n \ge 0,$$

$\Delta f_j = 0$ for $j < k$. Subtraction of (2.6a) from (2.1a) yields for $n \ge 0$

$$\sum_{i=0}^{k} \alpha_i \Delta y_{n+i} = h \Delta f_{n+k} - d_{n+k}. \tag{2.8}$$

Guided by previous experience (see (V.7.41)), we define $d_0, \ldots, d_{k-1}$ so that (2.8)
also holds for negative $n$. Solving for $\Delta y_n$ gives

$$\Delta y_n = h \sum_{j=0}^{n} r_{n-j}(0)\, \Delta f_j - \sum_{j=0}^{n} r_{n-j}(0)\, d_j$$

where $r_j(0)$ is defined in (V.7.44). These numbers are the coefficients of $r(\zeta, 0) =
\zeta^{-k} / \rho(\zeta^{-1})$. By zero-stability of the method, the sequence $\{r_j(0)\}$ is bounded,
so that a Lipschitz condition for $f(y,z)$ implies the estimate

$$\|\Delta y_n\| \le h \sum_{j=0}^{n} \bigl( M \|\Delta y_j\| + N \|\Delta z_j\| \bigr) + C_3 \sum_{j=0}^{n} \|d_j\|. \tag{2.9}$$

A more refined estimate is necessary for the $z$-component. We take the difference
of (2.1b) and (2.6b) and then subtract from both sides the quantity

$$\frac{h}{\varepsilon} \sum_{i=0}^{k} \beta_i J \Delta z_{n+i} \quad\text{where}\quad J = g_z(y_0, z_0). \tag{2.10}$$

This yields

$$\sum_{i=0}^{k} \Bigl( \alpha_i - \beta_i \frac{h}{\varepsilon} J \Bigr) \Delta z_{n+i} = \frac{h}{\varepsilon} \Delta g_{n+k} - e_{n+k} \tag{2.11}$$

where

$$\Delta g_{n+k} = \sum_{i=0}^{k} \beta_i \Bigl( g(y_{n+i}, z_{n+i}) - g\bigl(y(x_{n+i}), z(x_{n+i})\bigr) - J \Delta z_{n+i} \Bigr), \tag{2.12}$$

and $\Delta g_j = 0$ for $j < k$. We again define $e_0, \ldots, e_{k-1}$ such that (2.11) holds for
negative $n$, and we then solve (2.11) for $\Delta z_n$. This gives

$$\Delta z_n = \frac{h}{\varepsilon} \sum_{j=0}^{n} r_{n-j}\Bigl(\frac{h}{\varepsilon} J\Bigr) \Delta g_j - \sum_{j=0}^{n} r_{n-j}\Bigl(\frac{h}{\varepsilon} J\Bigr) e_j \tag{2.13}$$

where the matrices $r_j(\frac{h}{\varepsilon} J)$ are defined by (see Formula (V.7.50))

$$\sum_{j \ge 0} r_j\Bigl(\frac{h}{\varepsilon} J\Bigr) \zeta^j = \Bigl( \delta(\zeta) I - \frac{h}{\varepsilon} J \Bigr)^{-1} \frac{\zeta^{-k}}{\sigma(\zeta^{-1})} \tag{2.14}$$

with $\delta(\zeta)$ given in (V.7.45). In part (c) below we shall prove that

$$\Bigl\| r_j\Bigl(\frac{h}{\varepsilon} J\Bigr) \Bigr\| \le C\, \frac{\varepsilon}{h}\, \kappa^j \quad\text{with}\quad 0 < \kappa < 1. \tag{2.15}$$

Inserted into (2.13) we thus get

$$\|\Delta z_n\| \le \sum_{j=0}^{n} \kappa^{n-j} \bigl( L \|\Delta y_j\| + \ell \|\Delta z_j\| \bigr) + C_4 \frac{\varepsilon}{h} \sum_{j=0}^{n} \kappa^{n-j} \|e_j\|. \tag{2.16}$$

It is important to remark that the Lipschitz constant $\ell$ can be made arbitrarily small
by shrinking the considered interval.
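b) In order to solve the inequalities (2.9) and (2.16) we define sequences $\{u_n\}$
and $\{v_n\}$ by

$$u_n = h \sum_{j=0}^{n} (M u_j + N v_j) + C_3 \sum_{j=0}^{n} \|d_j\|$$
$$v_n = \sum_{j=0}^{n} \kappa^{n-j} (L u_j + \ell v_j) + C_4 \frac{\varepsilon}{h} \sum_{j=0}^{n} \kappa^{n-j} \|e_j\|. \tag{2.17}$$

An induction argument shows that

$$\|\Delta y_n\| \le u_n, \qquad \|\Delta z_n\| \le v_n$$

provided $\ell < 1$ and $h \le h_0$. We then rewrite (2.17) as

$$u_n = u_{n-1} + hM u_n + hN v_n + C_3 \|d_n\|, \qquad u_{-1} = 0,$$
$$v_n = \kappa v_{n-1} + L u_n + \ell v_n + C_4 \frac{\varepsilon}{h} \|e_n\|, \qquad v_{-1} = 0.$$

Solving for $u_n$, $v_n$ we get (with $\rho = \kappa/(1-\ell)$)

$$\begin{pmatrix} u_n \\ v_n \end{pmatrix} = A(h) \begin{pmatrix} u_{n-1} \\ v_{n-1} \end{pmatrix} + \begin{pmatrix} \hat d_n \\ \hat e_n \end{pmatrix}, \qquad A(h) = \begin{pmatrix} 1 + O(h) & O(h) \\ O(1) & \rho + O(h) \end{pmatrix} \tag{2.18}$$

where

$$|\hat d_n| \le C_5 \bigl( \|d_n\| + \varepsilon \|e_n\| \bigr), \qquad |\hat e_n| \le C_6 \Bigl( \|d_n\| + \frac{\varepsilon}{h} \|e_n\| \Bigr). \tag{2.19}$$

Inserting (2.18) repeatedly we obtain

$$\begin{pmatrix} u_n \\ v_n \end{pmatrix} = \sum_{j=0}^{n} A(h)^{n-j} \begin{pmatrix} \hat d_j \\ \hat e_j \end{pmatrix}. \tag{2.20}$$

If $\ell$ is small enough so that $\rho = \kappa/(1-\ell) < 1$ and if $h \le h_0$, then the eigenvalues
of $A(h)$ are distinct and $A(h)$ can be diagonalized as

$$A(h) = T^{-1}(h) \begin{pmatrix} 1 + O(h) & 0 \\ 0 & \rho + O(h) \end{pmatrix} T(h), \qquad T(h) = \begin{pmatrix} 1 & O(h) \\ O(1) & 1 \end{pmatrix}.$$

Inserted into (2.20) this yields

$$u_n + v_n \le Const \cdot \Bigl( \sum_{j=0}^{n} |\hat d_j| + \sum_{j=0}^{n} (h + \rho^{n-j}) |\hat e_j| \Bigr).$$

Since $d_0, \ldots, d_{k-1}$ are linear combinations of the values $\Delta y_0, \ldots, \Delta y_{k-1}$, and
$e_0, \ldots, e_{k-1}$ are linear combinations of the $\Delta z_j$ and $\frac{h}{\varepsilon} J \Delta z_j$, the statement of the
theorem follows from (2.19) and (2.7). Because of our assumption on $\ell$ (that
$\rho = \kappa/(1-\ell) < 1$) we have proved the theorem for sufficiently small (but
$\varepsilon$-independent) intervals. Compact intervals $[x_0, \bar x]$ can be covered by repeated
application of the above estimates.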
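c) It still remains to prove (2.15). More generally, we shall show that

$$\Bigl\| r_j\Bigl(\frac{h}{\varepsilon}\, g_z(y,z)\Bigr) \Bigr\| \le C\, \frac{\varepsilon}{h}\, \kappa^j \quad\text{with}\quad 0 < \kappa < 1 \tag{2.21}$$

holds uniformly in a compact neighbourhood of the solution. This is necessary, if
the above estimates are applied to several subintervals. In order to prove (2.21) we
remember that $r_j(\frac{h}{\varepsilon} J)$ is defined by (2.14). If we are able to show that

$$\Bigl\| \Bigl( \frac{\varepsilon}{h}\, \delta(\zeta) - g_z(y,z) \Bigr)^{-1} \frac{\zeta^{-k}}{\sigma(\zeta^{-1})} \Bigr\| \le C \quad\text{for}\quad |\zeta| \le 1/\kappa, \tag{2.22}$$

then the estimate (2.21) follows immediately from Cauchy's integral formula

$$r_j\Bigl(\frac{h}{\varepsilon}\, g_z(y,z)\Bigr) = \frac{1}{2\pi i} \oint_{|\zeta| = 1/\kappa} \zeta^{-j-1} \Bigl( \delta(\zeta) - \frac{h}{\varepsilon}\, g_z(y,z) \Bigr)^{-1} \frac{\zeta^{-k}}{\sigma(\zeta^{-1})}\, d\zeta.$$

By definition of the stability region $S$ of a multistep method, the value $\delta(\zeta)$ lies
outside of $S$ whenever $|\zeta| < 1$. Recall that the method is $A(\alpha)$-stable and strictly
stable at infinity, and the differential equation satisfies (2.5). Therefore the set of
eigenvalues of $g_z(y,z)$ (with $(y,z)$ varying in a compact neighbourhood of the
solution) is well separated from $\{\gamma\delta(\zeta)\ ;\ \gamma \le 1,\ |\zeta| \le 1\}$. It is even separated
from $\{\gamma\delta(\zeta)\ ;\ \gamma \le 1,\ |\zeta| \le 1/\kappa\}$ with some $\kappa < 1$. Together with Exercise 2 of
Sect. V.7 this proves (2.22). □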


Exercises 

1. (Lubich 1991). Prove that for the BDF-schemes the estimate of Theorem 2.2
(for $n \ge k$) is valid with $(h + \rho^n)$ replaced by $\varepsilon(1 + \rho^n/h)$ in the factor
multiplying the $z$-component of the errors in the starting values.

Hint. Give a direct proof for $n \in \{k, \ldots, 2k-1\}$; then apply Theorem 2.2 to
shifted starting values.
VI.3 Epsilon Expansions for Exact and RK Solutions


In the preceding section we have proved convergence of multistep methods for
singular perturbation problems. The same techniques do not yield optimal estimates
for Runge-Kutta methods. We therefore investigate more thoroughly the structure
of the solutions of singular perturbation problems. A first systematic study of the
qualitative aspects of such problems is due to Tikhonov (1952). Asymptotic
expansions were then analyzed by Vasil’eva (1963). Classical books on this subject are
Wasow (1965), O’Malley (1974), and Tikhonov, Vasil’eva & Sveshnikov (1985).

Expansion of the Smooth Solution 

Tihonov’s theorem is only the first step ... The actual approximate
solution of such problems in series form is still a difficult
question. It has been analyzed in a series of papers by Vasil’eva

(W. Wasow 1965)


We consider the singular perturbation problem

$$y' = f(y,z)$$
$$\varepsilon z' = g(y,z), \qquad 0 < \varepsilon \ll 1, \tag{3.1}$$

where $f$ and $g$ are sufficiently differentiable. The functions $f$, $g$ and the initial
values $y(0)$, $z(0)$ may depend smoothly on $\varepsilon$. For simplicity of notation we suppress
this dependence. The corresponding equation for $\varepsilon = 0$,

$$y' = f(y,z)$$
$$0 = g(y,z), \tag{3.2}$$

is the reduced problem. In order to guarantee the solvability of (3.2), we assume
that $g_z(y,z)$ is invertible (in a neighbourhood of the solution of (3.2)).

We are mainly interested in smooth solutions of (3.1), which are of the form

$$y(x) = y_0(x) + \varepsilon y_1(x) + \varepsilon^2 y_2(x) + \ldots$$
$$z(x) = z_0(x) + \varepsilon z_1(x) + \varepsilon^2 z_2(x) + \ldots \tag{3.3}$$

Inserting (3.3) into (3.1) and collecting equal powers of $\varepsilon$ yields

$$\varepsilon^0: \quad \begin{aligned} y_0' &= f(y_0, z_0) \\ 0 &= g(y_0, z_0) \end{aligned} \tag{3.4a}$$

$$\varepsilon^1: \quad \begin{aligned} y_1' &= f_y(y_0, z_0)\, y_1 + f_z(y_0, z_0)\, z_1 \\ z_0' &= g_y(y_0, z_0)\, y_1 + g_z(y_0, z_0)\, z_1 \end{aligned} \tag{3.4b}$$

$$\varepsilon^\nu: \quad \begin{aligned} y_\nu' &= f_y(y_0, z_0)\, y_\nu + f_z(y_0, z_0)\, z_\nu + \varphi_\nu(y_0, z_0, \ldots, y_{\nu-1}, z_{\nu-1}) \\ z_{\nu-1}' &= g_y(y_0, z_0)\, y_\nu + g_z(y_0, z_0)\, z_\nu + \psi_\nu(y_0, z_0, \ldots, y_{\nu-1}, z_{\nu-1}) \end{aligned} \tag{3.4c}$$

As expected, we see from (3.4a) that $y_0(x)$, $z_0(x)$ is a solution of the reduced
system. Since $g_z$ is invertible, the second equation of (3.4b) can be solved for
$z_1$. By inserting $z_1$ into the upper relation of (3.4b) we obtain a linear differential
equation for $y_1(x)$. Hence, $y_1(x)$ and $z_1(x)$ are determined. Similarly, we get
$y_2(x)$, $z_2(x)$ from (3.4c), etc.

This construction of the coefficients of (3.3) shows that we can choose the
initial values $y_j(0)$ arbitrarily, but that there is no freedom in the choice of $z_j(0)$.
Consequently, not every solution of (3.1) can be written in the form (3.3).


Expansions with Boundary Layer Terms 

To construct a uniform asymptotic expansion we must combine
the Maclaurin expansion with another expansion of special form.
The terms in this expansion are exponential functions that are
appreciable inside the boundary layer, but negligibly small outside
it. (A.B. Vasil’eva 1963)

Example 3.1. We consider the problem (IV.1.1), written in the form

$$\varepsilon z' = -z + \cos x. \tag{3.5}$$

Its analytic solution

$$z(x) = (1 + \varepsilon^2)^{-1} (\cos x + \varepsilon \sin x) + C e^{-x/\varepsilon}$$
$$\phantom{z(x)} = \cos x + \varepsilon \sin x - \varepsilon^2 \cos x - \varepsilon^3 \sin x + \ldots + C e^{-x/\varepsilon}$$

is a superposition of a smooth solution of the form (3.3) and of a rapidly decaying
function. This additional term (transient phase, boundary layer) compensates the
missing freedom in the choice of the initial values $z_j(0)$.
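
For this scalar example the coefficients of the smooth expansion follow from the recursion $z_0 = \cos x$, $z_\nu = -z_{\nu-1}'$, which is what (3.4) reduces to here, so they cycle through $\cos x$, $\sin x$, $-\cos x$, $-\sin x$. The Python sketch below (function names are hypothetical) compares the exact solution with the truncated smooth series. Outside the boundary layer the difference should be of size $\varepsilon^{N+1}$, whereas at $x = 0$ the transient term is needed to match an arbitrary initial value.

```python
import math


def exact_solution(x, eps, z0):
    """Solution of eps * z' = -z + cos(x) with z(0) = z0."""
    smooth = (math.cos(x) + eps * math.sin(x)) / (1.0 + eps * eps)
    C = z0 - 1.0 / (1.0 + eps * eps)          # fixed by the initial value
    return smooth + C * math.exp(-x / eps)


def smooth_expansion(x, eps, N):
    """Partial sum of eps**j * z_j(x), j = 0..N, with z_0 = cos x, z_j = -z_{j-1}'."""
    cycle = [math.cos(x), math.sin(x), -math.cos(x), -math.sin(x)]
    return sum(eps ** j * cycle[j % 4] for j in range(N + 1))


if __name__ == "__main__":
    eps, z0 = 0.01, 2.0
    for x in (0.0, 0.01, 0.1, 1.0):
        err = exact_solution(x, eps, z0) - smooth_expansion(x, eps, 3)
        print(x, err)
```

Inside the layer ($x$ of size $\varepsilon$) the printed difference is dominated by $C e^{-x/\varepsilon}$; for $x = 1$ it is of the size of the first neglected term $\varepsilon^4 \cos x$.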


Motivated by this example, we seek solutions of the general problem (3.1)
which are of the form

$$y(x) = \sum_{j \ge 0} \varepsilon^j y_j(x) + \varepsilon \sum_{j \ge 0} \varepsilon^j \eta_j(x/\varepsilon)$$
$$z(x) = \sum_{j \ge 0} \varepsilon^j z_j(x) + \sum_{j \ge 0} \varepsilon^j \zeta_j(x/\varepsilon), \tag{3.6}$$

where $y_j(x)$, $z_j(x)$ are determined by (3.4) and the $\varepsilon$-independent functions $\eta_j$,
$\zeta_j$ are assumed to satisfy

$$\|\eta_j(\xi)\| \le C e^{-\kappa\xi}, \qquad \|\zeta_j(\xi)\| \le C e^{-\kappa\xi} \tag{3.7}$$

with some $\kappa > 0$. Inserting (3.6) into (3.1) and using (3.4) we obtain formally

$$\sum_{j \ge 0} \varepsilon^j \eta_j'\Bigl(\frac{x}{\varepsilon}\Bigr) = f\Bigl( \sum_{j \ge 0} \varepsilon^j y_j(x) + \varepsilon \sum_{j \ge 0} \varepsilon^j \eta_j\Bigl(\frac{x}{\varepsilon}\Bigr),\ \sum_{j \ge 0} \varepsilon^j z_j(x) + \sum_{j \ge 0} \varepsilon^j \zeta_j\Bigl(\frac{x}{\varepsilon}\Bigr) \Bigr)$$
$$\qquad\qquad - f\Bigl( \sum_{j \ge 0} \varepsilon^j y_j(x),\ \sum_{j \ge 0} \varepsilon^j z_j(x) \Bigr) \tag{3.8a}$$

$$\sum_{j \ge 0} \varepsilon^j \zeta_j'\Bigl(\frac{x}{\varepsilon}\Bigr) = g\Bigl( \sum_{j \ge 0} \varepsilon^j y_j(x) + \varepsilon \sum_{j \ge 0} \varepsilon^j \eta_j\Bigl(\frac{x}{\varepsilon}\Bigr),\ \sum_{j \ge 0} \varepsilon^j z_j(x) + \sum_{j \ge 0} \varepsilon^j \zeta_j\Bigl(\frac{x}{\varepsilon}\Bigr) \Bigr)$$
$$\qquad\qquad - g\Bigl( \sum_{j \ge 0} \varepsilon^j y_j(x),\ \sum_{j \ge 0} \varepsilon^j z_j(x) \Bigr). \tag{3.8b}$$

We then replace $x$ by the stretched variable

$$\xi = x/\varepsilon \tag{3.9}$$

and compare like powers of $\varepsilon$ in (3.8). This gives for $\varepsilon^0$

$$\eta_0'(\xi) = f\bigl(y_0(0), z_0(0) + \zeta_0(\xi)\bigr) - f\bigl(y_0(0), z_0(0)\bigr) \tag{3.10a}$$

$$\zeta_0'(\xi) = g\bigl(y_0(0), z_0(0) + \zeta_0(\xi)\bigr) - g\bigl(y_0(0), z_0(0)\bigr). \tag{3.10b}$$

At this point it is necessary to introduce some stability assumption for (3.1) in order
to obtain (3.7). We shall require that the logarithmic norm of $g_z$ satisfy

$$\mu\bigl(g_z(y,z)\bigr) \le -1 \tag{3.11}$$

in an $\varepsilon$-independent neighbourhood of the solution of (3.2) (any negative bound
other than $-1$ can be normalized by re-scaling $\varepsilon$). By Theorem I.10.6 Eqs. (3.10b)
and (3.11) imply

$$\|\zeta_0(\xi)\| \le \|\zeta_0(0)\|\, e^{-\xi}.$$

Since $f(y,z)$ satisfies locally a Lipschitz condition, the right-hand side of (3.10a),
denoted by $\varphi(\xi)$, is bounded by $\|\varphi(\xi)\| \le L \|\zeta_0(0)\|\, e^{-\xi}$. Consequently, there is
only one solution of (3.10a) which satisfies (3.7), namely

$$\eta_0(\xi) = \int_0^\xi \varphi(s)\, ds - \int_0^\infty \varphi(s)\, ds. \tag{3.12}$$

A comparison of the powers of $\varepsilon^1$ in (3.8) yields

$$\begin{aligned}
\eta_1'(\xi) ={}& f_y\bigl(y_0(0), z_0(0) + \zeta_0(\xi)\bigr) \bigl(y_1(0) + \xi y_0'(0) + \eta_0(\xi)\bigr) \\
&+ f_z\bigl(y_0(0), z_0(0) + \zeta_0(\xi)\bigr) \bigl(z_1(0) + \xi z_0'(0) + \zeta_1(\xi)\bigr) \\
&- f_y\bigl(y_0(0), z_0(0)\bigr) \bigl(y_1(0) + \xi y_0'(0)\bigr) \\
&- f_z\bigl(y_0(0), z_0(0)\bigr) \bigl(z_1(0) + \xi z_0'(0)\bigr)
\end{aligned} \tag{3.13a}$$

$$\begin{aligned}
\zeta_1'(\xi) ={}& g_y\bigl(y_0(0), z_0(0) + \zeta_0(\xi)\bigr) \bigl(y_1(0) + \xi y_0'(0) + \eta_0(\xi)\bigr) \\
&+ g_z\bigl(y_0(0), z_0(0) + \zeta_0(\xi)\bigr) \bigl(z_1(0) + \xi z_0'(0) + \zeta_1(\xi)\bigr) \\
&- g_y\bigl(y_0(0), z_0(0)\bigr) \bigl(y_1(0) + \xi y_0'(0)\bigr) \\
&- g_z\bigl(y_0(0), z_0(0)\bigr) \bigl(z_1(0) + \xi z_0'(0)\bigr).
\end{aligned} \tag{3.13b}$$

Eq. (3.13b) is a linear differential equation for $\zeta_1(\xi)$. Its defect, for $\zeta_1$ replaced by
0, is bounded by $C e^{-\xi}$. Therefore, an application of Theorem I.10.6 yields

$$\|\zeta_1(\xi)\| \le e^{-\xi} \bigl( \|\zeta_1(0)\| + C\xi \bigr),$$

which implies (3.7) for any $\kappa < 1$. The right-hand side of (3.13a) is then bounded
by $C_1 e^{-\kappa\xi}$. As in (3.12) we obtain a unique solution to (3.13a), which satisfies
(3.7). This procedure can be continued to construct all further $\eta_j(\xi)$, $\zeta_j(\xi)$. At
each step, the value of $\kappa$ in (3.7) may become smaller. This is no serious difficulty,
because we are only interested in a finite part of the series (3.6).

We point out that for the construction of $\eta_j(\xi)$, $\zeta_j(\xi)$ we can choose $\zeta_j(0)$
arbitrarily, but that there is no freedom in the choice of $\eta_j(0)$.

As a consequence, for an arbitrary initial value for (3.1) with expansion

$$y(0) = y_0^0 + \varepsilon y_1^0 + \varepsilon^2 y_2^0 + \ldots$$
$$z(0) = z_0^0 + \varepsilon z_1^0 + \varepsilon^2 z_2^0 + \ldots, \tag{3.14}$$

the coefficients of the series (3.6) can be constructed as follows: put $x = 0$ in (3.6)
to obtain the necessary relations

$$y_0(0) = y_0^0, \qquad y_j(0) + \eta_{j-1}(0) = y_j^0, \qquad z_j(0) + \zeta_j(0) = z_j^0. \tag{3.15}$$

This initial value $y_0(0) = y_0^0$ determines $z_0(0)$ by (3.4a), $\zeta_0(0)$ is then given by
(3.15), $\eta_0(0)$ by (3.12), $y_1(0)$ by (3.15), $z_1(0)$ by (3.4b), $\zeta_1(0)$ by (3.15), $\eta_1(0)$
by (3.13a) and (3.7), $y_2(0)$ by (3.15), etc.

Estimation of the Remainder 


The following result gives a rigorous estimate of the remainder in (3.6), when only 
a truncated series is considered. 

Theorem 3.2. Consider the initial value problem (3.1), (3.14), and suppose that
(3.11) holds in an $\varepsilon$-independent neighbourhood of the solution $y_0(x)$, $z_0(x)$ ($0 \le
x \le \bar x$) of the reduced problem ($y_0(0) = y_0^0$). If $(y_0^0, z_0^0)$ lies in this neighbourhood,
then the problem (3.1), (3.14) has a unique solution for $\varepsilon$ sufficiently small and for
$0 \le x \le \bar x$, which is of the form

$$y(x) = \sum_{j=0}^{N} \varepsilon^j y_j(x) + \varepsilon \sum_{j=0}^{N-1} \varepsilon^j \eta_j(x/\varepsilon) + O(\varepsilon^{N+1})$$
$$z(x) = \sum_{j=0}^{N} \varepsilon^j z_j(x) + \sum_{j=0}^{N} \varepsilon^j \zeta_j(x/\varepsilon) + O(\varepsilon^{N+1}). \tag{3.16}$$

The coefficients $y_j(x)$, $z_j(x)$, $\eta_j(\xi)$, $\zeta_j(\xi)$ are given by (3.4), (3.10), (3.13), and
satisfy (3.7).
Proof. We denote the truncated series by

$$\hat y(x) = \sum_{j=0}^{N} \varepsilon^j y_j(x) + \varepsilon \sum_{j=0}^{N} \varepsilon^j \eta_j(x/\varepsilon)$$
$$\hat z(x) = \sum_{j=0}^{N} \varepsilon^j z_j(x) + \sum_{j=0}^{N} \varepsilon^j \zeta_j(x/\varepsilon). \tag{3.17}$$

By our construction of $y_j(x)$, $z_j(x)$, $\eta_j(\xi)$, $\zeta_j(\xi)$ we have

$$\hat y'(x) = f\bigl(\hat y(x), \hat z(x)\bigr) + O(\varepsilon^{N+1})$$
$$\varepsilon \hat z'(x) = g\bigl(\hat y(x), \hat z(x)\bigr) + O(\varepsilon^{N+1}). \tag{3.18}$$

Subtracting (3.1) from (3.18) and exploiting Lipschitz conditions for $f$ and $g$ we
obtain

$$D_+ \|\hat y(x) - y(x)\| \le L_1 \|\hat y(x) - y(x)\| + L_2 \|\hat z(x) - z(x)\| + C_1 \varepsilon^{N+1}$$
$$\varepsilon D_+ \|\hat z(x) - z(x)\| \le L_3 \|\hat y(x) - y(x)\| - \|\hat z(x) - z(x)\| + C_2 \varepsilon^{N+1}. \tag{3.19}$$

Here, $D_+$ denotes the Dini derivate introduced in Section I.10. We have used
$D_+ \|w(x)\| \le \|w'(x)\|$ (see Eq. (I.10.4)) and, for the second inequality of (3.19),
Formula (I.10.17) together with (3.11).

In order to solve inequality (3.19) we replace $\le$ by $=$ and so obtain

$$u' = L_1 u + L_2 v + C_1 \varepsilon^{N+1}, \qquad u(0) = \|\hat y(0) - y(0)\| = O(\varepsilon^{N+1})$$
$$\varepsilon v' = L_3 u - v + C_2 \varepsilon^{N+1}, \qquad v(0) = \|\hat z(0) - z(0)\| = O(\varepsilon^{N+1}). \tag{3.20}$$

This system is quasimonotone, it thus follows from Exercise 7 (Sect. I.10) that

$$\|\hat y(x) - y(x)\| \le u(x), \qquad \|\hat z(x) - z(x)\| \le v(x). \tag{3.21}$$

Transforming (3.20) to diagonal form one easily finds its analytic solution and
verifies that $u(x) = O(\varepsilon^{N+1})$, $v(x) = O(\varepsilon^{N+1})$ on compact intervals. Inserted into
(3.21) this proves the statement. □


Expansion of the Runge-Kutta Solution 

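After having understood the structure of the analytic solution of (3.1), we turn our
attention to its numerical counterpart. We consider the Runge-Kutta method

$$\begin{pmatrix} y_{n+1} \\ z_{n+1} \end{pmatrix} = \begin{pmatrix} y_n \\ z_n \end{pmatrix} + h \sum_{i=1}^{s} b_i \begin{pmatrix} k_{ni} \\ \ell_{ni} \end{pmatrix} \tag{3.22}$$

where

$$\begin{pmatrix} k_{ni} \\ \varepsilon \ell_{ni} \end{pmatrix} = \begin{pmatrix} f(Y_{ni}, Z_{ni}) \\ g(Y_{ni}, Z_{ni}) \end{pmatrix} \tag{3.23}$$

and the internal stages are given by

$$\begin{pmatrix} Y_{ni} \\ Z_{ni} \end{pmatrix} = \begin{pmatrix} y_n \\ z_n \end{pmatrix} + h \sum_{j=1}^{s} a_{ij} \begin{pmatrix} k_{nj} \\ \ell_{nj} \end{pmatrix}. \tag{3.24}$$

For arbitrary initial values, the solution possesses a transient phase (as described
by Theorem 3.2), and the numerical method has anyway to take small step sizes
of magnitude $O(\varepsilon)$. We shall therefore focus on the situation where the transient
phase is over and the method has reached the smooth solution within the given
tolerance. We thus suppose that the initial values lie on the smooth solution (i.e.,
that an expansion of the form (3.3) holds) and that the step size $h$ is large compared
to $\varepsilon$. Our first goal is an $\varepsilon$-expansion of the numerical solution. To this end,
we formally expand all occurring quantities into powers of $\varepsilon$ with $\varepsilon$-independent
coefficients (see Hairer, Lubich & Roche 1988)

$$y_n = y_n^0 + \varepsilon y_n^1 + \varepsilon^2 y_n^2 + \ldots \tag{3.25a}$$
$$Y_{ni} = Y_{ni}^0 + \varepsilon Y_{ni}^1 + \varepsilon^2 Y_{ni}^2 + \ldots \tag{3.25b}$$
$$k_{ni} = k_{ni}^0 + \varepsilon k_{ni}^1 + \varepsilon^2 k_{ni}^2 + \ldots \tag{3.25c}$$

and similarly for $z_n$, $Z_{ni}$, $\ell_{ni}$. Because of the linearity of the relations (3.22) and
(3.24) we have

$$\begin{pmatrix} y_{n+1}^\nu \\ z_{n+1}^\nu \end{pmatrix} = \begin{pmatrix} y_n^\nu \\ z_n^\nu \end{pmatrix} + h \sum_{i=1}^{s} b_i \begin{pmatrix} k_{ni}^\nu \\ \ell_{ni}^\nu \end{pmatrix} \tag{3.26}$$

$$\begin{pmatrix} Y_{ni}^\nu \\ Z_{ni}^\nu \end{pmatrix} = \begin{pmatrix} y_n^\nu \\ z_n^\nu \end{pmatrix} + h \sum_{j=1}^{s} a_{ij} \begin{pmatrix} k_{nj}^\nu \\ \ell_{nj}^\nu \end{pmatrix}. \tag{3.27}$$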

Inserting (3.25b,c) into (3.23) and comparing equal powers of $\varepsilon$ we obtain

$$\varepsilon^0: \quad \begin{aligned} k_{ni}^0 &= f(Y_{ni}^0, Z_{ni}^0) \\ 0 &= g(Y_{ni}^0, Z_{ni}^0) \end{aligned} \tag{3.28a}$$

$$\varepsilon^1: \quad \begin{aligned} k_{ni}^1 &= f_y(Y_{ni}^0, Z_{ni}^0)\, Y_{ni}^1 + f_z(Y_{ni}^0, Z_{ni}^0)\, Z_{ni}^1 \\ \ell_{ni}^0 &= g_y(Y_{ni}^0, Z_{ni}^0)\, Y_{ni}^1 + g_z(Y_{ni}^0, Z_{ni}^0)\, Z_{ni}^1 \end{aligned} \tag{3.28b}$$

$$\varepsilon^\nu: \quad \begin{aligned} k_{ni}^\nu &= f_y(Y_{ni}^0, Z_{ni}^0)\, Y_{ni}^\nu + f_z(Y_{ni}^0, Z_{ni}^0)\, Z_{ni}^\nu + \varphi_\nu(Y_{ni}^0, Z_{ni}^0, \ldots, Y_{ni}^{\nu-1}, Z_{ni}^{\nu-1}) \\ \ell_{ni}^{\nu-1} &= g_y(Y_{ni}^0, Z_{ni}^0)\, Y_{ni}^\nu + g_z(Y_{ni}^0, Z_{ni}^0)\, Z_{ni}^\nu + \psi_\nu(Y_{ni}^0, Z_{ni}^0, \ldots, Y_{ni}^{\nu-1}, Z_{ni}^{\nu-1}) \end{aligned} \tag{3.28c}$$

Since (3.23) has the same form as the differential equation (3.1), it is obvious that
the formulas of (3.28) are exactly the same as those of (3.4). An interesting
interpretation of this fact is the following: the coefficients $y_n^0$, $z_n^0$, $y_n^1$, $z_n^1$, ... represent
the numerical solution of the Runge-Kutta method applied to the differential-algebraic
system (3.4) ($\varepsilon$-embedding method of Sect. VI.1). This can be expressed
by the commutativity of the following diagram:

    Problem (3.1)  ------ (3.3) ------>  DAE (3.4)
         |                                   |
         | RK method                         | RK method
         v                                   v
    {y_n, z_n}     ------ (3.25) ----->  {y_n^ν, z_n^ν}

Subtracting (3.25a) from (3.3) we get formally

$$y_n - y(x_n) = \sum_{\nu \ge 0} \varepsilon^\nu \bigl( y_n^\nu - y_\nu(x_n) \bigr)$$
$$z_n - z(x_n) = \sum_{\nu \ge 0} \varepsilon^\nu \bigl( z_n^\nu - z_\nu(x_n) \bigr). \tag{3.29}$$

In order to study this error we first investigate the differences $y_n^\nu - y_\nu(x_n)$, $z_n^\nu -
z_\nu(x_n)$ (next subsection). A rigorous estimate of the remainder in (3.29) will then
follow. The presentation follows that of Hairer, Lubich & Roche (1988).


Convergence of RK-Methods for Differential-Algebraic Systems 

The first differences $y_n^0 - y_0(x_n)$, $z_n^0 - z_0(x_n)$ in the expansions of (3.29) are just
the global errors of the Runge-Kutta method applied to the reduced system (3.4a).
By assumption (3.11) this system is of index 1. Therefore, the following result is
an immediate consequence of Theorem 1.1.

Theorem 3.3. Consider a Runge-Kutta method of (classical) order $p$, with invertible
coefficient matrix $(a_{ij})$. Suppose that Problem (3.4a) satisfies (3.11) and that
the initial values are consistent.

a) If the method is stiffly accurate (i.e., $a_{si} = b_i$ for $i = 1,\ldots,s$) then the global
error satisfies

$$y_n^0 - y_0(x_n) = O(h^p), \qquad z_n^0 - z_0(x_n) = O(h^p). \tag{3.30}$$

b) If the stability function satisfies $|R(\infty)| < 1$, and the stage order is $q$ ($q < p$),
then

$$y_n^0 - y_0(x_n) = O(h^p), \qquad z_n^0 - z_0(x_n) = O(h^{q+1}). \tag{3.31}$$

In both cases the estimates hold uniformly for $nh \le Const$. □


Estimating the second differences $y_n^1 - y_1(x_n)$, $z_n^1 - z_1(x_n)$ is not as simple,
because the enlarged system (3.4a,b) with differential variables $y_0$, $z_0$, $y_1$ and
algebraic variable $z_1$, is no longer of index 1. It is actually of index 2, as will become
clear in Sect. VII.1 below (Exercise 5). In principle it is possible to consult the
results of Sect. VII.4 (Theorems VII.4.5 and VII.4.6). For the special system (3.4a,b),
however, a simpler proof is possible. It also extends more easily to the higher-index
problems (3.4a-c).

Theorem 3.4 (Hairer, Lubich & Roche 1988). Consider a Runge-Kutta method of
order $p$, stage order $q$ ($q < p$), such that $(a_{ij})$ is invertible and the stability
function satisfies $|R(\infty)| < 1$. If (3.11) holds and if the initial values of the
differential-algebraic system (3.4a-c) are consistent, then the global error of method
(3.26)-(3.28) satisfies for $1 \le \nu \le q+1$

$$y_n^\nu - y_\nu(x_n) = O(h^{q+2-\nu}), \qquad z_n^\nu - z_\nu(x_n) = O(h^{q+1-\nu}).$$


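Proof. We denote the differences to the exact solution values by

$$
\begin{aligned}
\Delta y_n^\nu &= y_n^\nu - y_\nu(x_n), & \Delta z_n^\nu &= z_n^\nu - z_\nu(x_n), \\
\Delta Y_{ni}^\nu &= Y_{ni}^\nu - y_\nu(x_n + c_i h), & \Delta Z_{ni}^\nu &= Z_{ni}^\nu - z_\nu(x_n + c_i h), \\
\Delta k_{ni}^\nu &= k_{ni}^\nu - y_\nu'(x_n + c_i h), & \Delta \ell_{ni}^\nu &= \ell_{ni}^\nu - z_\nu'(x_n + c_i h).
\end{aligned}
\tag{3.32}
$$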


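Since the quadrature formula with nodes $c_i$ and weights $b_i$ is of order p, we have
from (3.26)

$$
\begin{pmatrix} \Delta y_{n+1}^\nu \\ \Delta z_{n+1}^\nu \end{pmatrix}
= \begin{pmatrix} \Delta y_n^\nu \\ \Delta z_n^\nu \end{pmatrix}
+ h \sum_{i=1}^s b_i \begin{pmatrix} \Delta k_{ni}^\nu \\ \Delta \ell_{ni}^\nu \end{pmatrix}
+ O(h^{p+1}). \tag{3.33}
$$

Similarly, the definition of the stage order implies

$$
\begin{pmatrix} \Delta Y_{ni}^\nu \\ \Delta Z_{ni}^\nu \end{pmatrix}
= \begin{pmatrix} \Delta y_n^\nu \\ \Delta z_n^\nu \end{pmatrix}
+ h \sum_{j=1}^s a_{ij} \begin{pmatrix} \Delta k_{nj}^\nu \\ \Delta \ell_{nj}^\nu \end{pmatrix}
+ O(h^{q+1}). \tag{3.34}
$$

It follows from Theorem 3.3 (see also the proof of Theorem 1.1) that

$$
\begin{aligned}
\Delta y_n^0 &= O(h^p), & \Delta Y_{ni}^0 &= O(h^{q+1}), & \Delta k_{ni}^0 &= O(h^{q+1}), \\
\Delta z_n^0 &= O(h^{q+1}), & \Delta Z_{ni}^0 &= O(h^{q+1}), & \Delta \ell_{ni}^0 &= O(h^q).
\end{aligned}
\tag{3.35}
$$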


a) We first consider the case $\nu = 1$. Replacing in (3.28b) $Y_{ni}^0$, $Z_{ni}^0$ by
$y_0(x_n + c_i h) + \Delta Y_{ni}^0$, $z_0(x_n + c_i h) + \Delta Z_{ni}^0$ and subtracting Equation (3.4b)
at the position $x = x_n + c_i h$, we obtain with the help of (3.35)

$$
\begin{aligned}
\Delta k_{ni}^1 &= f_y(x_n + c_i h)\,\Delta Y_{ni}^1 + f_z(x_n + c_i h)\,\Delta Z_{ni}^1
  + O\bigl(h^{q+1} + h^{q+1}\|\Delta Y_{ni}^1\| + h^{q+1}\|\Delta Z_{ni}^1\|\bigr) \\
\Delta \ell_{ni}^0 &= g_y(x_n + c_i h)\,\Delta Y_{ni}^1 + g_z(x_n + c_i h)\,\Delta Z_{ni}^1
  + O\bigl(h^{q+1} + h^{q+1}\|\Delta Y_{ni}^1\| + h^{q+1}\|\Delta Z_{ni}^1\|\bigr).
\end{aligned}
\tag{3.36}
$$

Here we have used the abbreviations $f_y(x) = f_y(y_0(x), z_0(x))$, etc. Computing
$\Delta Z_{ni}^1$ from the second relation of (3.36) and inserting it into the first one yields

$$
\Delta k_{ni}^1 - (f_z g_z^{-1})(x_n + c_i h)\,\Delta \ell_{ni}^0
= (f_y - f_z g_z^{-1} g_y)(x_n + c_i h)\,\Delta Y_{ni}^1 + O\bigl(h^{q+1} + h^{q+1}\|\Delta Y_{ni}^1\|\bigr).
$$

Using (3.34) we can eliminate $\Delta Y_{ni}^1$ and obtain (with (3.35))

$$
\Delta k_{ni}^1 - (f_z g_z^{-1})(x_n + c_i h)\,\Delta \ell_{ni}^0 = O(\|\Delta y_n^1\|) + O(h^{q+1}). \tag{3.37}
$$

Since $\Delta \ell_{ni}^0$ is of size $O(h^q)$, we only have $\Delta k_{ni}^1 = O(\|\Delta y_n^1\|) + O(h^q)$, and
a direct estimation of $\Delta y_n^1$ in (3.33) would lead to $\Delta y_n^1 = O(h^q)$, which is not
optimal. We therefore introduce the new variable

$$
\Delta u_n^1 = \Delta y_n^1 - (f_z g_z^{-1})(x_n)\,\Delta z_n^0. \tag{3.38}
$$

From (3.33) we get

$$
\begin{aligned}
\Delta u_{n+1}^1 = \Delta u_n^1 &+ h \sum_{i=1}^s b_i \bigl( \Delta k_{ni}^1 - (f_z g_z^{-1})(x_n)\,\Delta \ell_{ni}^0 \bigr) \\
&- \bigl( (f_z g_z^{-1})(x_n + h) - (f_z g_z^{-1})(x_n) \bigr) \Delta z_{n+1}^0 + O(h^{p+1}).
\end{aligned}
\tag{3.39}
$$

The estimates (3.35), (3.37) and the fact that $\Delta y_n^1 = \Delta u_n^1 + O(h^{q+1})$ imply that

$$
\|\Delta u_{n+1}^1\| \le (1 + Ch)\,\|\Delta u_n^1\| + O(h^{q+2}). \tag{3.40}
$$

Standard techniques now show that $\Delta u_n^1 = O(h^{q+1})$ for $nh \le \mathrm{Const}$ (observe that
the initial values are assumed to be consistent, i.e., $\Delta u_0^1 = 0$), so that by (3.38) and
(3.35) also $\Delta y_n^1 = O(h^{q+1})$. This implies $\Delta k_{ni}^1 = O(h^q)$ by (3.37) and
$\Delta Y_{ni}^1 = O(h^{q+1})$ by (3.34). The second relation of (3.36) then proves that $\Delta Z_{ni}^1 = O(h^q)$.
In order to estimate $\Delta z_n^1$, we compute $\Delta \ell_{ni}^1$ from (3.34) and insert it into (3.33).
Using $\Delta Z_{ni}^1 = O(h^q)$ this gives

$$
\Delta z_{n+1}^1 = (1 - b^T A^{-1} 1\!\!1)\,\Delta z_n^1 + O(h^q), \tag{3.41}
$$

and it follows from $|1 - b^T A^{-1} 1\!\!1| = |R(\infty)| < 1$ that $\Delta z_n^1 = O(h^q)$. We have thus
proved the case $\nu = 1$.

b) The proof for general $\nu$ is by induction. We shall show that

$$
\begin{aligned}
\Delta y_n^\nu &= O(h^{q+2-\nu}), & \Delta Y_{ni}^\nu &= O(h^{q+2-\nu}), \\
\Delta z_n^\nu &= O(h^{q+1-\nu}), & \Delta Z_{ni}^\nu &= O(h^{q+1-\nu})
\end{aligned}
\tag{3.42}
$$

holds for $\nu = 1, \dots, q + 1$. The main difference to the case $\nu = 1$ consists in the
additional inhomogeneities $\varphi_\nu$ and $\psi_\nu$ in (3.4c). Using their Lipschitz continuity
one obtains an additional term of size $O(h^{q+2-\nu})$ in (3.36). Otherwise the proof
is identical to that for $\nu = 1$. □
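
The identity $|1 - b^T A^{-1} 1\!\!1| = |R(\infty)|$ used for (3.41) is easy to evaluate for a concrete tableau. The following is a minimal sketch in exact rational arithmetic; the helper names `solve` and `r_at_infinity` are illustrative and not taken from the text, and the two tableaux (2-stage Radau IIA and the implicit midpoint rule) are standard coefficients chosen here as examples.

```python
from fractions import Fraction as F

def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination (small square A)."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * p for x, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def r_at_infinity(A, b):
    """R(inf) = 1 - b^T A^{-1} 1 for an invertible Runge-Kutta matrix A."""
    w = solve(A, [F(1)] * len(A))  # w = A^{-1} (1, ..., 1)^T
    return 1 - sum(bi * wi for bi, wi in zip(b, w))

# Radau IIA with s = 2 (stiffly accurate: the last row of A equals b)
radau_A = [[F(5, 12), F(-1, 12)], [F(3, 4), F(1, 4)]]
radau_b = [F(3, 4), F(1, 4)]
# Implicit midpoint rule (Gauss, s = 1)
gauss_A = [[F(1, 2)]]
gauss_b = [F(1)]

print(r_at_infinity(radau_A, radau_b))  # 0
print(r_at_infinity(gauss_A, gauss_b))  # -1
```

For the Radau IIA tableau the recursion (3.41) contracts with factor 0, whereas the implicit midpoint rule has $|R(\infty)| = 1$ and therefore does not satisfy the hypothesis of Theorem 3.4.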


We next study the existence and local uniqueness of the solution of the Runge- 
Kutta method (3.22)-(3.24). Further, we investigate the influence of perturbations 
in (3.24) to the numerical solution. This will be important for the estimation of the 
remainder in the expansion (3.29). 





Existence and Uniqueness of the Runge-Kutta Solution 


For h small compared to $\varepsilon$, the existence of a unique numerical solution of (3.23),
(3.24) follows from standard fixed point iteration (e.g., Theorem II.7.2). For the
(more interesting) case where the step size h is large compared to $\varepsilon$, we suppose
that $(y_n, z_n)$ are known, denote them by $(\eta, \zeta)$, and prove the existence of
$(y_{n+1}, z_{n+1})$ as follows:

Theorem 3.5 (Hairer, Lubich & Roche 1988). Assume that $g(\eta, \zeta) = O(h)$,
$\mu(g_z(\eta, \zeta)) \le -1$ and that the eigenvalues of the Runge-Kutta matrix $(a_{ij})$ have
positive real part. Then, the nonlinear system

$$
\begin{pmatrix} Y_i - \eta \\ \varepsilon (Z_i - \zeta) \end{pmatrix}
= h \sum_{j=1}^s a_{ij} \begin{pmatrix} f(Y_j, Z_j) \\ g(Y_j, Z_j) \end{pmatrix},
\qquad i = 1, \dots, s \tag{3.43}
$$

possesses a locally unique solution for $h \le h_0$, where $h_0$ is sufficiently small but
independent of $\varepsilon$. This solution satisfies

$$
Y_i - \eta = O(h), \qquad Z_i - \zeta = O(h). \tag{3.44}
$$


Proof. We apply Newton's method to the nonlinear system (3.43), whose second
equation is divided by h. The existence and uniqueness statement can then be
deduced from the theorem of Newton-Kantorovich (Kantorovich & Akilov 1959,
Ortega & Rheinboldt 1970) as follows: for the starting values $Y_i^{(0)} = \eta$, $Z_i^{(0)} = \zeta$
the Jacobian of the system is of the form

$$
\begin{pmatrix} I + O(h) & O(h) \\ O(1) & (\varepsilon/h) I - A \otimes g_z(\eta, \zeta) \end{pmatrix}. \tag{3.45}
$$

Since $\mu(g_z(\eta, \zeta)) \le -1$ it follows from the matrix-valued theorem of von Neumann
(Theorem V.7.8) that

$$
\bigl\| (\kappa I - A \otimes g_z(\eta, \zeta))^{-1} \bigr\|
\le \max_{\operatorname{Re} \mu \le -1} \bigl\| (\kappa I - \mu A)^{-1} \bigr\|. \tag{3.46}
$$

The right-hand side of (3.46) is bounded by a constant independent of $\kappa \ge 0$, because
the eigenvalues of A are assumed to have positive real part. Consequently,
also the inverse of (3.45) is uniformly bounded for $\varepsilon > 0$ and $h \le h_0$. This together
with $g(\eta, \zeta) = O(h)$ implies that the first increment (of Newton's method) is of size
$O(h)$. Hence, for sufficiently small h, the Newton-Kantorovich assumptions are
fulfilled. □


Influence of Perturbations


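For the perturbed Runge-Kutta method

$$
\begin{pmatrix} \hat Y_i - \hat\eta \\ \varepsilon (\hat Z_i - \hat\zeta) \end{pmatrix}
= h \sum_{j=1}^s a_{ij} \begin{pmatrix} f(\hat Y_j, \hat Z_j) \\ g(\hat Y_j, \hat Z_j) \end{pmatrix}
+ h \begin{pmatrix} \delta_i \\ \theta_i \end{pmatrix} \tag{3.47}
$$

we have the following result.

Theorem 3.6 (Hairer, Lubich & Roche 1988). Let $Y_i, Z_i$ be given by (3.43) and
consider perturbed values $\hat Y_i, \hat Z_i$ satisfying (3.47). In addition to the assumptions
of Theorem 3.5 suppose that $\hat\eta - \eta = O(h)$, $\hat\zeta - \zeta = O(h)$, $\delta_i = O(1)$, and
$\theta_i = O(h)$. Then we have for $h \le h_0$

$$
\begin{aligned}
\|\hat Y_i - Y_i\| &\le C \bigl( \|\hat\eta - \eta\| + \varepsilon \|\hat\zeta - \zeta\| \bigr) + hC \bigl( \|\delta\| + \|\theta\| \bigr) \\
\|\hat Z_i - Z_i\| &\le C \Bigl( \|\hat\eta - \eta\| + \frac{\varepsilon}{h} \|\hat\zeta - \zeta\| \Bigr) + C \bigl( h\|\delta\| + \|\theta\| \bigr).
\end{aligned}
\tag{3.48}
$$

Here $\delta = (\delta_1, \dots, \delta_s)^T$ and $\theta = (\theta_1, \dots, \theta_s)^T$.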


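Proof. The essential idea is to consider the homotopy

$$
\begin{pmatrix} Y_i - \eta \\ \varepsilon (Z_i - \zeta) \end{pmatrix}
= h \sum_{j=1}^s a_{ij} \begin{pmatrix} f(Y_j, Z_j) \\ g(Y_j, Z_j) \end{pmatrix}
+ \tau \begin{pmatrix} \hat\eta - \eta + h\delta_i \\ \varepsilon(\hat\zeta - \zeta) + h\theta_i \end{pmatrix} \tag{3.49}
$$

which relates the system (3.43) for $\tau = 0$ to the perturbed system (3.47) for $\tau = 1$.
The solutions $Y_i$ and $Z_i$ of (3.49) are functions of $\tau$. If we differentiate (3.49)
with respect to $\tau$ and divide its second formula by h, we obtain the differential
equation

$$
\begin{pmatrix} I + O(h) & O(h) \\ O(1) & M(\varepsilon/h, Y, Z) \end{pmatrix}
\begin{pmatrix} \dot Y \\ \dot Z \end{pmatrix}
= \begin{pmatrix} 1\!\!1 \otimes (\hat\eta - \eta) + h\delta \\ (\varepsilon/h)\, 1\!\!1 \otimes (\hat\zeta - \zeta) + \theta \end{pmatrix} \tag{3.50}
$$

where $1\!\!1 = (1, \dots, 1)^T$, $Y = (Y_1, \dots, Y_s)^T$, $Z = (Z_1, \dots, Z_s)^T$ and

$$
M(\kappa, Y, Z) = \kappa I - (A \otimes I)
\begin{pmatrix} g_z(Y_1, Z_1) & & 0 \\ & \ddots & \\ 0 & & g_z(Y_s, Z_s) \end{pmatrix}. \tag{3.51}
$$

Whenever $\|Y_i - \eta\| \le d$ and $\|Z_i - \zeta\| \le d$ for all i, we have

$$
M(\kappa, Y, Z) = \kappa I - A \otimes g_z(\eta, \zeta) + O(d) \tag{3.52}
$$

and it follows from (3.46) that $M^{-1}(\kappa, Y, Z)$ is uniformly bounded for $\kappa \ge 0$, if
d is sufficiently small. Hence, the inverse of the matrix in (3.50) satisfies

$$
\begin{pmatrix} I + O(h) & O(h) \\ O(1) & M(\varepsilon/h, Y, Z) \end{pmatrix}^{-1}
= \begin{pmatrix} I + O(h) & O(h) \\ O(1) & O(1) \end{pmatrix}
$$

and the statement (3.48) follows from the fact that

$$
\hat Y - Y = \int_0^1 \dot Y(\tau)\, d\tau, \qquad \hat Z - Z = \int_0^1 \dot Z(\tau)\, d\tau. \qquad \square
$$

Remark 3.7. If the Runge-Kutta matrix A is only assumed to be invertible, the
results of Theorems 3.5 and 3.6 still hold for $\varepsilon \le Kh$, where K is any constant
smaller than the modulus of the smallest eigenvalue of A (i.e., $K < |\lambda_{\min}|$). In this
situation, the right-hand side of (3.46) is also bounded, and the same conclusions
hold.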


Estimation of the Remainder in the Numerical Solution 


We are now in the position to estimate the remainder in (3.29). The result is the
following.

Theorem 3.8 (Hairer, Lubich & Roche 1988). Consider the stiff problem (3.1),
(3.11) with initial values y(0), z(0) admitting a smooth solution. Apply the Runge-Kutta
method (3.22)-(3.24) of classical order p and stage order q ($1 \le q < p$).
Assume that the method is A-stable, that the stability function satisfies $|R(\infty)| < 1$,
and that the eigenvalues of the coefficient matrix A have positive real part. Then
for any fixed constant c > 0 the global error satisfies for $\varepsilon \le ch$ and $\nu \le q + 1$

$$
\begin{aligned}
y_n - y(x_n) &= \Delta y_n^0 + \varepsilon \Delta y_n^1 + \dots + \varepsilon^\nu \Delta y_n^\nu + O(\varepsilon^{\nu+1}) \\
z_n - z(x_n) &= \Delta z_n^0 + \varepsilon \Delta z_n^1 + \dots + \varepsilon^\nu \Delta z_n^\nu + O(\varepsilon^{\nu+1}/h).
\end{aligned}
\tag{3.53}
$$

Here $\Delta y_n^0 = y_n^0 - y_0(x_n)$, $\Delta z_n^0 = z_n^0 - z_0(x_n)$, ... (see Formula (3.32)) are the
global errors of the method applied to the system (3.4). The estimates (3.53) hold
uniformly for $h \le h_0$ and $nh \le \mathrm{Const}$.

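Proof. By Theorem 3.4 it suffices to prove the result for $\nu = q + 1$. We denote the
truncated series of (3.25) by

$$
\begin{aligned}
\hat y_n &= y_n^0 + \varepsilon y_n^1 + \dots + \varepsilon^\nu y_n^\nu \\
\hat Y_{ni} &= Y_{ni}^0 + \varepsilon Y_{ni}^1 + \dots + \varepsilon^\nu Y_{ni}^\nu
\end{aligned}
\tag{3.54}
$$

and similarly $\hat z_n$, $\hat Z_{ni}$, $\hat k_{ni}$, $\hat \ell_{ni}$. Further we denote

$$
\Delta y_n = \hat y_n - y_n, \qquad \Delta Y_{ni} = \hat Y_{ni} - Y_{ni}, \qquad \Delta k_{ni} = \hat k_{ni} - k_{ni}, \quad \dots \tag{3.55}
$$

Using (3.3) and Theorem 3.4 the statement (3.53) is then equivalent to

$$
\Delta y_n = O(\varepsilon^{\nu+1}), \qquad \Delta z_n = O(\varepsilon^{\nu+1}/h). \tag{3.56}
$$

a) We first estimate the differences $\Delta Y_{ni}$, $\Delta Z_{ni}$ of the internal stages. For this
we investigate the defect when (3.54) is inserted into (3.23). By our construction
(3.28) it follows from (3.42) and $\nu = q + 1$ that

$$
\begin{aligned}
\hat k_{ni} &= f(\hat Y_{ni}, \hat Z_{ni}) + O(\varepsilon^{\nu+1}) \\
\varepsilon \hat \ell_{ni} &= g(\hat Y_{ni}, \hat Z_{ni}) + \varepsilon^{\nu+1} \ell_{ni}^\nu + O(\varepsilon^{\nu+1}).
\end{aligned}
\tag{3.57}
$$

From (3.42) and (3.27) we know that $\ell_{ni}^\nu = O(h^{-1})$. Together with (3.27) this
implies

$$
\begin{pmatrix} \hat Y_{ni} - \hat y_n \\ \varepsilon (\hat Z_{ni} - \hat z_n) \end{pmatrix}
= h \sum_{j=1}^s a_{ij} \begin{pmatrix} f(\hat Y_{nj}, \hat Z_{nj}) \\ g(\hat Y_{nj}, \hat Z_{nj}) \end{pmatrix}
+ \begin{pmatrix} O(h \varepsilon^{\nu+1}) \\ O(\varepsilon^{\nu+1}) \end{pmatrix} \tag{3.58}
$$

which is of the form (3.47). Application of Theorem 3.6 yields

$$
\begin{aligned}
\|\Delta Y_{ni}\| &\le C \bigl( \|\Delta y_n\| + \varepsilon \|\Delta z_n\| \bigr) + O(\varepsilon^{\nu+1}) \\
\|\Delta Z_{ni}\| &\le C \Bigl( \|\Delta y_n\| + \frac{\varepsilon}{h} \|\Delta z_n\| \Bigr) + O(\varepsilon^{\nu+1}/h)
\end{aligned}
\tag{3.59}
$$

provided that $\Delta y_n$ and $\Delta z_n$ are of size $O(h)$. This will be justified in part (c).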
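b) Our next aim is to prove the recursion

$$
\begin{pmatrix} \|\Delta y_{n+1}\| \\ \|\Delta z_{n+1}\| \end{pmatrix}
\le \begin{pmatrix} 1 + O(h) & O(\varepsilon) \\ O(1) & \alpha + O(\varepsilon) \end{pmatrix}
\begin{pmatrix} \|\Delta y_n\| \\ \|\Delta z_n\| \end{pmatrix}
+ \begin{pmatrix} O(\varepsilon^{\nu+1}) \\ O(\varepsilon^{\nu+1}/h) \end{pmatrix} \tag{3.60}
$$

where we assume again that $\Delta y_n$ and $\Delta z_n$ are of size $O(h)$. The value $\alpha < 1$
will be given in Formula (3.63) below. The upper relation of (3.60) follows from

$$
\Delta y_{n+1} = \Delta y_n + h \sum_{i=1}^s b_i \bigl( f(\hat Y_{ni}, \hat Z_{ni}) - f(Y_{ni}, Z_{ni}) \bigr) + O(h \varepsilon^{\nu+1})
$$

by the use of (3.59) and a Lipschitz condition for f.

For the verification of the second relation in (3.60) we subtract (3.57) from
(3.23), and use (3.59) and (3.42) to obtain

$$
\varepsilon \Delta \ell_{ni} = g_z(x_n)\, \Delta Z_{ni} + O\bigl( \|\Delta Y_{ni}\| + h \|\Delta Z_{ni}\| \bigr) + O(\varepsilon^{\nu+1}/h). \tag{3.61}
$$

Here we use the notation $g_z(x) = g_z(y_0(x), z_0(x))$. Inserting
$\Delta Z_{ni} = \Delta z_n + h \sum_j a_{ij} \Delta \ell_{nj}$ into this relation and using (3.59) again we obtain

$$
\varepsilon \Delta \ell_{ni} - h \sum_{j=1}^s a_{ij}\, g_z(x_n)\, \Delta \ell_{nj}
= g_z(x_n)\, \Delta z_n + O\bigl( \|\Delta y_n\| + \varepsilon \|\Delta z_n\| \bigr) + O(\varepsilon^{\nu+1}/h).
$$

We now solve for $h \Delta \ell_{ni}$ and insert it into $\Delta z_{n+1} = \Delta z_n + h \sum_i b_i \Delta \ell_{ni}$. Since the
matrix $(\varepsilon/h) I - A \otimes g_z(x_n)$ has a bounded inverse by (3.46), this gives

$$
\Delta z_{n+1} = R\Bigl( \frac{h}{\varepsilon}\, g_z(x_n) \Bigr) \Delta z_n + O\bigl( \|\Delta y_n\| + \varepsilon \|\Delta z_n\| \bigr) + O(\varepsilon^{\nu+1}/h), \tag{3.62}
$$

where $R(\mu)$ is the stability function of the method. Because of (3.11) we can apply
von Neumann's theorem (Corollary IV.11.4) to estimate

$$
\Bigl\| R\Bigl( \frac{h}{\varepsilon}\, g_z(x_n) \Bigr) \Bigr\|
\le \sup \bigl\{ |R(\mu)| \; ; \; \operatorname{Re} \mu \le -h/\varepsilon \bigr\} \le \alpha < 1. \tag{3.63}
$$

The bound $\alpha$ is strictly smaller than 1, because $|R(\infty)| < 1$ and $-h/\varepsilon \le -1/c < 0$.
The triangle inequality applied to (3.62) completes the proof of Formula (3.60).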

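c) Applying Lemma 3.9 below to the difference inequality (3.60) gives

$$
\Delta y_n = O(\varepsilon^{\nu+1}/h), \qquad \Delta z_n = O(\varepsilon^{\nu+1}/h) \tag{3.64}
$$

for $nh \le \mathrm{Const}$. We are now in a position to justify the assumption $\Delta y_n = O(h)$
and $\Delta z_n = O(h)$ of the beginning of the proof. Indeed, this follows by induction
on n ($\Delta y_0 = O(\varepsilon^{\nu+1})$, $\Delta z_0 = O(\varepsilon^{\nu+1})$) and from (3.64), because $\nu = q + 1 \ge 2$.

d) Formula (3.64) proves the desired result (3.56) for the z-component. However,
the estimate (3.64) is not yet optimal for the y-component. The proof for the
correct estimate is similar to that of Theorem 3.4. We have to treat more carefully
the expression which gives rise to the $O(\varepsilon^{\nu+1}/h)$ term in (3.61). Using (3.59) and
(3.64) the same calculations which gave (3.61), now yield

$$
\Delta k_{ni} = f_y(x_n)\, \Delta Y_{ni} + f_z(x_n)\, \Delta Z_{ni} + O(\varepsilon^{\nu+1}) \tag{3.65a}
$$

$$
\varepsilon \Delta \ell_{ni} = g_y(x_n)\, \Delta Y_{ni} + g_z(x_n)\, \Delta Z_{ni} + \varepsilon^{\nu+1} \ell_{ni}^\nu + O(\varepsilon^{\nu+1}). \tag{3.65b}
$$

We compute $\Delta Z_{ni}$ from (3.65b) and insert it into (3.65a). This gives

$$
\Delta k_{ni} - (f_z g_z^{-1})(x_n) \bigl( \varepsilon \Delta \ell_{ni} - \varepsilon^{\nu+1} \ell_{ni}^\nu \bigr)
= (f_y - f_z g_z^{-1} g_y)(x_n)\, \Delta Y_{ni} + O(\varepsilon^{\nu+1}). \tag{3.66}
$$

Guided by this formula we put

$$
\Delta u_n = \Delta y_n - (f_z g_z^{-1})(x_n) \bigl( \varepsilon \Delta z_n - \varepsilon^{\nu+1} z_n^\nu \bigr). \tag{3.67}
$$

Since

$$
\begin{aligned}
\Delta u_{n+1} = \Delta u_n &+ h \sum_{i=1}^s b_i \Bigl( \Delta k_{ni} - (f_z g_z^{-1})(x_n) \bigl[ \varepsilon \Delta \ell_{ni} - \varepsilon^{\nu+1} \ell_{ni}^\nu \bigr] \Bigr) \\
&- \bigl( (f_z g_z^{-1})(x_n + h) - (f_z g_z^{-1})(x_n) \bigr) \bigl( \varepsilon \Delta z_{n+1} - \varepsilon^{\nu+1} z_{n+1}^\nu \bigr)
\end{aligned}
$$

it follows from (3.66), (3.64), and (3.42) that

$$
\|\Delta u_{n+1}\| \le (1 + Ch)\, \|\Delta u_n\| + O(h \varepsilon^{\nu+1}). \tag{3.68}
$$

As in the proof of Theorem 3.4 we deduce $\Delta u_n = O(\varepsilon^{\nu+1})$ and $\Delta y_n = O(\varepsilon^{\nu+1})$.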

□ 
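
The contraction factor $\alpha$ of (3.63) can be estimated numerically for a given stability function. The sketch below samples $|R(\mu)|$ on the boundary line $\operatorname{Re}\mu = -1/c$ of the half-plane; it assumes the 2-stage Radau IIA method, whose stability function is the (1,2) Padé approximation to the exponential, and the function names, the sampling window `t_max` and the grid size are illustrative choices rather than part of the text.

```python
def R_radau2(mu):
    """Stability function of the 2-stage Radau IIA method."""
    return (1 + mu / 3) / (1 - 2 * mu / 3 + mu * mu / 6)

def alpha_estimate(R, c, t_max=200.0, n=4001):
    """Largest sampled value of |R(mu)| on the line Re(mu) = -1/c.

    R is analytic in the left half-plane, so by the maximum principle the
    supremum over Re(mu) <= -1/c is approached on this line (or at infinity).
    """
    best = 0.0
    for k in range(n):
        t = -t_max + 2.0 * t_max * k / (n - 1)  # odd n puts t = 0 on the grid
        best = max(best, abs(R(complex(-1.0 / c, t))))
    return best

alpha = alpha_estimate(R_radau2, c=1.0)
print(alpha)
```

A finite sample only gives a lower estimate of the supremum, so the printed value should be read as an indication that $\alpha$ is well below 1 for this method and $c = 1$, not as a certified bound.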


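In the above proof we used the following result.

Lemma 3.9. Let $\{u_n\}$, $\{v_n\}$ be two sequences of non-negative numbers satisfying
(componentwise)

$$
\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix}
\le \begin{pmatrix} 1 + O(h) & O(\varepsilon) \\ O(1) & \alpha + O(\varepsilon) \end{pmatrix}
\begin{pmatrix} u_n \\ v_n \end{pmatrix}
+ M \begin{pmatrix} h \\ 1 \end{pmatrix} \tag{3.69}
$$

with $0 \le \alpha < 1$ and $M \ge 0$. Then the following estimates hold for $\varepsilon \le ch$, $h \le h_0$
and $nh \le \mathrm{Const}$

$$
\begin{aligned}
u_n &\le C \bigl( u_0 + \varepsilon v_0 + M \bigr) \\
v_n &\le C \bigl( u_0 + (\varepsilon + \alpha^n) v_0 + M \bigr).
\end{aligned}
\tag{3.70}
$$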


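Proof. We transform the matrix in (3.69) to diagonal form and so obtain

$$
\begin{pmatrix} u_n \\ v_n \end{pmatrix}
\le T \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix} T^{-1}
\begin{pmatrix} u_0 \\ v_0 \end{pmatrix}
+ M \sum_{j=1}^{n} T \begin{pmatrix} \lambda_1^{n-j} & 0 \\ 0 & \lambda_2^{n-j} \end{pmatrix} T^{-1}
\begin{pmatrix} h \\ 1 \end{pmatrix}
$$

where $\lambda_1 = 1 + O(h)$, $\lambda_2 = \alpha + O(\varepsilon)$ are the eigenvalues and the transformation
matrix T (composed of eigenvectors) satisfies

$$
T = \begin{pmatrix} 1 & O(\varepsilon) \\ O(1) & 1 \end{pmatrix}.
$$

The statement now follows from the fact that $(\alpha + O(\varepsilon))^n = O(\alpha^n) + O(\varepsilon)$ for
$\varepsilon \le ch$ and $nh \le \mathrm{Const}$. □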
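
The content of Lemma 3.9 can be illustrated by iterating the recursion (3.69) with equality. In the sketch below every $O(\cdot)$ constant is set to 1 and the parameter values are arbitrary; both are assumptions made only for the illustration. The code reports the smallest constants for which the bounds (3.70) hold along the computed trajectory.

```python
def iterate(u0, v0, h, eps, alpha, M, steps):
    """Iterate (3.69) with equality and all O(.) constants equal to 1."""
    u, v = u0, v0
    history = [(u, v)]
    for _ in range(steps):
        # Both right-hand sides use the old (u, v): tuple assignment.
        u, v = (1 + h) * u + eps * v + M * h, u + (alpha + eps) * v + M
        history.append((u, v))
    return history

h, eps, alpha, M = 0.01, 0.001, 0.5, 1e-6  # eps <= c*h with c = 0.1
u0, v0 = 0.0, 1.0
hist = iterate(u0, v0, h, eps, alpha, M, steps=100)  # n*h <= 1

# Smallest constants C for which (3.70) holds along this trajectory.
C_u = max(u / (u0 + eps * v0 + M) for u, v in hist)
C_v = max(v / (u0 + (eps + alpha ** n) * v0 + M) for n, (u, v) in enumerate(hist))
print(C_u, C_v)
```

Although $v_0 = 1$ is large, it enters $u_n$ only through the factor $\varepsilon$, and $v_n$ forgets it at the geometric rate $\alpha^n$; this is the mechanism used in part (c) of the proof of Theorem 3.8.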


By combining Theorems 3.3, 3.4 and 3.8 we get the following result.

Corollary 3.10 (Hairer, Lubich & Roche 1988). Under the assumptions of Theorem 3.8
the global error of a Runge-Kutta method satisfies

$$
y_n - y(x_n) = O(h^p) + O(\varepsilon h^{q+1}), \qquad z_n - z(x_n) = O(h^{q+1}). \tag{3.71}
$$

If in addition $a_{si} = b_i$ for all i, we have

$$
z_n - z(x_n) = O(h^p) + O(\varepsilon h^q). \tag{3.72}
$$

Remarks. a) If the A-stability assumption is dropped and the coefficient matrix A
is only assumed to be invertible, then the estimates of Corollary 3.10 still hold for
$\varepsilon \le Kh$ where K is a method-dependent constant (see Remark 3.7).

b) A-stability and the invertibility of the matrix A imply in general that the
eigenvalues of A have positive real part. Otherwise the stability function would
have to be reducible.

c) For several Runge-Kutta methods satisfying $a_{si} = b_i$ the estimate (3.71) for
the y-component can be improved. E.g., for Radau IIA and for Lobatto IIIC one
has $y_n - y(x_n) = O(h^p) + O(\varepsilon^2 h^q)$. This follows from Table VII.4.1 below.

d) A completely different proof of the estimates (3.71) is given by Nipp &
Stoffer (1995). They show that the Runge-Kutta method, considered as a discrete
dynamical system, admits an attractive invariant manifold $\mathcal{M}_{h,\varepsilon}$, which is close to
the invariant manifold $\mathcal{M}_\varepsilon$ of the problem (3.1). Studying the closeness of the two
manifolds, they obtain the error estimates (3.71) without considering $\varepsilon$-expansions.

e) The analogues of Theorem 3.8 and Corollary 3.10 for Rosenbrock methods
are given in Hairer, Lubich & Roche (1989).

f) Estimates for $p = q$ are given in Exercise 3 below.


Numerical Confirmation 

The estimates of Corollary 3.10 can be observed numerically. As an example of
(3.1) we choose the van der Pol equation

$$
\begin{aligned}
y' &= z \\
\varepsilon z' &= (1 - y^2) z - y
\end{aligned}
\tag{3.73}
$$

with $\varepsilon = 10^{-5}$ and initial values

$$
y(0) = 2, \qquad z(0) = -0.6666654321121172 \tag{3.74}
$$

on the smooth solution (Exercise 2). 
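
A small version of such an experiment can be run with the implicit Euler method (Radau IIA with s = 1). The sketch below is not the code used for the experiments reported here: the function names are illustrative, and the reference value is taken from the reduced problem $y' = y/(1-y^2)$, $y(0) = 2$, whose solution satisfies $\ln y - y^2/2 = x + \ln 2 - 2$ and agrees with the smooth solution of (3.73) only up to $O(\varepsilon)$. Since $p = q = 1$ for this method, Exercise 3 suggests an error of size $O(h)$.

```python
import math

EPS = 1e-5

def implicit_euler_step(y0, z0, h, eps=EPS):
    """One implicit Euler step for y' = z, eps*z' = (1 - y^2) z - y.

    Eliminating y1 = y0 + h*z1 leaves a scalar equation F(z1) = 0,
    which is solved by Newton's method started at z0.
    """
    z1 = z0
    for _ in range(50):
        y1 = y0 + h * z1
        F = eps * (z1 - z0) - h * ((1 - y1 * y1) * z1 - y1)
        dF = eps - h * ((1 - y1 * y1) - 2 * y1 * h * z1 - h)
        dz = F / dF
        z1 -= dz
        if abs(dz) < 1e-15:
            break
    return y0 + h * z1, z1

def integrate(h, x_end=0.5):
    n = round(x_end / h)
    y, z = 2.0, -0.6666654321121172  # initial values (3.74)
    for _ in range(n):
        y, z = implicit_euler_step(y, z, x_end / n)
    return y, z

def reduced_reference(x_end=0.5):
    """y(x_end) for the reduced problem, by bisection on ln y - y^2/2."""
    target = x_end + math.log(2.0) - 2.0
    lo, hi = 1.1, 2.0  # ln y - y^2/2 is decreasing for y > 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - 0.5 * mid * mid > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y_ref = reduced_reference()
errors = {h: abs(integrate(h)[0] - y_ref) for h in (0.01, 0.005, 0.0025)}
for h, e in errors.items():
    print(f"h = {h:<7} y-error = {e:.3e}")
```

All step sizes used are much larger than $\varepsilon$, which is the regime of Theorem 3.5; the Newton iteration nevertheless converges because the derivative of the scalar equation is dominated by $-h\,g_z \approx 3h$. Halving h should roughly halve the printed errors.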

Table 3.1 shows the methods of our experiment together with the theoretical
error bounds. In Fig. 3.1 we have plotted the relative global error at $x_{\mathrm{end}} = 0.5$
as a function of the step size h, which was taken constant over the considered
interval. The use of logarithmic scales in both directions makes the curves appear
as straight lines of slope r, whenever the leading term of the global error behaves
like $\mathrm{Const} \cdot h^r$. The figures show complete agreement with our theoretical results.


Table 3.1. Global errors predicted by Corollary 3.10

Method             $a_{si} = b_i$   y-comp.                                  z-comp.
Radau IA           no               $h^{2s-1} + \varepsilon h^{s}$           $h^{s}$
Radau IIA          yes              $h^{2s-1} + \varepsilon^2 h^{s}$         $h^{2s-1} + \varepsilon h^{s}$
Lobatto IIIC       yes              $h^{2s-2} + \varepsilon^2 h^{s-1}$       $h^{2s-2} + \varepsilon h^{s-1}$
SDIRK (IV.6.16)    yes              $h^{4} + \varepsilon h^{2}$              $h^{4} + \varepsilon h$
SDIRK (IV.6.18)    no               $h^{4} + \varepsilon h^{2}$              $h^{2}$


[Figure: log-log plots of the global error against the step size h for Radau IA (s = 2, 3, 4),
Radau IIA (s = 1, 2, 3), Lobatto IIIC (s = 2, 3, 4) and the SDIRK methods (IV.6.16) and (IV.6.18),
with separate panels for the y-component and the z-component.]

Fig. 3.1. Global error versus the step size


Perturbed Initial Values

When integrating a singular perturbation problem, the numerical solution approximates
the smooth solution only within the given tolerance Tol. It is therefore
interesting to investigate the influence of perturbations in the initial values on the
global and local errors of the method. Let us begin with a numerical experiment.
We perturb the z(0) value of (3.74) by an amount of $10^{-6}$ and apply the Radau
IIA methods to the problem (3.73). For the global error at $x_{\mathrm{end}} = 0.5$ we obtain exactly
the same results as in Fig. 3.1. This shows that the perturbation is completely
damped out during integration. The results for the local error show a different behaviour
and are displayed in Fig. 3.2. We observe the presence of a "hump", exactly
as in Fig. IV.7.4 and in Fig. IV.8.2.



In order to explain this phenomenon we denote by $(\hat y_0, \hat z_0)$ the considered
initial value, and by $(\hat y_1, \hat z_1)$ the numerical solution after one step with step size
h. The exact solution $\hat y(x)$, $\hat z(x)$ passing through $(\hat y_0, \hat z_0)$ will have a boundary
layer, and (under suitable assumptions, see Theorem 3.2) can be written as

$$
\hat y(x) = y(x) + O(\varepsilon e^{-x/\varepsilon}), \qquad \hat z(x) = z(x) + O(e^{-x/\varepsilon}). \tag{3.75}
$$

Here $y(x)$, $z(x)$ represents a smooth solution of (3.1). We denote by $y_0 = y(0)$,
$z_0 = z(0)$ the initial values on this smooth solution, and by $(y_1, z_1)$ the numerical
approximation obtained by the same method with step size h and initial values
$(y_0, z_0)$. The local error can now be written as

$$
\hat z_1 - \hat z(h) = (\hat z_1 - z_1) + (z_1 - z(h)) + (z(h) - \hat z(h)) \tag{3.76}
$$

and similarly for the y-component. The last term in (3.76), which is of size
$O(\mathrm{Tol} \cdot e^{-h/\varepsilon})$, can be neglected if the step size h is significantly larger than $\varepsilon$.
The term $z_1 - z(h)$ represents the local error in the "smooth" situation and is
bounded by at least $O(h^{q+1})$ (apply Corollary 3.10 with n = 1). It can be observed
in Fig. 3.2 whenever h or the error is large. The difference $\hat z_1 - z_1$ is the
term which causes the irregularity in Fig. 3.2. Using Theorem 3.6 (with $\delta = 0$,
$\theta = 0$, $\hat\eta - \eta = O(\varepsilon \cdot \mathrm{Tol})$, $\hat\zeta - \zeta = O(\mathrm{Tol})$) and the ideas of the proof of
Theorem 3.8 (in particular Eq. (3.62)) we obtain

$$
\begin{aligned}
\hat z_1 - z_1 &= R\Bigl( \frac{h}{\varepsilon}\, g_z(0) \Bigr) (\hat z_0 - z_0) + O(\varepsilon \cdot \mathrm{Tol}) \\
\hat y_1 - y_1 &= O(\varepsilon \cdot \mathrm{Tol}).
\end{aligned}
\tag{3.77}
$$

For $\varepsilon \ll h$ we develop

$$
R\Bigl( \frac{h}{\varepsilon}\, g_z(0) \Bigr) = R(\infty) + C\, \frac{\varepsilon}{h}\, g_z^{-1}(0) + O\Bigl( \Bigl( \frac{\varepsilon}{h} \Bigr)^2 \Bigr). \tag{3.78}
$$

This shows that an h-independent expression $R(\infty)(\hat z_0 - z_0) = O(\mathrm{Tol})$ will be
observed in the local error, if $R(\infty) \ne 0$. For methods with $R(\infty) = 0$ (such as
Radau IIA) the dominant part in $\hat z_1 - z_1$ is $C(\varepsilon/h)\, g_z^{-1}(0)(\hat z_0 - z_0) = O(\mathrm{Tol} \cdot \varepsilon/h)$.
This term can be observed in Fig. 3.2 as a straight line of slope $-1$. Thus in this
region the local error increases like $h^{-1}$ when h decreases. A similar perturbation,
multiplied however by $\varepsilon$, is observed for the y-component.

This is not a serious drawback for a numerical implementation, because the
phenomenon appears only for step sizes where the local error is smaller than Tol.
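
The slope $-1$ predicted by (3.78) for methods with $R(\infty) = 0$ can be checked directly on the stability function. The sketch below assumes $g_z(0) = 1 - y^2 = -3$ (the value for problem (3.73) at y = 2) and uses the stability functions of the implicit Euler method and of the 2-stage Radau IIA method; the function names are illustrative.

```python
def R_implicit_euler(mu):
    return 1.0 / (1.0 - mu)

def R_radau2(mu):
    # (1,2) Pade approximation to exp(mu), the 2-stage Radau IIA function
    return (1.0 + mu / 3.0) / (1.0 - 2.0 * mu / 3.0 + mu * mu / 6.0)

eps, g_z = 1e-5, -3.0  # g_z = 1 - y^2 at y = 2

def scaled_damping(R, h):
    """R(h*g_z/eps) multiplied by h/eps; tends to C/g_z if R(inf) = 0."""
    return R(h / eps * g_z) * h / eps

for h in (1e-1, 1e-2, 1e-3):
    print(f"h = {h:.0e}",
          f"implicit Euler: {scaled_damping(R_implicit_euler, h): .5f}",
          f"Radau IIA (s=2): {scaled_damping(R_radau2, h): .5f}")
```

For both methods the scaled quantity is nearly independent of h, so the damping factor itself behaves like $\mathrm{Const} \cdot \varepsilon/h$. The limits correspond to $C = -1$ for implicit Euler and $C = 2$ for the 2-stage Radau IIA method in the notation of (3.78).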


Exercises 


1. Prove that the statement of Theorem 3.2 remains valid, if the assumption (3.11)
is replaced by

   the eigenvalues $\lambda$ of $g_z(y, z)$ satisfy $\operatorname{Re} \lambda \le -1$

   for all y, z in a neighbourhood of the solution $y_0(x)$, $z_0(x)$ of the reduced
   system.

   Hint. Split the interval into a finite number of small subintervals and construct
   for each of them an inner product norm such that, after a rescaling of $\varepsilon$, (3.11)
   holds (see Nevanlinna 1976).


2. Let y(0) = 2; find the corresponding z(0) for the van der Pol equation (3.73),
such that its solution is smooth.

   Result.

$$
z(0) = -\frac{2}{3} + \frac{10}{81}\, \varepsilon - \frac{292}{2187}\, \varepsilon^2 - \frac{1814}{19683}\, \varepsilon^3 + O(\varepsilon^4).
$$


3. If the assumption q < p (p classical order, q stage order) is dropped in Corollary 3.10,
we still have

$$
y_n - y(x_n) = O(h^p), \qquad z_n - z(x_n) = O(h^p).
$$

   Prove this statement. The implicit Euler method and the SIRK methods of
   Lemma IV.8.1 are typical examples with $p = q$.

   Hint. Apply Corollary 3.10 with q reduced by 1.



VI.4 Rosenbrock Methods 


This section is devoted to the extension of Rosenbrock methods (see Sect. IV.7) to
differential-algebraic equations in semi-explicit form

$$
y' = f(y, z), \qquad y(x_0) = y_0 \tag{4.1a}
$$

$$
0 = g(y, z), \qquad z(x_0) = z_0. \tag{4.1b}
$$

We suppose that $g_z$ is invertible (see (1.7)), so that the problem is of index 1.
We shall obtain new methods for the numerical solution of such problems, and
at the same time get more insight into the behaviour of Rosenbrock methods for
stiff differential equations. In particular, the phenomenon of Fig. IV.7.4 will be
explained.


Definition of the Method 


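The main advantage of Rosenbrock methods over implicit Runge-Kutta methods
is that nonlinear systems are completely avoided. The state space form method
(transforming (4.1) to $y' = f(y, G(y))$) would destroy this advantage. This is one
more reason for considering the $\varepsilon$-embedding method. For the problem (1.5) a
Rosenbrock method reads

$$
\begin{pmatrix} k_i \\ \ell_i \end{pmatrix}
= h \begin{pmatrix} f(v_i, w_i) \\ \varepsilon^{-1} g(v_i, w_i) \end{pmatrix}
+ h \begin{pmatrix} f_y & f_z \\ \varepsilon^{-1} g_y & \varepsilon^{-1} g_z \end{pmatrix} (y_0, z_0)
\sum_{j=1}^{i} \gamma_{ij} \begin{pmatrix} k_j \\ \ell_j \end{pmatrix} \tag{4.2}
$$

$$
\begin{pmatrix} v_i \\ w_i \end{pmatrix}
= \begin{pmatrix} y_0 \\ z_0 \end{pmatrix} + \sum_{j=1}^{i-1} \alpha_{ij} \begin{pmatrix} k_j \\ \ell_j \end{pmatrix},
\qquad
\begin{pmatrix} y_1 \\ z_1 \end{pmatrix}
= \begin{pmatrix} y_0 \\ z_0 \end{pmatrix} + \sum_{i=1}^{s} b_i \begin{pmatrix} k_i \\ \ell_i \end{pmatrix}. \tag{4.3a}
$$

If we multiply the second line of (4.2) by $\varepsilon$ and then put $\varepsilon = 0$ we obtain

$$
\begin{pmatrix} k_i \\ 0 \end{pmatrix}
= h \begin{pmatrix} f(v_i, w_i) \\ g(v_i, w_i) \end{pmatrix}
+ h \begin{pmatrix} f_y & f_z \\ g_y & g_z \end{pmatrix} (y_0, z_0)
\sum_{j=1}^{i} \gamma_{ij} \begin{pmatrix} k_j \\ \ell_j \end{pmatrix}. \tag{4.3b}
$$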

Formulas (4.3a) and (4.3b) together constitute the extension of a Rosenbrock method
to the problem (4.1). This type of method was first considered by Michelsen (1976)
(quoted by Feng, Holland & Gallun (1984)). Further studies are due to Roche
(1988). We remark that the computation of $(k_i, \ell_i)$ from (4.3b) requires the solution
of a linear system with matrix

$$
\begin{pmatrix} I - \gamma h f_y & -\gamma h f_z \\ -\gamma h g_y & -\gamma h g_z \end{pmatrix} \tag{4.4}
$$

where all derivatives are evaluated at $(y_0, z_0)$. For nonsingular $g_z$, nonzero $\gamma$, and
small enough h > 0, this matrix is invertible. This can be seen by dividing the
lower blocks by $\gamma h$ and then putting h = 0.
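
To make the linear system with matrix (4.4) concrete, the following sketch applies the simplest instance of (4.3), namely s = 1 with $\gamma_{11} = \gamma = 1$ and $b_1 = 1$ (a linearly implicit Euler step), to the index 1 problem $y' = z$, $0 = (1 - y^2) z - y$. This test problem, the one-stage coefficients and all function names are choices made for this illustration and do not come from the section. The reference value uses the closed-form relation $\ln y - y^2/2 = x + \ln 2 - 2$ satisfied by the exact solution with y(0) = 2.

```python
import math

def f(y, z):
    return z

def g(y, z):
    return (1 - y * y) * z - y

def rosenbrock_step(y0, z0, h, gamma=1.0):
    """One step of (4.3) with s = 1, gamma_11 = gamma, b_1 = 1."""
    fy, fz = 0.0, 1.0                        # partial derivatives of f
    gy, gz = -2 * y0 * z0 - 1, 1 - y0 * y0   # partial derivatives of g
    # Matrix (4.4) and right-hand side of (4.3b)
    a11, a12 = 1 - gamma * h * fy, -gamma * h * fz
    a21, a22 = -gamma * h * gy, -gamma * h * gz
    r1, r2 = h * f(y0, z0), h * g(y0, z0)
    det = a11 * a22 - a12 * a21
    k = (r1 * a22 - a12 * r2) / det          # Cramer's rule
    l = (a11 * r2 - a21 * r1) / det
    return y0 + k, z0 + l

def integrate(h, x_end=0.5):
    n = round(x_end / h)
    y, z = 2.0, -2.0 / 3.0                   # consistent: g(2, -2/3) = 0
    for _ in range(n):
        y, z = rosenbrock_step(y, z, x_end / n)
    return y, z

def reference_y(x_end=0.5):
    target = x_end + math.log(2.0) - 2.0
    lo, hi = 1.1, 2.0                        # ln y - y^2/2 decreases for y > 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - 0.5 * mid * mid > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y_ref = reference_y()
err = {h: abs(integrate(h)[0] - y_ref) for h in (0.01, 0.005)}
print(err)
```

Only one 2 x 2 linear solve per step is needed and no nonlinear iteration, which is the advantage mentioned above. With $\gamma = 1$ the second row of (4.3b) acts as one linearized Newton correction towards the constraint $g = 0$, so the computed $z_n$ stays close to the constraint manifold.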


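Non-autonomous equations. If the functions f and g in (4.1) also depend on x,
we replace (4.3b) by

$$
\begin{pmatrix} k_i \\ 0 \end{pmatrix}
= h \begin{pmatrix} f(x_0 + \alpha_i h, v_i, w_i) \\ g(x_0 + \alpha_i h, v_i, w_i) \end{pmatrix}
+ h \begin{pmatrix} f_y & f_z \\ g_y & g_z \end{pmatrix}
\sum_{j=1}^{i} \gamma_{ij} \begin{pmatrix} k_j \\ \ell_j \end{pmatrix}
+ \gamma_i h^2 \begin{pmatrix} f_x \\ g_x \end{pmatrix} \tag{4.5}
$$

(compare with (IV.7.4a) and recall the definition of $\alpha_i$ and $\gamma_i$ in (IV.7.5)). All
derivatives are evaluated at the initial value $(x_0, y_0, z_0)$.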


Problems of the form $Mu' = \varphi(u)$. Rosenbrock formulas for these problems have
been developed in Sect. IV.7 (Formula (IV.7.4b)) in the case of regular M. This
formula is also applicable for singular M, and can be justified as follows: It is
theoretically possible to apply the transformation (1.20) so that M becomes the
block-diagonal matrix with entries I and 0. The method (IV.7.4b) is then identical
to method (4.3). Therefore, the theory to be developed in this section will also be
valid for Rosenbrock method (IV.7.4b) applied to index 1 problems of the form
$Mu' = \varphi(u)$.


Having introduced a new class of methods, we must study their order conditions.
As usual, this is done by Taylor expansion of both the exact and the numerical
solution (similar to Section II.2). A nice correspondence between the order conditions
and certain rooted trees with two different kinds of vertices will be obtained
(Roche 1988).


Derivatives of the Exact Solution 


In contrast to Sect. II.2, where we used "hordes of indices" (see Dieudonné's preface
to his "Foundations of Modern Analysis") to show us the way through the
"woud met bomen" (Hundsdorfer), we here write higher derivatives as multilinear
mappings. For example, the expression

$$
\sum_{j,k} \frac{\partial^2 g}{\partial y_j\, \partial z_k}\, u_j v_k \qquad \text{is written as} \qquad g_{yz}(u, v),
$$

which simplifies the subsequent formulas.

We differentiate (4.1b) to obtain $0 = g_y \cdot y' + g_z \cdot z'$ and, equivalently,

$$
z' = (-g_z^{-1})\, g_y f. \tag{4.6}
$$

We now differentiate successively (4.1a) and (4.6) with respect to x. We use the
formula

$$
(-g_z^{-1})' u = (-g_z^{-1}) \Bigl( g_{zy}\bigl( (-g_z^{-1}) u, f \bigr) + g_{zz}\bigl( (-g_z^{-1}) u, (-g_z^{-1}) g_y f \bigr) \Bigr) \tag{4.7}
$$

which is a consequence of $(A^{-1}(x))' = -A^{-1}(x) A'(x) A^{-1}(x)$ and the chain rule.
This gives

$$
y'' = f_y \cdot y' + f_z \cdot z' = f_y f + f_z (-g_z^{-1}) g_y f \tag{4.8}
$$

$$
\begin{aligned}
z'' = {} & (-g_z^{-1}) \Bigl( g_{zy}\bigl( (-g_z^{-1}) g_y f, f \bigr) + g_{zz}\bigl( (-g_z^{-1}) g_y f, (-g_z^{-1}) g_y f \bigr) \Bigr) \\
& + (-g_z^{-1}) \Bigl( g_{yy}(f, f) + g_{yz}\bigl( f, (-g_z^{-1}) g_y f \bigr) \Bigr) \\
& + (-g_z^{-1})\, g_y \bigl( f_y f + f_z (-g_z^{-1}) g_y f \bigr).
\end{aligned}
\tag{4.9}
$$

Clearly, these expressions soon become very complicated and a graphical representation
of the terms in (4.8) and (4.9) is desirable.
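
Formulas (4.6), (4.8) and (4.9) can be checked on a scalar example in exact arithmetic. The sketch below takes $f = z$ and $g = (1 - y^2) z - y$ at the consistent point $(y, z) = (2, -2/3)$; this example and the variable names are chosen for illustration only. For scalar y and z the multilinear maps reduce to ordinary products, and the result is compared with the direct differentiation of $z = Z(y) = y/(1 - y^2)$ along $y' = Z(y)$.

```python
from fractions import Fraction as F

y, z = F(2), F(-2, 3)  # consistent point: g(y, z) = 0

# f = z and g = (1 - y^2) z - y with their partial derivatives
f, f_y, f_z = z, F(0), F(1)
g_y, g_z = -2 * y * z - 1, 1 - y * y
g_yy, g_yz, g_zz = -2 * z, -2 * y, F(0)
A = -1 / g_z  # the factor (-g_z^{-1})

z1 = A * g_y * f                                  # (4.6)
y2 = f_y * f + f_z * z1                           # (4.8)
z2 = (A * (g_yz * z1 * f + g_zz * z1 * z1)        # (4.9), first line
      + A * (g_yy * f * f + g_yz * f * z1)        # (4.9), second line
      + A * g_y * y2)                             # (4.9), third line

# Independent check through z = Z(y) = y / (1 - y^2) and y' = Z(y)
w = 1 - y * y
Z, dZ, d2Z = y / w, (1 + y * y) / w**2, (6 * y + 2 * y**3) / w**3
assert z1 == dZ * Z
assert z2 == d2Z * Z * Z + dZ * dZ * Z
print(z1, z2)  # -10/27 -2/3
```

The first and fourth terms of (4.9) give the same contribution here, $A\, g_{yz}\, z' f$, in line with the symmetry $g_{zy}(u, v) = g_{yz}(v, u)$.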


Trees and Elementary Differentials 

We shall identify each occurring f with a meagre vertex, and each of its derivatives
with an upward leaving branch. The expression $(-g_z^{-1}) g$ is identified with
a fat vertex. The derivatives of g therein are again indicated by upwards leaving
branches. For example, the second expression of (4.8) and the first one of (4.9)
correspond to the trees in Fig. 4.1.

The above formulas for $y'$, $z'$, $y''$, $z''$ thus become

[Formula (4.10): $y'$, $z'$, $y''$ and $z''$ written as sums of trees.]

The first and fourth expressions in (4.9) are identical, because $g_{zy}(u, v) = g_{yz}(v, u)$.
This is in nice accordance with the fact that the corresponding trees are topologically
equivalent. The lowest vertex of a tree will be called its root.

We see that derivatives of y are characterized by trees with a meagre root.
These trees will be denoted by t or $t_i$, the tree consisting only of the root (for $y'$)
being $\tau_y$. Derivatives of z have trees with a fat root. These will be written as u or
$u_i$, the tree for $z'$ being $\tau_z$.

Fig. 4.1. Graphical representation of elementary differentials


Definition 4.1. Let $DAT = DAT_y \cup DAT_z$ denote the set of (differential algebraic
rooted) trees defined recursively by

a) $\tau_y \in DAT_y$, $\tau_z \in DAT_z$;

b) $[t_1, \dots, t_m, u_1, \dots, u_n]_y \in DAT_y$
   if $t_1, \dots, t_m \in DAT_y$ and $u_1, \dots, u_n \in DAT_z$;

c) $[t_1, \dots, t_m, u_1, \dots, u_n]_z \in DAT_z$
   if $t_1, \dots, t_m \in DAT_y$, $u_1, \dots, u_n \in DAT_z$, and $(m, n) \ne (0, 1)$.

Here $[t_1, \dots, t_m, u_1, \dots, u_n]_y$ and $[t_1, \dots, t_m, u_1, \dots, u_n]_z$ represent unordered
(m + n)-tuples.

The graphical representation of these trees is as follows: if we connect the
roots of $t_1, \dots, t_m, u_1, \dots, u_n$ by m + n branches to a new meagre vertex (the
new root) we obtain $[t_1, \dots, t_m, u_1, \dots, u_n]_y$; if we connect them to a new fat
vertex we obtain $[t_1, \dots, t_m, u_1, \dots, u_n]_z$. For example, the two trees of Fig. 4.1
can be written as $[\tau_z]_y$ and $[\tau_z, \tau_y]_z$.

Definition 4.2. The order of a tree $t \in DAT_y$ or $u \in DAT_z$, denoted by $\varrho(t)$ or
$\varrho(u)$, is the number of its meagre vertices.

We see in (4.10) that this definition of order coincides with the derivative order
of $y^{(i)}$ or $z^{(i)}$ as far as they are computed there.
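
Definitions 4.1 and 4.2 translate directly into a small enumeration procedure. The sketch below represents a tree as a pair (kind, children), where kind is "y" for a meagre root and "z" for a fat root and children is a sorted tuple, so that unordered tuples compare equal. This data layout and the function names are illustrative assumptions, and the brute-force search is only meant for small orders.

```python
from functools import lru_cache
from itertools import combinations_with_replacement

TAU_Y = ("y", ())          # single meagre vertex
TAU_Z = ("z", (TAU_Y,))    # fat root carrying one meagre vertex

def order(tree):
    """Number of meagre vertices (Definition 4.2)."""
    kind, children = tree
    return (1 if kind == "y" else 0) + sum(order(c) for c in children)

@lru_cache(maxsize=None)
def trees(kind, q):
    """All trees of DAT_y (kind 'y') or DAT_z (kind 'z') of order q >= 1."""
    pool = [t for r in range(1, q) for k in "yz" for t in trees(k, r)]
    if kind == "y":
        target = q - 1                 # the meagre root itself counts 1
    else:
        pool += list(trees("y", q))    # a fat root adds nothing to the order
        target = q
    found = set()
    for size in range(target + 1):
        for combo in combinations_with_replacement(sorted(pool), size):
            if sum(order(c) for c in combo) != target:
                continue
            if kind == "z" and (not combo or
                                (len(combo) == 1 and combo[0][0] == "z")):
                continue               # no bare fat vertex, (m, n) != (0, 1)
            found.add((kind, tuple(sorted(combo))))
    return tuple(sorted(found))

for q in (1, 2, 3):
    print(q, len(trees("y", q)), len(trees("z", q)))
```

For order 2 the enumeration returns two trees in $DAT_y$ and five in $DAT_z$, matching the two terms of (4.8) and the six terms of (4.9), two of which coincide. The two trees of Fig. 4.1 appear as `("y", (TAU_Z,))` and `("z", (TAU_Y, TAU_Z))`.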

We next give a recursive definition of the one-to-one correspondence between
the trees in (4.10) and the expressions in (4.8) and (4.9).

Definition 4.3. The elementary differentials F(t) (or F(u)) corresponding to trees
in DAT are defined as follows:

a) $F(\tau_y) = f$, $F(\tau_z) = (-g_z^{-1})\, g_y f$;

b) $F(t) = \dfrac{\partial^{m+n} f}{\partial y^m\, \partial z^n} \bigl( F(t_1), \dots, F(t_m), F(u_1), \dots, F(u_n) \bigr)$
   if $t = [t_1, \dots, t_m, u_1, \dots, u_n]_y \in DAT_y$;

c) $F(u) = (-g_z^{-1})\, \dfrac{\partial^{m+n} g}{\partial y^m\, \partial z^n} \bigl( F(t_1), \dots, F(t_m), F(u_1), \dots, F(u_n) \bigr)$
   if $u = [t_1, \dots, t_m, u_1, \dots, u_n]_z \in DAT_z$.

Because of the symmetry of partial derivatives, this definition is unaffected by a
permutation of $t_1, \dots, t_m, u_1, \dots, u_n$ and therefore the functions F(t) and F(u)
are well defined.


Taylor Expansion of the Exact Solution

In order to get more insight into the process of (4.8) and (4.9) we study the differentiation
of an elementary differential with respect to x. By Leibniz' rule the differentiation
of F(t) (or F(u)) gives a sum of new elementary differentials which
are obtained by the following four rules:

i) attach to each vertex a branch with $\tau_y$ (derivative of f or g with respect to y
   and addition of the factor $y' = f$);

ii) attach to each vertex a branch with $\tau_z$ (derivative of f or g with respect to z
   and addition of the factor $z' = (-g_z^{-1}) g_y f$);

iii) split each fat vertex into two new fat vertices (linked by a new branch) and
   attach to the lower of these fat vertices a branch with $\tau_y$;

iv) as in (iii) split each fat vertex into two new fat vertices, but attach this time to
   the lower of the new fat vertices a branch with $\tau_z$.

The rules (iii) and (iv) correspond to the differentiation of $(-g_z^{-1})$ and follow at
once from (4.7). We observe that the differentiation of a tree of order q (or, more
precisely, of its corresponding elementary differential) generates trees of order
q + 1.

As was the case in Sect. II.2, some of these trees appear several times in the
derivative (as the first and fourth tree for $z''$ in (4.10)). In order to distinguish all
these trees, we indicate the order of generation of the meagre vertices by labels.
This is demonstrated, for the first derivatives of y, in Fig. 4.2. Since in the above
differentiation process the new meagre vertex is always an end-vertex of the tree,
the labelling thus obtained is necessarily increasing from the root upwards along
each branch.


Fig. 4.2. Monotonically labelled trees ($LDAT_y$)


Definition 4.4. A tree $t \in DAT_y$ (or $u \in DAT_z$) together with a monotonic labelling
of its meagre vertices is called a monotonically labelled tree. The sets of all such
monotonically labelled trees are denoted by $LDAT_y$, $LDAT_z$ and $LDAT$.

Definition 4.2 (order of a tree) and Definition 4.3 (elementary differential) are
extended in a natural way to monotonically labelled trees. We can therefore write
the derivatives of the exact solution as follows:

Theorem 4.5 (Roche 1988). For the exact solution of (4.1) we have:

$$
\begin{aligned}
y^{(q)}(x_0) &= \sum_{t \in LDAT_y,\ \varrho(t) = q} F(t)(y_0, z_0)
 = \sum_{t \in DAT_y,\ \varrho(t) = q} \alpha(t)\, F(t)(y_0, z_0) \\
z^{(q)}(x_0) &= \sum_{u \in LDAT_z,\ \varrho(u) = q} F(u)(y_0, z_0)
 = \sum_{u \in DAT_z,\ \varrho(u) = q} \alpha(u)\, F(u)(y_0, z_0).
\end{aligned}
$$

The integer coefficients $\alpha(t)$ and $\alpha(u)$ indicate the number of possible monotonic
labellings of a tree.

Proof. For q = 1 and q = 2 this is just (4.1a), (4.6), (4.8) and (4.9). For general
q the above differentiation process of trees generates all elements of LDAT, each
element exactly once. If the sum is taken over $DAT_y$ and $DAT_z$, the factors $\alpha(t)$
and $\alpha(u)$ must be added. □


Taylor Expansion of the Numerical Solution 

Our next aim is to prove an analogue of Theorem 4.5 for the numerical solution 
of a Rosenbrock method. We consider $y_1$, $z_1$ as functions of the step size $h$ and
compute their derivatives. From (4.3a) it follows that

$$y_1^{(q)}(0) = \sum_{i=1}^{s} b_i\, k_i^{(q)}(0), \qquad z_1^{(q)}(0) = \sum_{i=1}^{s} b_i\, \ell_i^{(q)}(0). \tag{4.11}$$

Consequently we have to compute the derivatives of $k_i$ and $\ell_i$. This is done as for
Runge-Kutta methods (Sect. II.2) or for Rosenbrock methods applied to ordinary 
differential equations (Sect. IV.7). 

We differentiate the first line of (4.3b) with respect to $h$. Using Leibniz' rule
(II.2.4) this yields for $h = 0$

$$k_i^{(q)} = q\,\bigl(f(v_i,w_i)\bigr)^{(q-1)} + q\,(f_y)_0 \sum_{j=1}^{i} \gamma_{ij}\, k_j^{(q-1)} + q\,(f_z)_0 \sum_{j=1}^{i} \gamma_{ij}\, \ell_j^{(q-1)}. \tag{4.12}$$

The index 0 in $(f_y)_0$ and $(f_z)_0$ indicates that the derivatives are evaluated at
$(y_0, z_0)$. The second line of (4.3b) is divided by $h$ before differentiation. This
gives (again for $h = 0$)

$$0 = \bigl(g(v_i,w_i)\bigr)^{(q)} + (g_y)_0 \sum_{j=1}^{i} \gamma_{ij}\, k_j^{(q)} + (g_z)_0 \sum_{j=1}^{i} \gamma_{ij}\, \ell_j^{(q)}. \tag{4.13}$$

The derivatives of $f$ and $g$ can be computed by Faà di Bruno's formula (Lemma II.2.8).
This yields

$$\bigl(f(v_i,w_i)\bigr)^{(q-1)} = \sum \frac{\partial^{m+n} f(v_i,w_i)}{\partial y^m\, \partial z^n} \Bigl(v_i^{(\mu_1)},\dots,v_i^{(\mu_m)},\, w_i^{(\nu_1)},\dots,w_i^{(\nu_n)}\Bigr) \tag{4.14}$$

where the sum is over all "special $LDAT_y$'s" of order $q$. These are monotonically
labelled trees $[t_1,\dots,t_m,u_1,\dots,u_n]_y$ where $t_j$ and $u_j$ do not have any ramification
and all their vertices are meagre with the exception of the roots of $u_1,\dots,u_n$.
The integers $\mu_j$ and $\nu_j$ are the orders of $t_j$ and $u_j$, respectively. They satisfy
$\mu_1 + \dots + \mu_m + \nu_1 + \dots + \nu_n = q-1$. Similarly we apply Faà di Bruno's formula
to $g$ and obtain

$$\bigl(g(v_i,w_i)\bigr)^{(q)} = \sum \frac{\partial^{m+n} g(v_i,w_i)}{\partial y^m\, \partial z^n} \Bigl(v_i^{(\mu_1)},\dots,v_i^{(\mu_m)},\, w_i^{(\nu_1)},\dots,w_i^{(\nu_n)}\Bigr) + g_z(v_i,w_i)\, w_i^{(q)}. \tag{4.15}$$

Here the sum is over all "special $LDAT_z$'s" of order $q$. They are defined as above
but have a fat vertex. The integers $\mu_j$, $\nu_j$ satisfy $\mu_1 + \dots + \mu_m + \nu_1 + \dots + \nu_n = q$.
The term with $g_z$ is written separately, because (by the definition of $LDAT_z$) $[u_1]_z$
is not an admissible tree.

We are now in a position to compute the derivatives of $k_i$ and $\ell_i$. For this it is
convenient to introduce the notation

$$\beta_{ij} = \alpha_{ij} + \gamma_{ij} \tag{4.16}$$

(with $\alpha_{ii} = 0$) as in (IV.7.12). We also need the inverse of the matrix $(\beta_{ij})$, whose
coefficients we denote by $\omega_{ij}$:

$$(\omega_{ij}) = (\beta_{ij})^{-1}. \tag{4.17}$$

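Theorem 4.6. The derivatives of $k_i$ and $\ell_i$ satisfy

$$k_i^{(q)}(0) = \sum_{t \in LDAT_y,\ \varrho(t)=q} \gamma(t)\, \Phi_i(t)\, F(t)(y_0,z_0), \qquad
\ell_i^{(q)}(0) = \sum_{u \in LDAT_z,\ \varrho(u)=q} \gamma(u)\, \Phi_i(u)\, F(u)(y_0,z_0), \tag{4.18}$$

where the coefficients $\Phi_i(t)$ and $\Phi_i(u)$ are given by $\Phi_i(\tau_y) = 1$, $\Phi_i(\tau_z) = 1$ and

$$\Phi_i(t) = \begin{cases}
\sum \alpha_{i\mu_1}\cdots\alpha_{i\mu_m}\,\alpha_{i\nu_1}\cdots\alpha_{i\nu_n}\,\Phi_{\mu_1}(t_1)\cdots\Phi_{\mu_m}(t_m)\,\Phi_{\nu_1}(u_1)\cdots\Phi_{\nu_n}(u_n)
 & \text{if } t = [t_1,\dots,t_m,u_1,\dots,u_n]_y \text{ and } m+n \ge 2, \\
\sum_j \beta_{ij}\,\Phi_j(t_1) & \text{if } t = [t_1]_y, \\
\sum_j \beta_{ij}\,\Phi_j(u_1) & \text{if } t = [u_1]_y,
\end{cases}$$

$$\Phi_i(u) = \begin{cases}
\sum \omega_{ij}\,\alpha_{j\mu_1}\cdots\alpha_{j\mu_m}\,\alpha_{j\nu_1}\cdots\alpha_{j\nu_n}\,\Phi_{\mu_1}(t_1)\cdots\Phi_{\mu_m}(t_m)\,\Phi_{\nu_1}(u_1)\cdots\Phi_{\nu_n}(u_n)
 & \text{if } u = [t_1,\dots,t_m,u_1,\dots,u_n]_z \text{ and } m+n \ge 2, \\
\Phi_i(t_1) & \text{if } u = [t_1]_z,
\end{cases}$$

(the sums run over all indices other than $i$), and the integer coefficients $\gamma(t)$ and
$\gamma(u)$ are defined by $\gamma(\tau_y) = 1$, $\gamma(\tau_z) = 1$ and

$$\gamma(t) = \varrho(t)\,\gamma(t_1)\cdots\gamma(t_m)\,\gamma(u_1)\cdots\gamma(u_n) \quad \text{if } t = [t_1,\dots,t_m,u_1,\dots,u_n]_y,$$

$$\gamma(u) = \gamma(t_1)\cdots\gamma(t_m)\,\gamma(u_1)\cdots\gamma(u_n) \quad \text{if } u = [t_1,\dots,t_m,u_1,\dots,u_n]_z.$$
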

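Proof. By (4.3a) we have

$$v_i^{(q)}\big|_{h=0} = \sum_{j=1}^{i-1} \alpha_{ij}\, k_j^{(q)}, \qquad w_i^{(q)}\big|_{h=0} = \sum_{j=1}^{i-1} \alpha_{ij}\, \ell_j^{(q)}. \tag{4.19}$$

We now insert (4.19) into (4.14) and the resulting formula for $(f(v_i,w_i))^{(q-1)}$ into
(4.12). This yields (all expressions have to be evaluated at $h = 0$)

$$k_i^{(q)} = q \sum_{m+n \ge 2} \frac{\partial^{m+n} f(y_0,z_0)}{\partial y^m\, \partial z^n}
\Bigl(\sum_{j=1}^{i-1} \alpha_{ij}\, k_j^{(\mu_1)}, \dots, \sum_{j=1}^{i-1} \alpha_{ij}\, \ell_j^{(\nu_n)}\Bigr)
+ q\,(f_y)_0 \sum_{j=1}^{i} \beta_{ij}\, k_j^{(q-1)} + q\,(f_z)_0 \sum_{j=1}^{i} \beta_{ij}\, \ell_j^{(q-1)}. \tag{4.20}$$

The same analysis for the second component leads to

$$0 = \sum_{m+n \ge 2} \frac{\partial^{m+n} g(y_0,z_0)}{\partial y^m\, \partial z^n}
\Bigl(\sum_{j=1}^{i-1} \alpha_{ij}\, k_j^{(\mu_1)}, \dots, \sum_{j=1}^{i-1} \alpha_{ij}\, \ell_j^{(\nu_n)}\Bigr)
+ (g_y)_0 \sum_{j=1}^{i} \beta_{ij}\, k_j^{(q)} + (g_z)_0 \sum_{j=1}^{i} \beta_{ij}\, \ell_j^{(q)}. \tag{4.21}$$

The sums in (4.20) and (4.21) are over elements of $LDAT$ exactly as in (4.14) and
(4.15). Equation (4.21) allows us to extract $\ell_i^{(q)}$ if we use the inverse of $(\beta_{ij})$.
This gives

$$\ell_i^{(q)} = (-g_z)_0^{-1} \sum_{j=1}^{i} \omega_{ij} \sum_{m+n \ge 2} \frac{\partial^{m+n} g(y_0,z_0)}{\partial y^m\, \partial z^n}
\Bigl(\sum_{\kappa=1}^{j-1} \alpha_{j\kappa}\, k_\kappa^{(\mu_1)}, \dots, \sum_{\kappa=1}^{j-1} \alpha_{j\kappa}\, \ell_\kappa^{(\nu_n)}\Bigr)
+ \bigl((-g_z^{-1})\, g_y\bigr)_0\, k_i^{(q)}. \tag{4.22}$$

The proof of Formula (4.18) is now by induction on $q$. The case $q = 1$ follows
immediately from (4.12) and (4.13). For general $q$, we insert the induction hypothesis
into (4.20) and (4.22), exploit the multilinearity of the derivatives, and arrange
the summations as in the proof of Theorem II.2.11. □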
Finally, Eq. (4.11) yields the derivatives of the numerical solution.

Theorem 4.7 (Roche 1988). The numerical solution of (4.3) satisfies:

$$y_1^{(q)}\big|_{h=0} = \sum_{t \in LDAT_y,\ \varrho(t)=q} \gamma(t) \sum_{i=1}^{s} b_i\, \Phi_i(t)\, F(t)(y_0,z_0)$$

$$z_1^{(q)}\big|_{h=0} = \sum_{u \in LDAT_z,\ \varrho(u)=q} \gamma(u) \sum_{i=1}^{s} b_i\, \Phi_i(u)\, F(u)(y_0,z_0)$$

where the coefficients $\gamma$ and $\Phi_i$ are given in Theorem 4.6. □


Order Conditions 


Comparing Theorems 4.5 and 4.7 we obtain

Theorem 4.8. For the Rosenbrock method (4.3) we have:

$y(x_0+h) - y_1 = O(h^{p+1})$ iff

$$\sum_{i=1}^{s} b_i\, \Phi_i(t) = \frac{1}{\gamma(t)} \quad \text{for } t \in DAT_y,\ \varrho(t) \le p;$$

$z(x_0+h) - z_1 = O(h^{q+1})$ iff

$$\sum_{i=1}^{s} b_i\, \Phi_i(u) = \frac{1}{\gamma(u)} \quad \text{for } u \in DAT_z,\ \varrho(u) \le q,$$

where the coefficients $\Phi_i$ and $\gamma$ are those of Theorem 4.6. □


Repeated application of the recursive definition of $\Phi_i$ in Theorem 4.6 yields
the following algorithm:

Forming the Order Condition for a Given Tree: attach to each meagre vertex one
summation index, and to each fat vertex two indices (one above the other). Then
the left-hand side of the order condition is a sum over all indices of a product with
factors

$b_i$ if "$i$" is the index of the root (the lower index if the root is fat);

$\alpha_{ij}$ if "$j$" lies directly above "$i$" and "$i$" is multiply branched;

$\beta_{ij}$ if "$j$" lies directly above "$i$" and "$i$" is singly branched;

$\omega_{ij}$ if "$i, j$" are the two indices of a fat vertex ("$i$" below "$j$").
As an example, we present the order conditions for the first two trees of Fig. 4.3.

$$\sum_{i,j,k,\ell,m} b_i\, \alpha_{ij}\, \alpha_{ik}\, \omega_{k\ell}\, \beta_{\ell m} = \frac{1}{3} \tag{4.23}$$

$$\sum_{i,j,k,\ell,m,n,p} b_i\, \omega_{ij}\, \alpha_{jk}\, \alpha_{j\ell}\, \omega_{\ell m}\, \alpha_{mn}\, \alpha_{mp} = 1 \tag{4.24}$$

The condition (4.23) can be further simplified if we use the fact that $(\omega_{ij})$ is the
inverse of the matrix $(\beta_{ij})$. Indeed, since $\sum_\ell \omega_{k\ell}\beta_{\ell m} = \delta_{km}$, (4.23) is equivalent to

$$\sum_{i,j,k} b_i\, \alpha_{ij}\, \alpha_{ik} = \frac{1}{3}$$

which is the order condition for the third tree in Fig. 4.3. Exploiting this reduction
systematically we arrive at the following result.

Lemma 4.9. For a Rosenbrock method (4.3) the order conditions corresponding to
one of the following situations are redundant:

a) a fat vertex is singly branched . 

b) a singly branched vertex is followed by a fat vertex. □ 
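
The reduction of (4.23) to the condition for the third tree can be checked numerically. The following Python sketch uses exact rational arithmetic and a made-up 3-stage coefficient set; the numbers are purely illustrative (they do not come from any published method), and the helper name `inverse_lower` is an assumption of this example.

```python
from fractions import Fraction as Fr

def inverse_lower(B):
    """Invert a lower-triangular matrix (non-zero diagonal) column by column."""
    n = len(B)
    W = [[Fr(0)] * n for _ in range(n)]
    for col in range(n):
        for row in range(col, n):
            rhs = Fr(1) if row == col else Fr(0)
            s = sum(B[row][k] * W[k][col] for k in range(col, row))
            W[row][col] = (rhs - s) / B[row][row]
    return W

# Hypothetical coefficients: alpha strictly lower triangular,
# gamma lower triangular with constant diagonal 1/4.
s = 3
alpha = [[Fr(0), Fr(0), Fr(0)],
         [Fr(1, 2), Fr(0), Fr(0)],
         [Fr(1, 3), Fr(2, 3), Fr(0)]]
gamma = [[Fr(1, 4), Fr(0), Fr(0)],
         [Fr(-1, 5), Fr(1, 4), Fr(0)],
         [Fr(1, 7), Fr(1, 9), Fr(1, 4)]]
b = [Fr(1, 6), Fr(1, 3), Fr(1, 2)]

beta = [[alpha[i][j] + gamma[i][j] for j in range(s)] for i in range(s)]  # (4.16)
omega = inverse_lower(beta)                                               # (4.17)

R = range(s)
# Left-hand side of (4.23): sum of b_i a_ij a_ik w_kl beta_lm over all indices
lhs = sum(b[i] * alpha[i][j] * alpha[i][k] * omega[k][l] * beta[l][m]
          for i in R for j in R for k in R for l in R for m in R)
# Left-hand side of the reduced condition: sum of b_i a_ij a_ik
reduced = sum(b[i] * alpha[i][j] * alpha[i][k]
              for i in R for j in R for k in R)
```

Both sums agree for any coefficient set with invertible $(\beta_{ij})$, which is why the tree with a singly branched fat vertex contributes no new condition.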



The subset of $DAT_y$ which consists of trees with only meagre vertices is simply
$T$ (the set of trees of Sect. II.2). The corresponding order conditions are those
given in Sect. IV.7. Consequently, a $p$-th order Rosenbrock method has to satisfy
all "classical" order conditions and, in addition, several "algebraic" conditions.
The first of these new order conditions are given in Table 4.1. We have included
the polynomial $p_t(\gamma)$ in its last column, which is the right-hand side of the order
condition, when written in the form (IV.7.11').


Convergence 

Before we proceed to the actual construction of a new Rosenbrock method, we still
have to study its convergence property. The following result will also involve

$$R(\infty) = 1 - b^T B^{-1} \mathbb{1} = 1 - \sum_{i,j} b_i\, \omega_{ij} \tag{4.25}$$

where $R(z)$ is the stability function (IV.7.14).
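Table 4.1. Trees and elementary differentials

$\varrho(t)$   tree       $\gamma(t)$   $\Phi_j(t)$                                                                          $p_t(\gamma)$

4              $t_{45}$   4             $\sum \alpha_{jk}\alpha_{j\ell}\omega_{\ell m}\alpha_{mn}\alpha_{mp}$                1/4
2              $u_{21}$   1             $\sum \omega_{jk}\alpha_{k\ell}\alpha_{km}$                                          1
3              $u_{31}$   1             $\sum \omega_{jk}\alpha_{k\ell}\alpha_{km}\alpha_{kn}$                               1
3              $u_{32}$   2             $\sum \omega_{jk}\alpha_{k\ell}\alpha_{km}\beta_{mn}$                                $1/2 - \gamma$
3              $u_{33}$   1             $\sum \omega_{jk}\alpha_{k\ell}\alpha_{km}\omega_{mn}\alpha_{np}\alpha_{nq}$         1
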


We denote the local error of the Rosenbrock method (4.3) by

$$\delta y_h(x) = y_1 - y(x+h), \qquad \delta z_h(x) = z_1 - z(x+h). \tag{4.26}$$

Here $y_1$, $z_1$ is the numerical solution obtained with the exact initial values $y_0 = y(x)$,
$z_0 = z(x)$.

Theorem 4.10. Suppose that $g_z$ is regular in a neighbourhood of the solution
$(y(x), z(x))$ of (4.1) and that the initial values $(y_0, z_0)$ are consistent. If the
stability function is such that $|R(\infty)| < 1$, and the local error satisfies

$$\delta y_h(x) = O(h^{p+1}), \qquad \delta z_h(x) = O(h^{p}), \tag{4.27}$$

then the Rosenbrock method (4.3) is convergent of order $p$; i.e.,

$$y_n - y(x_n) = O(h^p), \qquad z_n - z(x_n) = O(h^p) \qquad \text{for } x_n - x_0 = nh \le \mathrm{Const}.$$


Proof. Since $g_z$ is regular we have

$$\bigl\| g_z^{-1}(y,z)\, g(y,z) \bigr\| \le \delta \tag{4.28}$$

for $(y,z)$ in a compact neighbourhood $U$ of the solution. The $h$-independent value
of $\delta$ can be made arbitrarily small by shrinking $U$. We also suppose for the moment
that the numerical solution and all its internal stages remain in this neighbourhood.
The propagation of local errors will be studied in part (a), and their accumulation
over the whole interval in part (b).

a) We consider two pairs of initial values, $(y_0, z_0)$ and $(\hat y_0, \hat z_0)$, and apply the
method to each (these values may be inconsistent, but they are assumed to lie in
$U$). We shall prove that

$$\begin{aligned}
\|\hat y_1 - y_1\| &\le (1 + hL)\,\|\hat y_0 - y_0\| + hM\,\|\hat z_0 - z_0\| \\
\|\hat z_1 - z_1\| &\le N\,\|\hat y_0 - y_0\| + \kappa\,\|\hat z_0 - z_0\|
\end{aligned} \tag{4.29}$$

where $\kappa < 1$. For this we fix a sufficiently small step size $h$, and consider $y_1$, $z_1$,
$k_i$, $\ell_i$ as functions of $(y_0, z_0)$. We shall show that

$$\frac{\partial y_1}{\partial y_0} = I + O(h), \qquad \frac{\partial y_1}{\partial z_0} = O(h), \qquad
\frac{\partial z_1}{\partial y_0} = O(1), \qquad \frac{\partial z_1}{\partial z_0} = R(\infty)\, I + O(h + \delta). \tag{4.30}$$

The mean value theorem then implies (4.29).

We first estimate $k_i$ and $\ell_i$, defined in (4.3b). Using (4.28) we compute $\ell_i$
from the second line and insert it into the first one. This yields successively $k_i = O(h)$
and $\ell_i = O(h + \delta)$ for all internal stages. We then differentiate (4.3b) once
with respect to $y_0$ and once with respect to $z_0$. An analysis similar to that for $k_i$
and $\ell_i$ yields

$$\frac{\partial k_i}{\partial y_0} = O(h), \qquad \frac{\partial k_i}{\partial z_0} = O(h), \qquad
\frac{\partial \ell_i}{\partial y_0} = O(1), \qquad \frac{\partial \ell_i}{\partial z_0} = -\sum_j \omega_{ij}\, I + O(h + \delta) \tag{4.31}$$

and the estimates (4.30) follow from (4.3a) and (4.25).

b) As a consequence of Lemma 3.9 (see Exercise 8), the propagation of the
local errors $\delta y_h(x_{j-1})$, $\delta z_h(x_{j-1})$ to the solution at $x_n$ can be bounded by

$$C\bigl( \|\delta y_h(x_{j-1})\| + (h + \kappa^{n-j})\, \|\delta z_h(x_{j-1})\| \bigr). \tag{4.32}$$

Summing up these terms from $j = 1$ to $j = n$ and using (4.27) gives the stated
bounds for the global error, because $\sum_{j=1}^{n} (h + \kappa^{n-j}) \le \mathrm{Const}$.

Our assumption that the numerical solution and the internal stages lie in $U$ can
now easily be justified by induction on the step number. The numerical solution
remains $O(h^p)$-close to the exact solution and thus remains in $U$ for sufficiently
small $h$. This implies $g(y_j, z_j) = O(h^p)$ for all $j$ and hence also $\ell_i = O(h)$.
Consequently $(v_i, w_i)$ are also as close to the exact solution as we want. □


Stiffly Accurate Rosenbrock Methods 

We have already had several occasions to admire the beneficial effect of stiffly
accurate Runge-Kutta methods (methods with $a_{si} = b_i$ for all $i$; see Theorem 1.1
and Corollary 3.10). What is the corresponding condition for Rosenbrock methods?

Definition 4.11. A Rosenbrock method is called stiffly accurate, if

$$\alpha_{si} + \gamma_{si} = b_i \quad (i = 1,\dots,s) \qquad \text{and} \qquad \alpha_s = 1. \tag{4.33}$$

Recall that $\alpha_i = \sum_j \alpha_{ij}$. It has already been remarked at the end of Sect. IV.15
that methods satisfying (4.33) yield asymptotically exact results for the problem
$y' = \lambda(y - \varphi(x)) + \varphi'(x)$. A further interesting interpretation of this condition has
been given by C. Schneider (1991). He argues that DAE's are combinations of
differential equations and algebraic equations; hence methods should be equally
valuable for both extreme cases, either a purely differential equation, or a purely
algebraic equation

$$x' = 1, \qquad 0 = g(x,z), \qquad g_z \text{ invertible}. \tag{4.34}$$

Proposition 4.12. A stiffly accurate Rosenbrock method, applied to (4.34), yields

$$z_1 = w_s - g_z^{-1}(x_0, z_0) \cdot g(x_0 + h, w_s).$$

The numerical solution $z_1$ is thus the result of one simplified Newton iteration for
$0 = g(x_0 + h, z)$ (with starting value $w_s$).
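
To make the formula of Proposition 4.12 concrete, here is a minimal Python sketch of one simplified Newton step for a scalar algebraic equation. The test equation $0 = z^3 - x$, the step size, and the deliberately crude starting value `w_s` are assumptions chosen for illustration; in an actual method $w_s$ would come from the last internal stage.

```python
def simplified_newton_step(g, g_z, x0, z0, h, w_s):
    """Return w_s - g_z(x0, z0)^{-1} * g(x0 + h, w_s) for scalar z.

    The Jacobian is frozen at the initial point (x0, z0), which is what
    makes this a *simplified* Newton iteration.
    """
    return w_s - g(x0 + h, w_s) / g_z(x0, z0)

def g(x, z):
    return z**3 - x          # hypothetical algebraic equation 0 = z^3 - x

def g_z(x, z):
    return 3.0 * z**2        # partial derivative of g with respect to z

x0, z0, h = 1.0, 1.0, 0.1    # consistent initial values: g(x0, z0) = 0
w_s = z0                     # crude stand-in for the last stage value
z1 = simplified_newton_step(g, g_z, x0, z0, h, w_s)
exact = (x0 + h) ** (1.0 / 3.0)   # exact solution of 0 = z^3 - (x0 + h)
```

Even from this poor starting value, one step moves `z1` much closer to `exact` than `w_s` was; with an $O(h^q)$-accurate $w_s$ the gain is the extra order stated in Proposition 4.13.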

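Proof. Condition (4.33) together with $\sum_i b_i = 1$ implies that $\gamma_s = \sum_j \gamma_{sj} = 0$.
Therefore, the second line of (4.5) gives (observe that $k_i = h$ for the problem
(4.34))

$$0 = g(x_0 + h, w_s) + g_z(x_0, z_0) \sum_{j=1}^{s} \gamma_{sj}\, \ell_j.$$

Inserting the expression thus obtained for $\sum_j \gamma_{sj} \ell_j$ into

$$z_1 = z_0 + \sum_{j=1}^{s} b_j\, \ell_j = w_s + \sum_{j=1}^{s} \gamma_{sj}\, \ell_j$$

proves the statement. □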


The values $(v_s, w_s)$ of the last stage are often used as an embedded solution
for step size control. If this is the case for a stiffly accurate method, then many of
the algebraic order conditions are automatically satisfied. This is a consequence of
the following result.

Proposition 4.13. Consider a stiffly accurate Rosenbrock method. For sufficiently
regular problems (4.1) we have

$$z_1 - z(x_0 + h) = O(h^{q+1}) \tag{4.35}$$

if and only if

$$v_s - y(x_0 + h) = O(h^q) \qquad \text{and} \qquad w_s - z(x_0 + h) = O(h^q). \tag{4.36}$$

Proof. We use the characterization of Theorem 4.8 and the fact that (with $\omega_{ij}$
defined in (4.17))

$$\sum_i b_i\, \omega_{ij} = \begin{cases} 1 & \text{if } j = s \\ 0 & \text{else.} \end{cases} \tag{4.37}$$

Suppose first that (4.35) holds. For a tree $u = [\tau_y, t_2]_z$ with arbitrary $t_2 \in DAT_y$
we have, by definition of $\Phi_i(u)$ and $\gamma(u)$,

$$\sum_i b_i\, \Phi_i(u) = \sum_{i,j,k} b_i\, \omega_{ij}\, \alpha_j\, \alpha_{jk}\, \Phi_k(t_2) = \sum_k \alpha_{sk}\, \Phi_k(t_2) \tag{4.38}$$

and $\gamma(u) = \gamma(t_2)$. Consequently, the order condition is satisfied for $u$ iff it is
satisfied for $t_2$. Since $\varrho(t_2) = \varrho(u) - 1$, we see that $v_s - y(x_0+h) = O(h^q)$ is a
consequence of (4.35). By considering $u = [\tau_y, u_1]_z$ with $u_1 \in DAT_z$ we deduce
the second relation of (4.36). The "if" part is proved in a similar way. □


Finally we remark that because of (4.25) and (4.37) the stability function of a
stiffly accurate Rosenbrock method always satisfies $R(\infty) = 0$. This is a desirable
property when solving stiff or differential algebraic equations. 
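
The remark on $R(\infty)$ can be illustrated with a few lines of Python. The sketch below evaluates $R(\infty) = 1 - b^T B^{-1}\mathbb{1}$ from (4.25) by forward substitution in exact rational arithmetic; the 3-stage matrix `B` and the weights `b_other` are made-up numbers used only for illustration, and the function names are assumptions of this example.

```python
from fractions import Fraction as Fr

def solve_lower(B, rhs):
    """Solve B x = rhs for a lower-triangular B by forward substitution."""
    x = []
    for i, row in enumerate(B):
        s = sum(row[k] * x[k] for k in range(i))
        x.append((rhs[i] - s) / row[i])
    return x

def r_infinity(b, B):
    """R(infinity) = 1 - b^T B^{-1} 1, cf. (4.25), with B = (beta_ij)."""
    x = solve_lower(B, [Fr(1)] * len(B))
    return 1 - sum(bi * xi for bi, xi in zip(b, x))

# Hypothetical lower-triangular matrix (beta_ij) with diagonal gamma = 1/4.
B = [[Fr(1, 4), Fr(0), Fr(0)],
     [Fr(3, 10), Fr(1, 4), Fr(0)],
     [Fr(10, 21), Fr(7, 9), Fr(1, 4)]]

b_stiff = B[-1]                          # b_i = beta_si, as in (4.33)
b_other = [Fr(1, 6), Fr(1, 3), Fr(1, 2)] # weights not tied to the last row

r_stiff = r_infinity(b_stiff, B)
r_other = r_infinity(b_other, B)
```

When the weights equal the last row of $(\beta_{ij})$, the product $b^T B^{-1}$ is the last unit vector, so `r_stiff` vanishes exactly; for the unrelated weights `b_other` it does not.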


Construction of RODAS, a Stiffly Accurate Embedded Method 


We want to construct an embedded Rosenbrock method (where $\hat y_1 = v_s$, $\hat z_1 = w_s$),
such that both methods are stiffly accurate. This gives the following conditions

$$\begin{aligned}
b_i &= \beta_{si} \quad (i = 1,\dots,s), & \alpha_s &= 1 \\
\hat b_i &= \alpha_{si} = \beta_{s-1,i} \quad (i = 1,\dots,s-1), & \alpha_{s-1} &= 1
\end{aligned} \tag{4.39}$$

(as usual $\beta_{ij} = \alpha_{ij} + \gamma_{ij}$). It follows from Proposition 4.12 that the last two stages
represent simplified Newton iterations. Further, both methods have a stability function
which vanishes at infinity. The construction of such a method of order 4(3)
seems to be impossible with $s = 5$. We therefore put $s = 6$.

Here is the list of order conditions which have to be solved. We use the abbreviations
$\alpha_i$, $\beta_i'$ defined in (IV.7.16), and the coefficients $\omega_{ij}$ from (4.17). We shall
require that

$$y_1 - y(x_0 + h) = O(h^5), \qquad \hat y_1 - y(x_0 + h) = O(h^4). \tag{4.40}$$

Since we have sufficiently many parameters we also require

$$v_{s-1} - y(x_0 + h) = O(h^3), \qquad w_{s-1} - z(x_0 + h) = O(h^3). \tag{4.41}$$

By Proposition 4.13 this implies

$$\hat z_1 - z(x_0 + h) = O(h^4), \qquad z_1 - z(x_0 + h) = O(h^5), \tag{4.42}$$

which is more than sufficient to ensure convergence of order 4 (see Theorem 4.10).
The conditions for (4.40) and (4.41) are (see Table IV.7.1 and Table 4.1)


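$$b_1 + b_2 + b_3 + b_4 + (b_5 + b_6) = 1 \tag{4.43a}$$

$$b_2\beta_2' + b_3\beta_3' + b_4\beta_4' + (b_5 + b_6)(1 - \gamma) = \tfrac{1}{2} - \gamma \tag{4.43b}$$

$$b_2\alpha_2^2 + b_3\alpha_3^2 + b_4\alpha_4^2 + (b_5 + b_6) = \tfrac{1}{3} \tag{4.43c}$$

$$b_3\beta_{32}\beta_2' + b_4 \sum_i \beta_{4i}\beta_i' + (b_5 + b_6)\bigl(\tfrac{1}{2} - 2\gamma + \gamma^2\bigr) = \tfrac{1}{6} - \gamma + \gamma^2 \tag{4.43d}$$

$$b_2\alpha_2^3 + b_3\alpha_3^3 + b_4\alpha_4^3 + (b_5 + b_6) = \tfrac{1}{4} \tag{4.43e}$$

$$b_3\alpha_3\alpha_{32}\beta_2' + b_4\alpha_4 \sum_i \alpha_{4i}\beta_i' + (b_5 + b_6)\bigl(\tfrac{1}{2} - \gamma\bigr) = \tfrac{1}{8} - \tfrac{\gamma}{3} \tag{4.43f}$$

$$b_3\beta_{32}\alpha_2^2 + b_4 \sum_i \beta_{4i}\alpha_i^2 + (b_5 + b_6)\bigl(\tfrac{1}{3} - \gamma\bigr) = \tfrac{1}{12} - \tfrac{\gamma}{3} \tag{4.43g}$$

$$b_4\beta_{43}\beta_{32}\beta_2' + (b_5 + b_6)\bigl(\tfrac{1}{6} - \tfrac{3}{2}\gamma + 3\gamma^2 - \gamma^3\bigr) = \tfrac{1}{24} - \tfrac{\gamma}{2} + \tfrac{3}{2}\gamma^2 - \gamma^3 \tag{4.43h}$$

$$b_3\alpha_3\alpha_{32}\omega_{22}\alpha_2^2 + b_4\alpha_4 \sum_{i,j} \alpha_{4i}\omega_{ij}\alpha_j^2 + (b_5 + b_6) = \tfrac{1}{4} \tag{4.43i}$$

$$\alpha_{62}\beta_2' + \alpha_{63}\beta_3' + \alpha_{64}\beta_4' = \tfrac{1}{2} - 2\gamma + \gamma^2 \tag{4.43j}$$

$$\alpha_{62}\alpha_2^2 + \alpha_{63}\alpha_3^2 + \alpha_{64}\alpha_4^2 = \tfrac{1}{3} - \gamma \tag{4.43k}$$

$$\alpha_{63}\beta_{32}\beta_2' + \alpha_{64} \sum_i \beta_{4i}\beta_i' = \tfrac{1}{6} - \tfrac{3}{2}\gamma + 3\gamma^2 - \gamma^3 \tag{4.43l}$$

$$\alpha_{52}\beta_2' + \alpha_{53}\beta_3' + \alpha_{54}\beta_4' = \tfrac{1}{2} - \gamma \tag{4.43m}$$

$$\sum_{i=1}^{4} \alpha_{5i} \sum_{j=1}^{i} \omega_{ij}\alpha_j^2 = 1 \tag{4.43n}$$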

In order to solve the system (4.39), (4.43a-n) we can take $\gamma$, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\beta_2' = \beta_{21}$,
$\beta_3'$, $\beta_4'$ as free parameters. The remaining coefficients can then be computed
as follows:

Step 1. We have $b_6 = \gamma$ by (4.39). The remaining $b_i$ can be chosen such that
(4.43a,b,c,e) are satisfied. We have one degree of freedom which can be exploited
to fulfill the additional order condition $\sum_i b_i\alpha_i^4 = 1/5$. This step also yields
$\beta_{6i} = b_i$ for $i = 1,\dots,6$.

Step 2. Compute the two expressions $b_3\beta_{32} + b_4\beta_{42}$ and $b_4\beta_{43}$ from (4.43d,g),
and then $\beta_{32}$ from (4.43h). Because of $\beta_i' = \sum_{j=1}^{i-1} \beta_{ij}$ this determines all $\beta_{ij}$ with
$i \le 4$. Observe that $\beta_{ii} = \gamma$ for all $i$.

Step 3. Solve the linear system (4.43j,k,l) for $\alpha_{62}$, $\alpha_{63}$, $\alpha_{64}$. We have $\alpha_{65} = \gamma$
by (4.39) and compute $\alpha_{61}$ from $\alpha_6 = \sum_i \alpha_{6i} = 1$. This also yields $\beta_{5i} = \alpha_{6i}$ by
(4.39). Hence all $\beta_{ij}$ and $\omega_{ij}$, and also $\hat b_i = \beta_{5i}$ $(i = 1,\dots,5)$ are determined at
this stage.

Step 4. The conditions (4.43m,n) and $\alpha_5 = 1$ constitute 3 linear equations in
the four unknown parameters $\alpha_{51}$, $\alpha_{52}$, $\alpha_{53}$, $\alpha_{54}$. We have one degree of freedom
in this step.

Step 5. The remaining two conditions (4.43f,i) are linear equations in $\alpha_{32}$,
$\alpha_{42}$, $\alpha_{43}$. We have one more degree of freedom which can be exploited to fulfill the
order condition for the tree $[\tau_y, \tau_y, [\tau_y]_y]_y$. The values of $\alpha_{i1}$ are then determined
by $\alpha_i = \sum_{j=1}^{i-1} \alpha_{ij}$, and those of $\gamma_{ij}$ are given by $\beta_{ij} - \alpha_{ij}$.

The coefficients for the code RODAS of the appendix were computed with the
above procedure. In step 4 we have added the condition

$$\sum_{i,j} \alpha_{5i}\, \omega_{ij} = 1 \tag{4.44}$$

which will be explained in Exercise 3 below. The free parameters were chosen in
order to get an A-stable method with small error constants. The result is

$$\begin{aligned}
\gamma &= 0.25 \\
\alpha_2 &= 0.386, & \alpha_3 &= 0.21, & \alpha_4 &= 0.63 \\
\beta_2' &= 0.0317, & \beta_3' &= 0.0635, & \beta_4' &= 0.3438
\end{aligned} \tag{4.45}$$

We do not claim that these values are optimal. Nevertheless, the numerical results
of Sect. IV.10 (Fig. IV.10.8, IV.10.9 and IV.10.12) are encouraging. Although the
new method needs 6 function evaluations per step, it is in general superior to the
classical methods of Table IV.7.2 which need only 3 evaluations per step.
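
Step 1 of the construction can be sketched in a few lines of Python. The block below solves (4.43a,b,c,e) for $b_1,\dots,b_5$ with $b_6 = \gamma$ and the parameter values of (4.45), using exact rational arithmetic. Two points are assumptions of this sketch rather than statements of the text: the additional condition is taken to be the classical fifth-order condition $\sum_i b_i\alpha_i^4 = 1/5$, and the small Gauss-Jordan routine `solve` is a hypothetical helper. No claim is made that the printed digits of the RODAS code are reproduced.

```python
from fractions import Fraction as Fr

def solve(A, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

gamma = Fr(1, 4)
# alpha_1..alpha_5 and beta'_1..beta'_5; stages 5 and 6 have alpha = 1, beta' = 1 - gamma.
alpha = [Fr(0), Fr(386, 1000), Fr(21, 100), Fr(63, 100), Fr(1)]
beta_p = [Fr(0), Fr(317, 10000), Fr(635, 10000), Fr(3438, 10000), 1 - gamma]

b6 = gamma                                   # from (4.39)
A = [[Fr(1)] * 5,                            # (4.43a)
     beta_p,                                 # (4.43b)
     [a**2 for a in alpha],                  # (4.43c)
     [a**3 for a in alpha],                  # (4.43e)
     [a**4 for a in alpha]]                  # assumed extra condition
rhs = [1 - b6,
       Fr(1, 2) - gamma - b6 * (1 - gamma),
       Fr(1, 3) - b6,
       Fr(1, 4) - b6,
       Fr(1, 5) - b6]
b = solve(A, rhs) + [b6]                     # b_1, ..., b_6
```

The contributions of $b_6$ are moved to the right-hand side because stage 6 shares $\alpha_6 = 1$ and $\beta_6' = 1 - \gamma$ with stage 5, so only the sum $b_5 + b_6$ enters the conditions.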

A different set of coefficients, based on the same construction, has been proposed
by Steinebach (1995). The free parameters are chosen in order to satisfy the
Scholz conditions $C_2(z) = 0$ and $C_3(z) = 0$ (see Eq. (15.41) of Sect. IV.15).


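Dense Output. A natural way to define a continuous numerical solution for
$y(x_0 + \theta h)$, $z(x_0 + \theta h)$ is

$$y_1(\theta) = y_0 + \sum_{i=1}^{s} b_i(\theta)\, k_i, \qquad z_1(\theta) = z_0 + \sum_{i=1}^{s} b_i(\theta)\, \ell_i, \tag{4.46}$$

where the $b_i(\theta)$ are polynomials which satisfy $b_i(0) = 0$, $b_i(1) = b_i$. In complete
analogy to Theorem 4.8 we have

$$\begin{aligned}
y(x_0 + \theta h) - y_1(\theta) = O(h^{p+1}) \quad &\text{iff} \quad \sum_{i=1}^{s} b_i(\theta)\, \Phi_i(t) = \frac{\theta^{\varrho(t)}}{\gamma(t)}
\quad \text{for } t \in DAT_y,\ \varrho(t) \le p, \\
z(x_0 + \theta h) - z_1(\theta) = O(h^{q+1}) \quad &\text{iff} \quad \sum_{i=1}^{s} b_i(\theta)\, \Phi_i(u) = \frac{\theta^{\varrho(u)}}{\gamma(u)}
\quad \text{for } u \in DAT_z,\ \varrho(u) \le q.
\end{aligned} \tag{4.47}$$

In our situation ($s = 6$) it is easy to fulfill these conditions with $p = 3$ and $q = 2$.
The additional condition $b_s(\theta) = \gamma\theta$ makes the solution unique.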


Methods of Order 5. C. Schneider (1991b) first constructed stiffly accurate Rosenbrock
methods of order 5 with $s = 8$ stages. Di Marzo (1992) then carefully determined
the free parameters to obtain A-stability and small error constants. The
resulting method, implemented in the code RODAS5, gives excellent results (see
Sect. IV.10).


Inconsistent Initial Values 

Even if we start the computation with consistent initial values, the numerical solution
$(y_n, z_n)$ of a Rosenbrock method does not, in general, satisfy $g(y_n, z_n) = 0$.
It is therefore of interest to investigate the local error also for inconsistent initial
values $(y_0, z_0)$. But what is the local error? To which solution of (4.1) should we
compare the numerical values? If

$$\bigl\| (g_z^{-1} g)(y_0, z_0) \bigr\| \le \delta \tag{4.48}$$

with sufficiently small $\delta$, we can find (because of (1.7)) a locally unique $\hat z_0$ which
satisfies $g(y_0, \hat z_0) = 0$. It is natural to compare the numerical solution $(y_1, z_1)$ to
that solution of (4.1) which passes through $(y_0, \hat z_0)$.

Our first aim is to write this solution in terms of elementary differentials evaluated
at $(y_0, z_0)$. Using

$$\hat z_0 - z_0 = (-g_z^{-1} g)(y_0, z_0) + O(\delta^2),$$

which is a consequence of $0 = g(y_0, \hat z_0) = g(y_0, z_0) + g_z(y_0, z_0)(\hat z_0 - z_0) + \dots$, we get

$$\begin{aligned}
y(x_0 + h) &= y_0 + h f(y_0, \hat z_0) + O(h^2) \\
&= y_0 + h f(y_0, z_0) + h \bigl(f_z (-g_z^{-1}) g\bigr)(y_0, z_0) + O(h^2 + h\delta^2)
\end{aligned} \tag{4.49}$$

$$\begin{aligned}
z(x_0 + h) &= \hat z_0 + h (-g_z^{-1} g_y f)(y_0, \hat z_0) + O(h^2) \\
&= z_0 + (-g_z^{-1} g)(y_0, z_0) + h (-g_z^{-1} g_y f)(y_0, z_0) \\
&\quad + h \bigl(-g_z^{-1} g_{zz}(-g_z^{-1} g,\, -g_z^{-1} g_y f)\bigr)(y_0, z_0) \\
&\quad + h \bigl(-g_z^{-1} g_{yz}(f,\, -g_z^{-1} g)\bigr)(y_0, z_0) \\
&\quad + h \bigl(-g_z^{-1} g_y f_z (-g_z^{-1}) g\bigr)(y_0, z_0) + O(h^2 + \delta^2)
\end{aligned} \tag{4.50}$$

The expressions so obtained allow a nice interpretation using trees. We only have
to add in the recursive Definition 4.1 a tree of order 0, which consists of a fat root.
We denote this tree by $\emptyset_z$, and extend Definition 4.3 by setting $F(\emptyset_z)(y,z) =
(-g_z^{-1} g)(y,z)$. Then, the expressions of (4.49) and (4.50) correspond to the trees
of Fig. 4.4.


Fig. 4.4. Trees, to be considered for inconsistent initial values 
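
The relation $\hat z_0 - z_0 = (-g_z^{-1} g)(y_0, z_0) + O(\delta^2)$, which underlies the tree $\emptyset_z$, can be observed numerically. The following Python sketch uses a scalar constraint $g(y,z) = z + z^3 - y$; this function, the point $(y_0, \hat z_0) = (2, 1)$ and the perturbation sizes are assumptions chosen only so that the consistent value is known in closed form.

```python
def g(y, z):
    return z + z**3 - y          # hypothetical constraint with g_z = 1 + 3 z^2 > 0

def g_z(y, z):
    return 1.0 + 3.0 * z**2

def order_zero_term(y0, z0):
    """The elementary differential of the order-0 tree: (-g_z^{-1} g)(y0, z0)."""
    return -g(y0, z0) / g_z(y0, z0)

y0, z_hat0 = 2.0, 1.0            # consistent pair: g(2, 1) = 0
ratios = []
for d in (1e-2, 1e-3):
    z0 = z_hat0 + d              # inconsistent initial value
    delta = abs(order_zero_term(y0, z0))            # size of g_z^{-1} g, cf. (4.48)
    err = abs((z_hat0 - z0) - order_zero_term(y0, z0))
    ratios.append(err / delta**2)                   # stays bounded if err = O(delta^2)
```

The quotient `err / delta**2` stays bounded as the perturbation shrinks, which is the $O(\delta^2)$ behaviour used when passing from $(y_0, \hat z_0)$ to $(y_0, z_0)$ in (4.49) and (4.50).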

The numerical solution also possesses an expansion of the form (4.49), (4.50)
with additional method-dependent coefficients. The first few terms are as follows:

$$y_1 = y_0 + \Bigl(\sum_i b_i\Bigr) h f(y_0, z_0) + \Bigl(\sum_{i,j,k} b_i\, \beta_{ij}\, \omega_{jk}\Bigr) h \bigl(f_z (-g_z^{-1}) g\bigr)(y_0, z_0) + O(h^2 + h\delta^2)$$

$$z_1 = z_0 + \Bigl(\sum_{i,j} b_i\, \omega_{ij}\Bigr) (-g_z^{-1} g)(y_0, z_0) + O(h + \delta^2).$$

In order to understand the form of these new coefficients we have to extend the
proof of Theorem 4.6. It turns out that the elementary differentials are multiplied
by $\gamma(t) \sum_i b_i \Phi_i(t)$ or $\gamma(u) \sum_i b_i \Phi_i(u)$, where $\gamma$ and $\Phi_i$ are defined by $\gamma(\emptyset_z) = 1$,
$\Phi_i(\emptyset_z) = \sum_j \omega_{ij}$ and the recursion of Theorem 4.6. Equating the coefficients
of the exact and numerical solutions yields new order conditions for the case of
inconsistent initial values. The first of these (to be added to those of Table IV.7.1
and Table 4.1) are presented in Table 4.2.


Table 4.2. Order conditions for inconsistent initial values

tree                               order condition                                                       size of error term

$[\tau_y, \emptyset_z]_y$          $\sum b_i\, \alpha_i\, \alpha_{ij}\, \omega_{jk} = 1/2$               $O(h^2\delta)$
$\emptyset_z$                      $\sum b_i\, \omega_{ij} = 1$                                          $O(\delta)$
$[\tau_y, \emptyset_z]_z$          $\sum b_i\, \omega_{ij}\, \alpha_j\, \alpha_{jk}\, \omega_{k\ell} = 1$   $O(h\delta)$


Remarks. a) The first condition of Table 4.2 is exactly the same as that found by
van Veldhuizen (1984) in a different context. It implies that the local error of the
$y$-component is of size $O(h^{p+1} + h^3\delta + h\delta^2)$.

b) Condition $\sum b_i \omega_{ij} = 1$ means that the stability function satisfies $R(\infty) = 0$.
Unless this condition is satisfied, the local error of the $z$-component contains an
$h$-independent term of size $\delta$ (which usually is near to $Tol$). This was observed
numerically in Fig. IV.7.4 and explains the phenomenon of Fig. IV.7.3.

c) For Rosenbrock methods which satisfy (4.39), the second and third conditions
of Table 4.2 are automatically fulfilled. For such methods the local error of
the $z$-component is of size $O(h^{q+1} + h^2\delta + \delta^2)$.


Exercises 

1. (Roche 1989). Consider the implicit Runge-Kutta method (1.11) applied to
(1.6).

a) Prove that $z_1 - z(x_0 + h) = O(h^{q+1})$ iff

$$\sum_{i=1}^{s} b_i\, \Phi_i(u) = \frac{1}{\gamma(u)} \quad \text{for } u \in DAT_z,\ \varrho(u) \le q,$$

where $\gamma(u)$ and $\Phi_i(u)$ are defined as in Theorem 4.6, but all coefficients $\alpha_{ij}$
and $\beta_{ij}$ are replaced by the Runge-Kutta coefficients $a_{ij}$.

b) Show that those trees in $DAT_z$ which have more than one fat vertex are
redundant.


2. The simplifying assumptions (4.39) imply that many of the (algebraic) order 
conditions are automatically satisfied. Characterize the corresponding trees. 
3. State the order condition for the tree $[\tau_y, [\tau_y, \emptyset_z]_z]_z$.

a) Show that the corresponding error term is of size $O(h^2\delta)$ with $\delta$ given in
(4.48).

b) For methods satisfying (4.39), this condition is equivalent to (4.44).


4. (Ostermann 1990). Suppose that the Rosenbrock method (4.3) satisfies (4.27).
Define polynomials $b_i(\theta)$ of degree $q = [(p+1)/2]$ by $b_i(0) = 0$, $b_i(1) = b_i$,
and



+ 7 ji) 


if l—\ 

if £ = 2,...,q-l. 


Prove that the error of the dense output formulas (4.46) is $O(h^{q+1})$.
Hint. Extend the ideas of Exercise II.17.5 to Rosenbrock methods.


5. Suppose that a Rosenbrock method is implemented in the form (IV.7.25). If it 
satisfies (4.39), then its last two stages allow a very simple implementation 

Hint. Prove that 


m- 


i — 1 ,..., 5 — 1 
i — s , 


X S-1,2 


i = 1 ,..., s — 2 
i = s — 1 . 


6. Partitioned Rosenbrock methods (Rentrop, Roche & Steinebach 1989). Con¬ 
sider the method (4.3) with f y and f z replaced by 0. Derive necessary and 
sufficient conditions that it be of order p . 

Remark. Case (a) of Lemma 4.9 remains valid in this new situation. However, 
the trees of Lemma 4.9b give rise to new conditions. 

7. What is the “algebraic order” of the classical 4th order Rosenbrock methods of 
Section IV.7? 


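8. Let $\{u_n\}$, $\{v_n\}$ be two sequences of non-negative numbers satisfying
(componentwise)

$$\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} \le \begin{pmatrix} 1 + hL & hM \\ N & \kappa \end{pmatrix} \begin{pmatrix} u_n \\ v_n \end{pmatrix}$$

with $0 \le \kappa < 1$ and positive constants $L$, $M$, $N$. Prove that for $h \le h_0$ and
$nh \le \mathrm{Const}$

$$u_n \le C(u_0 + h v_0), \qquad v_n \le C\bigl(u_0 + (h + \kappa^n) v_0\bigr).$$

Hint. Apply Lemma 3.9 with $\varepsilon = h$ and $M = 0$.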
The numerical computations of Sect. IV.10 have revealed the extrapolation code
SEULEX as one of the best methods for very stringent tolerances. The aim of the
present section is to justify theoretically the underlying numerical method, the
extrapolated linearly implicit Euler method, for singular perturbation problems as a
representative of stiff equations.


Linearly Implicit Euler Discretization 


The linearly implicit Euler method (IV.9.25) applied to the singular perturbation
problem (1.5) reads

$$\begin{pmatrix} I - h f_y(0) & -h f_z(0) \\ -h g_y(0) & \varepsilon I - h g_z(0) \end{pmatrix}
\begin{pmatrix} y_{i+1} - y_i \\ z_{i+1} - z_i \end{pmatrix} = h \begin{pmatrix} f(y_i, z_i) \\ g(y_i, z_i) \end{pmatrix}. \tag{5.1}$$

Here we have used abbreviations such as $f_y(0) = f_y(y_0, z_0)$ for partial derivatives.
We recall that the numerical approximations at $x_0 + H$ ($H = nh$) are extrapolated
according to (IV.9.26).

For the differential algebraic problem (1.6) we just put $\varepsilon = 0$ in (5.1). This
yields

$$\begin{pmatrix} I - h f_y(0) & -h f_z(0) \\ -h g_y(0) & -h g_z(0) \end{pmatrix}
\begin{pmatrix} y_{i+1} - y_i \\ z_{i+1} - z_i \end{pmatrix} = h \begin{pmatrix} f(y_i, z_i) \\ g(y_i, z_i) \end{pmatrix}. \tag{5.2}$$

Possible extensions to non-autonomous problems have been presented in Sect. IV.9.
For problems $Mu' = \varphi(u)$ we use the formulation (IV.9.34) also for singular $M$.
Due to the invariance of the method with respect to the transformation (1.23), all
results of this section are equally valid for $Mu' = \varphi(u)$ of index 1.

The performance of extrapolation methods relies heavily on the existence of an
asymptotic expansion of the global error. Such expansions are well understood if
the differential equation is nonstiff (see Sections II.8 and IV.9). But what happens
if the problem is stiff or differential-algebraic?


Continued study of special problems is still a commendable way 
towards greater insight ... (E. Hopf 1950) 
Example 5.1. Consider the test problem

$$y' = 1, \qquad \varepsilon z' = -z + g(y). \tag{5.3}$$

Method (5.1) yields the exact result $y_i = x_i = x_0 + ih$ for the $y$-component, and
the recursion

$$(\varepsilon + h)\, z_{i+1} = \varepsilon z_i + h\, g(x_i) + h^2 g'(x_0) \tag{5.4}$$

for the $z$-component. In order to compute the coefficients of the asymptotic expansion
(Theorem II.8.1), we insert

$$z_i = z(x_i) + h\, b_1(x_i) + h^2 b_2(x_i) + h^3 b_3(x_i) + \dots \tag{5.5}$$

into (5.4), expand into a Taylor series and compare the coefficients of $h^j$. This
yields the differential equation

$$\varepsilon b_1'(x) + b_1(x) = -\frac{\varepsilon}{2}\, z''(x) - z'(x) + g'(x_0)$$

for $b_1(x)$, and similar ones for $b_2(x)$, $b_3(x)$, etc. Putting $i = 0$ in (5.5) we get
the initial values $b_j(x_0) = 0$ (all $j$). In general, the computation of the functions
$b_1(x), b_2(x), \dots$ is rather tedious. We therefore continue this example for the
special case $x_0 = 0$, $g(x) = x^2 + 2\varepsilon x$, and $z_0 = 0$, so that the exact solution of
(5.3) is $z(x) = x^2$. In this situation we get

$$b_1(x) = -3\varepsilon\, e^{-x/\varepsilon} + 3\varepsilon - 2x \tag{5.6}$$

etc. We observe that for $\varepsilon \to 0$, the function $b_2(x)$ becomes discontinuous at $x = 0$,
and $b_3(x)$ is even not uniformly bounded. Hence, the expansion (5.5) is not useful
for the study of extrapolation, if $\varepsilon$ is small compared to the step size $H$.

The idea is now to omit in (5.6) the terms containing the factor $e^{-x/\varepsilon}$ by requiring
that the functions $b_j(x)$ be smooth uniformly in $\varepsilon$ and, instead, to add a
discrete perturbation $\beta_i$ to (5.5). For our example, this then becomes

$$z_i = x_i^2 + h(3\varepsilon - 2x_i) + h^2 + \beta_i. \tag{5.7a}$$

Inserting (5.7a) into (5.4) gives the relation $(\varepsilon + h)\beta_{i+1} = \varepsilon\beta_i$. The value of $\beta_0$ is
obtained from (5.7a) with $i = 0$. We thus get

$$\beta_i = -\Bigl(1 + \frac{h}{\varepsilon}\Bigr)^{-i} (3\varepsilon h + h^2). \tag{5.7b}$$

If the numerical solution is extrapolated, the smooth terms in (5.7) are eliminated
one after the other. It remains to study the effect of extrapolation on the perturbation
terms $\beta_i$. If the differential equation is very stiff ($\varepsilon \ll h$), these terms are very
small and may be neglected over a wide range of $h$ (observe that $i \ge n_1$).
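
The splitting of the numerical solution into a smooth part and a discrete perturbation can be verified directly. The Python sketch below iterates the recursion (5.4) for the special case $x_0 = 0$, $g(x) = x^2 + 2\varepsilon x$, $z_0 = 0$ in exact rational arithmetic and compares it with the closed form $z_i = x_i^2 + h(3\varepsilon - 2x_i) + h^2 + \beta_i$, $\beta_i = -(1 + h/\varepsilon)^{-i}(3\varepsilon h + h^2)$, which is obtained by inserting this ansatz into (5.4). The values of $\varepsilon$, $h$ and the number of steps, as well as the function names, are assumptions of this example.

```python
from fractions import Fraction as Fr

def euler_recursion(eps, h, n):
    """Iterate (5.4) for g(x) = x^2 + 2*eps*x with x0 = 0 and z0 = 0."""
    def g(x):
        return x * x + 2 * eps * x
    dg0 = 2 * eps                          # g'(x0) at x0 = 0
    z = [Fr(0)]
    for i in range(n):
        x_i = i * h
        z.append((eps * z[-1] + h * g(x_i) + h * h * dg0) / (eps + h))
    return z

def perturbation(eps, h, i):
    """Discrete perturbation beta_i; it decays like (1 + h/eps)^(-i)."""
    return -(3 * eps * h + h * h) / (1 + h / eps) ** i

def closed_form(eps, h, i):
    """Smooth part x_i^2 + h(3 eps - 2 x_i) + h^2 plus the perturbation beta_i."""
    x_i = i * h
    return x_i ** 2 + h * (3 * eps - 2 * x_i) + h * h + perturbation(eps, h, i)

eps, h, n = Fr(1, 1000), Fr(1, 10), 8      # a stiff case: eps much smaller than h
z = euler_recursion(eps, h, n)
betas = [perturbation(eps, h, i) for i in range(n + 1)]
```

With $\varepsilon = 10^{-3}$ and $h = 10^{-1}$ the factor $1 + h/\varepsilon$ equals 101, so each step damps the perturbation by two orders of magnitude; this is the sense in which the $\beta_i$ may be neglected for very stiff problems.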
Example 5.2. For the differential-algebraic problem

$$y' = 1, \qquad 0 = -z + g(y) \tag{5.8}$$

with initial values $y_0 = x_0$, $z_0 = g(x_0)$ the numerical solution, given by (5.2), is
$y_i = x_i$ and

$$z_i = \begin{cases} g(x_0) & \text{for } i = 0 \\ g(x_{i-1}) + h\, g'(x_0) & \text{for } i \ge 1. \end{cases} \tag{5.9}$$

Developing its second formula (for $i \ge 1$) yields

$$z_i = g(x_i) + h\bigl(g'(x_0) - g'(x_i)\bigr) + \frac{h^2}{2}\, g''(x_i) - \frac{h^3}{6}\, g'''(x_i) + O(h^4).$$

If we add the perturbation
(which is different from zero only for $i = 0$) we get for all $i$

$$z_i - g(x_i) = \sum_{j=1}^{3} h^j \bigl(b_j(x_i) + \beta_i^j\bigr) + O(h^4) \tag{5.10}$$

where

$$b_1(x) = g'(x_0) - g'(x), \qquad b_2(x) = \tfrac{1}{2}\, g''(x), \qquad b_3(x) = -\tfrac{1}{6}\, g'''(x)$$

are smooth functions and the perturbations are given by

$$\beta_0^1 = 0, \qquad \beta_0^2 = -\tfrac{1}{2}\, g''(x_0), \qquad \beta_0^3 = \tfrac{1}{6}\, g'''(x_0).$$

If we add a further algebraic equation to (5.8), e.g., $0 = u - k(z)$, and again
apply Method (5.2), we get three different formulas for $u_i$: one for $i = 0$, one
for $i = 1$, and a different one for $i \ge 2$. In an expansion of the type (5.10) for
$u_i - k(g(x_i))$, perturbation terms will be present for $i = 0$ and for $i = 1$.
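
The closed form of the numerical solution in this example can be reproduced by applying Method (5.2) literally. The Python sketch below implements (5.2) for scalar $y$ and $z$, solving the $2 \times 2$ system by Cramer's rule with the Jacobian frozen at the initial point; the choice $g(y) = y^3$, the initial point $x_0 = 1$, the step size and the function name are assumptions made for illustration.

```python
from fractions import Fraction as Fr

def linearly_implicit_euler_dae(f, g, fy, fz, gy, gz, y0, z0, h, n):
    """n steps of (5.2) for scalar y and z; returns the lists (y_i), (z_i)."""
    # Matrix of (5.2), with all partial derivatives evaluated at (y0, z0).
    a11, a12 = 1 - h * fy(y0, z0), -h * fz(y0, z0)
    a21, a22 = -h * gy(y0, z0), -h * gz(y0, z0)
    det = a11 * a22 - a12 * a21
    ys, zs = [y0], [z0]
    for _ in range(n):
        r1 = h * f(ys[-1], zs[-1])
        r2 = h * g(ys[-1], zs[-1])
        dy = (r1 * a22 - a12 * r2) / det       # Cramer's rule
        dz = (a11 * r2 - a21 * r1) / det
        ys.append(ys[-1] + dy)
        zs.append(zs[-1] + dz)
    return ys, zs

def gfun(y):
    return y ** 3                # hypothetical g in 0 = -z + g(y)

def dgfun(y):
    return 3 * y ** 2

x0, h, n = Fr(1), Fr(1, 10), 5
ys, zs = linearly_implicit_euler_dae(
    f=lambda y, z: Fr(1),
    g=lambda y, z: -z + gfun(y),
    fy=lambda y, z: Fr(0),
    fz=lambda y, z: Fr(0),
    gy=lambda y, z: dgfun(y),
    gz=lambda y, z: Fr(-1),
    y0=x0, z0=gfun(x0), h=h, n=n)
```

The computed $z_i$ lag one step behind $g$ and carry the frozen-Jacobian correction $h\,g'(x_0)$, so $z_0$ follows a different formula from all later values; this is the source of the perturbation at $i = 0$.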


Perturbed Asymptotic Expansion 

For general differential algebraic problems we have the following result. 

Theorem 5.3 (Deuflhard, Hairer & Zugck 1987). Consider the problem (1.6) with
consistent initial values $(y_0, z_0)$, and suppose that (1.7) is satisfied. The global
error of the linearly implicit Euler method (5.2) then has an asymptotic $h$-expansion
of the form

$$\begin{aligned}
y_i - y(x_i) &= \sum_{j=1}^{M} h^j \bigl(a_j(x_i) + \alpha_i^j\bigr) + O(h^{M+1}) \\
z_i - z(x_i) &= \sum_{j=1}^{M} h^j \bigl(b_j(x_i) + \beta_i^j\bigr) + O(h^{M+1})
\end{aligned} \tag{5.11}$$

where $a_j(x)$, $b_j(x)$ are smooth functions and the perturbations satisfy (see Tables
5.1 and 5.2)

$$\alpha_i^1 = 0, \quad \alpha_i^2 = 0, \quad \alpha_i^3 = 0 \qquad \text{for } i \ge 0 \tag{5.12a}$$

$$\beta_i^1 = 0 \quad \text{for } i \ge 0, \qquad \beta_i^2 = 0 \quad \text{for } i \ge 1 \tag{5.12b}$$

$$\alpha_i^j = 0 \qquad \text{for } i \ge j - 4 \text{ and } j \ge 4 \tag{5.12c}$$

$$\beta_i^j = 0 \qquad \text{for } i \ge j - 2 \text{ and } j \ge 3. \tag{5.12d}$$

The error terms in (5.11) are uniformly bounded for $x_i = ih \le H$, if $H$ is sufficiently
small.

Table 5.1. Non-zero $\alpha$'s

Table 5.2. Non-zero $\beta$'s

         $h$   $h^2$   $h^3$   $h^4$   $h^5$   $h^6$   $h^7$
$z_0$    0     *       *       *       *       *       *
$z_1$    0     0       0       *       *       *       *
$z_2$    0     0       0       0       *       *       *
$z_3$    0     0       0       0       0       *       *
$z_4$    0     0       0       0       0       0       *
$z_5$    0     0       0       0       0       0       0


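Proof. In part (a) we shall recursively construct truncated expansions

$$\begin{aligned}
\hat y_i &= y(x_i) + \sum_{j=1}^{M} h^j \bigl(a_j(x_i) + \alpha_i^j\bigr) + h^{M+1} \alpha_i^{M+1} \\
\hat z_i &= z(x_i) + \sum_{j=1}^{M} h^j \bigl(b_j(x_i) + \beta_i^j\bigr)
\end{aligned} \tag{5.13}$$

such that the defect of $\hat y_i$, $\hat z_i$ inserted into the method is small; more precisely, we
require that

$$\begin{pmatrix} I - h f_y(0) & -h f_z(0) \\ -h g_y(0) & -h g_z(0) \end{pmatrix}
\begin{pmatrix} \hat y_{i+1} - \hat y_i \\ \hat z_{i+1} - \hat z_i \end{pmatrix}
= h \begin{pmatrix} f(\hat y_i, \hat z_i) \\ g(\hat y_i, \hat z_i) \end{pmatrix} + O(h^{M+2}). \tag{5.14}$$

For the initial values we require $\hat y_0 = y_0$, $\hat z_0 = z_0$, or equivalently

$$a_j(0) + \alpha_0^j = 0, \qquad b_j(0) + \beta_0^j = 0, \tag{5.15}$$

and the perturbation terms are assumed to satisfy

$$\alpha_i^j \to 0, \qquad \beta_i^j \to 0 \qquad \text{for } i \to \infty, \tag{5.16}$$

otherwise, these limits could be added to the smooth parts. The result will then
follow from a stability estimate derived in part (b).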
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a) For the construction of a-{x), b-{x), a \, f3f we insert (5.13) into (5.14), 
and develop 

/(&>*«) = f(y{xi),z{x t )) + f y (x i )(ha 1 {x i ) + ha] + ...) 

+ fz( x i)( hb i( x i) + h Pi + •••) 

+ f yy ( x i) ( ha i i x i) + ha i + • • -) 2 + • • • > 

y l+ i -Vi = y( x i+ i) - y( x z) + h ( a i( x z+i) - a i ( x t ) + ««+1 - ) + • • • 

h 2 

= hy , (x i )+Yy"( x i ) + --- + h2a 'i( x l ) + h{a 1 i+1 -a 1 i ) + ..., 

where f y (x) = f y {y(x), z(x)), etc. Similarly, we develop g(y t , z\) and z 1+1 - z], 
and compare coefficients of h (for j = 0,..., M). Each power of h will lead 
to two conditions — one containing the smooth functions and the other containing 
the perturbation terms. 

First step. Equating the coefficients of $h^1$ yields the equations (1.6) for the
smooth part (due to consistency of the method), and $\alpha_{i+1}^1 - \alpha_i^1 = 0$ for $i \ge 0$.
Because of (5.16) we get $\alpha_i^1 = 0$ for all $i \ge 0$ (compare (5.12a)).

Second step. The coefficients of $h^2$ give

$$ a_1'(x) + \tfrac12 y''(x) - f_y(0)y'(x) - f_z(0)z'(x) = f_y(x)a_1(x) + f_z(x)b_1(x) \tag{5.17a} $$

$$ -g_y(0)y'(x) - g_z(0)z'(x) = g_y(x)a_1(x) + g_z(x)b_1(x) \tag{5.17b} $$

$$ \alpha_{i+1}^2 - \alpha_i^2 - f_z(0)\bigl(\beta_{i+1}^1 - \beta_i^1\bigr) = f_z(0)\beta_i^1 \tag{5.17c} $$

$$ -g_z(0)\bigl(\beta_{i+1}^1 - \beta_i^1\bigr) = g_z(0)\beta_i^1. \tag{5.17d} $$

Observe that the coefficients $\beta_i^j$ have to be independent of $h$, so that $f_z(0)$,
$g_z(0)$ cannot be replaced by $f_z(x_i)$, $g_z(x_i)$ in the right-hand sides of (5.17c,d).
The system (5.17) can be solved as follows. Compute $b_1(x)$ from (5.17b) and
insert it into (5.17a). This gives a linear differential equation for $a_1(x)$. Because
of (5.15) and $\alpha_0^1 = 0$ the initial value is $a_1(0) = 0$. Therefore $a_1(x)$ and $b_1(x)$ are
uniquely determined by (5.17a,b). Differentiating $g(y(x), z(x)) = 0$ and putting
$x = 0$ implies that the left-hand side of (5.17b) vanishes at $x = 0$. Consequently,
we have $b_1(0) = 0$ and by (5.15), also $\beta_0^1 = 0$. Condition (5.17d) then implies
$\beta_i^1 = 0$ (all $i$), and (5.17c) together with (5.16) give $\alpha_i^2 = 0$ (all $i$).

Third step. As in the second step we get (for $j = 2$)

$$ a_j'(x) = f_y(x)a_j(x) + f_z(x)b_j(x) + r(x) \tag{5.18a} $$

$$ 0 = g_y(x)a_j(x) + g_z(x)b_j(x) + s(x), \tag{5.18b} $$

where $r(x)$, $s(x)$ are known functions depending on derivatives of $y(x)$, $z(x)$,
and on $a_\ell(x)$, $b_\ell(x)$ with $\ell \le j - 1$. We further get

$$ \alpha_{i+1}^3 - \alpha_i^3 = f_z(0)\beta_{i+1}^2 \tag{5.18c} $$

$$ 0 = g_z(0)\beta_{i+1}^2. \tag{5.18d} $$

We compute $a_2(x)$, $b_2(x)$ as in step 2. However, $b_2(0) \ne 0$ in general, and for the
first time, we are forced to introduce a perturbation term $\beta_0^2 \ne 0$. From (5.18c,d)
we then get $\beta_i^2 = 0$ (for $i \ge 1$) and $\alpha_i^3 = 0$ (for all $i$).

Fourth step. Comparing the coefficients of $h^4$ we just get (5.18a,b) with $j = 3$
and (5.18c,d) with the upper index raised by 1. As above we conclude $\beta_i^3 = 0$ (for
$i \ge 1$) and $\alpha_i^4 = 0$ (for all $i$).

General step. The conditions for the smooth functions are (5.18a,b). For the
perturbation terms we get

$$ \alpha_{i+1}^{j+1} - \alpha_i^{j+1} = f_z(0)\beta_{i+1}^j + \varrho_i^j \tag{5.19c} $$

$$ 0 = g_z(0)\beta_{i+1}^j + \sigma_i^j, \tag{5.19d} $$

where $\varrho_i^j$, $\sigma_i^j$ are linear combinations of expressions which contain as factors
$\alpha_{i+1}^\ell$, $\alpha_i^\ell$, $\beta_i^{\ell-1}$ with $\ell \le j$. For example, we have $\varrho_i^4 = f_{zz}(0)(\beta_i^2)^2$ and
$\sigma_i^4 = g_{zz}(0)(\beta_i^2)^2$. The proof of (5.12) is now by induction on $j$. By the
induction hypothesis we have $\varrho_i^j = 0$, $\sigma_i^j = 0$ for $i \ge j - 3$. Formula (5.19d) hence
implies $\beta_{i+1}^j = 0$ (for $i \ge j - 3$) and (5.19c) together with (5.16) gives $\alpha_i^{j+1} = 0$
(for $i \ge j - 3$). But this is simply the statement (5.12c,d).


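b) We still have to estimate the remainder term, i.e., differences $\Delta y_i = y_i - \hat y_i$,
$\Delta z_i = z_i - \hat z_i$. Subtracting (5.14) from (5.2) and eliminating $\Delta y_{i+1}$, $\Delta z_{i+1}$ yields

$$
\begin{pmatrix} \Delta y_{i+1} \\ \Delta z_{i+1} \end{pmatrix}
= \begin{pmatrix} \Delta y_i \\ \Delta z_i \end{pmatrix}
+ \begin{pmatrix} I + O(h) & O(h) \\ O(1) & -g_z^{-1}(0) + O(h) \end{pmatrix}
\begin{pmatrix} h\bigl(f(y_i, z_i) - f(\hat y_i, \hat z_i)\bigr) \\ g(y_i, z_i) - g(\hat y_i, \hat z_i) \end{pmatrix}
+ \begin{pmatrix} O(h^{M+2}) \\ O(h^{M+1}) \end{pmatrix}.
$$

The application of a Lipschitz condition for $f(y,z)$ and $g(y,z)$ then gives

$$
\begin{pmatrix} \|\Delta y_{i+1}\| \\ \|\Delta z_{i+1}\| \end{pmatrix}
\le \begin{pmatrix} 1 + O(h) & O(h) \\ O(1) & \varrho \end{pmatrix}
\begin{pmatrix} \|\Delta y_i\| \\ \|\Delta z_i\| \end{pmatrix}
+ \begin{pmatrix} O(h^{M+2}) \\ O(h^{M+1}) \end{pmatrix}
\tag{5.20}
$$

where $|\varrho| < 1$ if $H$ is sufficiently small. Applying Lemma 3.9 we deduce
$\|\Delta y_i\| + \|\Delta z_i\| = O(h^{M+1})$. □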


Order Tableau 

We consider (5.2) as our basic method for extrapolation, i.e., we take some step
number sequence $n_1 < n_2 < \dots$, put $h_j = H/n_j$, and define

$$ Y_{j1} = y_{h_j}(x_0 + H), \qquad Z_{j1} = z_{h_j}(x_0 + H), \tag{5.21} $$

the numerical solution of (1.6) after $n_j$ steps with step size $h_j$. We then extrapolate
these values according to (IV.9.26) and obtain $Y_{jk}$, $Z_{jk}$. What is the order of the
approximations thus obtained?

Theorem 5.4 (Deuflhard, Hairer & Zugck 1987). If we consider the harmonic
sequence $\{1, 2, 3, 4, \dots\}$, then the extrapolated values $Y_{jk}$, $Z_{jk}$ satisfy

$$ Y_{jk} - y(x_0 + H) = O(H^{r_{jk}+1}), \qquad Z_{jk} - z(x_0 + H) = O(H^{s_{jk}}) \tag{5.22} $$

where the differential-algebraic orders $r_{jk}$, $s_{jk}$ are given in Tables 5.3 and 5.4.




Table 5.3. orders $r_{jk}$.

1
1 2
1 2 3
1 2 3 4
1 2 3 4 4
1 2 3 4 4 5
1 2 3 4 4 5 5
1 2 3 4 4 5 6 5
1 2 3 4 4 5 6 6 5
1 2 3 4 4 5 6 7 6 5

Table 5.4. orders $s_{jk}$.

2
2 2
2 2 3
2 2 3 4
2 2 3 4 4
2 2 3 4 5 4
2 2 3 4 5 5 4
2 2 3 4 5 6 5 4
2 2 3 4 5 6 6 5 4
2 2 3 4 5 6 7 6 5 4


Proof. We use the expansion (5.11). It follows from $\alpha_i^1 = \beta_i^1 = 0$ (for all $i \ge 0$)
and from (5.15) that $a_1(x_0) = b_1(x_0) = 0$. Since $a_1(x)$ and $b_1(x)$ are smooth
functions we obtain $a_1(x_0 + H) = O(H)$, $b_1(x_0 + H) = O(H)$ and the errors of
$Y_{j1}$, $Z_{j1}$ are seen to be of size $O(H^2)$. This verifies the entries of the first columns
of Tables 5.3 and 5.4. In the same way we deduce that $a_2(x_0 + H) = O(H)$.
However, since $\beta_0^2 \ne 0$ in general, we have $b_2(x_0) \ne 0$ by (5.15) and the term
$b_2(x_0 + H)$ is only of size $O(1)$. One extrapolation of the numerical solution
eliminates the terms with $j = 1$ in (5.11). The error is thus of size $O(H^3)$ for
$Y_{j2}$ but only $O(H^2)$ for $Z_{j2}$, verifying the second columns of Tables 5.3 and 5.4.
If we continue the extrapolation process, the smooth parts of the error expansion
(5.11) are eliminated one after the other. The perturbation terms, however, are not
eliminated.

For the $y$-component the first non-vanishing perturbation for $i \ge n_1 = 1$ is $\alpha_1^6$.
Therefore, the diagonal elements of the extrapolation tableau for the $y$-component
(Table 5.3) contain an error term of size $O(H^6)$ (observe that $\alpha_1^6$ is multiplied
by $h^6$ in (5.11)). The elements $Y_{j,j-1}$ of the first subdiagonal depend only on
$Y_{i1} = y_{n_i}$ for $i \ge 2$. Since $n_2 = 2$, only the perturbations $\alpha_i^j$ with $i \ge 2$ can have
an influence. We see from (5.12) that the first non-vanishing perturbation for $i \ge 2$
is $\alpha_2^7$. This explains the $O(H^7)$ error term in the first subdiagonal of Table 5.3.

For the $z$-component, $\beta_1^4$ is the first perturbation term for $i \ge 1$. Hence the
diagonal entries of the extrapolation tableau for the $z$-component contain an error
of size $O(H^4)$. All other entries of Tables 5.3 and 5.4 can be verified analogously.

□


If we consider a step number sequence $\{n_j\}$ which is different from the harmonic
sequence, we obtain the corresponding order tableaux as follows: the $j$th
diagonal of the new tableau is the $n_j$th diagonal of Table 5.3 and 5.4, respectively.
Theorem 5.4 then remains valid with $r_{jk}$, $s_{jk}$ given by these new tableaux. This
implies that a larger $n_1$, say $n_1 = 2$, increases the order of the extrapolated values.
Numerical computations have shown that the sequence

$$ \{2, 3, 4, 5, 6, \dots\} \tag{5.23} $$

is superior to the harmonic sequence. It is therefore recommended for SEULEX.
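The rule for building order tableaux of other step number sequences can be sketched in a few lines. The following is only an illustration, under the assumption (read off from (5.12) and the proof of Theorem 5.4) that each entry is the minimum of an order allowed by the smooth part and an order allowed by the leading perturbation, whose index is the smallest step number entering $Y_{jk}$. The function names `smooth_orders` and `order_tableaux` are hypothetical.

```python
def smooth_orders(k):
    """Orders (r, s) permitted by the smooth part of (5.11) in column k >= 1."""
    # a_k(x_0) = 0 for k <= 4, but a_5(x_0) != 0: one power of H is lost from column 5 on.
    r = k if k <= 4 else k - 1
    # b_1(x_0) = 0, but b_k(x_0) != 0 in general for k >= 2.
    s = max(k, 2)
    return r, s


def order_tableaux(n_seq):
    """Return (r_rows, s_rows), the lower-triangular order tableaux for n_seq."""
    r_rows, s_rows = [], []
    for j in range(1, len(n_seq) + 1):
        r_row, s_row = [], []
        for k in range(1, j + 1):
            # Y_jk is built from Y_{j-k+1,1}, ..., Y_{j1}; the smallest step
            # number involved is n_{j-k+1} (index j-k in a 0-based list).
            i = n_seq[j - k]
            r_smooth, s_smooth = smooth_orders(k)
            # First non-zero alpha_i^m has m = i + 5, giving error O(H^(i+5)), r = i + 4.
            r_row.append(min(r_smooth, i + 4))
            # First non-zero beta_i^m (i >= 1) has m = i + 3, giving s = i + 3.
            s_row.append(min(s_smooth, i + 3))
        r_rows.append(r_row)
        s_rows.append(s_row)
    return r_rows, s_rows


if __name__ == "__main__":
    r, s = order_tableaux(list(range(1, 11)))      # harmonic sequence
    for row in r:
        print(*row)
    r2, s2 = order_tableaux(list(range(2, 9)))     # sequence (5.23)
    print("diagonal s-orders for (5.23):", [row[-1] for row in s2])
```

With the harmonic sequence this reproduces Tables 5.3 and 5.4; with $\{2, 3, 4, \dots\}$ every diagonal shifts by one, which is the gain in order mentioned above.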

It is interesting to study the influence of the perturbation terms on the
extrapolated values. Suppose that $\alpha_{n_1}^j$ (or $\beta_{n_1}^j$) is the leading perturbation term in $Y_{11}$
(or $Z_{11}$). Because of the recursion (IV.9.26) all $Y_{kk}$ then contain an error term of
the form $C_k H^j \alpha_{n_1}^j$, whereas the $Y_{jk}$ (for $j > k$) do not depend on $\alpha_{n_1}^j$. The error
constants $C_k$ are given recursively by

$$ C_1 = \frac{1}{n_1^{\,j}}, \qquad C_k = -\frac{C_{k-1}}{(n_k/n_1) - 1} \tag{5.24} $$

and tend to zero exponentially, if $k$ increases.
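As a small worked illustration, the error constants can be tabulated with exact rational arithmetic. This sketch assumes that the recursion has the form $C_1 = n_1^{-j}$, $C_k = -C_{k-1}/((n_k/n_1) - 1)$, which is what the Aitken-Neville recursion gives when only $Y_{11}$ carries the perturbation; the function name `error_constants` is hypothetical.

```python
from fractions import Fraction


def error_constants(n_seq, j):
    """Constants C_1, ..., C_K multiplying H^j * (leading perturbation) in Y_kk."""
    n1 = n_seq[0]
    # Y_11 is computed with step size H / n_1, so h^j contributes n_1^(-j).
    consts = [Fraction(1, n1 ** j)]
    for nk in n_seq[1:]:
        # Only Y_{k-1,k-1} carries the perturbation into Y_kk.
        consts.append(-consts[-1] * Fraction(n1, nk - n1))
    return consts


if __name__ == "__main__":
    for k, c in enumerate(error_constants([2, 3, 4, 5, 6, 7, 8], j=5), start=1):
        print(k, c, float(c))
```

For the sequence (5.23) the factor $n_1/(n_k - n_1) = 2/(k-1)$ drops below one from $k = 4$ on, so the constants decay quickly along the diagonal.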


Error Expansion for Singular Perturbation Problems


Our aim is to extend the analysis of Example 5.1 to general singular perturbation
problems

$$
\begin{aligned}
y' &= f(y, z), & y(0) &= y_0 \\
\varepsilon z' &= g(y, z), & z(0) &= z_0, \qquad 0 < \varepsilon \ll 1,
\end{aligned}
\tag{5.25}
$$

where the solution $y(x)$, $z(x)$ is assumed to be sufficiently smooth (i.e., its
derivatives up to a certain order are bounded independently of $\varepsilon$). An important
observation in Example 5.1 was the existence of smooth solutions of the (linear) differential
equations for the coefficients $b_i(x)$. In the general situation we shall be concerned
with equations of the form

$$
\begin{aligned}
a' &= f_y(x)a + f_z(x)b + c(x, \varepsilon) \\
\varepsilon b' &= g_y(x)a + g_z(x)b + d(x, \varepsilon)
\end{aligned}
\tag{5.26}
$$

(the coefficients $f_y(x) = f_y(y(x), z(x))$, etc. depend smoothly on $\varepsilon$ because the
solution of (5.25) itself depends on $\varepsilon$, even if $f$ and $g$ are $\varepsilon$-independent).


Lemma 5.5. Suppose that the logarithmic norm of $g_z(x)$ satisfies

$$ \mu(g_z(x)) \le -1 \qquad \text{for } 0 \le x \le \bar x. \tag{5.27} $$

For a given value

$$ a(0) = a_0^0 + \varepsilon a_0^1 + \dots + \varepsilon^N a_0^N + O(\varepsilon^{N+1}) $$

there exists a unique (up to $O(\varepsilon^{N+1})$)

$$ b(0) = b_0^0 + \varepsilon b_0^1 + \dots + \varepsilon^N b_0^N + O(\varepsilon^{N+1}) $$

such that the solutions $a(x)$, $b(x)$ of (5.26) and their first $N$ derivatives are
bounded independently of $\varepsilon$.

Proof. We insert the finite expansions

$$ \hat a(x) = \sum_{i=0}^{N} \varepsilon^i a_i(x), \qquad \hat b(x) = \sum_{i=0}^{N} \varepsilon^i b_i(x) $$

with $\varepsilon$-independent coefficients $a_i(x)$, $b_i(x)$ into (5.26) and compare powers of
$\varepsilon$ (see Section VI.2). This leads to the differential-algebraic system (2.4).
Consequently, $a_0^0$ determines $b_0^0$; these two together with $a_0^1$ determine $b_0^1$, etc. The
remainders $a(x) - \hat a(x)$, $b(x) - \hat b(x)$ are then estimated as in the proof of
Theorem 2.1. □


The next result exhibits the dominant perturbation terms in an asymptotic
expansion of the error of the linearly implicit Euler method, when it is applied to a
singular perturbation problem.

Theorem 5.6 (Hairer & Lubich 1988). Assume that the solution of (5.25) is smooth.
Under the condition

$$ \bigl\|(I - \gamma g_z(0))^{-1}\bigr\| \le \frac{1}{1 + \gamma} \qquad \text{for all } \gamma \ge 1 \tag{5.28} $$

(which is a consequence of (5.27) and Theorem IV.11.2), the numerical solution of
(5.1) possesses for $\varepsilon \le h$ a perturbed asymptotic expansion of the form

$$
\begin{aligned}
y_i &= y(x_i) + h a_1(x_i) + h^2 a_2(x_i) + O(h^3) \\
&\quad - \varepsilon f_z(0) g_z^{-1}(0) \Bigl(I - \frac{h}{\varepsilon} g_z(0)\Bigr)^{-i} \bigl(h b_1(0) + h^2 b_2(0)\bigr)
\end{aligned}
\tag{5.29}
$$

$$
\begin{aligned}
z_i &= z(x_i) + h b_1(x_i) + h^2 b_2(x_i) + O(h^3) \\
&\quad - \Bigl(I - \frac{h}{\varepsilon} g_z(0)\Bigr)^{-i} \bigl(h b_1(0) + h^2 b_2(0)\bigr)
\end{aligned}
\tag{5.30}
$$

where $x_i = ih \le H$ with $H$ sufficiently small (but independent of $\varepsilon$). The smooth
functions $a_j(x)$, $b_j(x)$ satisfy

$$ a_1(0) = O(\varepsilon^2), \quad a_2(0) = O(\varepsilon), \quad b_1(0) = O(\varepsilon), \quad b_2(0) = O(1). $$
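The damping factor $(I - (h/\varepsilon) g_z(0))^{-i}$ in (5.29) and (5.30) can be observed directly on a scalar toy problem. The sketch below assumes that one step of method (5.1) solves the linear system with the matrix shown in (5.32); the test problem $y' = 0$, $\varepsilon z' = -z$ and the names `lin_implicit_euler_step` and `damping_demo` are illustrative choices, not part of the text.

```python
def lin_implicit_euler_step(y, z, h, eps, f, g, jac0):
    """One linearly implicit Euler step for scalar y and z.

    jac0 = (f_y, f_z, g_y, g_z), frozen at the initial value.
    """
    fy, fz, gy, gz = jac0
    # Matrix of the linear system: [[1 - h f_y, -h f_z], [-h g_y, eps - h g_z]].
    a11, a12 = 1.0 - h * fy, -h * fz
    a21, a22 = -h * gy, eps - h * gz
    r1, r2 = h * f(y, z), h * g(y, z)
    det = a11 * a22 - a12 * a21
    dy = (r1 * a22 - a12 * r2) / det   # Cramer's rule for the 2x2 system
    dz = (a11 * r2 - a21 * r1) / det
    return y + dy, z + dz


def damping_demo(eps=1e-3, h=1e-2, steps=5):
    """Apply the method to y' = 0, eps z' = -z with z(0) = 1 and return z_1, z_2, ..."""
    f = lambda y, z: 0.0
    g = lambda y, z: -z
    y, z, history = 0.0, 1.0, []
    for _ in range(steps):
        y, z = lin_implicit_euler_step(y, z, h, eps, f, g, (0.0, 0.0, 0.0, -1.0))
        history.append(z)
    return history


if __name__ == "__main__":
    eps, h = 1e-3, 1e-2
    for i, z in enumerate(damping_demo(eps, h), start=1):
        print(i, z, (1.0 + h / eps) ** (-i))
```

Here $g_z(0) = -1$, so each step multiplies $z$ by $(1 + h/\varepsilon)^{-1}$, which is at most $1/2$ whenever $\varepsilon \le h$, in line with (5.28).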

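Proof. This proof is organized like that of Theorem 5.3. In part (a) we recursively
construct truncated expansions (for $M \le 2$)

$$
\begin{aligned}
\hat y_i &= y(x_i) + \sum_{j=1}^{M} h^j \bigl(a_j(x_i) + \alpha_i^j\bigr) \\
\hat z_i &= z(x_i) + \sum_{j=1}^{M} h^j \bigl(b_j(x_i) + \beta_i^j\bigr)
\end{aligned}
\tag{5.31}
$$

such that

$$
\begin{pmatrix} I - hf_y(0) & -hf_z(0) \\ -hg_y(0) & \varepsilon I - hg_z(0) \end{pmatrix}
\begin{pmatrix} \hat y_{i+1} - \hat y_i \\ \hat z_{i+1} - \hat z_i \end{pmatrix}
= h \begin{pmatrix} f(\hat y_i, \hat z_i) \\ g(\hat y_i, \hat z_i) \end{pmatrix} + O(h^{M+2}).
\tag{5.32}
$$

The smooth functions $a_j(x)$, $b_j(x)$ clearly depend on $\varepsilon$, but are independent of
$h$. The perturbation terms $\alpha_i^j$, $\beta_i^j$ (for $i \ge 1$), however, will depend smoothly on
$\varepsilon$ and on $\varepsilon/h$. As in the case $\varepsilon = 0$, we shall require that (5.15) and (5.16) hold.
The differences $y_i - \hat y_i$ and $z_i - \hat z_i$ will then be estimated in part b).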

a) The case $M = 0$ is obvious. Indeed, the values $\hat y_i = y(x_i)$, $\hat z_i = z(x_i)$
satisfy (5.32) with $M = 0$. The construction of the coefficients in (5.31) is done in
two steps.

First step ($M = 1$). We insert (5.31) into (5.32) and compare the smooth
coefficients of $h^2$. This gives

$$ a_1'(x) + \tfrac12 y''(x) - f_y(0)y'(x) - f_z(0)z'(x) = f_y(x)a_1(x) + f_z(x)b_1(x) \tag{5.33a} $$

$$ \varepsilon b_1'(x) + \tfrac{\varepsilon}{2} z''(x) - g_y(0)y'(x) - g_z(0)z'(x) = g_y(x)a_1(x) + g_z(x)b_1(x). \tag{5.33b} $$

By Lemma 5.5 the initial value $b_1(0)$ is uniquely determined by $a_1(0)$.
Differentiation of $\varepsilon z' = g(y, z)$ with respect to $x$ gives $\varepsilon z''(x) = g_y(x)y'(x) + g_z(x)z'(x)$.
Inserted into (5.33b) this yields the relation

$$ g_y(0)a_1(0) + g_z(0)b_1(0) = O(\varepsilon) \tag{5.34} $$

with known right-hand side.

As to the perturbation terms, we obtain by collecting everything up to $O(h^2)$

$$
\begin{aligned}
\alpha_{i+1}^1 - \alpha_i^1 - hf_y(0)\bigl(\alpha_{i+1}^1 - \alpha_i^1\bigr) - hf_z(0)\bigl(\beta_{i+1}^1 - \beta_i^1\bigr)
&= hf_y(x_i)\alpha_i^1 + hf_z(x_i)\beta_i^1 \\
\varepsilon\bigl(\beta_{i+1}^1 - \beta_i^1\bigr) - hg_y(0)\bigl(\alpha_{i+1}^1 - \alpha_i^1\bigr) - hg_z(0)\bigl(\beta_{i+1}^1 - \beta_i^1\bigr)
&= hg_y(x_i)\alpha_i^1 + hg_z(x_i)\beta_i^1
\end{aligned}
\tag{5.35}
$$

and try to determine the most important parts of this. We firstly replace $hf_y(x_i)\alpha_i^1$
by $hf_y(0)\alpha_i^1$ and similarly for three other terms. This is motivated by the fact that
we search for exponentially decaying $\alpha_i^1$. Therefore with $x_i = ih$,

$$ \bigl(f_y(x_i) - f_y(0)\bigr)\alpha_i^1 = O(h). $$

Then many terms cancel in (5.35). We next observe that $\beta_{i+1}^1 - \beta_i^1$ is multiplied
by $\varepsilon$, but not $\alpha_{i+1}^1 - \alpha_i^1$. This suggests that the $\beta_{i+1}^1$ are an order of magnitude
larger than $\alpha_{i+1}^1$. Neglecting therefore $\alpha_{i+1}^1$ where it competes with $\beta_{i+1}^1$, we are
led to define

$$ \alpha_{i+1}^1 - \alpha_i^1 = hf_z(0)\beta_{i+1}^1 \tag{5.33c} $$

$$ \varepsilon\bigl(\beta_{i+1}^1 - \beta_i^1\bigr) = hg_z(0)\beta_{i+1}^1. \tag{5.33d} $$

It remains to verify a posteriori, that there exist solutions of (5.33a,b,c,d) which
produce an error term $O(h^3)$ in (5.32): from (5.33d) we obtain

$$ \beta_i^1 = \Bigl(I - \frac{h}{\varepsilon} g_z(0)\Bigr)^{-i} \beta_0^1. \tag{5.36a} $$

Since we require $\alpha_i^1 \to 0$ for $i \to \infty$, the solution of (5.33c) is given by

$$ \alpha_i^1 = \varepsilon f_z(0) g_z^{-1}(0) \Bigl(I - \frac{h}{\varepsilon} g_z(0)\Bigr)^{-i} \beta_0^1. \tag{5.36b} $$

For $i = 0$ this implies the relation

$$ \alpha_0^1 = \varepsilon f_z(0) g_z^{-1}(0) \beta_0^1. \tag{5.37} $$

The assumption (5.15) together with (5.34) and (5.37) uniquely determine the
coefficients $a_1(0)$, $b_1(0)$, $\alpha_0^1$, $\beta_0^1$. We remark that $b_1(0) = O(\varepsilon)$ and $a_1(0) = O(\varepsilon^2)$.
Using the fact that $\alpha_i^1 = O(\varepsilon^2)$ and $\varepsilon \le h$, one easily verifies that the quantities
(5.31) with $M = 1$ satisfy (5.32).

Second step ($M = 2$). Comparing the smooth coefficients of $h^3$ in (5.32) gives
two differential equations for $a_2(x)$, $b_2(x)$ which are of the form (5.26). It follows
from Lemma 5.5 that the initial values have to satisfy a relation

$$ g_y(0)a_2(0) + g_z(0)b_2(0) = O(1) \tag{5.38} $$

with known right-hand side. As in the first step we require for the perturbations

$$
\begin{aligned}
\alpha_{i+1}^2 - \alpha_i^2 &= hf_z(0)\beta_{i+1}^2 \\
\varepsilon\bigl(\beta_{i+1}^2 - \beta_i^2\bigr) &= hg_z(0)\beta_{i+1}^2
\end{aligned}
\tag{5.39}
$$

and obtain the formulas (5.36) and (5.37) with $\alpha_i^1$, $\beta_i^1$ replaced by $\alpha_i^2$, $\beta_i^2$. Again
the values $a_2(0)$, $b_2(0)$, $\alpha_0^2$, $\beta_0^2$ are uniquely determined by (5.15), (5.38), and
(5.37). Due to the $O(1)$ term in (5.38) we only have $b_2(0) = O(1)$ and $a_2(0) = O(\varepsilon)$.

We still have to verify (5.32) with $M = 2$. In the left-hand side we have
neglected terms of the form $hf_y(0)(h\alpha_i^1 + h^2\alpha_i^2)$. This is justified, because $\alpha_i^1 = O(\varepsilon^2)$,
$\alpha_i^2 = O(\varepsilon)$ and $\varepsilon \le h$. The most dangerous term, neglected in the right-hand
side of (5.32) is

$$ h\bigl(f_z(x_i) - f_z(0)\bigr)\bigl(h\beta_i^1 + h^2\beta_i^2\bigr). \tag{5.40} $$

However, $f_z(x_i) - f_z(0) = O(ih)$, and $\beta_i^1 = O(\varepsilon 2^{-i})$, $\beta_i^2 = O(2^{-i})$ by (5.28)
and $\varepsilon \le h$. This shows that the term (5.40) is also of size $O(h^4)$, so that (5.32)
holds with $M = 2$.

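b) In order to estimate the remainder term, i.e., the differences $\Delta y_i = y_i - \hat y_i$,
$\Delta z_i = z_i - \hat z_i$, we subtract (5.32) from (5.1) and eliminate $\Delta y_{i+1}$ and $\Delta z_{i+1}$. This
gives

$$
\begin{pmatrix} \Delta y_{i+1} \\ \Delta z_{i+1} \end{pmatrix}
= \begin{pmatrix} \Delta y_i \\ \Delta z_i \end{pmatrix}
+ \begin{pmatrix} I + O(h) & O(h) \\ O(1) & \bigl(\frac{\varepsilon}{h} I - g_z(0)\bigr)^{-1} + O(h) \end{pmatrix}
\begin{pmatrix} h\bigl(f(y_i, z_i) - f(\hat y_i, \hat z_i)\bigr) \\ g(y_i, z_i) - g(\hat y_i, \hat z_i) \end{pmatrix}
+ \begin{pmatrix} O(h^{M+2}) \\ O(h^{M+1}) \end{pmatrix}.
\tag{5.41}
$$

Due to (5.28) and $\varepsilon \le h$ we have

$$ \Bigl\| I + \Bigl(\frac{\varepsilon}{h} I - g_z(0)\Bigr)^{-1} g_z(0) \Bigr\| = \Bigl\| \Bigl(I - \frac{h}{\varepsilon} g_z(0)\Bigr)^{-1} \Bigr\| \le \frac12. $$

We therefore again obtain (5.20) with some $|\varrho| < 1$, if $H$ is sufficiently small. We
then deduce the result as in the proof of Theorem 5.3. □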



Of course, it is possible to add a third step to the above proof. However, the
recursions for $\alpha_i^3$, $\beta_i^3$ are no longer as simple as in (5.33) or (5.39). Moreover, the
perturbations of (5.29) and (5.30) already describe very well the situation
encountered in practice. We shall illustrate this with the following numerical example (see
also Hairer & Lubich 1988).

Consider van der Pol's equation (2.73) with $\varepsilon = 10^{-5}$ and with initial values
(2.74) on the smooth solution. We take the step number sequence (5.23) and apply
Method (5.1) $n_j$ times with step size $h = H/n_j$. The numerical result $Y_{j1}$, $Z_{j1}$ is
then extrapolated according to (IV.9.26). In Fig. 5.1 we show in logarithmic scale
the errors $|Z_{jj} - z(H)|$ for $j = 1, 2, \dots, 6$ as functions of $H$. We observe that
whenever the error is larger than $\varepsilon^2 = 10^{-10}$, the curves appear as straight lines
with slopes 2, 2, 3, 4, 5, and 6, respectively. If its slope is $q$, we have
$\log(\text{error}) \approx q \log H + \text{Const}$, or equivalently $\text{error} \approx CH^q$. This corresponds (with exception
of the last one) to the orders predicted by the subdiagonal entries of Table 5.4 for
the case $\varepsilon = 0$.

In order to understand the irregular behaviour of the curves when the error
becomes smaller than $\varepsilon^2 = 10^{-10}$, we study the influence of extrapolation on the
perturbation terms in (5.30). Since $b_1(0)$ contains a factor $\varepsilon$, the dominant part
of the perturbation in $Z_{j1}$ is $(I - (h/\varepsilon)g_z(0))^{-n_j} h^2 b_2(0)$, where $b_2(0)$ is some
constant and $h = H/n_j$. We assume the matrix $g_z(0)$ to be diagonalized and put
$g_z(0) = -1$. The dominant perturbation in $Z_{j1}$ is therefore $\varepsilon^2 T_{j1} b_2(0)$, where

$$ T_{j1} = \Bigl(\frac{H}{\varepsilon n_j}\Bigr)^{2} \Bigl(1 + \frac{H}{\varepsilon n_j}\Bigr)^{-n_j}. \tag{5.42} $$

Due to the linearity of extrapolation, the dominant perturbation in $Z_{jj}$ will be
$\varepsilon^2 T_{jj} b_2(0)$, where $T_{jj}$ is obtained from (5.42) and (IV.9.26). For the step number
sequence (5.23) the values of $T_{jj}$ are plotted as functions of $H/\varepsilon$ in Fig. 5.2. For
large values of $H/\varepsilon$ the curves appear as horizontal lines. This is a consequence
of our choice $n_1 = 2$ and of the fact that

$$ T_{jj} = C_j \Bigl(\frac{n_1 \varepsilon}{H}\Bigr)^{n_1 - 2} + O\Bigl(\Bigl(\frac{\varepsilon}{H}\Bigr)^{n_1 - 1}\Bigr) \qquad \text{for } \frac{H}{\varepsilon} \to \infty, $$

where $C_1 = 1$ and the other $C_j$ are given by the recursion (5.24).

The errors of Fig. 5.1 are now seen to be a superposition of the errors, predicted
from the case $\varepsilon = 0$ (Theorem 5.4), and of the perturbations of Fig. 5.2 scaled by a
factor $O(\varepsilon^2)$.
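The horizontal asymptotes of the curves $T_{jj}$ can be checked numerically. The sketch below evaluates (5.42) for the sequence (5.23) and extrapolates with the Aitken-Neville recursion $T_{j,k+1} = T_{j,k} + (T_{j,k} - T_{j-1,k})/((n_j/n_{j-k}) - 1)$, which is assumed here to be the content of (IV.9.26); the function name `t_diagonal` is hypothetical.

```python
def t_diagonal(x, n_seq):
    """Return [T_11, T_22, ...] for x = H / eps, starting from (5.42) with g_z(0) = -1."""
    # First column of the tableau: T_j1 = (x / n_j)^2 * (1 + x / n_j)^(-n_j).
    col = [(x / n) ** 2 * (1.0 + x / n) ** (-n) for n in n_seq]
    diag = [col[0]]
    for k in range(1, len(col)):
        # Update in place from the bottom so that col[j - 1] still holds column k.
        for j in range(len(col) - 1, k - 1, -1):
            col[j] = col[j] + (col[j] - col[j - 1]) / (n_seq[j] / n_seq[j - k] - 1.0)
        diag.append(col[k])
    return diag


if __name__ == "__main__":
    n_seq = [2, 3, 4, 5, 6, 7]
    for x in (1e1, 1e2, 1e4, 1e8):
        print(x, ["%.4f" % t for t in t_diagonal(x, n_seq)])
```

For very large $H/\varepsilon$ the printed values settle near $1, -2, 2, -4/3, \dots$, i.e. the constants obtained from the recursion $C_j = -C_{j-1}/((n_j/n_1) - 1)$ started with $C_1 = 1$.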


Remark. As mentioned in Sect. VI.1, the implicit Euler discretization possesses a
classical asymptotic expansion for differential-algebraic problems (1.6) of index 1
(case $\varepsilon = 0$). However, for singular perturbation problems, perturbations of the
same type as in (5.29) and (5.30) are present. The only difference is that all $b_j(0)$
contain a factor $\varepsilon$ for the implicit Euler method. For details and numerical
experiments we refer to Hairer & Lubich (1988). A related analysis for a slightly different
class of singular perturbation problems is presented in Auzinger, Frank & Macsek
(1990).


Dense Output 

Extrapolation methods typically take very large (basic) step sizes during
integration. This makes it important that the method possess a continuous numerical
solution. The first attempt to get a dense output for extrapolation methods is due to
Lindberg (1972). His approach, however, imposes severe restrictions on the step
number sequence. We present here the dense output of Hairer & Ostermann (1990),
which exists for any step number sequence.

The main idea (due to Ch. Lubich) is the following: when computing the $j$-th
entry of the extrapolation tableau, we consider not only $Y_{j1} = y_{n_j}$, but also
compute the difference $(y_{n_j} - y_{n_j - 1})/h_j$. Since these expressions possess an
$h$-expansion, their extrapolation gives an accurate approximation to $y'(x_0 + H)$. By
considering higher differences, we get also approximations to higher derivatives
of $y(x)$ at $x_0 + H$. They are then used for Hermite interpolation. The reason for
computing the derivatives only at the right end of the basic interval is the presence
of perturbation terms as described in Theorems 5.3 and 5.6. These perturbations
are large at the beginning (near the initial value), but decrease exponentially for
increasing $i$. For the same reason, one must not use differences of too high an order.
We thus choose an integer $\lambda$ (usually 0 or 1) and avoid the values $y_0, \dots, y_{n_1 + \lambda - 2}$
for the computation of the finite differences. We remark that a similar idea was used
by Deuflhard & Nowak (1987) to construct consistent initial values for
differential-algebraic problems.

An algorithmic description of the dense output for the linearly implicit Euler
method is as follows (we suppose that the value $Y_{\kappa\kappa}$ has been accepted as a
numerical approximation to $y(x_0 + H)$).

Step 1. For each $j \in \{1, \dots, \kappa\}$ we compute

$$ r_j^{(k)} = \frac{\nabla^k y_{n_j}^{(j)}}{h_j^k} \qquad \text{for } k = 1, \dots, j - \lambda. \tag{5.43} $$

Here $y_i^{(j)}$ is the approximation of $y(x_i)$, obtained during the computation of $Y_{j1}$,
and $\nabla y_i = y_i - y_{i-1}$ is the backward difference operator.

Step 2. We extrapolate $r_j^{(k)}$, $(\kappa - k - \lambda)$ times. This yields the improved
approximation $r^{(k)}$ to $y^{(k)}(x_0 + H)$.

Step 3. We define the polynomial $P(\theta)$ of degree $\kappa$ by

$$
\begin{aligned}
&P(0) = y_0, \qquad P(1) = Y_{\kappa\kappa} \\
&P^{(k)}(1) = H^k r^{(k)} \qquad \text{for } k = 1, \dots, \kappa - 1.
\end{aligned}
\tag{5.44}
$$

The following theorem shows to which order these polynomials approximate the
exact solution.


Theorem 5.7 (Hairer & Ostermann 1990). Consider a nonstiff differential equation
and let $\lambda \in \{0, 1\}$. Then, the error of the interpolation polynomial $P(\theta)$ satisfies

$$ P(\theta) - y(x_0 + \theta H) = O(H^{\kappa + 1 - \lambda}) \qquad \text{for } H \to 0. $$

Proof. Since $P(\theta)$ is a polynomial of degree $\kappa$, the error due to interpolation is of
size $O(H^{\kappa+1})$. We know that $Y_{\kappa\kappa} - y(x_0 + H) = O(H^{\kappa+1})$. Therefore it suffices
to prove that

$$ r^{(k)} = y^{(k)}(x_0 + H) + O(H^{\kappa - k - \lambda + 1}) \qquad \text{for } k = 1, \dots, \kappa - 1. \tag{5.45} $$

Due to the asymptotic expansion of the global error $y_i - y(x_i)$, the approximations
$r_j^{(k)}$ also have an expansion of the form

$$ r_j^{(k)} = y^{(k)}(x_0 + H) + h_j a_1^{(k)} + h_j^2 a_2^{(k)} + \dots. \tag{5.46} $$

The statement (5.45) now follows from the fact that each extrapolation eliminates
one power of $h$ in (5.46). □
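Steps 1 to 3 can be sketched for a scalar nonstiff problem as follows. This is a minimal illustration, not the SEULEX implementation: it assumes the scalar form $y_{i+1} = y_i + h f(y_i)/(1 - hJ)$ of the linearly implicit Euler method with $J = f'(y_0)$, the Aitken-Neville recursion for an $h$-expansion, and a step number sequence with $n_j \ge j$; the names `aitken_neville` and `dense_output` are hypothetical.

```python
from math import comb, exp, factorial


def aitken_neville(values, ns):
    """Extrapolate values[j] (computed with n = ns[j]) and return the last diagonal entry."""
    col = list(values)
    for k in range(1, len(col)):
        for j in range(len(col) - 1, k - 1, -1):
            col[j] += (col[j] - col[j - 1]) / (ns[j] / ns[j - k] - 1.0)
    return col[-1]


def dense_output(f, jac0, y0, H, n_seq, lam=0):
    """Return P(theta), approximating y(x_0 + theta * H) on [0, 1]."""
    kappa = len(n_seq)
    runs = []                      # runs[j] = [y_0, ..., y_{n_j}] for step size H / n_j
    for n in n_seq:
        h, ys = H / n, [y0]
        for _ in range(n):
            ys.append(ys[-1] + h * f(ys[-1]) / (1.0 - h * jac0))
        runs.append(ys)
    # c[0] = Y_kappa,kappa ; c[k] = H^k r^(k) are the derivatives of P at theta = 1.
    c = [aitken_neville([ys[-1] for ys in runs], n_seq)]
    for k in range(1, kappa):
        rows = [j for j in range(kappa) if (j + 1) - lam >= k]   # k <= j - lambda (1-based j)
        r = []
        for j in rows:
            # Step 1: backward difference of order k at the right end, divided by h_j^k.
            diff = sum((-1) ** m * comb(k, m) * runs[j][-1 - m] for m in range(k + 1))
            r.append(diff / (H / n_seq[j]) ** k)
        # Step 2: extrapolate the available values of r_j^(k).
        c.append(H ** k * aitken_neville(r, [n_seq[j] for j in rows]))
    # Step 3: Taylor part around theta = 1 plus one term fixed by P(0) = y_0.
    head = sum(c[k] * (-1) ** k / factorial(k) for k in range(kappa))
    c_top = (y0 - head) * (-1) ** kappa

    def P(theta):
        t = theta - 1.0
        return sum(c[k] * t ** k / factorial(k) for k in range(kappa)) + c_top * t ** kappa

    return P


if __name__ == "__main__":
    P = dense_output(lambda y: -y, -1.0, 1.0, 0.1, [1, 2, 3, 4])
    for theta in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(theta, P(theta), exp(-0.1 * theta))
```

No additional function evaluations are needed: all quantities are differences of values already produced while filling the first column of the tableau.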


It is now natural to investigate the error of the dense output $P(\theta)$ also for stiff
differential equations, such as singular perturbation problems. We shall treat here
the limit case $\varepsilon = 0$ which is easier to analyse and, nevertheless, gives much insight
into the structure of the error for very stiff problems.

For the differential-algebraic system (1.6) one defines the dense output in
exactly the same way as for ordinary differential equations. As the system (1.6) is
partitioned into $y$- and $z$-components, it is convenient to denote the corresponding
interpolation polynomials by $P(\theta)$ and $Q(\theta)$, respectively.

Theorem 5.8 (Hairer & Ostermann 1990). Let $y(x)$, $z(x)$ be the solution of (1.6).
Suppose that the step number sequence satisfies $n_1 + \lambda \ge 2$ with $\lambda \in \{0, 1\}$. We
then have

$$
\begin{aligned}
P(\theta) - y(x_0 + \theta H) &= O(H^{\kappa + 1 - \lambda}) + O(H^{r+1}) \\
Q(\theta) - z(x_0 + \theta H) &= O(H^{\kappa + 1 - \lambda}) + O(H^{s}),
\end{aligned}
\tag{5.47}
$$

where $r$ and $s$ are the $(\kappa + n_1 + \lambda - 2, \kappa)$-entries of Table 5.3 and Table 5.4,
respectively.

Proof. We use the perturbed asymptotic error expansions of Theorem 5.3. Their
smooth terms are treated exactly as in the proof of Theorem 5.7 and yield the
$O(H^{\kappa + 1 - \lambda})$ error term in (5.47). The second error terms in (5.47) are due to
the perturbations in (5.11). We observe that the computation of $r_j^{(k)}$ involves only
$y_i$ (or $z_i$) with $i \ge n_j - j + \lambda$. Since $n_j - j \ge n_1 - 1$, the values $y_0, \dots, y_{n_1 + \lambda - 2}$
do not enter into the formulas for $r_j^{(k)}$, so that the dominant perturbation comes
from $y_{n_1 + \lambda - 1}$ (or $z_{n_1 + \lambda - 1}$). □


It is interesting to note that for $\lambda = 1$, the second error term in (5.47) is of the
same size as that in the numerical solution $Y_{\kappa\kappa}$, $Z_{\kappa\kappa}$ (see Theorem 5.4). However,
one power of $H$ is lost in the first term of (5.47). On the other hand, one $H$ may be
lost in the second error term, if $\lambda = 0$. Both choices lead to a cheap (no additional
function evaluations) and accurate dense output. Its order for $\theta \in (0, 1)$ is at most
one lower than the order obtained for $\theta = 1$.

Exercises

1. The linearly implicit mid-point rule, applied to the differential-algebraic
system (1.6), reads

$$
\begin{aligned}
\begin{pmatrix} I - hf_y(0) & -hf_z(0) \\ -hg_y(0) & -hg_z(0) \end{pmatrix}
\begin{pmatrix} y_{i+1} - y_i \\ z_{i+1} - z_i \end{pmatrix}
&= -\begin{pmatrix} I + hf_y(0) & hf_z(0) \\ hg_y(0) & hg_z(0) \end{pmatrix}
\begin{pmatrix} y_i - y_{i-1} \\ z_i - z_{i-1} \end{pmatrix} \\
&\quad + 2h \begin{pmatrix} f(y_i, z_i) \\ g(y_i, z_i) \end{pmatrix}.
\end{aligned}
\tag{5.48}
$$

If we compute $y_1$, $z_1$ from (5.2), and if we define the numerical solution at
$x_0 + H$ ($H = 2mh$) by

$$ y_h(x_0 + H) = \tfrac12 (y_{2m+1} + y_{2m-1}), \qquad z_h(x_0 + H) = \tfrac12 (z_{2m+1} + z_{2m-1}), $$

this algorithm constitutes an extension of (IV.9.16) to differential-algebraic
problems.

a) Show that this method integrates the problem (5.8) exactly.

b) Apply the algorithm to

$$ y' = 1, \quad 0 = u - y^2, \quad 0 = v - yu, \quad 0 = w - yv, \quad 0 = z - yw $$

with zero initial values and verify the formula

I i z 2m+l + z 2m-l) - z ( x 2m) = 

- (- 1 )™ (I X 2m + X 2m h2 + Qx 2m h4 )' 


Remark. The error of the $z$-component thus contains an $h$-independent term
of size $O(H^5)$, which is not affected by extrapolation.

2. Consider the method of Exercise 1 as the basis of an $h^2$-extrapolation
method. Prove that for the step number sequence (IV.9.22) the extrapolated values
satisfy

$$ Y_{jk} - y(x_0 + H) = O(H^{r_{jk}+1}), \qquad Z_{jk} - z(x_0 + H) = O(H^{s_{jk}}) $$

with $r_{jk}$, $s_{jk}$ given in Tables 5.5 and 5.6.

Hint. Interpret $Y_{j1}$, $Z_{j1}$ as numerical solution of a Rosenbrock method
(Exercise 3 of Sect. IV.9) and verify the order condition derived in Sect. VI.3 (see
also Hairer & Lubich (1988b) and C. Schneider (1993)).

Table 5.5. orders $r_{jk}$.

1
1 3
1 3 5
1 3 5 7
1 3 5 7 7
1 3 5 7 7 7
1 3 5 7 7 7 7

Table 5.6. orders $s_{jk}$.

2
2 4
2 4 5
2 4 5 5
2 4 5 5 5
2 4 5 5 5 5
2 4 5 5 5 5 5

VI.6 Quasilinear Problems 


Quasilinear differential equations are usually understood to be equations in which
the highest derivative appears linearly. In the case of first order ODE systems, they
are of the form

$$ C(y) \cdot y' = f(y), \tag{6.1} $$

where $C(y)$ is an $n \times n$-matrix. In the regions where $C(y)$ is invertible, Eq. (6.1)
can be written as

$$ y' = C(y)^{-1} \cdot f(y) \tag{6.1'} $$

and every ODE-code can be applied by solving at every function call a linear
system. But this would destroy, for example, a banded structure of the Jacobian and
it is therefore often preferable to treat Eq. (6.1) directly. If the matrix $C$ is
everywhere of rank $m$ ($m < n$), Eq. (6.1) represents a quasilinear differential-algebraic
system.
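For invertible $C(y)$, the passage from (6.1) to (6.1') amounts to one linear solve per right-hand side evaluation. The following sketch shows this wrapper for a two-dimensional system; the matrix $C$, the function $f$ and the names `solve_2x2` and `explicit_rhs` are made up for illustration only.

```python
def solve_2x2(C, b):
    """Solve the 2x2 linear system C x = b by Cramer's rule."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    if det == 0.0:
        raise ZeroDivisionError("C(y) is singular: (6.1') is not defined at this point")
    x0 = (b[0] * C[1][1] - C[0][1] * b[1]) / det
    x1 = (C[0][0] * b[1] - C[1][0] * b[0]) / det
    return [x0, x1]


def explicit_rhs(C, f):
    """Turn C(y) y' = f(y) into y' = F(y); every call of F solves one linear system."""
    def F(y):
        return solve_2x2(C(y), f(y))
    return F


if __name__ == "__main__":
    # Hypothetical example data, chosen only so that C(y) is invertible.
    C = lambda y: [[1.0 + y[0] ** 2, 0.0], [y[1], 2.0]]
    f = lambda y: [y[1], -y[0]]
    F = explicit_rhs(C, f)
    print(F([1.0, 2.0]))
```

Any explicit ODE code can be fed with `F`; the cost is that the Jacobian of `F` is generally dense even when $C(y)$ and $f'(y)$ are banded, which is the drawback noted above.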


Example: Moving Finite Elements 

As an example, we present the classical idea of “moving finite elements”, described
in K. Miller & R.N. Miller (1981): the solution $u(x, t)$ of a nonlinear partial
differential equation

$$ \frac{\partial u}{\partial t} = L(u(x, t)), \qquad u(0, t) = u(1, t) = 0, \tag{6.2} $$

where $L(u)$ is an unbounded nonlinear differential operator, is approximated by
finite element polygons $v(x, a_1, s_1, \dots, a_n, s_n)$ which satisfy $v(s_j, a_1, s_1, \dots) = a_j$
(see Fig. 6.1). These polygons form a $2n$-dimensional manifold in the Hilbert space
$L^2(0, 1)$ parametrized by $a_1, s_1, \dots, a_n, s_n$. The idea is now to move
simultaneously $a(t)$ and $s(t)$ in order to adapt at any time the finite element solution as
well as possible to Eq. (6.2).

Fig. 6.1. Moving finite elements

We thus require that the defect $\dot v - L(v)$ remains always orthogonal to the tangent
space. The conditions

$$ \int_0^1 \bigl(\dot v - L(v)\bigr) \cdot \frac{\partial v}{\partial a_j}\, dx = 0, \qquad \int_0^1 \bigl(\dot v - L(v)\bigr) \cdot \frac{\partial v}{\partial s_j}\, dx = 0 \tag{6.3} $$

lead to a system of type (6.1) with

$$
\begin{aligned}
c_{2j-1,2k-1} &= \int_0^1 \frac{\partial v}{\partial a_j} \cdot \frac{\partial v}{\partial a_k}\, dx, &
c_{2j-1,2k} &= \int_0^1 \frac{\partial v}{\partial a_j} \cdot \frac{\partial v}{\partial s_k}\, dx, \\
c_{2j,2k-1} &= \int_0^1 \frac{\partial v}{\partial s_j} \cdot \frac{\partial v}{\partial a_k}\, dx, &
c_{2j,2k} &= \int_0^1 \frac{\partial v}{\partial s_j} \cdot \frac{\partial v}{\partial s_k}\, dx, \\
f_{2j-1} &= \int_0^1 L(v) \cdot \frac{\partial v}{\partial a_j}\, dx, &
f_{2j} &= \int_0^1 L(v) \cdot \frac{\partial v}{\partial s_j}\, dx.
\end{aligned}
\tag{6.4}
$$

For the partial derivatives of $v$, sketched in Fig. 6.1, the non-zero of these scalar
products become

$$
\begin{aligned}
c_{2j-1,2j-1} &= \tfrac13 (\Delta_j + \Delta_{j+1}), &
c_{2j-1,2j} &= -\tfrac13 (m_j \Delta_j + m_{j+1} \Delta_{j+1}), \\
c_{2j,2j-1} &= -\tfrac13 (m_j \Delta_j + m_{j+1} \Delta_{j+1}), &
c_{2j,2j} &= \tfrac13 (m_j^2 \Delta_j + m_{j+1}^2 \Delta_{j+1}) + 2\varepsilon^2,
\end{aligned}
\tag{6.5a}
$$

$$
\begin{aligned}
c_{2j-1,2j+1} &= c_{2j+1,2j-1} = \tfrac16 \Delta_{j+1}, &
c_{2j-1,2j+2} &= c_{2j+2,2j-1} = -\tfrac16 m_{j+1} \Delta_{j+1}, \\
c_{2j,2j+1} &= c_{2j+1,2j} = -\tfrac16 m_{j+1} \Delta_{j+1}, &
c_{2j,2j+2} &= c_{2j+2,2j} = \tfrac16 m_{j+1}^2 \Delta_{j+1} - \varepsilon^2,
\end{aligned}
\tag{6.5b}
$$

where

$$ \Delta_j = s_j - s_{j-1}, \qquad m_j = (a_j - a_{j-1})/\Delta_j, \qquad j = 1, \dots, n + 1. $$

The matrix $C(y)$ is banded with bandwidth 3 + 1 + 3. The $\varepsilon^2$-terms in (6.5) come
from an “internodal viscosity” penalty term, explained in Miller & Miller (1981),
which aims to regularize the relative movement of the nodes $s_j$ whenever their
position is ill-conditioned, which happens to appear in the vicinity of inflection
points (see Fig. 6.2).

It is then hoped that the nodes move automatically into the critical regions of 
the solutions, move with shocks which may appear, and that a(t) and s(t) become 
smooth functions. 


Application to Burgers' Equation. Burgers' Equation is given by

$$ u_t + u u_x = \mu u_{xx} \qquad \text{or} \qquad u_t = -\Bigl(\frac{u^2}{2}\Bigr)_x + \mu u_{xx} \tag{6.6} $$

where $\mu = 1/R$ and $R$ is called the Reynolds number. This is one of the
equations originally designed by Burgers (1948) as “a mathematical model illustrating
the theory of turbulence”. However, soon afterwards, E. Hopf (1950) presented
an analytical solution (see Exercise 1 below) and concluded that “we doubt that
Burgers' equation fully illustrates the statistics of free turbulence. (...) Equation
(1) is too simple a model to display chance fluctuations ...”. Nowadays it
remains interesting as a nonlinear equation resembling the Navier-Stokes equations
in fluid dynamics which possesses, for $R$ large, shock waves and, for $R \to \infty$,
discontinuous solutions. Here, the integrals in (6.4) become

$$ f_{2j-1} = A_j + \mu (m_{j+1} - m_j), \qquad f_{2j} = C_j + \mu D_j, \qquad j = 1, \dots, n, \tag{6.5c} $$

where

$$
\begin{aligned}
A_j &= (a_{j-1} - a_j)\bigl(\tfrac13 a_j + \tfrac16 a_{j-1}\bigr) + (a_j - a_{j+1})\bigl(\tfrac13 a_j + \tfrac16 a_{j+1}\bigr), \\
C_j &= -m_j (a_{j-1} - a_j)\bigl(\tfrac13 a_j + \tfrac16 a_{j-1}\bigr) - m_{j+1} (a_j - a_{j+1})\bigl(\tfrac13 a_j + \tfrac16 a_{j+1}\bigr), \\
D_j &= -(m_{j+1} - m_j)\bigl(\tfrac12 m_{j+1} + \tfrac12 m_j\bigr)
\end{aligned}
\tag{6.5d}
$$

(in the case of $D_j$ appears the product of a Dirac $\delta$ function with a discontinuous
function; these must be suitably “mollified”). We choose as initial function

$$ u(x, 0) = (\sin(3\pi x))^2 \cdot (1 - x)^{3/2}, \qquad \mu = 0.0003 \tag{6.7} $$

and as initial positions

$$ s_j = j/(n + 1), \qquad a_j = u(s_j, 0), \qquad j = 1, \dots, n = 100, $$

and solve the problem with smoothing parameter $\varepsilon = 10^{-2}$ for $0 \le t \le 1.9$. Two
shock waves arise which later fuse into one (see Fig. 6.2).
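The initial data of this experiment are easy to set up. The following sketch builds the node positions and amplitudes from (6.7) together with the quantities $\Delta_j$ and $m_j$; it assumes that the boundary nodes $s_0 = 0$, $s_{n+1} = 1$ carry the values $a_0 = a_{n+1} = 0$ prescribed by the boundary conditions in (6.2). The names `u0` and `initial_nodes` are hypothetical.

```python
from math import pi, sin


def u0(x):
    """Initial function (6.7): (sin(3 pi x))^2 * (1 - x)^(3/2)."""
    return sin(3.0 * pi * x) ** 2 * (1.0 - x) ** 1.5


def initial_nodes(n=100):
    """Return (s, a, delta, m) for equidistant initial nodes s_j = j / (n + 1)."""
    s = [j / (n + 1) for j in range(n + 2)]          # s_0 = 0, ..., s_{n+1} = 1
    a = [0.0] + [u0(s[j]) for j in range(1, n + 1)] + [0.0]
    # delta[j] and m[j] are defined for j = 1, ..., n + 1; index 0 is unused.
    delta = [None] + [s[j] - s[j - 1] for j in range(1, n + 2)]
    m = [None] + [(a[j] - a[j - 1]) / delta[j] for j in range(1, n + 2)]
    return s, a, delta, m


if __name__ == "__main__":
    s, a, delta, m = initial_nodes()
    print("number of interior nodes:", len(s) - 2)
    print("largest initial amplitude:", max(a))
    print("steepest initial slope:", max(abs(v) for v in m[1:]))
```

The unknown vector of (6.1) is then $y = (a_1, s_1, \dots, a_n, s_n)$, with the two boundary nodes kept fixed.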


Fig. 6.2. Moving finite element solution of Burgers' equation


Problems of Index One


For invertible $C(y)$, Eq. (6.1) is an ordinary differential equation and standard
theory (for existence and uniqueness results) can be applied. If the matrix is
ill-conditioned or even singular, new investigations are necessary. In order to exclude
equations with singularities, such as $xy' = (q + bx)y$ (see Sect. I.5), we assume
that

$$ C(y) \text{ has constant rank } m \quad (m < n) \tag{6.8} $$

in a neighbourhood of the solution. Then the columns of $C(y)$ span an
$m$-dimensional subspace $\operatorname{Im} C(y)$ which moves with $y$. Clearly, in order that (6.1) can
make sense, we need consistent initial values, i.e., we need

$$ f(y_0) \in \operatorname{Im} C(y_0). \tag{6.9} $$


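We shall now show how, under a certain condition, this property can be satisfied
for all $x$ and determines uniquely the solution: choose a nonsingular matrix

$$ T(y) = \begin{pmatrix} T_1(y) \\ T_2(y) \end{pmatrix} \qquad \text{such that} \qquad T(y)C(y) = \begin{pmatrix} B_1(y) \\ 0 \end{pmatrix}; \tag{6.10} $$

this means that the rows of $T_2(y)$ must span the $(n - m)$-dimensional orthogonal
complement of $\operatorname{Im} C(y)$. Then we multiply Eq. (6.1) by $T(y)$ and obtain

$$ \begin{pmatrix} B_1(y) \\ 0 \end{pmatrix} y' = \begin{pmatrix} T_1(y) f(y) \\ T_2(y) f(y) \end{pmatrix}, \tag{6.11} $$

so that the condition corresponding to (6.9) becomes visible in the form $T_2(y)f(y) = 0$.
Differentiating this relation and inserting the derivative into the second part of
(6.11), we obtain

$$ \begin{pmatrix} B_1(y) \\ (T_2 f)'(y) \end{pmatrix} y' = \begin{pmatrix} T_1(y) f(y) \\ 0 \end{pmatrix}, \tag{6.12} $$

which is a regular quasilinear equation if the matrix

$$ \begin{pmatrix} B_1(y) \\ (T_2 f)'(y) \end{pmatrix} \qquad \text{is invertible.} \tag{6.13} $$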


Lemma 6.1. Let the matrix $C(y)$ satisfy (6.8) and (6.13), and let the initial
values $y_0$ fulfill (6.9). Then, the quasilinear problem $C(y)y' = f(y)$, $y(x_0) = y_0$
possesses a locally unique solution.

Proof. Condition (6.9) means that $T_2(y_0)f(y_0) = 0$ and the second part of (6.12)
assures that $\bigl(T_2(y(x))f(y(x))\bigr)' = 0$. Therefore we have $(T_2 f)(y(x)) = 0$ for all
$x$, and the solution of (6.12) solves also (6.11) and (6.1). □

The following result gives a consequence of condition (6.13) which shall be
essential in the later discussions of feasibility of numerical procedures.

Lemma 6.2. Assume that $C(y)$ satisfies (6.8) and (6.13). If $f(y_0) = C(y_0)y_0'$,
then the matrix

$$ C(y) + \lambda \bigl(f'(y_0) - \Gamma(y_0, y_0')\bigr) $$

is invertible for sufficiently small $\lambda \ne 0$ and for $y$ sufficiently close to $y_0$. Here,

$$ \Gamma(y, y') = \frac{\partial}{\partial y}\bigl(C(y)y'\bigr). $$

Proof. Condition (6.13) implies that

$$ T(y)C(y) + \lambda (Tf)'(y_0) \qquad \text{is invertible} \tag{6.14} $$

for small $\lambda \ne 0$ and $y$ close to $y_0$. Using $T'C + TC' = B'$ we have

$$ (T'f)(y_0) = T'(y_0)C(y_0)y_0' = -T(y_0)\Gamma(y_0, y_0') + B'(y_0)y_0'. \tag{6.15} $$

Since $B'(y_0)y_0'$ does not contribute to the lower block of the matrix (6.14), it can
be neglected after insertion of $(Tf)' = Tf' + T'f$ and (6.15) into (6.14). This
implies that

$$ T(y)C(y) + \lambda T(y_0)\bigl(f'(y_0) - \Gamma(y_0, y_0')\bigr) \qquad \text{is invertible.} $$

The statement of the Lemma now follows from a continuity argument. □


Numerical Treatment of $C(y)y' = f(y)$

As has been said above, in the case of invertible matrices $C(y)$, one can in principle apply an explicit numerical method to (6.1). However, if Eq. (6.1) is stiff, implicit methods have to be applied. In this case it may be advantageous to have methods that avoid the computation of the Jacobian of $C(y)^{-1} f(y)$.

Transformation to Semi-Explicit Form. In the case where (6.1) is stiff or where $C(y)$ is singular and satisfies (6.8) and (6.13) we introduce $z = y'$ as a new variable, such that system (6.1) takes the semi-explicit form

\[ \begin{aligned} y' &= z \\ 0 &= C(y) z - f(y). \end{aligned} \tag{6.16} \]

Here, all methods of the preceding sections can be applied (at least formally). The study of convergence, however, needs further investigation, because Condition (1.7) is no longer satisfied here.

Implicit Runge-Kutta and Multistep Methods. With the $\varepsilon$-embedding approach (see (1.11) for Runge-Kutta methods and (2.2) for multistep methods) we are led to nonlinear equations, which, when solved by Newton iterations, require the solution of linear systems of the form

\[ \begin{pmatrix} \alpha I & -I \\ \Gamma(y_0, z_0) - f'(y_0) & C(y_0) \end{pmatrix}. \tag{6.17} \]

Here $\alpha = (\gamma h)^{-1}$, and $\gamma$ is an eigenvalue of the Runge-Kutta matrix. By Lemma 6.2 this matrix is invertible for small enough $h > 0$. Convergence follows from the results of Sections VII.3 and VII.4 (see Exercise 2).

Rosenbrock Methods. Method (4.3) applied to system (6.16) leads to 



(6.18b) 


Again, it can be seen that (6.18b) represents a linear system whose regularity is assured by Lemma 6.2. However, since Condition (1.7) is not satisfied, a new theory for the order conditions of the local error as well as for convergence of the global error is necessary. This theory reveals, for example, that new order conditions for the coefficients are necessary and explains why, say, the code RODAS, directly applied to (6.16), does not give precise results. For full details we refer the reader to the original publication Lubich & Roche (1990).

Extrapolation Methods 

The first problem is to find suitable linearly implicit Euler discretizations for (6.1), 
to serve as basic method for the extrapolation algorithm (see Sect. IV.9). 

Method of Deuflhard & Nowak. Applying the linearly implicit Euler method (IV.9.15) to the differential equation (6.1) we obtain

\[ (I - hA)(y_{i+1} - y_i) = h\, C(y_i)^{-1} f(y_i) \tag{6.19} \]

where

\[ A \approx (C^{-1} f)'(y_0) = C(y_0)^{-1} \bigl( f'(y_0) - \Gamma(y_0, y_0') \bigr) \]

with $\Gamma(y, y')$ as in Lemma 6.2. Multiplication of (6.19) by $C(y_i)$ yields

\[ \bigl( C(y_i) - h\, C(y_i) C(y_0)^{-1} J \bigr)(y_{i+1} - y_i) = h f(y_i) \]

with $J = f'(y_0) - \Gamma(y_0, y_0')$. Deuflhard & Nowak (1987) suggest replacing $C(y_i) C(y_0)^{-1}$ by the identity matrix, which “may be interpreted as just introducing an approximation error into the Jacobian matrix”. This leads to the discretization

\[ \bigl( C(y_i) - hJ \bigr)(y_{i+1} - y_i) = h f(y_i) \tag{6.20} \]

which represents the basic step for the code LIMEX described in Deuflhard & Nowak (1987). The regularity of the matrix of this linear system is again assured by Lemma 6.2.

The computation of $J$ requires an approximation to $z_0 = y_0'$. Such consistent initial values must be computed explicitly for the first basic steps, and are obtained by extrapolation of

\[ z_n = (y_n - y_{n-1})/h \tag{6.21} \]

in the subsequent steps.
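The basic step (6.20) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LIMEX code: NumPy is assumed, the function name `basic_step_620` is hypothetical, $J$ is supplied by the caller and kept frozen at $y_0$, and the scalar test problem $(1+y^2)y' = -y(1+y^2)$ (that is, $y' = -y$) is invented for the purpose of checking first-order convergence.

```python
import numpy as np

def basic_step_620(C, f, J, y0, H, n):
    """Perform n substeps of size h = H/n of discretization (6.20), J frozen at y0."""
    h = H / n
    y = np.array(y0, dtype=float)
    for _ in range(n):
        # Solve (C(y_i) - h J) (y_{i+1} - y_i) = h f(y_i) for the increment.
        dy = np.linalg.solve(C(y) - h * J, h * f(y))
        y = y + dy
    return y

# Hypothetical scalar problem (1 + y^2) y' = -y (1 + y^2), exact solution exp(-x).
C = lambda y: np.array([[1.0 + y[0] ** 2]])
f = lambda y: np.array([-y[0] * (1.0 + y[0] ** 2)])
y0 = np.array([1.0])
# J = f'(y0) - Gamma(y0, y0') with y0' = -1: f'(1) = -4 and Gamma = 2*y0*y0' = -2.
J = np.array([[-2.0]])

for n in (50, 200):
    err = abs(basic_step_620(C, f, J, y0, 0.5, n)[0] - np.exp(-0.5))
    print(n, err)   # the error shrinks roughly in proportion to h
```

In an extrapolation code the results for several step numbers $n$ would be combined in an extrapolation tableau; that part is omitted here.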


Linearly-Implicit Euler to Semi-Explicit Model. Another possibility is to apply the linearly-implicit Euler discretization (5.2) to the differential-algebraic system (6.16). This gives

\[ \begin{pmatrix} I & -hI \\ -hJ & hC(y_0) \end{pmatrix} \begin{pmatrix} y_{i+1} - y_i \\ z_{i+1} - z_i \end{pmatrix} = h \begin{pmatrix} z_i \\ f(y_i) - C(y_i) z_i \end{pmatrix} \tag{6.22} \]

with $z_0 = y_0' = y'(x_0)$. The first line yields $z_{i+1} = (y_{i+1} - y_i)/h$ and the second line becomes

\[ \bigl( C(y_0) - hJ \bigr)(y_{i+1} - y_i) = h f(y_i) - \bigl( C(y_i) - C(y_0) \bigr)(y_i - y_{i-1}). \tag{6.23} \]

The right-most term vanishes for $i = 0$, so that $y_{-1}$ does not enter the algorithm.


Asymptotic Expansions. The theoretical justification of the use of either (6.20) or 
(6.23) as basic step for an extrapolation process requires the investigation of the 
asymptotic expansion of their global errors. 

In the situation where $C(y)$ is invertible, the discretization (6.20) is a consistent one-step discretization of (6.1) and therefore possesses, by standard theory (Theorem II.8.1), an asymptotic expansion, the terms of which, however, depend on the stiffness. Since the system (6.16) is of the form (1.6) with assumption (1.7) satisfied, we can conclude from Theorem 5.3 the existence of a perturbed asymptotic expansion for the second discretization (6.23).

In the situation where $C(y)$ is singular, Lubich (1989) revealed the existence of a perturbed asymptotic expansion for both discretizations (6.20) and (6.23). We refer to this original publication for further details, in particular to the study of the influence of these perturbations on the extrapolated numerical approximation.


Exercises 

1. Reconstruct E. Hopf’s analytic solution of Burgers’ equation (6.6).
Hint. Introduce the new dependent variable

Show that for a suitably chosen $c(t)$ the function $\varphi$ satisfies the one-dimensional heat equation. The solution $u(x,t)$ of (6.6) can then be recovered from $\varphi(x,t)$ by

\[ u = -2\mu (\log \varphi)_x = -2\mu (\varphi_x / \varphi). \]

2. Assume that (6.8) and (6.13) hold. By eliminating from $0 = C(y)z - f(y)$ as many components of $z$ as possible, transform the system (6.16) into an equivalent one of the form

\[ y' = F(y, u), \qquad 0 = G(y), \]

where $u$ collects the remaining components of $z$.

a) Prove that Runge-Kutta methods and multistep methods are invariant with respect to this transformation.

b) Show that $G_y(y) F_u(y, u)$ is invertible, so that the convergence results of Sections VII.3 and VII.4 can be applied.

3. (Quasilinear problems with gradient-type mass-matrix, see HLR89, page 111). Consider the electrical circuit (1.14), but suppose this time that the capacities depend on the voltages, e.g., as

\[ C_k = C_{k,0} / \bigl( 1 - (U_i - U_j)/U_b \bigr)^{1/2} \]

so that the expressions $C_k (U_i' - U_j')$ in (1.14) must be replaced by $\bigl( C_k (U_i - U_j) \bigr)'$. Show that then the corresponding equations are of the form (6.1) with

\[ C(y) = A q'(y) \]

where $A$ is a constant matrix and $q(y)$ a known function of $y$. Show that such problems can be efficiently solved by introducing $q(y) = z$ as a new variable such that the problem becomes semi-explicit as

\[ Az' = f(y), \qquad 0 = z - q(y). \]



Chapter VII. Differential-Algebraic Equations 
of Higher Index 



In the preceding chapter we considered the simplest special case of differential-algebraic equations, the so-called index 1 problem. Many problems of practical interest are, however, of higher index, which makes their numerical treatment increasingly difficult.

We start by classifying differential-algebraic equations (DAE) by the index (index of nilpotency for linear problems with constant coefficients; differentiation and perturbation index for general nonlinear problems) and present some examples arising in applications (Sect. VII.1). Several different approaches for solving higher index problems numerically are discussed in Sect. VII.2: index reduction by differentiation combined with suitable projections, state space form methods, and treatment as overdetermined or unstructured systems. Sections VII.3 and VII.4 study the convergence properties of multistep methods and Runge-Kutta methods when they are applied directly to index 2 systems. It may happen that the order of convergence is lower than for ordinary differential equations (“order reduction”). The study of conditions which guarantee a certain order is the subject of Sect. VII.5. Half-explicit methods for index 2 problems are especially suited for constrained mechanical systems (Sect. VII.6). A multibody mechanism and its numerical treatment are detailed in Sect. VII.7. Finally, we discuss symplectic methods for constrained Hamiltonian systems (Sect. VII.8), and explain their long-term behaviour by a backward error analysis for differential equations on manifolds.




VII.1 The Index and Various Examples


The most general form of a differential-algebraic system is that of an implicit differential equation

\[ F(u', u) = 0 \tag{1.1} \]

where $F$ and $u$ have the same dimension. We always assume $F$ to be sufficiently differentiable. A non-autonomous system is brought to the form (1.1) by appending $x$ to the vector $u$, and by adding the equation $x' = 1$.

If $\partial F/\partial u'$ is invertible we can formally solve (1.1) for $u'$ to obtain an ordinary differential equation. In this chapter we are interested in problems (1.1) where $\partial F/\partial u'$ is singular.


Linear Equations with Constant Coefficients 

Incidentally, I cannot share Mr. Jordan's opinion that it is
rather difficult to follow Weierstrass's analysis; on the
contrary, it seems to me to be perfectly transparent, ...
(L. Kronecker 1874)

The simplest and best understood problems of the form (1.1) are linear differential equations with constant coefficients

\[ Bu' + Au = d(x). \tag{1.2} \]

In looking for solutions of the form $e^{\lambda x} u_0$ (if $d(x) = 0$) we are led to consider the “matrix pencil” $A + \lambda B$. When $A + \lambda B$ is singular for all values of $\lambda$, then (1.2) has either no solution or infinitely many solutions for a given initial value (Exercise 1). We shall therefore deal only with regular matrix pencils, i.e., with problems where the polynomial $\det(A + \lambda B)$ does not vanish identically. The key to the solution of (1.2) is the following simultaneous transformation of $A$ and $B$ to canonical form.

Theorem 1.1 (Weierstrass 1868, Kronecker 1890). Let $A + \lambda B$ be a regular matrix pencil. Then there exist nonsingular matrices $P$ and $Q$ such that

\[ PAQ = \begin{pmatrix} C & 0 \\ 0 & I \end{pmatrix}, \qquad PBQ = \begin{pmatrix} I & 0 \\ 0 & N \end{pmatrix} \tag{1.3} \]

where $N = \operatorname{blockdiag}(N_1, \ldots, N_k)$, each $N_i$ is of the form

\[ N_i = \begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & 0 & 1 \\ 0 & & & 0 \end{pmatrix} \quad\text{of dimension } m_i, \tag{1.4} \]

and $C$ can be assumed to be in Jordan canonical form.

Proof (Gantmacher 1954 (Chapter XII), see also Exercises 2 and 3). We fix some $c$ such that $A + cB$ is invertible. If we multiply

\[ A + \lambda B = A + cB + (\lambda - c)B \]

by the inverse of $A + cB$ and then transform $(A + cB)^{-1} B$ to Jordan canonical form (Theorem I.12.2) we obtain

\[ I + (\lambda - c) \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix}. \tag{1.5} \]

Here, $J_1$ contains the Jordan blocks with non-zero eigenvalues, $J_2$ those with zero eigenvalues (the dimension of $J_1$ is just the degree of the polynomial $\det(A + \lambda B)$). Consequently, $J_1$ and $I - cJ_2$ are both invertible and multiplying (1.5) from the left by $\operatorname{blockdiag}\bigl(J_1^{-1}, (I - cJ_2)^{-1}\bigr)$ gives

\[ \begin{pmatrix} J_1^{-1}(I - cJ_1) & 0 \\ 0 & I \end{pmatrix} + \lambda \begin{pmatrix} I & 0 \\ 0 & (I - cJ_2)^{-1} J_2 \end{pmatrix}. \]

The matrices $J_1^{-1}(I - cJ_1)$ and $(I - cJ_2)^{-1} J_2$ can then be brought to Jordan canonical form. Since all eigenvalues of $(I - cJ_2)^{-1} J_2$ are zero, we obtain the desired decomposition (1.3). □


Theorem 1.1 allows us to solve (1.2) as follows: we premultiply (1.2) by $P$ and use the transformation

\[ u = Q \begin{pmatrix} y \\ z \end{pmatrix}, \qquad P d(x) = \begin{pmatrix} \eta(x) \\ \delta(x) \end{pmatrix}. \]

This decouples the differential-algebraic system (1.2) into

\[ y' + Cy = \eta(x), \qquad Nz' + z = \delta(x). \tag{1.6} \]

The equation for $y$ is just an ordinary differential equation. The relation for $z$ decouples again into $k$ subsystems, each of the form (with $m = m_i$)

\[ \begin{aligned} z_2' + z_1 &= \delta_1(x) \\ &\ \,\vdots \\ z_m' + z_{m-1} &= \delta_{m-1}(x) \\ z_m &= \delta_m(x). \end{aligned} \tag{1.7} \]

Here $z_m$ is determined by the last equation, and the other components are obtained recursively by repeated differentiation. Thus $z_1$ depends on the $(m-1)$-th derivative of $\delta_m(x)$. Since numerical differentiation is an unstable procedure, the largest $m_i$ appearing in (1.4) is a measure of numerical difficulty for solving problem (1.2). This integer ($\max m_i$) is called the index of nilpotency of the matrix pencil $A + \lambda B$. It does not depend on the particular transformation used to get (1.3) (see Exercise 4).
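Once the nilpotent part $N$ of (1.3) is known, the index of nilpotency is simply the smallest power of $N$ that vanishes, which equals the size of the largest block in (1.4). The following pure-Python sketch illustrates this; the helper names (`matmul`, `jordan_nilpotent`, `blockdiag`, `nilpotency_index`) and the chosen block sizes are hypothetical, and computing $P$ and $Q$ themselves is not attempted.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def jordan_nilpotent(m):
    """m x m block N_i of (1.4): ones on the first superdiagonal."""
    return [[1 if j == i + 1 else 0 for j in range(m)] for i in range(m)]

def blockdiag(blocks):
    """Assemble N = blockdiag(N_1, ..., N_k)."""
    n = sum(len(b) for b in blocks)
    N = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                N[offset + i][offset + j] = entry
        offset += len(b)
    return N

def nilpotency_index(N):
    """Smallest k >= 1 with N^k = 0; raises ValueError if N is not nilpotent."""
    power = N
    for k in range(1, len(N) + 1):
        if all(entry == 0 for row in power for entry in row):
            return k
        power = matmul(power, N)
    raise ValueError("matrix is not nilpotent")

N = blockdiag([jordan_nilpotent(1), jordan_nilpotent(3), jordan_nilpotent(2)])
print(nilpotency_index(N))  # size of the largest block, here 3
```

Exact integer arithmetic is used on purpose: deciding whether a power of a floating-point matrix is zero would require a tolerance.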

Linear Equations with Variable Coefficients. In the case where the matrices $A$ and $B$ in (1.2) depend on $x$, the study of the solutions is much more complicated. Multiplying the equation by $P(x)$ and substituting $u = Q(x)v$ yields the system

\[ PBQ\,v' + (PAQ + PBQ')\,v = 0, \tag{1.8} \]

which shows that the transformation (1.3) is no longer relevant. With the use of transformations of the form (1.8), Kunkel & Mehrmann (1995) derive a canonical form for linear systems with variable coefficients.


Differentiation Index 

A lot of English cars have steering wheels. 

(Fawlty Towers , Cleese and Booth 1979) 


Let us start with the following example:

\[ \begin{aligned} y_1' &= 0.7\, y_2 + \sin(2.5\, z) = f_1(y, z) \\ y_2' &= 1.4\, y_1 + \cos(2.5\, z) = f_2(y, z) \end{aligned} \tag{1.9a} \]

\[ 0 = y_1^2 + y_2^2 - 1 = g(y). \tag{1.9b} \]

The “control variable” $z$ in (1.9a) can be interpreted as the position of a “steering wheel” keeping the vector field $(y_1', y_2')$ tangent to the circle $y_1^2 + y_2^2 = 1$, so that condition (1.9b) remains continually satisfied (see Fig. 1.1a). By differentiating (1.9b) and substituting (1.9a) we therefore must have

\[ g_y(y) f(y, z) = 0. \tag{1.9c} \]

This defines a “hidden” submanifold of the cylinder, on which all solutions of (1.9a,b) must lie (see Fig. 1.1b). We still do not know how, with increasing $x$, the variable $z$ changes. This is obtained by differentiating (1.9c) with respect to $x$: $g_{yy}(f, f) + g_y f_y f + g_y f_z z' = 0$. From this relation we can extract

\[ z' = -(g_y f_z)^{-1} \bigl( g_{yy}(f, f) + g_y f_y f \bigr) \tag{1.9d} \]

if

\[ g_y(y) f_z(y, z) \quad\text{is invertible.} \tag{1.10} \]

Fig. 1.1a. The vector field (1.9a,d)

Fig. 1.1b. The hidden submanifold

We have been able to transform the above differential-algebraic equation (1.9a,b) into an ordinary differential system (1.9a,d) by two analytic differentiations of the constraint (1.9b). This fact is used for the following definition, which has been developed in several papers (Gear & Petzold 1983, 1984; Gear, Gupta & Leimkuhler 1985, Gear 1990, Campbell & Gear 1995).
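The reduction of (1.9a,b) to the ordinary differential system (1.9a,d) can be checked numerically. The sketch below, in plain Python, evaluates the right-hand side of (1.9d) for this particular example and integrates the resulting system with the classical fourth-order Runge-Kutta method. The function names, the starting point $(y_1, y_2, z) = (1, 0, 0)$, the step size and the short interval $[0, 0.1]$ are assumptions made for illustration; the interval is kept short so that $g_y f_z$ stays away from zero.

```python
import math

def f(y1, y2, z):
    """Right-hand side of (1.9a)."""
    return 0.7 * y2 + math.sin(2.5 * z), 1.4 * y1 + math.cos(2.5 * z)

def hidden_constraint(y1, y2, z):
    """g_y(y) f(y, z) from (1.9c), with g_y = (2 y1, 2 y2)."""
    f1, f2 = f(y1, y2, z)
    return 2.0 * y1 * f1 + 2.0 * y2 * f2

def underlying_ode(state):
    """Vector field (1.9a,d) for state = (y1, y2, z)."""
    y1, y2, z = state
    f1, f2 = f(y1, y2, z)
    gyy_ff = 2.0 * (f1 * f1 + f2 * f2)                    # g_yy(f, f)
    gy_fy_f = 2.0 * y1 * 0.7 * f2 + 2.0 * y2 * 1.4 * f1   # g_y f_y f
    gy_fz = 5.0 * (y1 * math.cos(2.5 * z) - y2 * math.sin(2.5 * z))  # g_y f_z
    if gy_fz == 0.0:
        raise ZeroDivisionError("condition (1.10) is violated")
    return (f1, f2, -(gyy_ff + gy_fy_f) / gy_fz)

def rk4(state, h, steps):
    """Classical Runge-Kutta method applied to underlying_ode."""
    def shifted(s, a, k):
        return tuple(si + a * ki for si, ki in zip(s, k))
    for _ in range(steps):
        k1 = underlying_ode(state)
        k2 = underlying_ode(shifted(state, h / 2, k1))
        k3 = underlying_ode(shifted(state, h / 2, k2))
        k4 = underlying_ode(shifted(state, h, k3))
        state = tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

start = (1.0, 0.0, 0.0)             # consistent: g = 0 and g_y f = 0
y1, y2, z = rk4(start, 1e-3, 100)   # integrate to x = 0.1
print(y1 * y1 + y2 * y2 - 1.0, hidden_constraint(y1, y2, z))
```

Both printed residuals stay close to zero because (1.9b) and (1.9c) are invariants of the underlying ODE when the initial values are consistent; over long intervals a numerical drift would have to be expected.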


Definition 1.2. Equation (1.1) has differentiation index $di = m$ if $m$ is the minimal number of analytical differentiations

\[ F(u', u) = 0, \quad \frac{dF(u', u)}{dx} = 0, \quad \ldots, \quad \frac{d^m F(u', u)}{dx^m} = 0 \tag{1.11} \]

such that equations (1.11) allow us to extract by algebraic manipulations an explicit ordinary differential system $u' = \varphi(u)$ (which is called the “underlying ODE”).


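Examples. Linear Equations with Constant Coefficients. The following problem

\[ \begin{aligned} z_2' + z_1 &= \delta_1 \\ z_3' + z_2 &= \delta_2 \\ z_3 &= \delta_3 \end{aligned} \quad\Longrightarrow\quad \begin{aligned} z_2'' + z_1' &= \delta_1' \\ z_3''' + z_2'' &= \delta_2'' \\ z_3''' &= \delta_3''' \end{aligned} \quad\Longrightarrow\quad z_1' = \delta_1' - \delta_2'' + \delta_3''' \tag{1.12} \]

can be seen to have differentiation index 3. For linear equations with constant coefficients the differentiation index and the index of nilpotency are therefore the same.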


Systems of Index 1. The differential-algebraic systems already seen in Chapter VI

\[ y' = f(y, z) \tag{1.13a} \]
\[ 0 = g(y, z) \tag{1.13b} \]

have no $z'$. We therefore differentiate (1.13b) to obtain

\[ z' = -g_z^{-1}(y, z)\, g_y(y, z) f(y, z) \tag{1.13c} \]

which is possible if $g_z$ is invertible in a neighbourhood of the solution. The problem (1.13a,b), for invertible $g_z$, is thus of differentiation index 1.

Systems of Index 2. In the system (see example (1.9))

\[ y' = f(y, z) \tag{1.14a} \]
\[ 0 = g(y), \tag{1.14b} \]

where the variable $z$ is absent in the algebraic constraint, we obtain by differentiation of (1.14b) the “hidden constraint”

\[ 0 = g_y(y) f(y, z). \tag{1.14c} \]

If (1.10) is satisfied in a neighbourhood of the solution, then (1.14a) and (1.14c) constitute an index 1 problem. Differentiation of (1.14c) yields the missing differential equation for $z$, so that the problem (1.14a,b) is of differentiation index 2. If the initial values satisfy $0 = g(y_0)$ and $0 = g_y(y_0) f(y_0, z_0)$, we call them consistent. In this case, and only in this case, the system (1.14a,b) possesses a (locally) unique solution.

System (1.14a,b) is a representative of the larger class of problems of type (1.13a,b) with singular $g_z$. If we assume that $g_z$ has constant rank in a neighbourhood of the solution, we can eliminate certain algebraic variables from $0 = g(y, z)$ until the system is of the form (1.14). This can be done as follows: from the constant rank assumption it follows that either there exists a component of $g$ such that $\partial g_i/\partial z_1 \ne 0$ locally, or $\partial g/\partial z_1$ vanishes identically so that $g$ is already independent of $z_1$. In the first case we can express $z_1$ as a function of $y$ and the remaining components of $z$, and then we can eliminate $z_1$ from the system. Repeating this procedure with $z_2$, $z_3$, etc., will lead to a system of the form (1.14). This transformation does not change the index. Moreover, most numerical methods are invariant under this transformation. Therefore, theoretical work done for systems of the form (1.14) will also be valid for more general problems.

Systems of Index 3. Problems of the form

\[ y' = f(y, z) \tag{1.15a} \]
\[ z' = k(y, z, u) \tag{1.15b} \]
\[ 0 = g(y) \tag{1.15c} \]

are of differentiation index 3, if

\[ g_y f_z k_u \quad\text{is invertible} \tag{1.16} \]

in a neighbourhood of the solution. Differentiating (1.15c) twice gives

\[ 0 = g_y f \tag{1.15d} \]
\[ 0 = g_{yy}(f, f) + g_y f_y f + g_y f_z k. \tag{1.15e} \]

Equations (1.15a,b), (1.15e) together with Condition (1.16) are of the index 1 form (1.13a,b). Consistent initial values must satisfy the three conditions (1.15c,d,e).

An extensive study of the solution space of general differential-algebraic systems is done by Griepentrog & März (1986), März (1989, 1990). These authors try to avoid assumptions on the smoothness of the problem as far as possible and replace the above differentiations by a careful study of suitable projections depending only on the first derivatives of $F$.

Differential Equations on Manifolds 

In the language of differentiable manifolds, whose use in DAE theory was urged by Rheinboldt (1984), a constraint (such as $g(y) = 0$) represents a manifold, which we denote by

\[ M = \{ y \in \mathbb{R}^n \mid g(y) = 0 \}. \tag{1.17} \]

We assume that $g : \mathbb{R}^n \to \mathbb{R}^m$ (with $m < n$) is a sufficiently differentiable function whose Jacobian $g_y(y)$ has full rank for $y \in M$. For a fixed $y \in M$ we denote by

\[ T_y M = \ker g_y(y) = \{ v \in \mathbb{R}^n \mid g_y(y) v = 0 \} \tag{1.18} \]

the tangent space of $M$ at $y$. This is a linear space and has the same dimension $n - m$ as the manifold $M$.

Fig. 1.2. A manifold with a tangent vector field, a chart, and a solution curve

A vector field on $M$ is a mapping $v : M \to \mathbb{R}^n$, which satisfies $v(y) \in T_y M$ for all $y \in M$. For such a vector field we call

\[ y' = v(y), \qquad y \in M \tag{1.19} \]

a differential equation on the manifold $M$. Differentiation on an $(n-m)$-dimensional manifold is described by so-called charts $\varphi_i : U_i \to E_i$, where the $U_i$ cover the manifold $M$ and the $E_i$ are open subsets of $\mathbb{R}^{n-m}$ (Fig. 1.2; see also Lang (1962), Chap. II and Abraham, Marsden & Ratiu (1983), Chap. III). The local theory of ordinary differential equations can be extended to vector fields on manifolds in a straightforward manner:

Project the vectors $v(y)$ onto $E_i$ via a chart $\varphi_i$ by multiplying $v(y)$ with the Jacobian of $\varphi_i$ at $y$. Then apply standard results to the projected vector field in $\mathbb{R}^{n-m}$, and pull the solution back to $M$

(see Fig. 1.2). The local existence of solutions of (1.19) can be shown in this way. The obtained solution is independent of the chosen chart. Where the solution leaves the domain of a chart, the integration must be continued via another one.

Index 2 Problems. Consider the system (1.14a,b) and suppose that (1.10) is satisfied. This condition implies that $g_y(y)$ is of full rank, so that (1.17) is a smooth manifold. Moreover, the Implicit Function Theorem implies that the differentiated constraint (1.14c) can be solved for $z$ (in a neighbourhood of the solution), i.e., there exists a smooth function $h(y)$ such that

\[ g_y(y) f(y, z) = 0 \iff z = h(y). \tag{1.20} \]

Inserting this relation into (1.14a) yields

\[ y' = f(y, h(y)), \qquad y \in M \tag{1.21} \]

which is a differential equation on the manifold (1.17), because $f(y, h(y)) \in T_y M$ by (1.20). The differential equation (1.21) is equivalent to (1.14a,b).

Example. The manifold $M$ for problem (1.9) is one-dimensional (circle). In points where $y_1 \ne \pm 1$, we can solve (1.9b) to obtain locally $y_2 = \pm\sqrt{1 - y_1^2}$. The map $(y_1, y_2) \mapsto y_1$ constitutes a chart $\varphi$, which is bijective in a neighbourhood of the considered point. Inserting $z$ from (1.9c) and the above $y_2$ into (1.9a) yields an equation $y_1' = G(y_1)$, which is the projected vector field in $\mathbb{R}^1$.

Index 3 Problems. For the system (1.15a,b,c) the solutions lie on the manifold

\[ M = \{ (y, z) \mid g(y) = 0,\ g_y(y) f(y, z) = 0 \}. \tag{1.22} \]

The assumption (1.16) implies that $g_y(y)$ and $g_y(y) f_z(y, z)$ have full rank, so that $M$ is a manifold. Its tangent space at $(y, z)$ is

\[ T_{(y,z)} M = \{ (v, w) \mid g_y(y) v = 0,\ g_{yy}(y)\bigl(f(y, z), v\bigr) + g_y(y)\bigl(f_y(y, z) v + f_z(y, z) w\bigr) = 0 \}. \tag{1.23} \]

Solving Eq. (1.15e) for $u$ and inserting the result into (1.15b) yields a differential equation on the manifold $M$. Because of (1.15d,e), the obtained vector field lies in the tangent space $T_{(y,z)} M$ for all $(y, z) \in M$.

The Perturbation Index

Now fills thy sleep with perturbations. 

(The Ghost of Anne in Shakespeare’s Richard III, act V, sc. III)

A second concept of index, due to HLR89¹, interprets the index as a measure of sensitivity of the solutions with respect to perturbations of the given problem.

Definition 1.3. Equation (1.1) has perturbation index $pi = m$ along a solution $u(x)$ on $[0, \bar x]$, if $m$ is the smallest integer such that, for all functions $\hat u(x)$ having a defect

\[ F(\hat u', \hat u) = \delta(x), \tag{1.24} \]

there exists on $[0, \bar x]$ an estimate

\[ \|\hat u(x) - u(x)\| \le C \Bigl( \|\hat u(0) - u(0)\| + \max_{0 \le \xi \le x} \|\delta(\xi)\| + \ldots + \max_{0 \le \xi \le x} \|\delta^{(m-1)}(\xi)\| \Bigr) \tag{1.25} \]

whenever the expression on the right-hand side is sufficiently small.

Remark. We deliberately do not write “Let $\hat u(x)$ be the solution of $F(\hat u', \hat u) = \delta(x)$ ...” in this definition, because the existence of such a solution $\hat u(x)$ for an arbitrarily given $\delta(x)$ is not assured. We start with $\hat u$ and then compute $\delta$ as defect of (1.1).


Systems of Index 1. For the computation of the perturbation index of (1.13a,b) we consider the perturbed system

\[ \hat y' = f(\hat y, \hat z) + \delta_1(x) \tag{1.26a} \]
\[ 0 = g(\hat y, \hat z) + \delta_2(x). \tag{1.26b} \]

The essential observation is that the difference $\hat z - z$ can be estimated with the help of the Implicit Function Theorem, without any differentiation of the equation. Since $g_z$ is invertible by hypothesis, this theorem gives from (1.26b) compared to (1.13b)

\[ \|\hat z(x) - z(x)\| \le C_1 \bigl( \|\hat y(x) - y(x)\| + \|\delta_2(x)\| \bigr) \tag{1.27} \]

as long as the right-hand side of (1.27) is sufficiently small. We now subtract (1.26a) from (1.13a), integrate from $0$ to $x$, use a Lipschitz condition for $f$ and the above estimate for $\hat z(x) - z(x)$. This gives for $e(x) = \|\hat y(x) - y(x)\|$:

\[ e(x) \le e(0) + C_2 \int_0^x e(t)\,dt + C_3 \int_0^x \|\delta_2(t)\|\,dt + \Bigl\| \int_0^x \delta_1(t)\,dt \Bigr\|. \]

In this estimate the norm is inside the integral for $\delta_2$, but outside the integral for $\delta_1$. This is due to the fact that perturbations of the algebraic equation (1.13b) are more serious than perturbations of the differential equation (1.13a). We finally apply Gronwall’s Lemma (Exercise I.10.2) to obtain on a bounded interval $[0, \bar x]$

\[ \begin{aligned} \|\hat y(x) - y(x)\| &\le C_4 \Bigl( \|\hat y(0) - y(0)\| + \int_0^x \|\delta_2(t)\|\,dt + \max_{0 \le \xi \le x} \Bigl\| \int_0^\xi \delta_1(t)\,dt \Bigr\| \Bigr) \\ &\le C_5 \Bigl( \|\hat y(0) - y(0)\| + \max_{0 \le \xi \le x} \|\delta_2(\xi)\| + \max_{0 \le \xi \le x} \|\delta_1(\xi)\| \Bigr). \end{aligned} \]

This inequality, together with (1.27), shows that the perturbation index of the problem is 1.

¹ The “Lecture Notes” of Hairer, Lubich & Roche (1989) will be cited frequently in the subsequent sections. Reference to this publication will henceforth be denoted by HLR89.

Systems of Index 2. We consider the following perturbation of system (1.14a,b)

\[ \hat y' = f(\hat y, \hat z) + \delta(x) \tag{1.28a} \]
\[ 0 = g(\hat y) + \theta(x). \tag{1.28b} \]

Differentiation of (1.28b) gives

\[ 0 = g_y(\hat y) f(\hat y, \hat z) + g_y(\hat y)\delta(x) + \theta'(x). \tag{1.29} \]

Under the assumption (1.10) we can use the estimates of the index 1 case (with $\delta_2(x)$ replaced by $g_y(\hat y(x))\delta(x) + \theta'(x)$) to obtain

\[ \begin{aligned} \|\hat y(x) - y(x)\| &\le C \Bigl( \|\hat y(0) - y(0)\| + \int_0^x \bigl( \|\delta(\xi)\| + \|\theta'(\xi)\| \bigr)\,d\xi \Bigr) \\ \|\hat z(x) - z(x)\| &\le C \Bigl( \|\hat y(0) - y(0)\| + \max_{0 \le \xi \le x} \|\delta(\xi)\| + \max_{0 \le \xi \le x} \|\theta'(\xi)\| \Bigr). \end{aligned} \tag{1.30} \]

Since these estimates depend on the first derivative of $\theta$, the perturbation index of this problem is 2. A sharper estimate for the $y$-component is given in Exercise 6.

Example. Fig. 1.3 presents an illustration for the index 2 problem (1.9a,b). Small perturbations of $g(y)$, one discontinuous in the first derivative (left), the other of oscillatory type (right), result in discontinuities or violent oscillations of $z$, respectively.

The above examples might give the impression that the differentiation index and the perturbation index are always equal. The following counterexamples show that this is not true.

Counterexamples. The first counterexample of type $M(y)y' = f(y)$ is given by Lubich (1989):

\[ \begin{aligned} y_1' - y_3 y_2' + y_2 y_3' &= 0 & \qquad \hat y_1' - \hat y_3 \hat y_2' + \hat y_2 \hat y_3' &= 0 \\ y_2 &= 0 & \hat y_2 &= \varepsilon \sin \omega x \\ y_3 &= 0 & \hat y_3 &= \varepsilon \cos \omega x \end{aligned} \tag{1.31} \]

with $y_i(0) = 0$ ($i = 1, 2, 3$). Inserting $\hat y_2 = \varepsilon \sin \omega x$ and $\hat y_3 = \varepsilon \cos \omega x$ into the first equation gives $\hat y_1' = \varepsilon^2 \omega$ which makes, for $\varepsilon$ fixed and $\omega \to \infty$, an estimate (1.25) with $m = 1$ impossible. However, for $m = 2$ the estimate (1.25) is clearly satisfied. This problem, which is obviously of differentiation index 1, is thus of perturbation index 2.

Fig. 1.3. Perturbations of an index 2 problem
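The effect described for (1.31) can be made quantitative with a short computation. The Python sketch below integrates $\hat y_1' = \hat y_3 \hat y_2' - \hat y_2 \hat y_3'$ by the midpoint rule and divides the resulting error in $y_1$ by the right-hand side of (1.25) with $C = 1$, once for $m = 1$ and once for $m = 2$. The function names, the Euclidean norm, the bound $\varepsilon$ used for the initial deviation and the sample values of $\varepsilon$ and $\omega$ are assumptions chosen for illustration.

```python
import math

def perturbed_y1(eps, omega, x, n=2000):
    """Integrate y1' = y3*y2' - y2*y3' with y2 = eps*sin(omega*t), y3 = eps*cos(omega*t)
    by the composite midpoint rule; analytically the result is eps**2 * omega * x."""
    h = x / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        y2, y3 = eps * math.sin(omega * t), eps * math.cos(omega * t)
        dy2, dy3 = eps * omega * math.cos(omega * t), -eps * omega * math.sin(omega * t)
        total += h * (y3 * dy2 - y2 * dy3)
    return total

def ratios(eps, omega, x=1.0):
    """Error in y1 divided by the bracket in (1.25) for m = 1 and for m = 2."""
    err = perturbed_y1(eps, omega, x)
    size_m1 = eps + eps                # initial deviation (at most eps) + max |delta|
    size_m2 = size_m1 + eps * omega    # additionally max |delta'|
    return err / size_m1, err / size_m2

for omega in (1e1, 1e2, 1e3):
    print(omega, ratios(0.1, omega))
```

The first ratio grows like $\varepsilon\omega/2$, so no constant $C$ works for $m = 1$, whereas the second ratio stays below $\varepsilon$.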

It was believed for some time (see the first edition, p. 479) that the differentiation and perturbation indices can differ at most by 1. The following example, due to Campbell & Gear (1995), was therefore a big surprise:

\[ y_m N y' + y = 0, \tag{1.32} \]

where $N$ is an $m \times m$ upper triangular nilpotent Jordan block. Since the last row of $N$ is zero, we have $y_m = 0$, and the differentiation index is 1. On the other hand, adding a perturbation makes $y_m$ different from zero. This is the reason why the perturbation index of (1.32) is $m$.

Control Problems 

Many problems of control theory lead to ordinary differential equations of the form $y' = f(y, u)$, where $u$ represents a set of controls. As in example (1.9) above, these controls must be applied so that the solution satisfies some constraints $0 = g(y, u)$. For numerical examples of such control problems we refer to Brenan (1983) (space shuttle simulation) and Brenan, Campbell & Petzold (1989).

Optimal Control Problems are differential equations $y' = f(y, u)$ formulated in such a way that the control $u(x)$ has to minimize some cost functional. The Euler-Lagrange equation then often becomes a differential-algebraic system (Pontryagin, Boltyanskij, Gamkrelidze & Mishchenko 1961, Athans & Falb 1966, Campbell 1982). We demonstrate this on the problem

\[ y' = f(y, u), \qquad y(0) = y_0 \tag{1.33a} \]

with cost functional

\[ J(u) = \int_0^1 \varphi\bigl(y(x), u(x)\bigr)\,dx. \tag{1.33b} \]

For a given function $u(x)$ the solution $y(x)$ is determined by (1.33a). In order to find conditions for $u(x)$ which minimize $J(u)$ of (1.33b), we consider the perturbed control $u(x) + \varepsilon\,\delta u(x)$ where $\delta u(x)$ is an arbitrary function and $\varepsilon$ a small number. To this control there corresponds a solution $y(x) + \varepsilon\,\delta y(x) + O(\varepsilon^2)$ of (1.33a); hence (by comparing powers of $\varepsilon$)

\[ \delta y'(x) = f_y(x)\,\delta y(x) + f_u(x)\,\delta u(x), \qquad \delta y(0) = 0, \tag{1.34} \]

where, as usual, $f_y(x) = f_y(y(x), u(x))$, etc. Linearization of (1.33b) shows that

\[ J(u + \varepsilon\,\delta u) - J(u) = \varepsilon \int_0^1 \bigl( \varphi_y(x)\,\delta y(x) + \varphi_u(x)\,\delta u(x) \bigr)\,dx + O(\varepsilon^2) \]

so that

\[ \int_0^1 \bigl( \varphi_y(x)\,\delta y(x) + \varphi_u(x)\,\delta u(x) \bigr)\,dx = 0 \tag{1.35} \]

is a necessary condition for $u(x)$ to be an optimal solution of our problem. In order to express $\delta y$ in terms of $\delta u$ in (1.35), we introduce the adjoint differential equation

\[ v' = -f_y(x)^T v - \varphi_y(x)^T, \qquad v(1) = 0 \tag{1.36} \]

with inhomogeneity $\varphi_y(x)^T$. Hence we have (see Exercise 7)

\[ \int_0^1 \varphi_y(x)\,\delta y(x)\,dx = \int_0^1 v^T(x) f_u(x)\,\delta u(x)\,dx. \tag{1.37} \]

Inserted into (1.35) this gives the necessary condition

\[ \int_0^1 \bigl( v^T(x) f_u(x) + \varphi_u(x) \bigr)\,\delta u(x)\,dx = 0. \tag{1.38} \]

Since this relation has to be satisfied for all $\delta u$ we obtain the necessary relation $v^T(x) f_u(x) + \varphi_u(x) = 0$ by the so-called “fundamental lemma of variational calculus”.

In summary, we have proved that a solution of the above optimal control problem has to satisfy the system

\[ \begin{aligned} y' &= f(y, u), & y(0) &= y_0 \\ v' &= -f_y(y, u)^T v - \varphi_y(y, u)^T, & v(1) &= 0 \\ 0 &= v^T f_u(y, u) + \varphi_u(y, u). \end{aligned} \tag{1.39} \]

This is a boundary value differential-algebraic problem. It can also be obtained directly from the Pontryagin minimum principle (see Pontryagin et al. 1961, Athans & Falb 1966).

Differentiation of the algebraic relation in (1.39) shows that the system (1.39) has index 1 if the matrix

\[ \sum_{i=1}^{n} v_i \frac{\partial^2 f_i}{\partial u^2}(y, u) + \frac{\partial^2 \varphi}{\partial u^2}(y, u) \tag{1.40} \]

is invertible along the solution. A situation where the system (1.39) has index 3 is presented in Exercise 8. An index 5 problem of this type is given in “Example 3.1” of Clark (1988). Other control problems with a large index are discussed in Campbell (1995).


Mechanical Systems 


... we compute T, V, L. We need to know nothing more about the geometry
and mechanics of our system. Everything else is taken care of, without
our intervention, by the formalism of LAGRANGE.

(Sommerfeld 1942, §35)


An interesting class of differential-algebraic systems appears in mechanical modeling of constrained systems. A method of choice for deriving the equations of motion of mechanical systems is the Lagrange-Hamilton principle, whose long history goes back to merely theological ideas of Leibniz and Maupertuis. Let $q_1, \ldots, q_n$ be position coordinates of a system and $u_i = \dot q_i$ the velocities. Suppose a function $L(q, \dot q)$ is given; then the Euler equations of the variational problem

\[ \int_{t_1}^{t_2} L(q, \dot q)\,dt = \min! \tag{1.41} \]

are given by

\[ \frac{d}{dt} \Bigl( \frac{\partial L}{\partial \dot q_k} \Bigr) - \frac{\partial L}{\partial q_k} = 0, \qquad k = 1, \ldots, n \tag{1.42} \]

or

\[ \sum_{\ell=1}^{n} L_{\dot q_k \dot q_\ell}\, \ddot q_\ell = L_{q_k} - \sum_{\ell=1}^{n} L_{\dot q_k q_\ell}\, \dot q_\ell. \tag{1.43} \]

The great discovery of Lagrange (1788) is that for $L = T - U$, where $T$ is the kinetic energy and $U$ the potential energy, the differential equations (1.43) describe the movement of the corresponding “conservative system”. For a proof and various generalizations, consult any book on mechanics, e.g., Sommerfeld (1942), vol. I, §§ 33-37, or Arnol’d (1979), part II.


Example 1. For the mathematical pendulum of length £ we choose as position 
coordinate the angle 0 = q± such that T = m£ 2 0 2 / 2 and U = — £mg cos 0. Then 
(1.43) becomes £0 — — g sin 0 , the well-known pendulum equation. 
Movement with Constraints. Suppose now that we have some constraints
$g_1(q) = 0, \ldots, g_m(q) = 0$ on our movement. Another great idea of Lagrange is to vary the
“Lagrange function” as follows in this case

$$ L = T - U - \lambda_1 g_1(q) - \ldots - \lambda_m g_m(q) \tag{1.44} $$

where the “Lagrange multipliers” $\lambda_i$ are appended to the coordinates. The important
fact is that, since $L$ is independent of $\dot\lambda_i$, the equation (1.43), for the derivatives
with respect to $\lambda_k$, just becomes $0 = g_k(q)$, the desired side conditions.

Example 2. We now describe the pendulum in Cartesian coordinates $x, y$ with
constraint $x^2 + y^2 - \ell^2 = 0$. This gives for (1.44)

$$ L = \frac{m}{2}(\dot x^2 + \dot y^2) - mgy - \lambda(x^2 + y^2 - \ell^2) $$

and (1.43) becomes

$$ \begin{aligned} m\ddot x &= -2x\lambda \\ m\ddot y &= -mg - 2y\lambda \\ 0 &= x^2 + y^2 - \ell^2. \end{aligned} \tag{1.45} $$

In this example the physical meaning of $\lambda$ is the tension in the rod which maintains
the mass point on the desired orbit.

The general form of a constrained mechanical system (1.43) is in vector notation
(after replacing dots by primes)

$$ q' = u \tag{1.46a} $$
$$ M(q)u' = f(q,u) - G^T(q)\lambda \tag{1.46b} $$
$$ 0 = g(q) \tag{1.46c} $$

where $M(q) = T_{uu}$ is a positive definite matrix, $G(q) = \partial g/\partial q$ and
$q = (q_1, \ldots, q_n)^T$, $u = (\dot q_1, \ldots, \dot q_n)^T$, $\lambda = (\lambda_1, \ldots, \lambda_m)^T$. Various formulations are
possible for such a problem, each of which leads to a different numerical approach.

Index 3 Formulation (position level, descriptor form). If we formally multiply
(1.46b) by $M^{-1}$, the system (1.46) becomes of the form (1.15) with $(q, u, \lambda)$ in
the roles of $(y, z, u)$. The condition (1.16), written out for (1.46), is

$$ G M^{-1} G^T \ \text{is invertible}. \tag{1.47} $$

This is satisfied if the constraints (1.46c) are independent, i.e., if the rows of the
matrix $G$ are linearly independent. Under this assumption, the system (1.46a,b,c)
is thus an index 3 problem.

Index 2 Formulation (velocity level). Differentiation of (1.46c) gives

$$ 0 = G(q)u. \tag{1.46d} $$

If we replace (1.46c) by (1.46d) we obtain a system of the form (1.14a,b) with
$(q, u)$ in the role of $y$ and $\lambda$ in that of $z$. One verifies that Condition (1.10) is
equivalent to (1.47), so that (1.46a,b,d) represents a problem of index 2.


Index 1 Formulation (acceleration level). If we differentiate the constraint
(1.46c) twice, the resulting equation together with (1.46b) yields

$$ \begin{pmatrix} M(q) & G^T(q) \\ G(q) & 0 \end{pmatrix} \begin{pmatrix} u' \\ \lambda \end{pmatrix} = \begin{pmatrix} f(q,u) \\ -g_{qq}(q)(u,u) \end{pmatrix}. \tag{1.46e} $$

This allows us to express $u'$ and $\lambda$ as functions of $q, u$, provided that the matrix in
(1.46e) is invertible. Hence, (1.46a,e) constitute an index 1 problem. The assumption
on the matrix in Eq. (1.46e) is weaker than (1.47), because $M(q)$ need not be
regular.
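
As a minimal sketch of how (1.46e) can be used in practice, the following Python code
assembles the matrix of (1.46e) for the Cartesian pendulum of Example 2 and solves it for
the accelerations and the multiplier. The function names, the small Gaussian elimination
helper and the default values m = 1, g = 9.81 are illustrative assumptions and not part
of the text.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def pendulum_accelerations(x, y, xdot, ydot, m=1.0, grav=9.81):
    """Solve (1.46e) for the pendulum of Example 2.

    Constraint g(q) = x^2 + y^2 - l^2, hence G(q) = (2x, 2y),
    g_qq(q)(u, u) = 2*(xdot^2 + ydot^2) and f(q, u) = (0, -m*grav).
    """
    matrix = [[m, 0.0, 2.0 * x],
              [0.0, m, 2.0 * y],
              [2.0 * x, 2.0 * y, 0.0]]
    rhs = [0.0, -m * grav, -2.0 * (xdot * xdot + ydot * ydot)]
    xddot, yddot, lam = solve_linear(matrix, rhs)
    return xddot, yddot, lam
```

For the pendulum hanging at rest, `pendulum_accelerations(0.0, -1.0, 0.0, 0.0)` gives
vanishing accelerations and a multiplier equal to `m*grav/2`, which balances gravity in
(1.45).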


All these formulations are mathematically equivalent, if the initial values are 
consistent, i.e., if (1.46c,d,e) are satisfied. However, if for example the index 2 
system (1.46a,b,d) is integrated numerically, the constraints of the original problem 
will no longer be exactly satisfied. For this reason Gear, Gupta & Leimkuhler 
(1985) introduced another index 2 formulation (“... an interesting way of reducing 
the problem to index two and adding variables so that the constraint continues to 
be satisfied”.). 


GGL Formulation. The idea is to add the constraint (1.46d) to the original system
and to introduce an additional Lagrange multiplier $\mu$ in (1.46a). For the sake of
symmetry we also multiply (1.46a) by $M(q)$, so that the whole system becomes

$$ \begin{aligned} M(q)q' &= M(q)u - G^T(q)\mu \\ M(q)u' &= f(q,u) - G^T(q)\lambda \\ 0 &= g(q) \\ 0 &= G(q)u. \end{aligned} \tag{1.48} $$

Here the differential variables are $(q, u)$ and the algebraic variables are $(\mu, \lambda)$.
System (1.48) is of the form (1.14a,b) and the index 2 assumption is satisfied if
(1.47) holds.


A concrete mechanical system is described in detail, together with numerical 
results for all the above formulations, in Sect. VII.7. 


Exercises 

1. Prove that the initial value problem

$$ Bu' + Au = 0, \qquad u(0) = u_0 \tag{1.49} $$

has a unique solution if and only if the matrix pencil $A + \lambda B$ is regular.

Hint for the “only if” part. If $n$ is the dimension of $u$, choose arbitrarily $n + 1$
distinct $\lambda_i$ and vectors $v_i \ne 0$ satisfying $(A + \lambda_i B)v_i = 0$. Then take a linear
combination, such that $\sum \alpha_i v_i = 0$, but $\sum \alpha_i e^{\lambda_i x} v_i \not\equiv 0$.

2. (Stewart 1972). Let $A + \lambda B$ be a regular matrix pencil. Show that there exist
unitary matrices $Q$ and $Z$ such that

$$ QAZ = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \qquad QBZ = \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix} \tag{1.50} $$

are both triangular. Further, the diagonal elements of $A_{22}$ and $B_{11}$ are all 1,
those of $B_{22}$ are all 0.

Hint (Compare with the Schur decomposition of Theorem I.12.1). Let $\lambda_1$ be a
zero of $\det(A + \lambda B)$ and $v_1 \ne 0$ be such that $(A + \lambda_1 B)v_1 = 0$. Verify that
$Bv_1 \ne 0$ and that

$$ AZ_1 = Q_1 \begin{pmatrix} -\lambda_1 & * \\ 0 & \tilde A \end{pmatrix}, \qquad BZ_1 = Q_1 \begin{pmatrix} 1 & * \\ 0 & \tilde B \end{pmatrix}, $$

where $Q_1, Z_1$ are unitary matrices whose first columns are $Bv_1$ and $v_1$, respectively.
The matrix pencil $\tilde A + \lambda \tilde B$ is again regular and this procedure can
be continued until $\det(\tilde A + \lambda \tilde B) = Const$, which implies that $\det \tilde B = 0$. In
this case we take a vector $v_2 \ne 0$ such that $\tilde B v_2 = 0$ and transform $\tilde A + \lambda \tilde B$
with unitary matrices $Q_2, Z_2$, whose first columns are $\tilde A v_2$ and $v_2$, respectively.
For a practical computation of the decomposition (1.50) see Golub &
Van Loan (1989), Sect. 7.7.


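3. Under the assumptions of Exercise 2 show that there exist matrices $S$ and $T$
such that

$$ \begin{pmatrix} I & S \\ 0 & I \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} \begin{pmatrix} I & T \\ 0 & I \end{pmatrix} = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}, $$

$$ \begin{pmatrix} I & S \\ 0 & I \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix} \begin{pmatrix} I & T \\ 0 & I \end{pmatrix} = \begin{pmatrix} B_{11} & 0 \\ 0 & B_{22} \end{pmatrix}. $$

Hint. These matrices have to satisfy

$$ A_{11}T + A_{12} + SA_{22} = 0 \tag{1.51a} $$
$$ B_{11}T + B_{12} + SB_{22} = 0 \tag{1.51b} $$

and can be computed as follows: the first column of $T$ is obtained from (1.51b)
because $B_{11}$ is invertible and the first column of $SB_{22}$ vanishes; then the first
column of $S$ is given by (1.51a) because $A_{22}$ is invertible; the second column
of $SB_{22}$ is then known and we can compute the second column of $T$ from
(1.51b), etc.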


4. Prove that the index of nilpotency of a regular matrix pencil $A + \lambda B$ does not
depend on the choice of $P$ and $Q$ in (1.3).

Hint. Consider two different decompositions of the form (1.3) and denote the
matrices which appear by $C_1, N_1$ and $C_2, N_2$, respectively. Show the existence
of a regular matrix $T$ such that $N_2 = T^{-1} N_1 T$.

5. Prove that the system (VI.3.4a,b) has index 2 (it is of the form (1.14a,b) and 
satisfies (1.10)). The full system (VI.3.4) has perturbation index k. 

6. (Arnold 1993). Consider the index 2 problem (1.14) and its perturbation (1.28). 
Prove that the difference A y(x) = y(x) — y(x) satisfies 

l|Aj/(*)ll <c(||Ay(0)||+ max ( jf P(t)5(t)dt 

+\\m\\+(\m)\\+wm 2 )) 

with the projector P(t) =1- (f z (g y f z )~ 1 g y ) (ig(t),z(t)), provided that the 
right hand side is sufficiently small. 

Hint. Linearize Eq. (1.29) around ( y , z) , extract z — z, and insert it into the 
difference of (1.28a) and (1.14a). The term (f z (g y f z )~ 1 )(y{x),z(x))8 , (x) 
can be replaced by ^(f z {g y f z )~ 1 (y(x),z(x))0(x)) +0(||%)||) before in- 
tegration. 

7. For the linear initial value problem

$$ y' = A(x)y + f(x), \qquad y(0) = 0 $$

consider the adjoint problem

$$ v' = -A(x)^T v - g(x), \qquad v(1) = 0. $$

Prove that $\int_0^1 g(x)^T y(x)\, dx = \int_0^1 v(x)^T f(x)\, dx$.

8. Consider a linear optimal control problem with quadratic cost functional

$$ y' = Ay + Bu + f(x), \qquad y(0) = y_0 $$

$$ J(u) = \frac{1}{2} \int_0^1 \bigl( y(x)^T C y(x) + u(x)^T D u(x) \bigr)\, dx, $$

where $C$ and $D$ are assumed to be positive semi-definite.

a) Prove that $J(u)$ is minimal if and only if

$$ \begin{aligned} y' &= Ay + Bu + f(x), & y(0) &= y_0 \\ v' &= -A^T v - Cy, & v(1) &= 0 \\ 0 &= B^T v + Du. \end{aligned} \tag{1.52} $$

b) If $D$ is positive definite, then (1.52) has index 1.

c) If $D = 0$ and $B^T C B$ is positive definite, then (1.52) has index 3.
We have seen in Sect. VI.1 that the numerical treatment of problems of index 1,
which are either in the half-explicit form (1.13) or in the form $Mu' = \varphi(u)$, is not
much more difficult than that of ordinary differential equations. For higher index
problems the situation changes completely. This section is devoted to the study of
several approaches that are all based on the idea of modifying the problem in such
a way that the index is reduced.

Index Reduction by Differentiation 

The most apparent way of reducing the index is to differentiate repeatedly the
algebraic constraints (see Definition 1.2). In general, it is recommended to differentiate
until having obtained an index 1 problem. For example, the index 2 problem
(1.14a,b) is replaced by (1.14a,c), or the constrained mechanical system (1.46a,b,c)
by (1.46a,b,e). The resulting problem is then solved by the methods of Chapter VI.


We illustrate this approach with the “pendulum example”

$$ x' = u, \qquad u' = -x\lambda \tag{2.1a} $$
$$ y' = v, \qquad v' = -1 - y\lambda \tag{2.1b} $$
$$ 0 = x^2 + y^2 - 1. \tag{2.1c} $$

In this form it has index 3. Differentiating the algebraic constraint twice yields

$$ 0 = xu + yv, \tag{2.2} $$
$$ 0 = -\lambda(x^2 + y^2) - y + u^2 + v^2. \tag{2.3} $$

Equations (2.1a,b) together with (2.3) represent an index 1 problem. We can extract
$\lambda$ from (2.3) and insert it into (2.1a,b) to get a differential equation for $x, y, u, v$,
which can be solved by standard methods.
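
The elimination of the multiplier can be written down in a few lines. The following
Python sketch evaluates the right-hand side of the resulting ordinary differential
equation; the function name and the ordering (x, y, u, v) of the state are assumptions
made for this illustration.

```python
def pendulum_index1_rhs(state):
    """Right-hand side of (2.1a,b) after eliminating lambda via (2.3).

    `state` is the tuple (x, y, u, v). Returns the derivative
    (x', y', u', v') together with the value of lambda.
    """
    x, y, u, v = state
    # Solve 0 = -lam*(x^2 + y^2) - y + u^2 + v^2 for lam.
    lam = (u * u + v * v - y) / (x * x + y * y)
    return (u, v, -x * lam, -1.0 - y * lam), lam
```

At the equilibrium (0, -1, 0, 0) this returns lambda = 1 and a vanishing derivative; at
the horizontal position (1, 0, 0, 0) it returns lambda = 0, so that the mass point starts
in free fall.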

Drift-off Phenomenon. As an example we apply the code DOPRI5 to the index 1
problem (2.1a,b), (2.3) with initial values $x_0 = 1$, $y_0 = 0$, $u_0 = 0$, $v_0 = 0$. We are
interested in how well the constraints (2.1c) and (2.2) are preserved by the numerical
solution. The result presented in Fig. 2.1 shows that the error in the constraint
(2.2) grows linearly, while that in (2.1c) grows even quadratically. This phenomenon is
explained as follows:
Consider a constrained mechanical system (see (1.46))

$$ q' = u \tag{2.4a} $$
$$ M(q)u' = f(q,u) - G^T(q)\lambda \tag{2.4b} $$
$$ 0 = g(q). \tag{2.4c} $$

Differentiating (2.4c) twice we get

$$ \begin{pmatrix} M(q) & G^T(q) \\ G(q) & 0 \end{pmatrix} \begin{pmatrix} u' \\ \lambda \end{pmatrix} = \begin{pmatrix} f(q,u) \\ -g_{qq}(q)(u,u) \end{pmatrix} \tag{2.5} $$

which, together with (2.4a), is the corresponding index 1 problem. The important
observation is now that the index 1 problem possesses a solution for arbitrary initial
values $q_0$ and $u_0$. Due to the fact that the second derivative of $g(q(t))$ vanishes
(this is a consequence of the lower relation of (2.5)), the solution of the index 1
problem satisfies

$$ g(q(t)) = g(q_0) + (t - t_0)\, G(q_0)u_0, \tag{2.6a} $$
$$ G(q(t))u(t) = G(q_0)u_0. \tag{2.6b} $$


Theorem 2.1. If we apply a pth order numerical method to the index 1 problem
(2.4a), (2.5) with consistent initial values at $t_0 = 0$, then the numerical solution
$(q_n, u_n)$ at time $t_n$ satisfies (for $t_n - t_0 \le Const$)

$$ \|g(q_n)\| \le h^p (A t_n + B t_n^2), \qquad \|G(q_n)u_n\| \le h^p C t_n. $$

The value $h$ represents the maximal step size used.

Proof. Denote by $q(t, t_0, q_0, u_0)$ the solution of the index 1 problem with initial
value $(q_0, u_0)$ at $t = t_0$. Since the local error $q_{j+1} - q(t_{j+1}, t_j, q_j, u_j)$ is of size
$O(h_{j+1}^{p+1})$ (and similarly for the $u$-component), it follows from (2.6a) that

$$ \|g(q(t_n, t_{j+1}, q_{j+1}, u_{j+1})) - g(q(t_n, t_j, q_j, u_j))\| \le h_{j+1}^{p+1} \bigl(A + 2B(t_n - t_{j+1})\bigr). $$

Adding up these inequalities from $j = 0$ to $j = n - 1$ gives the desired bound for
$g(q_n)$, because the initial values are consistent, i.e., $g(q(t_n, t_0, q_0, u_0)) = 0$. The
second estimate of Theorem 2.1 is proved in the same way. □
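
The drift-off can be reproduced with very little code. The sketch below does not use
DOPRI5; as a stand-in it integrates the index 1 pendulum (2.1a,b), (2.3) with the
classical fourth-order Runge-Kutta method and a constant step size, and records the
largest violation of the constraints (2.1c) and (2.2). All function names and the
chosen step sizes are assumptions made for this illustration.

```python
def pendulum_rhs(state):
    """Index 1 pendulum (2.1a,b) with lambda taken from (2.3)."""
    x, y, u, v = state
    lam = (u * u + v * v - y) / (x * x + y * y)
    return (u, v, -x * lam, -1.0 - y * lam)


def rk4_step(f, state, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def max_drift(h, t_end=10.0):
    """Largest violation of (2.1c) and (2.2) along the numerical solution."""
    state = (1.0, 0.0, 0.0, 0.0)  # consistent initial values
    position_drift, velocity_drift = 0.0, 0.0
    for _ in range(round(t_end / h)):
        state = rk4_step(pendulum_rhs, state, h)
        x, y, u, v = state
        position_drift = max(position_drift, abs(x * x + y * y - 1.0))
        velocity_drift = max(velocity_drift, abs(x * u + y * v))
    return position_drift, velocity_drift
```

Comparing `max_drift(0.1)` with `max_drift(0.01)` shows that both constraint violations
shrink with the step size, but neither is zero: the constraints are invariants of the
exact flow only, and nothing in the index 1 formulation pulls the numerical solution
back to them.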
Baumgarte Stabilization. The historically first remedy for this drift-off is due to
Baumgarte (1972). Instead of replacing the constraint (2.4c) by its second time
derivative, he proposes to replace (2.4c) by the linear combination

$$ 0 = \ddot g + 2\alpha \dot g + \beta^2 g, \tag{2.7} $$

where $\dot g, \ddot g$ are time derivatives of (2.4c), i.e.,

$$ g = g(q), \qquad \dot g = G(q)u, \qquad \ddot g = g_{qq}(q)(u,u) + G(q)M(q)^{-1}\bigl(f(q,u) - G^T(q)\lambda\bigr). $$

Eq. (2.7) together with (2.4b) determines $u'$ and $\lambda$ as functions of $(q, u)$, and the
resulting differential equation can be solved numerically. The idea is now to choose
the free parameters $\alpha$ and $\beta$ in such a way that (2.7) is an asymptotically stable
differential equation, e.g., $\beta = \alpha$ and $\alpha > 0$. Consequently, the functions $g(q(t))$
and $G(q(t))u(t)$ are exponentially decreasing, in contrast to (2.6). The difficulty
of this approach lies in a good choice of $\alpha$. For small values of $\alpha$ the damping will
not be sufficiently strong, whereas for large $\alpha$ the resulting differential equation
becomes stiff and explicit methods are no longer efficient. A careful investigation
on the choice of $\alpha$ can be found in Ascher, Chin & Reich (1994).
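
A minimal sketch of Baumgarte's idea for the pendulum (2.1) follows. It assumes the
scaling $g = (x^2 + y^2 - 1)/2$, so that $\dot g = xu + yv$ coincides with (2.2) and
$\ddot g$ is the right-hand side of (2.3); the multiplier is then obtained from (2.7)
instead of (2.3). The integrator (classical Runge-Kutta with constant step size), the
perturbed initial value and all names are assumptions made for this illustration.

```python
def baumgarte_rhs(state, alpha, beta):
    """Pendulum (2.1a,b) with lambda determined by (2.7)."""
    x, y, u, v = state
    g = 0.5 * (x * x + y * y - 1.0)   # position constraint, scaled by 1/2
    gdot = x * u + y * v              # velocity constraint (2.2)
    # gddot = u^2 + v^2 - y - lam*(x^2 + y^2); insert into (2.7), solve for lam.
    lam = (u * u + v * v - y + 2.0 * alpha * gdot + beta * beta * g) / (x * x + y * y)
    return (u, v, -x * lam, -1.0 - y * lam)


def rk4_step(f, state, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def final_constraint_error(alpha, beta, h=0.01, t_end=5.0):
    """|g| at t_end when starting from an inconsistent initial value."""
    state = (1.01, 0.0, 0.0, 0.0)     # g(q_0) = 0.01005, not on the circle
    for _ in range(round(t_end / h)):
        state = rk4_step(lambda s: baumgarte_rhs(s, alpha, beta), state, h)
    x, y = state[0], state[1]
    return abs(0.5 * (x * x + y * y - 1.0))
```

With `alpha = beta = 0` the constraint error of the initial value persists, in agreement
with (2.6a); with `alpha = beta = 5` it is damped to a negligible size. The value 5 is
an arbitrary choice for this example, and much larger values would make the problem
stiff for the explicit integrator used here.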


Stabilization by Projection 

We shall now analyze another possibility for avoiding the instability of the preceding
example, namely the repeated projection of the numerical solution onto the
solution manifold.

Index 2 Problems. Consider the system (1.14a,b). Suppose that $(y_{n-1}, z_{n-1})$ is
an approximation to the solution at time $t_{n-1}$ which satisfies $g(y_{n-1}) = 0$ and
$g_y(y_{n-1}) f(y_{n-1}, z_{n-1}) = 0$. Applying a numerical one-step method (state space
form method of Sect. VI.1) with these values to the index 1 system (1.14a,c) yields
an approximation $\tilde y_n, \tilde z_n$ that, in general, does not satisfy the constraint (1.14b).
A natural way of projecting the approximation $\tilde y_n$ to the solution manifold $\mathcal M$ of
Eq. (1.17) is along the image of $f_z$ (see also the projected Runge-Kutta methods of
Sect. VII.4). We therefore define $y_n$ as the solution of

$$ y_n - \tilde y_n = f_z(\tilde y_n, \tilde z_n)\mu, \qquad g(y_n) = 0, \tag{2.8} $$

and then we adjust $z_n$ by solving the equation $g_y(y_n) f(y_n, z_n) = 0$. Applying
simplified Newton iterations to the nonlinear system (2.8) requires the decomposition
of the matrix

$$ \begin{pmatrix} I & f_z(\tilde y_n, \tilde z_n) \\ g_y(\tilde y_n) & 0 \end{pmatrix}. \tag{2.9} $$

Block elimination shows that the invertibility of (2.9) is a consequence of (1.10),
and that only the matrix $g_y f_z$ has to be decomposed. Such a decomposition is
usually already available from the application of the numerical method, so that the
projection (2.8) is very cheap.



VII.2 Index Reduction Methods 471 


It is now natural to ask whether this projection procedure can destroy the convergence
properties of the underlying method. For a pth order one-step method
the local error is of size $O(h^{p+1})$. Since the solution of (1.14a,c) passing through
$(y_{n-1}, z_{n-1})$ satisfies $g(y(t)) = 0$, it holds $g(\tilde y_n) = O(h^{p+1})$, where $\tilde y_n, \tilde z_n$ denote
the values before projection. Hence, the solution of (2.8) satisfies $\mu = O(h^{p+1})$,
$y_n - \tilde y_n = O(h^{p+1})$, and $z_n - \tilde z_n = O(h^{p+1})$.
By the Implicit Function Theorem this solution depends smoothly on $(\tilde y_n, \tilde z_n)$, so
that the mapping $(y_{n-1}, z_{n-1}) \mapsto (y_n, z_n)$ represents a pth order one-step method
for (1.14a,c). Convergence of order $p$ thus follows from the standard theory (see
Sects. VI.1 and II.3). This proof also applies to multistep methods.

Constrained Mechanical Systems. For the index 3 system (2.4a,b,c) the situation
is slightly more complex. We assume consistent values $(q_{n-1}, u_{n-1}, \lambda_{n-1})$ at time
$t_{n-1}$ and apply a one-step method to the index 1 system (2.4a), (2.5) to obtain
the unprojected values $(\tilde q_n, \tilde u_n)$. Since the position constraint (2.4c) only depends
on $q$, the projections for $q$ and $u$ can be done sequentially.

Projection on Position Constraint. We define $q_n$ as solution of the nonlinear system

$$ M(\tilde q_n)(q_n - \tilde q_n) + G^T(\tilde q_n)\mu = 0, \qquad g(q_n) = 0. \tag{2.10} $$

Projection on Velocity Constraint. With the value $q_n$ obtained from the above
projection we let $u_n$ be the solution of

$$ M(q_n)(u_n - \tilde u_n) + G^T(q_n)\mu = 0, \qquad G(q_n)u_n = 0. \tag{2.11} $$

Lubich (1991) introduced this kind of projection, because “it is invariant under
affine transformations of coordinates”. We remark that the system (2.11) is linear,
whereas (2.10) is nonlinear and has to be solved by (simplified) Newton iterations.
The index 3 assumption that the matrix in Eq. (2.5) is invertible implies the existence
of the projected values $q_n$ and $u_n$ (at least for sufficiently small step size).
It is possible to alter slightly the arguments of $M$ and $G^T$ in the upper lines of
(2.10) and (2.11) or to solve the system (2.11) iteratively, if this is computationally
advantageous. Convergence of this method is proved in the same way as in the
index 2 case.
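
For the pendulum (2.1) both projections can be written in closed form, which gives a
compact illustration. The sketch assumes $M = I$ and the scaled constraint
$g(q) = (x^2 + y^2 - 1)/2$ with $G(q) = (x, y)$; under these assumptions the position
projection is a radial scaling onto the unit circle and the velocity projection removes
the component of the velocity along $G^T(q)$. In general the position projection needs
Newton iterations, and the function names are chosen for this example only.

```python
import math


def project_position(x, y):
    """Position projection for the unit circle with M = I.

    The correction q_n - q~_n is parallel to q~_n, so the projected
    point is obtained by scaling the vector to length one.
    """
    r = math.hypot(x, y)
    return x / r, y / r


def project_velocity(x, y, u, v):
    """Velocity projection with M = I and G(q) = (x, y).

    Solves u_n - u~_n + G^T mu = 0, G u_n = 0, which is a linear system.
    """
    mu = (x * u + y * v) / (x * x + y * y)
    return u - x * mu, v - y * mu
```

A typical use after each integration step would be
`x, y = project_position(x, y)` followed by `u, v = project_velocity(x, y, u, v)`,
in this order, because the velocity projection uses the already projected position.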

Velocity Stabilization. It can be seen from (2.6) that errors in the velocity constraint
$G(q)u = 0$ are more critical for the numerical solution than errors in the
position constraint $g(q) = 0$. It is therefore interesting to study the method where
the numerical solution is projected only to the velocity constraint. Alishenas &
Olafsson (1994) come to the conclusion that “velocity projection is the most efficient
projection with regard to improvement of the numerical integration”.

We have applied the code DOPRI5 in four different variants to the index 1 formulation
of the pendulum equation (2.1): (i) standard application without any projection,
(ii) only projection on the position constraint, (iii) only projection on the
velocity constraint, (iv) sequential position and velocity projections. The global
error (in position and velocity) during integration is shown in Fig. 2.2. We conclude
that a projection on the position constraint without projection on the velocity constraint
does not improve the global error (it makes it even worse in our example).
On the other hand, velocity stabilization is as efficient as the complete projection
(position and velocity). Nearly no difference can be observed in Fig. 2.2.

Fig. 2.2. Global error of DOPRI5 with various projections (Atol = Rtol = $10^{-6}$)


Differential Equations with Invariants 


Closely related to the above techniques is the numerical treatment of differential
equations with invariants. Consider the initial value problem

$$ y' = f(y), \qquad y(0) = y_0, \tag{2.12} $$

and suppose that the solution is known to have the invariant

$$ \varphi(y) = 0. \tag{2.13} $$

For example, the differential equation (1.46a,e) for $(q, u)$ has the invariants (1.46c)
and (1.46d). Conservation laws (total energy, ...) may also be written in the form
(2.13). The invariant (2.13) is called a first integral, if $\varphi_y(y) f(y) = 0$ in a neighbourhood
of the solution.

Linear first integrals of the form $\varphi(y) = c + d^T y$ are preserved exactly by
most integration methods (e.g., Runge-Kutta and multistep methods). Quadratic
first integrals are preserved exactly by symplectic Runge-Kutta methods (see Theorem
II.16.7). More complicated invariants are in general not preserved.

The above projection techniques can be adapted to the treatment of the problem
(2.12-13) (see Shampine (1986), Eich (1993), Ascher, Chin & Reich (1994)). We
apply a numerical method to (2.12) and project (orthogonally or in some other way) the
numerical solution onto the manifold defined by (2.13). As discussed above, this
procedure retains the order of convergence of the basic method.


Hamiltonian Systems. Differential equations of the form

$$ p_i' = -\frac{\partial H}{\partial q_i}(p, q), \qquad q_i' = \frac{\partial H}{\partial p_i}(p, q), \tag{2.14} $$

where $H : \mathbb{R}^{2n} \to \mathbb{R}$ is a smooth function, always have $H(p, q) = Const$ as first
integral. It is tempting to exploit this information and project the numerical solution
onto the manifold $H(p, q) = H(p_0, q_0)$. Consider for example the perturbed Kepler
problem with Hamiltonian

$$ H(p, q) = \frac{p_1^2 + p_2^2}{2} - \frac{1}{\sqrt{q_1^2 + q_2^2}} - \frac{0.005}{2\sqrt{(q_1^2 + q_2^2)^3}} \tag{2.15} $$

and initial values $q_1(0) = 1 - e$, $q_2(0) = 0$, $p_1(0) = 0$, $p_2(0) = \sqrt{(1 + e)/(1 - e)}$
(eccentricity $e = 0.6$). The upper pictures of Fig. 2.3 show the numerical solution
obtained by the explicit Euler method with step size $h = 0.01$; to the left without
any projection, and to the right with projection onto $H = Const$. An improvement
can be observed, but the numerical solution still does not reflect the geometric
structure of the exact solution (invariant torus). We also have applied the symplectic
Euler method (see Eq. (16.54) of Sect. II.16). Here we see that the numerical
solution (without projection) shows the correct qualitative behaviour (this can be
explained by a backward error analysis, see Sect. II.16), whereas the projection
onto $H = Const$ destroys this property. A remedy could be the following: apply a
symplectic method to the problem, project the numerical solution to $H = Const$,
but continue the integration with the unprojected values.

Fig. 2.3. Study of the projection onto the manifold $H(p, q) = H(p_0, q_0)$ (lower panels: symplectic Euler, $h = 0.01$)
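
The different behaviour of the two Euler methods, without any projection, can be checked
with a short experiment. The sketch below applies the explicit Euler method and one
variant of the symplectic Euler method (first update $p$ with the old $q$, then update
$q$ with the new $p$) to the Hamiltonian (2.15) and records the largest deviation of $H$
from its initial value. Whether this is exactly the variant of Eq. (16.54) is an
assumption, as are all function names and the number of steps.

```python
import math

PERTURBATION = 0.005  # coefficient of the perturbation term in (2.15)


def hamiltonian(p1, p2, q1, q2):
    r = math.hypot(q1, q2)
    return 0.5 * (p1 * p1 + p2 * p2) - 1.0 / r - PERTURBATION / (2.0 * r ** 3)


def potential_gradient(q1, q2):
    """Gradient of U(q) = -1/r - PERTURBATION/(2 r^3), r = |q|."""
    r = math.hypot(q1, q2)
    factor = 1.0 / r ** 3 + 1.5 * PERTURBATION / r ** 5
    return factor * q1, factor * q2


def explicit_euler_step(p1, p2, q1, q2, h):
    g1, g2 = potential_gradient(q1, q2)
    return p1 - h * g1, p2 - h * g2, q1 + h * p1, q2 + h * p2


def symplectic_euler_step(p1, p2, q1, q2, h):
    g1, g2 = potential_gradient(q1, q2)
    p1_new, p2_new = p1 - h * g1, p2 - h * g2
    # The position update uses the NEW momentum; this is the only change.
    return p1_new, p2_new, q1 + h * p1_new, q2 + h * p2_new


def max_energy_error(step, h=0.01, n_steps=5000, ecc=0.6):
    """Largest |H - H_0| along the numerical solution."""
    state = (0.0, math.sqrt((1.0 + ecc) / (1.0 - ecc)), 1.0 - ecc, 0.0)
    h0 = hamiltonian(*state)
    worst = 0.0
    for _ in range(n_steps):
        state = step(*state, h)
        worst = max(worst, abs(hamiltonian(*state) - h0))
    return worst
```

Calling `max_energy_error(explicit_euler_step)` and
`max_energy_error(symplectic_euler_step)` shows a clearly smaller energy error for the
symplectic variant, although neither method conserves $H$ exactly.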


Methods Based on Local State Space Forms 


This method is also called the differential-geometric approach by Potra & Rheinboldt
(1990). The idea is to regard the differential-algebraic system as a differential
equation on a manifold (see Sect. VII.1) and to solve the equation in this manifold
by introducing suitable local coordinates.

Let us illustrate this approach with the pendulum example. The equations, formulated
in Cartesian coordinates, are given in the beginning of this section. The
solution manifold is (compare with Eq. (1.22))

$$ \mathcal{M} = \{ (x, y, u, v) \mid x^2 + y^2 = 1, \ xu + yv = 0 \}. $$

This is a 2-dimensional manifold in $\mathbb{R}^4$ and can be parametrized by $(\varphi, \eta)$ as
follows:

$$ \begin{aligned} x &= \cos\varphi, & u &= -\eta \sin\varphi, \\ y &= \sin\varphi, & v &= \eta \cos\varphi. \end{aligned} \tag{2.16} $$

A short calculation shows that the system (2.1a,b), (2.3), written in the new coordinates,
leads to the well-known equation

$$ \varphi' = \eta, \qquad \eta' = -\cos\varphi. \tag{2.17} $$

This differential equation can be solved numerically without any difficulties. The
numerical approximation in the original coordinates is then obtained via (2.16).
Obviously, the position and velocity constraints are satisfied exactly.
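
The following sketch carries this out for the pendulum: it integrates (2.17) with the
classical Runge-Kutta method (an arbitrary choice for this illustration) and maps the
result back through (2.16). The function names, the step size and the number of steps
are assumptions.

```python
import math


def to_cartesian(phi, eta):
    """Parametrization (2.16) of the solution manifold."""
    return (math.cos(phi), math.sin(phi),
            -eta * math.sin(phi), eta * math.cos(phi))


def state_space_rhs(phi, eta):
    """Right-hand side of the state space form (2.17)."""
    return eta, -math.cos(phi)


def rk4_step(phi, eta, h):
    """One classical Runge-Kutta step for (2.17)."""
    k1 = state_space_rhs(phi, eta)
    k2 = state_space_rhs(phi + 0.5 * h * k1[0], eta + 0.5 * h * k1[1])
    k3 = state_space_rhs(phi + 0.5 * h * k2[0], eta + 0.5 * h * k2[1])
    k4 = state_space_rhs(phi + h * k3[0], eta + h * k3[1])
    phi_new = phi + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
    eta_new = eta + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return phi_new, eta_new


def integrate(h=0.01, n_steps=1000, phi=0.0, eta=0.0):
    """Integrate (2.17) and return (x, y, u, v) via (2.16).

    The defaults phi = 0, eta = 0 correspond to x = 1, y = 0, u = v = 0.
    """
    for _ in range(n_steps):
        phi, eta = rk4_step(phi, eta, h)
    return to_cartesian(phi, eta)
```

Whatever the accuracy of the integration in $(\varphi, \eta)$, the values returned by
`integrate` satisfy $x^2 + y^2 = 1$ and $xu + yv = 0$ up to rounding errors, because
they are produced by the parametrization itself.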

Although this example nicely illustrates the main ideas, it may be misleading.
First of all, in typical applications it is not possible to use one and the same
parametrization throughout the whole integration. Secondly, the choice of coordinates
is usually not obvious and the transformed differential equation can be much
more complicated than the original one (see for example Alishenas (1992)).


Local State Space Form. Suppose that the differential-algebraic system, which
we want to solve, can be written as a differential equation

$$ y' = v(y), \qquad y \in \mathcal{M} \tag{2.18} $$

on a smooth $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^n$. Consider a coordinate function
$\omega : U \to V$ (sufficiently differentiable, bijective, and $\omega'(\eta)$ of full rank) between
the open set $U \subset \mathbb{R}^d$ and $V \subset \mathcal{M}$, and denote the coordinates in $U$ by $\eta \in \mathbb{R}^d$.
Under the transformation $y = \omega(\eta)$ the equation (2.18) becomes

$$ \omega'(\eta)\,\eta' = v(\omega(\eta)). \tag{2.19} $$

Since $v(y) \in T_y\mathcal{M}$ for all $y \in \mathcal{M}$ (see Eq. (1.19)), there exists $\eta'$ such that (2.19)
holds. Moreover $\eta'$ is unique, because $\omega'(\eta)$ is of full rank. Using the notation
$\omega'(\eta)^+ = (\omega'(\eta)^T \omega'(\eta))^{-1} \omega'(\eta)^T$ for the pseudo-inverse of $\omega'(\eta)$ we therefore
obtain

$$ \eta' = \omega'(\eta)^+ v(\omega(\eta)), \tag{2.20} $$

which is an ordinary differential equation in $\mathbb{R}^d$ and is called local state space form
of (2.18). Observe that different coordinate functions lead to different state space
forms.

The numerical procedure for solving (2.18) is the following: suppose that an
approximation $y_k \in \mathcal{M}$ of $y(t_k)$ is given. We then choose a coordinate function
and apply a standard method (e.g., Runge-Kutta) with initial value $\eta_k = \omega^{-1}(y_k)$
to the state space form (2.20). This yields an approximation $\eta_{k+1}$ at time $t_{k+1}$.
Finally, we put $y_{k+1} = \omega(\eta_{k+1})$. By definition of this procedure, the numerical
approximation $y_{k+1}$ again lies in $\mathcal{M}$.

If one uses one and the same local state space form for the whole integration
(as it is the case for the pendulum example, Eq. (2.17)), the convergence properties
for (2.20) carry immediately over to (2.18) via the coordinate function $y = \omega(\eta)$.
In more complex situations it may be necessary to change the coordinates several
times, and from a computational point of view it may even be more advantageous
to change them in every integration step.

Theorem 2.2. Consider the above procedure for the numerical solution of (2.18),
and denote by $y = \omega_k(\eta)$ the coordinate transformation of the kth step. If in
a neighbourhood of $\omega_k^{-1}(y_k)$, the matrices $\omega_k'(\eta)$ and $\omega_k'(\eta)^+$ are uniformly
bounded in $k$, then the convergence properties for standard ordinary differential
equations carry over to the problem (2.18) on a manifold $\mathcal{M}$.

Proof. In the case of one-step methods we have

$$ y_{k+1} = \omega_k\bigl(\omega_k^{-1}(y_k) + h\,\Phi_k(\omega_k^{-1}(y_k), h)\bigr), $$

where $\Phi_k(\eta, h)$ is the increment function of the method when applied to (2.20)
with $\omega$ replaced by $\omega_k$. Due to the regularity assumptions on $\omega_k(\eta)$, this formula
can be written as

$$ y_{k+1} = y_k + h\,\Psi_k(y_k, h) $$

and takes the form of a standard one-step method. The assumptions guarantee that
the functions $\Psi_k$ have a uniform Lipschitz constant with respect to the first argument.
Therefore the convergence proofs of Sect. II.3 apply. For multistep methods
the situation is analogous. □


Choice of Local Coordinates. Let us explain two choices for the constrained
mechanical system (2.4), whose solution manifold is given by

$$ \mathcal{M} = \{ (q, u) \mid g(q) = 0, \ G(q)u = 0 \}. \tag{2.21} $$

Here $q, u \in \mathbb{R}^n$ are generalized coordinates, $g(q) \in \mathbb{R}^m$ and $G(q) = g_q(q)$. The
adaptation to other differential-algebraic systems with known solution manifold is
more or less straightforward.

Generalized Coordinate Partitioning (Wehage & Haug 1982). Assuming that the
Jacobian $G(q)$ has full row rank, there exists a partitioning $q = (\eta, \hat\eta)$ such that
$g_{\hat\eta}(\eta, \hat\eta)$ is invertible ($\eta \in \mathbb{R}^{n-m}$, $\hat\eta \in \mathbb{R}^m$). By the Implicit Function Theorem
the constraint $g(q) = 0$ can be solved for $\hat\eta$ in a neighbourhood of a consistent
value $q_0 = (\eta_0, \hat\eta_0)$. Hence, there exists a function $\hat\eta = h(\eta)$ (defined for $\eta$ close
to $\eta_0$) such that $g(\eta, h(\eta)) = 0$. With a corresponding partitioning $u = (\nu, \hat\nu)$ the
velocity constraint becomes $g_\eta(\eta, \hat\eta)\nu + g_{\hat\eta}(\eta, \hat\eta)\hat\nu = 0$ and allows us to express $\hat\nu$
in terms of $\eta, \nu$ as $\hat\nu = k(\eta, \nu)$. A coordinate function is thus given by
$\omega(\eta, \nu) = ((\eta, h(\eta)), (\nu, k(\eta, \nu)))$, and the differential equation in these local coordinates is

$$ \eta' = \nu, \qquad \nu' = \nu'(\omega(\eta, \nu)), \tag{2.22} $$

where $\nu'(q, u)$ collects the $\nu$-components of the solution $u'(q, u)$ of the linear
system (1.46e). We emphasize that for a numerical implementation the differential
equation (2.22) need not be known analytically. However, a nonlinear system has
to be solved each time the right-hand side of (2.22) is evaluated.

Tangent Space Parametrization (Potra & Rheinboldt 1991, Yen 1993). Instead of
partitioning the components of $q$ and $u$ we split the vectors $q - q_0$ and $u - u_0$
according to

$$ q - q_0 = Q_0 \eta + Q_1 \hat\eta, \qquad u - u_0 = Q_0 \nu + Q_1 \hat\nu, \tag{2.23} $$

where the columns of $Q_0$ form a basis of the tangent space $\{ v \mid G(q_0)v = 0 \}$ to
the manifold $g(q) = 0$, which is completed by the columns of $Q_1$ to a basis of the
whole space. The condition $g(q) = 0$ together with the first relation of (2.23) define
(locally) $q$ and $\hat\eta$ as functions of $\eta$. Similarly, $G(q)u = 0$ and the second relation
of (2.23) define $u$ and $\hat\nu$ as functions of $\nu$ and $q$. Denoting these relationships
by $\hat\eta = h(\eta)$, $\hat\nu = k(\eta, \nu)$, we get formally the same coordinate function as in the
previous example, and the state space form is given by

$$ \eta' = \nu, \qquad \nu' = Q_0^+ u'(\omega(\eta, \nu)), \tag{2.24} $$

where $Q_0^+ = (Q_0^T Q_0)^{-1} Q_0^T$ is the pseudo-inverse of $Q_0$, and $u'(q, u)$ denotes the
solution of the linear system (2.5).

The evaluation of $h(\eta)$ requires the solution of a nonlinear system, whose
Jacobian is

$$ \begin{pmatrix} I & -Q_1 \\ G(q_0) & 0 \end{pmatrix}. $$

This suggests to take $-Q_1 = G^T(q_0)$ or better $-Q_1 = M^{-1}(q_0)G^T(q_0)$, so that
simplified Newton iterations lead to linear systems with a matrix that already appears
in (2.5). The linear system for the computation of $k(\eta, \nu)$ has the same
structure.
Due to the fact that the evaluation of the right-hand side of (2.24) requires
the solution of a nonlinear system, the authors of this approach prefer the use of
multistep methods which, in general, use fewer function evaluations than one-step
methods. In connection with Runge-Kutta methods, Potra (1995) suggests the use
of certain predicted values instead of the exact solutions of these nonlinear systems,
and requires that only the approximation at the end of every step lies on the manifold
$\mathcal{M}$. The resulting algorithm is then equivalent to solving the index 1 problem
combined with projections onto $\mathcal{M}$ at the end of each step.


Overdetermined Differential-Algebraic Equations 

In contrast to the approach at the beginning of this section, where the constraint is
replaced by one of its derivatives, we consider the original system and one or more
derivatives of the constraints as a whole. For example, the equations of motion of a
constrained mechanical system become

$$ q' = u \tag{2.25a} $$
$$ M(q)u' = f(q, u) - G^T(q)\lambda \tag{2.25b} $$
$$ 0 = g(q) \tag{2.25c} $$
$$ 0 = G(q)u \tag{2.25d} $$
$$ 0 = g_{qq}(q)(u, u) + G(q)M(q)^{-1}\bigl(f(q, u) - G^T(q)\lambda\bigr). \tag{2.25e} $$

This system is overdetermined, because we are concerned with more equations than
unknowns. Nevertheless, it possesses a unique solution, if (1.47) is satisfied and
consistent initial values are prescribed.

We illustrate the numerical solution of (2.25) with the BDF method. A formal
application (see Sect. VI.2) gives

$$ q_k - \bar q - h\gamma u_k = 0 \tag{2.26a} $$
$$ M(q_k)(u_k - \bar u) - h\gamma\bigl(f(q_k, u_k) - G^T(q_k)\lambda_k\bigr) = 0 \tag{2.26b} $$
$$ g(q_k) = 0 \tag{2.26c} $$
$$ G(q_k)u_k = 0 \tag{2.26d} $$
$$ g_{qq}(q_k)(u_k, u_k) + G(q_k)M(q_k)^{-1}\bigl(f(q_k, u_k) - G^T(q_k)\lambda_k\bigr) = 0, \tag{2.26e} $$

where $\gamma = \beta_k/\alpha_k$, $\bar q = -\bigl(\sum_{i=0}^{k-1} \alpha_i q_i\bigr)/\alpha_k$ and $\bar u = -\bigl(\sum_{i=0}^{k-1} \alpha_i u_i\bigr)/\alpha_k$ are known
quantities. The system (2.26) is overdetermined and does not have a solution, in
general. A natural idea (Führer 1988) is to search for a least squares solution of
(2.26). There are several ways to do this. One can consider different norms, or one
can require some of the equations to be exactly satisfied and the remaining ones in a
least squares sense. Führer & Leimkuhler (1991) impose all constraints (2.26c,d,e),
and treat the remaining equations by the use of a special pseudoinverse. This can
be achieved by introducing Lagrange multipliers $\mu_k, \eta_k$ in the first two equations
of (2.26) as follows:

$$ M(q_k)(q_k - \bar q - h\gamma u_k) + h\gamma\bigl(G^T(q_k)\mu_k + (G_q(q_k)u_k)^T \eta_k\bigr) = 0 \tag{2.27a} $$
$$ M(q_k)(u_k - \bar u) - h\gamma\bigl(f(q_k, u_k) - G^T(q_k)\lambda_k\bigr) + h\gamma\, G^T(q_k)\eta_k = 0. \tag{2.27b} $$

For sufficiently small $h$, the system (2.27a,b), (2.26c,d,e) has a locally unique solution,
if (1.47) is satisfied.

Connection with GGL-Formulation. If we omit the acceleration constraint (2.26e),
there is no need for two Lagrange multipliers, and we can put $\eta_k = 0$. The resulting
system (2.27a,b), (2.26c,d) is then nothing else than the standard BDF discretization
of the system (1.48).


Unstructured Higher Index Problems 


We consider a general differential-algebraic system

$$ F(u', u) = 0. \tag{2.28} $$

For its numerical solution we shall construct an ‘underlying ODE’ (see Definition 1.2)
and solve it by any integration method. This approach has been developed
in several papers by Campbell (1989, 1993). We shall explain the main ideas following
the presentation of Campbell & Moore (1995).

Inspired by the definition of the differentiation index we consider the derivative
array equations

$$ F(u', u) = 0, \qquad \frac{dF(u', u)}{dx} = 0, \qquad \ldots, \qquad \frac{d^m F(u', u)}{dx^m} = 0, $$

which we write in compact form as

$$ G(u', w, u) = 0, \tag{2.29} $$

where $w = (u'', u''', \ldots, u^{(m+1)})$ collects the higher derivatives of $u$. In Eq. (2.29)
we consider $w$, $u$, and also $u'$ as independent variables. Besides the usual differentiability
assumptions we assume that

(A1) the matrix $(G_{u'}, G_w)$ is 1-full with respect to $u'$; this means that the relation
$G_{u'}\Delta u' + G_w \Delta w = 0$ implies $\Delta u' = 0$;

(A2) the matrix $(G_{u'}, G_w)$ has constant rank;

(A3) the matrix $(G_{u'}, G_w, G_u)$ has full row rank.

These assumptions are required to hold in a neighbourhood of a particular solution
of (2.28). The construction of the underlying ODE is based on the following lemma
and on its proof.

Lemma 2.3 (Campbell & Moore 1995). Consider a sufficiently smooth problem 
(2.28) and assume that (A1), (A2), and (A3) hold. Then there exist coordinate 
partitions $w = (w_a, w_b)$, $u = (u_a, u_b)$ (and also $u' = (u'_a, u'_b)$ with the same partition 
as for $u$), such that the derivative array equations (2.29) are equivalent to 

$$\begin{aligned} u'_a &= f_a(u_b), & w_a &= \varphi_2(w_b, u_b), \\ u'_b &= f_b(u_b), & u_a &= \varphi_3(u_b) \end{aligned} \tag{2.30}$$

in a neighbourhood of the consistent initial value $(u'_0, w_0, u_0)$. 


Proof. We consider the matrix $(G_{u'}, G_w, G_u)$ evaluated at $(u'_0, w_0, u_0)$ and perform 
a QR factorization, where column permutations are restricted to components 
within the vectors $u'$, $w$, and $u$. This yields 

$$Q^T (G_{u'}, G_w, G_u) P = \begin{pmatrix} B_1 & C_1 & C_2 & D_1 & D_2 \\ 0 & C_3 & C_4 & D_3 & D_4 \\ 0 & 0 & 0 & D_5 & D_6 \end{pmatrix} \tag{2.31}$$

where $B_1$, $C_3$, $D_5$ are nonsingular by Assumption (A3), $Q$ is an orthogonal matrix, 
and $P = \mathrm{diag}(P_1, P_2, P_3)$ with suitable permutation matrices $P_1, P_2, P_3$. 
Fixing the permutation $P$, we apply the above factorization also to $(G_{u'}, G_w, G_u)$ 
evaluated at an arbitrary point $(u', w, u)$ close to $(u'_0, w_0, u_0)$. Because of 
Assumption (A2) this gives (2.31) with smooth matrices $Q$, $B_i$, $C_i$, and $D_i$. The 
decomposition (2.31) defines the partitions $w = (w_a, w_b)$ and $u = (u_a, u_b)$. The 
first, second and fourth block-columns in (2.31) form an invertible matrix. The 
Implicit Function Theorem thus implies that (2.29) can be solved for $u'$, $w_a$, $u_a$, and 
we obtain the equivalent system 

$$u' = \varphi_1(w_b, u_b), \qquad w_a = \varphi_2(w_b, u_b), \qquad u_a = \varphi_3(w_b, u_b).$$

We still have to show that the functions $\varphi_1$ and $\varphi_3$ are independent of $w_b$. By 
definition of the $\varphi_i$ we have 

$$G\bigl(\varphi_1(w_b,u_b),\ (\varphi_2(w_b,u_b), w_b),\ (\varphi_3(w_b,u_b), u_b)\bigr) = 0.$$

Differentiating with respect to $w_b$ yields 

$$G_{u'}\frac{\partial\varphi_1}{\partial w_b} + G_{w_a}\frac{\partial\varphi_2}{\partial w_b} + G_{w_b} + G_{u_a}\frac{\partial\varphi_3}{\partial w_b} = 0. \tag{2.32}$$

Multiplying this relation by $Q^T$, we see from Eq. (2.31) that $D_5\,(\partial\varphi_3/\partial w_b) = 0$. 
Since $D_5$ is nonsingular, this implies $\partial\varphi_3/\partial w_b = 0$, so that $\varphi_3$ is independent 
of $w_b$. Assumption (A1) now implies from (2.32) that also $\partial\varphi_1/\partial w_b$ vanishes. 
This completes the proof of the lemma. □ 


Suppose that we know how to compute $f_a(u_b)$, $f_b(u_b)$ and $\varphi_3(u_b)$ for a given 
value $u_b$. From (2.30) we then have an ordinary differential equation for $u_b$, which 
can be solved by any integration method (Runge-Kutta or multistep, explicit or 
implicit, ...), and the remaining components are given by $u_a = \varphi_3(u_b)$. The 
numerical solution of this method thus preserves all constraints (also the hidden 
ones). 

Computation of the Values $f_a(u_b)$, $f_b(u_b)$ and $\varphi_3(u_b)$. It follows from 
Assumption (A3) that $(G_{u'}, G_w, G_u)^T G = 0$ is equivalent to $G = 0$. Thus, for given $u_b$, 
any method of finding the minimum $(u', w, u_a)$ of the function $G^T G$ may be used. 
Campbell & Moore (1995) propose the use of Gauss-Newton iterations. 
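The computation just described can be sketched in a few lines. The following is only an illustration under stated assumptions, not the algorithm of Campbell & Moore: the toy DAE $u_1' = -u_2$, $0 = u_2^3 + u_2 - u_1$ (with $u_b = u_1$, $u_a = u_2$, $m = 1$), the function names `G`, `G_jac`, `solve_derivative_array` and `underlying_ode_rhs`, and the use of a minimum-norm least-squares step are all choices made here for the example.

```python
import numpy as np

# Hypothetical toy problem (not from the text):
#   F_1 = u1' + u2 = 0,   F_2 = u2**3 + u2 - u1 = 0,
# with u_b = u1 given and unknowns v = (u1', u2', u1'', u2'', u2).

def G(v, ub):
    """Derivative array (m = 1): F = 0 and dF/dx = 0."""
    d1, d2, dd1, dd2, ua = v   # dd2 = u2'' does not enter (it plays the role of w_b)
    return np.array([
        d1 + ua,                              # F_1
        ua ** 3 + ua - ub,                    # F_2
        dd1 + d2,                             # dF_1/dx
        (3.0 * ua ** 2 + 1.0) * d2 - d1,      # dF_2/dx
    ])

def G_jac(v, ub):
    """Jacobian of G with respect to v = (u1', u2', u1'', u2'', u2)."""
    d1, d2, dd1, dd2, ua = v
    a = 3.0 * ua ** 2 + 1.0
    return np.array([
        [1.0, 0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 0.0, a],
        [0.0, 1.0, 1.0, 0.0, 0.0],
        [-1.0, a, 0.0, 0.0, 6.0 * ua * d2],
    ])

def solve_derivative_array(ub, tol=1e-12, max_iter=50):
    """Gauss-Newton iteration for min G^T G over v, for fixed u_b."""
    v = np.zeros(5)
    for _ in range(max_iter):
        # Minimum-norm least-squares step; the free component u2'' stays at 0.
        step = np.linalg.lstsq(G_jac(v, ub), -G(v, ub), rcond=None)[0]
        v = v + step
        if np.linalg.norm(step) < tol:
            break
    return v

def underlying_ode_rhs(ub):
    """Return (f_b(u_b), phi_3(u_b)) for the toy problem."""
    v = solve_derivative_array(ub)
    return v[0], v[4]
```

Any ODE integrator can then be applied to $u_b' = f_b(u_b)$ by calling `underlying_ode_rhs` at each function evaluation, and $u_a$ is recovered from the second return value.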

Remark. A closely related algorithm has been proposed by Kunkel & Mehrmann 
(1996). Instead of extracting from the derivative array equations an ordinary 
differential equation for all variables, they extract an equivalent index 1 problem and 
solve it by standard integration methods. This modification usually requires one 
differentiation less of the original system (2.28). 


Exercises 

1. Repeat the experiment of Fig. 2.1 with other numerical methods (explicit Euler 
method, multistep methods, constant and variable step sizes, ...). You will 
observe that in some situations the error in $g(q_n)$ grows only linearly, and the 
error in $G(q_n)u_n$ remains bounded. Try to explain this observation. 

2. a) Prove that the matrix in (2.5) is 1-full with respect to $u'$ if and only if the 
restriction of $M$ to the kernel of $G$ is injective (this is exactly the condition 
that is needed in order to be able to apply the methods of this section). 

b) Show by examples that neither $M$ needs to be nonsingular nor $G$ has to be 
of full rank in order that the condition of part (a) is satisfied. 


BDF is so beautiful that it is hard to imagine something else could 
be better. (L. Petzold 1988, heard by P. Deuflhard) 


Convergence results of multistep methods for problems of index at least 2 are 
harder to obtain than for semi-explicit index 1 problems (see Section VI.2). A first 
convergence result for BDF schemes, valid for linear constant coefficient DAE’s 
of arbitrary index, was given by Sincovec, Erisman, Yip & Epton (1981). 
Convergence of BDF for nonlinear DAE systems was then studied by Gear, Gupta & 
Leimkuhler (1985), Lötstedt & Petzold (1986) and Brenan & Engquist (1988). An 
independent convergence analysis was given by Griepentrog & März (1986), März 
(1990). They considered general linear multistep methods and problems, where the 
differential and algebraic equations (and/or variables) are not explicitly separated. 

There are several implementations of the BDF schemes for differential-algebraic 
systems. The most widely used code is DASSL of Petzold (1982). It is 
described in detail in the book of Brenan, Campbell & Petzold (1989). Further 
implementations are LSODI of Hindmarsh (1980) and SPRINT of Berzins & Furzeland 
(1985). 

In this section we consider semi-explicit problems 

$$y' = f(y,z), \qquad 0 = g(y). \tag{3.1}$$

We assume that $f$ and $g$ are sufficiently differentiable and that 

$$g_y(y)\,f_z(y,z) \quad \text{is invertible} \tag{3.2}$$

in a neighbourhood of the solution, so that the problem has index 2. A linear 
multistep method for (3.1) reads 

$$\sum_{i=0}^{k} \alpha_i y_{n+i} = h\sum_{i=0}^{k} \beta_i f(y_{n+i}, z_{n+i}) \tag{3.3a}$$

$$0 = g(y_{n+k}). \tag{3.3b}$$

This is not the only meaningful definition of a multistep method for (3.1). One 
could as well replace (3.3b) by 

$$0 = \sum_{i=0}^{k} \beta_i\, g(y_{n+i}), \tag{3.4}$$

which is obtained by putting $\varepsilon = 0$ in (VI.2.1). The following results can be 
extended without any difficulty to the second approach. For BDF schemes (where 
$\beta_0 = \ldots = \beta_{k-1} = 0$) both definitions are equivalent. 

The convergence results of this section are also valid for index 2 systems of 
the form $y' = f(y,z)$, $0 = g(y,z)$, if they can be transformed to (3.1) without any 
differentiation (see the discussion after Eq. (1.14)). This is because the multistep 
method (3.3) is invariant with respect to these transformations. The same is true 
for problems of the form $M(u)u' = \varphi(u)$, if the multistep method is defined by 

$$\sum_{i=0}^{k} \alpha_i u_{n+i} = h\sum_{i=0}^{k} \beta_i v_{n+i}, \qquad M(u_{n+k})\,v_{n+k} = \varphi(u_{n+k}). \tag{3.5}$$


Existence and Uniqueness of Numerical Solution 


Equations (3.3) constitute a nonlinear system for $y_{n+k}$, $z_{n+k}$. We have the 
following result about the existence of its solution. 

Theorem 3.1. Suppose that for a solution $y(x)$, $z(x)$ of (3.1) the starting values 
satisfy for $j = 0,\ldots,k-1$ and $x_j = x_0 + jh$ 

$$y_j - y(x_j) = O(h), \qquad z_j - z(x_j) = O(h), \qquad g(y_j) = O(h^2). \tag{3.6}$$

If (3.2) holds in a neighbourhood of this solution and if $\beta_k \ne 0$, then the nonlinear 
system 

$$\sum_{i=0}^{k} \alpha_i y_i = h\sum_{i=0}^{k} \beta_i f(y_i, z_i) \tag{3.7a}$$

$$0 = g(y_k) \tag{3.7b}$$

has a solution for $h \le h_0$. This solution is locally unique and satisfies 

$$y_k - y(x_k) = O(h), \qquad z_k - z(x_k) = O(h). \tag{3.8}$$


Proof. We put 

$$\eta = -\sum_{i=0}^{k-1} \frac{\alpha_i}{\alpha_k}\, y_i + h\sum_{i=0}^{k-1} \frac{\beta_i}{\alpha_k}\, f(y_i, z_i) \tag{3.9}$$

and define $\zeta$ close to $z(x_k)$ such that $g_y(\eta) f(\eta,\zeta) = 0$. We further replace 
$h(\beta_k/\alpha_k)$ by a new step size which we again denote by $h$. Then the system (3.7) 
is equivalent to 

$$y_k = \eta + h f(y_k, z_k) \tag{3.10a}$$

$$0 = g(y_k) \tag{3.10b}$$

which is simply the implicit Euler method. 

We next show that 

$$\eta - y(x_k) = O(h), \qquad \zeta - z(x_k) = O(h), \qquad g(\eta) = O(h^2). \tag{3.11}$$

The first relation follows from $y_j - y(x_j) = O(h)$ and from $\sum_{i=0}^{k}\alpha_i = 0$; the 
second is a consequence of the definition of $\zeta$ and of (3.2). The last relation of 
(3.11) can be seen as follows: we replace all $f(y_i, z_i)$ in (3.9) by $f(y(x_k), z(x_k))$, 
introducing an error of size $O(h^2)$ in $\eta$. Hence 

$$\eta - y(x_k) = -\sum_{i=0}^{k-1} \frac{\alpha_i}{\alpha_k}\bigl(y_i - y(x_k)\bigr) + h\Bigl(\sum_{i=0}^{k-1} \frac{\beta_i}{\alpha_k}\Bigr) f\bigl(y(x_k), z(x_k)\bigr) + O(h^2).$$

Because of (1.14b,c) this implies 

$$g(\eta) = -\sum_{i=0}^{k-1} \frac{\alpha_i}{\alpha_k}\, g_y\bigl(y(x_k)\bigr)\bigl(y_i - y(x_k)\bigr) + O(h^2). \tag{3.12}$$

The last statement of (3.11) now follows from the fact that $g_y(y(x_k))(y_i - y(x_k)) = 
g(y_i) + O(h^2)$ and from (3.6). 

To show the existence of a locally unique solution of (3.10), it is possible to 
adapt the proof of “Theorem 4.1” of HLR89 to the implicit Euler method. We 
shall, however, reformulate (3.10) in such a way that the implicit function theorem 
is applicable. We write (3.10b) as 

$$\begin{aligned} 0 = g(y_k) &= g(y_k) - g(\eta(h)) + g(\eta(h)) \\ &= \int_0^1 g_y\bigl(\eta(h) + \tau(y_k - \eta(h))\bigr)\,d\tau \cdot \bigl(y_k - \eta(h)\bigr) + g(\eta(h)) \end{aligned} \tag{3.13}$$

where we have explicitly indicated the dependence of $\eta$ on $h$. Replacing the factor 
$y_k - \eta(h)$ by $h f(y_k, z_k)$ from (3.10a) and dividing by $h$ we get the system 

$$y_k - \eta(h) - h f(y_k, z_k) = 0 \tag{3.14a}$$

$$\int_0^1 g_y\bigl(\eta(h) + \tau(y_k - \eta(h))\bigr)\,d\tau \cdot f(y_k, z_k) + \frac{1}{h}\, g(\eta(h)) = 0 \tag{3.14b}$$

which is the discrete analogue of system (1.14a,c). For $h = 0$ the values $y_k = \eta(0)$ 
and $z_k = \zeta(0)$ satisfy (3.14) because $g(\eta(h)) = O(h^2)$ and $g_y(\eta) f(\eta,\zeta) = 0$. 
Further, the derivative of (3.14) with respect to $(y_k, z_k)$ is of the form 

$$\begin{pmatrix} I + O(h) & O(h) \\ O(1) & (g_y f_z)(\eta,\zeta) + O(h) \end{pmatrix} \tag{3.15}$$

which has a bounded inverse for $h \le h_0$. Therefore the implicit function theorem 
(Ortega & Rheinboldt 1970, p. 128) yields the existence of a locally unique solution 
of (3.14) and hence also of (3.10) and (3.7). □ 


Influence of Perturbations 


The influence of perturbations in the multistep formula (3.3) on the numerical 
solution will be studied in the next theorem. 


Theorem 3.2. Let $y_k$, $z_k$ be given by (3.7) and consider perturbed values $\hat y_k$, $\hat z_k$ 
satisfying 

$$\sum_{i=0}^{k} \alpha_i \hat y_i = h\sum_{i=0}^{k} \beta_i f(\hat y_i, \hat z_i) + h\delta \tag{3.16a}$$

$$0 = g(\hat y_k) + \theta. \tag{3.16b}$$

In addition to the assumptions of Theorem 3.1 suppose that for $j = 0,\ldots,k-1$ 

$$\hat y_j - y_j = O(h^2), \qquad \hat z_j - z_j = O(h), \qquad \delta = O(h), \qquad \theta = O(h^2). \tag{3.17}$$

Then, for $h \le h_0$ we have the estimates 

$$\begin{aligned} \|\hat y_k - y_k\| &\le C\bigl(\|\hat Y_0 - Y_0\| + h\|\hat Z_0 - Z_0\| + h\|\delta\| + \|\theta\|\bigr) \\ \|\hat z_k - z_k\| &\le \frac{C}{h}\Bigl(\sum_{j=0}^{k-1} \|g_y(y_k)(\hat y_j - y_j)\| + h\|\hat Y_0 - Y_0\| + h\|\hat Z_0 - Z_0\| + h\|\delta\| + \|\theta\|\Bigr) \end{aligned} \tag{3.18}$$

where $\hat Y_0 - Y_0 = (\hat y_{k-1} - y_{k-1}, \ldots, \hat y_0 - y_0)^T$, $\|\hat Y_0 - Y_0\| = \max_{0\le j\le k-1} \|\hat y_j - y_j\|$, 
and likewise for the $z$-component. 


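Proof. In analogy to the proof of Theorem 3.1 we put 

$$\hat\eta = -\sum_{i=0}^{k-1} \frac{\alpha_i}{\alpha_k}\, \hat y_i + h\sum_{i=0}^{k-1} \frac{\beta_i}{\alpha_k}\, f(\hat y_i, \hat z_i)$$

and rescale $h$ and $\delta$, so that (3.16) becomes 

$$\hat y_k = \hat\eta + h f(\hat y_k, \hat z_k) + h\delta \tag{3.19a}$$

$$0 = g(\hat y_k) + \theta. \tag{3.19b}$$

As in the proof of Theorem 3.1 we conclude from (3.17) that $\hat y_k - \hat\eta = O(h)$ and 
$\hat z_k - \hat\zeta = O(h)$, where $\hat\zeta$ is such that $g_y(\hat\eta) f(\hat\eta, \hat\zeta) = 0$. Inspired by Eq. (3.14) we 
rewrite (3.19b) as 

$$0 = \int_0^1 g_y\bigl(\hat\eta + \tau(\hat y_k - \hat\eta)\bigr)\,d\tau \cdot \bigl(f(\hat y_k, \hat z_k) + \delta\bigr) + \frac{1}{h}\, g(\hat\eta) + \frac{1}{h}\,\theta, \tag{3.20}$$

which is now a discrete analogue of Eq. (1.29). Subtracting (3.20) from (3.14b) and 
exploiting the fact that the matrix $g_y f_z$ is invertible, we deduce the estimate 

$$\|\hat z_k - z_k\| \le C\Bigl(\|\hat y_k - y_k\| + \|\hat\eta - \eta\| + \|\delta\| + \frac{1}{h}\|g(\hat\eta) - g(\eta)\| + \frac{1}{h}\|\theta\|\Bigr). \tag{3.21}$$

A Lipschitz condition for $f$ applied to the difference of (3.19a) and (3.14a) yields 

$$\|\hat y_k - y_k\| \le \|\hat\eta - \eta\| + hL\bigl(\|\hat y_k - y_k\| + \|\hat z_k - z_k\|\bigr) + h\|\delta\|.$$

Combining the last two estimates we get 

$$\begin{aligned} \|\hat y_k - y_k\| &\le C\bigl(\|\hat\eta - \eta\| + h\|\delta\| + \|\theta\|\bigr) \\ \|\hat z_k - z_k\| &\le \frac{C}{h}\bigl(\|g(\hat\eta) - g(\eta)\| + h\|\hat\eta - \eta\| + h\|\delta\| + \|\theta\|\bigr). \end{aligned}$$

The conclusion now follows from the definitions of $\eta$ and $\hat\eta$ and from $y_k - \eta = O(h)$. □ 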


Remark 3.3. a) The above proof shows that the constant $C$ in (3.18) depends on 
bounds for certain derivatives of $f$ and $g$, but not on the constants implied by the 
$O(\ldots)$ terms in (3.17) (if $h$ is sufficiently small). This observation will be used in 
the convergence proof below. 

b) For one-step methods (e.g., implicit Euler method, trapezoidal rule) the term 
$\sum_{j=0}^{k-1} \|g_y(y_k)(\hat y_j - y_j)\|$ can be omitted in (3.18), if we require $g(\hat y_0) = g(y_0) = 0$. 
Indeed, it follows from $y_1 = y_0 + O(h)$ that $g_y(y_1)(\hat y_0 - y_0) = g_y(y_0)(\hat y_0 - y_0) + 
O(h\|\hat y_0 - y_0\|)$. Further we have 

$$g_y(y_0)(\hat y_0 - y_0) = g(\hat y_0) - g(y_0) + O(\|\hat y_0 - y_0\|^2),$$

so that the term in question is estimated by $O(h\|\hat y_0 - y_0\|)$ if $h$ is sufficiently 
small. 


The Local Error 

Consider initial values $y_j = y(x_j)$, $z_j = z(x_j)$ ($j = 0,\ldots,k-1$) on the exact 
solution of (3.1) and apply the multistep formula (3.7) once. The differences 
$y_k - y(x_k)$ and $z_k - z(x_k)$ are then called the local errors of the method. 

Lemma 3.4. Suppose that the DAE (3.1) satisfies (3.2) and that the multistep 
method (3.7) has order $p$ (in the sense of Sect. III.2). Then its local error satisfies 

$$y_k - y(x_k) = O(h^{p+1}), \qquad z_k - z(x_k) = O(h^p). \tag{3.23}$$

Proof. We put $\hat y_j = y(x_j)$, $\hat z_j = z(x_j)$ for $j = 0,\ldots,k$. These values satisfy (3.16) 
with $\delta = O(h^p)$ and $\theta = 0$. Since $\hat y_j = y_j$ and $\hat z_j = z_j$ for $j < k$, the statement 
follows immediately from Theorem 3.2. □ 


Convergence for BDF 


The study of convergence is simpler for BDF schemes than for general multistep 
methods, because $y_{n+k}$ depends only on $y_n, \ldots, y_{n+k-1}$, but not on $z_n, \ldots, 
z_{n+k-1}$ (due to $\beta_0 = \ldots = \beta_{k-1} = 0$). Therefore the $y$- and $z$-components can be 
treated separately. The following convergence result was obtained by Gear, Gupta 
& Leimkuhler (1985), Lötstedt & Petzold (1986) and Brenan & Engquist (1988). 

Theorem 3.5. Consider an index 2 problem (3.1) which satisfies (3.2). Then the 
$k$-step BDF scheme (III.1.22') is convergent of order $p = k$, if $k \le 6$; i.e., 

$$y_n - y(x_n) = O(h^p), \qquad z_n - z(x_n) = O(h^p) \qquad \text{for } x_n = nh \le \mathrm{Const}, \tag{3.24}$$

whenever the initial values satisfy 

$$y_j - y(x_j) = O(h^{p+1}) \qquad \text{for } j = 0,\ldots,k-1. \tag{3.25}$$


Remark. The assumption (3.25) can be relaxed to $y_j - y(x_j) = O(h^p)$ for $k \ge 3$, 
but not for $k = 1$ (see Exercise 1). 
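The statement of Theorem 3.5 can be checked numerically for small $k$. The sketch below is an illustration only: the test problem $y_1' = -y_1$, $y_2' = z$, $0 = y_2 - y_1^2$ (exact solution $y_1 = e^{-x}$, $y_2 = e^{-2x}$, $z = -2e^{-2x}$, with $g_y f_z = 1$) is an assumption made here, as are the function names and the use of a full Newton iteration with exact starting values.

```python
import numpy as np

# Hypothetical index 2 test problem (not from the text).
def f(y, z):
    return np.array([-y[0], z])

def g(y):
    return y[1] - y[0] ** 2

def exact(x):
    return np.array([np.exp(-x), np.exp(-2.0 * x)]), -2.0 * np.exp(-2.0 * x)

# BDF coefficients alpha_0, ..., alpha_k, normalised so that beta_k = 1.
BDF_ALPHA = {1: [-1.0, 1.0], 2: [0.5, -2.0, 1.5]}

def bdf_step(alpha, y_hist, z, h, tol=1e-13, max_iter=20):
    """One step of (3.3) for a BDF scheme, solved by Newton's method."""
    k = len(alpha) - 1
    known = sum(a * yj for a, yj in zip(alpha[:k], y_hist))  # sum_{i<k} alpha_i y_{n+i}
    y = y_hist[-1].copy()
    for _ in range(max_iter):
        res = np.concatenate([alpha[k] * y + known - h * f(y, z), [g(y)]])
        jac = np.array([
            [alpha[k] + h, 0.0, 0.0],    # first y-equation w.r.t. (y1, y2, z)
            [0.0, alpha[k], -h],         # second y-equation
            [-2.0 * y[0], 1.0, 0.0],     # constraint: g_y, no dependence on z
        ])
        delta = np.linalg.solve(jac, -res)
        y = y + delta[:2]
        z = z + delta[2]
        if np.linalg.norm(delta[:2]) + h * abs(delta[2]) < tol:
            break
    return y, z

def global_errors(k, h, x_end=1.0):
    """Errors of y and z at x_end for the k-step BDF with exact starting values."""
    alpha = BDF_ALPHA[k]
    n_steps = int(round(x_end / h))
    ys = [exact(j * h)[0] for j in range(k)]
    z = exact((k - 1) * h)[1]
    for _ in range(k, n_steps + 1):
        y_new, z = bdf_step(alpha, ys[-k:], z, h)
        ys.append(y_new)
    y_ex, z_ex = exact(n_steps * h)
    return np.linalg.norm(ys[-1] - y_ex), abs(z - z_ex)

def observed_orders(k, h=0.02):
    """Estimate the orders of y and z by halving the step size."""
    ey1, ez1 = global_errors(k, h)
    ey2, ez2 = global_errors(k, h / 2)
    return np.log2(ey1 / ey2), np.log2(ez1 / ez2)
```

Calling `observed_orders(1)` and `observed_orders(2)` should give values close to 1 and 2, respectively, for both components, in agreement with $p = k$.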


Proof. We combine the convergence proof for Runge-Kutta methods (HLR89, 
Theorem 4.4) with the techniques of Sect. III.4. Inspired by Lady Windermere’s Fan 
(Fig. III.4.1) we first study the propagation of the local errors and their 
accumulation over the whole interval for the $y$-component (part a). The $z$-component is 
treated in part (b) and technical details are given in part (c). 

a) In addition to the numerical solution $\{y_n, z_n\}$, which we now also denote 
by $\{y_n^0, z_n^0\}$, we consider for $\ell = 1, 2, \ldots$ the multistep solutions $\{y_n^\ell, z_n^\ell\}$ with 
starting values $y_j^\ell = y(x_j)$, $z_j^\ell = z(x_j)$ for $j = \ell-1, \ldots, \ell+k-2$ on the 
exact solution. Our first aim is to estimate $y_n^\ell - y_n^{\ell+1}$ in terms of the local errors 
$y_{\ell+k-1}^\ell - y(x_{\ell+k-1})$ (or starting errors if $\ell = 0$). For simplicity we omit the upper 
index and consider two neighbouring multistep solutions $\{y_n, z_n\}$ and $\{\hat y_n, \hat z_n\}$. 
In order to be able to apply Theorem 3.2 we fix three sufficiently large constants 
$C_0$, $C_1$, $C_2$ and suppose that for $nh \le \mathrm{Const}$ 

$$\|y_n - y(x_n)\| \le C_0 h, \qquad \|\hat y_n - y_n\| \le C_1 h^2, \qquad \|z_n - z(x_n)\| \le C_2 h. \tag{3.26}$$

This will be justified in part (c) below. We introduce the notation $\Delta y_n = \hat y_n - y_n$, 
$\Delta z_n = \hat z_n - z_n$ and $\Delta Y_n = (\Delta y_{n+k-1}, \ldots, \Delta y_n)^T$. Observing that $y_{n+k}$, 
$z_{n+k}$ do not depend on $z_n, \ldots, z_{n+k-1}$ for the BDF schemes, it follows from 
Theorem 3.2 with $\delta = 0$ and $\theta = 0$ that 

$$\|\Delta y_{n+k}\| \le C\,\|\Delta Y_n\| \tag{3.27a}$$

$$\|\Delta z_{n+k}\| \le \frac{C}{h}\Bigl(\sum_{j=0}^{k-1} \|g_y(y_{n+k})\,\Delta y_{n+j}\| + h\|\Delta Y_n\|\Bigr). \tag{3.27b}$$

Here $C$ does not depend on the choice of $C_0$, $C_1$, $C_2$, if $h$ is sufficiently small (see 
Remark 3.3a). Our assumption (3.26) together with (3.27) implies $\Delta y_{n+k} = O(h^2)$ 
and $\Delta z_{n+k} = O(h)$. We therefore obtain by linearization of the multistep formula 

$$\sum_{i=0}^{k} \alpha_i \Delta y_{n+i} = h\beta_k f_z(y_{n+k}, z_{n+k})\,\Delta z_{n+k} + O(h\|\Delta Y_n\|) \tag{3.28a}$$

$$0 = g_y(y_{n+k})\,\Delta y_{n+k} + O(h\|\Delta Y_n\|). \tag{3.28b}$$

We next use the projections (see also Definition 4.3 below) 

$$Q_n = \bigl(f_z (g_y f_z)^{-1} g_y\bigr)(y_n, z_n), \qquad P_n = I - Q_n \tag{3.29}$$

for which 

$$P_n^2 = P_n, \qquad Q_n^2 = Q_n, \qquad P_n Q_n = Q_n P_n = 0, \qquad Q_{n+1} = Q_n + O(h). \tag{3.30}$$

The last relation of (3.30) follows from (3.26) and the smoothness of the solution 
$y(x)$, $z(x)$. We then multiply (3.28a) by $P_{n+k}$ (which eliminates $\Delta z_{n+k}$) and 
(3.28b) by $\bigl(f_z (g_y f_z)^{-1}\bigr)(y_{n+k}, z_{n+k})$. This yields with (3.30) 

$$\sum_{i=0}^{k} \alpha_i P_{n+i}\,\Delta y_{n+i} = O(h\|\Delta Y_n\|), \qquad Q_{n+k}\,\Delta y_{n+k} = O(h\|\Delta Y_n\|). \tag{3.31}$$
2 = 0 

Introducing the vectors 

$$U_n = (P_{n+k-1}\Delta y_{n+k-1}, \ldots, P_n \Delta y_n)^T, \qquad V_n = (Q_{n+k-1}\Delta y_{n+k-1}, \ldots, Q_n \Delta y_n)^T,$$

we have $\Delta Y_n = U_n + V_n$ and the relations (3.31) become 

$$U_{n+1} = (A \otimes I)\,U_n + O(h\|U_n\| + h\|V_n\|) \tag{3.32a}$$

$$V_{n+1} = (N \otimes I)\,V_n + O(h\|U_n\| + h\|V_n\|) \tag{3.32b}$$

where (with $\alpha'_j = \alpha_j/\alpha_k$) 

$$A = \begin{pmatrix} -\alpha'_{k-1} & \cdots & -\alpha'_1 & -\alpha'_0 \\ 1 & & 0 & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}, \qquad N = \begin{pmatrix} 0 & \cdots & 0 & 0 \\ 1 & & 0 & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}.$$

According to Lemma III.4.4 we now choose a norm $\|U\|$ such that $\|A \otimes I\| \le 1$. 
We then choose a (possibly different) norm $\|V\|$, for which $\|N \otimes I\| \le \varrho < 1$. 
Consequently it follows from (3.32) that 

$$\begin{pmatrix} \|U_{n+1}\| \\ \|V_{n+1}\| \end{pmatrix} \le \begin{pmatrix} 1 + O(h) & O(h) \\ O(h) & \varrho + O(h) \end{pmatrix} \begin{pmatrix} \|U_n\| \\ \|V_n\| \end{pmatrix}. \tag{3.34}$$

As in the proof of Lemma VI.3.9 we diagonalize the matrix in (3.34) and so obtain 

$$\|\Delta Y_n\| \le \mathrm{Const}_1\bigl(\|U_n\| + \|V_n\|\bigr) \le \mathrm{Const}_2\bigl(\|U_0\| + (\varrho^n + h)\|V_0\|\bigr) \tag{3.35a}$$

$$\|V_n\| \le \mathrm{Const}_3\bigl(h\|U_0\| + (\varrho^n + h)\|V_0\|\bigr). \tag{3.35b}$$

The vectors $U_0$ and $V_0$ are composed of local errors (of the $y$-component) or 
of errors in the starting values, which are of size $O(h^{p+1})$ by (3.23) and (3.25). 
Hence, it follows from (3.35) that the propagated errors satisfy 

$$\begin{aligned} \|\Delta y_n\| &\le C_3 h^{p+1}, \\ \|g_y(y_{n+k})\,\Delta y_{n+j}\| &\le C_4(\varrho^n + h)\,h^{p+1} \qquad \text{for } j = 0,\ldots,k-1. \end{aligned} \tag{3.36}$$

Summing up we obtain 

$$\|y_n - y(x_n)\| \le \sum_{\ell=0}^{n-k+1} \|y_n^\ell - y_n^{\ell+1}\| \le C_5 h^p, \tag{3.37}$$

the desired estimate for the $y$-component. 

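b) Since $z_n$ depends only on $y_{n-k}, \ldots, y_{n-1}$ but not on the previous $z$-values, 
we can apply Theorem 3.2 with $\hat y_i = y(x_i)$, $\hat z_i = z(x_i)$, $\delta = O(h^p)$ and $\theta = 0$. This 
yields 

$$\|z_n - z(x_n)\| \le \frac{C}{h}\sum_{j=1}^{k} \bigl\|g_y(y(x_n))\bigl(y_{n-j} - y(x_{n-j})\bigr)\bigr\| + O(h^p). \tag{3.38}$$

Using (3.36) and $y_n^\ell = y(x_n) + O(h^p)$, which follows as in (3.37), we obtain 

$$\begin{aligned} \bigl\|g_y(y(x_n))\bigl(y_{n-j} - y(x_{n-j})\bigr)\bigr\| &\le \sum_{\ell=0}^{n-k+1} \bigl\|g_y(y(x_n))\bigl(y_{n-j}^\ell - y_{n-j}^{\ell+1}\bigr)\bigr\| \\ &\le \sum_{\ell=0}^{n-k+1} \Bigl(\bigl\|g_y(y_n^\ell)\bigl(y_{n-j}^\ell - y_{n-j}^{\ell+1}\bigr)\bigr\| + O(h^{2p+1})\Bigr) = O(h^{p+1}) \end{aligned}$$

and hence also 

$$\|z_n - z(x_n)\| \le C_6 h^p. \tag{3.39}$$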

c) In general, the constants $C_3$, $C_5$ and $C_6$ will depend on $C_0$, $C_1$, $C_2$ of our 
assumption (3.26). For $p \ge 2$ we can restrict the step size $h$ so that 

$$C_5 h^{p-1} \le C_0, \qquad C_3 h^{p-1} \le C_1, \qquad C_6 h^{p-1} \le C_2$$

and the numerical solutions will never violate the conditions (3.26) on the 
considered interval. 

For $p = 1$ (the implicit Euler method) we know from Remark 3.3b that the 
estimate (3.27b) can be replaced by 

$$\|\Delta z_{n+k}\| \le C\,\|\Delta Y_n\|. \tag{3.40}$$

Instead of (3.28a) we thus immediately get 

$$\Delta y_{n+1} - \Delta y_n = O(h\|\Delta y_n\|) \tag{3.41}$$

where the constant implied by the $O(\ldots)$ term is independent of $C_0$, $C_1$, $C_2$, if 
$h$ is sufficiently small. Standard techniques (without considering the projections 
(3.29)) then yield the convergence result. □ 

With the ideas of Sect. III.5 the above proof can be extended to cover variable 
step sizes as well. Originally, such a convergence result was given by Gear, Gupta 
& Leimkuhler (1985). 

General Multistep Methods 

For a general multistep method (3.3) with generating polynomials 

$$\varrho(\zeta) = \sum_{i=0}^{k} \alpha_i \zeta^i, \qquad \sigma(\zeta) = \sum_{i=0}^{k} \beta_i \zeta^i$$

we have the following convergence result. 

Theorem 3.6. Consider an index 2 problem (3.1) which satisfies (3.2). Assume that 
the multistep method is stable (Definition III.3.2) and strictly stable at infinity (the 
zeros of $\sigma(\zeta)$ lie inside the unit disc $|\zeta| < 1$). If its order is $p \ge 2$, then the global 
error satisfies 

$$y_n - y(x_n) = O(h^p), \qquad z_n - z(x_n) = O(h^p) \qquad \text{for } x_n = nh \le \mathrm{Const}$$

whenever the initial values satisfy (for $j = 0,\ldots,k-1$) 

$$y_j - y(x_j) = O(h^{p+1}), \qquad z_j - z(x_j) = O(h^p). \tag{3.42}$$

Proof. The proof is essentially the same as for the BDF schemes. Due to the 
dependence of $y_{n+k}$, $z_{n+k}$ on $y_n, \ldots, y_{n+k-1}$ and on $z_n, \ldots, z_{n+k-1}$ the following 
modifications are necessary. 

In addition to (3.26) we assume $\|\hat z_n - z_n\| \le \mathrm{Const}\cdot h$. Instead of (3.27) we have 
(from Theorem 3.2) 

$$\begin{aligned} \|\Delta y_{n+k}\| &\le C\bigl(\|\Delta Y_n\| + h\|\Delta Z_n\|\bigr) \\ \|\Delta z_{n+k}\| &\le \frac{C}{h}\Bigl(\sum_{j=0}^{k-1} \|g_y(y_{n+k})\,\Delta y_{n+j}\| + h\|\Delta Y_n\| + h\|\Delta Z_n\|\Bigr) \end{aligned}$$

and (3.28) becomes 

$$\begin{aligned} \sum_{i=0}^{k} \alpha_i \Delta y_{n+i} &= h\sum_{i=0}^{k} \beta_i f_z(y_{n+k}, z_{n+k})\,\Delta z_{n+i} + O(h\|\Delta Y_n\| + h^2\|\Delta Z_n\|) \\ 0 &= g_y(y_{n+k})\,\Delta y_{n+k} + O(h\|\Delta Y_n\| + h^2\|\Delta Z_n\|). \end{aligned} \tag{3.43}$$

A recursion for $\Delta z_n$ is obtained as follows: we multiply the upper line of (3.43) 
by $\bigl((g_y f_z)^{-1} g_y\bigr)(y_{n+k}, z_{n+k})$ and so get 

$$h\sum_{i=0}^{k} \beta_i \Delta z_{n+i} = \sum_{i=0}^{k} \alpha_i \bigl((g_y f_z)^{-1} g_y\bigr)(y_{n+k}, z_{n+k})\,\Delta y_{n+i} + O(h\|\Delta Y_n\| + h^2\|\Delta Z_n\|). \tag{3.44}$$

With the projections $P_n$, $Q_n$ of (3.29) and the vectors $U_n$, $V_n$ we thus obtain (3.32) 
with an additional $O(h^2\|\Delta Z_n\|)$ term. From (3.44) we get 

$$h\,\Delta Z_{n+1} = (B \otimes I)\,h\,\Delta Z_n + O(h\|U_n\| + \|V_n\| + h^2\|\Delta Z_n\|),$$

where 

$$B = \begin{pmatrix} -\beta'_{k-1} & \cdots & -\beta'_1 & -\beta'_0 \\ 1 & & 0 & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}$$

with $\beta'_j = \beta_j/\beta_k$. For this equation we use a norm for which $\|B \otimes I\| \le \kappa < 1$. 
This is possible, because the method is strictly stable at infinity. Summarizing, we 
get the inequality 

$$\begin{pmatrix} \|U_{n+1}\| \\ \|V_{n+1}\| \\ h\|\Delta Z_{n+1}\| \end{pmatrix} \le \begin{pmatrix} 1 + O(h) & O(h) & O(h) \\ O(h) & \varrho + O(h) & O(h) \\ O(h) & O(1) & \kappa + O(h) \end{pmatrix} \begin{pmatrix} \|U_n\| \\ \|V_n\| \\ h\|\Delta Z_n\| \end{pmatrix} \tag{3.45}$$

which can be solved as before and yields 

$$\begin{aligned} &\|\Delta y_n\| \le C_3 h^{p+1}, \qquad \|\Delta z_n\| \le C_7(\varrho^n + \kappa^n + h)\,h^p, \\ &\|g_y(y_{n+k})\,\Delta y_{n+j}\| \le C_4(\varrho^n + \kappa^n + h)\,h^{p+1} \qquad \text{for } j = 0,\ldots,k-1. \end{aligned}$$

Summing up the propagated errors as in (3.37) we obtain the desired estimates for 
the $y$- and $z$-component. □ 


Solution of the Nonlinear System by Simplified Newton 


The nonlinear system (3.3) is usually solved by a simplified Newton iteration and 
it is interesting to study its convergence. As in the proof of Theorem 3.1 we 
introduce $\eta$ by (3.9) and rescale $h$ so that the nonlinear system becomes (omitting the 
indices) 

$$\begin{aligned} y - \eta - h f(y,z) &= 0 \\ g(y) &= 0. \end{aligned} \tag{3.47}$$

This is just the implicit Euler method and we can apply the discussion of HLR89, 
Chapter 7. The Jacobian of the nonlinear system (3.47) is 

$$J = \begin{pmatrix} I - h f_y & -h f_z \\ g_y & 0 \end{pmatrix} \tag{3.48}$$

and its inverse has the form 

$$J^{-1} = \begin{pmatrix} P + O(h) & f_z (g_y f_z)^{-1} + O(h) \\ -h^{-1}(g_y f_z)^{-1} g_y + O(1) & h^{-1}(g_y f_z)^{-1} + O(1) \end{pmatrix} \tag{3.49}$$

where $P = I - f_z (g_y f_z)^{-1} g_y$ is the projection of (3.29). We now consider the 
simplified Newton method as a fixed point iteration with the function 

$$\Phi(y,z) = \begin{pmatrix} y \\ z \end{pmatrix} - J_0^{-1} \begin{pmatrix} y - \eta - h f(y,z) \\ g(y) \end{pmatrix}. \tag{3.50}$$

The subscript 0 in $J_0$ indicates that the arguments of the derivatives in (3.48) are 
evaluated at some fixed approximation $(\eta, \zeta)$ to the solution of (3.47). We shall use 
the notation $\{f_y\}_0$ for $f_y(\eta, \zeta)$, etc. Direct calculation of $\Phi'(y,z)$ gives 

$$\Phi'(y,z) = \begin{pmatrix} \{f_z (g_y f_z)^{-1}\}_0 (\{g_y\}_0 - g_y) + O(h) & h\{P\}_0 f_z + O(h^2) \\ h^{-1}\{(g_y f_z)^{-1}\}_0 (\{g_y\}_0 - g_y) + O(1) & \{(g_y f_z)^{-1} g_y\}_0 (\{f_z\}_0 - f_z) + O(h) \end{pmatrix}.$$

If we assume that $(\eta, \zeta)$ approximates the fixed point of (3.50) with an error of 
$O(h)$, then, using $\{P\}_0 \{f_z\}_0 = 0$, we have at this fixed point 

$$\Phi'(y,z) = \begin{pmatrix} O(h) & O(h^2) \\ O(1) & O(h) \end{pmatrix}.$$

With the scaling matrix $D = \mathrm{diag}(I, hI)$ (this corresponds to a multiplication of 
the $z$-variables by $h$) we have $\|D\,\Phi'(y,z)\,D^{-1}\| = O(h)$. In the norm $\|y\| + h\|z\|$ 
we therefore gain a factor $h$ in each simplified Newton iteration. 
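The gain of a factor $h$ per iteration can be observed on a small example. The sketch below is illustrative only: the problem $f(y,z) = (-y_2 + y_1(z - y_1),\ y_1 + y_2(z - y_1))$, $g(y) = y_1^2 + y_2^2 - 1$, the starting point $\eta = (1, 0)$, $\zeta = 1$, and the function names are assumptions made here. For this particular problem the solution of (3.47) is known in closed form, which is used as the reference.

```python
import numpy as np

# Hypothetical index 2 problem on the unit circle (not from the text);
# here f_z(y, z) = y and g_y(y) = 2 y^T, so g_y f_z = 2 on the constraint.
def f(y, z):
    return np.array([-y[1] + y[0] * (z - y[0]), y[0] + y[1] * (z - y[0])])

def g(y):
    return y[0] ** 2 + y[1] ** 2 - 1.0

def jacobian(y, z, h):
    """Jacobian (3.48) of the system y - eta - h f(y, z) = 0, g(y) = 0."""
    f_y = np.array([[z - 2.0 * y[0], -1.0], [1.0 - y[1], z - y[0]]])
    jac = np.zeros((3, 3))
    jac[:2, :2] = np.eye(2) - h * f_y
    jac[:2, 2] = -h * y           # -h f_z
    jac[2, :2] = 2.0 * y          # g_y
    return jac

def simplified_newton_errors(h, n_iter=3):
    """Errors in the norm ||y|| + h |z| after each simplified Newton iteration."""
    eta, zeta = np.array([1.0, 0.0]), 1.0      # g(eta) = 0 and g_y f = 0 at (eta, zeta)
    c = np.sqrt(1.0 - h * h)
    # Closed-form solution of the implicit Euler step for this problem.
    y_star, z_star = np.array([c, h]), c + (1.0 - c) / h
    j0 = jacobian(eta, zeta, h)                # frozen Jacobian J_0
    y, z = eta.copy(), zeta
    errors = [np.linalg.norm(y - y_star) + h * abs(z - z_star)]
    for _ in range(n_iter):
        res = np.concatenate([y - eta - h * f(y, z), [g(y)]])
        delta = np.linalg.solve(j0, -res)
        y, z = y + delta[:2], z + delta[2]
        errors.append(np.linalg.norm(y - y_star) + h * abs(z - z_star))
    return errors
```

With `h = 0.01`, successive entries of the returned list should shrink by a factor of the order of $h$; note that the unweighted error in $z$ alone need not decrease in the first iteration.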


Remark. The above analysis remains valid if $f_y$ or parts of it are replaced by zero 
in $J_0$. For mechanical problems such an algorithm was proposed by Gear, Gupta 
& Leimkuhler (1985). 


Exercises 


1. Show that the assumption $g(y_j) = O(h^2)$ for $j = 0,\ldots,k-1$ cannot be 
omitted in Theorem 3.1. 

Counterexample. Consider the system $x' = 1$, $y' = k(z)$, $0 = y - x$, where 
$k(z) = (e^{z-1} + 1)/2$. Apply the implicit Euler method with initial values 
$x_0 = 0$, $y_0 = h$, $z_0 = 1$. 


2. (Gear, Hsu & Petzold 1981, Gear & Petzold 1984). Consider the problem 

$$\begin{pmatrix} 0 & 0 \\ 1 & \eta x \end{pmatrix} \begin{pmatrix} y' \\ z' \end{pmatrix} + \begin{pmatrix} 1 & \eta x \\ 0 & 1 + \eta \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} f(x) \\ g(x) \end{pmatrix}. \tag{3.52}$$

a) Prove that the system (3.52) has index 2 for all values of $\eta$. 

b) The $z$-component of the exact solution is $z(x) = g(x) - f'(x)$. 

c) The implicit Euler method, applied to (3.52) in an obvious manner, yields 
the recursion 

$$z_{n+1} = \frac{\eta}{1+\eta}\, z_n + \frac{1}{1+\eta}\Bigl(g(x_{n+1}) - \frac{f(x_{n+1}) - f(x_n)}{h}\Bigr).$$

Hence, the method is convergent for $\eta > -1/2$, but unstable for $\eta < -1/2$. 
For $\eta = -1$ the numerical solution does not exist. 
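The behaviour claimed in part (c) of Exercise 2 is easy to observe by iterating the recursion directly. The sketch below is only an illustration: the data $f(x) = \sin x$, $g(x) = e^{-x}$ and the function names are assumptions made here, and the recursion is started from the exact value $z(0) = g(0) - f'(0)$.

```python
import numpy as np

# Hypothetical data for (3.52); the exact z-component is g(x) - f'(x).
f = np.sin
g = lambda x: np.exp(-x)
z_exact = lambda x: np.exp(-x) - np.cos(x)

def euler_z(eta, h, n_steps, x0=0.0):
    """Iterate the implicit Euler recursion for z from the exact initial value."""
    x, z = x0, z_exact(x0)
    for _ in range(n_steps):
        x_new = x + h
        # z_{n+1} = eta/(1+eta) z_n + 1/(1+eta) (g(x_{n+1}) - (f(x_{n+1}) - f(x_n))/h)
        z = (eta * z + g(x_new) - (f(x_new) - f(x)) / h) / (1.0 + eta)
        x = x_new
    return z

# eta = 1 gives the amplification factor 1/2, eta = -0.75 gives the factor -3.
err_stable = abs(euler_z(1.0, 0.01, 100) - z_exact(1.0))
err_unstable = abs(euler_z(-0.75, 0.01, 100) - z_exact(1.0))
```

For $\eta = 1$ the error at $x = 1$ stays of size $O(h)$, whereas for $\eta = -0.75$ the factor $\eta/(1+\eta) = -3$ amplifies the errors of each step and the numerical solution becomes useless.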
RK methods prove popular at IMA conference on numerical ODEs. 

(Byrne & Hindmarsh, SIAM News, March 1990) 


This section is devoted to the convergence of implicit Runge-Kutta methods for 
semi-explicit index 2 systems (3.1) which satisfy (3.2). The $\varepsilon$-embedding method 
of Sect. VI.1 defines the numerical solution by 

$$y_{n+1} = y_n + h\sum_{i=1}^{s} b_i k_{ni}, \qquad z_{n+1} = z_n + h\sum_{i=1}^{s} b_i \ell_{ni} \tag{4.1a}$$

where 

$$k_{ni} = f(Y_{ni}, Z_{ni}), \qquad 0 = g(Y_{ni}) \tag{4.1b}$$

and the internal stages are given by 

$$Y_{ni} = y_n + h\sum_{j=1}^{s} a_{ij} k_{nj}, \qquad Z_{ni} = z_n + h\sum_{j=1}^{s} a_{ij} \ell_{nj} \tag{4.1c}$$

(the state space form method (VI.1.12) does not make sense here, because the 
algebraic conditions do not depend on $z$). 

The first convergence results for this situation are due to Petzold (1986). They 
are formulated for general problems $F(y',y) = 0$ under the assumption of “uniform 
index one”. Since the system (3.1) becomes “uniform index one” if we replace $z$ by 
$u'$ (Gear 1988, see also Exercise 1), the results of Petzold can be applied to (3.1). 
A further study for the semi-explicit system (3.1) is given by Brenan & Petzold 
(1989). Their main result is that for (4.1) the global error of the $y$-component is 
$O(h^{q+1})$, and that of the $z$-component is $O(h^q)$ (where $q$ denotes the stage order 
of the method). This result was improved by HLR89, using a different approach 
(local and global error are studied separately). 


The Nonlinear System 


We first investigate existence, uniqueness and the influence of perturbations to the 
solution of the nonlinear system (4.1). In order to simplify the notation we write 
$(\eta, \zeta)$ for $(y_n, z_n)$, which we assume $h$-dependent, and we suppress the index $n$ 
in $Y_{ni}$, etc. The nonlinear system then reads 

$$\left.\begin{aligned} Y_i &= \eta + h\sum_{j=1}^{s} a_{ij} f(Y_j, Z_j) \\ 0 &= g(Y_i) \end{aligned}\right\} \qquad i = 1,\ldots,s. \tag{4.2}$$

Once a solution to (4.2) is known, we can compute $\ell_{ni}$ from (4.1c) (whenever $(a_{ij})$ 
is an invertible matrix) and then $y_{n+1}$, $z_{n+1}$ from (4.1a). 

Theorem 4.1 (HLR89, p. 31). Suppose that $(\eta, \zeta)$ satisfy 

$$g(\eta) = O(h^2), \qquad g_y(\eta) f(\eta,\zeta) = O(h) \tag{4.3}$$

and that (3.2) holds in a neighbourhood of $(\eta, \zeta)$. If the Runge-Kutta matrix $(a_{ij})$ 
is invertible, then the nonlinear system (4.2) possesses for $h \le h_0$ a locally unique 
solution which satisfies 

$$Y_i - \eta = O(h), \qquad Z_i - \zeta = O(h). \tag{4.4}$$

Remark. Condition (4.3) expresses the fact that $(\eta, \zeta)$ is close to consistent initial 
values. We also see from (4.2) that the solution $(Y_i, Z_i)$ does not depend on $\zeta$. 
The value of $\zeta$ in (4.3) only specifies the solution branch of $g_y(y) f(y,z) = 0$ to 
which the numerical solution is close. 

The proof of Theorem 4.1 for the implicit Euler method was given in Sect. VII.3 
(proof of Theorem 3.1). If we replace (3.14) by 

$$Y_i - \eta(h) - h\sum_{j=1}^{s} a_{ij} f(Y_j, Z_j) = 0 \tag{4.5a}$$

$$\int_0^1 g_y\bigl(\eta(h) + \tau(Y_i - \eta(h))\bigr)\,d\tau \cdot \sum_{j=1}^{s} a_{ij} f(Y_j, Z_j) + \frac{1}{h}\, g(\eta(h)) = 0 \tag{4.5b}$$

it extends in a straightforward manner to general Runge-Kutta methods. □ 


Influence of Perturbations. Besides (4.2) we also consider the perturbed system 

$$\left.\begin{aligned} \hat Y_i &= \hat\eta + h\sum_{j=1}^{s} a_{ij} f(\hat Y_j, \hat Z_j) + h\delta_i \\ 0 &= g(\hat Y_i) + \theta_i \end{aligned}\right\} \qquad i = 1,\ldots,s \tag{4.6}$$

and we investigate the influence of the perturbations $\delta_i$ and $\theta_i$ on the numerical 
solution. 


Theorem 4.2 (HLR89, p. 33). Let $Y_i$, $Z_i$ be a solution of (4.2) and consider 
perturbed values $\hat Y_i$, $\hat Z_i$ satisfying (4.6). In addition to the assumptions of Theorem 4.1 
suppose that 

$$\hat\eta - \eta = O(h^2), \qquad \hat Z_i - \zeta = O(h), \qquad \delta_i = O(h), \qquad \theta_i = O(h^2). \tag{4.7}$$

Then we have for $h \le h_0$ the estimates 

$$\|\hat Y_i - Y_i\| \le C\bigl(\|\hat\eta - \eta\| + h\|\delta\| + \|\theta\|\bigr) \tag{4.8a}$$

$$\|\hat Z_i - Z_i\| \le \frac{C}{h}\bigl(\|g_y(\eta)(\hat\eta - \eta)\| + h\|\hat\eta - \eta\| + h\|\delta\| + \|\theta\|\bigr) \tag{4.8b}$$

where $\|\delta\| = \max_i \|\delta_i\|$ and $\|\theta\| = \max_i \|\theta_i\|$. If the initial values satisfy $g(\eta) = 0$ 
and $g(\hat\eta) = 0$, then we have the stronger estimate 

$$\|\hat Z_i - Z_i\| \le \frac{C}{h}\bigl(h\|\hat\eta - \eta\| + h\|\delta\| + \|\theta\|\bigr). \tag{4.9}$$

The constant $C$ in (4.8) and (4.9) depends only on bounds for certain derivatives 
of $f$ and $g$, but not on the constants implied by the $O(\ldots)$ terms in (4.3) and (4.7). 

Proof. The estimates (4.8) are obtained by extending the proof of Theorem 3.2. 
When both initial values, $\eta$ and $\hat\eta$, lie on the manifold $g(y) = 0$, we have by 
Taylor expansion $0 = g(\hat\eta) - g(\eta) = g_y(\eta)(\hat\eta - \eta) + O(\|\hat\eta - \eta\|^2)$. In this situation 
the term $g_y(\eta)(\hat\eta - \eta)$ in (4.8b) is of size $O(h^2\|\hat\eta - \eta\|)$ and may be neglected. □ 


Estimation of the Local Error 

We begin by defining two projections which will be important for the study of local 
errors for index 2 problems (3.1). 

Definition 4.3. For given $y_0$, $z_0$ for which $(g_y f_z)(y_0, z_0)$ is invertible we define 
the projections 

$$Q = \bigl(f_z (g_y f_z)^{-1} g_y\bigr)(y_0, z_0), \qquad P = I - Q. \tag{4.10}$$

Geometric interpretation. Let $U$ be the manifold defined by $U = \{y\,;\ g(y) = 0\}$ 
and let $T_y U = \ker(g_y(y_0))$ be the tangent space at a point $y_0 \in U$. Further let 
$V = \{f(y_0, z)\,;\ z \text{ arbitrary}\}$ and let $T_{f_0} V = \mathrm{Im}(f_z(y_0, z_0))$ be its tangent space 
at $f_0 = f(y_0, z_0)$. Here, $z_0$ is the value for which $f(y_0, z_0)$ lies in $T_y U$ (i.e., for 
which the condition $g_y(y_0) f(y_0, z_0) = 0$ is satisfied (see (1.14c))). By considering 
the arrows $f(y_0, z)$ with varying $z$ (see Fig. 4.1), the space $T_{f_0} V$ can be 
interpreted as the directions in which the control variables $z$ bring the solution to the 
manifold $U$. By (3.2) these two spaces are transversal and their direct sum 
generates the $y$-space. It follows from (4.10) that $P$ projects onto $T_y U$ parallel to 
$T_{f_0} V$ and $Q$ projects onto $T_{f_0} V$ parallel to $T_y U$. 
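The algebraic properties behind this geometric picture can be verified numerically. The following sketch forms $P$ and $Q$ from given Jacobians; the function name `projections` and the sample matrices `f_z`, `g_y` (three differential variables, one algebraic variable) are hypothetical values chosen only for illustration.

```python
import numpy as np

def projections(f_z, g_y):
    """Return (P, Q) with Q = f_z (g_y f_z)^{-1} g_y and P = I - Q, cf. (4.10)."""
    f_z = np.atleast_2d(f_z)
    g_y = np.atleast_2d(g_y)
    gyfz = g_y @ f_z                       # must be invertible, cf. (3.2)
    Q = f_z @ np.linalg.solve(gyfz, g_y)
    P = np.eye(Q.shape[0]) - Q
    return P, Q

# Hypothetical Jacobians at some point (y_0, z_0).
f_z = np.array([[1.0], [2.0], [0.5]])      # 3 x 1
g_y = np.array([[1.0, 0.0, 1.0]])          # 1 x 3
P, Q = projections(f_z, g_y)
```

One expects `P @ f_z` and `g_y @ P` to vanish: $P$ annihilates the directions $\mathrm{Im}(f_z)$ and maps into $\ker(g_y)$, while $Q$ reproduces $f_z$.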

Fig. 4.1. Projections $P$ and $Q$ 

Consider now initial values $y_0 = y(x)$, $z_0 = z(x)$ on the exact solution and 
denote by $y_1$, $z_1$ the numerical solution of the Runge-Kutta method (4.1). The 
local error 

$$\delta y_h(x) = y_1 - y(x+h), \qquad \delta z_h(x) = z_1 - z(x+h) \tag{4.11}$$

can be estimated as follows: 

Lemma 4.4 (HLR89, p. 34). Suppose that a Runge-Kutta method with invertible 
coefficient matrix (a-) satisfies the assumptions B(p) and C(q) of Sect. IV. 5 with 
p> q. Then we have 

Sy h (x) = 0(h q+1 ), P(x)8y h (x) = 0 {h^ p+ ^) 

8z h (x) = 0(h q ), 

where P(x) is the projection (4.10) evaluated at (y(x),z(x)). If in addition , the 
Runge-Kutta method is stiffly accurate (i.e., satisfies a si = b i for all i), then 

8y h (x) = 0{h m in(J>+i.9+2)). (4.13) 


Proof. The exact solution values rj = y(x), Y i — y(x + c ffi) , Z- = z(x + cfi) 
satisfy (4.6) with 0 i — 0 and 

uq / 1 _ 3 N 

s . = J - £ «,,<=]) + 0(fc'«). 

The difference to the numerical solution ((4.2) with rj = y(x)) can thus be estimated 
with Theorem 4.2, yielding 

Y i -y(x + c l h) = 0(h‘‘ +1 ), Z i -z{x + c i h)=0{h*). (4.14) 

Since the quadrature formula $\{b_i, c_i\}$ is of order $p$, we have

$$y(x+h) - y(x) - h\sum_{i=1}^s b_i f\big(y(x+c_ih), z(x+c_ih)\big) = O(h^{p+1}).$$

Subtracting this formula from (4.1a) we get

$$y_1 - y(x+h) = h f_z(y(x),z(x)) \sum_{i=1}^s b_i \big(Z_i - z(x+c_ih)\big) + O(h^{p+1}) + O(h^{q+2}).$$

Because of $P(x)f_z(y(x),z(x)) = 0$, this proves (4.12) for the $y$-component. The
estimate for the $z$-component follows from (see (1.28))

$$z_1 - z(x+h) = \sum_{i,j=1}^s b_i \omega_{ij} \big(Z_j - z(x+c_jh)\big) + O(h^{q+1})$$

and (4.14).

Under the assumption $a_{si} = b_i$ (for all $i$) we have $g(y_1) = 0$ so that by Taylor
expansion

$$0 = g(y_1) - g(y(x+h)) = g_y(y(x))\,\delta y_h(x) + O(h\|\delta y_h(x)\|). \tag{4.15}$$

This implies that $Q(x)\,\delta y_h(x) = O(h\|\delta y_h(x)\|)$, and (4.13) is a consequence of
(4.12) and (4.10). □


For some important Runge-Kutta methods (such as Radau IIA and Lobatto 
IIIC) the estimates of Lemma 4.4 are not optimal. Sharp estimates will be given 
in Theorem 4.9 for collocation methods and in Sect. VII.5 for general Runge-Kutta 
methods. 


Convergence for the y -Component 

The numerical solution $\{y_n\}$, defined by (4.1), does not depend on $\{z_n\}$. Consequently,
the convergence for the $y$-component can be treated independently of
estimates for the $z$-component.

Theorem 4.5 (HLR89, p. 36). Suppose that (3.2) holds in a neighbourhood of
the solution $(y(x),z(x))$ of (3.1) and that the initial values are consistent. Suppose
further that the Runge-Kutta matrix $(a_{ij})$ is invertible, that $|R(\infty)| < 1$ (see
(VI.1.11c)) and that the local error satisfies

$$\delta y_h(x) = O(h^r), \qquad P(x)\,\delta y_h(x) = O(h^{r+1}) \tag{4.16}$$

with $P(x)$ as in Lemma 4.4. Then the method (4.1) is convergent of order $r$, i.e.,

$$y_n - y(x_n) = O(h^r) \qquad \text{for } x_n - x_0 = nh \le \text{Const}.$$

If in addition $\delta y_h(x) = O(h^{r+1})$, then $g(y_n) = O(h^{r+1})$.

Proof. A complete proof of this result is given in (HLR89, pp. 36-39). We restrict
our presentation to stiffly accurate Runge-Kutta methods (i.e., $a_{si} = b_i$ for all
$i$). This considerably simplifies several parts of the proof, and nevertheless covers
many important Runge-Kutta methods (such as Radau IIA, Lobatto IIIC and
the SDIRK method (IV.6.16)). The assumption $a_{si} = b_i$ (for all $i$) implies that
$g(y_n) = 0$ for all $n$ and, as a consequence of (4.15) and (4.16), that

$$\delta y_h(x) = O(h^{r+1}). \tag{4.17}$$

The following proof is similar to that of Theorem 3.5 and uses, once again, Lady
Windermere's Fan of Fig. II.3.2.

In addition to the numerical solution $\{y_n, z_n\}$, also denoted by $\{y_n^0, z_n^0\}$, we
consider the Runge-Kutta solutions $\{y_n^\ell, z_n^\ell\}$ with initial values $y_\ell^\ell = y(x_\ell)$,
$z_\ell^\ell = z(x_\ell)$ on the exact solution. We first estimate $y_n^\ell - y_n^{\ell+1}$ for $n \ge \ell+1$ in terms of
the local error $\delta y_h(x_\ell) = y_{\ell+1}^\ell - y_{\ell+1}^{\ell+1}$. In order to simplify the notation we denote
two neighbouring Runge-Kutta solutions by $\{\hat y_n\}$, $\{y_n\}$ and their difference by
$\Delta y_n = \hat y_n - y_n$. We suppose for the moment that

$$\|y_n - y(x_n)\| \le C_0 h, \qquad \|\Delta y_n\| \le C_1 h^2 \tag{4.18}$$

(this will be justified below). Theorem 4.2 with $\theta_i = 0$ and $\delta_i = 0$ then yields

$$\|\hat Y_{ni} - Y_{ni}\| \le C\|\Delta y_n\|, \qquad \|\hat Z_{ni} - Z_{ni}\| \le C\|\Delta y_n\| \tag{4.19}$$

where $C$ is some constant independent of $C_0$ and $C_1$. A Lipschitz condition for
$f(y,z)$ implies that

$$\|\Delta y_{n+1}\| \le \|\Delta y_n\| + h\sum_{i=1}^s |b_i| \big(L_1\|\hat Y_{ni} - Y_{ni}\| + L_2\|\hat Z_{ni} - Z_{ni}\|\big).$$

Inserting (4.19) we get $\|\Delta y_{n+1}\| \le (1 + hL)\|\Delta y_n\|$ and hence also

$$\|\Delta y_n\| \le C_2\|\Delta y_0\| \qquad \text{for } nh \le \text{Const}. \tag{4.20}$$

For our situation in Lady Windermere's Fan the use of (4.17) yields

$$\|y_n^\ell - y_n^{\ell+1}\| \le C_2\|\delta y_h(x_\ell)\| \le C_3 h^{r+1} \qquad \text{for } n \ge \ell+1 \text{ and } nh \le \text{Const}.$$

Summing up we obtain the desired estimate

$$\|y_n - y(x_n)\| \le \sum_{\ell=0}^{n-1} \|y_n^\ell - y_n^{\ell+1}\| \le C_4 h^r \qquad \text{for } nh \le \text{Const}.$$

Since $C_3$ and $C_4$ do not depend on $C_0$ or $C_1$ (if $h$ is sufficiently small), the
assumption (4.18) is justified by induction on $n$ provided the constants $C_0$, $C_1$ are
chosen sufficiently large. □


Convergence for the z -Component 

Theorem 4.6 (HLR89, p. 40). Consider the index 2 problem (3.1)-(3.2) with consistent
initial values and assume that the Runge-Kutta matrix $(a_{ij})$ is invertible and
$|R(\infty)| < 1$. If the global error of the $y$-component is $O(h^r)$, $g(y_n) = O(h^{r+1})$
and the local error of the $z$-component is $O(h^r)$, then we have for the global error

$$z_n - z(x_n) = O(h^r) \qquad \text{for } x_n - x_0 = nh \le \text{Const}.$$

Remark. If, in addition to the invertibility of $(a_{ij})$ and $|R(\infty)| < 1$, the conditions
$B(q)$ and $C(q)$ are satisfied then we have $z_n - z(x_n) = O(h^q)$ (see Lemma 4.4).

Proof. We write the global error as

$$z_{n+1} - z(x_{n+1}) = z_{n+1} - \hat z_{n+1} + \delta z_h(x_n) \tag{4.21}$$

where $(\hat y_{n+1}, \hat z_{n+1})$ denotes the numerical solution obtained from the starting values
$(y(x_n),z(x_n))$ and $\delta z_h(x_n)$ is the local error. From (VI.1.11d) we have

$$z_{n+1} - \hat z_{n+1} = R(\infty)\big(z_n - z(x_n)\big) + \sum_{i,j=1}^s b_i\omega_{ij}\big(Z_{nj} - \hat Z_{nj}\big). \tag{4.22}$$

The assumption $g(y_n) = O(h^{r+1})$ implies that $g_y(y_n)(y_n - y(x_n)) = O(h^{r+1})$
and, together with $y_n - y(x_n) = O(h^r)$, it follows from Theorem 4.2 that
$Z_{nj} - \hat Z_{nj} = O(h^r)$. Inserting (4.22) into (4.21) we obtain

$$z_{n+1} - z(x_{n+1}) = R(\infty)\big(z_n - z(x_n)\big) + O(h^r),$$

which proves the statement. □
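
The last step of this proof rests on the fact that a recursion $e_{n+1} = R(\infty)e_n + d_n$ with $|R(\infty)| < 1$ does not accumulate perturbations of size $O(h^r)$. A minimal sketch of this argument follows; the values of `R_inf`, `h` and `r` are hypothetical and only serve as an illustration of the geometric-series bound.

```python
def propagate(R_inf, defects, e0=0.0):
    """Iterate e_{n+1} = R_inf * e_n + d_n and return the list of all e_n."""
    errors = [e0]
    for d in defects:
        errors.append(R_inf * errors[-1] + d)
    return errors

h, r = 0.01, 2
defects = [h**r] * 1000                 # every perturbation has size h^r

R_inf = -0.5                            # hypothetical value with |R(inf)| < 1
errors = propagate(R_inf, defects)
bound = h**r / (1.0 - abs(R_inf))       # geometric-series bound
print(max(abs(e) for e in errors) <= bound)

# With |R(inf)| = 1 the same perturbations add up over the 1000 steps.
print(propagate(1.0, defects)[-1])
```

This also indicates why the methods with $|R(\infty)| = 1$ need a separate treatment.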


Collocation Methods 

An important subclass of implicit Runge-Kutta methods are the collocation methods
as introduced in Sect. II.7. For the index 2 problem (3.1) they can be defined
as follows.


Definition 4.7. Let $c_1,\ldots,c_s$ be $s$ distinct real numbers and denote by $u(x)$, $v(x)$
the polynomials of degree $s$ (collocation polynomials) which satisfy

$$u(x_0) = y_0, \qquad v(x_0) = z_0, \tag{4.23a}$$

$$\left.\begin{aligned} u'(x_0+c_ih) &= f\big(u(x_0+c_ih), v(x_0+c_ih)\big) \\ 0 &= g\big(u(x_0+c_ih)\big) \end{aligned}\right\} \qquad i = 1,\ldots,s. \tag{4.23b}$$

Then, the numerical solution is given by

$$y_1 = u(x_0+h), \qquad z_1 = v(x_0+h). \tag{4.23c}$$


A straightforward extension of Theorems II.7.7 and II.7.8 to index 2 problems
shows that (4.23) is equivalent to the $s$-stage Runge-Kutta method (4.1) whose coefficients
are defined by $B(s)$ and $C(s)$ (see Sect. IV.5 for their definition). This
equivalence allows us to deduce from Theorem 4.1 the existence and local uniqueness
of the collocation polynomials provided that the corresponding Runge-Kutta
matrix is invertible. Hence we assume in the sequel that $c_i \ne 0$ for all $i$. The case
of a singular Runge-Kutta matrix is considered in Exercises 2 and 3.

The quality of $u(x)$, $v(x)$ as approximations to $y(x)$, $z(x)$ is described by the
next theorem, which extends Theorem II.7.10.

Theorem 4.8. Consider a collocation method (4.23) with all $c_i \ne 0$. Then we have
for $k = 0,1,\ldots,s$ and $x \in [x_0, x_0+h]$

$$\|u^{(k)}(x) - y^{(k)}(x)\| \le Ch^{s+1-k},$$
$$\|v^{(k)}(x) - z^{(k)}(x)\| \le Ch^{s-k}.$$

Proof. We exploit the fact that $u(x_0 + c_ih) = Y_i$, $v(x_0 + c_ih) = Z_i$ are the internal
stages of the Runge-Kutta method (4.1). Consequently the collocation polynomials
can be written as

$$u(x_0 + th) = y_0\ell_0(t) + \sum_{i=1}^s Y_i\ell_i(t) \tag{4.24a}$$

$$v(x_0 + th) = z_0\ell_0(t) + \sum_{i=1}^s Z_i\ell_i(t) \tag{4.24b}$$

where the $\ell_i(t)$ are the Lagrange polynomials defined by

$$\ell_i(t) = \prod_{j \ne i} \frac{t - c_j}{c_i - c_j} \qquad (\text{with } c_0 = 0).$$

Familiar estimates of the interpolation error imply that the exact solution $y(x)$
satisfies

$$y(x_0 + th) = y_0\ell_0(t) + \sum_{i=1}^s y(x_0 + c_ih)\ell_i(t) + O(h^{s+1}). \tag{4.25}$$

The factor $h^{s+1}$ in the interpolation error comes from the $(s+1)$-th derivative of
$y(x_0 + th)$ with respect to $t$. Obviously, the interpolation error is differentiable
as often as the function $y(x)$. If we differentiate (4.25) $k$ times, then by Rolle's
theorem, the difference

$$h^k y^{(k)}(x_0 + th) - \Big(y_0\ell_0^{(k)}(t) + \sum_{i=1}^s y(x_0 + c_ih)\ell_i^{(k)}(t)\Big) \tag{4.25'}$$

vanishes at least at $s + 1 - k$ points. Hence, the polynomial enclosed in brackets
in (4.25') can be interpreted as an interpolation polynomial of degree $s - k$ for the
function $h^k y^{(k)}(x_0 + th)$. Its error is thus again of size $O(h^{s+1})$. Subtracting
(4.25) from (4.24a) and differentiating $k$ times thus yields

$$h^k\big(u^{(k)}(x_0 + th) - y^{(k)}(x_0 + th)\big) = \sum_{i=1}^s \big(Y_i - y(x_0 + c_ih)\big)\ell_i^{(k)}(t) + O(h^{s+1})$$

and a similar formula for the $z$-component. The conclusion now follows from
(4.14) with $q = s$. □
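
The Lagrange representation of the collocation polynomial used in this proof is easy to evaluate in practice. The sketch below assumes the usual Lagrange basis on the nodes $c_0 = 0, c_1, \ldots, c_s$; the function names are illustrative and the "stage values" are simply samples of a hypothetical quadratic, so that the interpolant must reproduce it exactly.

```python
from fractions import Fraction as Fr

def lagrange_basis(nodes, i, t):
    """l_i(t) = prod_{j != i} (t - c_j) / (c_i - c_j) over all nodes."""
    value = Fr(1)
    for j, cj in enumerate(nodes):
        if j != i:
            value *= (t - cj) / (nodes[i] - cj)
    return value

def collocation_poly(y0, stages, c, t):
    """Evaluate u(x0 + t*h) from y0 and the stage values Y_1..Y_s."""
    nodes = [Fr(0)] + list(c)        # the extra node c_0 = 0 carries y0
    values = [y0] + list(stages)
    return sum(v * lagrange_basis(nodes, i, t) for i, v in enumerate(values))

def p(t):
    """Hypothetical polynomial of degree s = 2 standing in for u."""
    return 1 + 2 * t - 3 * t * t

c = [Fr(1, 3), Fr(1)]                # nodes of the 2-stage Radau IIA method
stages = [p(ci) for ci in c]
print(collocation_poly(p(Fr(0)), stages, c, Fr(1, 2)))   # equals p(1/2)
```

For $c_s = 1$ the value at $t = 1$ is just the last stage, which is the stiffly accurate situation.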
Superconvergence of Collocation Methods

It is now natural to ask whether superconvergence takes place at $x_0 + h$ (as for
ordinary differential equations; see Theorem II.7.9). The answer is affirmative, if
the method is stiffly accurate, i.e., if $c_s = 1$.

Theorem 4.9. If $c_i \ne 0$ for all $i$ and $c_s = 1$, then the $y$-component of the local
error of the collocation method (4.23) satisfies

$$y_1 - y(x_0 + h) = O(h^{p+1}),$$

where $p$ is the order of the underlying quadrature formula.

Proof. We insert the collocation polynomials into the differential-algebraic problem
and define the defect by

$$u'(x) = f(u(x),v(x)) + \delta(x) \tag{4.26a}$$

$$0 = g(u(x)) + \theta(x). \tag{4.26b}$$

By Definition 4.7 we have

$$\delta(x_0 + c_ih) = 0, \qquad \theta(x_0) = 0, \qquad \theta(x_0 + c_ih) = 0. \tag{4.27}$$

We next differentiate (4.26b) with respect to $x$ and use (4.26a):

$$0 = g_y(u(x))\big(f(u(x),v(x)) + \delta(x)\big) + \theta'(x). \tag{4.28}$$

This motivates the use of the equation

$$0 = g_y(u)\big(f(u,v) + \delta(x)\big) + \theta'(x) \tag{4.29}$$

for arbitrary $(u,v)$ in a neighbourhood of the solution of (3.1). Because of (3.2)
we can extract $v$ from (4.29) so that (4.29) can be written as

$$v = G(u, \delta(x), \theta'(x)). \tag{4.30}$$

Inserting into (4.26a) and into (3.1) this yields

$$u'(x) = f\big(u(x), G(u(x), \delta(x), \theta'(x))\big) + \delta(x) \tag{4.31a}$$

$$y'(x) = f\big(y(x), G(y(x), 0, 0)\big). \tag{4.31b}$$


In order to compute $u(x) - y(x)$ we now apply the nonlinear variation-of-constants
formula (Theorem I.14.5). This requires the computation of the defect of $u(x)$
inserted into (4.31b)

$$\begin{aligned} u'(x) - f\big(u(x), G(u(x),0,0)\big) &= f\big(u(x), G(u(x),\delta(x),\theta'(x))\big) + \delta(x) - f\big(u(x), G(u(x),0,0)\big) \\ &= \Phi(x,1) - \Phi(x,0) + \delta(x) \end{aligned} \tag{4.32}$$

where

$$\Phi(x,\tau) = f\big(u(x), G(u(x), \tau\cdot\delta(x), \tau\cdot\theta'(x))\big).$$

Then the formula $\Phi(x,1) - \Phi(x,0) = \int_0^1 \partial\Phi/\partial\tau\,(x,\tau)\,d\tau$ shows that the defect
(4.32) can be written as

$$Q_1(x)\delta(x) + Q_2(x)\theta'(x). \tag{4.32'}$$

We now insert this into Eq. (I.14.18) and obtain

$$u(x) - y(x) = \int_{x_0}^x \text{resolvent}(x,t)\cdot\text{defect}(t)\,dt = \int_{x_0}^x \big(S_1(x,t)\delta(t) + S_2(x,t)\theta'(t)\big)\,dt.$$

Integrating the second term by parts we get (since $\theta(x_0) = 0$)

$$y_1 - y(x_0 + h) = \int_{x_0}^{x_0+h} \Big(S_1(x_0+h,t)\delta(t) - \frac{\partial S_2}{\partial t}(x_0+h,t)\theta(t)\Big)\,dt + S_2(x_0+h, x_0+h)\,\theta(x_0+h). \tag{4.33}$$

The assumption $c_s = 1$ implies that $\theta(x_0 + h) = 0$ so that the last expression in
(4.33) vanishes. The main idea is now to integrate the expression in (4.33) with
the quadrature formula $\{b_i, c_i\}$ (see also the proof of Theorem II.7.9). With the
abbreviation

$$\sigma(t) = S_1(x_0+h,t)\delta(t) - \frac{\partial S_2}{\partial t}(x_0+h,t)\theta(t) \tag{4.34}$$

this gives

$$y_1 - y(x_0+h) = \int_{x_0}^{x_0+h} \sigma(t)\,dt = h\sum_{i=1}^s b_i\sigma(x_0 + c_ih) + \text{err}(\sigma). \tag{4.35}$$

Because of (4.27) we have $\sigma(x_0 + c_ih) = 0$ for all $i$ and the quadrature error is
estimated by

$$\|\text{err}(\sigma)\| \le Ch^{p+1} \max_{t\in[x_0,x_0+h]} \|\sigma^{(p)}(t)\|. \tag{4.36}$$

The $p$-th derivative of $\sigma(t)$ contains derivatives of $f$, $g$ and of $\delta(x)$, $\theta(x)$. By
Theorem 4.8 they are uniformly bounded for $h \le h_0$. Hence $y_1 - y(x_0 + h) =
\text{err}(\sigma) = O(h^{p+1})$, proving the theorem. □

Projected Runge-Kutta Methods


For collocation methods which are not stiffly accurate it is possible to prove superconvergence
(as in Theorem 4.9) if the method is combined with a certain projection.
We start with a more careful study of the local error of the $y$-component in
(4.33).

Lemma 4.10. If $c_i \ne 0$ for all $i$, then the $y$-component of the local error of the
collocation method (4.23) satisfies

$$y_1 - y(x_0+h) = -\big(f_z(g_yf_z)^{-1}\big)\big(y(x_0+h), z(x_0+h)\big)\,\theta(x_0+h) + O(h^{p+1}) \tag{4.37}$$

where $\theta$ is the defect given by (4.26b) and $p$ is the order of the underlying quadrature
formula.

Proof. The above proof of Theorem 4.9 (see Eq. (4.33)) shows that the local error
satisfies

$$y_1 - y(x_0+h) = S_2(x_0+h, x_0+h)\,\theta(x_0+h) + O(h^{p+1}).$$

Hence, we only have to compute $S_2(x,x)$. Since any resolvent equals the identity
matrix if both of its arguments are equal, it follows from the definition of $S_2(x,t)$
and from (4.32') that

$$S_2(x,x) = \int_0^1 f_z\big(u(x), G(u(x), \tau\delta(x), \tau\theta'(x))\big)\,\frac{\partial G}{\partial\theta'}\big(u(x), \tau\delta(x), \tau\theta'(x)\big)\,d\tau.$$

Differentiating (4.29) with respect to $\theta'$ gives

$$0 = g_y(u)f_z(u,G)\frac{\partial G}{\partial\theta'} + I, \qquad \text{i.e.,} \qquad \frac{\partial G}{\partial\theta'} = -\big(g_y(u)f_z(u,G)\big)^{-1}.$$

Furthermore, it follows from (4.27) that $\delta(x) = O(h^s)$ and $\theta'(x) = O(h^s)$ for
$x = x_0 + h$. Using $u(x) - y(x) = O(h^{s+1})$ (from Theorem 4.8) we thus obtain for
$x = x_0 + h$

$$S_2(x,x) = -\big(f_z(g_yf_z)^{-1}\big)(y(x),z(x)) + O(h^s).$$

The statement now follows from $p \le 2s$ and from $\theta(x_0 + h) = O(h^{s+1})$. □


The geometric interpretation of Lemma 4.10 is as follows: if we split the
local error $\delta y_h(x_0)$ according to the projections of Fig. 4.1 then the component
$Q(x_0+h)\,\delta y_h(x_0)$ is of size $O(h^{s+1})$, whereas the component $P(x_0+h)\,\delta y_h(x_0)$
is $O(h^{p+1})$. This suggests projecting the numerical solution of a
Runge-Kutta method onto the manifold $g(y) = 0$ after every step, with the help of the projection
operator $P(x_0+h)$, as follows:





Definition 4.11 (Ascher & Petzold 1991). Let $y_1, z_1$ be the numerical solution of
an implicit Runge-Kutta method (4.1) and define $\hat y_1, \lambda$ as the solution of the system

$$\begin{aligned} \hat y_1 &= y_1 + f_z(y_1,z_1)\lambda \\ 0 &= g(\hat y_1). \end{aligned} \tag{4.38}$$

If the value $\hat y_1$ (and $z_1$) is used for the step by step integration of (3.1), then we
call this procedure projected Runge-Kutta method.

Remarks. 1) If $g(y_1)$ is sufficiently small, then the nonlinear system (4.38) possesses
a locally unique solution. A Newton-type iteration with starting values
$\hat y_1^{(0)} = y_1$, $\lambda^{(0)} = 0$ will converge to this solution. This follows at once from the
theorem of Newton-Kantorovich (Ortega & Rheinboldt 1970) because the Jacobian
of (4.38) evaluated at the starting values

$$\begin{pmatrix} I & -f_z(y_1,z_1) \\ g_y(y_1) & 0 \end{pmatrix}$$

has a bounded inverse by (3.2).

2) For stiffly accurate Runge-Kutta methods (i.e., if $a_{si} = b_i$ for all $i$) the
projected and unprojected Runge-Kutta methods coincide.

3) The proof of the next theorem shows that the argument in $f_z(y_1,z_1)$ may
be replaced by some other approximation to $y(x_0+h)$, $z(x_0+h)$ whose error is
at most $O(h^s)$.
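
The projection step can be sketched in a few lines for a scalar algebraic variable. The code below is an illustration under stated assumptions, not the procedure of the cited authors: it eliminates $\hat y_1$ and applies Newton's method to the scalar equation $g(y_1 + d\lambda) = 0$, where `direction` plays the role of the column $f_z(y_1,z_1)$. The constraint (unit circle) and the choice $f_z(y,z) = y$ are hypothetical.

```python
def project(y1, direction, g, g_y, tol=1e-14, max_iter=20):
    """Solve y_hat = y1 + direction*lam, g(y_hat) = 0 for a scalar lam by Newton."""
    lam = 0.0                                   # starting value lambda^(0) = 0
    for _ in range(max_iter):
        y_hat = [yi + di * lam for yi, di in zip(y1, direction)]
        residual = g(y_hat)
        if abs(residual) < tol:
            break
        # derivative of g(y1 + direction*lam) with respect to lam
        slope = sum(gi * di for gi, di in zip(g_y(y_hat), direction))
        lam -= residual / slope
    return [yi + di * lam for yi, di in zip(y1, direction)], lam

def g(y):                                       # hypothetical constraint: unit circle
    return y[0] ** 2 + y[1] ** 2 - 1.0

def g_y(y):
    return [2.0 * y[0], 2.0 * y[1]]

y1 = [1.01, 0.02]                               # slightly off the manifold
y_hat, lam = project(y1, y1, g, g_y)            # toy choice f_z(y, z) = y
print(y_hat, lam)
```

For this toy choice the projection is just a radial rescaling of `y1` onto the circle, which gives an independent check of the result.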

The following theorem proves superconvergence for projected collocation methods
(also if the corresponding Runge-Kutta method is not stiffly accurate). Superconvergence
results for general Runge-Kutta methods are given in Sect. VI.8.

Theorem 4.12 (Ascher & Petzold 1991). If $c_i \ne 0$ for all $i$, then the $y$-component
of the local error of the projected collocation method (4.23), (4.38) satisfies

$$\hat y_1 - y(x_0+h) = O(h^{p+1})$$

where $p$ is the order of the underlying quadrature formula.

Proof. We write $e_1 = y_1 - y(x_0+h)$, $\hat e_1 = \hat y_1 - y(x_0+h)$ for the local errors and
denote the projections of Definition 4.3 by

$$Q = \big(f_z(g_yf_z)^{-1}g_y\big)(y_1,z_1), \qquad P = I - Q.$$

The idea is to split $\hat e_1$ according to

$$\hat e_1 = P\hat e_1 + Q\hat e_1 \tag{4.39}$$

and to estimate both components separately. The first formula of (4.38) together
with (4.37) and $\theta(x_0+h) = O(h^{s+1})$ imply that

$$P\hat e_1 = Pe_1 = O(h^{p+1}) + O(h^{s+1}\|e_1\|). \tag{4.40}$$

Further we have $0 = g(\hat y_1) - g(y(x_0+h)) = g_y(y_1)\hat e_1 + O(\|e_1\|^2)$, implying

$$Q\hat e_1 = O(\|e_1\|^2). \tag{4.41}$$

Formulas (4.40) and (4.41) inserted into (4.39) give

$$\hat e_1 = O(h^{p+1}) + O(h^{s+1}\|e_1\|) + O(\|e_1\|^2)$$

and the statement of the theorem is an immediate consequence. □


Global convergence of order $O(h^p)$ of the projected collocation methods is
obtained exactly as in the proof of Theorem 4.5. We observe that the numerical
solution always remains on the manifold $g(y) = 0$ so that the estimate (4.9) applies.

Summary of Convergence Results 

Table 4.1 collects the optimal error estimates for some important Runge-Kutta
methods when applied to the index 2 problem (3.1)-(3.2). The local error estimates
can be verified as follows: Gauss, Radau IA and SDIRK by Lemma 4.4,
Radau IIA by Theorem 4.9, Lobatto IIIC by Theorem 5.10 below and Lobatto IIIA
with the help of Exercise 4. For the projected methods the estimates follow from
Theorem 4.12 and the considerations of Sect. VII.5. Because there are several ways
of defining the $z$-component of the numerical solution, we do not present their convergence
behaviour. The global convergence result follows from Theorems 4.5 and
4.6 for the Radau IA, Radau IIA, Lobatto IIIC and SDIRK methods. The remaining
methods (Gauss and Lobatto IIIA) require some more effort because their stability
function only satisfies $|R(\infty)| = 1$. For a detailed discussion of these methods we
refer to HLR89 and Jay (1993).


Table 4.1. Error estimates for the index 2 problem (3.1)-(3.2)

Method               stages    local error                 global error
                               y            z              y            z
Gauss                s odd     $h^{s+1}$    $h^s$          $h^{s+1}$    $h^{s-1}$
                     s even    $h^{s+1}$    $h^s$          $h^s$        $h^{s-2}$
projected Gauss      s         $h^{2s+1}$   -              $h^{2s}$     -
Radau IA             s         $h^s$        $h^{s-1}$      $h^s$        $h^{s-1}$
projected Radau IA   s         $h^{2s-1}$   -              $h^{2s-2}$   -
Radau IIA            s         $h^{2s}$     $h^s$          $h^{2s-1}$   $h^s$
Lobatto IIIA         s odd     $h^{2s-1}$   $h^s$          $h^{2s-2}$   $h^{s-1}$
                     s even    $h^{2s-1}$   $h^s$          $h^{2s-2}$   $h^s$
Lobatto IIIC         s         $h^{2s-1}$   $h^{s-1}$      $h^{2s-2}$   $h^{s-1}$
SDIRK (IV.6.16)      5         $h^3$        $h^1$          $h^2$        $h^1$
SDIRK (IV.6.18)      3         $h^2$        $h^1$          $h^2$        $h^1$
Exercises


1. Consider the index 2 problem $y' = f(y,z)$, $0 = g(y)$. Put $z = u'$, $v = (y,u)^T$
so that the problem becomes

$$F(v',v) = \begin{pmatrix} y' - f(y,u') \\ g(y) \end{pmatrix} = 0.$$

Prove that the matrix pencil $F_v + \lambda F_{v'}$ is of index 1 whenever $(g_yf_z)^{-1}$ exists.
Hint. Consider the transformation



(4.42)


where $a = f_yf_z(g_yf_z)^{-1}$ and $b = f_z$ are chosen such that the upper right block
in (4.42) vanishes.


2. Consider Runge-Kutta methods whose coefficients satisfy:

$a_{1i} = 0$ for all $i$ and $(a_{ij})_{i,j \ge 2}$ is invertible.

(Examples are collocation methods with $c_1 = 0$, such as Lobatto IIIA.)

If $g(\eta) = 0$ then the nonlinear system (4.2) has a locally unique solution which
satisfies $Y_1 = \eta$, $Z_1 = \zeta$.

3. Let $c_1 = 0, c_2, \ldots, c_s$ be $s$ distinct real numbers. Show that there exist unique
polynomials $u(x)$ and $v(x)$ ($\deg u = s$, $\deg v = s - 1$) such that (4.23a,b)
holds.

Hint. Apply the ideas of the proof of Theorem II.7.7 and Exercise 2.

4. Investigate the validity of the conclusions of Theorems 4.8 and 4.9 for the
situation where $c_1 = 0$.

5. (Computation of the algebraic variable $z$ by piecewise discontinuous interpolation,
see Ascher (1989)). Modify the definition of $z_{n+1}$ in the Runge-Kutta
method (4.1) as follows: let $v(x)$ be the polynomial of degree $s - 1$ satisfying
$v(x_n + c_ih) = Z_{ni}$ for all $i$, then define $z_{n+1} = v(x_n + h)$. In the case
of collocation methods (4.23) this definition removes the condition $v(x_0) = z_0$
while lowering the degree of $v(x)$ by 1.

a) Verify: $z_{n+1}$ does not depend on $z_n$, also if the stability function of the
method does not vanish at infinity.

b) Prove that for projected collocation methods with $c_i \ne 0$ for all $i$ we have
$z_n - z(x_n) = O(h^s)$.

c) For the projected Gauss methods compare this result with that of the standard
approach.

6. The statement of Theorem 4.8 still holds, if one omits the condition $v(x_0) = z_0$
in Definition 4.7 and if one lets $v(x)$ be a polynomial of degree $s - 1$.
For an application of the convergence result of the preceding section (Theorem 4.5)
it is desirable to know the optimal values of $r$ in (4.16). Comparing the Taylor
expansions of the exact and numerical solutions we derive conditions for $c_i, a_{ij}, b_j$
which are equivalent to (4.16). For collocation methods we recover the result of
Theorem 4.9. For other methods (such as Lobatto IIIC) the estimates of Lemma 4.4
are substantially improved.

The theory of this section is given in HLR89 (Sect. 5). Our presentation is
slightly different and is in complete analogy to the derivation of the index 1 order
conditions of Sect. VI.4. The results of this section are here applied to Runge-Kutta
methods only; analogous formulas for Rosenbrock methods can be found in
Roche (1988). An independent investigation, conducted for the index 2 problem
$f(y, z') = 0$, $z = g(y)$ by A. Kværnø (1990), leads to the same order conditions
for Runge-Kutta methods.

Derivatives of the Exact Solution 

We consider the index 2 problem

$$y' = f(y,z) \tag{5.1a}$$

$$0 = g(y) \tag{5.1b}$$

and assume consistent initial values $y_0, z_0$. The first derivative of the solution $y(x)$
is given by (5.1a). Differentiating this equation we get

$$y'' = f_y(y,z)y' + f_z(y,z)z'. \tag{5.2}$$

In order to compute $z'$ we differentiate (5.1b) twice

$$0 = g_y(y)y' \tag{5.3a}$$

$$0 = g_{yy}(y)(y',y') + g_y(y)y'' \tag{5.3b}$$

and insert (5.2) and (5.1a). This yields (omitting the obvious function arguments)

$$0 = g_{yy}(f,f) + g_yf_yf + g_yf_zz' \tag{5.4}$$

or equivalently

$$z' = (-g_yf_z)^{-1}g_{yy}(f,f) + (-g_yf_z)^{-1}g_yf_yf. \tag{5.5}$$
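
Formula (5.5) can be sanity-checked numerically on a small problem. The toy DAE below is a hypothetical example, not taken from the text: $f(y,z) = (y_2 + y_1z,\; -y_1 + y_2z + 1)$ and $g(y) = y_1^2 + y_2^2 - 1$. On the unit circle the hidden constraint $g_yf = 0$ gives $z = -y_2$, so that $z' = -y_2' = -f_2$ is available independently.

```python
import math

def f(y, z):
    """Right-hand side of the hypothetical toy problem."""
    return [y[1] + y[0] * z, -y[0] + y[1] * z + 1.0]

def z_prime(y, z):
    """Evaluate the right-hand side of (5.5) for the toy problem."""
    fv = f(y, z)
    gy = [2.0 * y[0], 2.0 * y[1]]                 # g_y
    fz = [y[0], y[1]]                             # f_z
    fy = [[z, 1.0], [-1.0, z]]                    # f_y
    gy_fz = gy[0] * fz[0] + gy[1] * fz[1]         # scalar g_y f_z
    gyy_ff = 2.0 * (fv[0] ** 2 + fv[1] ** 2)      # g_yy(f, f)
    fy_f = [fy[0][0] * fv[0] + fy[0][1] * fv[1],
            fy[1][0] * fv[0] + fy[1][1] * fv[1]]
    gy_fy_f = gy[0] * fy_f[0] + gy[1] * fy_f[1]   # g_y f_y f
    return (gyy_ff + gy_fy_f) / (-gy_fz)

a = 0.7
y = [math.cos(a), math.sin(a)]      # consistent: g(y) = 0
z = -y[1]                           # consistent: g_y(y) f(y, z) = 0
print(z_prime(y, z), -f(y, z)[1])   # both values should agree
```

The two printed numbers agree up to rounding, as expected from differentiating $z = -y_2$ along the solution.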
Here we have used the index 2 assumption (3.2), that $g_yf_z$ is invertible in a neighbourhood
of the solution. We now differentiate (5.1a) and (5.5) with respect to $x$,
and replace the appearing $y'$ and $z'$ by (5.1a) and (5.5). We use (for a constant
vector $u$)

$$\begin{aligned} \frac{d}{dx}\big((-g_yf_z)^{-1}u\big) = (-g_yf_z)^{-1}\Big( & g_{yy}\big(f_z(-g_yf_z)^{-1}u, f\big) + g_yf_{zy}\big((-g_yf_z)^{-1}u, f\big) \\ & + g_yf_{zz}\big((-g_yf_z)^{-1}u,\ (-g_yf_z)^{-1}g_{yy}(f,f) + (-g_yf_z)^{-1}g_yf_yf\big)\Big) \end{aligned} \tag{5.6}$$

(cf. Formula (VI.4.7)) and thus obtain

$$y'' = f_yf + f_z(-g_yf_z)^{-1}g_{yy}(f,f) + f_z(-g_yf_z)^{-1}g_yf_yf \tag{5.7}$$

$$\begin{aligned} z'' = {} & (-g_yf_z)^{-1}g_{yyy}(f,f,f) + 3(-g_yf_z)^{-1}g_{yy}(f, f_yf) \\ & + 3(-g_yf_z)^{-1}g_{yy}\big(f, f_z(-g_yf_z)^{-1}g_{yy}(f,f)\big) \\ & + 3(-g_yf_z)^{-1}g_{yy}\big(f, f_z(-g_yf_z)^{-1}g_yf_yf\big) + (-g_yf_z)^{-1}g_yf_{yy}(f,f) \\ & + 2(-g_yf_z)^{-1}g_yf_{yz}\big(f, (-g_yf_z)^{-1}g_{yy}(f,f)\big) \\ & + 2(-g_yf_z)^{-1}g_yf_{yz}\big(f, (-g_yf_z)^{-1}g_yf_yf\big) + (-g_yf_z)^{-1}g_yf_yf_yf \\ & + (-g_yf_z)^{-1}g_yf_yf_z(-g_yf_z)^{-1}g_{yy}(f,f) \\ & + (-g_yf_z)^{-1}g_yf_yf_z(-g_yf_z)^{-1}g_yf_yf \\ & + (-g_yf_z)^{-1}g_yf_{zz}\big((-g_yf_z)^{-1}g_{yy}(f,f), (-g_yf_z)^{-1}g_{yy}(f,f)\big) \\ & + 2(-g_yf_z)^{-1}g_yf_{zz}\big((-g_yf_z)^{-1}g_{yy}(f,f), (-g_yf_z)^{-1}g_yf_yf\big) \\ & + (-g_yf_z)^{-1}g_yf_{zz}\big((-g_yf_z)^{-1}g_yf_yf, (-g_yf_z)^{-1}g_yf_yf\big). \end{aligned} \tag{5.8}$$

Obviously, a graphical representation of these expressions will be of great help.


Trees and Elementary Differentials 

As in Sect. VI.4 we identify each occurring $f$ with a meagre vertex, each of its
derivatives with an upwards leaving branch, the expression $(-g_yf_z)^{-1}g$ with a fat
vertex and the derivatives of $g$ therein again with upwards leaving branches. The
corresponding graphs for $y'$, $z'$, $y''$, $z''$ (see Formulas (5.1a), (5.5), (5.7), (5.8)) are
given in Fig. 5.1.

The derivatives of $y$ are characterized by trees with a meagre root (the lowest
vertex). These trees will be denoted by $t$ or $t_i$, the tree consisting of the root only
(for $y'$) being $\tau$. Derivatives of $z$ have trees with a fat root. They will be denoted
by $u$ or $u_i$.


Fig. 5.1. Graphical representation of the first derivatives

Definition 5.1. Let $DAT2 = DAT2_y \cup DAT2_z$ denote the set of (differential algebraic
index 2) trees defined recursively by

a) $\tau \in DAT2_y$,

b) $[t_1,\ldots,t_m,u_1,\ldots,u_n]_y \in DAT2_y$
if $t_1,\ldots,t_m \in DAT2_y$ and $u_1,\ldots,u_n \in DAT2_z$;

c) $[t_1,\ldots,t_m]_z \in DAT2_z$ if $t_1,\ldots,t_m \in DAT2_y$ and either $m > 1$ or
$m = 1$ and $t_1 \ne [u]_y$ with $u \in DAT2_z$.

Definition 5.2. The order of a tree $t \in DAT2_y$ or $u \in DAT2_z$, denoted by $\varrho(t)$ or
$\varrho(u)$, is the number of meagre vertices minus the number of fat vertices.
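
Definitions 5.1 and 5.2 translate directly into a small recursive data structure. The encoding below (a tree is a pair of a root kind, `"y"` for meagre or `"z"` for fat, and a tuple of subtrees) is an assumption made for this sketch and not notation from the text.

```python
TAU = ("y", ())                      # the tree consisting of a meagre root only

def y_tree(*children):
    """[t_1, ..., t_m, u_1, ..., u_n]_y : a meagre root with arbitrary subtrees."""
    return ("y", tuple(children))

def z_tree(*children):
    """[t_1, ..., t_m]_z : a fat root; all subtrees must have meagre roots."""
    assert children and all(kind == "y" for kind, _ in children)
    # Definition 5.1c excludes m = 1 with t_1 = [u]_y, u a fat-rooted tree
    only_child_is_u_y = (len(children) == 1 and len(children[0][1]) == 1
                         and children[0][1][0][0] == "z")
    assert not only_child_is_u_y, "[[u]_y]_z is not a tree of DAT2_z"
    return ("z", tuple(children))

def order(tree):
    """Number of meagre vertices minus number of fat vertices."""
    kind, children = tree
    return (1 if kind == "y" else -1) + sum(order(ch) for ch in children)

print(order(TAU), order(z_tree(TAU, TAU)), order(z_tree(y_tree(TAU))))
```

The three trees printed here are $\tau$, $[\tau,\tau]_z$ and $[[\tau]_y]_z$, all of order 1.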

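Definition 5.3. The elementary differentials $F(t)$ (or $F(u)$) corresponding to trees
in $DAT2$ are defined as follows:

a) $F(\tau) = f$,

b) $F(t) = \dfrac{\partial^{m+n}f}{\partial y^m\partial z^n}\big(F(t_1),\ldots,F(t_m),F(u_1),\ldots,F(u_n)\big)$
if $t = [t_1,\ldots,t_m,u_1,\ldots,u_n]_y \in DAT2_y$,

c) $F(u) = (-g_yf_z)^{-1}\dfrac{\partial^m g}{\partial y^m}\big(F(t_1),\ldots,F(t_m)\big)$
if $u = [t_1,\ldots,t_m]_z \in DAT2_z$.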


Taylor Expansion of the Exact Solution 

In order to continue the process which led to (5.7) and (5.8) we need the differentiation
of elementary differentials $F(t)$ and $F(u)$. This is described by the following
rules:

i) attach to each vertex a branch with $\tau$ (derivative of $f$ or $g$ with respect to $y$
and addition of the factor $y' = f$);

ii) attach to each meagre vertex a branch with $[\tau,\tau]_z$; attach to each meagre vertex
a branch with $[[\tau]_y]_z$ (this yields two trees and corresponds to the derivative
of $f$ with respect to $z$ and to the addition of the factors $(-g_yf_z)^{-1}g_{yy}(f,f)$
and $(-g_yf_z)^{-1}g_yf_yf$ of (5.5));

iii) split each fat vertex into two new fat vertices (one above the other) and link
them via a new meagre vertex. Then four new trees are obtained as follows:
attach a branch with $\tau$ to the lower of these fat vertices; attach a branch with
$\tau$, $[\tau,\tau]_z$ or $[[\tau]_y]_z$ to the new meagre vertex (this corresponds to the derivation
of $(-g_yf_z)^{-1}$ and follows at once from Eq. (5.6)).

Some of the elementary differentials in (5.8) appear more than once. In order to
understand how often such an expression (or the corresponding tree) appears in the
derivatives of $y$ or $z$, we indicate the order of generation of the vertices as follows
(see Fig. 5.2): for the trees of order 1, namely $\tau$, $[\tau,\tau]_z$ and $[[\tau]_y]_z$, we add the
label 1 to a meagre vertex such that

each fat vertex is followed by at least one unlabelled meagre vertex. (5.9)

Each time a tree is "differentiated" according to the above rules we provide the
newly attached tree (of order 1) with a new label such that (5.9) still holds. The
labelling so obtained is obviously increasing along each branch.



Fig. 5.2. Examples of monotonically labelled trees 


Definition 5.4. A tree $t \in DAT2_y$ (or $u \in DAT2_z$), together with a monotonic
labelling of $\varrho(t)$ (or $\varrho(u)$) among its meagre vertices such that (5.9) holds, is
called a monotonically labelled tree. The sets of such monotonically labelled trees
are denoted by $LDAT2_y$, $LDAT2_z$, and $LDAT2$.

Since the differentiation process of trees described above generates all elements 
of LDAT2 , and each of them exactly once, and since each differentiation increases 
the order of the trees by one, we have the following result. 

Theorem 5.5 (HLR89, p. 58). For the exact solution of (5.1) we have:

$$y^{(q)}(x_0) = \sum_{t\in LDAT2_y,\ \varrho(t)=q} F(t)(y_0,z_0) = \sum_{t\in DAT2_y,\ \varrho(t)=q} \alpha(t)F(t)(y_0,z_0)$$

$$z^{(q)}(x_0) = \sum_{u\in LDAT2_z,\ \varrho(u)=q} F(u)(y_0,z_0) = \sum_{u\in DAT2_z,\ \varrho(u)=q} \alpha(u)F(u)(y_0,z_0).$$

The integer coefficients $\alpha(t)$ and $\alpha(u)$ indicate the number of possible monotonic
labellings of a tree. □
Derivatives of the Numerical Solution


For the problem (5.1) with consistent initial values $(y_0,z_0)$ we write one step of a
Runge-Kutta method in the form

$$y_1 = y_0 + \sum_{i=1}^s b_ik_i, \qquad z_1 = z_0 + \sum_{i=1}^s b_i\ell_i \tag{5.10a}$$

where

$$k_i = hf(Y_i,Z_i), \qquad 0 = g(Y_i) \tag{5.10b}$$

and

$$Y_i = y_0 + \sum_{j=1}^s a_{ij}k_j, \qquad Z_i = z_0 + \sum_{j=1}^s a_{ij}\ell_j. \tag{5.10c}$$

We have replaced $hk_{ni}$, $h\ell_{ni}$ of Formula (4.1) by $k_i$, $\ell_i$. This is not essential, but
adjusts the derivation of the order conditions to the presentation of Sect. VI.4. Since
the following derivation is very similar to the one given in Sect. VI.4, we restrict
ourselves to the main ideas.

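We consider $y_1, z_1, k_i, \ell_i, Y_i, Z_i$ as functions of $h$ and compute their derivatives
at $h = 0$. From (5.10a) we get

$$y_1^{(q)}(0) = \sum_{i=1}^s b_ik_i^{(q)}(0), \tag{5.11}$$

and (5.10b) yields

$$k_i^{(q)}(0) = q\,\big(f(Y_i,Z_i)\big)^{(q-1)}\Big|_{h=0}, \qquad 0 = \big(g(Y_i)\big)^{(q)}\Big|_{h=0}. \tag{5.12}$$

The total derivatives of $f(Y_i,Z_i)$ and $g(Y_i)$ can be computed by Faà di Bruno's
formula (see (VI.4.14) and (VI.4.15)). This gives

$$\big(f(Y_i,Z_i)\big)^{(q-1)} = \sum \frac{\partial^{m+n}f(Y_i,Z_i)}{\partial y^m\partial z^n}\big(Y_i^{(\mu_1)},\ldots,Y_i^{(\mu_m)},Z_i^{(\nu_1)},\ldots,Z_i^{(\nu_n)}\big) \tag{5.13}$$

with $\mu_1 + \ldots + \mu_m + \nu_1 + \ldots + \nu_n = q - 1$, and

$$\big(g(Y_i)\big)^{(q)} = \sum \frac{\partial^m g(Y_i)}{\partial y^m}\big(Y_i^{(\mu_1)},\ldots,Y_i^{(\mu_m)}\big) \tag{5.14}$$

with $\mu_1 + \ldots + \mu_m = q$. The summations in (5.13) and (5.14) are over sets of
suitable "special labelled trees". We next insert

$$Y_i^{(\mu)} = \sum_{j=1}^s a_{ij}k_j^{(\mu)} \tag{5.15}$$

into (5.13) and (5.14) and so obtain from (5.12)

$$k_i^{(q)}(0) = q\sum \frac{\partial^{m+n}f(y_0,z_0)}{\partial y^m\partial z^n}\Big(\sum_{j=1}^s a_{ij}k_j^{(\mu_1)}(0),\ldots,Z_i^{(\nu_1)}(0),\ldots\Big) \tag{5.16}$$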
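$$0 = g_y(y_0)\sum_{j=1}^s a_{ij}k_j^{(q)}(0) + \sum_{m\ge 2}\frac{\partial^m g(y_0)}{\partial y^m}\Big(\sum_{j=1}^s a_{ij}k_j^{(\mu_1)}(0),\ldots\Big). \tag{5.17}$$

Inserting (5.16) into the first term of (5.17) and extracting $Z_j^{(q-1)}(0)$ we get

$$\begin{aligned} (-g_yf_z)(y_0,z_0)\sum_{j=1}^s a_{ij}Z_j^{(q-1)}(0) = {} & g_y(y_0)\sum_{j=1}^s a_{ij}\sum_{(m,n)\ne(0,1)}\frac{\partial^{m+n}f(y_0,z_0)}{\partial y^m\partial z^n}\Big(\sum_{\kappa=1}^s a_{j\kappa}k_\kappa^{(\mu_1)}(0),\ldots,Z_j^{(\nu_1)}(0),\ldots\Big) \\ & + \frac{1}{q}\sum_{m\ge 2}\frac{\partial^m g(y_0)}{\partial y^m}\Big(\sum_{j=1}^s a_{ij}k_j^{(\mu_1)}(0),\ldots\Big). \end{aligned} \tag{5.18}$$

This formula allows us to compute $Z_i^{(q-1)}(0)$, whenever $(g_yf_z)$ and $(a_{ij})$ are invertible.
We denote the coefficients of the inverse of $(a_{ij})$ by $\omega_{ij}$, i.e.,

$$(\omega_{ij}) = (a_{ij})^{-1}. \tag{5.19}$$

The following result then follows by induction on $q$ from (5.16) and (5.18).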
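Theorem 5.6 (HLR89). The derivatives of $k_i$ and $Z_i$ satisfy

$$k_i^{(q)}(0) = \sum_{t\in LDAT2_y,\ \varrho(t)=q}\gamma(t)\Phi_i(t)F(t)(y_0,z_0)$$

$$Z_i^{(q)}(0) = \sum_{u\in LDAT2_z,\ \varrho(u)=q}\gamma(u)\Phi_i(u)F(u)(y_0,z_0),$$

where the coefficients $\Phi_i(t)$ and $\Phi_i(u)$ are given by $\Phi_i(\tau) = 1$ and

$$\Phi_i(t) = \sum_{\mu_1,\ldots,\mu_m} a_{i\mu_1}\cdots a_{i\mu_m}\,\Phi_{\mu_1}(t_1)\cdots\Phi_{\mu_m}(t_m)\,\Phi_i(u_1)\cdots\Phi_i(u_n) \qquad \text{if } t = [t_1,\ldots,t_m,u_1,\ldots,u_n]_y,$$

$$\Phi_i(u) = \sum_{j,\mu_1,\ldots,\mu_m} \omega_{ij}\,a_{j\mu_1}\cdots a_{j\mu_m}\,\Phi_{\mu_1}(t_1)\cdots\Phi_{\mu_m}(t_m) \qquad \text{if } u = [t_1,\ldots,t_m]_z,$$

and the rational coefficients $\gamma(t)$ and $\gamma(u)$ are defined by $\gamma(\tau) = 1$ and

$$\gamma(t) = \varrho(t)\,\gamma(t_1)\cdots\gamma(t_m)\,\gamma(u_1)\cdots\gamma(u_n) \qquad \text{if } t = [t_1,\ldots,t_m,u_1,\ldots,u_n]_y,$$

$$\gamma(u) = \frac{1}{\varrho(u)+1}\,\gamma(t_1)\cdots\gamma(t_m) \qquad \text{if } u = [t_1,\ldots,t_m]_z.$$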
The derivatives of the numerical solution $y_1$ are now obtained from (5.11). In
order to get those of $z_1$, we compute $\ell_i$ from (5.10c) and insert it into (5.10a). This
yields

$$z_1 = z_0 + \sum_{i,j=1}^s b_i\omega_{ij}(Z_j - z_0) \tag{5.20}$$

and its derivatives are given by

$$z_1^{(q)}(0) = \sum_{i,j=1}^s b_i\omega_{ij}Z_j^{(q)}(0). \tag{5.21}$$

We thus obtain the following result.

Theorem 5.7. The numerical solution of (5.10) satisfies

$$y_1^{(q)}\Big|_{h=0} = \sum_{t\in LDAT2_y,\ \varrho(t)=q}\gamma(t)\sum_{i=1}^s b_i\Phi_i(t)F(t)(y_0,z_0),$$

$$z_1^{(q)}\Big|_{h=0} = \sum_{u\in LDAT2_z,\ \varrho(u)=q}\gamma(u)\sum_{i,j=1}^s b_i\omega_{ij}\Phi_j(u)F(u)(y_0,z_0),$$

where the coefficients $\gamma$ and $\Phi_i$ are given in Theorem 5.6. □


Order Conditions 


A comparison of Theorem 5.7 with Theorem 5.5 gives


Theorem 5.8 (HLR89). For the Runge-Kutta method (5.10) we have
$y(x_0+h) - y_1 = O(h^{p+1})$ iff

$$\sum_{i=1}^s b_i\Phi_i(t) = \frac{1}{\gamma(t)} \qquad \text{for } t \in DAT2_y,\ \varrho(t) \le p,$$

$z(x_0+h) - z_1 = O(h^{q+1})$ iff

$$\sum_{i,j=1}^s b_i\omega_{ij}\Phi_j(u) = \frac{1}{\gamma(u)} \qquad \text{for } u \in DAT2_z,\ \varrho(u) \le q,$$

where the coefficients $\gamma$ and $\Phi_i$ are those of Theorem 5.6 and $\omega_{ij}$ is given by
(5.19). □
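
The recursions of Theorem 5.6 and the conditions of Theorem 5.8 can be evaluated mechanically. The following sketch does this in exact rational arithmetic for the 2-stage Radau IIA method (standard coefficients); the tuple encoding of trees and all function names are assumptions of this illustration, not notation from the text.

```python
from fractions import Fraction as Fr

# 2-stage Radau IIA; W holds (omega_ij) = A^{-1}, cf. (5.19)
A = [[Fr(5, 12), Fr(-1, 12)], [Fr(3, 4), Fr(1, 4)]]
b = [Fr(3, 4), Fr(1, 4)]
s = 2
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
W = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

TAU = ("y", ())                    # ("y", subtrees): meagre root, ("z", ...): fat root

def order(t):
    return (1 if t[0] == "y" else -1) + sum(order(c) for c in t[1])

def gamma(t):
    prod = Fr(1)
    for c in t[1]:
        prod *= gamma(c)
    return order(t) * prod if t[0] == "y" else prod / (order(t) + 1)

def phi(i, t):
    if t[0] == "y":                # meagre vertex with index i
        prod = Fr(1)
        for c in t[1]:
            if c[0] == "y":
                prod *= sum(A[i][m] * phi(m, c) for m in range(s))
            else:
                prod *= phi(i, c)
        return prod
    total = Fr(0)                  # fat vertex: sum over j of omega_ij * (...)
    for j in range(s):
        prod = Fr(1)
        for c in t[1]:
            prod *= sum(A[j][m] * phi(m, c) for m in range(s))
        total += W[i][j] * prod
    return total

def lhs_y(t):
    return sum(b[i] * phi(i, t) for i in range(s))

def lhs_z(u):
    return sum(b[i] * W[i][j] * phi(j, u) for i in range(s) for j in range(s))

u2, u3 = ("z", (TAU, TAU)), ("z", (TAU, TAU, TAU))
for t in [("y", (u2,)), ("y", (u3,)), ("y", (TAU, u2)), ("y", (u2, u2))]:
    print(order(t), lhs_y(t), 1 / gamma(t))
print(lhs_z(u2), 1 / gamma(u2), lhs_z(u3), 1 / gamma(u3))
```

For this method the listed $y$-conditions up to order 3 hold, the $z$-condition for $[\tau,\tau]_z$ holds, and the one for $[\tau,\tau,\tau]_z$ fails, in agreement with the local error entries $h^{2s}$ and $h^s$ for Radau IIA with $s = 2$.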


Remark 5.9. Let $P(x_0) = I - \big(f_z(g_yf_z)^{-1}g_y\big)(y_0,z_0)$ be the projection introduced
in Definition 4.3. Since $P(x_0)f_z(y_0,z_0) = 0$ we have

$$P(x_0)F(t)(y_0,z_0) = 0 \tag{5.22}$$

for all trees $t \in DAT2_y$ of the form $t = [u]_y$ with $u \in DAT2_z$. Consequently,
such trees of order $p$ need not be considered for the construction of Runge-Kutta
methods of order $p$ (see Theorem 4.5).

Applying repeatedly the definition of $\Phi_i$ in Theorem 5.6 we get the following
algorithm:

Forming the Order Condition for a Given Tree. Attach to each vertex one summation
index; if the root is fat, attach three indices to this root. Then the left hand side
of the order condition is a sum over all indices of a product with factors

$b_i$ if "$i$" is the index of a meagre root;

$b_i\omega_{ij}\omega_{jk}$ if "$i, j, k$" are the three indices of a fat root;

$a_{ij}$ if "$j$" lies directly above "$i$" and "$j$" is meagre;

$\omega_{ij}$ if "$j$" lies directly above "$i$" and "$j$" is fat.

In Table 5.1 we collect the order conditions for some trees of $DAT2$. We
have not included the trees which have only meagre vertices, because their order
condition is exactly the same as that of Sect. II.2 (Table 2.2). Several trees of $DAT2$
lead to the same order condition (Exercise 2). We also observe that some of the
order conditions for the trees $[u]_y$ with $u \in DAT2_z$ are identical to those for index
1 problems (see Exercise 1 of Sect. VI.4).


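Table 5.1. Trees and order conditions (graphs omitted)

$\varrho(t)$    order condition
2               $\sum b_i\omega_{ij}c_j^2 = 1$
3               $\sum b_i\omega_{ij}c_j^3 = 1$
3               $\sum b_i\omega_{ij}c_ja_{jk}c_k = \frac{1}{2}$
3               $\sum b_ic_i\omega_{ij}c_j^2 = \frac{2}{3}$
3               $\sum b_i\omega_{ij}c_j^2\omega_{ik}c_k^2 = \frac{4}{3}$

$\varrho(u)$    order condition
1               $\sum b_i\omega_{ij}\omega_{jk}c_k^2 = 2$
2               $\sum b_i\omega_{ij}\omega_{jk}c_k^3 = 3$
2               $\sum b_i\omega_{ij}\omega_{jk}c_ka_{k\ell}c_\ell = \frac{3}{2}$
2               $\sum b_i\omega_{ij}c_j\omega_{jk}c_k^2 = 2$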
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Simplifying Assumptions 

For the construction of implicit Runge-Kutta methods the simplifying conditions
$B(p)$, $C(\eta)$, $D(\zeta)$ of Sect. IV.5 play an important role. The following result
extends Theorem IV.5.1 to index 2 problems.

Theorem 5.10 (HLR89, p. 67). Suppose that the Runge-Kutta matrix $(a_{ij})$ is
invertible and that $b_i = a_{si}$ for $i = 1, \ldots, s$. Then the conditions $B(p)$, $C(\eta)$,
$D(\zeta)$ with $p \le 2\eta$ and $p \le \eta + \zeta + 1$ imply that the $y$-component of the local
error of (5.1) satisfies

$$y_1 - y(x_0 + h) = O(h^{p+1}).$$


Proof. We just outline the main ideas; details are given in (HLR89, pp. 64-67).
As in Sect. II.7 (Fig. II.7.1) we first simplify the order conditions with the help of
$C(\eta)$. This implies that trees with a branch ending with $[\tau, \ldots, \tau]_y$ (the number
of $\tau$'s is $k - 1$) where $k \le \eta$ need no longer be considered. If we write $C(\eta)$ in
the form

$$\sum_{j=1}^{s} \omega_{ij}\, c_j^{k} = k\, c_i^{k-1} \quad \text{for } k = 1, \ldots, \eta, \tag{5.23}$$

we observe that trees ending with $[\tau, \ldots, \tau]_z$ can also be reduced if the number of
$\tau$'s is between 1 and $\eta$.

The simplifying condition $D(\zeta)$ allows us to remove trees $[\tau, \ldots, \tau, t]_y$ with
$t \in DAT2_y$, where the number of $\tau$'s is $< \zeta$. Writing $D(\zeta)$ as

$$\sum_{i=1}^{s} b_i\, c_i^{k}\, \omega_{ij} = \sum_{i=1}^{s} b_i\, \omega_{ij} - k\, b_j\, c_j^{k-1} \quad \text{for } k = 1, \ldots, \zeta, \tag{5.24}$$

it follows that the trees $[\tau, \ldots, \tau, u]_y$ with $u \in DAT2_z$ (number of $\tau$'s is $k$) can also
be eliminated for $1 \le k \le \zeta$. Since $p \le 2\eta$ and $p \le \eta + \zeta + 1$ all that remains after
these reductions are the bushy trees $[\tau, \ldots, \tau]_y$ whose order conditions are satisfied
by $B(p)$, and trees of the form $[u]_y$ with $u \in DAT2_z$. Because of the assumption
$b_i = a_{si}$ we have

$$\sum_{i=1}^{s} b_i\, \omega_{ij} = \begin{cases} 0 & \text{if } j \ne s, \\ 1 & \text{if } j = s, \end{cases} \tag{5.25}$$

and these trees can also be reduced to the bushy trees. □


Remark. If the function $f$ of (5.1a) is linear in $z$, i.e.,

$$f(y, z) = f_0(y) + f_z(y)\, z, \tag{5.26}$$

then the elementary differentials for trees $[t_1, \ldots, t_m, u_1, \ldots, u_n]_y$ with $n \ge 2$
vanish identically and the corresponding order conditions need not be considered.
In this situation the assumption $p \le 2\eta$ can be relaxed to $p \le 2\eta + 1$. An important
class of problems satisfying (5.26) are constrained mechanical systems in the index
2 formulation (1.46a,b,d).

As an illustration of Theorem 5.10 we consider the Lobatto IIIC methods. They
satisfy $B(p)$, $C(\eta)$, $D(\zeta)$ with $p = 2s - 2$, $\eta = s - 1$ and $\zeta = s - 1$ (see Table
IV.5.13) and also $a_{si} = b_i$. It therefore follows from Theorem 5.10 that the local
error satisfies $\delta y_h(x) = O(h^{2s-1})$.

The following result shows that for methods which do not satisfy $a_{si} = b_i$ it is
unlikely that the estimates of Lemma 4.4 can be improved.

Lemma 5.11. Let $p$ be the largest integer such that the $y$-component of the local
error satisfies

$$\delta y_h(x) = O(h^{p+1}).$$

If the Runge-Kutta matrix is invertible and $c_i \ne 1$ for all $i$, then

$$p \le s^*,$$

where $s^*$ is the number of distinct non-zero values among $c_1, \ldots, c_s$.

Proof. The order conditions for the trees $[[\tau, \ldots, \tau]_z]_y$ imply that

$$\sum_{i,j=1}^{s} b_i\, \omega_{ij} \int_0^{c_j} q(t)\, dt = \int_0^1 q(t)\, dt \tag{5.27}$$

for all polynomials $q(t)$ of degree $\le p - 1$. Put $q(t) = d'(t)$, where $d(t)$ is a
polynomial of minimal degree such that $d(c_i) = 0$ for all $i$, $d(0) = 0$ and $d(1) \ne 0$.
For this polynomial the left-hand side of (5.27) vanishes whereas the right-hand side
equals $d(1) \ne 0$, so that Condition (5.27) is violated. The inequality $p \le s^*$ now
follows because the degree of this polynomial $q(t)$ is $s^*$. □


Projected Runge-Kutta Methods 

It is, of course, interesting to study the convergence order of projected Runge-Kutta 
methods (Definition 4.11) which are not yet covered by Theorem 4.12. The main 
tool for the subsequent study is the following interpretation of projected Runge- 
Kutta methods. 


Table 5.2. Original and extended Runge-Kutta methods

$$
\begin{array}{c|c} c & A \\ \hline & b^T \end{array}
\qquad\qquad
\begin{array}{c|cc} c & A & 0 \\ 1+\varepsilon & b^T & \varepsilon \\ \hline & b^T & \varepsilon \end{array}
$$

Lemma 5.12 (Lubich 1991). Consider an $s$-stage Runge-Kutta method with invertible
coefficient matrix $A$ and the extended $(s+1)$-stage method defined in Table 5.2.
For an initial value $y_0$ satisfying $g(y_0) = 0$ denote their numerical solutions after
one step by $y_1$ and $y_1^\varepsilon$, respectively. If the function $f$ in (5.1a) is linear in $z$ (i.e.,
(5.26) is satisfied), then the numerical solution $\hat y_1$ of the projected Runge-Kutta
method (4.1), (4.38) satisfies

$$\hat y_1 - y_1^\varepsilon = O(h\varepsilon) \tag{5.28}$$

for $h$ sufficiently small and $\varepsilon \to 0$.

Proof. The last stage of the extended $(s+1)$-stage Runge-Kutta method reads

$$Y_{s+1} = y_1 + h\varepsilon f(Y_{s+1}, Z_{s+1}), \qquad 0 = g(Y_{s+1}) \tag{5.29}$$

and we have $y_1^\varepsilon = Y_{s+1}$ (note that this is the result of an implicit Euler step with
step size $h\varepsilon$ starting from $y_1$). Using the linearity of $f$ with respect to $z$ and
putting $\lambda = h\varepsilon Z_{s+1}$ we obtain

$$y_1^\varepsilon = y_1 + h\varepsilon f_0(y_1^\varepsilon) + f_z(y_1^\varepsilon)\lambda, \qquad 0 = g(y_1^\varepsilon). \tag{5.30}$$

Comparing (5.30) with (4.38) the implicit function theorem implies that (5.28) is
satisfied for sufficiently small $h$ and $\varepsilon$. □


The implicit function theorem, applied to (5.30), also shows that $y_1^\varepsilon$ is as often
differentiable with respect to $h$ and $\varepsilon$ as the right-hand side of the problem (5.1) is.
Hence, the Taylor series expansion of $y_1^\varepsilon$ with respect to $h$ has coefficients which
converge to a finite limit as $\varepsilon \to 0$.

The order conditions for a projected Runge-Kutta method (applied to (5.1),
(5.26)) can thus be obtained by considering the limit $\varepsilon \to 0$ in the order conditions
for the extended Runge-Kutta method (Exercise 5). Let us illustrate this by
extending the statement of Theorem 5.10 to projected Runge-Kutta methods.

Theorem 5.13 (Lubich 1991). Suppose that the Runge-Kutta matrix $A$ is invertible
and that the index 2 problem satisfies (5.26). Then the conditions $B(p)$, $C(\eta)$,
$D(\zeta)$ with $p \le 2\eta + 1$ and $p \le \eta + \zeta + 1$ imply that the local error of the projected
Runge-Kutta method satisfies

$$\hat y_1 - y(x_0 + h) = O(h^{p+1}). \tag{5.31}$$

If in addition $p \le 2\eta$ then (5.31) holds also when $f$ is nonlinear in $z$.

Proof. One verifies that the conditions $B(p)$, $C(\eta)$, $D(\zeta)$, (5.23), (5.24) and
(5.25) are, in the limit $\varepsilon \to 0$, also satisfied for the extended method of Table
5.2. Let us demonstrate this for the Condition (5.23). The inverse of the extended
Runge-Kutta matrix is given by

$$\begin{pmatrix} A & 0 \\ b^T & \varepsilon \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & 0 \\ -\varepsilon^{-1} b^T A^{-1} & \varepsilon^{-1} \end{pmatrix}. \tag{5.32}$$

Therefore (5.23) is seen to be satisfied for $i = 1, \ldots, s$. For $i = s + 1$ one gets

$$\sum_{j=1}^{s+1} \omega_{s+1,j}\, c_j^{k} = -\varepsilon^{-1} \sum_{i,j=1}^{s} b_i\, \omega_{ij}\, c_j^{k} + \varepsilon^{-1} (1 + \varepsilon)^k. \tag{5.33}$$

Using (5.23) for $i \le s$ and $B(p)$ the right-hand expression of (5.33) becomes
$-\varepsilon^{-1} + \varepsilon^{-1}(1 + \varepsilon)^k$ and tends to $k$ for $\varepsilon \to 0$. Hence, Condition (5.23) is, in
the limit $\varepsilon \to 0$, also satisfied for $i = s + 1$. As in the proof of Theorem 5.10 (see
also the remark after that proof) we deduce the statement for the case where $f(y, z)$
is linear in $z$.
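The two algebraic facts used in this step, the block form of the inverse of the extended matrix and the limit of the sum over the last row, are easy to confirm numerically. The sketch below is an illustration only: it assumes the 2-stage Radau IIA method as the underlying scheme, and the helper name `extended_method` is hypothetical.

```python
import numpy as np

# Assumed example: 2-stage Radau IIA as the underlying s-stage method.
A = np.array([[5 / 12, -1 / 12],
              [3 / 4, 1 / 4]])
b = np.array([3 / 4, 1 / 4])
s = len(b)


def extended_method(A, b, eps):
    """Coefficient matrix and nodes of the extended (s+1)-stage method."""
    n = len(b)
    A_ext = np.zeros((n + 1, n + 1))
    A_ext[:n, :n] = A
    A_ext[n, :n] = b
    A_ext[n, n] = eps
    c_ext = A_ext.sum(axis=1)            # last node equals 1 + eps
    return A_ext, c_ext


eps = 1e-3
A_ext, c_ext = extended_method(A, b, eps)
W_ext = np.linalg.inv(A_ext)

# Block formula for the inverse of the extended matrix.
W_formula = np.zeros_like(A_ext)
W_formula[:s, :s] = np.linalg.inv(A)
W_formula[s, :s] = -(b @ np.linalg.inv(A)) / eps
W_formula[s, s] = 1 / eps
print("block formula matches:", np.allclose(W_ext, W_formula))

# Last-row sum: sum_j omega_{s+1,j} c_j^k should approach k as eps -> 0.
for k in (1, 2):
    print(k, W_ext[s] @ c_ext**k)
```

For `eps = 1e-3` the printed sums differ from $k$ by a quantity of size $\varepsilon$, in agreement with the expression $-\varepsilon^{-1} + \varepsilon^{-1}(1+\varepsilon)^k$.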

The generalization to nonlinear problems can be proved by a perturbation argument.
We let $z(x)$ be the exact solution of (5.1) and consider the problem (Lubich
1991)

$$u' = f(u, z(x)) + f_z(u, z(x))\,\lambda, \qquad 0 = g(u) \tag{5.34}$$

in the variables $u$ and $\lambda$. This new problem is of index 2 again and has obviously
the solution $u(x) = y(x)$ and $\lambda(x) = 0$. Since (5.34) is linear in the algebraic
variable $\lambda$, the theorem can be applied and we get for the projected Runge-Kutta
solution

$$\hat u_1 - y(x_0 + h) = O(h^{p+1}). \tag{5.35}$$

We still have to estimate $\hat y_1 - \hat u_1$. This is possible with the help of Theorem 4.2.
In addition to the nonlinear system (4.2) (with $\eta = y_0$) we consider the method
applied to (5.34):

$$U_i = y_0 + h \sum_{j=1}^{s} a_{ij} \Bigl( f\bigl(U_j, z(x_0 + c_j h)\bigr) + f_z\bigl(U_j, z(x_0 + c_j h)\bigr) \Lambda_j \Bigr), \qquad 0 = g(U_i). \tag{5.36}$$

Its first line can be written as

$$U_i = y_0 + h \sum_{j=1}^{s} a_{ij}\, f\bigl(U_j, z(x_0 + c_j h) + \Lambda_j\bigr) + O(h \|\Lambda\|^2),$$

where $\|\Lambda\| = \max_i \|\Lambda_i\|$. Theorem 4.2 thus yields

$$\|Y_i - U_i\| \le C h \|\Lambda\|^2, \tag{5.37a}$$

$$\|\Lambda_i + z(x_0 + c_i h) - Z_i\| \le C \|\Lambda\|^2. \tag{5.37b}$$

Since $C(\eta)$ holds, the estimate (4.14) together with (5.37b) proves $\Lambda_i = O(h^\eta)$.
We therefore obtain $y_1 - u_1 = O(h^{2\eta+1})$ with the help of (5.37), and $\hat y_1 - \hat u_1 =
O(h^{2\eta+1})$ as a consequence of $z_1 - z(x_0 + h) = O(h^\eta)$. □



518 VII. Differential-Algebraic Equations of Higher Index 


Examples. 1) Collocation methods satisfy $B(p)$, $C(s)$ and $D(p - s)$ where $s$ is
the number of stages and $p$ the order of the underlying quadrature formula (consult
Lemma IV.5.4). Hence, the above presentation provides an alternative proof of
Theorem 4.12.

2) The projected $s$-stage Radau IA method (see Table IV.5.13) has order $2s - 1$
for problems which are linear in $z$, and order $2s - 2$ for general nonlinear index 2
problems.


Exercises 


1. Denote by $r$ the largest number such that the local error of the $z$-component
satisfies $\delta z_h(x) = O(h^r)$. For implicit Runge-Kutta methods with invertible
coefficient matrix, $R(\infty) = 0$ and $c_i \ne 1$ (all $i$) prove that

$$r \le s^*,$$

where $s^*$ is the number of distinct non-zero values among $c_1, \ldots, c_s$.
Hint. The order conditions for the bushy trees $[\tau, \ldots, \tau]_z$ imply that

$$\sum_{i,j,k=1}^{s} b_i\, \omega_{ij}\, \omega_{jk} \int_0^{c_k} q(t)\, dt = q(1)$$

for all polynomials $q(t)$ of degree $\le r - 1$.


2. If a tree of DAT2 satisfies one of the following two conditions

a) a fat vertex (different from the root) is singly branched

b) a singly branched meagre vertex ($\ne$ root) is followed by a fat vertex

then the corresponding order condition is equivalent to that of a tree of the 
same order but with fewer fat vertices. Consequently, trees satisfying either (a) 
or (b) need not be considered for the construction of Runge-Kutta methods. 

3. Suppose that the function f(y,z) in (5.1) is linear in z. Characterize the trees 
of DAT2 for which the elementary differentials vanish identically. 


4. With the help of Theorem 5.10 and Lemma IV.5.4 give a new (algebraic) proof 
of Theorem 4.9. 


5. (Lubich 1991). Consider a projected Runge-Kutta method for index 2 problems
which are linear in $z$. Prove that $\hat y_1 - y(x_0 + h) = O(h^4)$ iff the condition

i,j =1 

is satisfied in addition to the four order conditions already needed for ordinary 
differential equations. 



VII.6 Half-Explicit Methods for Index 2 Systems 


The methods of Sects. VII.3 and VII.4 do not use the semi-explicit structure of the
differential-algebraic equation

$$y' = f(y, z), \qquad 0 = g(y) \tag{6.1}$$

($y \in \mathbb{R}^n$, $z \in \mathbb{R}^m$) and can as well be applied to more general situations. Here we
shall show how this structure can be exploited for the derivation of new, efficient
integration methods. The main idea is to discretize the differential variables $y$ in
an explicit manner, and the algebraic variables $z$ in an implicit manner.

The simplest method of this type is the half-explicit Euler method

$$y_1 = y_0 + h f(y_0, z_0) \tag{6.2a}$$

$$0 = g(y_1). \tag{6.2b}$$

Inserting (6.2a) into (6.2b) yields the nonlinear system $0 = g(y_0 + h f(y_0, z_0))$ for
$z_0$. It possesses a locally unique solution, if

$$g_y(y) f_z(y, z) \ \text{ is invertible} \tag{6.3}$$

at $(y_0, z_0)$. Once $z_0$ is computed, the value $y_1$ is determined explicitly by (6.2a).

This example shows some interesting features of half-explicit methods. Compared
to the implicit Euler discretization, it can be implemented more efficiently,
because the nonlinear system is of reduced dimension ($m$ instead of $n + m$). Compared
to the explicit Euler method in the mode "index reduction and projection"
(see Sect. VII.2), it avoids an accurate computation of the derivative $g_y(y)$. The
numerical approximation $y_1$ only depends on an initial value of the $y$-component,
as does the exact solution of (6.1).
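To make the half-explicit Euler step concrete, here is a minimal sketch under stated assumptions. The test problem is not from the text: it is a hypothetical index 2 system in $\mathbb{R}^2$ with scalar $z$, namely $y' = Jy + zy$ (with $J$ the rotation generator) and $g(y) = y^Ty - 1$, whose exact solution for $y(0) = (1, 0)$ is $y(x) = (\cos x, \sin x)$, $z(x) = 0$. Here $g_y f_z = 2y^Ty = 2$ is invertible on the constraint manifold. The scalar equation for $z_0$ is solved by Newton's method; all function names are illustrative.

```python
import numpy as np

J = np.array([[0.0, -1.0],
              [1.0, 0.0]])


def f(y, z):
    """Hypothetical right-hand side f(y, z) = J y + z y (linear in z)."""
    return J @ y + z * y


def f_z(y):
    """Derivative of f with respect to the scalar z."""
    return y


def g(y):
    """Constraint g(y) = |y|^2 - 1."""
    return y @ y - 1.0


def g_y(y):
    """Gradient of the constraint."""
    return 2.0 * y


def half_explicit_euler_step(y0, h, z=0.0, tol=1e-14, max_iter=50):
    """One step y1 = y0 + h f(y0, z0) with z0 chosen so that g(y1) = 0."""
    for _ in range(max_iter):
        y1 = y0 + h * f(y0, z)
        residual = g(y1)
        if abs(residual) < tol:
            break
        # Newton update for the scalar unknown z: d/dz g(y0 + h f(y0, z))
        z -= residual / (h * (g_y(y1) @ f_z(y0)))
    return y0 + h * f(y0, z), z


h, n_steps = 0.01, 100
y = np.array([1.0, 0.0])
for _ in range(n_steps):
    y, z0 = half_explicit_euler_step(y, h)

error = np.linalg.norm(y - np.array([np.cos(1.0), np.sin(1.0)]))
print("constraint residual:", g(y), " error at x = 1:", error)
```

Only a scalar equation is solved per step, which illustrates the reduced dimension ($m$ instead of $n + m$) of the nonlinear system.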

In this section we shall develop half-explicit Runge-Kutta methods, extrapo¬ 
lation methods, and multistep methods. They are in particular very efficient for 
constrained mechanical systems in their index 2 formulation, because nonlinear 
systems are completely avoided in this situation (see below). 


Half-Explicit Runge-Kutta Methods


In HLR89, the following extension of (6.2) to explicit Runge-Kutta methods is
proposed:

$$Y_i = y_0 + h \sum_{j=1}^{i-1} a_{ij} f(Y_j, Z_j), \qquad i = 1, \ldots, s \tag{6.4a}$$

$$0 = g(Y_i) \tag{6.4b}$$

$$y_1 = y_0 + h \sum_{i=1}^{s} b_i f(Y_i, Z_i), \tag{6.4c}$$

$$0 = g(y_1). \tag{6.4d}$$

We have $Y_1 = y_0$, and Eq. (6.4b) is automatically satisfied for $i = 1$, because the
initial value is assumed to be consistent. We next insert $Y_2$ from (6.4a) into (6.4b)
and obtain a nonlinear equation for $Z_1$, which has a (locally) unique solution, if
$a_{21} \ne 0$ and the usual index 2 assumption (6.3) is satisfied. We thus obtain $Z_1$ and
$Y_2$. The next step allows us to compute $Z_2$ and $Y_3$, etc.

The local error and convergence properties of (6.4) are studied in HLR89 and
Brasey & Hairer (1993). It turns out that the coefficients $a_{ij}, b_i$ have to satisfy
additional order conditions. As a consequence, 8 stages are needed for a 5th order
method (Brasey 1992), compared to only 6 stages for classical Runge-Kutta
methods (see Sect. II.5). Arnold (1995) and Murua (1995) have independently proposed
a modification, which simplifies the order conditions and makes the approach
more efficient. Their main idea is to introduce an explicit stage $Y_1 = y_0$, $Z_1 = z_0$,
$Y_2 = y_0 + h a_{21} f(y_0, z_0)$, and to suppress the condition $g(Y_2) = 0$ in the second
stage. We follow here the approach of Murua (1995), because it is slightly more
general. For consistent initial values $(y_0, z_0)$ we define

$$Y_1 = y_0, \qquad Z_1 = z_0 \tag{6.5a}$$

$$Y_i = y_0 + h \sum_{j=1}^{i-1} a_{ij} f(Y_j, Z_j), \qquad i = 2, \ldots, s \tag{6.5b}$$

$$\bar Y_i = y_0 + h \sum_{j=1}^{i} \hat a_{ij} f(Y_j, Z_j), \qquad 0 = g(\bar Y_i), \qquad i = 2, \ldots, s \tag{6.5c}$$

$$y_1 = \bar Y_s. \tag{6.5d}$$

The value $z_1$ can either be computed from the hidden constraint $g_y(y_1) f(y_1, z_1) =
0$, or from the additional stage

$$\bar Y_{s+1} = y_0 + h \sum_{j=1}^{s+1} \hat a_{s+1,j} f(Y_j, Z_j), \qquad 0 = g(\bar Y_{s+1}) \tag{6.5e}$$

as $z_1 = Z_{s+1}$. Here we have put $Y_{s+1} = y_1$, so that the value $f(Y_{s+1}, Z_{s+1})$
can be reused as $f(y_0, z_0)$ for the next step. A significant difference compared to
the original approach (6.4) is that the numerical solution $(y_1, z_1)$ depends on both
initial values ($y_0$ and $z_0$).

Existence of the Numerical Solution. Suppose that the initial values satisfy
$g(y_0) = 0$ and $g_y(y_0) f(y_0, z_0) = O(\delta)$ with some sufficiently small $\delta > 0$ (we have
to admit small perturbations in the hidden constraint, because in general the approximation
$z_1$ of (6.5e) does not satisfy $g_y(y_1) f(y_1, z_1) = 0$). By an induction argument
we assume that the values $(Y_j, Z_j)$ are already known for $j = 1, \ldots, i - 1$,
and satisfy $Y_j = y_0 + O(h)$, $Z_j = z_0 + O(h + \delta)$. Then, $Y_i$ is explicitly given
by (6.5b), and we have $Y_i = y_0 + O(h)$. As in (3.13) we now write the condition
$0 = g(\bar Y_i)$ as

$$0 = \int_0^1 g_y\bigl(y_0 + \tau(\bar Y_i - y_0)\bigr)\, d\tau \cdot \sum_{j=1}^{i} \hat a_{ij} f(Y_j, Z_j), \tag{6.6}$$

where $\bar Y_i$ has to be replaced by (6.5c). This is a nonlinear equation of the form
$F(Z_i, h) = 0$. Since $F(z_0, 0) = O(\delta)$ and

$$\frac{\partial F}{\partial Z_i}(z_0, 0) = \hat a_{ii}\, g_y(y_0) f_z(y_0, z_0),$$

it follows from the Implicit Function Theorem that (6.6) has a locally unique solution,
if (6.3) and the condition

$$\hat a_{ii} \ne 0 \quad \text{for all } i \tag{6.7}$$

hold. Moreover we have $Z_i = z_0 + O(h + \delta)$.


Error Propagation and Convergence. For inconsistent initial values we replace
the nonlinear equation in (6.5c) by $g(\bar Y_i) = g(y_0)$, so that the method is well-defined
in a whole neighbourhood of the solution manifold (observe that the above
existence result is still valid). Such an extension has the advantage that differentiation
with respect to initial values is possible. The method (6.5) with $z_1$ from
(6.5e), can thus be written as

$$y_{n+1} = y_n + h\, \Phi(y_n, z_n, h), \qquad z_{n+1} = \Psi(y_n, z_n, h) \tag{6.8}$$

with smooth functions $\Phi$ and $\Psi$. For the study of convergence and, in particular,
of the order conditions the triangular matrix

$$W = (w_{ij})_{i,j=1}^{s+1} = \begin{pmatrix} 1 & & & \\ \hat a_{21} & \hat a_{22} & & \\ \vdots & \vdots & \ddots & \\ \hat a_{s+1,1} & \hat a_{s+1,2} & \cdots & \hat a_{s+1,s+1} \end{pmatrix}^{-1} \tag{6.9}$$

will play an important role.

Lemma 6.1. Suppose that the method (6.5), satisfying (6.7), is written in the form
(6.8). If $g(y_0) = 0$ and $g_y(y_0) f(y_0, z_0) = O(h)$, it holds

$$\frac{\partial \Phi}{\partial z_0}(y_0, z_0, h) = O(h), \qquad \frac{\partial \Psi}{\partial z_0}(y_0, z_0, h) = w_{s+1,1}\, I + O(h),$$

where $w_{s+1,1}$ is given by (6.9).

Proof. From (6.5b) it follows that $\partial Y_i / \partial z_0 = O(h)$. Differentiation of (6.5c) with
respect to $z_0$ thus yields

$$\frac{\partial \bar Y_i}{\partial z_0} = h \sum_{j=1}^{i} \hat a_{ij}\, f_z(y_0, z_0) \frac{\partial Z_j}{\partial z_0} + O(h^2), \tag{6.10a}$$

$$0 = g_y(y_0) \frac{\partial \bar Y_i}{\partial z_0} + O(h^2). \tag{6.10b}$$

Inserting (6.10a) into (6.10b) and multiplying with the inverse of the matrix
$g_y(y_0) f_z(y_0, z_0)$ gives the relation

$$\sum_{j=1}^{i} \hat a_{ij} \frac{\partial Z_j}{\partial z_0} = O(h) \quad \text{for } i = 2, \ldots, s + 1.$$

The statement now follows from $Z_1 = z_0$, i.e., $\partial Z_1 / \partial z_0 = I$. □

Consider two pairs of initial values $(y_0, z_0)$, $(\hat y_0, \hat z_0)$, satisfying $g(y_0) = 0$,
$g(\hat y_0) = 0$, $g_y(y_0) f(y_0, z_0) = O(h)$, $g_y(\hat y_0) f(\hat y_0, \hat z_0) = O(h)$. It follows from
Lemma 6.1 that the differences $\Delta y_0 = y_0 - \hat y_0$, ... satisfy the recursion

$$\begin{pmatrix} \|\Delta y_1\| \\ \|\Delta z_1\| \end{pmatrix} \le \begin{pmatrix} 1 + O(h) & O(h^2) \\ O(1) & |w_{s+1,1}| + O(h) \end{pmatrix} \begin{pmatrix} \|\Delta y_0\| \\ \|\Delta z_0\| \end{pmatrix}. \tag{6.11}$$

The local error of the method (6.5) is defined as usual. We let $(y_1, z_1)$ be the
numerical approximation for initial values $(y(x), z(x))$ on the exact solution of
(6.1), and denote it by $\delta y_h(x) = y_1 - y(x + h)$, $\delta z_h(x) = z_1 - z(x + h)$.


Theorem 6.2 (Murua 1995). Consider the problem (6.1) with consistent initial
values. Suppose that (6.7) holds and that

$$|w_{s+1,1}| < 1, \tag{6.12}$$

where $w_{s+1,1}$ is given in (6.9). If the local error satisfies

$$\delta y_h(x) = O(h^{r+1}), \qquad \delta z_h(x) = O(h^m), \tag{6.13}$$

then we have for $x_n - x_0 \le Const$

$$y_n - y(x_n) = O(h^{\min(r,\, m+1)}), \qquad z_n - z(x_n) = O(h^{\min(r,\, m)}).$$

Proof. The recursion (6.11) allows us to apply Lemma VI.3.9 with $\varepsilon = h^2$ and
$\alpha = |w_{s+1,1}| + O(h)$. This shows that the contribution of the local error at $x_i$ to
the global error at $x_n$ is bounded by

$$C\bigl(\|\delta y_h(x_i)\| + h^2 \|\delta z_h(x_i)\|\bigr), \qquad C\bigl(\|\delta y_h(x_i)\| + (h^2 + \alpha^{n-i}) \|\delta z_h(x_i)\|\bigr),$$

for the $y$- and $z$-component, respectively. Summing up these contributions proves
the statement. □


Order Conditions. The order conditions for method (6.5) can be derived in the
same way as for Runge-Kutta methods (previous section). The only difference is
that at some places the coefficients $a_{ij}$ have to be replaced by $\hat a_{ij}$. Since $z_1 =
Z_{s+1}$, the order conditions for the $z$-component can be directly obtained from
Theorem 5.6. The result is the following:

Forming the Order Condition for a Given Tree. Attach to each vertex one summation
index. Then the left-hand side of the order condition is a sum over all indices
of a product with factors

$\hat a_{si}$ if "$i$" is the index of a meagre root;

$w_{s+1,i}$ if "$i$" is the index of a fat root;

$a_{ij}$ if the meagre vertex "$j$" lies directly above the meagre vertex "$i$";

$\hat a_{ij}$ if the meagre vertex "$j$" lies directly above the fat vertex "$i$";

$w_{ij}$ if the fat vertex "$j$" lies directly above the meagre vertex "$i$".

The right-hand side of the order condition is the inverse of the rational number $\gamma$,
defined in Theorem 5.6.

In order to satisfy the assumption (6.13) of the convergence theorem, the order
conditions have to be satisfied for trees $t \in DAT2_y$ with $\varrho(t) \le r$, and for trees
$u \in DAT2_z$ with $\varrho(u) \le m - 1$.

Construction of Methods. The trees of Sect. II.2 form a subset of the "index
2 trees" to be considered here. From the above construction principle it is clear
that the coefficients $a_{ij}$, $b_i := \hat a_{si}$ have to satisfy the classical order conditions of
Sect. II.2. It is therefore natural to take a known, explicit Runge-Kutta method of
a certain order and to determine $\hat a_{ij}$ in such a way that the remaining order conditions
are satisfied. Arnold (1995) and Murua (1995) have shown how half-explicit
methods, based on the Dormand & Prince pair of Table II.5.2, can be constructed.
Let us outline the main idea.

A significant simplification of the order conditions is obtained by requiring

$$\sum_{j=1}^{i} \hat a_{ij}\, c_j^{q-1} = \frac{\hat c_i^{\,q}}{q} \quad \text{for } i = 1, \ldots, s + 1, \tag{6.14}$$

where $c_i = \sum_j a_{ij}$ and $\hat c_i = \sum_j \hat a_{ij}$. For $i = 1$, the relation (6.14) is automatically
fulfilled because of $\hat a_{1j} = 0$. For $i > 1$, it can be satisfied for $q = 1$ (definition
of $\hat c_i$), $q = 2$, and $q = 3$. The simplification in the order conditions is similar to
that illustrated in Fig. II.5.2. By the definition of the matrix $W$, the relations of
Eq. (6.14) are equivalent to

$$\sum_{j=1}^{i} w_{ij}\, \hat c_j^{\,q} = q\, c_i^{q-1} \quad \text{for } i = 1, \ldots, s + 1. \tag{6.15}$$

This implies further reductions in the set of order conditions. The few remaining
ones can be treated in a straightforward manner. For further details and for the
coefficients of the resulting method we refer to the original article of Murua (1995).
They have been incorporated in the code PHEM56 (see Sect. VII.7).


Application to Constrained Mechanical Systems. Consider the system

$$q' = u \tag{6.16a}$$

$$M(q)\, u' = f(q, u) - G^T(q)\, \lambda \tag{6.16b}$$

$$0 = g(q), \tag{6.16c}$$

where $G(q) = g_q(q)$. Differentiating the constraint (6.16c) yields

$$0 = G(q)\, u. \tag{6.16d}$$

If $M(q)$ is invertible, the system (6.16a,b,d) is of the form (6.1) with $y = (q, u)$
and $z = \lambda$. The assumption (6.3) is equivalent to (1.47).

For this particular system the method (6.5) can be applied as follows: assume
that $Q_j$, $U_j$, $\Lambda_j$, and $U'_j = M(Q_j)^{-1}\bigl(f(Q_j, U_j) - G^T(Q_j) \Lambda_j\bigr)$ are already given
for $j = 1, \ldots, i - 1$. We then put

$$Q_i = q_0 + h \sum_{j=1}^{i-1} a_{ij} U_j, \qquad U_i = u_0 + h \sum_{j=1}^{i-1} a_{ij} U'_j,$$

and compute $\Lambda_i$, $U'_i$ from the system

$$\begin{pmatrix} M(Q_i) & G^T(Q_i) \\ G(\bar Q_i) & 0 \end{pmatrix} \begin{pmatrix} U'_i \\ \Lambda_i \end{pmatrix} = \begin{pmatrix} f(Q_i, U_i) \\ R_i \end{pmatrix}, \tag{6.17}$$

where $\bar Q_i = q_0 + h \sum_{j=1}^{i} \hat a_{ij} U_j$ and $R_i = -G(\bar Q_i)\bigl(u_0 + h \sum_{j=1}^{i-1} \hat a_{ij} U'_j\bigr) / (h \hat a_{ii})$
are known quantities. Hence, only linear systems of type (6.17) have to be solved.
This makes half-explicit methods very attractive for the numerical solution of constrained
mechanical systems. If necessary, this method can be combined with projections
as explained in Sect. VII.2, so that also the position constraint is satisfied
by the numerical approximation.
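The linear algebra behind one stage can be sketched in a few lines. This is an illustration under assumptions, not the implementation of any particular code: the function name `solve_stage` and the small numerical example (unit mass matrix, a constraint Jacobian of the form $G(q) = 2q^T$ evaluated at two nearby points, a gravity-like force, an arbitrary right-hand side $R_i$) are hypothetical. The sketch assembles the block matrix with $M(Q_i)$, $G^T(Q_i)$ in the first block row and the constraint Jacobian at the second evaluation point in the second block row, then solves for $U'_i$ and $\Lambda_i$.

```python
import numpy as np


def solve_stage(M, G_Q, G_Qbar, force, R):
    """Solve [[M, G_Q^T], [G_Qbar, 0]] [dU; lam] = [force; R] for one stage."""
    n, m = M.shape[0], G_Qbar.shape[0]
    K = np.block([[M, G_Q.T],
                  [G_Qbar, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([force, R]))
    return sol[:n], sol[n:]              # stage derivative U'_i, multiplier Lambda_i


# Hypothetical data for a single stage (n = 2 positions, m = 1 constraint).
M = np.eye(2)                            # mass matrix M(Q_i)
G_Q = np.array([[2.0, 0.0]])             # G at Q_i = (1, 0), assuming G(q) = 2 q^T
G_Qbar = np.array([[2.0, 0.2]])          # G at the second point (1, 0.1)
force = np.array([0.0, -9.81])           # f(Q_i, U_i)
R = np.array([0.3])                      # known right-hand side R_i

dU, lam = solve_stage(M, G_Q, G_Qbar, force, R)
print("U'_i =", dU, " Lambda_i =", lam)
```

No nonlinear iteration appears here, in line with the remark that only linear systems have to be solved; in practice one would reuse a factorization or exploit the structure of $M$ rather than form the full block matrix.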

We remark that the methods proposed by Arnold (1995) satisfy $\bar Q_i = Q_{i+1}$ for
$i \ge 2$, so that some $G$ evaluations can be saved in the computation of (6.17).

Extrapolation Methods


For nonstiff ordinary differential equations, the most efficient extrapolation algorithm
is the GBS method (see Sect. II.9). Lubich (1989) extends this method to
differential-algebraic equations of index 2.

Consider an initial value $y_0$ satisfying $g(y_0) = 0$. Then, an approximation
$S_h(x)$ to $y(x)$ (with $x = x_0 + 2mh$) is defined by

$$y_1 = y_0 + h f(y_0, z_0), \qquad g(y_1) = 0 \tag{6.18a}$$

$$y_{i+1} = y_{i-1} + 2h f(y_i, z_i), \qquad g(y_{i+1}) = 0, \qquad i = 1, \ldots, 2m \tag{6.18b}$$

$$S_h(x) = (y_{2m-1} + 2 y_{2m} + y_{2m+1}) / 4. \tag{6.18c}$$

The starting step is identical to the half-explicit Euler method, considered at the
beginning of this section. It is implicit in $z_0$ and explicit in $y_1$. For the case that
$f$ in Eq. (6.1) is linear in $z$, i.e.,

$$f(y, z) = f_0(y) + f_z(y)\, z, \tag{6.19}$$

we shall show below that the numerical approximations $S_h(x_0 + 2mh)$ and $z_{2m}$
possess an $h^2$-expansion. Hence, these values can be used as the basis of an extrapolation
method. The implementation is completely analogous to that for the GBS
method (choice of the step number sequence, order and step size control, dense
output, ...). Since the extrapolated values do not satisfy the constraint $g(y) = 0$,
it is recommended to project them onto this manifold (as explained in Sect. VII.2)
after every accepted step.

The assumption (6.19) is satisfied for many interesting problems, e.g., for the
constrained mechanical system (6.16a,b,d), where $z = \lambda$ plays the role of a Lagrange
multiplier.
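The base scheme (starting Euler step, constrained two-step recursion, smoothing step) can be sketched as follows. This is a hedged illustration on a hypothetical test problem that is not taken from the text: $y' = Jy + zy$, $g(y) = y^Ty - 1$ with exact solution $y(x) = (\cos x, \sin x)$, $z = 0$, which is linear in $z$. The helper names `constrained_step` and `base_scheme` are assumptions, and each scalar equation for $z_i$ is solved by Newton's method. Halving $h$ should reduce the error of $S_h$ by a factor close to 4 if the error behaves like $h^2$.

```python
import numpy as np

J = np.array([[0.0, -1.0],
              [1.0, 0.0]])


def f(y, z):
    """Hypothetical right-hand side, linear in z: f(y, z) = J y + z y."""
    return J @ y + z * y


def g(y):
    """Constraint g(y) = |y|^2 - 1."""
    return y @ y - 1.0


def constrained_step(v, u, h, z=0.0, tol=1e-14, max_iter=50):
    """Solve w = v + h f(u, z), g(w) = 0 for the scalar z by Newton's method."""
    for _ in range(max_iter):
        w = v + h * f(u, z)
        residual = g(w)
        if abs(residual) < tol:
            break
        z -= residual / (h * (2.0 * w @ u))   # g_y(w) f_z(u), with f_z(u) = u
    return v + h * f(u, z), z


def base_scheme(y0, h, m):
    """Return (S_h, y_2m) after 2m steps of size h."""
    ys = [y0, constrained_step(y0, y0, h)[0]]               # starting Euler step
    for i in range(1, 2 * m + 1):                           # two-step recursion
        ys.append(constrained_step(ys[i - 1], ys[i], 2 * h)[0])
    smoothed = (ys[2 * m - 1] + 2 * ys[2 * m] + ys[2 * m + 1]) / 4
    return smoothed, ys[2 * m]


y0 = np.array([1.0, 0.0])
exact = np.array([np.cos(1.0), np.sin(1.0)])
S_coarse, _ = base_scheme(y0, 0.02, 25)       # x = 2 m h = 1
S_fine, y_fine = base_scheme(y0, 0.01, 50)
err_coarse = np.linalg.norm(S_coarse - exact)
err_fine = np.linalg.norm(S_fine - exact)
print("errors:", err_coarse, err_fine, " ratio:", err_coarse / err_fine)
```

Note that $y_{2m}$ lies on the constraint manifold while the smoothed value $S_h$ does not, which is the reason for the recommended projection after each accepted step.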


Theorem 6.3 (Lubich 1989). Under the assumptions (6.3) and (6.19) the numerical
solution of method (6.18) possesses an asymptotic $h^2$-expansion

$$y_{2m} - y(x_{2m}) = a_2(x_{2m}) h^2 + \cdots + a_{2N}(x_{2m}) h^{2N} + O(h^{2N+2})$$

$$z_{2m} - z(x_{2m}) = b_2(x_{2m}) h^2 + b_4(x_{2m}) h^4 + \cdots + b_{2N}(x_{2m}) h^{2N} + O(h^{2N+2})$$

and another $h^2$-expansion for the error of $S_h(x_{2m})$.


The numerical solution $\{y_i\}$ of method (6.18) lies on the manifold defined by
$g(y) = 0$. In order to be able to apply the results and ideas of Sects. II.8 and II.9,
we extend the method (6.18) to arbitrary initial values as follows:

$$y_1 = y_0 + h f(y_0, z_0), \qquad g(y_1) = g(y_0) \tag{6.20a}$$

$$y_{i+1} = y_{i-1} + 2h f(y_i, z_i), \qquad g(y_{i+1}) = g(y_{i-1}), \qquad i = 1, \ldots, 2m. \tag{6.20b}$$

We further eliminate the $z$-variables: using the identity

$$g(y_{i+1}) - g(y_{i-1}) = \int_{-1}^{1} g_y\Bigl( \frac{y_{i+1} + y_{i-1}}{2} + \sigma\, \frac{y_{i+1} - y_{i-1}}{2} \Bigr)\, d\sigma \cdot \frac{y_{i+1} - y_{i-1}}{2},$$

Eq. (6.20b) becomes

$$0 = \int_{-1}^{1} g_y\Bigl( \frac{y_{i+1} + y_{i-1}}{2} + \sigma h f(y_i, z_i) \Bigr)\, d\sigma \cdot f(y_i, z_i). \tag{6.21}$$

By assumption (6.3) and the Implicit Function Theorem, Eq. (6.21) can be solved
for $z_i$ as a smooth function of $(y_{i+1} + y_{i-1})/2$, $y_i$, and $h$. Inserted into (6.20b)
we obtain a recursion of the type

$$y_{i+1} = y_{i-1} + 2h\, \Phi\bigl(y_i, (y_{i+1} + y_{i-1})/2, h\bigr). \tag{6.22}$$

The starting step (6.20a) can be rewritten in a similar way. We consider the more
general system

$$w = v + h f(u, z), \qquad g(w) = g(v), \tag{6.23}$$

where $u$, $v$, and $h$ are given. It can be written in the equivalent form

$$0 = \int_0^1 g_y\bigl(v + \tau h f(u, z)\bigr)\, d\tau \cdot f(u, z),$$

which yields $z$ as a smooth function of $u$, $v$, and $h$ (again by the Implicit Function
Theorem). Hence, the solution of (6.23) can be written as

$$w = v + h\, \Phi_0(u, v, h), \tag{6.24}$$

and the starting step (6.20a) becomes

$$y_1 = y_0 + h\, \Phi_0(y_0, y_0, h). \tag{6.25}$$

The crucial point of these reformulations is that the two-step method (6.22) and the
starting step (6.25) are not only defined on the manifold $g(y) = 0$, but on an open
neighbourhood of it. Therefore, the standard ODE theory can be applied. Results
for the method (6.22), (6.25) immediately carry over to the method (6.18), because
both methods are identical for initial values satisfying $g(y_0) = 0$.

Asymptotic Expansion for Symmetric Two-Step Methods. Motivated by the
above reformulations we consider the method

$$y_1 = y_0 + h\, \Phi_0(y_0, y_0, h) \tag{6.26a}$$

$$y_{i+1} = y_{i-1} + 2h\, \Phi\bigl(y_i, (y_{i+1} + y_{i-1})/2, h\bigr), \tag{6.26b}$$

where $\Phi_0$ and $\Phi$ are arbitrary, smooth increment functions. We assume that
$\Phi_0(y, y, 0) = \Phi(y, y, 0) = f(y)$, so that both methods are consistent with the ordinary
differential equation $y' = f(y)$. In order to get an $h^2$-expansion of the error,
the starting step (6.26a) has to be compatible with (6.26b) in the following sense:
for arbitrary $u_k$, $v_k$, the three values

$$y_{2k-1} := v_k - h\, \Phi_0(u_k, v_k, -h), \qquad y_{2k} := u_k, \qquad y_{2k+1} := v_k + h\, \Phi_0(u_k, v_k, h) \tag{6.27}$$

satisfy the recursion (6.26b).

Theorem 6.4. If the method (6.26) satisfies the compatibility condition (6.27), the
numerical approximations $y_{2m}$ and $(y_{2m+1} + y_{2m-1})/2$
have an asymptotic expansion in even powers of $h$.

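Proof. Inspired by Stetter's proof of Theorem II.9.2 we put $u_k := y_{2k}$, and let $v_k$
be the solution of

$$y_{2k+1} = v_k + h\, \Phi_0(u_k, v_k, h). \tag{6.28}$$

We thus get the one-step method in doubled dimension

$$\begin{pmatrix} u_{k+1} \\ v_{k+1} \end{pmatrix} = \begin{pmatrix} u_k \\ v_k \end{pmatrix} + h^* \begin{pmatrix} \Phi\bigl(y_{2k+1}, (u_{k+1} + u_k)/2, h^*/2\bigr) \\ \tfrac{1}{2}\bigl( \Phi_0(u_k, v_k, h^*/2) + \Phi_0(u_{k+1}, v_{k+1}, -h^*/2) \bigr) \end{pmatrix},$$

where $h^* = 2h$, and $y_{2k+1}$ is given by (6.28). The assumption (6.27) implies
that this one-step method is symmetric. Therefore, $y_{2m} = u_m$ and $v_m$ have an
asymptotic $h^2$-expansion (see Theorem II.8.10). From

$$(y_{2m+1} + y_{2m-1})/2 = v_m + \frac{h}{2} \bigl( \Phi_0(u_m, v_m, h) - \Phi_0(u_m, v_m, -h) \bigr)$$

it follows that the same is true for $(y_{2m+1} + y_{2m-1})/2$. □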



Proof of Theorem 6.3. We have already seen that the method (6.20) can be written
in the form (6.26). All that remains to do is to check the compatibility condition
(6.27). By definition of $\Phi_0(u, v, h)$ (see the equivalence of Eqs. (6.23) and (6.25))
we have

$$y_{2k-1} = v_k - h f(u_k, z^-), \qquad g(y_{2k-1}) = g(v_k),$$

$$y_{2k+1} = v_k + h f(u_k, z^+), \qquad g(y_{2k+1}) = g(v_k).$$

Since $f$ is linear in $z$, this implies (6.20b) with $z_{2k} = (z^- + z^+)/2$. The asymptotic
$h^2$-expansion of $y_{2m}$ and $S_h(x_{2m})$ thus follows from Theorem 6.4. From
(6.21) we then see that also $z_{2m}$ has an $h^2$-expansion. □


$\beta$-Blocked Multistep Methods

The convergence analysis of Sect. VII.3 shows that all roots of the $\sigma$-polynomial
of a multistep method must lie inside the unit disc in order to get a convergent
method of order $p$. This is a severe restriction and excludes, for example, all
explicit and implicit Adams methods. Arévalo, Führer & Söderlind (1995) suggest
a modification which allows the use of "nonstiff" multistep methods. The idea is
to treat different parts of the problem by different discretizations.

For the index 2 problem

$$y' = f_0(y) + f_z(y)\, z, \qquad 0 = g(y), \tag{6.29}$$

where $f(y, z) = f_0(y) + f_z(y)\, z$ depends linearly on $z$, we consider the discretization

$$\sum_{i=0}^{k} \alpha_i\, y_{n+i} = h \sum_{i=0}^{k} \beta_i\, f(y_{n+i}, z_{n+i}) - h f_z(y_{n+k}) \sum_{i=0}^{k} \tau_i\, z_{n+i}, \tag{6.30}$$

with $g(y_{n+k}) = 0$, and denote the generating polynomials by

$$\varrho(\zeta) = \sum_{i=0}^{k} \alpha_i \zeta^i, \qquad \sigma(\zeta) = \sum_{i=0}^{k} \beta_i \zeta^i, \qquad \tau(\zeta) = \sum_{i=0}^{k} \tau_i \zeta^i.$$

Theorem 6.5 (Arévalo, Führer & Söderlind 1996). Let the index 2 problem (6.29)
satisfy (6.3). Assume that the multistep method $(\varrho, \sigma)$ is stable and of order $p$
($p = k$ or $p = k + 1$), that $\tau(\zeta) = \gamma_k (\zeta - 1)^k$, and that all roots of $\sigma(\zeta) - \tau(\zeta)$ lie
inside the unit disc $|\zeta| < 1$. Then the global error satisfies for $x_n - x_0 \le Const$

$$y_n - y(x_n) = O(h^p), \qquad z_n - z(x_n) = O(h^k).$$

Proof. The special form of $\tau(\zeta)$ is equivalent to

$$\sum_{i=0}^{k} \tau_i\, z(x_n + ih) = O(h^k),$$

so that the newly added term in (6.30) is small. Moreover, this term is premultiplied
by $f_z(y_{n+k})$, so that the local error satisfies

$$\delta y_h(x) = O(h^{k+1}), \qquad P(x)\, \delta y_h(x) = O(h^{p+1}),$$

where $P(x)$ is the projector of Definition 4.3.

With these observations in mind, the convergence result is obtained along the
lines of the proof of Theorem 3.6. The only difference is that the coefficients $\beta_i$
have to be replaced by $\beta_i - \tau_i$ in Eqs. (3.43) and (3.44). □


In principle, one can take any convergent multistep method $(\varrho, \sigma)$ of order
$p = k$ or $p = k + 1$, and try to optimize the parameter $\gamma_k$ in $\tau(\zeta)$ in such a way that
the roots of $\sigma(\zeta) - \tau(\zeta)$ become small. The result, for the implicit Adams methods,
is rather disappointing. Only for $k \le 3$ is it possible to obtain convergent $\beta$-blocked
Adams methods (Arévalo, Führer & Söderlind (1996), see also Exercise 3).

Difference Corrected BDF. Consider the $(k + 1)$-step BDF method, defined in
Eq. (III.1.22'), and replace $\nabla^{k+1} y_{n+1}$ by $h \nabla^k f_{n+1}$. This leads to the so-called
difference corrected BDF

$$\sum_{j=1}^{k} \frac{1}{j} \nabla^j y_{n+1} = h \Bigl( f_{n+1} - \frac{1}{k+1} \nabla^k f_{n+1} \Bigr), \tag{6.31}$$

introduced by Söderlind (1989). Method (6.31) is a $k$-step method of order $p =
k + 1$. Its $\varrho$-polynomial is identical to that of the BDF method and $\sigma(\zeta) = \zeta^k -
(\zeta - 1)^k/(k + 1)$. With $\tau(\zeta) = -(\zeta - 1)^k/(k + 1)$ the difference $\sigma(\zeta) - \tau(\zeta)$ has
all roots equal to zero. This is therefore an ideal candidate for a method of type
(6.30).

Exercises 

1. Construct all half-explicit methods (6.5) of order 3 ($r = m = 3$ in Eq. (6.13))
with $s = 3$ stages. You can take c 2 , c 3 , a, c 2 , c 4 as free parameters.

Hint. Start with a classical Runge-Kutta method of order 3 (Exercise 4 of
Sect. II.1), and show that the order conditions imply (6.14) for $q = 2$.

2. Show that the method (IV.9.15) of Bader & Deuflhard (1983) is of the form
(6.26) with

$$\Phi(u, v, h) = f(u) - Ju + Jv,$$

$$\Phi_0(u, v, h) = (I - hJ)^{-1} \bigl( f(u) - Ju + Jv \bigr).$$

Check the assumption (6.27).

3. Let $(\varrho_k, \sigma_k)$ be the generating polynomials of the $k$-step implicit Adams methods
(Sect. III.1). For $k = 1, 2, \ldots, 10$ study numerically the function

$$R_k(\gamma) := \max\{ |\zeta^*| \,;\ \zeta^* \text{ is a root of } \sigma_k(\zeta) - \gamma (\zeta - 1)^k = 0 \}.$$

For which values of $k$ is it possible to find $\gamma$ with $R_k(\gamma) < 1$?

Dynamics of multibody systems is of great importance in the
fields of robotics, biomechanics, spacecraft control, road and rail
vehicle design, and dynamics of machinery.

(W. Schiehlen 1990) 


After having seen several different approaches for the numerical solution of con¬ 
strained mechanical systems, we are interested in their efficiency when applied 
to a concrete situation. We consider two particular multibody mechanisms with 
constraints, one nonstiff and one stiff. General references for the computation of 
mechanical systems are Haug (1989) and Roberson & Schwertassek (1988). 

Description of the Model 

We first consider “Andrews’ squeezer mechanism”, which has become prominent 
through the work of Giles (1978) and Manning (1981), who promoted it as a test 
example for numerical codes; see also Ormrod & Andrews (1986). It consists of 7 
rigid bodies connected by joints without friction in plane motion. It is represented 
in Fig. 7.1, which we have copied (with permission) from the book of Schiehlen 
(1990). The numerical constants, also taken from Schiehlen (1990), are displayed 
in Tables 7.1 and 7.2. The arrows in the right picture of Fig. 7.1 indicate the
positions of the centres of gravity $C_1, \ldots, C_7$. In Table 7.1 the spring coefficient of the
spring connecting the point D with C is denoted by $c_0$ and the unstretched length
is $\ell_0$. We suppose that the mechanism is driven by a motor, located at O, whose
constant drive torque is given by mom = 0.033. The coordinate origin is the point
O in Fig. 7.1 and the coordinates of the other fixed points A, B and C are given
by

$$\begin{pmatrix} \mathit{xa} \\ \mathit{ya} \end{pmatrix} = \begin{pmatrix} -0.06934 \\ -0.00227 \end{pmatrix}, \qquad
\begin{pmatrix} \mathit{xb} \\ \mathit{yb} \end{pmatrix} = \begin{pmatrix} -0.03635 \\ 0.03273 \end{pmatrix}, \qquad
\begin{pmatrix} \mathit{xc} \\ \mathit{yc} \end{pmatrix} = \begin{pmatrix} 0.014 \\ 0.072 \end{pmatrix}. \tag{7.1}$$



Table 7.1. Geometrical parameters

    d  = 0.028       da  = 0.0115      e   = 0.02
    ea = 0.01421     zf  = 0.02        fa  = 0.01421
    rr = 0.007       ra  = 0.00092     ss  = 0.035
    sa = 0.01874     sb  = 0.01043     sc  = 0.018
    sd = 0.02        zt  = 0.04        ta  = 0.02308
    tb = 0.00916     u   = 0.04        ua  = 0.01228
    ub = 0.00449     c_0 = 4530        ℓ_0 = 0.07785


Table 7.2. Parameters of the 7 bodies

    No.    masses m_1 to m_7    inertias I_1 to I_7
     1         0.04325             2.194 · 10^-6
     2         0.00365             4.410 · 10^-7
     3         0.02373             5.255 · 10^-6
     4         0.00706             5.667 · 10^-7
     5         0.07050             1.169 · 10^-5
     6         0.00706             5.667 · 10^-7
     7         0.05498             1.912 · 10^-5


In order to derive the equations of motion we use the angles (see Fig. 7.1)

$$q_1 = \beta, \quad q_2 = \Theta, \quad q_3 = \gamma, \quad q_4 = \Phi, \quad q_5 = \delta, \quad q_6 = \Omega, \quad q_7 = \varepsilon, \tag{7.2}$$

as position coordinates for the mechanical system. If $(x_j, y_j)$ are the cartesian
coordinates of the centre of gravity $C_j$ ($j = 1, \ldots, 7$), the kinetic energy of the
multibody system is

$$T = \sum_{j=1}^{7} m_j\,\frac{\dot x_j^2 + \dot y_j^2}{2} + \sum_{j=1}^{7} I_j\,\frac{\dot\omega_j^2}{2}, \tag{7.3}$$

where $\omega_j$ is the total angle of rotation of the $j$th body and $m_j$, $I_j$ are constants
given in Table 7.2. The values of $x_j$, $y_j$, $\dot x_j^2 + \dot y_j^2$ and $\omega_j$ can be obtained in terms
of (7.2) by simple geometry (see Fig. 7.1):

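$$\begin{aligned}
C_1:\quad & x_1 = \mathit{ra}\cos\beta \\
& y_1 = \mathit{ra}\sin\beta \\
& \dot x_1^2 + \dot y_1^2 = \mathit{ra}^2\,\dot\beta^2 \\
& \omega_1 = \beta
\end{aligned}$$

$$\begin{aligned}
C_2:\quad & x_2 = \mathit{rr}\cos\beta - \mathit{da}\cos(\beta+\Theta) \\
& y_2 = \mathit{rr}\sin\beta - \mathit{da}\sin(\beta+\Theta) \\
& \dot x_2^2 + \dot y_2^2 = (\mathit{rr}^2 - 2\,\mathit{da}\,\mathit{rr}\cos\Theta + \mathit{da}^2)\,\dot\beta^2 \\
& \qquad\qquad + 2\,(-\mathit{rr}\,\mathit{da}\cos\Theta + \mathit{da}^2)\,\dot\beta\,\dot\Theta + \mathit{da}^2\,\dot\Theta^2 \\
& \omega_2 = \beta + \Theta
\end{aligned}$$

$$\begin{aligned}
C_3:\quad & x_3 = \mathit{xb} + \mathit{sa}\sin\gamma + \mathit{sb}\cos\gamma \\
& y_3 = \mathit{yb} - \mathit{sa}\cos\gamma + \mathit{sb}\sin\gamma \\
& \dot x_3^2 + \dot y_3^2 = (\mathit{sa}^2 + \mathit{sb}^2)\,\dot\gamma^2 \\
& \omega_3 = \gamma
\end{aligned}$$

$$\begin{aligned}
C_4:\quad & x_4 = \mathit{xa} + \mathit{zt}\cos\delta + (e - \mathit{ea})\sin(\Phi+\delta) \\
& y_4 = \mathit{ya} + \mathit{zt}\sin\delta - (e - \mathit{ea})\cos(\Phi+\delta) \\
& \dot x_4^2 + \dot y_4^2 = (e-\mathit{ea})^2\,\dot\Phi^2 + 2\,\bigl((e-\mathit{ea})^2 + \mathit{zt}\,(e-\mathit{ea})\sin\Phi\bigr)\,\dot\Phi\,\dot\delta \\
& \qquad\qquad + \bigl(\mathit{zt}^2 + 2\,\mathit{zt}\,(e-\mathit{ea})\sin\Phi + (e-\mathit{ea})^2\bigr)\,\dot\delta^2 \\
& \omega_4 = \Phi + \delta
\end{aligned}$$

$$\begin{aligned}
C_5:\quad & x_5 = \mathit{xa} + \mathit{ta}\cos\delta - \mathit{tb}\sin\delta \\
& y_5 = \mathit{ya} + \mathit{ta}\sin\delta + \mathit{tb}\cos\delta \\
& \dot x_5^2 + \dot y_5^2 = (\mathit{ta}^2 + \mathit{tb}^2)\,\dot\delta^2 \\
& \omega_5 = \delta
\end{aligned}$$

$$\begin{aligned}
C_6:\quad & x_6 = \mathit{xa} + u\sin\varepsilon + (\mathit{zf} - \mathit{fa})\cos(\Omega+\varepsilon) \\
& y_6 = \mathit{ya} - u\cos\varepsilon + (\mathit{zf} - \mathit{fa})\sin(\Omega+\varepsilon) \\
& \dot x_6^2 + \dot y_6^2 = (\mathit{zf}-\mathit{fa})^2\,\dot\Omega^2 + 2\,\bigl((\mathit{zf}-\mathit{fa})^2 - u\,(\mathit{zf}-\mathit{fa})\sin\Omega\bigr)\,\dot\Omega\,\dot\varepsilon \\
& \qquad\qquad + \bigl((\mathit{zf}-\mathit{fa})^2 - 2\,u\,(\mathit{zf}-\mathit{fa})\sin\Omega + u^2\bigr)\,\dot\varepsilon^2 \\
& \omega_6 = \Omega + \varepsilon
\end{aligned}$$

$$\begin{aligned}
C_7:\quad & x_7 = \mathit{xa} + \mathit{ua}\sin\varepsilon - \mathit{ub}\cos\varepsilon \\
& y_7 = \mathit{ya} - \mathit{ua}\cos\varepsilon - \mathit{ub}\sin\varepsilon \\
& \dot x_7^2 + \dot y_7^2 = (\mathit{ua}^2 + \mathit{ub}^2)\,\dot\varepsilon^2 \\
& \omega_7 = \varepsilon
\end{aligned}$$

The potential energy of the system is due to the motor at the origin and to the spring
connecting the point D with C. By Hooke’s law it is

$$U = -\mathit{mom}\cdot\beta + c_0\,\frac{(\ell - \ell_0)^2}{2}, \tag{7.4}$$

where $\ell$ is the distance between D and C, namely

$$\begin{aligned}
\ell &= \sqrt{(\mathit{xd} - \mathit{xc})^2 + (\mathit{yd} - \mathit{yc})^2} \\
\mathit{xd} &= \mathit{xb} + \mathit{sc}\sin\gamma + \mathit{sd}\cos\gamma \\
\mathit{yd} &= \mathit{yb} - \mathit{sc}\cos\gamma + \mathit{sd}\sin\gamma.
\end{aligned}$$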

Finally, we have to formulate the algebraic constraints. The mechanism contains
three loops. The first loop connects O with B via $K_1$, $K_2$, $K_3$; the other two loops
connect O with A, one via $K_1$, $K_2$, $K_4$, $K_5$, the other via $K_1$, $K_2$, $K_6$, $K_7$. For
each loop we get two algebraic conditions:

$$\begin{aligned}
\mathit{rr}\cos\beta - d\cos(\beta+\Theta) - \mathit{ss}\sin\gamma &= \mathit{xb} \\
\mathit{rr}\sin\beta - d\sin(\beta+\Theta) + \mathit{ss}\cos\gamma &= \mathit{yb} \\
\mathit{rr}\cos\beta - d\cos(\beta+\Theta) - e\sin(\Phi+\delta) - \mathit{zt}\cos\delta &= \mathit{xa} \\
\mathit{rr}\sin\beta - d\sin(\beta+\Theta) + e\cos(\Phi+\delta) - \mathit{zt}\sin\delta &= \mathit{ya} \\
\mathit{rr}\cos\beta - d\cos(\beta+\Theta) - \mathit{zf}\cos(\Omega+\varepsilon) - u\sin\varepsilon &= \mathit{xa} \\
\mathit{rr}\sin\beta - d\sin(\beta+\Theta) - \mathit{zf}\sin(\Omega+\varepsilon) + u\cos\varepsilon &= \mathit{ya}.
\end{aligned} \tag{7.5}$$

With the position coordinates $q$ from (7.2) the equations (7.5) represent the
constraint $g(q) = 0$ where $g : \mathbb{R}^7 \to \mathbb{R}^6$. Together with the kinetic energy $T$ of (7.3),
the potential energy $U$ of (7.4) and $L = T - U - \lambda_1 g_1 - \ldots - \lambda_6 g_6$ the equations
of motion (1.46) are fully determined.


Fortran Subroutines 

For the reader’s convenience we include the essential parts of the FORTRAN
subroutines describing the differential-algebraic problem. The equations of motion are
of the form

$$M(q)\,\ddot q = f(q,\dot q) - G^T(q)\,\lambda \tag{7.6a}$$

$$0 = g(q) \tag{7.6b}$$

where $q \in \mathbb{R}^7$ is the vector defined in (7.2) and $\lambda \in \mathbb{R}^6$. In the following
description the variables Q(1),...,Q(7) correspond to $\beta, \Theta, \gamma, \Phi, \delta, \Omega, \varepsilon$ (exactly as in (7.2)) and
QP(1),...,QP(7) to their derivatives $\dot\beta, \ldots, \dot\varepsilon$. In all subroutines we have used
the abbreviations


      SIBE = SIN (Q(1))
      COBE = COS (Q(1))
      SITH = SIN (Q(2))
      COTH = COS (Q(2))
      SIGA = SIN (Q(3))
      COGA = COS (Q(3))
      SIPH = SIN (Q(4))
      COPH = COS (Q(4))
      SIDE = SIN (Q(5))
      CODE = COS (Q(5))
      SIOM = SIN (Q(6))
      COOM = COS (Q(6))
      SIEP = SIN (Q(7))
      COEP = COS (Q(7))
      SIBETH = SIN (Q(1)+Q(2))
      COBETH = COS (Q(1)+Q(2))
      SIPHDE = SIN (Q(4)+Q(5))
      COPHDE = COS (Q(4)+Q(5))
      SIOMEP = SIN (Q(6)+Q(7))
      COOMEP = COS (Q(6)+Q(7))
      BEP = QP(1)
      THP = QP(2)
      PHP = QP(4)
      DEP = QP(5)
      OMP = QP(6)
      EPP = QP(7)


The remaining parameters XA,YA,...,D,DA,E,EA,...,M1,I1,M2,... are those
of (7.1) and Tables 7.1 and 7.2. They usually reside in a COMMON block. The
elements of $M(q)$ in (7.6) are given by

$$m_{ij} = \frac{\partial^2 L}{\partial\dot q_i\,\partial\dot q_j} = \frac{\partial^2 T}{\partial\dot q_i\,\partial\dot q_j}.$$

This matrix is symmetric and (due to the special arrangement of the coordinates)
tridiagonal. The non-zero elements (on and below the diagonal) are


      M(1,1) = M1*RA**2 + M2*(RR**2-2*DA*RR*COTH+DA**2) + I1 + I2
      M(2,1) = M2*(DA**2-DA*RR*COTH) + I2
      M(2,2) = M2*DA**2 + I2
      M(3,3) = M3*(SA**2+SB**2) + I3
      M(4,4) = M4*(E-EA)**2 + I4
      M(5,4) = M4*((E-EA)**2+ZT*(E-EA)*SIPH) + I4
      M(5,5) = M4*(ZT**2+2*ZT*(E-EA)*SIPH+(E-EA)**2) + M5*(TA**2+TB**2)
     +         + I4 + I5
      M(6,6) = M6*(ZF-FA)**2 + I6
      M(7,6) = M6*((ZF-FA)**2-U*(ZF-FA)*SIOM) + I6
      M(7,7) = M6*((ZF-FA)**2-2*U*(ZF-FA)*SIOM+U**2) + M7*(UA**2+UB**2)
     +         + I6 + I7

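The $i$th component of the function $f$ in (7.6) is defined by

$$f_i(q,\dot q) = \frac{\partial (T-U)}{\partial q_i} - \sum_{j=1}^{7} \frac{\partial^2 (T-U)}{\partial\dot q_i\,\partial q_j}\,\dot q_j .$$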

Written as FORTRAN statements we have 

      XD = SD*COGA + SC*SIGA + XB
      YD = SD*SIGA - SC*COGA + YB
      LANG = SQRT ((XD-XC)**2 + (YD-YC)**2)
      FORCE = - C0 * (LANG - L0)/LANG
      FX = FORCE * (XD-XC)
      FY = FORCE * (YD-YC)
      F(1) = MOM - M2*DA*RR*THP*(THP+2*BEP)*SITH
      F(2) = M2*DA*RR*BEP**2*SITH
      F(3) = FX*(SC*COGA - SD*SIGA) + FY*(SD*COGA + SC*SIGA)
      F(4) = M4*ZT*(E-EA)*DEP**2*COPH
      F(5) = - M4*ZT*(E-EA)*PHP*(PHP+2*DEP)*COPH
      F(6) = - M6*U*(ZF-FA)*EPP**2*COOM
      F(7) = M6*U*(ZF-FA)*OMP*(OMP+2*EPP)*COOM

The algebraic constraints $g(q) = 0$ are given by the following six equations (see
(7.5))

      G(1) = RR*COBE - D*COBETH - SS*SIGA - XB
      G(2) = RR*SIBE - D*SIBETH + SS*COGA - YB
      G(3) = RR*COBE - D*COBETH - E*SIPHDE - ZT*CODE - XA
      G(4) = RR*SIBE - D*SIBETH + E*COPHDE - ZT*SIDE - YA
      G(5) = RR*COBE - D*COBETH - ZF*COOMEP - U*SIEP - XA
      G(6) = RR*SIBE - D*SIBETH - ZF*SIOMEP + U*COEP - YA

And here is the Jacobian matrix $G(q) = g_q(q)$. The non-zero entries of this 6×7
array are

      GQ(1,1) = - RR*SIBE + D*SIBETH
      GQ(1,2) = D*SIBETH
      GQ(1,3) = - SS*COGA
      GQ(2,1) = RR*COBE - D*COBETH
      GQ(2,2) = - D*COBETH
      GQ(2,3) = - SS*SIGA
      GQ(3,1) = - RR*SIBE + D*SIBETH
      GQ(3,2) = D*SIBETH
      GQ(3,4) = - E*COPHDE
      GQ(3,5) = - E*COPHDE + ZT*SIDE
      GQ(4,1) = RR*COBE - D*COBETH
      GQ(4,2) = - D*COBETH
      GQ(4,4) = - E*SIPHDE
      GQ(4,5) = - E*SIPHDE - ZT*CODE
      GQ(5,1) = - RR*SIBE + D*SIBETH
      GQ(5,2) = D*SIBETH
      GQ(5,6) = ZF*SIOMEP
      GQ(5,7) = ZF*SIOMEP - U*COEP
      GQ(6,1) = RR*COBE - D*COBETH
      GQ(6,2) = - D*COBETH
      GQ(6,6) = - ZF*COOMEP
      GQ(6,7) = - ZF*COOMEP - U*SIEP


If we apply a numerical method to the index 1 formulation of the system, we also
need the expression $g_{qq}(q)(\dot q, \dot q)$. With V(1),...,V(7) denoting the components
of $\dot q$, it is given by

      GQQ(1) = - RR*COBE*V(1)**2 + D*COBETH*(V(1)+V(2))**2 +
     +         SS*SIGA*V(3)**2
      GQQ(2) = - RR*SIBE*V(1)**2 + D*SIBETH*(V(1)+V(2))**2 -
     +         SS*COGA*V(3)**2
      GQQ(3) = - RR*COBE*V(1)**2 + D*COBETH*(V(1)+V(2))**2 +
     +         E*SIPHDE*(V(4)+V(5))**2 + ZT*CODE*V(5)**2
      GQQ(4) = - RR*SIBE*V(1)**2 + D*SIBETH*(V(1)+V(2))**2 -
     +         E*COPHDE*(V(4)+V(5))**2 + ZT*SIDE*V(5)**2
      GQQ(5) = - RR*COBE*V(1)**2 + D*COBETH*(V(1)+V(2))**2 +
     +         ZF*COOMEP*(V(6)+V(7))**2 + U*SIEP*V(7)**2
      GQQ(6) = - RR*SIBE*V(1)**2 + D*SIBETH*(V(1)+V(2))**2 +
     +         ZF*SIOMEP*(V(6)+V(7))**2 - U*COEP*V(7)**2


Computation of Consistent Initial Values 


We first compute a solution of $g(q) = 0$. Since $g$ consists of 6 equations in 7
unknowns we can fix one of them arbitrarily, say $\Theta(0) = 0$, and compute the
remaining coordinates by Newton iterations. This gives

$$\begin{aligned}
\beta(0) &= -0.0617138900142764496358948458001 \\
\gamma(0) &= \phantom{-}0.455279819163070380255912382449 \\
\Phi(0) &= \phantom{-}0.222668390165885884674473185609 \\
\delta(0) &= \phantom{-}0.487364979543842550225598953530 \\
\Omega(0) &= -0.222668390165885884674473185609 \\
\varepsilon(0) &= \phantom{-}1.23054744454982119249735015568.
\end{aligned} \tag{7.7}$$
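The computation of (7.7) can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it takes the constraints (7.5) with the constants of (7.1) and Table 7.1, holds Θ fixed, and runs Newton's method on the remaining six angles. The names `g`, `solve` and `consistent_positions` are hypothetical, and the Jacobian is approximated by central differences (an assumption made here for brevity; the analytic entries GQ(i,j) listed above could be used instead).

```python
import math

# Geometry taken from (7.1) and Table 7.1.
XA, YA = -0.06934, -0.00227
XB, YB = -0.03635, 0.03273
D, E, RR, SS, U, ZF, ZT = 0.028, 0.02, 0.007, 0.035, 0.04, 0.02, 0.04


def g(q):
    """Residuals of the six loop-closure constraints (7.5)."""
    be, th, ga, ph, de, om, ep = q
    cx = RR * math.cos(be) - D * math.cos(be + th)  # x-part shared by all loops
    sy = RR * math.sin(be) - D * math.sin(be + th)  # y-part shared by all loops
    return [
        cx - SS * math.sin(ga) - XB,
        sy + SS * math.cos(ga) - YB,
        cx - E * math.sin(ph + de) - ZT * math.cos(de) - XA,
        sy + E * math.cos(ph + de) - ZT * math.sin(de) - YA,
        cx - ZF * math.cos(om + ep) - U * math.sin(ep) - XA,
        sy - ZF * math.sin(om + ep) + U * math.cos(ep) - YA,
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def consistent_positions(q_guess, fixed=1, tol=1e-13, maxit=25, h=1e-7):
    """Newton iteration for g(q) = 0 with the coordinate q[fixed] held constant."""
    q = list(q_guess)
    free = [j for j in range(7) if j != fixed]
    for _ in range(maxit):
        r = g(q)
        if max(abs(x) for x in r) < tol:
            break
        jac = [[0.0] * 6 for _ in range(6)]
        for c, j in enumerate(free):  # central differences, column by column
            qp, qm = q[:], q[:]
            qp[j] += h
            qm[j] -= h
            gp, gm = g(qp), g(qm)
            for i in range(6):
                jac[i][c] = (gp[i] - gm[i]) / (2 * h)
        dq = solve(jac, [-x for x in r])
        for c, j in enumerate(free):
            q[j] += dq[c]
    return q


if __name__ == "__main__":
    # Rough starting guess (an assumption); theta = q[1] stays at 0.
    print(consistent_positions([-0.06, 0.0, 0.45, 0.22, 0.49, -0.22, 1.23]))
```

In double precision only the leading 15 to 16 digits of (7.7) can be reproduced this way; the 30-digit values in the text require extended precision.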
The condition $G(q)\dot q = 0$ is satisfied if we put

$$\dot\beta(0) = \dot\Theta(0) = \dot\gamma(0) = \dot\Phi(0) = \dot\delta(0) = \dot\Omega(0) = \dot\varepsilon(0) = 0. \tag{7.8}$$

The values of $\lambda(0)$ and $\ddot q(0)$ are then uniquely determined by (7.6a) and the twice
differentiated constraint $0 = g_{qq}(q)(\dot q, \dot q) + G(q)\ddot q$. We just have to solve a linear
system with the matrix

$$\begin{pmatrix} M(q) & G^T(q) \\ G(q) & 0 \end{pmatrix}. \tag{7.9}$$

Observe that $g_{qq}$ need not be evaluated, because $\dot q(0) = 0$. Due to the choice
$\Theta(0) = 0$ most components of $\lambda(0)$ and $\ddot q(0)$ vanish. Only the first two of these
are different from zero and given by

$$\begin{aligned}
\ddot\beta(0) &= 14222.4439199541138705911625887 \\
\ddot\Theta(0) &= -10666.8329399655854029433719415 \\
\lambda_1(0) &= 98.5668703962410896057654982170 \\
\lambda_2(0) &= -6.12268834425566265503114393122.
\end{aligned} \tag{7.10}$$

The solution of this seven body mechanism is plotted (mod $2\pi$) in Fig. 7.2 for
$0 \le t \le 0.03$.
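As a cross-check of (7.10), the linear system with the matrix (7.9) can be assembled from the FORTRAN expressions for M, GQ and F given above. The sketch below is an illustrative transliteration, not the authors' code; the function names (`mass`, `jac`, `force_at_rest`, `initial_accelerations`) are hypothetical, the force routine covers only the special case of vanishing velocities (7.8), and the dense solver stands in for any LU routine.

```python
import math

# Constants from (7.1), Tables 7.1 and 7.2.
XA, YA, XB, YB, XC, YC = -0.06934, -0.00227, -0.03635, 0.03273, 0.014, 0.072
D, DA, E, EA, RR, RA = 0.028, 0.0115, 0.02, 0.01421, 0.007, 0.00092
SS, SA, SB, SC, SD = 0.035, 0.01874, 0.01043, 0.018, 0.02
ZF, FA, ZT, TA, TB = 0.02, 0.01421, 0.04, 0.02308, 0.00916
U, UA, UB, C0, L0, MOM = 0.04, 0.01228, 0.00449, 4530.0, 0.07785, 0.033
M1, M2, M3, M4, M5, M6, M7 = 0.04325, 0.00365, 0.02373, 0.00706, 0.07050, 0.00706, 0.05498
I1, I2, I3, I4, I5, I6, I7 = 2.194e-6, 4.410e-7, 5.255e-6, 5.667e-7, 1.169e-5, 5.667e-7, 1.912e-5
Q0 = [-0.0617138900142764496358948458001, 0.0, 0.455279819163070380255912382449,
      0.222668390165885884674473185609, 0.487364979543842550225598953530,
      -0.222668390165885884674473185609, 1.23054744454982119249735015568]


def mass(q):
    """Symmetric tridiagonal mass matrix M(q)."""
    cth, sph, som = math.cos(q[1]), math.sin(q[3]), math.sin(q[5])
    m = [[0.0] * 7 for _ in range(7)]
    m[0][0] = M1*RA**2 + M2*(RR**2 - 2*DA*RR*cth + DA**2) + I1 + I2
    m[1][0] = m[0][1] = M2*(DA**2 - DA*RR*cth) + I2
    m[1][1] = M2*DA**2 + I2
    m[2][2] = M3*(SA**2 + SB**2) + I3
    m[3][3] = M4*(E - EA)**2 + I4
    m[4][3] = m[3][4] = M4*((E - EA)**2 + ZT*(E - EA)*sph) + I4
    m[4][4] = M4*(ZT**2 + 2*ZT*(E - EA)*sph + (E - EA)**2) + M5*(TA**2 + TB**2) + I4 + I5
    m[5][5] = M6*(ZF - FA)**2 + I6
    m[6][5] = m[5][6] = M6*((ZF - FA)**2 - U*(ZF - FA)*som) + I6
    m[6][6] = M6*((ZF - FA)**2 - 2*U*(ZF - FA)*som + U**2) + M7*(UA**2 + UB**2) + I6 + I7
    return m


def jac(q):
    """Constraint Jacobian G(q) = g_q(q), a 6 x 7 array."""
    be, th, ga, ph, de, om, ep = q
    sbt, cbt = math.sin(be + th), math.cos(be + th)
    gq = [[0.0] * 7 for _ in range(6)]
    for r in (0, 2, 4):  # x-rows r, y-rows r + 1 share the beta/theta columns
        gq[r][0], gq[r][1] = -RR*math.sin(be) + D*sbt, D*sbt
        gq[r + 1][0], gq[r + 1][1] = RR*math.cos(be) - D*cbt, -D*cbt
    gq[0][2], gq[1][2] = -SS*math.cos(ga), -SS*math.sin(ga)
    gq[2][3], gq[2][4] = -E*math.cos(ph + de), -E*math.cos(ph + de) + ZT*math.sin(de)
    gq[3][3], gq[3][4] = -E*math.sin(ph + de), -E*math.sin(ph + de) - ZT*math.cos(de)
    gq[4][5], gq[4][6] = ZF*math.sin(om + ep), ZF*math.sin(om + ep) - U*math.cos(ep)
    gq[5][5], gq[5][6] = -ZF*math.cos(om + ep), -ZF*math.cos(om + ep) - U*math.sin(ep)
    return gq


def force_at_rest(q):
    """f(q, 0): only the motor torque and the spring term survive."""
    ga = q[2]
    xd = SD*math.cos(ga) + SC*math.sin(ga) + XB
    yd = SD*math.sin(ga) - SC*math.cos(ga) + YB
    lang = math.hypot(xd - XC, yd - YC)
    force = -C0*(lang - L0)/lang
    fx, fy = force*(xd - XC), force*(yd - YC)
    f = [0.0] * 7
    f[0] = MOM
    f[2] = fx*(SC*math.cos(ga) - SD*math.sin(ga)) + fy*(SD*math.cos(ga) + SC*math.sin(ga))
    return f


def solve(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j]*x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def initial_accelerations(q):
    """Solve [M G^T; G 0] (qdd, lam) = (f, 0), valid for zero velocities."""
    m, gq, f = mass(q), jac(q), force_at_rest(q)
    k = [m[i] + [gq[r][i] for r in range(6)] for i in range(7)]
    k += [gq[r] + [0.0] * 6 for r in range(6)]
    sol = solve(k, f + [0.0] * 6)
    return sol[:7], sol[7:]


if __name__ == "__main__":
    qdd, lam = initial_accelerations(Q0)
    print(qdd[:2], lam[:2])
```

The second block of right-hand sides is zero only because $g_{qq}(q)(\dot q,\dot q)$ vanishes for $\dot q(0) = 0$; for general velocities it would have to be replaced by $-g_{qq}(q)(\dot q,\dot q)$.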


Numerical Computations 

We first transform (7.6) into a first order system by introducing the new variable
$v = \dot q$. Our codes apply only to problems where the derivative is multiplied by a
constant matrix. We therefore also consider $w = \ddot q$ as a variable so that (7.6a)
becomes an algebraic relation. The various formulations of the problem, as discussed
in Sect. VII.1, are now as follows:

Index 3 Formulation. With $v = \dot q$ and $w = \ddot q$ the system (7.6) can be written as

$$\dot q = v \tag{7.11a}$$

$$\dot v = w \tag{7.11b}$$

$$0 = M(q)\,w - f(q,v) + G^T(q)\,\lambda \tag{7.11c}$$

$$0 = g(q). \tag{7.11d}$$

Index 2 Formulation. If we differentiate $0 = g(q)$ once and replace (7.11d) by

$$0 = G(q)\,v, \tag{7.11e}$$

we get an index 2 problem which is mathematically equivalent to (7.6).

Index 1 Formulation. One more differentiation of (7.11e) yields

$$0 = g_{qq}(q)(v,v) + G(q)\,w, \tag{7.11f}$$

so that (7.11a,b,c,f) constitutes an index 1 problem.


We have applied several codes with many different tolerances between $10^{-2}$
and $10^{-10}$ to these formulations. The results are given in Fig. 7.3. We have plotted
the computing time (on a SUN Sparc 20 workstation) against the error of the
$(q, v)$-components at $x_{\mathrm{end}} = 0.03$ (in double logarithmic scale).



Fig. 7.3. Work-precision diagram 


Explicit Runge-Kutta Methods. The index 1 formulation allows us to apply
explicit methods such as DOPRI5 or DOP853 of Volume I. For this we have written a
function subroutine which solves in each call the linear system (7.11c,f) for $w$ and
$\lambda$ and inserts the result into (7.11a,b). Since there is no stiffness in the obtained
differential equation for $(q, v)$, it is not surprising that here the explicit codes work
very efficiently (Fig. 7.3).

In order to avoid the drift-off phenomenon (see Sect. VII.2), we have also
combined this method with projections onto the solution manifold. This can be
implemented conveniently with the help of the subroutine SOLOUT, which is called by
DOPRI5 after every successful step (set IRTRN = 2 in order to indicate that the
numerical approximation has been altered). The full projection (on position and
velocity level, (7.11d) and (7.11e)) is slightly more expensive than velocity
stabilization alone (denoted by DOPRI5_VEL in Fig. 7.3) and does not give improved
results. The first picture of Fig. 7.4 shows the results of the three different
implementations: the ‘standard’ approach is without any projection, ‘velocity’ means
that we perform only velocity stabilization, and ‘position’ indicates that we do
consecutive projections on the position and velocity level. We see that velocity
stabilization gives the best results concerning achieved accuracy and computing
time.
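The idea of velocity stabilization can be illustrated on a toy problem. The sketch below is not the SOLOUT routine used for the experiments; it assumes a single constraint and a unit mass matrix, so that the projection onto $G(q)v = 0$ is the orthogonal one, $v = \hat v - G^T (G G^T)^{-1} G \hat v$. The function name `project_velocity` and the test data (a point on the unit circle) are hypothetical.

```python
def project_velocity(grad_g, v_hat):
    """Project v_hat onto {v : grad_g . v = 0} for one constraint, unit mass matrix."""
    gv = sum(gi * vi for gi, vi in zip(grad_g, v_hat))  # violation G v_hat
    gg = sum(gi * gi for gi in grad_g)                  # scalar G G^T
    mu = gv / gg                                        # multiplier of the projection
    return [vi - mu * gi for gi, vi in zip(grad_g, v_hat)]


if __name__ == "__main__":
    q = [0.6, 0.8]                    # lies on g(q) = q1^2 + q2^2 - 1 = 0
    grad_g = [2 * q[0], 2 * q[1]]     # G(q) = g_q(q)
    v_hat = [0.3, 0.7]                # velocity after a step, slightly off the manifold
    v = project_velocity(grad_g, v_hat)
    print(v, sum(gi * vi for gi, vi in zip(grad_g, v)))
```

For the mechanism of this section one would presumably weight the projection with $M(q)$ and solve a linear system with the matrix (7.9) instead of dividing by a scalar; that variant is not shown here.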


Half-Explicit Methods. These methods (discussed in Sect. VII.6) are especially
adapted to the numerical solution of (nonstiff) constrained mechanical systems.
Only linear systems with the matrix (7.9) have to be solved, otherwise the methods
are explicit. Since they are applied directly to the index 2 formulation, the velocity
constraint (7.11e) is automatically satisfied, and no subroutine for the computation
of $g_{qq}(q)(v,v)$ is required.

The extrapolation code MEXX of Lubich (1989) (see also Lubich, Nowak,
Poehle & Engstler 1992) implements the half-explicit mid-point rule (6.18). The
existence of an $h^2$-expansion (Theorem 6.3) justifies extrapolation and thus yields
methods of arbitrarily high order. It is not surprising that this code gives excellent
results for high precision computations.

The first code implementing half-explicit Runge-Kutta methods is HEM5 of
Brasey (1994). It has been modified and improved by Arnold (1995, code HEX5)
and Murua (1995, code PHEM56). We have also included the results of the
latter code (Fig. 7.3). It is slightly less efficient than DOPRI5_VEL in this
particular example, because the evaluation of $g_{qq}(q)(v,v)$ is cheap. Arnold (1995) and
Murua (1995) report on experiments (with expensive $g_{qq}(q)(v,v)$), where the
half-explicit methods are superior to explicit Runge-Kutta methods with velocity
projection.


BDF. The famous code DASSL of Petzold (1982), see also Brenan, Campbell &
Petzold (1989), is a realization of the BDF multistep formulas. It is written for
problems of the general form $F(u, u', x) = 0$, so that it is not necessary to introduce
$\ddot q$ of (7.6) as a new variable. We applied it using default values for all parameters
except for the scaling of the error estimation. We put INFO(2)=1 and

      ATOL(I) = RTOL(I) = Tol      for I = 1,...,14,
      ATOL(I) = RTOL(I) = 1.0D0    for I ≥ 15,

which means that we control the accuracy for $q$ and $v$, but not for the Lagrange
multipliers $\lambda$. In the comparisons of Fig. 7.3 (index 2 formulation) and Fig. 7.4 we
used the full Jacobian of the problem, obtained by numerical differentiation. This
turned out to be more efficient than providing an analytic approximation, where the
derivatives of $f$, $M$ and $G$ are neglected.

Fig. 7.4. Work-precision diagram

Implicit Runge-Kutta Methods. Our code RADAU5 is written for problems of the
form $By' = f(x, y)$ with constant, possibly singular matrix $B$. It can therefore
be applied to all three of the above formulations. Convergence is guaranteed by
Theorem VI.1.1 for the index 1 formulation, by Theorems 4.5 and 4.6 for the index
2 formulation, and by the results of HLR89 for the index 3 case. However, the
higher the index, the more difficult it is to solve the nonlinear Runge-Kutta
equations. We have applied the code with the options IWORK(5) = 14, IWORK(6) = 0
and IWORK(7) = 13 (IWORK(5) = 7 and IWORK(6) = 7 for the index 3
formulation), so that the acceleration $w$ and the Lagrange multiplier $\lambda$ are scaled by $h^2$
in the error estimation. This guarantees the convergence of the simplified Newton
iterations (see HLR89, Chapter 7 for a justification). Furthermore, we have
exploited the special structure $\dot q = v$, $\dot v = w$ of our system by setting IWORK(9) = 14
and IWORK(10) = 7. This speeds up the computation of the arising linear systems.
The results are given in Fig. 7.3 (index 2 formulation) and in the lower left picture
of Fig. 7.4 for all three formulations of the problem. We have used an analytical
approximation to the Jacobian (neglecting the derivatives of $f$, $M$ and $G$) and did
not apply any projection onto the solution manifold.


Savings in Linear Algebra. If the problem is nonstiff, one can use a reduced
Jacobian for the solution of the nonlinear Runge-Kutta equations. Neglecting the
derivatives of $f$, $M$ and $G$ (as we have done for the above calculations), we are
led to linear systems of the form (in the index 2 case)

$$\begin{pmatrix} -\alpha I & I & 0 & 0 \\ 0 & -\alpha I & I & 0 \\ 0 & 0 & M & G^T \\ 0 & G & 0 & 0 \end{pmatrix}
\begin{pmatrix} \Delta q \\ \Delta v \\ \Delta w \\ \Delta\lambda \end{pmatrix}
= -\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} \tag{7.12}$$

where $\alpha = (h\gamma)^{-1}$, $h$ the step size and $\gamma$ an eigenvalue of the Runge-Kutta matrix.
The evaluation of the matrix in (7.12) is free, because $M(q)$ and $G(q)$ have to
be evaluated anyway for the right-hand side of the differential-algebraic system.
Eliminating the variable $\Delta v$ in the last row of (7.12) yields the smaller system

$$\begin{pmatrix} M & G^T \\ G & 0 \end{pmatrix}
\begin{pmatrix} \Delta w \\ \Delta\lambda \end{pmatrix}
= -\begin{pmatrix} c \\ \alpha d + Gb \end{pmatrix} \tag{7.13}$$

which is of the same type as those for the explicit methods. Once a solution
to (7.13) is known the values of $\Delta v$ and $\Delta q$ are easily obtained from the first
two rows of (7.12). We observe that the matrix in (7.13) does not depend on
$\alpha = (h\gamma)^{-1}$. Hence only one LU decomposition is necessary for a step,
independently of the number of distinct eigenvalues of the Runge-Kutta matrix. An
implementation of these ideas considerably reduced the work for solving the
nonlinear systems (see last picture of Fig. 7.4).

A similar reduction of the linear algebra was first proposed by Gear, Gupta & 
Leimkuhler (1985) for the BDF schemes. The above idea is not restricted to the 
index 2 case, and extends straightforwardly to the index 1 and index 3 situations. 
We finally remark that one has the possibility of retaining the decomposed matrix 
of (7.13) over several steps even in the case when the step size is changed. 
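The block elimination leading from (7.12) to (7.13) can be checked on a small example. The sketch below assumes the sign convention in which the right-hand side of (7.12) is $-(a, b, c, d)^T$, so that the second row gives $\Delta v = (\Delta w + b)/\alpha$ and the first row gives $\Delta q = (\Delta v + a)/\alpha$. All names (`reduced_step`, `full_residual`) and the test data are hypothetical; for brevity the dense solver refactors the matrix on every call, whereas a production code would factor it once and reuse the factors for every value of $\alpha$.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting (stand-in for an LU routine)."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def reduced_step(M, G, alpha, a, b, c, d):
    """Solve the large system via the small one; M is n x n, G is m x n."""
    n, m = len(M), len(G)
    # The small matrix [M G^T; G 0] does not involve alpha.
    K = [M[i] + [G[r][i] for r in range(m)] for i in range(n)]
    K += [G[r] + [0.0] * m for r in range(m)]
    Gb = matvec(G, b)
    rhs = [-ci for ci in c] + [-(alpha * d[r] + Gb[r]) for r in range(m)]
    sol = solve(K, rhs)
    dw, dlam = sol[:n], sol[n:]
    dv = [(dw[i] + b[i]) / alpha for i in range(n)]  # second block row
    dq = [(dv[i] + a[i]) / alpha for i in range(n)]  # first block row
    return dq, dv, dw, dlam


def full_residual(M, G, alpha, a, b, c, d, dq, dv, dw, dlam):
    """Residual of the full four-block system, for verification only."""
    n, m = len(M), len(G)
    Mw, Gv = matvec(M, dw), matvec(G, dv)
    GTl = [sum(G[r][i] * dlam[r] for r in range(m)) for i in range(n)]
    r1 = [-alpha * dq[i] + dv[i] + a[i] for i in range(n)]
    r2 = [-alpha * dv[i] + dw[i] + b[i] for i in range(n)]
    r3 = [Mw[i] + GTl[i] + c[i] for i in range(n)]
    r4 = [Gv[r] + d[r] for r in range(m)]
    return r1 + r2 + r3 + r4


if __name__ == "__main__":
    M = [[2.0, 0.5], [0.5, 1.0]]
    G = [[1.0, -1.0]]
    a, b, c, d = [0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7]
    for alpha in (10.0, 250.0):  # two different "eigenvalues", same small matrix
        x = reduced_step(M, G, alpha, a, b, c, d)
        print(alpha, max(abs(r) for r in full_residual(M, G, alpha, a, b, c, d, *x)))
```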



VII.7 Computation of Multibody Mechanisms 541 


A Stiff Mechanical System 


We now want to introduce some “stiffness” into the above mechanical system. To
this end we take into account the elasticity of one of these bodies ($K_6$ appears to be
the simplest one) and replace it by a spring with very large spring constant $c_1$. Thus
the length of this spring will become an additional unknown variable $q_8$. We let the
unstretched length be zf (of Table 7.1), and assume that the centre of gravity $C_6$
has constant distance fa from the upper joint (see Fig. 7.1). We further simplify the
problem by assuming that the inertia of this body remains constant. Obviously the
algebraic constraints (7.5) remain unchanged; we only have to replace the constant
zf in (7.5) by the new variable $q_8$. The derivative matrix $G(q) = g'(q)$ has to be
changed accordingly. It is now a 6 × 8 matrix.

The equations of motion for this modified problem are obtained as follows:
in the kinetic energy (7.3) only the contribution of the 6th body (the new spring)
changes, namely

$$\begin{aligned}
C_6:\quad & x_6 = \mathit{xa} + u\sin\varepsilon + (q_8 - \mathit{fa})\cos(\Omega+\varepsilon) \\
& y_6 = \mathit{ya} - u\cos\varepsilon + (q_8 - \mathit{fa})\sin(\Omega+\varepsilon) \\
& \dot x_6^2 + \dot y_6^2 = (q_8-\mathit{fa})^2\,\dot\Omega^2 + 2\,\bigl((q_8-\mathit{fa})^2 - u\,(q_8-\mathit{fa})\sin\Omega\bigr)\,\dot\Omega\,\dot\varepsilon \\
& \qquad\qquad + \bigl((q_8-\mathit{fa})^2 - 2\,u\,(q_8-\mathit{fa})\sin\Omega + u^2\bigr)\,\dot\varepsilon^2 \\
& \qquad\qquad + 2\,u\cos\Omega\;\dot\varepsilon\,\dot q_8 + \dot q_8^2 \\
& \omega_6 = \Omega + \varepsilon
\end{aligned}$$

In the potential energy we have to add a term which is due to the new spring. We
thus get (compare (7.4))

$$U = -\mathit{mom}\cdot\beta + c_0\,\frac{(\ell-\ell_0)^2}{2} + c_1\,\frac{(q_8 - \mathit{zf})^2}{2}, \tag{7.21}$$


where the spring constant $c_1$ of the new spring is large. The resulting system is
again of the form (7.6), but with $q \in \mathbb{R}^8$. The initial values (7.7), (7.8), (7.10) for
the 7 angles (7.2) are consistent for the new problem, if we require in addition


$$q_8(0) = \mathit{zf}, \qquad \dot q_8(0) = 0. \tag{7.22}$$

This then implies $\ddot q_8(0) = 0$. For the choice $c_1 = 10^{10}$ we applied the implicit
codes RADAU5 and DASSL to the above stiff mechanical system. The behaviour of
these methods was nearly identical to that for the original problem (Fig. 7.4). So
there was no need to draw another picture. Obviously, the explicit codes DOPRI5,
PHEM56 and MEXX do not work any longer.

It should be remarked that for $\mathit{Tol} < 1/c_1$ the efficiency of the implicit codes
suddenly decreases. This is due to the fact that the exact solution of the
problem (with the initial values described above) is highly oscillatory with frequency
$O(\sqrt{c_1})$ and amplitude $O(1/c_1)$ about a smooth solution. A general theory for
such situations has been elaborated by Ch. Lubich (1993). For very stringent
tolerances any code is forced to follow the oscillations and the step sizes become small.
Exercises


1. Consider the differential equation (so-called “Kreiss problem”)

$$y' = U^T(x)\begin{pmatrix} -1 & 0 \\ 0 & -1/\varepsilon \end{pmatrix} U(x)\,y, \qquad
U(x) = \begin{pmatrix} \cos x & \sin x \\ -\sin x & \cos x \end{pmatrix}$$

and apply the Runge-Kutta code RADAU5 to this stiff problem. You will
observe that, for a fixed tolerance, the number of function evaluations increases
with decreasing $\varepsilon > 0$. Then apply the method to the equivalent system

$$y' = z, \qquad 0 = \begin{pmatrix} 1 & 0 \\ 0 & \varepsilon \end{pmatrix} U(x)\,z + U(x)\,y \tag{7.24}$$

and show that the number of function evaluations does not increase for $\varepsilon \to 0$.

a) Explain this phenomenon by studying the convergence of the simplified
Newton iterations.

b) Prove that the index of the system (7.24) with $\varepsilon = 0$ is two.



VII.8 Symplectic Methods for 
Constrained Hamiltonian Systems 


In principle, all approaches discussed in Sect. VII.2 can be employed for the numer¬ 
ical solution of constrained Hamiltonian systems. A disadvantage of these index 
reduction methods is, as we shall see below, that the symplectic structure of the 
flow is destroyed by the discretization. 

In Sect. I.6 we have seen that the equations of motion for conservative
mechanical systems can be written either in terms of position and velocity
coordinates (Lagrangian formulation) or in terms of position and momentum coordinates
(Hamiltonian formulation). For constrained mechanical systems the situation is
exactly the same. In the present section we consider the Hamiltonian formulation

$$q' = H_p(p,q) \tag{8.1a}$$

$$p' = -H_q(p,q) - G^T(q)\,\lambda \tag{8.1b}$$

$$0 = g(q). \tag{8.1c}$$

Here, $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is the Hamiltonian function, $H_p$ and $H_q$ denote partial
derivatives, $g : \mathbb{R}^n \to \mathbb{R}^m$ (with $m < n$) are the constraints, and $G(q) = g_q(q)$. If
$T(q,\dot q) = \frac12\,\dot q^T M(q)\,\dot q$ (with invertible $M(q)$) is the kinetic energy of a mechanical
system and $U(q)$ its potential energy, we have $p = M(q)\dot q$ and

$$H(p,q) = \tfrac12\,p^T M(q)^{-1} p + U(q), \tag{8.2}$$

(see Eq. (I.6.26)) in contrast to the Lagrange function, which is given by
$\mathcal{L}(q,\dot q) = T(q,\dot q) - U(q)$. If $M(q) = I$ (the identity), we have $p = \dot q$ and both formulations,
(1.46) and (8.1), are identical. If $M(q)$ depends on $q$, the formulation (8.1) may
be numerically more advantageous than (1.46) (see Exercise 1).

Differentiating the constraint in (8.1) twice, we get

$$0 = G(q)\,H_p(p,q), \tag{8.3a}$$

$$0 = \frac{\partial}{\partial q}\bigl(G(q)H_p(p,q)\bigr)\,H_p(p,q) - G(q)\,H_{pp}(p,q)\,\bigl(H_q(p,q) + G^T(q)\,\lambda\bigr), \tag{8.3b}$$

and we see that $\lambda$ can be expressed in terms of $p$ and $q$, if

$$G(q)\,H_{pp}(p,q)\,G^T(q) \quad \text{is invertible} \tag{8.4}$$

in a neighbourhood of the considered solution. Therefore, (8.1) is a
differential-algebraic system of index 3. If $H(p,q)$ is given by (8.2), condition (8.4) is the
same as (1.47).
Properties of the Exact Flow


Every solution of the system (8.1) satisfies (8.1c) and (8.3a). It therefore lies on the
manifold

$$\mathcal{M} = \{(p,q) \mid g(q) = 0,\; G(q)\,H_p(p,q) = 0\}. \tag{8.5}$$

Extracting $\lambda$ from (8.3b) (this is possible, if (8.4) is satisfied), and inserting the
resulting expression into (8.1b), yields a differential equation on the manifold $\mathcal{M}$.
The situation here is completely analogous to that of (1.22) of Sect. VII.1.

Symplecticity. Our next aim is to extend the result of Theorem I.14.12 to
constrained Hamiltonian systems. We consider the differential 2-form

$$\omega^2 = \sum_{I=1}^{n} dp^I \wedge dq^I \tag{8.6}$$

($p^I$ and $q^I$ denote the components of the vectors $p$ and $q$, respectively). The flow
of the system (8.1), mapping an initial value $(p_0,q_0) \in \mathcal{M}$ onto $(p(t),q(t)) \in \mathcal{M}$,
is denoted by $\varphi_t$. For a differentiable function $g : \mathcal{M} \to \mathcal{M}$ we further denote by
$g^*\omega^2$ the differential 2-form, defined by

$$(g^*\omega^2)(\xi_1,\xi_2) = \omega^2\bigl(g'(p,q)\,\xi_1,\; g'(p,q)\,\xi_2\bigr).$$

This is formally identical to Definition I.14.11, but here we are only interested in
the case where $\xi_1$ and $\xi_2$ lie in the tangent space

$$T_{(p,q)}\mathcal{M} = \Bigl\{(u,v) \;\Big|\; G(q)\,v = 0,\;\; \frac{\partial}{\partial q}\bigl(G(q)H_p(p,q)\bigr)\,v + G(q)\,H_{pp}(p,q)\,u = 0\Bigr\}$$

of the manifold (8.5).


Theorem 8.1. The flow $\varphi_t : \mathcal{M} \to \mathcal{M}$ of the system (8.1) is a symplectic
transformation on $\mathcal{M}$, i.e.,

$$(\varphi_t^*\omega^2)(\xi_1,\xi_2) = \omega^2(\xi_1,\xi_2)$$

for all $t$, for all $(p,q)$, and for all $\xi_1$, $\xi_2$ lying in the tangent space $T_{(p,q)}\mathcal{M}$.


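Proof. For $\xi \in T_{(p,q)}\mathcal{M}$ the tangent vector $\xi_t = \varphi_t'(p,q)\,\xi \in T_{(p(t),q(t))}\mathcal{M}$,
written as $\xi_t = (\delta p, \delta q)$, is a solution of the variational equation

$$\begin{aligned}
\delta\dot p^I &= -\sum_{J=1}^{n} \frac{\partial^2 H}{\partial q^I\,\partial p^J}\,\delta p^J - \sum_{J=1}^{n} \frac{\partial^2 H}{\partial q^I\,\partial q^J}\,\delta q^J
 - \sum_{K=1}^{m}\sum_{J=1}^{n} \frac{\partial^2 g^K}{\partial q^I\,\partial q^J}\,\lambda^K\,\delta q^J - \sum_{K=1}^{m} \frac{\partial g^K}{\partial q^I}\,\delta\lambda^K \\
\delta\dot q^I &= \sum_{J=1}^{n} \frac{\partial^2 H}{\partial p^I\,\partial p^J}\,\delta p^J + \sum_{J=1}^{n} \frac{\partial^2 H}{\partial p^I\,\partial q^J}\,\delta q^J
\end{aligned}$$

where the $\delta\lambda^K$ (for $K = 1, \ldots, m$) are obtained by differentiation of (8.3b). We
now compute the time derivative of $\omega^2(\xi_t^1, \xi_t^2)$ with $\xi_t^i = (\delta p_i, \delta q_i)$. The terms, not
depending on $\lambda$ or $\delta\lambda$, vanish by Theorem I.14.12. We therefore get

$$\begin{aligned}
\frac{d}{dt}\,\omega^2(\xi_t^1,\xi_t^2) = {}& -\sum_{K=1}^{m}\sum_{I,J=1}^{n} \lambda^K\,\frac{\partial^2 g^K}{\partial q^I\,\partial q^J}\,\bigl(\delta q_1^J\,\delta q_2^I - \delta q_2^J\,\delta q_1^I\bigr) \\
& -\sum_{K=1}^{m}\sum_{I=1}^{n} \frac{\partial g^K}{\partial q^I}\,\bigl(\delta\lambda_1^K\,\delta q_2^I - \delta\lambda_2^K\,\delta q_1^I\bigr).
\end{aligned} \tag{8.7}$$

Due to the symmetry of the second partial derivatives, the first expression of the
right-hand side of Eq. (8.7) vanishes. The second expression also vanishes, because
$\xi_t^1$ and $\xi_t^2$ lie in the tangent space $T_{(p(t),q(t))}\mathcal{M}$, so that $G(q)\,\delta q_i = 0$. Hence,
$\omega^2(\xi_t^1,\xi_t^2)$ is constant, which proves the statement of the theorem. □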


Preservation of the Hamiltonian. Differentiation of $H(p(t),q(t))$ with respect
to time yields

$$-H_p^T H_q - H_p^T G^T \lambda + H_q^T H_p,$$

with all expressions evaluated at $(p(t),q(t))$. The first term cancels with the last
one, and the remaining term vanishes, because $G(q)\,H_p(p,q) = 0$ on the solution
manifold. Consequently, the Hamiltonian function $H(p,q)$ is constant along
solutions of (8.1).


First Order Symplectic Method 


We shall now discuss in some detail the feasibility, the convergence, and the sym- 
plecticity of a simple first order method. The presented ideas will be useful for a 
better understanding of the later discussion of higher order methods. 

Inspired by (II.16.54), we consider the following discretization of (8.1):

$$\hat p_1 = p_0 - h\,\bigl(H_q(\hat p_1, q_0) + G^T(q_0)\,\lambda_1\bigr) \tag{8.8a}$$

$$q_1 = q_0 + h\,H_p(\hat p_1, q_0) \tag{8.8b}$$

$$0 = g(q_1). \tag{8.8c}$$

The numerical approximation $(\hat p_1, q_1)$ satisfies the constraint (8.1c), but not (8.3a).
Therefore, we append the projection

$$p_1 = \hat p_1 - h\,G^T(q_1)\,\mu \tag{8.8d}$$

$$0 = G(q_1)\,H_p(p_1, q_1), \tag{8.8e}$$

so that method (8.8a-e) yields approximations that stay in the manifold $\mathcal{M}$ of
Eq. (8.5).
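A concrete instance may help to see how the two stages of (8.8) interact. The sketch below applies the method to a planar pendulum, which is an assumption made purely for illustration: $H(p,q) = \frac12 p^Tp + q^2$ (unit mass, unit gravity acting on the second coordinate), $g(q) = \frac12(q^Tq - 1)$, hence $G(q) = q^T$. For this separable Hamiltonian (8.8a,b) is explicit up to the scalar $\lambda_1$, and (8.8c) becomes a quadratic equation for $s = h^2\lambda_1$, of which the root close to zero is taken. The names `step`, `simulate` and `energy` are hypothetical.

```python
import math

GRAD_U = (0.0, 1.0)  # gradient of U(q) = q[1]


def step(p, q, h):
    """One step of the first order method with projection for the pendulum."""
    # a = q0 + h*(p0 - h*grad U) is the position obtained without constraint force.
    a = (q[0] + h * (p[0] - h * GRAD_U[0]), q[1] + h * (p[1] - h * GRAD_U[1]))
    aq = a[0] * q[0] + a[1] * q[1]
    qq = q[0] ** 2 + q[1] ** 2
    aa = a[0] ** 2 + a[1] ** 2
    # |a - s*q0|^2 = 1 with s = h^2 * lambda_1; cancellation-free small root.
    s = (aa - 1.0) / (aq + math.sqrt(aq * aq - qq * (aa - 1.0)))
    q1 = (a[0] - s * q[0], a[1] - s * q[1])
    p_hat = (p[0] - h * GRAD_U[0] - s * q[0] / h,
             p[1] - h * GRAD_U[1] - s * q[1] / h)
    # Projection: p1 = p_hat - t*q1 with t = h*mu chosen so that q1 . p1 = 0.
    t = (q1[0] * p_hat[0] + q1[1] * p_hat[1]) / (q1[0] ** 2 + q1[1] ** 2)
    p1 = (p_hat[0] - t * q1[0], p_hat[1] - t * q1[1])
    return p1, q1


def energy(p, q):
    return 0.5 * (p[0] ** 2 + p[1] ** 2) + q[1]


def simulate(n, h, p=(0.0, 0.0), q=(1.0, 0.0)):
    """Apply n steps starting from a consistent initial value on the manifold."""
    for _ in range(n):
        p, q = step(p, q, h)
    return p, q


if __name__ == "__main__":
    p, q = simulate(1000, 0.01)
    print("position constraint:", q[0] ** 2 + q[1] ** 2 - 1.0)
    print("velocity constraint:", q[0] * p[0] + q[1] * p[1])
    print("energy error:", energy(p, q) - energy((0.0, 0.0), (1.0, 0.0)))
```

For a non-separable Hamiltonian the stage (8.8a) is genuinely implicit in $\hat p_1$ and a Newton-type iteration would replace the closed-form root used here.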
Existence of the Numerical Solution. We consider a slightly more general system
than (8.8). If the initial values are not consistent, we replace the relations (8.8c)
and (8.8e) by

$$g(q_1) = g(q_0) + h\,G(q_0)\,H_p(p_0,q_0) \tag{8.9a}$$

$$G(q_1)\,H_p(p_1,q_1) = G(q_0)\,H_p(p_0,q_0). \tag{8.9b}$$

We shall show that the nonlinear system (8.8a,b), (8.9a) has a locally unique
solution. Inspired by the proof of Theorem 3.1 we write

$$g(q_1) - g(q_0) = \int_0^1 g_q\bigl(q_0 + \tau(q_1 - q_0)\bigr)\,d\tau \cdot (q_1 - q_0).$$

Inserting $g(q_1)$ from (8.9a) and $q_1$ from (8.8b) and dividing by $h$ yields

$$G(q_0)\,H_p(p_0,q_0) = \int_0^1 g_q\bigl(q_0 + \tau(q_1 - q_0)\bigr)\,d\tau \cdot H_p(\hat p_1, q_0). \tag{8.10}$$

We next develop $H_p(\hat p_1, q_0)$ as

$$H_p(\hat p_1, q_0) = H_p(p_0,q_0) - h \int_0^1 H_{pp}\bigl(p_0 + \sigma(\hat p_1 - p_0), q_0\bigr)\,d\sigma\,\bigl(H_q(\hat p_1,q_0) + G^T(q_0)\,\lambda_1\bigr).$$

Inserting this formula into (8.10), an integration by parts shows that (8.9a) is
equivalent to

$$\begin{aligned}
0 = {}& \int_0^1 (1-\tau)\,g_{qq}\bigl(q_0 + \tau(q_1 - q_0)\bigr)\,d\tau\,\bigl(H_p(p_0,q_0),\, H_p(\hat p_1,q_0)\bigr) \\
& - \int_0^1 g_q\bigl(q_0 + \tau(q_1 - q_0)\bigr)\,d\tau \int_0^1 H_{pp}\bigl(p_0 + \sigma(\hat p_1 - p_0), q_0\bigr)\,d\sigma\,\bigl(H_q(\hat p_1,q_0) + G^T(q_0)\,\lambda_1\bigr).
\end{aligned} \tag{8.11}$$

This is a linear system for $\lambda_1$ and allows us to express $\lambda_1$ smoothly in terms
of $\hat p_1$, $q_1$, and of the initial values $p_0$, $q_0$. We insert the resulting expression for
$\lambda_1$ into (8.8a). Hence, (8.8a,b) becomes a nonlinear system for $\hat p_1$, $q_1$, which,
for sufficiently small $h$, has a unique solution close to $p_0$, $q_0$ (Implicit Function
Theorem). It is interesting to note that, for $h \to 0$, the value $\lambda_1$ from (8.11) does
not converge to $\lambda(0)$, given by (8.3b), but to the solution $\lambda_0$ of

$$0 = \tfrac12\,g_{qq}(H_p, H_p) - G\,H_{pp}\,(H_q + G^T\lambda_0).$$

Here, all functions are evaluated at the initial value $(p_0,q_0)$.

The existence of the solution $(p_1, \mu)$ to the system (8.8d), (8.9b) follows from
the Newton-Kantorovich Theorem (Ortega & Rheinboldt 1970) with initial
approximation $p_1 := \hat p_1$, and $\mu = 0$, or also from the Implicit Function Theorem.

We have not only shown that the system (8.8) possesses a locally unique
solution, but we have also seen that the replacement of (8.8c,e) by (8.9) extends the
definition of the method to arbitrary initial values (close to $\mathcal{M}$). We thus have
found a one-step method

$$\Phi_h : (p_0, q_0) \mapsto (p_1, q_1) \tag{8.12}$$

in $\mathbb{R}^{2n}$, which reduces to (8.8) on the manifold $\mathcal{M}$. For smooth functions $g$ and
$H$ also $\Phi_h$ is smooth, and the classical theory (convergence, asymptotic expansions,
...) can be applied to this method.

Convergence of Order 1. It is sufficient to show that the local error is of size
$O(h^2)$. The convergence then follows from Theorem II.3.6 applied to (8.12). From
the above investigation on the existence of the numerical solution we know that
$\hat p_1 = p_0 + O(h)$, $q_1 = q_0 + O(h)$, and $\lambda_1 = \lambda_0 + O(h)$. Consequently, we have
from (8.8a,b) that

$$q_1 = q(t_0 + h) + O(h^2), \qquad \hat p_1 = p(t_0 + h) - h\,G^T(q_0)\,\delta\lambda + O(h^2) \tag{8.13}$$

with $\delta\lambda = \lambda_0 - \lambda(t_0)$. The disturbing term $h\,G^T(q_0)\,\delta\lambda$ is eliminated by the
projection (8.8d,e). This can be seen as follows: from (8.13) and (8.8d) we know that
$p_1 = p(t_0 + h) - G^T(q_0)\,\nu + O(h^2)$, so that

$$G\bigl(q(t_0 + h)\bigr)\,H_p\bigl(p(t_0 + h) - G^T(q_0)\,\nu,\; q(t_0 + h)\bigr) = O(h^2).$$

By (8.4) and the Implicit Function Theorem this implies $\nu = O(h^2)$, and the local
error for both components ($p$ and $q$) is of size $O(h^2)$.


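Symplecticity. Differentiation of the relations (8.8a,b) shows that (we use upper
indices for the components)

$$\begin{aligned}
d\hat p_1^I = {}& dp_0^I - h\sum_{J=1}^{n} \frac{\partial^2 H}{\partial q^I\,\partial p^J}(\hat p_1,q_0)\,d\hat p_1^J - h\sum_{J=1}^{n} \frac{\partial^2 H}{\partial q^I\,\partial q^J}(\hat p_1,q_0)\,dq_0^J \\
& - h\sum_{K=1}^{m}\sum_{J=1}^{n} \frac{\partial^2 g^K}{\partial q^I\,\partial q^J}(q_0)\,\lambda_1^K\,dq_0^J - h\sum_{K=1}^{m} \frac{\partial g^K}{\partial q^I}(q_0)\,d\lambda_1^K \\
dq_1^I = {}& dq_0^I + h\sum_{J=1}^{n} \frac{\partial^2 H}{\partial p^I\,\partial p^J}(\hat p_1,q_0)\,d\hat p_1^J + h\sum_{J=1}^{n} \frac{\partial^2 H}{\partial p^I\,\partial q^J}(\hat p_1,q_0)\,dq_0^J .
\end{aligned}$$

Taking the exterior product of the first formula with $dq_0^I$, and of the second formula
with $d\hat p_1^I$, several terms cancel out (as in the proof of Theorem 8.1) and we obtain

$$\sum_{I=1}^{n} d\hat p_1^I \wedge dq_0^I = \sum_{I=1}^{n} dp_0^I \wedge dq_0^I - h\sum_{I,J=1}^{n} \frac{\partial^2 H}{\partial q^I\,\partial p^J}(\hat p_1,q_0)\,d\hat p_1^J \wedge dq_0^I$$

$$\sum_{I=1}^{n} d\hat p_1^I \wedge dq_1^I = \sum_{I=1}^{n} d\hat p_1^I \wedge dq_0^I + h\sum_{I,J=1}^{n} \frac{\partial^2 H}{\partial p^I\,\partial q^J}(\hat p_1,q_0)\,d\hat p_1^I \wedge dq_0^J .$$

Summing up both formulas yields

$$\sum_{I=1}^{n} d\hat p_1^I \wedge dq_1^I = \sum_{I=1}^{n} dp_0^I \wedge dq_0^I, \tag{8.14}$$

which proves that the method (8.8a-c) is symplectic. In order to show that also the
projection (8.8d,e) is symplectic, we compute

$$dp_1^I = d\hat p_1^I - h\sum_{K=1}^{m}\sum_{J=1}^{n} \frac{\partial^2 g^K}{\partial q^I\,\partial q^J}(q_1)\,\mu^K\,dq_1^J - h\sum_{K=1}^{m} \frac{\partial g^K}{\partial q^I}(q_1)\,d\mu^K,$$

and we obtain as above (using $g(q_1) = 0$) that

$$\sum_{I=1}^{n} dp_1^I \wedge dq_1^I = \sum_{I=1}^{n} d\hat p_1^I \wedge dq_1^I . \tag{8.15}$$

Equations (8.14) and (8.15) together show that the complete procedure (8.8a-e) is
symplectic.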


SHAKE and RATTLE 

These algorithms have been designed for problems with separable Hamiltonian

$$H(p,q) = \tfrac12\, p^T M^{-1} p + U(q) \tag{8.16}$$

(constant matrix $M$), and are very popular in molecular dynamics simulation. Observe
that for this Hamiltonian the problem (8.1) becomes the second order differential
equation $Mq'' = -U_q(q) - G^T(q)\lambda$ with constraint (8.1c).

SHAKE. This method, due to Ryckaert, Ciccotti & Berendsen (1977), is given by

$$q_{n+1} - 2q_n + q_{n-1} = -h^2 M^{-1}\bigl(U_q(q_n) + G^T(q_n)\lambda_n\bigr) \tag{8.17a}$$

$$0 = g(q_{n+1}). \tag{8.17b}$$

In the absence of constraints it is identical to Störmer's method (Sect. III.10), which
in molecular dynamics applications is often referred to as the Verlet method (Verlet
1967). The $p$-components are approximated by $p_n = M(q_{n+1} - q_{n-1})/2h$. For
an implementation of this 2-step method a stabilized version is recommended (see
the end of Sect. III.10).

RATTLE. Denoting $p_{n+1/2} := p_n - (h/2)\bigl(U_q(q_n) + G^T(q_n)\lambda_n\bigr)$, the SHAKE
algorithm can be rewritten in the form

$$p_{n+1/2} = p_n - \frac h2\bigl(U_q(q_n) + G^T(q_n)\lambda_n\bigr) \tag{8.18a}$$

$$q_{n+1} = q_n + hM^{-1}p_{n+1/2} \tag{8.18b}$$

$$0 = g(q_{n+1}). \tag{8.18c}$$

The definition of $p_{n+1}$ as in the SHAKE method requires the knowledge of $q_{n+2}$.
In order to avoid this difficulty, Andersen (1983) suggests defining $p_{n+1}$ by

$$p_{n+1} = p_{n+1/2} - \frac h2\bigl(U_q(q_{n+1}) + G^T(q_{n+1})\mu_n\bigr) \tag{8.18d}$$

$$0 = G(q_{n+1})M^{-1}p_{n+1}, \tag{8.18e}$$

so that also the hidden constraint (8.3a) is satisfied. These two equations constitute
a linear system for $(p_{n+1}, \mu_n)$.
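
The steps (8.18a-e) can be sketched in a few lines of code. The following is only an
illustration under stated assumptions, not the authors' implementation: it takes the
planar pendulum with $M = I$, $U(q) = q_2$ and the constraint written as
$g(q) = (q_1^2 + q_2^2 - 1)/2$, so that $G(q) = q^T$. The function names `rattle_step`,
`energy` and `simulate`, the Newton tolerance and the iteration cap are all hypothetical
choices made for this sketch.

```python
import math


def rattle_step(q, p, h, tol=1e-14, max_iter=50):
    """One RATTLE step (8.18a-e) for H = |p|^2/2 + q2, g(q) = (|q|^2 - 1)/2."""
    q1, q2 = q
    p1, p2 = p
    # Part of p_{n+1/2} that does not involve lambda: p - (h/2) U_q(q), U_q = (0, 1).
    u1, u2 = p1, p2 - 0.5 * h
    # With G^T(q) = q, (8.18a,b) give q_{n+1} = a + lam * b.
    a1, a2 = q1 + h * u1, q2 + h * u2
    b1, b2 = -0.5 * h * h * q1, -0.5 * h * h * q2
    lam = 0.0
    for _ in range(max_iter):  # Newton iteration for g(a + lam * b) = 0, i.e. (8.18c)
        x1, x2 = a1 + lam * b1, a2 + lam * b2
        phi = 0.5 * (x1 * x1 + x2 * x2 - 1.0)
        if abs(phi) < tol:
            break
        lam -= phi / (x1 * b1 + x2 * b2)
    x1, x2 = a1 + lam * b1, a2 + lam * b2                      # q_{n+1}
    ph1, ph2 = u1 - 0.5 * h * lam * q1, u2 - 0.5 * h * lam * q2  # p_{n+1/2}
    # (8.18d,e) is linear in mu, because G(q) M^{-1} p = q . p in this example.
    w1, w2 = ph1, ph2 - 0.5 * h
    mu = (x1 * w1 + x2 * w2) / (0.5 * h * (x1 * x1 + x2 * x2))
    return (x1, x2), (w1 - 0.5 * h * mu * x1, w2 - 0.5 * h * mu * x2)


def energy(q, p):
    """Hamiltonian (8.16) of the pendulum example."""
    return 0.5 * (p[0] ** 2 + p[1] ** 2) + q[1]


def simulate(q, p, h, n):
    """Apply n RATTLE steps and return the list of states (q, p)."""
    states = [(q, p)]
    for _ in range(n):
        q, p = rattle_step(q, p, h)
        states.append((q, p))
    return states
```

Starting from the consistent values $q = (1, 0)$, $p = (0, 0)$, both $g(q_n) = 0$ and the
hidden constraint $G(q_n)p_n = 0$ should hold up to the Newton tolerance and round-off,
while the energy is only approximately conserved.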

Extension to General Hamiltonian Functions. It was observed by Jay (1994)
that the RATTLE algorithm can be extended to general Hamiltonian functions as
follows: for consistent values $(p_n, q_n) \in \mathcal M$ define

$$p_{n+1/2} = p_n - \frac h2\bigl(H_q(p_{n+1/2}, q_n) + G^T(q_n)\lambda_n\bigr) \tag{8.19a}$$

$$q_{n+1} = q_n + \frac h2\bigl(H_p(p_{n+1/2}, q_n) + H_p(p_{n+1/2}, q_{n+1})\bigr) \tag{8.19b}$$

$$0 = g(q_{n+1}) \tag{8.19c}$$

$$p_{n+1} = p_{n+1/2} - \frac h2\bigl(H_q(p_{n+1/2}, q_{n+1}) + G^T(q_{n+1})\mu_n\bigr) \tag{8.19d}$$

$$0 = G(q_{n+1})H_p(p_{n+1}, q_{n+1}). \tag{8.19e}$$

This is the special case $s = 2$ of the Lobatto IIIA-IIIB pair to be discussed below.

The equations (8.19a-c) constitute a nonlinear system for the unknowns
$p_{n+1/2}$, $q_{n+1}$, and $\lambda_n$. In the same way as for the method (8.8) we can
reformulate Eq. (8.19c) in such a way that $\lambda_n$ can be expressed smoothly in terms of
$p_n$, $q_n$, $p_{n+1/2}$, $q_{n+1}$, and $h$. Hence, the numerical solution exists, is locally
unique, and depends smoothly on $h$ and on the initial values $(p_n, q_n)$. The same
is true for the system (8.19d,e). If the equations (8.19c,e) are replaced by (8.9), we
get a smooth extension of the method (8.19), defined on a neighbourhood of $\mathcal M$ in
$\mathbb R^{2n}$.


Theorem 8.2. The numerical method (8.19) is symmetric, convergent of order 2,
and symplectic.

Proof. a) We consider the more general situation, where (8.19c,e) is replaced by
(8.9). Replacing then $h$ by $-h$, and exchanging $(p_n, q_n)$ with $(p_{n+1}, q_{n+1})$ and
$\lambda_n$ with $\mu_n$, we obtain

$$\begin{aligned}
p_{n+1/2} &= p_{n+1} + \frac h2\bigl(H_q(p_{n+1/2}, q_{n+1}) + G^T(q_{n+1})\mu_n\bigr) \\
q_n &= q_{n+1} - \frac h2\bigl(H_p(p_{n+1/2}, q_{n+1}) + H_p(p_{n+1/2}, q_n)\bigr) \\
g(q_n) &= g(q_{n+1}) - hG(q_{n+1})H_p(p_{n+1}, q_{n+1}) \\
p_n &= p_{n+1/2} + \frac h2\bigl(H_q(p_{n+1/2}, q_n) + G^T(q_n)\lambda_n\bigr) \\
G(q_n)H_p(p_n, q_n) &= G(q_{n+1})H_p(p_{n+1}, q_{n+1}).
\end{aligned}$$

These are exactly the same equations as those of (8.19a,b,d) and (8.9), proving that
even the extension of the method to a neighbourhood of $\mathcal M$ is symmetric.

b) We consider the method (8.19) as a mapping $(p_n, q_n) \mapsto (p_{n+1}, q_{n+1})$ on
the manifold $\mathcal M$ of Eq. (8.5). The same considerations as for (8.8) show that (8.19)
is a method of order at least one. Since it is symmetric, its order has to be even
(Sect. II.8). This proves that (8.19) is a convergent method of order 2.

c) The fact that the method (8.19) defines a symplectic transformation on $\mathcal M$
can be proved as for (8.8) (see Leimkuhler & Skeel (1994) for the case of a separable
Hamiltonian (8.16)). We do not give details here, because the symplecticity of
(8.19) also follows from Theorem 8.5 below. □

Remark 8.3. In a step by step application of method (8.19) the projection (8.19d,e)
can be avoided at those points where the value $p_{n+1}$ is not needed for output.
Indeed, from the second step on we can replace (8.19a) by

$$p_{n+1/2} = p_{n-1/2} - \frac h2\bigl(H_q(p_{n+1/2}, q_n) + H_q(p_{n-1/2}, q_n) + G^T(q_n)(\lambda_n + \mu_{n-1})\bigr)$$

without changing the numerical approximations $q_n$ and $p_{n+1/2}$. The same trick is
possible for method (8.8).


The Lobatto IIIA-IIIB Pair 


Partitioned Runge-Kutta methods are well suited for unconstrained Hamiltonian
systems (see Sect. II.16). We shall investigate here how these methods can be
extended to the constrained system (8.1). We consider

$$P_i = p_0 + h\sum_{j=1}^{s} a_{ij}k_j, \qquad Q_i = q_0 + h\sum_{j=1}^{s} \hat a_{ij}\ell_j, \tag{8.20a}$$

$$p_1 = p_0 + h\sum_{i=1}^{s} b_i k_i, \qquad q_1 = q_0 + h\sum_{i=1}^{s} \hat b_i \ell_i, \tag{8.20b}$$

$$k_i = -\frac{\partial H}{\partial q}(P_i, Q_i) - G^T(Q_i)\Lambda_i, \qquad \ell_i = \frac{\partial H}{\partial p}(P_i, Q_i), \tag{8.20c}$$

where $b_i, a_{ij}$ and $\hat b_i, \hat a_{ij}$ are the coefficients of two Runge-Kutta schemes (cf.
Eq. (II.16.26)). For the moment, the values $\Lambda_i$ ($i = 1, \dots, s$) are not yet specified.
There are several possibilities to do this. One can either define them by
$\Lambda_i = \lambda(P_i, Q_i)$, where $\lambda(p, q)$ is the function given by (8.3b), or one can define them
implicitly by adding the conditions $G(Q_i)H_p(P_i, Q_i) = 0$ or $g(Q_i) = 0$.

We are interested in symplectic schemes. Therefore it is natural to consider
methods satisfying the conditions of Theorem II.16.10.


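Lemma 8.4. If the coefficients of (8.20) satisfy

$$b_i = \hat b_i, \qquad i = 1, \dots, s \tag{8.21}$$

$$b_i \hat a_{ij} + \hat b_j a_{ji} - b_i \hat b_j = 0, \qquad i, j = 1, \dots, s, \tag{8.22}$$

then we have the following relation for the expressions in (8.20):

$$\sum_{I=1}^{n} dp_1^I \wedge dq_1^I - \sum_{I=1}^{n} dp_0^I \wedge dq_0^I
 = -h\sum_{i=1}^{s} b_i \sum_{K=1}^{m}\sum_{I=1}^{n} \frac{\partial g^K}{\partial q^I}(Q_i)\, d\Lambda_i^K \wedge dQ_i^I .$$

If the Hamiltonian is separable (i.e., $H(p, q) = T(p) + U(q)$), then the condition
(8.22) alone implies the above relation.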


Proof. We compute the expression $D = dp_1^T \wedge dq_1 - dp_0^T \wedge dq_0$ following
the lines of the proof of Theorem II.16.6 (see also the proof of Theorem II.16.10).
All terms cancel with exception of those originating from the presence of
$G^T(Q_i)\Lambda_i$ in (8.20c). We thus obtain

$$D = -h\sum_{i=1}^{s} b_i \sum_{K=1}^{m}\Bigl(\Lambda_i^K \sum_{I,J=1}^{n} \frac{\partial^2 g^K}{\partial q^I \partial q^J}(Q_i)\, dQ_i^J \wedge dQ_i^I
 + \sum_{I=1}^{n} \frac{\partial g^K}{\partial q^I}(Q_i)\, d\Lambda_i^K \wedge dQ_i^I\Bigr).$$

Due to the symmetry of the second derivative of $g^K$ the term involving $dQ_i^J \wedge dQ_i^I$
vanishes identically. This proves the statement of the lemma. □


We are interested in partitioned Runge-Kutta methods that satisfy:

• the numerical solution stays on the manifold $\mathcal M$ of Eq. (8.5);

• the numerical flow $(p_0, q_0) \mapsto (p_1, q_1)$ is a symplectic transformation on $\mathcal M$;

• the order of convergence is higher than 2.

If the values $\Lambda_i$ are determined by the condition

$$g(Q_i) = 0 \quad\text{for } i = 1, \dots, s, \tag{8.23}$$

then we have $\sum_I \frac{\partial g^K}{\partial q^I}(Q_i)\, dQ_i^I = 0$, and it follows from Lemma 8.4 that the
method (8.20) is symplectic, if (8.21) and (8.22) are satisfied. Hence, the second
item holds. Here we see the importance of the conditions (8.23). Solving the index
reduced system (8.1a,b), (8.3b) by a symplectic method would in general not result
in a symplectic numerical flow on $\mathcal M$.

How can we achieve the first item, in particular the condition $g(q_1) = 0$? The
idea is to require the method $\hat a_{ij}$ to be stiffly accurate, i.e.,

$$\hat a_{sj} = \hat b_j \quad\text{for } j = 1, \dots, s. \tag{8.24}$$

In this case we have $q_1 = Q_s$, and $g(q_1) = 0$ is automatically satisfied by (8.23).
The condition (8.24) together with (8.22) implies that (assuming nonzero $b_i$)

$$a_{is} = 0 \quad\text{for } i = 1, \dots, s, \tag{8.25}$$

and the nonlinear system (8.20a,c), (8.23) no longer depends on $\Lambda_s$. This parameter,
however, appears in the definition of $p_1$ in Eq. (8.20b) via $k_s$. There it can be
used to impose the constraint $G(q_1)H_p(p_1, q_1) = 0$.

Due to the condition (8.25) a new difficulty arises. If we consider (8.20b,c)
as definition of the quantities $p_1, q_1, k_i, \ell_i$, the remaining equations (8.20a) and
(8.23) are a nonlinear system for $P_1, \dots, P_s$, $Q_1, \dots, Q_s$, $\Lambda_1, \dots, \Lambda_{s-1}$. Counting
the number of equations of this system ($2sn + sm$) and the number of unknowns
($2sn + (s-1)m$), one is readily convinced that this nonlinear system will usually
not have a solution. The idea (Jay 1994, 1996) is to require

$$\hat a_{1j} = 0 \quad\text{for } j = 1, \dots, s, \tag{8.26}$$

so that $Q_1 = q_0$, and the condition (8.23) is automatically verified for $i = 1$ (we
always assume consistent initial values). By (8.22) this implies (for nonzero $b_i$)

$$a_{i1} = b_1 \quad\text{for } i = 1, \dots, s. \tag{8.27}$$

The Runge-Kutta matrices $A$ and $\hat A$ are both singular. Let $\hat A_0$ be the $(s-1) \times s$
submatrix of $\hat A$ obtained by deleting its first row, and let $A_0$ be the $s \times (s-1)$
submatrix of $A$ formed by the first $s - 1$ columns of $A$. In order to be able to
prove the existence of a numerical solution of (8.20), (8.23), we require that the
$(s-1) \times (s-1)$ matrix

$$\hat A_0 A_0 \ \text{ is invertible.} \tag{8.28}$$

We now extend the method to arbitrary initial values as follows: we replace
condition (8.23) by

$$g(Q_i) = g(q_0) + c_i h\,G(q_0)H_p(p_0, q_0) \quad\text{for } i = 1, \dots, s$$

($c_i = \sum_j \hat a_{ij}$) and the condition $G(q_1)H_p(p_1, q_1) = 0$ by (8.9b). Similar to
Equation (8.10) we use

$$g(Q_i) - g(q_0) = h\int_0^1 g_q\bigl(q_0 + \tau(Q_i - q_0)\bigr)\,d\tau \cdot \sum_{j=1}^{s} \hat a_{ij}H_p(P_j, Q_j). \tag{8.29}$$

Then we develop

$$H_p(P_j, Q_j) = H_p(p_0, Q_j)
 - h\int_0^1 H_{pp}\bigl(p_0 + \sigma(P_j - p_0), Q_j\bigr)\,d\sigma \cdot \sum_{r=1}^{s} a_{jr}\bigl(H_q(P_r, Q_r) + G^T(Q_r)\Lambda_r\bigr),$$

and insert this relation into (8.29). As in Eq. (8.11) we get a linear system for
$\Lambda_1, \dots, \Lambda_{s-1}$ which, for $h = 0$, has the solution $\Lambda_r^0$ given by

$$0 = \frac{c_i^2}{2}\,g_{qq}(H_p, H_p) + \Bigl(\sum_{j=1}^{s} \hat a_{ij}c_j\Bigr)GH_{pq}H_p
 - \sum_{r=1}^{s-1}\Bigl(\sum_{j=1}^{s} \hat a_{ij}a_{jr}\Bigr)GH_{pp}\bigl(H_q + G^T\Lambda_r^0\bigr).$$

Here all functions are evaluated at $(p_0, q_0)$. Due to (8.28) and (8.4) this system can
be solved for $\Lambda_r^0$. The Implicit Function Theorem then guarantees the existence
of a locally unique solution of the method (8.20), (8.23), and the existence of a
smooth extension to a neighbourhood of $\mathcal M$.

The question is now: do there exist high order methods having all these prop¬ 
erties? 


Theorem 8.5. The s-stage Lobatto IIIA-IIIB pair (Lobatto IIIA in the role of
$\hat b_i, \hat a_{ij}$, and Lobatto IIIB in the role of $b_i, a_{ij}$; see Sect. IV.5 for their definition)
satisfies (8.21), (8.22), (8.24), (8.25), (8.26), (8.27), and (8.28).


Proof. Properties (8.21), (8.24), (8.25), (8.26), and (8.27) follow immediately from
the definition of the methods. The symplecticity condition (8.22) has first been
proved by Sun Geng (1993). We let $d_{ij} = b_i\hat a_{ij} + \hat b_j a_{ji} - b_i\hat b_j$ and compute for
$k = 1, \dots, s$

$$\sum_{j=1}^{s} d_{ij}c_j^{k-1} = b_i\frac{c_i^k}{k} + \frac1k\,b_i\bigl(1 - c_i^k\bigr) - \frac{b_i}{k} = 0.$$

Here we have exploited the fact that the Lobatto IIIA method satisfies $C(s)$ and
the Lobatto IIIB method satisfies $D(s)$ (see Table IV.5.13). Since the abscissae
$c_1, \dots, c_s$ of the Lobatto quadrature are distinct, the above Vandermonde type
system has a unique solution $d_{ij} = 0$. This proves (8.22).

We next show that

$$\sum_{k=1}^{s-1}\sum_{j=1}^{s} \hat a_{ij}a_{jk}c_k^{q-2} = \frac{c_i^q}{q(q-1)} \quad\text{for } i, q = 2, \dots, s. \tag{8.30}$$

This means that $\hat A_0 A_0 V = W$, where $V$ and $W$ are nonsingular Vandermonde
type matrices. This obviously implies (8.28). For $q = 2, \dots, s-1$ Eq. (8.30) follows
from the fact that the methods Lobatto IIIA and IIIB satisfy $C(s)$ and $C(s-2)$,
respectively. It remains to show that the coefficients
$\delta_i := \sum_{j,k} \hat a_{ij}a_{jk}c_k^{s-2} - c_i^s/(s(s-1))$ vanish for all $i$. By (8.26) and $c_1 = 0$ we have
$\delta_1 = 0$. Because of $\hat a_{sj} = \hat b_j = b_j$ and $c_s = 1$, the condition $\delta_s = 0$ is nothing else
than an order condition (order $s$), which is satisfied (Sect. IV.5). Since the Lobatto
IIIA and IIIB methods satisfy $D(s-2)$ and $D(s)$, respectively, it holds
$\sum_i b_i c_i^{m-1}\delta_i = 0$ for $m = 1, \dots, s-2$. This proves that also $\delta_2, \dots, \delta_{s-1}$ vanish, so
that all relations of (8.30) are established. □
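
The statement of Theorem 8.5 can be checked in exact rational arithmetic for a small
number of stages. The sketch below does this for $s = 3$. It assumes the usual 3-stage
Lobatto IIIA and IIIB tableaux, which are typed in here from the standard literature
rather than taken from Sect. IV.5, and it assumes the role assignment of (8.20) as
reconstructed above (IIIA for the hatted coefficients, IIIB for the unhatted ones). All
variable names are hypothetical.

```python
from fractions import Fraction as F

s = 3
b = [F(1, 6), F(2, 3), F(1, 6)]           # weights, identical for both methods (8.21)
c = [F(0), F(1, 2), F(1)]                 # Lobatto abscissae
A_hat = [[F(0), F(0), F(0)],              # Lobatto IIIA (assumed tableau)
         [F(5, 24), F(1, 3), F(-1, 24)],
         [F(1, 6), F(2, 3), F(1, 6)]]
A = [[F(1, 6), F(-1, 6), F(0)],           # Lobatto IIIB (assumed tableau)
     [F(1, 6), F(1, 3), F(0)],
     [F(1, 6), F(5, 6), F(0)]]

# (8.22): d_ij = b_i * a_hat_ij + b_hat_j * a_ji - b_i * b_hat_j, with b_hat = b.
d = [[b[i] * A_hat[i][j] + b[j] * A[j][i] - b[i] * b[j] for j in range(s)]
     for i in range(s)]
cond_822 = all(x == 0 for row in d for x in row)

cond_824 = A_hat[s - 1] == b                           # stiffly accurate
cond_825 = all(A[i][s - 1] == 0 for i in range(s))     # last column of A vanishes
cond_826 = all(A_hat[0][j] == 0 for j in range(s))     # first row of A_hat vanishes
cond_827 = all(A[i][0] == b[0] for i in range(s))      # first column of A equals b_1

# (8.28): rows 2..s of A_hat times columns 1..s-1 of A, here a 2 x 2 matrix.
P = [[sum(A_hat[i][j] * A[j][k] for j in range(s)) for k in range(s - 1)]
     for i in range(1, s)]
det_828 = P[0][0] * P[1][1] - P[0][1] * P[1][0]

# (8.30): sum_k P_ik c_k^(q-2) = c_i^q / (q (q - 1)) for i, q = 2, ..., s.
cond_830 = all(
    sum(P[i - 2][k] * c[k] ** (q - 2) for k in range(s - 1))
    == c[i - 1] ** q / (q * (q - 1))
    for i in range(2, s + 1) for q in range(2, s + 1)
)
```

Using `Fraction` avoids any round-off question: each condition either holds exactly or
fails, and the determinant in (8.28) is obtained as an exact rational number.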


It still remains to discuss the order of convergence of the Lobatto IIIA-IIIB pair. 
Since we have succeeded in embedding the method into a one-step method that is 
defined in a whole neighbourhood of M , the convergence theory of Sect. II.3 can 
be applied. We only have to investigate the local error of the method. Each of 
the methods has classical order $2s - 2$ (Sect. IV.5), and it follows from Exercise 4
that, considered as partitioned Runge-Kutta method, the pair has also order $2s - 2$.
It has been shown in Jay (1994) that the presence of constraints (8.1c) does not 
reduce the order. The proof of this superconvergence result is very technical and 
long. Therefore we do not reproduce it here. 


Composition Methods 

Another possibility for obtaining high order symplectic methods for the system 
(8.1) is by composition of low order methods. The idea goes back to Yoshida 
(1990), and has been extended to constrained systems by Reich (1996). 

Consider the second order symmetric method (8.19) and denote its extension
to a neighbourhood of $\mathcal M$ by $\Phi_h$. We shall study the following composition

$$\Phi_{c_1 h} \circ \Phi_{c_2 h} \circ \Phi_{c_1 h}. \tag{8.31}$$

The method (8.31) represents a one-step method, defined in a neighbourhood of 
M . For initial values on M , the numerical solution stays on M . Moreover, 
the composition (8.31) is symplectic and symmetric. Observe that the projections 
(8.19d,e) can be avoided in an implementation of this method (see Remark 8.3). 
Concerning its order we have the following result. 

Theorem 8.6. Let $\Phi_h$ be the mapping $(p_0, q_0) \mapsto (p_1, q_1)$, defined by (8.19). If

$$2c_1 + c_2 = 1, \qquad 2c_1^3 + c_2^3 = 0, \tag{8.32}$$

the composition method (8.31) is of order 4.

If $\Phi_h$ represents a one-step method that is symmetric, of order $p = 2k$, and
defined in a neighbourhood of $\mathcal M$, then the relations

$$2c_1 + c_2 = 1, \qquad 2c_1^{p+1} + c_2^{p+1} = 0, \tag{8.33}$$

imply that the composition (8.31) is of order $p + 2$.

Proof. We let $y_0 = (p_0, q_0)^T$ and $y(t) = (p(t), q(t))^T$. The local error of the
method (8.19) satisfies

$$y(t_0 + h) - \Phi_h(y_0) = d(y_0)h^3 + O(h^4).$$

Since the basic method is of the form $\Phi_h(y_0) = y_0 + h\Psi(y_0, h)$, we have that

$$y\bigl(t_0 + (2c_1 + c_2)h\bigr) - \Phi_{c_1 h} \circ \Phi_{c_2 h} \circ \Phi_{c_1 h}(y_0) = (2c_1^3 + c_2^3)\,d(y_0)h^3 + O(h^4).$$

The conditions (8.32) then imply that the method (8.31) is at least of order 3. Since
it is symmetric, it has to be of order 4. The proof is easily adapted to the higher
order situation. □


A solution of (8.32) is given by

$$c_1 = \frac{1}{2 - \sqrt[3]{2}}, \qquad c_2 = -\frac{\sqrt[3]{2}}{2 - \sqrt[3]{2}},$$

which shows that the intermediate step in the composition (8.31) is a 'back step'
(negative step size $c_2 h$).
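
The order statement of Theorem 8.6 can be observed numerically. The following sketch is
an illustration under simplifying assumptions: instead of the constrained method (8.19)
it uses the unconstrained Störmer-Verlet scheme for the pendulum $H = p^2/2 - \cos q$ as
symmetric second order basic method, which is what (8.19) reduces to without constraints
for a separable Hamiltonian. The names `verlet_step`, `composed_step`, `integrate` and
`observed_order`, the initial values, and the step numbers are hypothetical choices.

```python
import math

# Coefficients solving (8.32): 2*c1 + c2 = 1 and 2*c1^3 + c2^3 = 0.
C1 = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
C2 = -(2.0 ** (1.0 / 3.0)) / (2.0 - 2.0 ** (1.0 / 3.0))


def verlet_step(q, p, h):
    """Symmetric second order basic method for H = p^2/2 - cos(q)."""
    p_half = p - 0.5 * h * math.sin(q)
    q_new = q + h * p_half
    p_new = p_half - 0.5 * h * math.sin(q_new)
    return q_new, p_new


def composed_step(q, p, h):
    """Composition (8.31) of three basic steps with step sizes c1*h, c2*h, c1*h."""
    for c in (C1, C2, C1):
        q, p = verlet_step(q, p, c * h)
    return q, p


def integrate(step, q, p, t_end, n):
    """Apply n steps of size t_end/n."""
    h = t_end / n
    for _ in range(n):
        q, p = step(q, p, h)
    return q, p


def observed_order(step, n, reference):
    """Estimate the order from the errors obtained with n and 2n steps."""
    def err(m):
        q, p = integrate(step, 1.0, 0.0, 1.0, m)
        return math.hypot(q - reference[0], p - reference[1])
    return math.log2(err(n) / err(2 * n))


# Reference solution at t = 1, computed with a much smaller step size.
reference = integrate(composed_step, 1.0, 0.0, 1.0, 2000)
order_basic = observed_order(verlet_step, 20, reference)
order_composed = observed_order(composed_step, 20, reference)
```

Halving the step size should reduce the error of the basic method by a factor of about
4 and that of the composition by a factor of about 16, so that `order_basic` is close
to 2 and `order_composed` is close to 4.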

The result of Theorem 8.6 allows us to construct symplectic integrators for
(8.1) of an arbitrary even order. However, the resulting method of order $p = 2k$
requires $3^{k-1}$ applications of the basic method (8.19).

In the case of unconstrained Hamiltonian systems it is known that better methods
can be obtained by compositions of the form

$$\Phi_{c_1 h} \circ \Phi_{c_2 h} \circ \dots \circ \Phi_{c_{s-1} h} \circ \Phi_{c_s h} \tag{8.34}$$


(see Yoshida 1990, McLachlan 1995, Sanz-Serna & Calvo 1994). Reich (1996) 
studies the extension of these methods to constrained Hamiltonian systems and 
finds that additional order conditions are necessary. His investigation relies on a 
“backward error analysis” for integrators on manifolds. 


Backward Error Analysis (for ODEs) 

Although backward analysis is a perfectly straightforward con¬ 
cept there is strong evidence that a training in classical mathema¬ 
tics leaves one unprepared to adopt it. 

(J.H. Wilkinson, NAG Newsletter 2/85) 

In Sect. II.16 we have briefly explained the idea of backward error analysis for
the symplectic Euler method. Here we present an extension to general one-step
methods for ordinary differential equations. Consider

$$y' = f(y), \qquad y(0) = y_0, \tag{8.35}$$

and let $y_0 \mapsto y_1$ be an arbitrary one-step method for (8.35). We assume that $f(y)$
and the method are sufficiently often differentiable, so that the local error can be
expanded into a Taylor series as

$$y_1 - y(h) = d_{p+1}(y_0)h^{p+1} + \dots + d_N(y_0)h^N + O(h^{N+1}). \tag{8.36}$$

Theorem 8.7. Consider a one-step method of order $p$, and assume the local error
to be given by (8.36). Then there exist functions $f_j(y)$ (for $j = p, \dots, N$), such
that

$$y_1 - \tilde y(h) = O(h^{N+1}), \tag{8.37}$$

where $\tilde y(t)$ is the solution of the perturbed differential equation

$$\tilde y' = f(\tilde y) + h^p f_p(\tilde y) + \dots + h^{N-1}f_{N-1}(\tilde y), \qquad \tilde y(0) = y_0. \tag{8.38}$$


Remark. If the function $f(y) + h^p f_p(y) + \dots + h^{N-1}f_{N-1}(y)$ satisfies a Lipschitz
condition, the proof of Theorem II.3.4 shows that $y_n - \tilde y(nh) = O(h^N)$
on bounded intervals. This implies that the numerical approximation $y_n$ is much
closer to the solution of (8.38) than to that of (8.35). Hence, the study of the system
(8.38) yields new insight into the behaviour of the numerical solution.

Proof. As a consequence of the nonlinear variation-of-constants formula
(Theorem I.14.5) we have

$$\tilde y(h) = y(h) + \int_0^h \frac{\partial y}{\partial y_0}\bigl(h, s, \tilde y(s)\bigr) \cdot \Bigl(h^p f_p(\tilde y(s)) + \dots + h^N f_N(\tilde y(s))\Bigr)\,ds,$$

where $y(t, t_0, y_0)$ denotes the solution of (8.35) corresponding to initial values
$y(t_0) = y_0$. Expanding the above integral into a Taylor series we obtain

$$\tilde y(h) - y(h) = h^{p+1}f_p(y_0) + h^{p+2}\bigl(f_{p+1} + \tfrac12 f_p' f + \tfrac12 f' f_p\bigr)(y_0) + \dots . \tag{8.39}$$

The condition (8.37) implies that the coefficients of (8.39) have to agree with those
of (8.36) up to a certain order. We thus get $f_p(y) = d_{p+1}(y)$,
$f_{p+1}(y) = d_{p+2}(y) - \bigl(f_p'(y)f(y) + f'(y)f_p(y)\bigr)/2$, etc. The essential observation is
that the coefficient of $h^{j+1}$ in (8.39) contains $f_j(y)$ as linear term and further
expressions that only depend on $f_i(y)$ with $i < j$. Hence, the functions $f_j(y)$ are
recursively determined by the above comparison. □


Example 8.8. For an illustration of the above theorem we consider the
Volterra-Lotka differential equation

$$u' = u(v - 1), \qquad v' = v(2 - u). \tag{8.40}$$

This system possesses the first integral

$$I(u, v) = 2\ln u - u + \ln v - v, \tag{8.41}$$

implying that the solutions are all periodic. Some of them are plotted in the left
upper picture of Fig. 8.1.

Fig. 8.1. Solutions of the perturbed differential equation for various methods

We apply three different numerical methods to this differential equation. The
first one is the well-known explicit Euler method $y_{n+1} = y_n + hf(y_n)$. The right
upper picture of Fig. 8.1 shows the numerical solution and the exact solution (solid
line) for the initial value $u_0 = 2.725$, $v_0 = 1$. Moreover, we have included the
solutions of the perturbed differential equation (8.38) for $N = 1$ (dashed-dotted
line) and for $N = 2$ (dotted line). For the explicit Euler method, Eq. (8.38) reads

$$\tilde y' = f(\tilde y) - \frac h2\,(f'f)(\tilde y) + \frac{h^2}{12}\bigl(f''(f, f) + 4f'f'f\bigr)(\tilde y). \tag{8.42}$$

We nicely observe the good agreement of the numerical solution with the exact
solution of the perturbed system, even for the rather large step size $h = 0.12$.
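
For a scalar linear problem the coefficients in (8.42) can be checked by hand, since the
explicit Euler iterates are then known in closed form. The sketch below assumes the test
equation $y' = \lambda y$ (not considered in the text), for which $f'f = \lambda^2 y$,
$f''(f,f) = 0$ and $f'f'f = \lambda^3 y$. The explicit Euler method gives
$y_n = (1 + h\lambda)^n y_0$, which is the exact solution of $\tilde y' = \tilde\lambda \tilde y$
with $\tilde\lambda = \ln(1 + h\lambda)/h$. The function names are hypothetical.

```python
import math


def exact_modified_rate(lam, h):
    """Rate of the linear equation solved exactly by explicit Euler on y' = lam*y."""
    return math.log1p(h * lam) / h


def truncated_modified_rate(lam, h):
    """Right-hand side of (8.42) for f(y) = lam*y, divided by y."""
    return lam - 0.5 * h * lam ** 2 + (h ** 2 / 12.0) * 4.0 * lam ** 3


lam, h = -1.0, 0.12
difference = abs(exact_modified_rate(lam, h) - truncated_modified_rate(lam, h))
```

The two rates agree up to a remainder of size $O(h^3)$, in accordance with the series
$\ln(1 + h\lambda)/h = \lambda - h\lambda^2/2 + h^2\lambda^3/3 - \dots$.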

The left lower picture shows the same experiment for the implicit Euler method
$y_{n+1} = y_n + hf(y_{n+1})$. The perturbed differential equation is obtained from (8.42)
by replacing $h$ by $-h$ (this is because the explicit Euler method is the adjoint
method of the implicit Euler method).

The third method is the symplectic Euler method (see Eq. (8.45) below), which
for the problem (8.40) is defined by

$$u_{n+1} = u_n + hu_n(v_{n+1} - 1), \qquad v_{n+1} = v_n + hv_{n+1}(2 - u_n).$$

The first term of the perturbed differential equation is

$$\begin{aligned}
u' &= u(v - 1) - hu(uv - 4v + v^2 + 1)/2 \\
v' &= v(2 - u) + hv(uv - 5u + u^2 + 4)/2.
\end{aligned} \tag{8.43}$$

The qualitative behaviour of this method is quite different from that of the previous 
methods. One can prove that the system (8.43) has a first integral close to I(u, v) 
(Exercise 5). Hence the solutions are periodic, as it is the case for the original 
unperturbed system. 
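
The different qualitative behaviour can be reproduced with a short experiment. The
sketch below applies the explicit Euler method and the symplectic Euler variant stated
above to (8.40) and monitors the first integral (8.41) together with the modified first
integral quoted as the result of Exercise 5. The step size, the number of steps and all
function names are hypothetical choices for this illustration; the initial value is the
one used in the text.

```python
import math


def first_integral(u, v):
    """First integral (8.41) of the Volterra-Lotka system (8.40)."""
    return 2.0 * math.log(u) - u + math.log(v) - v


def modified_integral(u, v, h):
    """First integral of (8.43), as stated in the result of Exercise 5."""
    correction = (u + v) ** 2 - 10.0 * u - 8.0 * v + 8.0 * math.log(u) + 2.0 * math.log(v)
    return first_integral(u, v) + h * correction / 4.0


def explicit_euler(u, v, h):
    return u + h * u * (v - 1.0), v + h * v * (2.0 - u)


def symplectic_euler(u, v, h):
    # The implicit relation v_new = v + h*v_new*(2 - u) is linear in v_new.
    v_new = v / (1.0 - h * (2.0 - u))
    u_new = u + h * u * (v_new - 1.0)
    return u_new, v_new


def run(step, h, n, u=2.725, v=1.0):
    """Return the list of numerical approximations (u_k, v_k), k = 0, ..., n."""
    trajectory = [(u, v)]
    for _ in range(n):
        u, v = step(u, v, h)
        trajectory.append((u, v))
    return trajectory


def max_deviation(values):
    """Largest deviation from the initial value."""
    return max(abs(x - values[0]) for x in values)


h, n = 0.001, 5000
sym = run(symplectic_euler, h, n)
exp = run(explicit_euler, h, n)
dev_I_sym = max_deviation([first_integral(u, v) for u, v in sym])
dev_Imod_sym = max_deviation([modified_integral(u, v, h) for u, v in sym])
drift_I_exp = first_integral(*exp[-1]) - first_integral(*exp[0])
```

For the symplectic Euler method the value of $I$ oscillates with an amplitude
proportional to $h$, and the modified integral varies even less. For the explicit Euler
method $I$ drifts steadily, which corresponds to the numerical solution spiralling
outwards.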
Example 8.9. For the Hamiltonian system (without constraints)

$$q' = H_p(p, q), \qquad p' = -H_q(p, q) \tag{8.44}$$

the method (8.8) becomes

$$q_1 = q_0 + hH_p(p_1, q_0), \qquad p_1 = p_0 - hH_q(p_1, q_0). \tag{8.45}$$

A similar method (implicit in $q$ and explicit in $p$) has been considered in Sect. II.16,
Formula (II.16.54). There we have computed the first terms of the perturbed
differential equation (8.38), and we have noticed with surprise that it is also Hamiltonian.
The same computation can be done here. We find that the perturbed differential
equation for (8.45) is of the form

$$q' = \tilde H_p(p, q), \qquad p' = -\tilde H_q(p, q) \tag{8.46}$$

with (for $N = 2$)

$$\tilde H = H - \frac h2\,H_pH_q + \frac{h^2}{12}\bigl(H_{pp}H_q^2 + H_{qq}H_p^2 + 4H_{pq}H_pH_q\bigr).$$

For notational convenience we have assumed that $p$ and $q$ are scalars. However,
with a suitable interpretation of the appearing expressions, the formula is also valid
for problems with more than one degree of freedom.
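
The perturbed Hamiltonian of Example 8.9 can be tested on a concrete problem. The
sketch below is a hedged illustration: it assumes the pendulum $H(p,q) = p^2/2 - \cos q$,
for which $H_p = p$, $H_q = \sin q$, $H_{pp} = 1$, $H_{qq} = \cos q$ and $H_{pq} = 0$, so that
(8.45) becomes explicit. The function names, the step size and the initial values are
hypothetical.

```python
import math


def hamiltonian(p, q):
    """Pendulum Hamiltonian H = p^2/2 - cos(q) (an assumed test problem)."""
    return 0.5 * p * p - math.cos(q)


def perturbed_hamiltonian(p, q, h):
    """Truncated perturbed Hamiltonian of Example 8.9 for the pendulum."""
    Hp, Hq = p, math.sin(q)
    Hpp, Hqq, Hpq = 1.0, math.cos(q), 0.0
    return (hamiltonian(p, q) - 0.5 * h * Hp * Hq
            + (h * h / 12.0) * (Hpp * Hq ** 2 + Hqq * Hp ** 2 + 4.0 * Hpq * Hp * Hq))


def symplectic_euler(p, q, h):
    """Method (8.45); explicit here because H_q does not depend on p."""
    p_new = p - h * math.sin(q)   # p1 = p0 - h*H_q(p1, q0)
    q_new = q + h * p_new         # q1 = q0 + h*H_p(p1, q0)
    return p_new, q_new


def max_deviation(values):
    """Largest deviation from the initial value."""
    return max(abs(x - values[0]) for x in values)


h, n = 0.01, 2000
p, q = 0.0, 1.0
states = [(p, q)]
for _ in range(n):
    p, q = symplectic_euler(p, q, h)
    states.append((p, q))

dev_H = max_deviation([hamiltonian(p, q) for p, q in states])
dev_H_tilde = max_deviation([perturbed_hamiltonian(p, q, h) for p, q in states])
```

Along the numerical solution $H$ oscillates with an amplitude of size $O(h)$, whereas the
truncated $\tilde H$ should vary only at a much smaller level.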


Example 8.10. The second order method (8.19), when applied to the unconstrained
system (8.44), becomes

$$\begin{aligned}
q_1 &= q_0 + \frac h2\bigl(H_p(p_{1/2}, q_0) + H_p(p_{1/2}, q_1)\bigr) \\
p_1 &= p_0 - \frac h2\bigl(H_q(p_{1/2}, q_0) + H_q(p_{1/2}, q_1)\bigr),
\end{aligned} \tag{8.47}$$

where $p_{1/2} = p_0 - (h/2)H_q(p_{1/2}, q_0)$. Computing the dominant term of its local
error, we see that the perturbed differential equation (8.38) is, for $N = 2$, given by

$$\begin{aligned}
q' &= H_p(p, q) + \frac{h^2}{24}\bigl(-H_{ppp}H_q^2 + 2H_{ppq}H_pH_q + 2H_{pqq}H_p^2
 + 2H_{pq}H_{pq}H_p + 4H_{pp}H_{qq}H_p\bigr)(p, q) \\
p' &= -H_q(p, q) + \frac{h^2}{24}\bigl(H_{ppq}H_q^2 - 2H_{pqq}H_pH_q - 2H_{qqq}H_p^2
 - 2H_{pq}H_{pq}H_q - 6H_{pq}H_{qq}H_p + 2H_{pp}H_{qq}H_q\bigr)(p, q).
\end{aligned}$$

One easily verifies that this is a Hamiltonian system (8.46) with

$$\tilde H = H + \frac{h^2}{24}\bigl(2H_{qq}H_p^2 - H_{pp}H_q^2 + 2H_{pq}H_pH_q\bigr).$$

A Short Survey on Further Results. A further elaboration of backward error
analysis for ordinary differential equations would take us beyond the scope of this
chapter. We therefore collect some interesting results without going into details.

First of all, the mystery of the foregoing examples is well understood. In the
situation where the differential equation (8.35) is a Hamiltonian system, and where
a symplectic integration method is applied, the perturbed system (8.38) is again
Hamiltonian for all $N$. This result is proved by Hairer (1994), where explicit
formulas for the functions $f_j(y)$ in terms of elementary differentials are provided, and
where an explicit formula for the perturbed Hamiltonian is given. This explicit
representation guarantees that $\tilde H(p, q)$ is uniquely defined on regions where $H(p, q)$
is defined. Different proofs of this result can be found in Reich (1996) and Benettin
& Giorgilli (1994).

If the function $f$ in (8.35) is infinitely differentiable, then the truncation index
$N$ in Theorem 8.7 is arbitrary. In general, the series (8.38) diverges as $N \to \infty$
and the constants hidden in the $O(h^{N+1})$ bounds of (8.37) tend to infinity with
$N$, even if $f$ is analytic. Therefore, it is interesting to find rigorous bounds on
$y_1 - \tilde y(h)$ for an optimally chosen $N$. Such results have been found independently
by Benettin & Giorgilli (1994) and Hairer & Lubich (1996). As a consequence, one
can show that for symplectic integrations the Hamiltonian remains bounded (with
error of size $O(h^p)$) over exponentially long times. Moreover, KAM theory can be
applied to get more insight into the long-time behaviour of symplectic numerical
schemes.


Backward Error Analysis on Manifolds 

Consider the constrained Hamiltonian system (8.1), and a numerical one-step
method which yields approximations $(p_n, q_n)$ staying on the manifold $\mathcal M$ of
Eq. (8.5). Can we extend the above backward error analysis for ODEs to this
situation?

There are at least two ways to achieve this goal. The first one is to introduce
local coordinates in order to obtain an unconstrained Hamiltonian system. The
backward analysis for ODEs can then be applied to the one-step method written in
local coordinates.

The second approach allows us to construct the perturbed Hamiltonian directly
in the original coordinates. For the special case of separable Hamiltonians, this
approach is due to Reich (1996). We shall explain it for the first and second order
methods (8.8) and (8.19).

Backward Error Analysis for the Method (8.8). Consider first the subsystem
(8.8a-c). The projection step (8.8d,e) will be treated later. In Eq. (8.11) the value
$\lambda_1$ has been expressed in terms of $\hat p_1$, $q_1$, $p_0$, $q_0$, and $h$, even for inconsistent
initial values. Inserting this function into (8.8a), the Eqs. (8.8a,b) represent two
relations between the variables $\hat p_1$, $q_1$, $p_0$, $q_0$, and $h$. By the Implicit Function
Theorem these two relations allow us to express $(p_0, q_1)$ in terms of $(\hat p_1, q_0)$ and $h$.
Consequently, the solution $\lambda_1$ of Eq. (8.11) can be written as a function of
$(\hat p_1, q_0, h)$. We denote it by

$$\lambda_1 = \lambda(\hat p_1, q_0, h) \tag{8.48}$$

so that the system (8.8a,b) becomes

$$\begin{aligned}
\hat p_1 &= p_0 - h\bigl(H_q(\hat p_1, q_0) + G^T(q_0)\lambda(\hat p_1, q_0, h)\bigr) \\
q_1 &= q_0 + hH_p(\hat p_1, q_0),
\end{aligned} \tag{8.49}$$

and the constraint (8.9a) is automatically satisfied by the definition of $\lambda(\hat p_1, q_0, h)$.
We now consider the Hamiltonian function

$$\mathcal H(p, q) = H(p, q) + g(q)^T\lambda(p, q, h), \tag{8.50}$$

where $\lambda(p, q, h)$ is the function defined in (8.48). The corresponding Hamiltonian
system is

$$\begin{aligned}
q' &= H_p(p, q) + g(q)^T\lambda_p(p, q, h) \\
p' &= -H_q(p, q) - G^T(q)\lambda(p, q, h) - g(q)^T\lambda_q(p, q, h).
\end{aligned} \tag{8.51}$$

The main observation is now that, for initial values satisfying $g(q_0) = 0$, the
numerical solution $(\hat p_1, q_1)$ of (8.49) is exactly the same as the numerical solution
of the symplectic Euler method (8.45) applied to the (unconstrained) Hamiltonian
system (8.51). Therefore, Example 8.9 shows that the numerical solution $(\hat p_1, q_1)$
is $O(h^4)$-close to the exact solution of (8.46), where in the definition of $\tilde H$ the
function $H$ has to be replaced by $\mathcal H$ of Eq. (8.50).

The projection step (8.8d,e) can be treated similarly. The solution $\mu$ of (8.8d),
(8.9b) depends on $\hat p_1$, $q_1$, and $h$ (the dependence on $p_0$, $q_0$ can be omitted, because
the relations (8.8a,b) allow us to express them in terms of $\hat p_1$, $q_1$, and $h$). Due to
the relation (8.8d) we can also consider $\mu$ as a function of $p_1$, $q_1$, $h$, i.e.,
$\mu = \mu(p_1, q_1, h)$. We now consider the Hamiltonian

$$\mathcal G(p, q) = g(q)^T\mu(p, q, h), \tag{8.52}$$

and the corresponding Hamiltonian system

$$\begin{aligned}
q' &= g(q)^T\mu_p(p, q, h) \\
p' &= -G^T(q)\mu(p, q, h) - g(q)^T\mu_q(p, q, h).
\end{aligned} \tag{8.53}$$

If $g(q_1) = 0$, the numerical approximation $p_1$, computed from (8.8d), i.e.,
$p_1 = \hat p_1 - hG^T(q_1)\mu(p_1, q_1, h)$, is identical to the numerical solution of (8.45), applied
to the system (8.53) with initial values $(\hat p_1, q_1)$. Again, we obtain from
Example 8.9 that the numerical solution $(p_1, q_1)$ is $O(h^4)$-close to the exact solution of
(8.46), where in the definition of $\tilde H$ the function $H$ has to be replaced by $\mathcal G$ of
Eq. (8.52). We summarize our findings in the following theorem.

Theorem 8.11. Consider the one-step method (8.8) and assume that the initial
values are consistent, i.e., $(p_0, q_0) \in \mathcal M$. Then it holds

$$p_1 - \tilde p(h) = O(h^4), \qquad q_1 - \tilde q(h) = O(h^4),$$

where $\tilde p(t)$, $\tilde q(t)$ is the solution of the Hamiltonian system (8.46) with

$$\tilde H = \tilde{\mathcal H} + \tilde{\mathcal G} + \frac h2\{\tilde{\mathcal H}, \tilde{\mathcal G}\}
 + \frac{h^2}{12}\bigl(\{\tilde{\mathcal H}, \{\tilde{\mathcal H}, \tilde{\mathcal G}\}\} + \{\tilde{\mathcal G}, \{\tilde{\mathcal G}, \tilde{\mathcal H}\}\}\bigr)$$

where

$$\begin{aligned}
\tilde{\mathcal H} &= \mathcal H - \frac h2\,\mathcal H_p\mathcal H_q + \frac{h^2}{12}\bigl(\mathcal H_{pp}\mathcal H_q^2 + \mathcal H_{qq}\mathcal H_p^2 + 4\mathcal H_{pq}\mathcal H_p\mathcal H_q\bigr) \\
\tilde{\mathcal G} &= \mathcal G - \frac h2\,\mathcal G_p\mathcal G_q + \frac{h^2}{12}\bigl(\mathcal G_{pp}\mathcal G_q^2 + \mathcal G_{qq}\mathcal G_p^2 + 4\mathcal G_{pq}\mathcal G_p\mathcal G_q\bigr),
\end{aligned}$$

and $\mathcal H$ and $\mathcal G$ are given by (8.50) and (8.52), respectively. Here, the Poisson
bracket $\{H, G\}$ of two functions $H, G : \mathbb R^n \times \mathbb R^n \to \mathbb R$ is given by
$\{H, G\} := H_pG_q - H_qG_p$ (see Eq. (II.16.65)).

Proof. We consider the one-step method (8.8) as a composition of the mappings
$(p_0, q_0) \mapsto (\hat p_1, q_1) \mapsto (p_1, q_1)$. Neglecting terms of size $O(h^4)$, both
mappings can be interpreted as the $h$-flow of Hamiltonian systems. The statement
thus follows from the Campbell-Baker-Hausdorff Formula (II.16.83). □


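Backward Error Analysis for the Method (8.19). We consider the solution $\lambda_0$ of
(8.19a,b), (8.9a) as a function of $p_{1/2}$, $q_0$, and $h$, i.e., $\lambda_0 = \lambda(p_{1/2}, q_0, h)$, and the
solution $\mu_0$ of (8.19d), (8.9b) as a function of $p_1$, $q_1$ and $h$, i.e., $\mu_0 = \mu(p_1, q_1, h)$.
The method (8.19) can therefore be written as the composition of

$$\begin{aligned}
p_{1/2} &= p_0 - \frac h2\bigl(H_q(p_{1/2}, q_0) + G^T(q_0)\lambda(p_{1/2}, q_0, h)\bigr) \\
q_1 &= q_0 + \frac h2\bigl(H_p(p_{1/2}, q_0) + H_p(p_{1/2}, q_1)\bigr) \\
\hat p_1 &= p_{1/2} - \frac h2\bigl(H_q(p_{1/2}, q_1) + G^T(q_1)\lambda(p_{1/2}, q_1, h)\bigr)
\end{aligned} \tag{8.54}$$

with the projection step

$$p_1 = \hat p_1 - \frac h2\,G^T(q_1)\nu(p_1, q_1, h), \tag{8.55}$$

where $\nu(p_1, q_1, h) = \mu(p_1, q_1, h) - \lambda(p_{1/2}, q_1, h)$. We see that, for consistent
initial values $(p_0, q_0) \in \mathcal M$, (8.54) is identical to (8.47) with $H(p, q)$ replaced by

$$\mathcal H(p, q) = H(p, q) + g(q)^T\lambda(p, q, h), \tag{8.56}$$

and the projection step (8.55) can be interpreted as method (8.45) with Hamiltonian
function

$$\mathcal G(p, q) = \tfrac12\,g(q)^T\nu(p, q, h). \tag{8.57}$$

In the same way as for the first order method we get:

Theorem 8.12. Consider the method (8.19) and assume consistent initial values
$(p_0, q_0) \in \mathcal M$. Then it holds

$$p_1 - \tilde p(h) = O(h^4), \qquad q_1 - \tilde q(h) = O(h^4),$$

where $\tilde p(t)$, $\tilde q(t)$ is the solution of the Hamiltonian system (8.46) with

$$\tilde H = \tilde{\mathcal H} + \tilde{\mathcal G} + \frac h2\{\tilde{\mathcal H}, \tilde{\mathcal G}\}
 + \frac{h^2}{12}\bigl(\{\tilde{\mathcal H}, \{\tilde{\mathcal H}, \tilde{\mathcal G}\}\} + \{\tilde{\mathcal G}, \{\tilde{\mathcal G}, \tilde{\mathcal H}\}\}\bigr)$$

where

$$\begin{aligned}
\tilde{\mathcal H} &= \mathcal H + \frac{h^2}{24}\bigl(2\mathcal H_{qq}\mathcal H_p^2 - \mathcal H_{pp}\mathcal H_q^2 + 2\mathcal H_{pq}\mathcal H_p\mathcal H_q\bigr) \\
\tilde{\mathcal G} &= \mathcal G - \frac h2\,\mathcal G_p\mathcal G_q + \frac{h^2}{12}\bigl(\mathcal G_{pp}\mathcal G_q^2 + \mathcal G_{qq}\mathcal G_p^2 + 4\mathcal G_{pq}\mathcal G_p\mathcal G_q\bigr),
\end{aligned}$$

and $\mathcal H$ and $\mathcal G$ are given by (8.56) and (8.57), respectively. □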


The above two theorems show that, for consistent initial values, the numerical 
solution of the considered methods is (up to a certain order) the exact solution 
of an unconstrained perturbed Hamiltonian system. The perturbed Hamiltonian is 
defined in a neighbourhood of the manifold, so that all backward error analysis 
results for ODEs can be applied. 


Exercises 

1. (Jay 1995). The system (1.46) is equivalent to

$$\begin{aligned}
q' &= u \\
(M(q)u)' &= M_q(q)(u, u) + f(q, u) - G^T(q)\lambda \\
0 &= g(q).
\end{aligned} \tag{8.58}$$

In the case where (1.46) is obtained from the Lagrangian function
$\mathcal L(q, \dot q) = \tfrac12\,\dot q^T M(q)\dot q - U(q)$, show that $f(q, u)$ always contains the term
$-M_q(q)(u, u)$ (Coriolis forces), which thus cancels out in the formulation (8.58).

2. Show that the example (2.1a-c) is of the form (8.1a-c) with Hamiltonian

$$H(p, q) = (p_1^2 + p_2^2)/2 + q_2.$$

If we compute $\lambda$ from (2.3), and insert it into (2.1a,b), the resulting differential
equation is no longer Hamiltonian.

3. Give a second proof of Theorem 8.1 by applying Theorem I.14.12.

Hint (Reich 1996). Let $\lambda = \lambda(p, q)$ be defined by (8.3b) and consider the
unconstrained Hamiltonian system with Hamiltonian

$$H(p, q) + g(q)^T\lambda(p, q),$$

whose flow reduces to that of (8.1) along the constraint manifold $\mathcal M$.

4. Consider a partitioned Runge-Kutta method applied to a partitioned ordinary
differential equation (without constraints). Suppose that both methods are
based on the same quadrature formula of order $p$, that the first method
satisfies $C(\eta)$, $D(\zeta)$, and that the second method satisfies $C(\hat\eta)$, $D(\hat\zeta)$. Prove
that the pair has order

$$\min\bigl(p,\; 2\min(\eta, \hat\eta) + 2,\; \min(\eta, \hat\eta) + \min(\zeta, \hat\zeta) + 2,\; \min(\eta + \zeta, \hat\eta + \hat\zeta) + 1\bigr).$$

Conclude that the Lobatto IIIA-IIIB pair has order $2s - 2$.

Hint. Apply the ideas of the proof of Theorem II.7.4 for the verification of the
order conditions (Sect. II.15).

5. Compute a first integral of the differential equation (8.43). What is the reason 
for the existence of such an invariant? 

Hint. With the transformation $u=e^p$, $v=e^q$ you will get a Hamiltonian 
system. 

Result. $\tilde I(u,v)=I(u,v)+h\bigl((u+v)^2-10u-8v+8\ln u+2\ln v\bigr)/4$. 
During the preparation of this book several programs have been developed for solving 
stiff and differential-algebraic problems of the form 

$$My'=f(x,y),\qquad y(x_0)=y_0,\qquad\qquad (A.1)$$

where $M$ is a constant square matrix. If $M$ is singular, the problem is 
differential-algebraic. In this case the initial values have to be consistent. 

The implicit Runge-Kutta code RADAU5 and its extension RADAUP can be 
applied to higher index ($\ge 2$) problems as well, whereas the Rosenbrock code 
RODAS and the extrapolation code SEULEX are suited for explicit stiff differential 
equations and index 1 problems. The codes SDIRK4, ROS4, and SODEX are still 
available, but have not been updated. 

In the case where $M$ is not a constant matrix, suitable transformations and/or 
the introduction of new variables allow us to bring every implicit differential equation 
to the form (A.1). If the problem is originally in one of the following forms 

$$B(y)y'=f(x,y),\qquad y''=f(x,y,y'),\qquad B(y)y''=f(x,y,y'),$$

or the like, then the efficiency of the code can be increased by setting some 
parameters. This will be explained later in this appendix. 

Communication with the code during integration can be done with the help of the 
user-supplied subroutine SOLOUT. This is illustrated in the driver below. Further 
applications of this subroutine are discussed at the end of this appendix. 

Reports of experience with any of our codes are welcome. The programs can be obtained 
by anonymous ftp (from “ftp.unige.ch” in the directory “pub/doc/math”) or from 
“http://www.unige.ch/math/folks/hairer/”. 

Address: Section de Mathématiques, Case postale 240, CH-1211 Genève 24, 
Switzerland 

E-mail: Ernst.Hairer@math.unige.ch Gerhard.Wanner@math.unige.ch 
Driver for the Code RADAU5 


“The van der Pol equation problem is so much harder than 
the rest ...” (L.F. Shampine 1987) 

We consider the van der Pol equation 

$$\begin{aligned} y_1' &= y_2, & y_1(0) &= 2\\ y_2' &= \bigl((1-y_1^2)y_2-y_1\bigr)/\varepsilon, & y_2(0) &= -0.66 \end{aligned}$$

with $\varepsilon=10^{-6}$ on the interval $[0,2]$. The subroutines FVPOL, JVPOL compute 
the right-hand side of this differential equation and its Jacobian. The subroutine 
SOLOUT is used to print the solution at equidistant points. 

C ----------------------------------------------------------
C     link driver radau5 decsol dc-decsol    or
C     link driver radau5 lapack lapackc dc-lapack
C ----------------------------------------------------------
      IMPLICIT REAL*8 (A-H,O-Z)
C --- PARAMETERS FOR RADAU5 (FULL JACOBIAN)
      PARAMETER (ND=2,LWORK=4*ND*ND+12*ND+20,LIWORK=3*ND+20)
      DIMENSION Y(ND),WORK(LWORK),IWORK(LIWORK)
      EXTERNAL FVPOL,JVPOL,SOLOUT
C --- PARAMETER IN THE DIFFERENTIAL EQUATION
      RPAR=1.0D-6
C --- DIMENSION OF THE SYSTEM
      N=2
C --- COMPUTE THE JACOBIAN ANALYTICALLY
      IJAC=1
C --- JACOBIAN IS A FULL MATRIX
      MLJAC=N
C --- DIFFERENTIAL EQUATION IS IN EXPLICIT FORM
      IMAS=0
C --- OUTPUT ROUTINE IS USED DURING INTEGRATION
      IOUT=1
C --- INITIAL VALUES
      X=0.0D0
      Y(1)=2.0D0
      Y(2)=-0.66D0
C --- ENDPOINT OF INTEGRATION
      XEND=2.0D0
C --- REQUIRED TOLERANCE
      RTOL=1.0D-4
      ATOL=1.0D0*RTOL
      ITOL=0
C --- INITIAL STEP SIZE
      H=1.0D-6
C --- SET DEFAULT VALUES
      DO I=1,20
        IWORK(I)=0
        WORK(I)=0.D0
      END DO
C --- CALL OF THE SUBROUTINE RADAU5
C --- (SINCE IMAS=0, FVPOL ONLY SERVES AS DUMMY FOR "MAS")
      CALL RADAU5(N,FVPOL,X,Y,XEND,H,
     +            RTOL,ATOL,ITOL,
     +            JVPOL,IJAC,MLJAC,MUJAC,
     +            FVPOL,IMAS,MLMAS,MUMAS,
     +            SOLOUT,IOUT,
     +            WORK,LWORK,IWORK,LIWORK,RPAR,IPAR,IDID)
C --- PRINT FINAL SOLUTION
      WRITE (6,99) X,Y(1),Y(2)
 99   FORMAT(1X,'X =',F5.2,'    Y =',2E18.10)
C --- PRINT STATISTICS
      WRITE (6,90) RTOL
 90   FORMAT('       rtol=',D8.2)
      WRITE (6,91) (IWORK(J),J=14,20)
 91   FORMAT(' fcn=',I5,' jac=',I4,' step=',I4,' accpt=',I4,
     +       ' rejct=',I3,' dec=',I4,' sol=',I5)
      STOP
      END
C
      SUBROUTINE SOLOUT (NR,XOLD,X,Y,CONT,LRC,N,RPAR,IPAR,IRTRN)
C --- PRINTS SOLUTION AT EQUIDISTANT OUTPUT-POINTS BY USING "CONTR5"
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),CONT(LRC)
      COMMON /INTERN/XOUT
      IF (NR.EQ.1) THEN
        WRITE (6,99) X,Y(1),Y(2),NR-1
        XOUT=0.2D0
      ELSE
 10     CONTINUE
        IF (X.GE.XOUT) THEN
C --- CONTINUOUS OUTPUT FOR RADAU5
          WRITE (6,99) XOUT,CONTR5(1,XOUT,CONT,LRC),
     +                 CONTR5(2,XOUT,CONT,LRC),NR-1
          XOUT=XOUT+0.2D0
          GOTO 10
        END IF
      END IF
 99   FORMAT(1X,'X =',F5.2,'    Y =',2E18.10,'    NSTEP =',I4)
      RETURN
      END
C
      SUBROUTINE FVPOL(N,X,Y,F,RPAR,IPAR)
C --- RIGHT-HAND SIDE OF VAN DER POL'S EQUATION
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),F(N)
      F(1)=Y(2)
      F(2)=((1-Y(1)**2)*Y(2)-Y(1))/RPAR
      RETURN
      END
C
      SUBROUTINE JVPOL(N,X,Y,DFY,LDFY,RPAR,IPAR)
C --- JACOBIAN OF VAN DER POL'S EQUATION
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),DFY(LDFY,N)
      DFY(1,1)=0.0D0
      DFY(1,2)=1.0D0
      DFY(2,1)=(-2.0D0*Y(1)*Y(2)-1.0D0)/RPAR
      DFY(2,2)=(1.0D0-Y(1)**2)/RPAR
      RETURN
      END

The result, obtained on a Sun SPARCstation 20, is the following: 

 X = 0.00    Y =  0.2000000000E+01 -0.6600000000E+00    NSTEP =   0
 X = 0.20    Y =  0.1858210825E+01 -0.7575052373E+00    NSTEP =  10
 X = 0.40    Y =  0.1693217727E+01 -0.9068995621E+00    NSTEP =  11
 X = 0.60    Y =  0.1484573110E+01 -0.1233017457E+01    NSTEP =  13
 X = 0.80    Y =  0.1083921362E+01 -0.6195010714E+01    NSTEP =  21
 X = 1.00    Y = -0.1863641256E+01  0.7535196392E+00    NSTEP = 144
 X = 1.20    Y = -0.1699715970E+01  0.8997232240E+00    NSTEP = 145
 X = 1.40    Y = -0.1493380698E+01  0.1213958018E+01    NSTEP = 147
 X = 1.60    Y = -0.1120822309E+01  0.4373266499E+01    NSTEP = 153
 X = 1.80    Y =  0.1869064482E+01 -0.7496053261E+00    NSTEP = 275
 X = 2.00    Y =  0.1706171005E+01 -0.8928020961E+00    NSTEP = 276
 X = 2.00    Y =  0.1706171005E+01 -0.8928020961E+00
       rtol=0.10D-03
 fcn= 2263 jac= 182 step= 293 accpt= 276 rejct=  9 dec= 251 sol=  662


Subroutine RADAU5 


Implicit Runge-Kutta code based on the 3-stage Radau IIA method, given in Table 
IV.5.6. Details on the implementation are described in Section IV.8. 


      SUBROUTINE RADAU5(N,FCN,X,Y,XEND,H,
     +                  RTOL,ATOL,ITOL,
     +                  JAC ,IJAC,MLJAC,MUJAC,
     +                  MAS ,IMAS,MLMAS,MUMAS,
     +                  SOLOUT,IOUT,
     +                  WORK,LWORK,IWORK,LIWORK,RPAR,IPAR,IDID)
C
C     NUMERICAL SOLUTION OF A STIFF (OR DIFFERENTIAL ALGEBRAIC)
C     SYSTEM OF FIRST ORDER ORDINARY DIFFERENTIAL EQUATIONS
C                     M*Y'=F(X,Y).
C     THE SYSTEM CAN BE (LINEARLY) IMPLICIT (MASS-MATRIX M .NE. I)
C     OR EXPLICIT (M=I).
C     THE METHOD USED IS AN IMPLICIT RUNGE-KUTTA METHOD (RADAU IIA)
C     OF ORDER 5 WITH STEP SIZE CONTROL AND CONTINUOUS OUTPUT.
C     C.F. SECTION IV.8
C
C     AUTHORS: E. HAIRER AND G. WANNER
C              UNIVERSITE DE GENEVE, DEPT. DE MATHEMATIQUES
C              CH-1211 GENEVE 24, SWITZERLAND
C              E-MAIL:  HAIRER@DIVSUN.UNIGE.CH,  WANNER@DIVSUN.UNIGE.CH
C
C     THIS CODE IS PART OF THE BOOK:
C         E. HAIRER AND G. WANNER, SOLVING ORDINARY DIFFERENTIAL
C         EQUATIONS II. STIFF AND DIFFERENTIAL-ALGEBRAIC PROBLEMS.
C         SPRINGER SERIES IN COMPUTATIONAL MATHEMATICS 14,
C         SPRINGER-VERLAG 1991, SECOND EDITION 1996.
C
C     VERSION OF SEPTEMBER 30, 1995
C
C     INPUT PARAMETERS
C     ----------------
C     N           DIMENSION OF THE SYSTEM
C
C     FCN         NAME (EXTERNAL) OF SUBROUTINE COMPUTING THE
C                 VALUE OF F(X,Y):
C                    SUBROUTINE FCN(N,X,Y,F,RPAR,IPAR)
C                    REAL*8 X,Y(N),F(N)
C                    F(1)=...   ETC.
C                 RPAR, IPAR (SEE BELOW)
C
C     X           INITIAL X-VALUE
C
C     Y(N)        INITIAL VALUES FOR Y
C
C     XEND        FINAL X-VALUE (XEND-X MAY BE POSITIVE OR NEGATIVE)
C
C     H           INITIAL STEP SIZE GUESS;
C                 FOR STIFF EQUATIONS WITH INITIAL TRANSIENT,
C                 H=1.D0/(NORM OF F'), USUALLY 1.D-3 OR 1.D-5, IS GOOD.
C                 THIS CHOICE IS NOT VERY IMPORTANT, THE STEP SIZE IS
C                 QUICKLY ADAPTED. (IF H=0.D0, THE CODE PUTS H=1.D-6).
C
C     RTOL,ATOL   RELATIVE AND ABSOLUTE ERROR TOLERANCES. THEY
C                 CAN BE BOTH SCALARS OR ELSE BOTH VECTORS OF LENGTH N.
C
C     ITOL        SWITCH FOR RTOL AND ATOL:
C                   ITOL=0: BOTH RTOL AND ATOL ARE SCALARS.
C                     THE CODE KEEPS, ROUGHLY, THE LOCAL ERROR OF
C                     Y(I) BELOW RTOL*ABS(Y(I))+ATOL
C                   ITOL=1: BOTH RTOL AND ATOL ARE VECTORS.
C                     THE CODE KEEPS THE LOCAL ERROR OF Y(I) BELOW
C                     RTOL(I)*ABS(Y(I))+ATOL(I).
C
C     JAC         NAME (EXTERNAL) OF THE SUBROUTINE WHICH COMPUTES
C                 THE PARTIAL DERIVATIVES OF F(X,Y) WITH RESPECT TO Y
C                 (THIS ROUTINE IS ONLY CALLED IF IJAC=1; SUPPLY
C                 A DUMMY SUBROUTINE IN THE CASE IJAC=0).
C                 FOR IJAC=1, THIS SUBROUTINE MUST HAVE THE FORM
C                    SUBROUTINE JAC(N,X,Y,DFY,LDFY,RPAR,IPAR)
C                    REAL*8 X,Y(N),DFY(LDFY,N)
C                    DFY(1,1)= ...
C                 LDFY, THE COLUMN-LENGTH OF THE ARRAY, IS
C                 FURNISHED BY THE CALLING PROGRAM.
C                 IF (MLJAC.EQ.N) THE JACOBIAN IS SUPPOSED TO
C                    BE FULL AND THE PARTIAL DERIVATIVES ARE
C                    STORED IN DFY AS
C                       DFY(I,J) = PARTIAL F(I) / PARTIAL Y(J)
C                 ELSE, THE JACOBIAN IS TAKEN AS BANDED AND
C                    THE PARTIAL DERIVATIVES ARE STORED
C                    DIAGONAL-WISE AS
C                       DFY(I-J+MUJAC+1,J) = PARTIAL F(I) / PARTIAL Y(J).
C
C     IJAC        SWITCH FOR THE COMPUTATION OF THE JACOBIAN:
C                    IJAC=0: JACOBIAN IS COMPUTED INTERNALLY BY FINITE
C                       DIFFERENCES, SUBROUTINE "JAC" IS NEVER CALLED.
C                    IJAC=1: JACOBIAN IS SUPPLIED BY SUBROUTINE JAC.
C
C     MLJAC       SWITCH FOR THE BANDED STRUCTURE OF THE JACOBIAN:
C                    MLJAC=N: JACOBIAN IS A FULL MATRIX. THE LINEAR
C                       ALGEBRA IS DONE BY FULL-MATRIX GAUSS-ELIMINATION.
C                    0<=MLJAC<N: MLJAC IS THE LOWER BANDWIDTH OF JACOBIAN
C                       MATRIX (>= NUMBER OF NON-ZERO DIAGONALS BELOW
C                       THE MAIN DIAGONAL).
C
C     MUJAC       UPPER BANDWIDTH OF JACOBIAN MATRIX (>= NUMBER OF
C                 NON-ZERO DIAGONALS ABOVE THE MAIN DIAGONAL).
C                 NEED NOT BE DEFINED IF MLJAC=N.
C
C     ----   MAS,IMAS,MLMAS, AND MUMAS HAVE ANALOGOUS MEANINGS      -----
C     ----   FOR THE "MASS MATRIX" (THE MATRIX "M" OF SECTION IV.8): -
C
C     MAS         NAME (EXTERNAL) OF SUBROUTINE COMPUTING THE
C                 MASS-MATRIX M.
C                 IF IMAS=0, THIS MATRIX IS ASSUMED TO BE THE IDENTITY
C                 MATRIX AND NEED NOT BE DEFINED;
C                 SUPPLY A DUMMY SUBROUTINE IN THIS CASE.
C                 IF IMAS=1, THE SUBROUTINE MAS IS OF THE FORM
C                    SUBROUTINE MAS(N,AM,LMAS,RPAR,IPAR)
C                    REAL*8 AM(LMAS,N)
C                    AM(1,1)= ...
C                    IF (MLMAS.EQ.N) THE MASS-MATRIX IS STORED
C                    AS FULL MATRIX LIKE
C                         AM(I,J) = M(I,J)
C                    ELSE, THE MATRIX IS TAKEN AS BANDED AND STORED
C                    DIAGONAL-WISE AS
C                         AM(I-J+MUMAS+1,J) = M(I,J).
C
C     IMAS        GIVES INFORMATION ON THE MASS-MATRIX:
C                    IMAS=0: M IS SUPPOSED TO BE THE IDENTITY
C                       MATRIX, MAS IS NEVER CALLED.
C                    IMAS=1: MASS-MATRIX IS SUPPLIED.
C
C     MLMAS       SWITCH FOR THE BANDED STRUCTURE OF THE MASS-MATRIX:
C                    MLMAS=N: THE FULL MATRIX CASE. THE LINEAR
C                       ALGEBRA IS DONE BY FULL-MATRIX GAUSS-ELIMINATION.
C                    0<=MLMAS<N: MLMAS IS THE LOWER BANDWIDTH OF THE
C                       MATRIX (>= NUMBER OF NON-ZERO DIAGONALS BELOW
C                       THE MAIN DIAGONAL).
C                 MLMAS IS SUPPOSED TO BE .LE. MLJAC.
C
C     MUMAS       UPPER BANDWIDTH OF MASS-MATRIX (>= NUMBER OF
C                 NON-ZERO DIAGONALS ABOVE THE MAIN DIAGONAL).
C                 NEED NOT BE DEFINED IF MLMAS=N.
C                 MUMAS IS SUPPOSED TO BE .LE. MUJAC.
C
C     SOLOUT      NAME (EXTERNAL) OF SUBROUTINE PROVIDING THE
C                 NUMERICAL SOLUTION DURING INTEGRATION.
C                 IF IOUT=1, IT IS CALLED AFTER EVERY SUCCESSFUL STEP.
C                 SUPPLY A DUMMY SUBROUTINE IF IOUT=0.
C                 IT MUST HAVE THE FORM
C                    SUBROUTINE SOLOUT (NR,XOLD,X,Y,CONT,LRC,N,
C                                       RPAR,IPAR,IRTRN)
C                    REAL*8 X,Y(N),CONT(LRC)
C                    ....
C                 SOLOUT FURNISHES THE SOLUTION "Y" AT THE NR-TH
C                    GRID-POINT "X" (THEREBY THE INITIAL VALUE IS
C                    THE FIRST GRID-POINT).
C                 "XOLD" IS THE PRECEDING GRID-POINT.
C                 "IRTRN" SERVES TO INTERRUPT THE INTEGRATION. IF IRTRN
C                    IS SET <0, RADAU5 RETURNS TO THE CALLING PROGRAM.
C
C          -----  CONTINUOUS OUTPUT: -----
C                 DURING CALLS TO "SOLOUT", A CONTINUOUS SOLUTION
C                 FOR THE INTERVAL [XOLD,X] IS AVAILABLE THROUGH
C                 THE FUNCTION
C                        >>>   CONTR5(I,S,CONT,LRC)   <<<
C                 WHICH PROVIDES AN APPROXIMATION TO THE I-TH
C                 COMPONENT OF THE SOLUTION AT THE POINT S. THE VALUE
C                 S SHOULD LIE IN THE INTERVAL [XOLD,X].
C                 DO NOT CHANGE THE ENTRIES OF CONT(LRC), IF THE
C                 DENSE OUTPUT FUNCTION IS USED.
C
C     IOUT        SWITCH FOR CALLING THE SUBROUTINE SOLOUT:
C                    IOUT=0: SUBROUTINE IS NEVER CALLED
C                    IOUT=1: SUBROUTINE IS AVAILABLE FOR OUTPUT.
C
C     WORK        ARRAY OF WORKING SPACE OF LENGTH "LWORK".
C                 WORK(1), WORK(2),.., WORK(20) SERVE AS PARAMETERS
C                 FOR THE CODE. FOR STANDARD USE OF THE CODE
C                 WORK(1),..,WORK(20) MUST BE SET TO ZERO BEFORE
C                 CALLING. SEE BELOW FOR A MORE SOPHISTICATED USE.
C                 WORK(21),..,WORK(LWORK) SERVE AS WORKING SPACE
C                 FOR ALL VECTORS AND MATRICES.
C                 "LWORK" MUST BE AT LEAST
C                             N*(LJAC+LMAS+3*LE+12)+20
C                 WHERE
C                    LJAC=N              IF MLJAC=N (FULL JACOBIAN)
C                    LJAC=MLJAC+MUJAC+1  IF MLJAC<N (BANDED JAC.)
C                 AND
C                    LMAS=0              IF IMAS=0
C                    LMAS=N              IF IMAS=1 AND MLMAS=N (FULL)
C                    LMAS=MLMAS+MUMAS+1  IF MLMAS<N (BANDED MASS-M.)
C                 AND
C                    LE=N               IF MLJAC=N (FULL JACOBIAN)
C                    LE=2*MLJAC+MUJAC+1 IF MLJAC<N (BANDED JAC.)
C
C                 IN THE USUAL CASE WHERE THE JACOBIAN IS FULL AND THE
C                 MASS-MATRIX IS THE IDENTITY (IMAS=0), THE MINIMUM
C                 STORAGE REQUIREMENT IS
C                             LWORK = 4*N*N+12*N+20.
C                 IF IWORK(9)=M1>0 THEN "LWORK" MUST BE AT LEAST
C                          N*(LJAC+12)+(N-M1)*(LMAS+3*LE)+20
C                 WHERE IN THE DEFINITIONS OF LJAC, LMAS AND LE THE
C                 NUMBER N CAN BE REPLACED BY N-M1.
C
C     LWORK       DECLARED LENGTH OF ARRAY "WORK".
C
C     IWORK       INTEGER WORKING SPACE OF LENGTH "LIWORK".
C                 IWORK(1),IWORK(2),...,IWORK(20) SERVE AS PARAMETERS
C                 FOR THE CODE. FOR STANDARD USE, SET IWORK(1),..,
C                 IWORK(20) TO ZERO BEFORE CALLING.
C                 IWORK(21),...,IWORK(LIWORK) SERVE AS WORKING AREA.
C                 "LIWORK" MUST BE AT LEAST 3*N+20.
C
C     LIWORK      DECLARED LENGTH OF ARRAY "IWORK".
C
C     RPAR, IPAR  REAL AND INTEGER PARAMETERS (OR PARAMETER ARRAYS) WHICH
C                 CAN BE USED FOR COMMUNICATION BETWEEN YOUR CALLING
C                 PROGRAM AND THE FCN, JAC, MAS, SOLOUT SUBROUTINES.
C
C     SOPHISTICATED SETTING OF PARAMETERS
C     -----------------------------------
C              SEVERAL PARAMETERS OF THE CODE ARE TUNED TO MAKE IT WORK
C              WELL. THEY MAY BE DEFINED BY SETTING WORK(1),...
C              AS WELL AS IWORK(1),... DIFFERENT FROM ZERO.
C              FOR ZERO INPUT, THE CODE CHOOSES DEFAULT VALUES:
C
C    IWORK(1)  IF IWORK(1).NE.0, THE CODE TRANSFORMS THE JACOBIAN
C              MATRIX TO HESSENBERG FORM. THIS IS PARTICULARLY
C              ADVANTAGEOUS FOR LARGE SYSTEMS WITH FULL JACOBIAN.
C              IT DOES NOT WORK FOR BANDED JACOBIAN (MLJAC<N)
C              AND NOT FOR IMPLICIT SYSTEMS (IMAS=1).
C
C    IWORK(2)  THIS IS THE MAXIMAL NUMBER OF ALLOWED STEPS.
C              THE DEFAULT VALUE (FOR IWORK(2)=0) IS 100000.
C
C    IWORK(3)  THE MAXIMUM NUMBER OF NEWTON ITERATIONS FOR THE
C              SOLUTION OF THE IMPLICIT SYSTEM IN EACH STEP.
C              THE DEFAULT VALUE (FOR IWORK(3)=0) IS 7.
C
C    IWORK(4)  IF IWORK(4).EQ.0 THE EXTRAPOLATED COLLOCATION SOLUTION
C              IS TAKEN AS STARTING VALUE FOR NEWTON'S METHOD.
C              IF IWORK(4).NE.0 ZERO STARTING VALUES ARE USED.
C              THE LATTER IS RECOMMENDED IF NEWTON'S METHOD HAS
C              DIFFICULTIES WITH CONVERGENCE (THIS IS THE CASE WHEN
C              NSTEP IS LARGER THAN NACCPT + NREJCT; SEE OUTPUT PARAM.).
C              DEFAULT IS IWORK(4)=0.
C
C       THE FOLLOWING 3 PARAMETERS ARE IMPORTANT FOR
C       DIFFERENTIAL-ALGEBRAIC SYSTEMS OF INDEX > 1.
C       THE FUNCTION-SUBROUTINE SHOULD BE WRITTEN SUCH THAT
C       THE INDEX 1,2,3 VARIABLES APPEAR IN THIS ORDER.
C       IN ESTIMATING THE ERROR THE INDEX 2 VARIABLES ARE
C       MULTIPLIED BY H, THE INDEX 3 VARIABLES BY H**2.
C
C    IWORK(5)  DIMENSION OF THE INDEX 1 VARIABLES (MUST BE > 0). FOR
C              ODE'S THIS EQUALS THE DIMENSION OF THE SYSTEM.
C              DEFAULT IWORK(5)=N.
C
C    IWORK(6)  DIMENSION OF THE INDEX 2 VARIABLES. DEFAULT IWORK(6)=0.
C
C    IWORK(7)  DIMENSION OF THE INDEX 3 VARIABLES. DEFAULT IWORK(7)=0.
C
C    IWORK(8)  SWITCH FOR STEP SIZE STRATEGY
C              IF IWORK(8).EQ.1  MOD. PREDICTIVE CONTROLLER (GUSTAFSSON)
C              IF IWORK(8).EQ.2  CLASSICAL STEP SIZE CONTROL
C              THE DEFAULT VALUE (FOR IWORK(8)=0) IS IWORK(8)=1.
C              THE CHOICE IWORK(8).EQ.1 SEEMS TO PRODUCE SAFER RESULTS;
C              FOR SIMPLE PROBLEMS, THE CHOICE IWORK(8).EQ.2 OFTEN
C              PRODUCES SLIGHTLY FASTER RUNS
C
C       IF THE DIFFERENTIAL SYSTEM HAS THE SPECIAL STRUCTURE THAT
C            Y(I)' = Y(I+M2)   FOR  I=1,...,M1,
C       WITH M1 A MULTIPLE OF M2, A SUBSTANTIAL GAIN IN COMPUTER TIME
C       CAN BE ACHIEVED BY SETTING THE PARAMETERS IWORK(9) AND IWORK(10).
C       E.G., FOR SECOND ORDER SYSTEMS P'=V, V'=G(P,V), WHERE P AND V ARE
C       VECTORS OF DIMENSION N/2, ONE HAS TO PUT M1=M2=N/2.
C       FOR M1>0 SOME OF THE INPUT PARAMETERS HAVE DIFFERENT MEANINGS:
C       - JAC: ONLY THE ELEMENTS OF THE NON-TRIVIAL PART OF THE
C              JACOBIAN HAVE TO BE STORED
C              IF (MLJAC.EQ.N-M1) THE JACOBIAN IS SUPPOSED TO BE FULL
C                 DFY(I,J) = PARTIAL F(I+M1) / PARTIAL Y(J)
C                FOR I=1,N-M1 AND J=1,N.
C              ELSE, THE JACOBIAN IS BANDED ( M1 = M2 * MM )
C                 DFY(I-J+MUJAC+1,J+K*M2) = PARTIAL F(I+M1) / PARTIAL Y(J+K*M2)
C                FOR I=1,MLJAC+MUJAC+1 AND J=1,M2 AND K=0,MM.
C       - MLJAC: MLJAC=N-M1: IF THE NON-TRIVIAL PART OF THE JACOBIAN IS FULL
C                0<=MLJAC<N-M1: IF THE (MM+1) SUBMATRICES (FOR K=0,MM)
C                     PARTIAL F(I+M1) / PARTIAL Y(J+K*M2),  I,J=1,M2
C                    ARE BANDED, MLJAC IS THE MAXIMAL LOWER BANDWIDTH
C                    OF THESE MM+1 SUBMATRICES
C       - MUJAC: MAXIMAL UPPER BANDWIDTH OF THESE MM+1 SUBMATRICES
C                NEED NOT BE DEFINED IF MLJAC=N-M1
C       - MAS: IF IMAS=0 THIS MATRIX IS ASSUMED TO BE THE IDENTITY AND
C              NEED NOT BE DEFINED. SUPPLY A DUMMY SUBROUTINE IN THIS CASE.
C              IT IS ASSUMED THAT ONLY THE ELEMENTS OF RIGHT LOWER BLOCK OF
C              DIMENSION N-M1 DIFFER FROM THAT OF THE IDENTITY MATRIX.
C              IF (MLMAS.EQ.N-M1) THIS SUBMATRIX IS SUPPOSED TO BE FULL
C                 AM(I,J) = M(I+M1,J+M1)     FOR I=1,N-M1 AND J=1,N-M1.
C              ELSE, THE MASS MATRIX IS BANDED
C                 AM(I-J+MUMAS+1,J) = M(I+M1,J+M1)
C       - MLMAS: MLMAS=N-M1: IF THE NON-TRIVIAL PART OF M IS FULL
C                0<=MLMAS<N-M1: LOWER BANDWIDTH OF THE MASS MATRIX
C       - MUMAS: UPPER BANDWIDTH OF THE MASS MATRIX
C                NEED NOT BE DEFINED IF MLMAS=N-M1
C
C    IWORK(9)  THE VALUE OF M1.  DEFAULT M1=0.
C
C    IWORK(10) THE VALUE OF M2.  DEFAULT M2=M1.
C
C    WORK(1)   UROUND, THE ROUNDING UNIT, DEFAULT 1.D-16.
C
C    WORK(2)   THE SAFETY FACTOR IN STEP SIZE PREDICTION,
C              DEFAULT 0.9D0.
C
C    WORK(3)   DECIDES WHETHER THE JACOBIAN SHOULD BE RECOMPUTED;
C              INCREASE WORK(3), TO 0.1 SAY, WHEN JACOBIAN EVALUATIONS
C              ARE COSTLY. FOR SMALL SYSTEMS WORK(3) SHOULD BE SMALLER
C              (0.001D0, SAY). NEGATIVE WORK(3) FORCES THE CODE TO
C              COMPUTE THE JACOBIAN AFTER EVERY ACCEPTED STEP.
C              DEFAULT 0.001D0.
C
C    WORK(4)   STOPPING CRITERION FOR NEWTON'S METHOD, USUALLY CHOSEN <1.
C              SMALLER VALUES OF WORK(4) MAKE THE CODE SLOWER, BUT SAFER.
C              DEFAULT 0.03D0.
C
C    WORK(5) AND WORK(6) : IF WORK(5) < HNEW/HOLD < WORK(6), THEN THE
C              STEP SIZE IS NOT CHANGED. THIS SAVES, TOGETHER WITH A
C              LARGE WORK(3), LU-DECOMPOSITIONS AND COMPUTING TIME FOR
C              LARGE SYSTEMS. FOR SMALL SYSTEMS ONE MAY HAVE
C              WORK(5)=1.D0, WORK(6)=1.2D0, FOR LARGE FULL SYSTEMS
C              WORK(5)=0.99D0, WORK(6)=2.D0 MIGHT BE GOOD.
C              DEFAULTS WORK(5)=1.D0, WORK(6)=1.2D0 .
C
C    WORK(7)   MAXIMAL STEP SIZE, DEFAULT XEND-X.
C
C    WORK(8), WORK(9)   PARAMETERS FOR STEP SIZE SELECTION
C              THE NEW STEP SIZE IS CHOSEN SUBJECT TO THE RESTRICTION
C                 WORK(8) <= HNEW/HOLD <= WORK(9)
C              DEFAULT VALUES: WORK(8)=0.2D0, WORK(9)=8.D0
C
C     OUTPUT PARAMETERS
C     -----------------
C     X           X-VALUE FOR WHICH THE SOLUTION HAS BEEN COMPUTED
C                 (AFTER SUCCESSFUL RETURN X=XEND).
C
C     Y(N)        NUMERICAL SOLUTION AT X
C
C     H           PREDICTED STEP SIZE OF THE LAST ACCEPTED STEP
C
C     IDID        REPORTS ON SUCCESSFULNESS UPON RETURN:
C                   IDID= 1  COMPUTATION SUCCESSFUL,
C                   IDID= 2  COMPUT. SUCCESSFUL (INTERRUPTED BY SOLOUT)
C                   IDID=-1  INPUT IS NOT CONSISTENT,
C                   IDID=-2  LARGER NMAX IS NEEDED,
C                   IDID=-3  STEP SIZE BECOMES TOO SMALL,
C                   IDID=-4  MATRIX IS REPEATEDLY SINGULAR.
C
C   IWORK(14)  NFCN    NUMBER OF FUNCTION EVALUATIONS (THOSE FOR NUMERICAL
C                      EVALUATION OF THE JACOBIAN ARE NOT COUNTED)
C   IWORK(15)  NJAC    NUMBER OF JACOBIAN EVALUATIONS (EITHER ANALYTICALLY
C                      OR NUMERICALLY)
C   IWORK(16)  NSTEP   NUMBER OF COMPUTED STEPS
C   IWORK(17)  NACCPT  NUMBER OF ACCEPTED STEPS
C   IWORK(18)  NREJCT  NUMBER OF REJECTED STEPS (DUE TO ERROR TEST),
C                      (STEP REJECTIONS IN THE FIRST STEP ARE NOT COUNTED)
C   IWORK(19)  NDEC    NUMBER OF LU-DECOMPOSITIONS OF BOTH MATRICES
C   IWORK(20)  NSOL    NUMBER OF FORWARD-BACKWARD SUBSTITUTIONS, OF BOTH
C                      SYSTEMS; THE NSTEP FORWARD-BACKWARD SUBSTITUTIONS,
C                      NEEDED FOR STEP SIZE SELECTION, ARE NOT COUNTED
C ----------------------------------------------------------
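
The workspace formulas and the diagonal-wise storage of a banded Jacobian described in the comments above can be tried out in isolation. The following program is only a sketch under our own assumptions: the dimension N=5 and the tridiagonal matrix with entries 1, -2, 1 are invented for illustration, and the program does not call RADAU5. It evaluates LJAC, LE, LWORK and LIWORK for MLJAC=MUJAC=1 and IMAS=0, and fills the array DFY according to the rule DFY(I-J+MUJAC+1,J) = PARTIAL F(I) / PARTIAL Y(J).

```fortran
C --- SKETCH: WORKSPACE LENGTHS AND BANDED JACOBIAN STORAGE
C --- (ILLUSTRATION ONLY; THE TRIDIAGONAL MATRIX IS AN ASSUMPTION)
      PROGRAM WRKLEN
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (N=5,MLJAC=1,MUJAC=1)
C --- BANDED JACOBIAN, IDENTITY MASS MATRIX (IMAS=0, HENCE LMAS=0)
      PARAMETER (LJAC=MLJAC+MUJAC+1,LMAS=0,LE=2*MLJAC+MUJAC+1)
      PARAMETER (LWORK=N*(LJAC+LMAS+3*LE+12)+20,LIWORK=3*N+20)
      DIMENSION DFY(LJAC,N)
C --- CLEAR THE BAND ARRAY
      DO J=1,N
        DO K=1,LJAC
          DFY(K,J)=0.D0
        END DO
      END DO
C --- STORE A(I,J) = PARTIAL F(I)/PARTIAL Y(J) DIAGONAL-WISE
      DO J=1,N
        DO I=MAX(1,J-MUJAC),MIN(N,J+MLJAC)
          IF (I.EQ.J) THEN
            A=-2.D0
          ELSE
            A=1.D0
          END IF
          DFY(I-J+MUJAC+1,J)=A
        END DO
      END DO
      WRITE (6,*) 'LWORK =',LWORK,' LIWORK =',LIWORK
C --- ROW 1: SUPERDIAGONAL, ROW 2: DIAGONAL, ROW 3: SUBDIAGONAL
      DO K=1,LJAC
        WRITE (6,'(5F6.1)') (DFY(K,J),J=1,N)
      END DO
      END
```

For these values the formula gives LWORK = 5*(3+0+12+12)+20 = 155 and LIWORK = 35. The unused corner entries DFY(1,1) and DFY(3,N) simply remain zero.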


Subroutine RADAUP 


With the option iwork(11) = 3 this code is mathematically equivalent to RADAU5. 
The only difference is that explicit sums have been replaced by loops, and that the 
coefficients of the method have been put into arrays. This makes the code a little bit 
slower (in particular for small problems), but has the advantage that the coefficients 
of the method can be easily changed. At the moment, the coefficients of the Radau 
IIA methods of orders 5, 9, and 13 are available by setting iwork(11) equal to 3, 
5, and 7, respectively. The calling list is the same as for RADAU5. 

      SUBROUTINE RADAUP(N,FCN,X,Y,XEND,H,
     +                  RTOL,ATOL,ITOL,
     +                  JAC ,IJAC,MLJAC,MUJAC,
     +                  MAS ,IMAS,MLMAS,MUMAS,
     +                  SOLOUT,IOUT,
     +                  WORK,LWORK,IWORK,LIWORK,RPAR,IPAR,IDID)


Subroutine RODAS 


This is an implementation of the Rosenbrock method described in Section VI.3. It 
also satisfies the algebraic order conditions and can thus be applied to 
differential-algebraic problems of index 1. The calling list is: 

      SUBROUTINE RODAS(N,FCN,IFCN,X,Y,XEND,H,
     +                 RTOL,ATOL,ITOL,
     +                 JAC ,IJAC,MLJAC,MUJAC,DFX,IDFX,
     +                 MAS ,IMAS,MLMAS,MUMAS,
     +                 SOLOUT,IOUT,
     +                 WORK,LWORK,IWORK,LIWORK,RPAR,IPAR,IDID)

Compared to RADAU5 we have three additional parameters: ifcn, dfx, and idfx. The 
parameter ifcn indicates whether the right-hand side $f(x,y)$ of the problem (A.1) is 
independent of $x$ or not. In the case that $f$ depends on $x$, the code needs the 
partial derivative $\partial f/\partial x$. This can be provided numerically (set 
idfx = 0 and supply a dummy subroutine for dfx) or analytically. In the latter case, 
one has to set idfx = 1 and supply a subroutine computing $\partial f/\partial x$. Of 
course, the meanings of the work and iwork parameters are not all the same as for 
RADAU5. They are described in the comments of the code. 
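
When the derivative with respect to x is supplied analytically, it is worth comparing it once with a difference quotient. The program below is a minimal sketch of such a check for the scalar test equation y' = -y + sin(x), which is our own choice. The argument list of DFX used here is an assumption modelled on that of FCN; the authoritative form is the one given in the comments of the code RODAS. The program does not call RODAS.

```fortran
C --- SKETCH: ANALYTIC DF/DX FOR Y'=-Y+SIN(X), CHECKED BY A
C --- DIFFERENCE QUOTIENT. THE ARGUMENT LIST OF DFX IS AN
C --- ASSUMPTION MODELLED ON FCN; SEE THE COMMENTS OF RODAS.
      PROGRAM CHKDFX
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (N=1)
      DIMENSION Y(N),F0(N),F1(N),FX(N)
      X=0.3D0
      Y(1)=1.D0
      RPAR=0.D0
      IPAR=0
      DEL=1.D-6
      CALL FCN(N,X,Y,F0,RPAR,IPAR)
      CALL FCN(N,X+DEL,Y,F1,RPAR,IPAR)
      CALL DFX(N,X,Y,FX,RPAR,IPAR)
C --- THE TWO PRINTED VALUES SHOULD AGREE TO ABOUT 6 DIGITS
      WRITE (6,99) FX(1),(F1(1)-F0(1))/DEL
 99   FORMAT(1X,'ANALYTIC =',F12.8,'  NUMERICAL =',F12.8)
      END
C
      SUBROUTINE FCN(N,X,Y,F,RPAR,IPAR)
C --- RIGHT-HAND SIDE, DEPENDS EXPLICITLY ON X
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),F(N)
      F(1)=-Y(1)+SIN(X)
      RETURN
      END
C
      SUBROUTINE DFX(N,X,Y,FX,RPAR,IPAR)
C --- PARTIAL DERIVATIVE OF F WITH RESPECT TO X
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),FX(N)
      FX(1)=COS(X)
      RETURN
      END
```

For this problem the right-hand side is non-autonomous, so in an actual call one would declare this through ifcn and set idfx = 1.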


Subroutine SEULEX 


This is an extrapolation code based on the linearly implicit Euler method (Sections 
IV.9 and VI.4). A dense output has been included in cooperation with A. Ostermann. 
The meaning of the input parameters is the same as for RODAS. The work 
and iwork parameters are described in the comments of the code. 

      SUBROUTINE SEULEX(N,FCN,IFCN,X,Y,XEND,H,
     +                  RTOL,ATOL,ITOL,
     +                  JAC ,IJAC,MLJAC,MUJAC,
     +                  MAS ,IMAS,MLMAS,MUMAS,
     +                  SOLOUT,IOUT,
     +                  WORK,LWORK,IWORK,LIWORK,RPAR,IPAR,IDID)


Problems with Special Structure 

If the first $m_1$ equations of (A.1) are of the form 

$$y_i'=y_{i+m_2}\qquad\text{for}\quad i=1,\dots,m_1\qquad\qquad (A.2)$$

with $m_1$ being an integer multiple of $m_2$, and the remaining equations do not 
depend explicitly on $y_1',\dots,y_{m_1}'$, it is recommended to set the parameters iwork(9) 
and iwork(10) equal to $m_1$ and $m_2$, respectively. This implies a more efficient 
treatment of the arising linear systems and is, in particular, advantageous for a 
large value of $m_1$. 

If iwork(9) is set to a nonzero value, care has to be taken with the definition of 
the subroutines jac and mas. Only the nontrivial part of the Jacobian (i.e., the rows 
with indices $m_1+1,\dots,n$) has to be computed and stored in an array of 
dimension $(n-m_1)\times n$. Similarly, only the right lower block (of dimension $n-m_1$) 
of the matrix $M$ has to be defined in the subroutine mas. However, the subroutine 
fcn must contain the definition of all components of $f(x,y)$, in particular also the 
statement f(i) = y(i+m2) for i=1,...,m1. Banded options are still possible. Typical 
situations, where (A.2) arises, are the following: 

$y''=f(x,y,y')$. With the new variable $z=y'$ the system becomes 

$$\begin{aligned} y' &= z\\ z' &= f(x,y,z), \end{aligned}$$

which is of the form (A.1). If $y\in\mathbb{R}^m$, both parameters iwork(9) and iwork(10) 
have to be set equal to $m$. The banded option can be used if both $\partial f/\partial y$ and 
$\partial f/\partial y'$ are banded. 





$C(x,y)y'=f(x,y)$. Again we introduce $z=y'$, so that this problem becomes 
equivalent to 

$$\begin{aligned} y' &= z\\ 0 &= C(x,y)z-f(x,y). \end{aligned}$$

Both parameters iwork(9) and iwork(10) have to be set equal to the dimension of $y$. 
If only a few components of $y'$ are multiplied by non-constant terms, then it may 
be more efficient to introduce new variables only for these components. 

$C(x,y)y''=f(x,y,y')$. With the new variables $z=y'$ and $u=z'=y''$, 
this problem can be written in the form (A.1) as follows 

$$\begin{aligned} y' &= z\\ z' &= u\\ 0 &= C(x,y)u-f(x,y,z). \end{aligned}$$

Here $m_2$ is equal to the dimension of $y$, and $m_1=2m_2$. 
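
To illustrate the first of these situations, the following sketch writes the subroutines for the test equation y'' = -y with y of dimension 2, so that n = 4 and m1 = m2 = 2. The test problem and the names of the main program are our own assumptions; the call of RADAU5 itself is omitted, and the main program merely evaluates the two subroutines once. Note that FCN defines all n components, whereas JAC stores only the rows m1+1,...,n of the Jacobian in an array of dimension (n-m1) x n.

```fortran
C --- SKETCH: Y''=-Y WRITTEN AS A FIRST ORDER SYSTEM WITH THE
C --- SPECIAL STRUCTURE (A.2); THE TEST PROBLEM IS AN ASSUMPTION
      PROGRAM SPSTR
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (N=4,M1=2,M2=2)
      DIMENSION Y(N),F(N),DFY(N-M1,N),IWORK(20)
      DO I=1,20
        IWORK(I)=0
      END DO
C --- TELL THE CODE ABOUT THE STRUCTURE
      IWORK(9)=M1
      IWORK(10)=M2
      X=0.D0
      RPAR=0.D0
      IPAR=0
      Y(1)=1.D0
      Y(2)=2.D0
      Y(3)=3.D0
      Y(4)=4.D0
      CALL FCN(N,X,Y,F,RPAR,IPAR)
      CALL JAC(N,X,Y,DFY,N-M1,RPAR,IPAR)
      WRITE (6,99) (F(I),I=1,N)
      DO I=1,N-M1
        WRITE (6,98) (DFY(I,J),J=1,N)
      END DO
 99   FORMAT(1X,'F   =',4F6.1)
 98   FORMAT(1X,'DFY =',4F6.1)
      END
C
      SUBROUTINE FCN(N,X,Y,F,RPAR,IPAR)
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),F(N)
      M=N/2
C --- TRIVIAL PART Y(I)'=Y(I+M2), MUST STILL BE DEFINED
      DO I=1,M
        F(I)=Y(I+M)
      END DO
C --- NONTRIVIAL PART Z'=-Y
      DO I=1,M
        F(M+I)=-Y(I)
      END DO
      RETURN
      END
C
      SUBROUTINE JAC(N,X,Y,DFY,LDFY,RPAR,IPAR)
C --- ONLY DFY(I,J) = PARTIAL F(I+M1)/PARTIAL Y(J), I=1,...,N-M1
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Y(N),DFY(LDFY,N)
      DO J=1,N
        DO I=1,LDFY
          DFY(I,J)=0.D0
        END DO
      END DO
      DO I=1,LDFY
        DFY(I,I)=-1.D0
      END DO
      RETURN
      END
```

With a full nontrivial part one would, following the comments of RADAU5, set MLJAC = N-M1 in the call.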


Use of SOLOUT and of Dense Output 

The subroutine SOLOUT, supplied by the user, is called after every accepted step 
and provides the solution over the whole step (dense output). This possibility can 
be used for tabulating the solution at prescribed output points (see the driver for 
RADAU5 above) or for graphical presentation of the solution. Further applications 
are the following: 

Event location. Suppose we want to determine $x$ such that $g(x,y(x))=0$, where 
$y(x)$ is the solution of (A.1). During integration one can check in the subroutine 
SOLOUT whether the values $g(x_{i-1},y_{i-1})$ and $g(x_i,y_i)$ have opposite signs. If this 
occurs, the dense output (which is available for all of our codes) can be used to 
localize the zero of $g(x,y(x))$. This procedure is very useful for problems with 
discontinuous right-hand side (see Sect. II.6). 
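
The sign check followed by a localization of the zero can be sketched as follows. In this illustration everything except the idea itself is an assumption: the function DENSE is a hypothetical stand-in for the dense output of a code (such as CONTR5) and simply returns the exact solution cos(s), the event function is g(x,y) = y, and the grid points are equidistant with spacing 0.5. The zero is localized by bisection on the interval where the sign change was detected.

```fortran
C --- SKETCH: EVENT LOCATION BY SIGN CHECK AND BISECTION.
C --- DENSE IS A HYPOTHETICAL STAND-IN FOR THE DENSE OUTPUT
C --- (E.G. CONTR5); HERE IT RETURNS THE EXACT SOLUTION COS(S)
      PROGRAM EVLOC
      IMPLICIT REAL*8 (A-H,O-Z)
      XOLD=0.D0
      GOLD=G(XOLD,DENSE(XOLD))
C --- LOOP OVER THE "ACCEPTED STEPS" [XOLD,X]
      DO NR=1,4
        X=XOLD+0.5D0
        GNEW=G(X,DENSE(X))
        IF (GOLD*GNEW.LT.0.D0) THEN
C --- SIGN CHANGE ON [XOLD,X]: BISECT ON THE DENSE OUTPUT
          A=XOLD
          B=X
          GA=GOLD
          DO K=1,50
            S=0.5D0*(A+B)
            GS=G(S,DENSE(S))
            IF (GA*GS.LE.0.D0) THEN
              B=S
            ELSE
              A=S
              GA=GS
            END IF
          END DO
          WRITE (6,99) 0.5D0*(A+B)
        END IF
        XOLD=X
        GOLD=GNEW
      END DO
 99   FORMAT(1X,'EVENT AT X =',F12.8)
      END
C
      FUNCTION DENSE(S)
C --- STAND-IN FOR THE CONTINUOUS SOLUTION ON [XOLD,X]
      IMPLICIT REAL*8 (A-H,O-Z)
      DENSE=COS(S)
      RETURN
      END
C
      FUNCTION G(X,Y)
C --- EVENT FUNCTION G(X,Y)=Y
      IMPLICIT REAL*8 (A-H,O-Z)
      G=Y
      RETURN
      END
```

The sign change is found in the last step, and the bisection converges to the zero of cos near 1.5708. In a real application the accuracy of the located event is of course limited by that of the dense output.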

Projection. An efficient way of solving higher index differential-algebraic equations 
is index reduction combined with projection. If one applies a stiff (or nonstiff) 
code straightforwardly to an index-reduced problem, the obtained numerical 
solution will suffer from the so-called “drift-off” effect. In order to avoid this 
drift-off, it is recommended to project the numerical solution after every step onto the 
solution manifold of the problem. This can be conveniently done with the help of the 
subroutine SOLOUT. 
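
The projection step itself is a small nonlinear problem. As a sketch, assume the position constraint of the pendulum, g(q) = q1**2 + q2**2 - 1 = 0, and a numerical solution that has drifted slightly off this circle; both the constraint and the starting values are our own choices. The program corrects the point along the gradient of g by a Newton-type iteration until the constraint is satisfied to rounding accuracy. How the projected values are handed back to the integrator depends on the code and is not shown.

```fortran
C --- SKETCH: PROJECTION OF A DRIFTED POINT ONTO THE MANIFOLD
C --- G(Q)=Q1**2+Q2**2-1=0 (PENDULUM CONSTRAINT, AN ASSUMPTION)
      PROGRAM PROJ
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION Q(2),GQ(2)
C --- NUMERICAL SOLUTION AFTER A STEP, SLIGHTLY OFF THE CIRCLE
      Q(1)=0.61D0
      Q(2)=-0.81D0
      DO IT=1,10
        GV=Q(1)**2+Q(2)**2-1.D0
        IF (ABS(GV).LT.1.D-14) GOTO 20
C --- GRADIENT OF THE CONSTRAINT
        GQ(1)=2.D0*Q(1)
        GQ(2)=2.D0*Q(2)
C --- NEWTON CORRECTION ALONG THE CONSTRAINT GRADIENT
        ALAM=GV/(GQ(1)**2+GQ(2)**2)
        Q(1)=Q(1)-ALAM*GQ(1)
        Q(2)=Q(2)-ALAM*GQ(2)
      END DO
 20   CONTINUE
      WRITE (6,99) Q(1),Q(2),GV
 99   FORMAT(1X,'Q =',2F14.10,'  G(Q) =',D10.2)
      END
```

For the circle each correction moves the point radially, so the iteration reduces to a scalar Newton method for the radius and converges in a few steps.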
Lagrange polynomial, 499. 

Lagrange function, 13,463. 

Laguerre polynomial, 96. 

set of labelled trees of order q, 106. 

projection, 494. 

differentiation order, 315. 

perturbation index, 459. 

interpolation order, 315. 

(shifted) Legendre polynomial, 78, 202. 
$Q$   projection, 494. 
$Q(\mu,\zeta)$   characteristic polynomial, 282, 291. 
$R_{kj}(z)$   Padé approximation, 48. 
$R(z)$   stability function, 16, 40, 41, 108, 132. 
$r_j(\mu)$   coefficient of discrete resolvent, 332, 353, 385. 
$r(\zeta,\mu)$   discrete resolvent, 332, 353. 
$S$   stability domain, 16, 241. 
$S_{\rm scal}$   scaled stability domain, 60. 
$S(Z)$   stability matrix, 353. 
$S_\alpha$   sector of $A(\alpha)$-stability, 250. 
$S(\mu)$   stability matrix, 290. 
$T$   kinetic energy, 463, 531. 
$T$   set of trees, 116. 
$T_m(z)$   Chebyshev polynomial, 31. 
$TW$   set of trees for W-methods, 115. 
$T(\eta,\zeta)$   property T, 81. 
$U$   potential energy, 463, 533. 
$\|\cdot\|_D$   norm, 218. 
$|||\cdot|||_D$   norm in product space, 216, 218. 
$|||\cdot|||_G$   norm in product space, 330. 
$\|\cdot\|_G$   inner product norm, 307, 356. 
$\alpha_0(A^{-1})$   coercivity coefficient, 215. 
      coercivity coefficient, 215. 
$\delta_D(x)$   differentiation error, 314. 
$\delta_h(x)$   local error, 226, 227, 228, 323. 
$\delta_I(x)$   interpolation error, 314. 
$\delta_{LM}(x)$   linear multistep error, 322. 
$\delta_{OL}(x)$   one-leg error, 314. 
$\mu(A)$   logarithmic norm, 168. 
$\mu(\zeta)$   multiplier, 343. 
$\nu$   one-sided Lipschitz constant, 180, 215, 305, 339. 
$\varrho$   threshold factor, 176. 
$\varrho(t)$   order of a tree, 410, 508. 
$\varrho(\zeta)$   generating polynomial, 240. 
$\sigma(\zeta)$   generating polynomial, 240. 
$\varphi_B(x)$   error growth function, 193. 
$\varphi_R(x)$   error growth function (linear problems), 169. 
$\nabla$   backward difference operator, 242. 
A-acceptable approximations, 43. 

A-stability 

of multistep methods, 241. 
of one-step methods, 42f. 
of Pade approximations, 58. 
of rational approximations, 56f. 
of SDIRK methods, 97. 
via positive functions, 87. 

A(0)-stable multistep methods, 250. 

$A_0$-stable multistep methods, 251. 

A(α)-stability 
of BDF methods, 251. 
of blended methods, 267. 
of Enright methods, 263. 
of extrapolation methods, 137, 139. 
of modified EBDF methods, 270. 
of multistep methods, 250. 
of multistep Radau methods, 276. 
of RK methods, 45. 

of second derivative BDF methods, 265. 
A(α)-stable multistep methods of high order, 251f. 

absolutely monotonic functions, 178. 
acceleration level, 465. 
accuracy barriers for linear multistep methods, 254f. 

Adams methods, 242f, 249, 266. 
adjoint differential equation, 462, 467. 
algebraic criterion for G -stability, 309. 
algebraic stability, 

of general linear methods, 356f. 
of multivalue methods, 366f. 
of RK methods, 181f, 188, 206, 232. 
amplifier, 376f, 379. 

Andrews’ squeezer mechanism, 530f. 

AN -stability, 
of RK methods, 184f, 200. 
of general linear methods, 360. 
asymptotic expansions, 135, 428f, 433, 525f. 


asymptotic solution 

of van der Pol’s equation, 372. 
automatic stiffness detection, 21. 

backward differentiation formulas, see BDF 
backward error analysis 
for ODEs, 555f. 
on manifolds, 559f. 

Bader-Deuflhard method, 134f. 

Baumgarte stabilization, 470. 

B -convergence, 225. 
of G -stable one-leg methods, 316. 
of multistep methods, 368f. 
of order r, 231. 
of RK methods, 225f. 
of trapezoidal rule, 234. 
of variable step sizes, 230. 

BDF methods, 2-3, 239, 246, 259, 266, 280, 
285, 296, 308, 477, 481, 528, 538. 
BEAM, 146, 153, 155f, 159, 300, 302. 
beam equation, 8f, 11f, 20, 38f, 46, 146. 
BECKDO, 149f, 152, 155f, 300. 
Becker-Doring model, 149f. 

Bernstein’s inequality, 324. 
β-blocked multistep methods, 527. 
blended multistep methods, 266. 
boundary layer terms, 389. 

BRUSS, 148, 155f, 159f, 300, 302. 
Brusselator, 6, 19, 31, 148. 

BRUSS-2D, 151f, 157f, 160, 300. 

B -stability 
of Radau IIA, 199. 
of RK methods, 180f, 188, 201. 
of Rosenbrock methods, 200. 

Burgers equation, 349f, 443f, 448. 

Cary Grant’s part, 62. 

Cash’s algorithm, 268. 
characteristic equation 
for general linear methods, 291. 
for linear multistep methods, 240. 
for multistep RK methods, 282. 
for predictor-corrector schemes, 244. 
characterization 

of algebraically stable methods, 209. 
of positive quadrature formulas, 205. 
Chebyshev method, 31f. 

of second order, 34f. 

Chebyshev polynomial, 31f. 
chemical reactions, 3. 

Christoffel-Darboux formula, 130. 
circuits, 4, 376, 379. 
coercivity coefficient, 215, 368. 
collocation methods 
for index 2 DAE, 498. 
multi-step, 270f. 
one-step, 47, 78. 
projected, 503. 
singly implicit, 129. 
companion matrix, 323. 
comparing stability domains, 58. 
comparison 

between Chebyshev methods, 160. 
between extrapolation methods, 159f. 
between IRK methods, 158f. 
between Radau codes, 158f. 
between Rosenbrock codes, 158f. 
composite multistep methods, 267. 
composition methods, 50, 554f.
consistent initial values 
for index 1, 374, 378. 
for index 2, 456. 
for mechanical systems, 535. 
constrained mechanical system, 464, 469f, 477,
524, 543.

construction of IRK methods, 83. 
continued fraction representation, 50, 85. 
continued fractions related to quadrature formulas, 201f.

continuous solution, see ‘dense output’ 
contractivity 

for linear problems, 167f. 
in general norms, 175. 
see also ‘B-stability’
control problems, 461f.
convergence 

for linear problems, 321f. 
for nonlinear problems, 339f. 
of A-stable multistep methods, 317f. 
of BDF for index 2, 486. 


of DAE Rosenbrock methods, 416f. 
of half-explicit RK methods, 521. 
of multistep methods for index 2, 489. 
of multistep methods for SPP, 383f. 
of RK for index 1, 380. 
of RK for index 2 DAE, 496f, 504. 
of RK methods for DAE, 394f. 
of RK methods for SPP, 402. 
of symplectic methods, 547, 549. 
see also ‘B-convergence’
coordinate partitioning, 476, 478f. 
counter-examples 
for existence, 217. 
for index definitions, 460f. 
for stability properties, 199. 
criterion for G-stability, 309.

CUSP, 147, 300, 302. 
cusp catastrophe, 147. 

DAE, 373, 451. 

overdetermined, 477. 

Dahlquist’s first barrier, 299. 

Dahlquist’s second barrier, 247, 286, 297, 299. 
Dahlquist’s test equation, 16, 240. 
damped Chebyshev methods, 32f. 
Daniel-Moore conjecture, 51, 286, 294, 298, 
364. 

DASSL, 481, 538, 541.

DEABM, 5, 6. 

DEBDF, 301f.
dense output, 576. 

of DAE extrapolation methods, 438f. 
of DAE Rosenbrock methods, 422. 
of Enright methods, 263f. 

of multistep collocation methods, 272.

of SDIRK4, 100. 
derivative feedback (D), 28. 
derivative array equations, 478. 
descriptor form, 464. 
diagonally implicit RK methods, 91f.
difference-corrected BDF, 528. 
differential-algebraic equations, see DAE. 
differential equations 
linear, 167, 321. 
nonlinear, 180, 339. 
of singular perturbation type, 371f.
on manifolds, 457, 474f, 544. 
perturbed, 556. 
quasilinear, 442, 576. 
second order, 575. 
stiff, 2f.
with invariants, 472f.
differentiation index, 455, 478. 
differentiation error, 314. 

order, 315, 319. 
diffusion, 6. 

DIRK, 61, 91f, 208, 221.
disc theorem, 58, 254. 
discrete resolvent, 332. 
discrete variation of constants formula, 332, 
348f. 

DJ -reducible RK methods, 187. 
dominant invariant subspace, 161. 

DOPRI5, 3, 19, 22f, 25f, 30, 143, 153f, 469, 
471. 

for mechanical system, 537. 

DOP853, 11f, 18, 20, 26, 29.

for mechanical system, 537. 

Dormand & Prince methods, 27. 

Dorodnicyn’s asymptotic formula, 374. 
drift-off phenomenon, 468f. 
dual order stars, 295. 

DUMKA, 34f. 

efficiency diagram, 154f, 159f, 301f, 537, 539.
EKBWH-method, 163f. 
elastic beam, 146. 
electrical circuits, 4, 376, 379. 
elementary differentials, 106. 
for index 1 DAE, 410. 
for index 2 DAE, 508. 
embedded formula for RADAU5, 123. 

Enright & Kamel method, 163f. 

Enright methods, 261f, 266, 275f. 

E-polynomial, 43, 96f.

for Padé approximation, 70.
ε-embedding method, 374, 382, 407, 426.
ε-expansions for SPP
for exact solution, 388. 
for RK solution, 392f. 
equivalence 

between stability concepts, 186, 188. 
of A- and B-stability, 211.
of A- and G-stability, 310f.
error 

local, 226, 228f, 405, 494. 
global, 226, 321, 328, 399, 403f.
error bounds for one-leg methods, 314f. 
error constant, 247, 286f. 

of rational approximations, 42, 52, 61, 67. 
of second derivative multistep methods, 262. 
for SDBDF methods, 265. 


error growth function, 193f, 200, 229. 
for linear problems, 169f. 
superexponential, 171, 194. 
error propagation, 229. 

Euler equations, 463. 

Euler’s method, 2, 15, 45, 58.
explicit, 2, 15, 556. 
half-explicit, 519, 525. 
implicit, 3, 45, 169, 247, 491, 557. 
symplectic, 545, 557. 

Euler’s polyhedral formula, 57. 

EULSIM, 140, 160. 
existence 

of multistep solutions, 306f, 482. 
of numerical RK solutions, 215f, 397, 521, 
546. 

expansion of SPP solutions, 388f. 
experiments with multistep codes, 300. 
explicit 

Adams methods, 242f. 

Euler method, 2, 15. 

Runge-Kutta methods, 16. 
midpoint rule, 245, 249. 

Nyström methods, 245.
exponential fitting points, 56. 
extended BDF methods, 267. 
extended multistep methods, 267f. 
extrapolation methods, 18, 131. 
for index 1 DAE, 426f. 
for quasilinear DAE, 447. 

GBS, 18. 

E5, 145, 153f, 300f. 

first integral, 472.
Fortran codes, 565. 

Fourier transform, 148, 255. 
fast (FFT), 149, 157. 

Gauss methods, 71, 181, 184, 198, 200, 220, 
226, 504. 

Gaussian quadrature formulas, 202. 

Gear & Saad method, 161f. 
general linear methods, 290f. 

algebraic stability of, 356f. 
generalized multistep methods, 261. 
generating polynomials, 240. 

GGL formulation of mechanical system, 465, 
478. 

global error, 226. 
expansion for SPP, 399. 
for Prothero & Robinson problem, 328. 





of linear multistep methods, 321. 
of one-leg methods, 322. 

Graeco-Roman transformation, 256. 

Green’s function, 9. 

GRK4A, 110. 

Gronwall lemma, 460. 

G-stability

of one-leg methods, 307f. 
of BDF2 method, 308, 312. 
of general linear methods, 356. 

half-explicit methods, 519f. 
extrapolation methods, 525. 
multistep methods, 527. 

Runge-Kutta methods, 520. 

Hamiltonian function, 473, 543. 

perturbed, 558. 

Hamiltonian systems, 472f. 
constrained, 543f. 
perturbed, 558. 
hanging rope, 13f. 

HEM5, 538. 

Hermite interpolation, 271. 

Hessenberg form, 122. 

HEX5, 538. 
hidden manifold, 454. 

high order A(α)-stable multistep methods, 251f.

high oscillations, 11. 

HIHA5, method of Higham & Hall, 26f. 
HIRES, 144f, 152f, 159f, 300f. 

HLR89, 459.
hump, 113, 405. 
hybrid multistep methods, 267. 
hyperbolic problems, 37, 51. 

implementation 

of extrapolation schemes, 139f. 
of IRK methods, 118f. 
of Rosenbrock methods, 111.
implicit 

Adams methods, 243. 

Euler method, 3, 45, 169, 247, 491. 
midpoint rule, 131, 306. 

Milne-Simpson methods, 245, 249. 

RK methods, 40f, 71f.
implicit differential equations 

Mu' = φ(u), 103, 127, 141, 376, 378f, 408, 426.

M(u)u' = φ(u), 442f, 460, 576.

F(u', u) = 0, 452, 459, 478.


inconsistent initial values 

for DAE Rosenbrock methods, 422f. 
index, 452f. 
differentiation, 454f. 

index 1, 371f, 374, 445, 455, 459, 465, 
537. 

index 2, 456, 458, 460, 464, 519, 537. 
index 3, 456, 458, 464, 537. 
of nilpotency, 454. 
perturbation, 459. 
index reduction, 468f. 
inexact Jacobian, 114. 
influence of perturbations, 218, 484, 493. 
integral feedback (I), 28. 
interpolation error, 314. 

order, 315, 319. 
invariants, 472. 

IRK(DAE), 376. 
irreducible RK methods, 187. 

I-stability, 43.

Jeltsch-Nevanlinna theorem, 60, 289. 

kinetic energy, 8f, 463, 531. 

of mechanical systems, 531, 541. 
Kirchhoff’s law, 376. 

Kreiss matrix theorem, 323. 

Kreiss problem, 542. 

KS, 148f, 300, 302. 

Kuramoto-Sivashinsky equation, 148. 
Kuntzmann-Butcher methods, 42f, 71. 

labelled trees, 105, 411, 509.

LADAMS, 301f, 304.

Lagrange multipliers, 196f, 464. 

Lagrange theory, 8, 13, 463. 
Lagrange-Hamilton principle, 463. 

Laguerre polynomials, 96, 129f.

Lebedev’s realization, 33. 

Legendre polynomials, 71, 78, 202. 

LIMEX, 448. 
linear problems 
contractivity, 167f. 
index, 452f, 455. 
linearly implicit 
Euler method, 138f. 

Euler for index 1 DAE, 426f. 

Euler for quasilinear DAE, 448. 
midpoint rule, 134f, 441. 

RK method, 102. 

Lipschitz constant, 23.
one-sided, 180.

Lobatto IIIA methods, 42f, 75f, 185, 211, 222, 226, 504.

Lobatto IIIA-IIIB pair, 549f, 563.

Lobatto IIIB methods, 75f, 185, 211, 222, 226.

Lobatto IIIC methods, 75f, 184, 198, 220, 223, 226, 403f, 504.
local coordinates, 475. 

local error, 226, 228f, 485, 494. 
local state space form, 474. 
logarithmic norm, 168, 390.

LSODE, 143, 153f, 300f. 

LSODI, 481. 

L-stability, 44.
of SDIRK methods, 98.

manifold, 457. 
matrix pencil, 452, 466. 

MEBDF, 303f. 

mechanical system, 463, 530f. 

METAN1, 140. 
metastability, 150. 

MEXX for mechanical system, 538. 
midpoint rule, 245, 249. 

Milne-Simpson methods, 245, 249. 
monotonically labelled trees, 105, 411, 509. 
Montaigne’s ruff, 287. 
moving finite elements, 442f. 
multibody mechanisms, 530. 
multiderivative multistep methods, 282. 
multiple real-pole approximations, 67, 98f. 
multiplier, 342f. 

and nonlinearities, 346. 
construction of, 344f. 
multistep collocation methods, 270f. 
as general linear method, 272. 

G-stability of, 361.
multistep methods, 239f.

β-blocked, 527.

for index 1, 382f. 

for index 2, 481. 

for quasilinear DAE, 446f. 

of Radau type, 273. 

multistep Runge-Kutta methods, 281, 362. 
multistep twin, 306. 

Navier-Stokes equations, 351. 
non-autonomous ODE, 103, 141, 408.
nonlinear perturbations, 172. 
number of positive weights of QF, 203f. 


numerical experiments, 143, 300, 403f, 536f. 
numerical work and poles, 283. 

Nyström methods, 245.

ODE, see differential equations. 

ODEX, 6, 7. 

one-leg multistep methods, 305f. 
error bounds for, 314. 

one-sided Lipschitz condition, 180f, 215, 305, 
339, 356. 

one-sided Lipschitz constant, 180. 
one-step methods, 1f.
optimal control problems, 461f, 467.
optimal stability regions, 31f.
order conditions 

for DAE Rosenbrock methods, 415. 
for index 2 DAE, 506f, 512, 523. 
for Rosenbrock methods, 104f. 
for SDIRK methods, 91f.
for second derivative multistep methods, 
261. 

order of a tree, 410, 508. 
order of B-convergence, 231.
order of a quadrature formula, 202. 
order reduction, 225. 

for Rosenbrock methods, 236. 
order stars, 51f.
dual, 295. 
for BDF2, 285. 

for general linear methods, 290. 
for multistep methods, 279, 284f. 
for one-step methods, 51. 
for Padé approximations, 53.
for SDIRK methods, 55, 101. 
relative, 59, 69, 287. 
order tableau 

for DAE extrapolation methods, 431f, 441.
OREGO, 144, 152f, 159, 300f. 

Oregonator, 13. 
overdetermined DAE, 477. 

Padé approximations to e^z, 48f, 170.
parabolic problems, 31f, 349f.

Parseval identity, 255, 259. 
partitioned Rosenbrock methods, 425. 
partitioning methods, 160. 

Peano kernel, 254f. 
pendulum, 463f, 468, 474. 
perturbation index, 459. 
perturbations 

of linear equations, 348.
of RK solutions, 219, 398.
perturbed asymptotic expansions, 428f, 434, 
448. 

perturbed differential equation, 556. 
perturbed Hamiltonian system, 558. 
PHEM56, 538. 

PI step size control, 28. 

PLATE, 146, 152f, 300f. 

plate differential equation, 146. 

poles representing numerical work, 283. 

position level, 464. 

positive functions, 86f, 313. 

positive quadrature formulas, 183, 201, 205. 

potential energy, 8f, 463, 533. 

of mechanical systems, 533, 541. 
preconsistency, 359. 
predictive controller, 124. 
predictor-corrector schemes, 244. 
principal root, 285. 
principal sheet, 285, 292. 
projected collocation methods, 503. 
projected Runge-Kutta methods, 502, 515f. 
projection methods, 160. 
for DAE, 470f. 

for ODEs with invariants, 473. 
projections (index 2), 487, 494f. 
property C, 288f.
property T, 81. 
proportional feedback (P), 28. 
Prothero-Robinson problem, 153, 225, 328, 
All. 

quasilinear differential equation, 442f, 576. 
index 1, 445. 

Radau IA, 72, 184, 220, 226, 403f, 504. 
Radau IIA, 74, 184, 197, 220, 226, 403f, 504. 
Radau methods of multistep type, 273. 
RADAUP, 158f, 574. 

RADAU5, 4f, 46, 118f, 143, 153f, 379, 566f. 

for mechanical system, 539, 541. 
rational approximations with real poles, 61. 
RATTLE, 548f. 
real-pole sandwich, 62. 
red-black reduction, 165. 
reduced system, 372, 374, 388. 
reducible RK methods, 187f. 
region of absolute stability, see ‘stability domain’

region of step-control stability, 26f. 
regular matrix pencil, 452, 466. 


relative order star, 59, 69, 287. 

relative separation, 161. 

resolvent (discrete), 332. 

Riemann surfaces, 279f. 

RKC, 36, 143, 153f. 

RKF4(5), 25. 

RKF5(4), 24, 26. 

ROBER, 144, 152f, 159, 300f. 

Robertson reaction, 3, 18, 144. 

RODAS, 143, 153f, 158f, 420f, 574. 

RODAS5, 143, 158f, 422. 

root locus curve, 241f.
for BDF methods, 246.
for Enright methods, 263.
for explicit Adams methods, 243.
for implicit Adams methods, 243.
for Milne-Simpson methods, 245.
for Nyström methods, 245f.
for SDBDF methods, 265. 

ROS4, 143. 

Rosenbrock methods, 172f. 
comparisons, 158f. 
contractivity, 172f. 
for stiff problems, 102, 102f. 
for DAE, 407f, 447. 
order reduction, 236. 
with inexact Jacobian, 114. 

rotation number, 204. 

Routh criterion, 89. 

Runge-Kutta methods 
explicit, 16. 

for index 1 problems, 375. 
for index 2 DAE, 492f. 
for quasilinear DAE, 446f. 
for SPP, 392f. 
half-explicit, 520. 
implicit, 40f, 71f.
projected, 502, 515f. 

savings in linear algebra, 540. 

scaled stability domain, 60. 

Schur’s criterion, 278. 

SC-stability, 24f. 

for Dormand & Prince methods, 27. 

SDBDF, 265. 

SDIRK code, 128. 

SDIRK method, 42, 44, 91, 183, 208, 403, 
504. 

SDIRK4, 100, 143, 158f. 

SECDER, 303f. 

second Dahlquist barrier, 247, 254.
second derivative BDF methods, 265.
second derivative multistep methods, 261. 
separably stiff problems, 161. 

SEULEX, 140, 143, 153f, 160, 575. 

SHAKE, 548. 

simplified Newton, 119f, 490. 
simplifying assumptions, 71, 80f, 183, 206f, 
363. 

for index 2 DAE, 514. 
singly diagonally implicit RK methods, 91. 
singly implicit RK methods, 128f. 
singular perturbation problems, 371f, 433.
SIRK-methods, 128f. 
smoothing step for extrapolation, 133. 
SODEX, 140, 143, 160. 

SOLOUT, 576. 

SPP, see singular perturbation problems. 
SPRINT, 301f, 304, 481. 

S -reducible RK methods, 188. 
stability analysis 

for Euler’s method, 15. 
for explicit RK methods, 16f. 
for modified EBDF methods, 269. 
for multistep methods, 240f. 
for multistep Radau methods, 274f. 
for multistep Runge-Kutta methods, 281f.
stability domain, 16.
cross-shaped, 39.
of Bader-Deuflhard method, 134. 
of BDF methods, 246. 
of modified EBDF methods, 270. 
of Chebyshev methods, 32f. 
of DOPRI methods, 17. 
of Enright methods, 263. 
of ERK methods, 17. 
of explicit Adams methods, 243. 
of extrapolated Euler, 139. 
of extrapolated trapezoidal rule, 132. 
of GBS extrapolation, 19. 
of implicit Adams methods, 243. 
of implicit Euler method, 246. 
of Milne-Simpson methods, 246. 
of multistep methods, 240f. 
of multistep Radau methods, 276. 
of Nyström methods, 246.
of Padé approximations, 52.
of predictor-corrector schemes, 245. 
stability function R(z), 16, 84. 
of Chebyshev methods, 32f. 
of collocation method, 47. 
of DIRK methods, 61. 


of DOPRI5, 17, 26. 
of DOP853, 18. 

of extrapolation methods, 132f. 
of IRK methods, 40, 84. 
of order > s, 47. 
of Rosenbrock methods, 108. 
of SDIRK methods, 67, 96f. 
stability function for y' = λ(x)y
of IRK methods, 184f.
stability region, see stability domain.
stabilization 
Baumgarte, 470. 
by projection, 470. 
velocity, 471f.

stabilized explicit methods, 31f.

stage order, 226, 369. 

starting values for Newton iteration, 120. 

state space form, 374f, 474. 

state space form method, 375f, 383. 

step size selection, 123f. 

predictive, 124. 
step-control stability, 24f. 
stiff, 1f.

stiff eigenvalues, 161.

stiff eigenvectors, 161.

stiff mechanical system, 541. 

stiff stability of multistep methods, 250. 

stiff-detest, 144. 

stiffly accurate, 227, 552. 

RK methods, 45, 376.

Rosenbrock methods, 418f. 

SDIRK methods, 92f. 
stiffness, 2, 151. 

detection, 21. 
stopping criterion, 120. 

for Enright & Kamel method, 164. 
STRIDE, 129. 

Sullivan, Leon, 9. 
superconvergence, 500, 554. 
superexponential, 171, 194. 
super-future point, 267. 
symplecticity, 544, 547. 
symplectic methods, 543f. 

Euler, 545, 561. 

Lobatto IIIA-IIIB, 550, 563. 
second order, 548f, 558, 561f.

tangent space parametrization, 476. 
Taylor expansion 

for index 2 DAE, 508f. 

of DAE Rosenbrock solution, 412f.
of DAE solutions, 411.
of index 2 RK solution, 510f.

Taylor series method, 261. 

Tchebychef, see Chebyshev. 

test problems, 144f. 

theorem of von Neumann, 168, 330. 

θ-method, 42, 50.
threshold factor, 176, 179. 
transient phase, 2. 
transistor amplifier, 376f, 379. 
trapezoidal rule, 45, 131, 185, 234, 247, 306, 
357. 

trees 

for ODE, 92, 105. 

for index 1 DAE, 409f. 

for index 2 DAE, 507. 

for W-methods, 115.

monotonically labelled, 105, 411, 509. 

underlying ODE, 455, 478. 
uniqueness 

of multistep solutions, 306f, 482. 


of RK solutions, 219, 397. 

van der Houwen & Sommeijer’s approach, 
35. 

van der Pol’s equation, 4-5, 144, 372, 403, 
406, 566. 

Vandermonde matrix, 78. 

VDPOL, 144, 153f, 159, 300f. 
velocity level, 464. 
velocity stabilization, 471. 

VODE, 301f.

Volterra-Lotka model, 556. 
von Neumann’s theorem, 168, 330. 

W-transformation, 78.

W-methods, 114, 136.

Preface to the First Edition


They throw geometry out the door, and it comes back through the window.

(H.G. Forder, Auckland 1973, reading new mathematics at the age of 84)

The subject of this book is numerical methods that preserve geometric properties of 
the flow of a differential equation: symplectic integrators for Hamiltonian systems, 
symmetric integrators for reversible systems, methods preserving first integrals and 
numerical methods on manifolds, including Lie group methods and integrators for 
constrained mechanical systems, and methods for problems with highly oscillatory 
solutions. Structure preservation - with its questions as to where, how, and what for 
- is the unifying theme. 

In the last few decades, the theory of numerical methods for general (non-stiff
and stiff) ordinary differential equations has reached a certain maturity, and excellent
general-purpose codes, mainly based on Runge-Kutta methods or linear multistep
methods, have become available. The motivation for developing structure-preserving
algorithms for special classes of problems came independently from such
different areas of research as astronomy, molecular dynamics, mechanics, theoretical
physics, and numerical analysis as well as from other areas of both applied and
pure mathematics. It turned out that the preservation of geometric properties of the
flow not only produces an improved qualitative behaviour, but also allows for a more
accurate long-time integration than with general-purpose methods.
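
The last claim can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the original text: the test problem is the harmonic oscillator with H(p, q) = (p^2 + q^2)/2, and the function names `energy`, `explicit_euler`, `symplectic_euler` and `max_energy_error` are hypothetical. It compares a general-purpose first-order method with a symplectic method of the same order over many steps.

```python
def energy(p, q):
    """Hamiltonian H(p, q) = (p^2 + q^2) / 2 of the harmonic oscillator."""
    return 0.5 * (p * p + q * q)


def explicit_euler(p, q, h):
    """One explicit Euler step for q' = p, p' = -q (both use old values)."""
    return p - h * q, q + h * p


def symplectic_euler(p, q, h):
    """One symplectic Euler step: update p first, then q with the new p."""
    p_new = p - h * q
    q_new = q + h * p_new
    return p_new, q_new


def max_energy_error(step, h, n_steps, p0=0.0, q0=1.0):
    """Largest deviation |H(p_n, q_n) - H(p_0, q_0)| over n_steps steps."""
    p, q = p0, q0
    h0 = energy(p, q)
    worst = 0.0
    for _ in range(n_steps):
        p, q = step(p, q, h)
        worst = max(worst, abs(energy(p, q) - h0))
    return worst


if __name__ == "__main__":
    h, n = 0.1, 1000
    print("explicit Euler  :", max_energy_error(explicit_euler, h, n))
    print("symplectic Euler:", max_energy_error(symplectic_euler, h, n))
```

For this linear problem the explicit Euler energy is multiplied by 1 + h^2 in every step, so it grows without bound, whereas the energy error of the symplectic Euler method stays bounded for all times.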

An important shift of view-point came about by ceasing to concentrate on the 
numerical approximation of a single solution trajectory and instead to consider a 
numerical method as a discrete dynamical system which approximates the flow of 
the differential equation - and so the geometry of phase space comes back again 
through the window. This view allows a clear understanding of the preservation of 
invariants and of methods on manifolds, of symmetry and reversibility of methods, 
and of the symplecticity of methods and various generalizations. These subjects are 
presented in Chapters IV through VII of this book. Chapters I through III are of an 
introductory nature and present examples and numerical integrators together with 
important parts of the classical order theories and their recent extensions. Chapter 
VIII deals with questions of numerical implementations and numerical merits of the 
various methods. 

[Diagram: Geometric integrators, Backward error analysis, Long-time errors]

It remains to explain the relationship between geometric properties of the numerical
method and the favourable error propagation in long-time integrations. This
is done using the idea of backward error analysis, where the numerical one-step
map is interpreted as (almost) the flow of a modified differential equation, which is 
constructed as an asymptotic series (Chapter IX). In this way, geometric properties 
of the numerical integrator translate into structure preservation on the level of the 
modified equations. Much insight and rigorous error estimates over long time intervals
can then be obtained by combining this backward error analysis with KAM
theory and related perturbation theories. This is explained in Chapters X through 
XII for Hamiltonian and reversible systems. The final Chapters XIII and XIV treat 
the numerical solution of differential equations with high-frequency oscillations and 
the long-time dynamics of multistep methods, respectively. 
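
The idea of a modified equation can be made concrete on the simplest possible case. The following Python sketch rests on assumptions that are not taken from the text: the problem is the harmonic oscillator H = (p^2 + q^2)/2, the integrator is the symplectic Euler step p_{n+1} = p_n - h q_n, q_{n+1} = q_n + h p_{n+1}, and all function names are hypothetical. For this linear example a direct calculation shows that the quadratic form H - (h/2) p q is conserved exactly by the step, so it plays the role of a modified Hamiltonian, while H itself is only nearly conserved.

```python
def symplectic_euler(p, q, h):
    """One symplectic Euler step for q' = p, p' = -q (p is updated first)."""
    p_new = p - h * q
    q_new = q + h * p_new
    return p_new, q_new


def hamiltonian(p, q):
    """Original Hamiltonian H(p, q) = (p^2 + q^2) / 2."""
    return 0.5 * (p * p + q * q)


def modified_hamiltonian(p, q, h):
    """Quadratic form H - (h/2) p q, invariant under the step above."""
    return hamiltonian(p, q) - 0.5 * h * p * q


def drift(h, n_steps, p0=0.0, q0=1.0):
    """Return (max |H - H_0|, max |H_mod - H_mod_0|) along the numerical orbit."""
    p, q = p0, q0
    h_start = hamiltonian(p, q)
    m_start = modified_hamiltonian(p, q, h)
    d_h = d_mod = 0.0
    for _ in range(n_steps):
        p, q = symplectic_euler(p, q, h)
        d_h = max(d_h, abs(hamiltonian(p, q) - h_start))
        d_mod = max(d_mod, abs(modified_hamiltonian(p, q, h) - m_start))
    return d_h, d_mod


if __name__ == "__main__":
    d_h, d_mod = drift(h=0.1, n_steps=10000)
    print("max deviation of H         :", d_h)
    print("max deviation of modified H:", d_mod)
```

In this sketch the deviation of H is of size O(h), while the modified quantity changes only at the level of round-off. For nonlinear problems one would in general expect only a truncated asymptotic series to be nearly conserved, not an exact invariant.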

This book grew out of the lecture notes of a course given by Ernst Hairer at 
the University of Geneva during the academic year 1998/99. These lectures were
directed at students in the third and fourth year. The reactions of students as well 
as of many colleagues, who obtained the notes from the Web, encouraged us to 
elaborate our ideas to produce the present monograph. 

We want to thank all those who have helped and encouraged us to prepare this 
book. In particular, Martin Hairer for his valuable help in installing computers and 
his expertise in Latex and Postscript, Jeff Cash and Robert Chan for reading the 
whole text and correcting countless scientific obscurities and linguistic errors, Haruo 
Yoshida for making many valuable suggestions, Stéphane Cirilli for preparing the
files for all the photographs, and Bernard Dudez, the irreplaceable director of the 
mathematics library in Geneva. We are also grateful to many friends and colleagues 
for reading parts of the manuscript and for valuable remarks and discussions, in 
particular to Assyr Abdulle, Melanie Beck, Sergio Blanes, John Butcher, Mari Paz
Calvo, Begoña Cano, Philippe Chartier, David Cohen, Peter Deuflhard, Stig Faltinsen,
Francesco Fasso, Martin Gander, Marlis Hochbruck, Bülent Karasözen, Wilhelm
Kaup, Ben Leimkuhler, Pierre Leone, Frank Loose, Katina Lorenz, Robert
McLachlan, Ander Murua, Alexander Ostermann, Truong Linh Pham, Sebastian
Reich, Chus Sanz-Serna, Zaijiu Shang, Yifa Tang, Matt West, Will Wright.

We are especially grateful to Thanh-Ha Le Thi and Dr. Martin Peters from 
Springer-Verlag Heidelberg for assistance, in particular for their help in getting most 
of the original photographs from the Oberwolfach Archive and from Springer New 
York, and for clarifying doubts concerning the copyright. 


Geneva and Tübingen, November 2001


The Authors 






Preface to the Second Edition 


The fast development of the subject - and the fast development of the sales of the
first edition of this book - has given the authors the opportunity to prepare this second
edition. First of all we have corrected several misprints and minor errors which
we have discovered or which have been kindly communicated to us by several readers
and colleagues. We cordially thank all of them for their help and for their interest
in our work. A major point of confusion has been revealed by Robert McLachlan in
his book review in SIAM Reviews.

Besides many details, which have improved the presentation throughout the 
book, there are the following major additions and changes which make the book 
about 130 pages longer: 

- a more prominent place of the Störmer-Verlet method in the exposition and the
examples of the first chapter;

- a discussion of the Hénon-Heiles model as an example of a chaotic Hamiltonian
system;

- a new Sect. IV.9 on geometric numerical linear algebra considering differential
equations on Stiefel and Grassmann manifolds and dynamical low-rank approximations;

- a new improved composition method of order 10 in Sect. V.3;

- a characterization of B-series methods that conserve quadratic first integrals and
a criterion for conjugate symplecticity in Sect. VI.8;

- the section on volume preservation taken from Chap. VII to Chap. VI;

- an extended and more coherent Chap. VII, renamed Non-Canonical Hamiltonian
Systems, with more emphasis on the relationships between Hamiltonian systems
on manifolds and Poisson systems;

- a completely reorganized and augmented Sect. VII.5 on the rigid body dynamics
and Lie-Poisson systems;

- a new Sect. VII.6 on reduced Hamiltonian models of quantum dynamics and Poisson
integrators for their numerical treatment;

- an improved step-size control for reversible methods in Sects. VIII.3.2 and IX.6;

- extension of Sect. IX.5 on modified equations of methods on manifolds to include
constrained Hamiltonian systems and Lie-Poisson integrators;

- reorganization of Sects. IX.9 and IX.10; study of non-symplectic B-series methods
that have a modified Hamiltonian, and counter-examples for symmetric methods
showing linear growth in the energy error;





- a more precise discussion of integrable reversible systems with new examples in 
Chap. XI; 

- extension of Chap. XIII on highly oscillatory problems to systems with several 
constant frequencies and to systems with non-constant mass matrix; 

- a new Chap. XIV on oscillatory Hamiltonian systems with time- or solution-dependent
high frequencies, emphasizing adiabatic transformations, adiabatic invariants,
and adiabatic integrators;

- a completely rewritten Chap. XV with more emphasis on linear multistep methods
for second order differential equations; a complete backward error analysis
including parasitic modified differential equations; a study of the long-time stability
and a rigorous explanation of the long-time near-conservation of energy and
angular momentum.

Let us hope that this second revised edition will again meet good acceptance by our
readers.

Geneva and Tübingen, October 2005
The Authors



Table of Contents 


I. Examples and Numerical Experiments. 1
I.1 First Problems and Methods. 1
I.1.1 The Lotka-Volterra Model. 1
I.1.2 First Numerical Methods. 3
I.1.3 The Pendulum as a Hamiltonian System. 4
I.1.4 The Störmer-Verlet Scheme. 7
I.2 The Kepler Problem and the Outer Solar System. 8
I.2.1 Angular Momentum and Kepler’s Second Law. 9
I.2.2 Exact Integration of the Kepler Problem. 10
I.2.3 Numerical Integration of the Kepler Problem. 12
I.2.4 The Outer Solar System. 13
I.3 The Hénon-Heiles Model. 15
I.4 Molecular Dynamics. 18
I.5 Highly Oscillatory Problems. 21
I.5.1 A Fermi-Pasta-Ulam Problem. 21
I.5.2 Application of Classical Integrators. 23
I.6 Exercises. 24

II. Numerical Integrators. 27
II.1 Runge-Kutta and Collocation Methods. 27
II.1.1 Runge-Kutta Methods. 28
II.1.2 Collocation Methods. 30
II.1.3 Gauss and Lobatto Collocation. 34
II.1.4 Discontinuous Collocation Methods. 35
II.2 Partitioned Runge-Kutta Methods. 38
II.2.1 Definition and First Examples. 38
II.2.2 Lobatto IIIA-IIIB Pairs. 40
II.2.3 Nyström Methods. 41
II.3 The Adjoint of a Method. 42
II.4 Composition Methods. 43
II.5 Splitting Methods. 47
II.6 Exercises. 50




































III. Order Conditions, Trees and B-Series. 51
III.1 Runge-Kutta Order Conditions and B-Series. 51
III.1.1 Derivation of the Order Conditions. 51
III.1.2 B-Series. 56
III.1.3 Composition of Methods. 59
III.1.4 Composition of B-Series. 61
III.1.5 The Butcher Group. 64
III.2 Order Conditions for Partitioned Runge-Kutta Methods. 66
III.2.1 Bi-Coloured Trees and P-Series. 66
III.2.2 Order Conditions for Partitioned Runge-Kutta Methods. 68
III.2.3 Order Conditions for Nyström Methods. 69
III.3 Order Conditions for Composition Methods. 71
III.3.1 Introduction. 71
III.3.2 The General Case. 73
III.3.3 Reduction of the Order Conditions. 75
III.3.4 Order Conditions for Splitting Methods. 80
III.4 The Baker-Campbell-Hausdorff Formula. 83
III.4.1 Derivative of the Exponential and Its Inverse. 83
III.4.2 The BCH Formula. 84
III.5 Order Conditions via the BCH Formula. 87
III.5.1 Calculus of Lie Derivatives. 87
III.5.2 Lie Brackets and Commutativity. 89
III.5.3 Splitting Methods. 91
III.5.4 Composition Methods. 92
III.6 Exercises. 95

IV. Conservation of First Integrals and Methods on Manifolds. 97
IV.1 Examples of First Integrals. 97
IV.2 Quadratic Invariants. 101
IV.2.1 Runge-Kutta Methods. 101
IV.2.2 Partitioned Runge-Kutta Methods. 102
IV.2.3 Nyström Methods. 104
IV.3 Polynomial Invariants. 105
IV.3.1 The Determinant as a First Integral. 105
IV.3.2 Isospectral Flows. 107
IV.4 Projection Methods. 109
IV.5 Numerical Methods Based on Local Coordinates. 113
IV.5.1 Manifolds and the Tangent Space. 114
IV.5.2 Differential Equations on Manifolds. 115
IV.5.3 Numerical Integrators on Manifolds. 116
IV.6 Differential Equations on Lie Groups. 118
IV.7 Methods Based on the Magnus Series Expansion. 121
IV.8 Lie Group Methods. 123
IV.8.1 Crouch-Grossman Methods. 124
IV.8.2 Munthe-Kaas Methods. 125
















































IV.8.3 Further Coordinate Mappings. 128
IV.9 Geometric Numerical Integration Meets Geometric Numerical Linear Algebra. 131
IV.9.1 Numerical Integration on the Stiefel Manifold. 131
IV.9.2 Differential Equations on the Grassmann Manifold. 135
IV.9.3 Dynamical Low-Rank Approximation. 137
IV.10 Exercises. 139

V. Symmetric Integration and Reversibility. 143
V.1 Reversible Differential Equations and Maps. 143
V.2 Symmetric Runge-Kutta Methods. 146
V.2.1 Collocation and Runge-Kutta Methods. 146
V.2.2 Partitioned Runge-Kutta Methods. 148
V.3 Symmetric Composition Methods. 149
V.3.1 Symmetric Composition of First Order Methods. 150
V.3.2 Symmetric Composition of Symmetric Methods. 154
V.3.3 Effective Order and Processing Methods. 158
V.4 Symmetric Methods on Manifolds. 161
V.4.1 Symmetric Projection. 161
V.4.2 Symmetric Methods Based on Local Coordinates. 166
V.5 Energy-Momentum Methods and Discrete Gradients. 171
V.6 Exercises. 176

VI. Symplectic Integration of Hamiltonian Systems. 179
VI.1 Hamiltonian Systems. 180
VI.1.1 Lagrange’s Equations. 180
VI.1.2 Hamilton’s Canonical Equations. 181
VI.2 Symplectic Transformations. 182
VI.3 First Examples of Symplectic Integrators. 187
VI.4 Symplectic Runge-Kutta Methods. 191
VI.4.1 Criterion of Symplecticity. 191
VI.4.2 Connection Between Symplectic and Symmetric Methods. 194
VI.5 Generating Functions. 195
VI.5.1 Existence of Generating Functions. 195
VI.5.2 Generating Function for Symplectic Runge-Kutta Methods. 198
VI.5.3 The Hamilton-Jacobi Partial Differential Equation. 200
VI.5.4 Methods Based on Generating Functions. 203
VI.6 Variational Integrators. 204
VI.6.1 Hamilton’s Principle. 204
VI.6.2 Discretization of Hamilton’s Principle. 206
VI.6.3 Symplectic Partitioned Runge-Kutta Methods Revisited. 208
VI.6.4 Noether’s Theorem. 210










































VI.7 Characterization of Symplectic Methods .212 

VI.7.1 B-Series Methods Conserving Quadratic First Integrals 212 

VI.7.2 Characterization of Symplectic P-Series (and B-Series) 217 

VI.7.3 Irreducible Runge-Kutta Methods.220 

VI.7.4 Characterization of Irreducible Symplectic Methods ... 222 

VI.8 Conjugate Symplecticity.222 

VI.8.1 Examples and Order Conditions.223 

VI. 8.2 Near Conservation of Quadratic First Integrals .225 

VI.9 Volume Preservation.227 

VI. 10 Exercises.233 

VII. Non-Canonical Hamiltonian Systems.237 

VII. 1 Constrained Mechanical Systems .237 

VII. 1.1 Introduction and Examples.237 

VII. 1.2 Hamiltonian Formulation.239 

VII. 1.3 A Symplectic First Order Method.242 

VII. 1.4 SHAKE and RATTLE.245 

VII. 1.5 The Lobatto IIIA - IIIB Pair.247 

VII. 1.6 Splitting Methods.252 

VII.2 Poisson Systems .254 

VII.2.1 Canonical Poisson Structure.254 

VII.2.2 General Poisson Structures.256 

VII.2.3 Hamiltonian Systems on Symplectic Submanifolds .... 258 

VII.3 The Darboux-Lie Theorem.261 

VII.3.1 Commutativity of Poisson Flows and Lie Brackets .... 261 
VII.3.2 Simultaneous Linear Partial Differential Equations .... 262 
VII.3.3 Coordinate Changes and the Darboux-Lie Theorem ... 265 

VII.4 Poisson Integrators .268 

VII.4.1 Poisson Maps and Symplectic Maps.268 

VII.4.2 Poisson Integrators.270 

VII.4.3 Integrators Based on the Darboux-Lie Theorem.272 

VII.5 Rigid Body Dynamics and Lie-Poisson Systems.274 

VII.5.1 History of the Euler Equations.275 

VII.5.2 Hamiltonian Formulation of Rigid Body Motion.278 

VII.5.3 Rigid Body Integrators.280 

VII.5.4 Lie-Poisson Systems.286 

VII.5.5 Lie-Poisson Reduction.289 

VII.6 Reduced Models of Quantum Dynamics.293 

VII.6.1 Hamiltonian Structure of the Schrodinger Equation . .. 293 

VII.6.2 The Dirac-Frenkel Variational Principle.295 

VII.6.3 Gaussian Wavepacket Dynamics.296 

VII.6.4 A Splitting Integrator for Gaussian Wavepackets.298 

VII.7 Exercises.301

VIII. Structure-Preserving Implementation.303 

VIII. 1 Dangers of Using Standard Step Size Control.303 

VIII.2 Time Transformations.306 

VIII.2.1 Symplectic Integration .306 

VIII.2.2 Reversible Integration.309 

VIII.3 Structure-Preserving Step Size Control.310 

VIII.3.1 Proportional, Reversible Controllers.310 

VIII.3.2 Integrating, Reversible Controllers .314 

VIII.4 Multiple Time Stepping .316 

VIII.4.1 Fast-Slow Splitting: the Impulse Method.317 

VIII.4.2 Averaged Forces.319 

VIII.5 Reducing Rounding Errors.322 

VIII.6 Implementation of Implicit Methods.325 

VIII.6.1 Starting Approximations.326 

VIII. 6.2 Fixed-Point Versus Newton Iteration.330 

VIII. 7 Exercises.335 

IX. Backward Error Analysis and Structure Preservation.337 

IX. 1 Modified Differential Equation - Examples.337 

IX.2 Modified Equations of Symmetric Methods.342 

IX.3 Modified Equations of Symplectic Methods.343 

IX. 3.1 Existence of a Local Modified Hamiltonian.343 

IX.3.2 Existence of a Global Modified Hamiltonian.344 

IX.3.3 Poisson Integrators.347 

IX.4 Modified Equations of Splitting Methods.348 

IX.5 Modified Equations of Methods on Manifolds.350 

IX.5.1 Methods on Manifolds and First Integrals.350 

IX.5.2 Constrained Hamiltonian Systems.352 

IX.5.3 Lie-Poisson Integrators.354 

IX.6 Modified Equations for Variable Step Sizes.356 

IX.7 Rigorous Estimates - Local Error.358 

IX.7.1 Estimation of the Derivatives of the Numerical Solution 360 
IX.7.2 Estimation of the Coefficients of the Modified Equation 362 
IX.7.3 Choice of N and the Estimation of the Local Error .... 364 

IX.8 Long-Time Energy Conservation.366 

IX.9 Modified Equation in Terms of Trees.369 

IX.9.1 B-Series of the Modified Equation.369 

IX.9.2 Elementary Hamiltonians.373 

IX.9.3 Modified Hamiltonian.375 

IX.9.4 First Integrals Close to the Hamiltonian.375 

IX.9.5 Energy Conservation: Examples and Counter-Examples 379 

IX. 10 Extension to Partitioned Systems.381 

IX. 10.1 P-Series of the Modified Equation.381 

IX. 10.2 Elementary Hamiltonians.384 

IX. 11 Exercises.386

X. Hamiltonian Perturbation Theory and Symplectic Integrators.389 

X.1 Completely Integrable Hamiltonian Systems.390

X.1.1 Local Integration by Quadrature .390 

X.1.2 Completely Integrable Systems.393 

X.1.3 Action-Angle Variables.397 

X.1.4 Conditionally Periodic Flows.399 

X.1.5 The Toda Lattice - an Integrable System.402 

X.2 Transformations in the Perturbation Theory for Integrable Systems.404

X.2.1 The Basic Scheme of Classical Perturbation Theory ... 405 

X.2.2 Lindstedt-Poincare Series.406 

X.2.3 Kolmogorov’s Iteration.410 

X.2.4 Birkhoff Normalization Near an Invariant Torus.412 

X.3 Linear Error Growth and Near-Preservation of First Integrals ... 413 

X.4 Near-Invariant Tori on Exponentially Long Times.417 

X.4.1 Estimates of Perturbation Series.417 

X.4.2 Near-Invariant Tori of Perturbed Integrable Systems ... 421 

X.4.3 Near-Invariant Tori of Symplectic Integrators .422 

X.5 Kolmogorov’s Theorem on Invariant Tori.423 

X.5.1 Kolmogorov’s Theorem.423 

X.5.2 KAM Tori under Symplectic Discretization.428 

X.6 Invariant Tori of Symplectic Maps.430 

X.6.1 A KAM Theorem for Symplectic Near-Identity Maps . 431 

X.6.2 Invariant Tori of Symplectic Integrators.433 

X. 6.3 Strongly Non-Resonant Step Sizes .433 

X. 7 Exercises.434 

XI. Reversible Perturbation Theory and Symmetric Integrators.437 

XI. 1 Integrable Reversible Systems.437 

XI.2 Transformations in Reversible Perturbation Theory .442 

XI. 2.1 The Basic Scheme of Reversible Perturbation Theory.. 443 

XI.2.2 Reversible Perturbation Series.444 

XI.2.3 Reversible KAM Theory.445 

XI.2.4 Reversible Birkhoff-Type Normalization .447 

XI.3 Linear Error Growth and Near-Preservation of First Integrals ... 448 

XI.4 Invariant Tori under Reversible Discretization.451 

XI.4.1 Near-Invariant Tori over Exponentially Long Times ... 451 

XI. 4.2 A KAM Theorem for Reversible Near-Identity Maps .. 451 

XI. 5 Exercises.453 

XII. Dissipatively Perturbed Hamiltonian and Reversible Systems.455 

XII. 1 Numerical Experiments with Van der Pol’s Equation.455 

XII.2 Averaging Transformations .458 

XII. 2.1 The Basic Scheme of Averaging .458 

XII.2.2 Perturbation Series.459

XII.3 Attractive Invariant Manifolds.460 

XII.4 Weakly Attractive Invariant Tori of Perturbed Integrable Systems 464 

XII.5 Weakly Attractive Invariant Tori of Numerical Integrators.465 

XII.5.1 Modified Equations of Perturbed Differential Equations 466 
XII.5.2 Symplectic Methods.467 

XII. 5.3 Symmetric Methods.469 

XII. 6 Exercises.469 

XIII. Oscillatory Differential Equations with Constant High Frequencies . 471 

XIII.1 Towards Longer Time Steps in Solving Oscillatory Equations of Motion.471

XIII. 1.1 The Stormer-Verlet Method vs. Multiple Time Scales . 472 
XIII. 1.2 Gautschi’s and Deuflhard’s Trigonometric Methods .. . 473 

XIII. 1.3 The Impulse Method.475 

XIII. 1.4 The Mollified Impulse Method.476 

XIII. 1.5 Gautschi’s Method Revisited.477 

XIII. 1.6 Two-Force Methods.478 

XIII.2 A Nonlinear Model Problem and Numerical Phenomena.478 

XIII.2.1 Time Scales in the Fermi-Pasta-Ulam Problem.479 

XIII.2.2 Numerical Methods.481 

XIII.2.3 Accuracy Comparisons.482 

XIII.2.4 Energy Exchange between Stiff Components.483 

XIII.2.5 Near-Conservation of Total and Oscillatory Energy.... 484 

XIII.3 Principal Terms of the Modulated Fourier Expansion.486 

XIII.3.1 Decomposition of the Exact Solution .486 

XIII.3.2 Decomposition of the Numerical Solution.488 

XIII.4 Accuracy and Slow Exchange.490 

XIII.4.1 Convergence Properties on Bounded Time Intervals ... 490 
XIII.4.2 Intra-Oscillatory and Oscillatory-Smooth Exchanges .. 494 

XIII.5 Modulated Fourier Expansions .496 

XIII.5.1 Expansion of the Exact Solution .496 

XIII.5.2 Expansion of the Numerical Solution.498 

XIII.5.3 Expansion of the Velocity Approximation.502 

XIII.6 Almost-Invariants of the Modulated Fourier Expansions.503 

XIII.6.1 The Hamiltonian of the Modulated Fourier Expansion . 503 
XIII.6.2 A Formal Invariant Close to the Oscillatory Energy ... 505 

XIII.6.3 Almost-Invariants of the Numerical Method.507 

XIII.7 Long-Time Near-Conservation of Total and Oscillatory Energy . 510 

XIII.8 Energy Behaviour of the Stormer-Verlet Method.513 

XIII.9 Systems with Several Constant Frequencies.516 

XIII.9.1 Oscillatory Energies and Resonances .517 

XIII.9.2 Multi-Frequency Modulated Fourier Expansions.519 

XIII.9.3 Almost-Invariants of the Modulation System.521 

XIII.9.4 Long-Time Near-Conservation of Total and

Oscillatory Energies .524

XIII. 10 Systems with Non-Constant Mass Matrix.526 

XIII. 11 Exercises.529 

XIV. Oscillatory Differential Equations with Varying High Frequencies. . 531 

XIV. 1 Linear Systems with Time-Dependent Skew-Hermitian Matrix .. 531 

XIV. 1.1 Adiabatic Transformation and Adiabatic Invariants .... 531 

XIV. 1.2 Adiabatic Integrators.536 

XIV.2 Mechanical Systems with Time-Dependent Frequencies.539 

XIV.2.1 Canonical Transformation to Adiabatic Variables.540 

XIV.2.2 Adiabatic Integrators.547 

XIV.2.3 Error Analysis of the Impulse Method.550 

XIV.2.4 Error Analysis of the Mollified Impulse Method.554 

XIV.3 Mechanical Systems with Solution-Dependent Frequencies.555 

XIV.3.1 Constraining Potentials.555 

XIV.3.2 Transformation to Adiabatic Variables .558 

XIV.3.3 Integrators in Adiabatic Variables.563 

XIV. 3.4 Analysis of Multiple Time-Stepping Methods.564 

XIV. 4 Exercises.564 

XV. Dynamics of Multistep Methods .567 

XV. 1 Numerical Methods and Experiments.567 

XV. 1.1 Linear Multistep Methods.567 

XV. 1.2 Multistep Methods for Second Order Equations.569 

XV. 1.3 Partitioned Multistep Methods.572 

XV.2 The Underlying One-Step Method.573 

XV.2.1 Strictly Stable Multistep methods .573 

XV.2.2 Formal Analysis for Weakly Stable Methods.575 

XV.3 Backward Error Analysis.576 

XV.3.1 Modified Equation for Smooth Numerical Solutions . .. 576 

XV.3.2 Parasitic Modified Equations.579 

XV.4 Can Multistep Methods be Symplectic?.585 

XV.4.1 Non-Symplecticity of the Underlying One-Step Method 585 
XV.4.2 Symplecticity in the Higher-Dimensional Phase Space . 587 

XV.4.3 Modified Hamiltonian of Multistep Methods.589 

XV.4.4 Modified Quadratic First Integrals.591 

XV.5 Long-Term Stability.592 

XV.5.1 Role of Growth Parameters.592 

XV.5.2 Hamiltonian of the Full Modified System.594 

XV.5.3 Long-Time Bounds for Parasitic Solution Components 596 

XV.6 Explanation of the Long-Time Behaviour.600 

XV.6.1 Conservation of Energy and Angular Momentum.600 

XV.6.2 Linear Error Growth for Integrable Systems.601 

XV.7 Practical Considerations.602 

XV.7.1 Numerical Instabilities and Resonances.602 

XV.7.2 Extension to Variable Step Sizes.605

XV. 8 Multi-Value or General Linear Methods.609 

XV.8.1 Underlying One-Step Method and Backward Error Analysis .609

XV.8.2 Symplecticity and Symmetry.611 

XV.8.3 Growth Parameters .614 

XV. 9 Exercises.615 

Bibliography.617 

Index.637 










Chapter I. 

Examples and Numerical Experiments 


This chapter introduces some interesting examples of differential equations and
illustrates different types of qualitative behaviour of numerical methods. We
deliberately consider only very simple numerical methods of orders 1 and 2 to
emphasize the qualitative aspects of the experiments. The same effects (on a
different scale) occur with more sophisticated higher-order integration schemes.
The experiments presented here should serve as a motivation for the theoretical
and practical investigations of later chapters. The reader is encouraged to
repeat the experiments or to invent similar ones.


1.1 First Problems and Methods 

Numerical applications of the case of two dependent variables are not 
easily obtained. (A.J. Lotka 1925, p. 79) 

Our first problems, the Lotka-Volterra model and the pendulum equation, are
differential equations in two dimensions and already show many interesting
geometric properties. Our first methods are several variants of the Euler
method, the midpoint rule, and the Störmer-Verlet scheme.

1.1.1 The Lotka-Volterra Model 

We start with an equation from mathematical biology which models the growth of 
animal species. If a real variable u(t) is to represent the number of individuals of a 
certain species at time t, the simplest assumption about its evolution is du/dt = u · a,
where a is the reproduction rate. A constant a leads to exponential growth. In the 
case of more species living together, the reproduction rates will also depend on 
the population numbers of the other species. For example, for two species with 
u(t) denoting the number of predators and v(t) the number of prey, a plausible 
assumption is made by the Lotka-Volterra model 

$$\dot u = u(v - 2), \qquad \dot v = v(1 - u), \tag{1.1}$$

where the dots on $u$ and $v$ stand for differentiation with respect to time. (We
have chosen the constants 2 and 1 in (1.1) arbitrarily.) A.J. Lotka (1925,
Chap. VIII) used this model to study parasitic invasion of insect species, and,
with its help, V. Volterra (1927) explained curious fishing data from the upper
Adriatic Sea following World War I.

Fig. 1.1. Vector field, exact flow, and numerical flow for the Lotka-Volterra model (1.1)

Equations (1.1) constitute an autonomous system of differential equations. In
general, we write such a system in the form

$$\dot y = f(y). \tag{1.2}$$

Every $y$ represents a point in the phase space; in equation (1.1) above,
$y = (u, v)$ is in the phase plane $\mathbb{R}^2$. The vector-valued function
$f(y)$ represents a vector field which, at any point of the phase space,
prescribes the velocity (direction and speed) of the solution $y(t)$ that passes
through that point (see the first picture of Fig. 1.1).

For the Lotka-Volterra model, we observe that the system cycles through three 
stages: (1) the prey population increases; (2) the predator population increases by 
feeding on the prey; (3) the predator population diminishes due to lack of food. 

Flow of the System. A fundamental concept is the flow over time $t$. This is the
mapping which, to any point $y_0$ in the phase space, associates the value $y(t)$
of the solution with initial value $y(0) = y_0$. This map, denoted by
$\varphi_t$, is thus defined by

$$\varphi_t(y_0) = y(t) \quad \text{if} \quad y(0) = y_0. \tag{1.3}$$

The second picture of Fig. 1.1 shows the results of three iterations of
$\varphi_t$ (with $t = 1.3$) for the Lotka-Volterra problem, for a set of initial
values $y_0 = (u_0, v_0)$ forming an animal-shaped set $A$.¹

¹ This cat came to fame through Arnold (1963).

Invariants. If we divide the two equations of (1.1) by each other, we obtain a
single equation between the variables $u$ and $v$. After separation of variables
we get

$$0 = \frac{1 - u}{u}\,\dot u - \frac{v - 2}{v}\,\dot v = \frac{d}{dt}\, I(u, v),$$

where

$$I(u, v) = \ln u - u + 2 \ln v - v, \tag{1.4}$$

so that $I(u(t), v(t)) = \text{Const}$ for all $t$. We call the function $I$ an
invariant of the system (1.1). Every solution of (1.1) thus lies on a level
curve of (1.4). Some of these curves are drawn in the pictures of Fig. 1.1.
Since the level curves are closed, all solutions of (1.1) are periodic.
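As a quick sanity check of (1.4), the following Python sketch evaluates dI/dt along the vector field of (1.1) by the chain rule at a few sample points. The function names and the sample points are our own choices for illustration and do not come from the text.

```python
import math

def lotka_volterra(u, v):
    """Right-hand side of (1.1): returns (du/dt, dv/dt)."""
    return u * (v - 2.0), v * (1.0 - u)

def invariant(u, v):
    """I(u, v) = ln u - u + 2 ln v - v from (1.4); requires u, v > 0."""
    return math.log(u) - u + 2.0 * math.log(v) - v

def invariant_rate(u, v):
    """dI/dt along (1.1) by the chain rule: I_u * du/dt + I_v * dv/dt."""
    du, dv = lotka_volterra(u, v)
    return (1.0 / u - 1.0) * du + (2.0 / v - 1.0) * dv

# Arbitrary points in the open positive quadrant (illustrative choice).
sample_points = [(2.0, 2.0), (4.0, 8.0), (0.5, 3.0), (6.0, 2.0)]
rates = [invariant_rate(u, v) for u, v in sample_points]
print(rates)  # each entry vanishes up to rounding
```

The rate vanishes identically because the two terms of the chain rule cancel, which is exactly the separated equation preceding (1.4).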


1.1.2 First Numerical Methods 

Explicit Euler Method. The simplest of all numerical methods for the system
(1.2) is the method formulated by Euler (1768),

$$y_{n+1} = y_n + h f(y_n). \tag{1.5}$$

It uses a constant step size $h$ to compute, one after the other, approximations
$y_1, y_2, y_3, \ldots$ to the values $y(h), y(2h), y(3h), \ldots$ of the solution
starting from a given initial value $y(0) = y_0$. The method is called the
explicit Euler method, because the approximation $y_{n+1}$ is computed using an
explicit evaluation of $f$ at the already known value $y_n$. Such a formula
represents a mapping

$$\Phi_h : y_n \mapsto y_{n+1},$$

which we call the discrete or numerical flow. Some iterations of the discrete
flow for the Lotka-Volterra problem (1.1) (with $h = 0.5$) are represented in the
third picture of Fig. 1.1.
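A minimal Python sketch of the discrete flow (1.5), applied to (1.1). The step size h = 0.12, the initial value (2, 2) and the 125 steps are the values quoted for the explicit Euler run of Fig. 1.2; all function names are our own, and the invariant I of (1.4) is used only as a diagnostic.

```python
import math

def lotka_volterra(y):
    """Right-hand side f(y) of (1.1) for y = (u, v)."""
    u, v = y
    return (u * (v - 2.0), v * (1.0 - u))

def invariant(y):
    """I(u, v) from (1.4); only defined for u, v > 0."""
    u, v = y
    return math.log(u) - u + 2.0 * math.log(v) - v

def explicit_euler_step(f, y, h):
    """One step of (1.5): y_{n+1} = y_n + h f(y_n)."""
    return tuple(yi + h * fi for yi, fi in zip(y, f(y)))

def integrate(step, f, y0, h, n_steps):
    """Apply the discrete flow n_steps times and return all iterates."""
    trajectory = [y0]
    for _ in range(n_steps):
        trajectory.append(step(f, trajectory[-1], h))
    return trajectory

trajectory = integrate(explicit_euler_step, lotka_volterra, (2.0, 2.0), 0.12, 125)
drift = invariant(trajectory[-1]) - invariant(trajectory[0])
print(f"change of I after 125 steps: {drift:.3f}")
```

Since I attains its maximum at the equilibrium (1, 2), a decreasing value of I means that the numerical solution moves away from the equilibrium instead of staying on one level curve.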

Implicit Euler Method. The implicit Euler method

$$y_{n+1} = y_n + h f(y_{n+1}), \tag{1.6}$$

is known for its all-damping stability properties. In contrast to (1.5), the
approximation $y_{n+1}$ is defined implicitly by (1.6), and the implementation
requires the numerical solution of a nonlinear system of equations.

Implicit Midpoint Rule. Taking the mean of $y_n$ and $y_{n+1}$ in the argument of
$f$, we get the implicit midpoint rule

$$y_{n+1} = y_n + h f\Bigl(\frac{y_n + y_{n+1}}{2}\Bigr). \tag{1.7}$$

It is a symmetric method, which means that the formula is left unaltered after
exchanging $y_n \leftrightarrow y_{n+1}$ and $h \leftrightarrow -h$ (more on
symmetric methods in Chap. V).
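The symmetry of (1.7) can be observed numerically: a step with step size h followed by a step with step size -h returns to the starting point. The sketch below solves the implicit equation by fixed-point iteration, which is an assumption of this example (adequate for small h on a non-stiff problem) and not a method prescribed by the text; the names and the tolerance are likewise our own.

```python
def lotka_volterra(y):
    """Right-hand side f(y) of (1.1) for y = (u, v)."""
    u, v = y
    return (u * (v - 2.0), v * (1.0 - u))

def implicit_midpoint_step(f, y, h, tol=1e-13, max_iter=100):
    """One step of (1.7), solved by fixed-point iteration."""
    # Explicit Euler predictor as the starting guess for y_{n+1}.
    y_new = tuple(yi + h * fi for yi, fi in zip(y, f(y)))
    for _ in range(max_iter):
        midpoint = tuple(0.5 * (a + b) for a, b in zip(y, y_new))
        y_next = tuple(yi + h * fi for yi, fi in zip(y, f(midpoint)))
        if max(abs(a - b) for a, b in zip(y_next, y_new)) < tol:
            return y_next
        y_new = y_next
    raise RuntimeError("fixed-point iteration did not converge")

y0 = (2.0, 2.0)
h = 0.12
y1 = implicit_midpoint_step(lotka_volterra, y0, h)        # forward step
y_back = implicit_midpoint_step(lotka_volterra, y1, -h)   # step with -h
round_trip_error = max(abs(a - b) for a, b in zip(y0, y_back))
print(round_trip_error)  # at the level of the iteration tolerance
```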

Symplectic Euler Methods. For partitioned systems

$$\dot u = a(u, v), \qquad \dot v = b(u, v), \tag{1.8}$$

such as the problem (1.1), we consider also partitioned Euler methods

$$\begin{aligned} u_{n+1} &= u_n + h\,a(u_n, v_{n+1}) \\ v_{n+1} &= v_n + h\,b(u_n, v_{n+1}) \end{aligned} \qquad \text{or} \qquad \begin{aligned} u_{n+1} &= u_n + h\,a(u_{n+1}, v_n) \\ v_{n+1} &= v_n + h\,b(u_{n+1}, v_n), \end{aligned} \tag{1.9}$$

which treat one variable by the implicit and the other variable by the explicit
Euler method. In view of an important property of these methods, discovered by
de Vogelaere (1956) and to be discussed in Chap. VI, we call them symplectic
Euler methods.

Fig. 1.2. Solutions of the Lotka-Volterra equations (1.1) (step sizes h = 0.12; initial values
(2, 2) for the explicit Euler method, (4, 8) for the implicit Euler method, (4, 2) and (6, 2) for
the symplectic Euler method)

Numerical Example for the Lotka-Volterra Problem. Our first numerical experiment
shows the behaviour of the various numerical methods applied to the
Lotka-Volterra problem. In particular, we are interested in the preservation of
the invariant $I$ over long times. Fig. 1.2 plots the numerical approximations of
the first 125 steps with the above numerical methods applied to (1.1), all with
constant step sizes. We observe that the explicit and implicit Euler methods
show wrong qualitative behaviour. The numerical solution either spirals outwards
or inwards. The symplectic Euler method (implicit in $u$ and explicit in $v$),
however, gives a numerical solution that lies apparently on a closed curve as
does the exact solution. Note that the curves of the numerical and exact
solutions do not coincide.
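For (1.1) the symplectic Euler variant that is implicit in u and explicit in v can be written in closed form, because a(u, v) = u(v - 2) is linear in u. The Python sketch below uses this to monitor how far I of (1.4) strays from its initial value. Function names are our own; the run uses h = 0.12, 125 steps and the initial value (4, 2) from Fig. 1.2.

```python
import math

def invariant(u, v):
    """I(u, v) from (1.4); requires u, v > 0."""
    return math.log(u) - u + 2.0 * math.log(v) - v

def symplectic_euler_step(u, v, h):
    """Second variant of (1.9) for (1.1): implicit in u, explicit in v."""
    # u_{n+1} = u_n + h u_{n+1} (v_n - 2), solved for u_{n+1}
    u_new = u / (1.0 - h * (v - 2.0))
    # v_{n+1} = v_n + h v_n (1 - u_{n+1})
    v_new = v * (1.0 + h * (1.0 - u_new))
    return u_new, v_new

def max_invariant_deviation(u0, v0, h, n_steps):
    """Largest |I(u_n, v_n) - I(u_0, v_0)| over the first n_steps steps."""
    i0 = invariant(u0, v0)
    u, v = u0, v0
    worst = 0.0
    for _ in range(n_steps):
        u, v = symplectic_euler_step(u, v, h)
        worst = max(worst, abs(invariant(u, v) - i0))
    return worst

deviation = max_invariant_deviation(4.0, 2.0, 0.12, 125)
print(f"max deviation of I over 125 steps: {deviation:.3f}")
```

The deviation oscillates rather than accumulating, which is the numerical counterpart of the closed curve seen in the figure; it is not zero, in line with the remark that the numerical and exact curves do not coincide.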

1.1.3 The Pendulum as a Hamiltonian System 

A great deal of attention in this book will be addressed to Hamiltonian problems,
and our next examples will be of this type. These problems are of the form

$$\dot p = -H_q(p, q), \qquad \dot q = H_p(p, q), \tag{1.10}$$

where the Hamiltonian $H(p_1, \ldots, p_d, q_1, \ldots, q_d)$ represents the total
energy; $q_i$ are the position coordinates and $p_i$ the momenta for
$i = 1, \ldots, d$, with $d$ the number of degrees of freedom; $H_p$ and $H_q$ are
the vectors of partial derivatives. One verifies easily by differentiation (see
Sect. IV.1) that, along the solution curves of (1.10),

$$H\bigl(p(t), q(t)\bigr) = \text{Const}, \tag{1.11}$$

i.e., the Hamiltonian is an invariant or a first integral. More details about
Hamiltonian systems and their derivation from Lagrangian mechanics will be given
in Sect. VI.1.

Pendulum. The mathematical pendulum (mass $m = 1$, massless rod of length
$\ell = 1$, gravitational acceleration $g = 1$) is a system with one degree of
freedom having the Hamiltonian

$$H(p, q) = \tfrac{1}{2}\, p^2 - \cos q, \tag{1.12}$$

so that the equations of motion (1.10) become

$$\dot p = -\sin q, \qquad \dot q = p. \tag{1.13}$$

Since the vector field (1.13) is $2\pi$-periodic in $q$, it is natural to consider
$q$ as a variable on the circle $S^1$. Hence, the phase space of points $(p, q)$
becomes the cylinder $\mathbb{R} \times S^1$. Fig. 1.3 shows some level curves of
$H(p, q)$. By (1.11), the solution curves of the problem (1.13) lie on such level
curves.
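The step from (1.10) and (1.12) to (1.13) can be checked numerically. The sketch below compares the vector field (1.13) with (-H_q, H_p), where the partial derivatives are approximated by central differences; the difference increment, the test point and the function names are illustrative assumptions.

```python
import math

def hamiltonian(p, q):
    """H(p, q) = p^2 / 2 - cos q from (1.12)."""
    return 0.5 * p * p - math.cos(q)

def pendulum_field(p, q):
    """Right-hand side of (1.13): (dp/dt, dq/dt) = (-sin q, p)."""
    return -math.sin(q), p

def hamiltonian_field_fd(H, p, q, eps=1e-6):
    """(-H_q, H_p) as in (1.10), with central differences for the partials."""
    H_p = (H(p + eps, q) - H(p - eps, q)) / (2.0 * eps)
    H_q = (H(p, q + eps) - H(p, q - eps)) / (2.0 * eps)
    return -H_q, H_p

p, q = 0.7, 1.3                       # arbitrary test point
exact = pendulum_field(p, q)
approx = hamiltonian_field_fd(hamiltonian, p, q)
print(exact, approx)                  # the two pairs agree to about 1e-10

# dH/dt along (1.13) is H_p * dp/dt + H_q * dq/dt, which vanishes as in (1.11).
dH_dt = p * exact[0] + math.sin(q) * exact[1]
```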



Fig. 1.3. Exact and numerical flow for the pendulum problem (1.13); step sizes h = t = 1
(panels: exact flow, explicit Euler, symplectic Euler)

Area Preservation. Figure 1.3 (first picture) illustrates that the exact flow of a
Hamiltonian system (1.10) is area preserving. This can be explained as follows: the
derivative of the flow $\varphi_t$ with respect to initial values $(p, q)$,

$$\frac{\partial \varphi_t(p, q)}{\partial (p, q)}$$

Fig. 1.5. Carl Störmer (left picture), born: 3 September 1874 in Skien (Norway), died:
13 August 1957. Loup Verlet (right picture), born: 24 May 1931 in Paris


1.1.4 The Störmer-Verlet Scheme


The above equations (1.13) for the pendulum are of the form

$$\dot p = f(q), \quad \dot q = p \qquad \text{or} \qquad \ddot q = f(q), \tag{1.14}$$

which is the important special case of a second order differential equation. The
most natural discretization of (1.14) is

$$q_{n+1} - 2 q_n + q_{n-1} = h^2 f(q_n), \tag{1.15}$$

which is obtained simply by replacing the second derivative in (1.14) by the
central second-order difference quotient. This basic method, or its equivalent
formulation given below, is called the Störmer method in astronomy, the Verlet
method³ in molecular dynamics, the leap-frog method in the context of partial
differential equations, and it has further names in other areas (see Hairer,
Lubich & Wanner (2003), p. 402). C. Störmer (1907) used higher-order variants
for numerical computations concerning the aurora borealis. L. Verlet (1967)
proposed this method for computations in molecular dynamics, where it has become
by far the most widely used integration scheme.

Geometrically, the Störmer-Verlet method can be seen as produced by parabolas,
which in the points $t_n$ possess the right second derivative $f(q_n)$ (see
Fig. 1.6 to the left). But we can also think of polygons, which possess the
right slope in the midpoints (Fig. 1.6 to the right).

³ Irony of fate: Professor Loup Verlet, who later became interested in the history of science,
discovered precisely “his” method in Newton’s Principia (Book I, figure for Theorem I,
see Sect. 1.2.1 below).

Fig. 1.6. Illustration for the Störmer-Verlet method

Approximations to the derivative $p = \dot q$ are simply obtained by

$$p_n = \frac{q_{n+1} - q_{n-1}}{2h} \qquad \text{and} \qquad p_{n+1/2} = \frac{q_{n+1} - q_n}{h}. \tag{1.16}$$

One-Step Formulation. The Störmer-Verlet method admits a one-step formulation
which is useful for actual computations. The value $q_n$ together with the slope
$p_n$ and the second derivative $f(q_n)$, all at $t_n$, uniquely determine the
parabola and hence also the approximation $(p_{n+1}, q_{n+1})$ at $t_{n+1}$.
Writing (1.15) as $p_{n+1/2} - p_{n-1/2} = h f(q_n)$ and using
$p_{n+1/2} + p_{n-1/2} = 2 p_n$, we get by elimination of either $p_{n+1/2}$ or
$p_{n-1/2}$ the formulae

$$\begin{aligned} p_{n+1/2} &= p_n + \frac{h}{2}\, f(q_n) \\ q_{n+1} &= q_n + h\, p_{n+1/2} \\ p_{n+1} &= p_{n+1/2} + \frac{h}{2}\, f(q_{n+1}) \end{aligned} \tag{1.17}$$

which is an explicit one-step method $\Phi_h : (q_n, p_n) \mapsto (q_{n+1}, p_{n+1})$
for the corresponding first order system of (1.14). If one is not interested in
the values $p_n$ of the derivative, the first and third equations in (1.17) can
be replaced by

$$p_{n+1/2} = p_{n-1/2} + h f(q_n).$$
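The one-step formulation (1.17) translates directly into code. The following Python sketch applies it to the pendulum (1.13) and checks two things: that the computed positions satisfy the two-step relation (1.15), and that the energy (1.12) stays close to its initial value. The step size h = 0.1, the initial values and all names are our own illustrative choices.

```python
import math

def stormer_verlet_step(f, q, p, h):
    """One step of (1.17) for the second order equation q'' = f(q)."""
    p_half = p + 0.5 * h * f(q)
    q_new = q + h * p_half
    p_new = p_half + 0.5 * h * f(q_new)
    return q_new, p_new

def pendulum_force(q):
    """f(q) = -sin q, see (1.13)."""
    return -math.sin(q)

def energy(q, p):
    """Hamiltonian (1.12)."""
    return 0.5 * p * p - math.cos(q)

h = 0.1
q, p = 1.0, 0.0
e0 = energy(q, p)
qs = [q]
max_energy_error = 0.0
for _ in range(1000):
    q, p = stormer_verlet_step(pendulum_force, q, p, h)
    qs.append(q)
    max_energy_error = max(max_energy_error, abs(energy(q, p) - e0))

# Residual of the two-step form (1.15) along the computed positions.
two_step_residual = max(
    abs(qs[n + 1] - 2.0 * qs[n] + qs[n - 1] - h * h * pendulum_force(qs[n]))
    for n in range(1, len(qs) - 1)
)
print(max_energy_error, two_step_residual)
```

The residual is at rounding level, confirming that (1.17) and (1.15) generate the same positions once the first step has been taken with (1.17).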


1.2 The Kepler Problem and the Outer Solar System 

I awoke as if from sleep, a new light broke on me. (J. Kepler; quoted 
from J.L.E. Dreyer, A history of astronomy, 1906, Dover 1953, p. 391) 

One of the great achievements in the history of science was the discovery of the
laws of J. Kepler (1609), based on many precise measurements of the positions of
Mars by Tycho Brahe and himself. The planets move in elliptic orbits with the sun
at one of the foci (Kepler’s first law)

$$r = \frac{d}{1 + e \cos\varphi} = a - a e \cos E, \tag{2.1}$$

(where $a$ = great axis, $e$ = eccentricity, $b = a\sqrt{1 - e^2}$,
$d = b\sqrt{1 - e^2} = a(1 - e^2)$, $E$ = eccentric anomaly, $\varphi$ = true
anomaly).

Newton (Principia 1687) then explained this motion by his general law of
gravitational attraction (proportional to $1/r^2$) and the relation between
forces and acceleration (the “Lex II” of the Principia). This then opened the
way for treating arbitrary celestial motions by solving differential equations.

Two-Body Problem. For computing the motion of two bodies which attract each
other, we choose one of the bodies as the centre of our coordinate system; the
motion will then stay in a plane (Exercise 3) and we can use two-dimensional
coordinates $q = (q_1, q_2)$ for the position of the second body. Newton’s laws,
with a suitable normalization, then yield the following differential equations

$$\ddot q_1 = -\frac{q_1}{(q_1^2 + q_2^2)^{3/2}}, \qquad \ddot q_2 = -\frac{q_2}{(q_1^2 + q_2^2)^{3/2}}. \tag{2.2}$$

This is equivalent to a Hamiltonian system with the Hamiltonian

$$H(p_1, p_2, q_1, q_2) = \frac{1}{2}\bigl(p_1^2 + p_2^2\bigr) - \frac{1}{\sqrt{q_1^2 + q_2^2}}, \qquad p_i = \dot q_i. \tag{2.3}$$

1.2.1 Angular Momentum and Kepler’s Second Law 

The system has not only the total energy $H(p, q)$ as a first integral, but also
the angular momentum

$$L(p_1, p_2, q_1, q_2) = q_1 p_2 - q_2 p_1. \tag{2.4}$$

This can be checked by differentiation and is nothing other than Kepler’s second
law, which says that the ray EM sweeps equal areas in equal times (see the little
picture at the beginning of Sect. 1.2).

A beautiful geometric justification of this law is due to I. Newton⁴ (Principia
(1687), Book I, figure for Theorem I). The idea is to apply the Störmer-Verlet
scheme (1.15) to the equations (2.2) (see Fig. 2.1). By hypothesis, the diagonal
of the parallelogram $q_{n-1} q_n q_{n+1}$, which is
$(q_{n+1} - q_n) - (q_n - q_{n-1}) = q_{n+1} - 2 q_n + q_{n-1} = \text{Const} \cdot f(q_n)$,
points towards the sun $S$. Therefore, the altitudes of the triangles
$q_{n-1} q_n S$ and $q_{n+1} q_n S$ are equal. Since they have the common base
$q_n S$, they also have equal areas. Hence

$$\det(q_{n-1},\; q_n - q_{n-1}) = \det(q_n,\; q_{n+1} - q_n)$$

and by passing to the limit $h \to 0$ we see that $\det(q, p) = \text{Const}$.
This is (2.4).

⁴ We are grateful to a private communication of L. Verlet for this reference.

Fig. 2.1. Proof of Kepler’s Second Law (left); facsimile from Newton’s Principia (right)

We have not only an elegant proof for this invariant, but we also see that the
Störmer-Verlet scheme preserves this invariant for every $h > 0$.
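The exact preservation of (2.4) by the Störmer-Verlet scheme can be observed in floating point arithmetic up to rounding errors. The sketch below uses the one-step form (1.17) for the Kepler problem (2.2). The eccentricity e = 0.6, the step size h = 0.05, the 4 000 steps and the initial values (2.11) are those used elsewhere in this section; the function names are our own.

```python
import math

def kepler_force(q):
    """Right-hand side of (2.2): f(q) = -q / |q|^3."""
    r3 = (q[0] * q[0] + q[1] * q[1]) ** 1.5
    return (-q[0] / r3, -q[1] / r3)

def stormer_verlet_step(q, p, h):
    """One step of (1.17) for vectors q, p in the plane."""
    f = kepler_force(q)
    p_half = (p[0] + 0.5 * h * f[0], p[1] + 0.5 * h * f[1])
    q_new = (q[0] + h * p_half[0], q[1] + h * p_half[1])
    f_new = kepler_force(q_new)
    p_new = (p_half[0] + 0.5 * h * f_new[0], p_half[1] + 0.5 * h * f_new[1])
    return q_new, p_new

def angular_momentum(q, p):
    """L = q1 p2 - q2 p1, see (2.4)."""
    return q[0] * p[1] - q[1] * p[0]

e = 0.6
q = (1.0 - e, 0.0)
p = (0.0, math.sqrt((1.0 + e) / (1.0 - e)))
L0 = angular_momentum(q, p)
max_L_error = 0.0
for _ in range(4000):
    q, p = stormer_verlet_step(q, p, 0.05)
    max_L_error = max(max_L_error, abs(angular_momentum(q, p) - L0))
print(max_L_error)  # rounding level only
```

Each of the three substeps leaves q1 p2 - q2 p1 unchanged: the two momentum updates add a multiple of q to p, and the position update adds a multiple of p to q.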


1.2.2 Exact Integration of the Kepler Problem 

Pour voir presentement que cette courbe ABC ... est toujours une Sec¬ 
tion Conique, ainsi que Mr. Newton l’a suppose, pag. 55. Coroll.I. sans le 
demontrer; il y faut bien plus d’adresse: (Joh. Bernoulli 1710, p. 475) 


It is now interesting, inversely to the procedure of Newton, to prove that any solution 
of (2.2) follows either an elliptic, parabolic or hyperbolic arc and to describe the 
solutions analytically. This was first done by Joh. Bernoulli (1710, full of sarcasm 
against Newton), and by Newton (1713, second edition of the Principia , without 
mentioning a word about Bernoulli). 

By (2.3) and (2.4), every solution of (2.2) satisfies the two relations

$$\frac{1}{2}\bigl(\dot q_1^2 + \dot q_2^2\bigr) - \frac{1}{\sqrt{q_1^2 + q_2^2}} = H_0, \qquad q_1 \dot q_2 - q_2 \dot q_1 = L_0, \tag{2.5}$$

where the constants $H_0$ and $L_0$ are determined by the initial values. Using
polar coordinates $q_1 = r \cos\varphi$, $q_2 = r \sin\varphi$, this system becomes

$$\frac{1}{2}\bigl(\dot r^2 + r^2 \dot\varphi^2\bigr) - \frac{1}{r} = H_0, \qquad r^2 \dot\varphi = L_0. \tag{2.6}$$

For its solution we consider $r$ as a function of $\varphi$ and write
$\dot r = \frac{dr}{d\varphi} \cdot \dot\varphi$. The elimination of $\dot\varphi$
in (2.6) then yields

$$\frac{1}{2}\Bigl(\Bigl(\frac{dr}{d\varphi}\Bigr)^2 + r^2\Bigr)\frac{L_0^2}{r^4} - \frac{1}{r} = H_0.$$

In this equation we use the substitution $r = 1/u$, $dr = -du/u^2$, which gives
(with ${}' = d/d\varphi$)

$$\frac{1}{2}\bigl(u'^2 + u^2\bigr) - \frac{u}{L_0^2} - \frac{H_0}{L_0^2} = 0. \tag{2.7}$$

This is a “Hamiltonian” for the system

$$u'' + u = \frac{1}{d}, \quad \text{i.e.,} \quad u = \frac{1}{d} + c_1 \cos\varphi + c_2 \sin\varphi = \frac{1 + e \cos(\varphi - \varphi^*)}{d}, \tag{2.8}$$

where $d = L_0^2$ and the constant $e$ becomes, from (2.7),

$$e^2 = 1 + 2 H_0 L_0^2 \tag{2.9}$$

(by Exercise 7, the expression $1 + 2 H_0 L_0^2$ is non-negative). This is
precisely formula (2.1). The angle $\varphi^*$ is determined by the initial
values $r_0$ and $\varphi_0$. Equation (2.1) represents an elliptic orbit with
eccentricity $e$ for $H_0 < 0$ (see Fig. 2.2, dotted line), a parabola for
$H_0 = 0$, and a hyperbola for $H_0 > 0$.

Finally, we must determine the variables $r$ and $\varphi$ as functions of $t$.
With the relation (2.8) and $r = 1/u$, the second equation of (2.6) gives

$$\frac{d^2}{\bigl(1 + e \cos(\varphi - \varphi^*)\bigr)^2}\, d\varphi = L_0\, dt \tag{2.10}$$

which, after an elementary, but not easy, integration, represents an implicit
equation for $\varphi(t)$.



Fig. 2.2. Numerical solutions of the Kepler problem (eccentricity e = 0.6; in dots: exact
solution); panels include implicit midpoint (4 000 steps, h = 0.05) and symplectic Euler
(4 000 steps, h = 0.05)

1.2.3 Numerical Integration of the Kepler Problem

For the problem (2.2) we choose, with $0 < e < 1$, the initial values

$$q_1(0) = 1 - e, \quad q_2(0) = 0, \quad \dot q_1(0) = 0, \quad \dot q_2(0) = \sqrt{\frac{1 + e}{1 - e}}. \tag{2.11}$$

This implies that $H_0 = -1/2$, $L_0 = \sqrt{1 - e^2}$, $d = 1 - e^2$ and
$\varphi^* = 0$. The period of the solution is $2\pi$ (Exercise 5). Fig. 2.2
shows some numerical solutions for the eccentricity $e = 0.6$ compared to the
exact solution. After our previous experience, it is no longer a surprise that
the explicit Euler method spirals outwards and gives a completely wrong answer.
For the other methods we take a step size 100 times larger in order to “see
something”. We see that the nonsymmetric symplectic Euler method distorts the
ellipse, and that all methods exhibit a precession effect, clockwise for
Störmer-Verlet and symplectic Euler, anti-clockwise for the implicit midpoint
rule. The same behaviour occurs for the exact solution of perturbed Kepler
problems (Exercise 12) and has occupied astronomers for centuries.
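The constants implied by the initial values (2.11) can be verified directly from (2.3), (2.4) and (2.9). The Python sketch below does this for e = 0.6 and also evaluates the polar form (2.1) at the two apsides; function and variable names are our own.

```python
import math

def kepler_constants(e):
    """Initial values (2.11) and the constants H0, L0, d, e they imply."""
    q = (1.0 - e, 0.0)
    p = (0.0, math.sqrt((1.0 + e) / (1.0 - e)))
    H0 = 0.5 * (p[0] ** 2 + p[1] ** 2) - 1.0 / math.hypot(q[0], q[1])  # (2.3)
    L0 = q[0] * p[1] - q[1] * p[0]                                      # (2.4)
    d = L0 ** 2                                                         # d = L0^2
    ecc = math.sqrt(max(0.0, 1.0 + 2.0 * H0 * L0 ** 2))                 # (2.9)
    return H0, L0, d, ecc

def orbit_radius(phi, d, ecc, phi_star=0.0):
    """Polar form (2.1): r = d / (1 + e cos(phi - phi*))."""
    return d / (1.0 + ecc * math.cos(phi - phi_star))

H0, L0, d, ecc = kepler_constants(0.6)
r_peri = orbit_radius(0.0, d, ecc)       # closest point, equals 1 - e
r_apo = orbit_radius(math.pi, d, ecc)    # farthest point, equals 1 + e
print(H0, L0, d, ecc, r_peri, r_apo)
```

With the great axis a = d / (1 - e^2) = 1, the two radii are a(1 - e) and a(1 + e), as expected for an ellipse with the sun at a focus.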

Our next experiment (Fig. 2.3) studies the conservation of invariants and the
global error. The main observation is that the error in the energy grows linearly
for the explicit Euler method, and it remains bounded and small (no secular
terms) for the symplectic Euler method. The global error, measured in the
Euclidean norm, shows a quadratic growth for the explicit Euler compared to a
linear growth for the symplectic Euler. As indicated in Table 2.1 the implicit
midpoint rule and the Störmer-Verlet scheme behave similarly to the symplectic
Euler, but have a smaller error due to their higher order. We remark that the
angular momentum $L(p, q)$ is exactly conserved by the symplectic Euler, the
Störmer-Verlet, and the implicit midpoint rule.

Fig. 2.3. Energy conservation and global error for the Kepler problem

Table 2.1. Qualitative long-time behaviour for the Kepler problem; t is time, h the step size

method               error in H    error in L    global error
explicit Euler       O(th)         O(th)         O(t^2 h)
symplectic Euler     O(h)          0             O(th)
implicit midpoint    O(h^2)        0             O(th^2)
Störmer-Verlet       O(h^2)        0             O(th^2)
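The contrast between the energy errors of the explicit and the symplectic Euler method can be reproduced with a few lines of Python. This is a sketch under our own assumptions: both methods use the same step size h = 0.001 over roughly two periods (the figures in the text use other step sizes), and for the symplectic Euler method we take the variant that updates p first.

```python
import math

def force(q):
    """f(q) = -q / |q|^3, the right-hand side of (2.2)."""
    r3 = math.hypot(q[0], q[1]) ** 3
    return (-q[0] / r3, -q[1] / r3)

def energy(q, p):
    """Hamiltonian (2.3)."""
    return 0.5 * (p[0] ** 2 + p[1] ** 2) - 1.0 / math.hypot(q[0], q[1])

def explicit_euler_step(q, p, h):
    f = force(q)
    return (q[0] + h * p[0], q[1] + h * p[1]), (p[0] + h * f[0], p[1] + h * f[1])

def symplectic_euler_step(q, p, h):
    # Update p with the old q, then q with the new p.
    f = force(q)
    p_new = (p[0] + h * f[0], p[1] + h * f[1])
    return (q[0] + h * p_new[0], q[1] + h * p_new[1]), p_new

def max_energy_error(step, h, n_steps, e=0.6):
    """Largest |H(p_n, q_n) - H_0| for the initial values (2.11)."""
    q = (1.0 - e, 0.0)
    p = (0.0, math.sqrt((1.0 + e) / (1.0 - e)))
    h0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
        worst = max(worst, abs(energy(q, p) - h0))
    return worst

h = 0.001
n_steps = int(round(4.0 * math.pi / h))   # about two periods of length 2*pi
err_explicit = max_energy_error(explicit_euler_step, h, n_steps)
err_symplectic = max_energy_error(symplectic_euler_step, h, n_steps)
print(err_explicit, err_symplectic)
```

Extending the run to more periods should increase the first number roughly in proportion to the elapsed time while leaving the second essentially unchanged, in line with the first two rows of the table.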


1.2.4 The Outer Solar System 

The evolution of the entire planetary system has been numerically integrated
for a time span of nearly 100 million years⁵. This calculation confirms that
the evolution of the solar system as a whole is chaotic, ...

(G.J. Sussman & J. Wisdom 1992)


We next apply our methods to the system which describes the motion of the five
outer planets relative to the sun. This system has been studied extensively by
astronomers. The problem is a Hamiltonian system (1.10) (N-body problem) with

$$H(p, q) = \frac{1}{2} \sum_{i=0}^{5} \frac{1}{m_i}\, p_i^T p_i - G \sum_{i=1}^{5} \sum_{j=0}^{i-1} \frac{m_i m_j}{\| q_i - q_j \|}. \tag{2.12}$$


Here $p$ and $q$ are the supervectors composed by the vectors
$p_i, q_i \in \mathbb{R}^3$ (momenta and positions), respectively. The chosen
units are: masses relative to the sun, so that the sun has mass 1. We have taken

$$m_0 = 1.00000597682$$

to take account of the inner planets. Distances are in astronomical units
(1 [A.U.] = 149 597 870 [km]), times in earth days, and the gravitational
constant is

$$G = 2.95912208286 \cdot 10^{-4}.$$

The initial values for the sun are taken as $q_0(0) = (0, 0, 0)^T$ and
$\dot q_0(0) = (0, 0, 0)^T$. All other data (masses of the planets and the initial
positions and initial velocities) are given in Table 2.2. The initial data are
taken from “Ahnerts Kalender für Sternfreunde 1994”, Johann Ambrosius Barth
Verlag 1993, and they correspond to September 5, 1994 at 0h00.⁶

⁵ 100 million years is not much in astronomical time scales; it just goes back to “Jurassic
Park”.

⁶ We thank Alexander Ostermann, who provided us with this data.

Table 2.2. Data for the outer solar system

planet    mass                            initial position                           initial velocity
Jupiter   m_1 = 0.000954786104043         (-3.5023653, -3.8169847, -1.5507963)       (0.00565429, -0.00412490, -0.00190589)
Saturn    m_2 = 0.000285583733151         (9.0755314, -3.0458353, -1.6483708)        (0.00168318, 0.00483525, 0.00192462)
Uranus    m_3 = 0.0000437273164546        (8.3101420, -16.2901086, -7.2521278)       (0.00354178, 0.00137102, 0.00055029)
Neptune   m_4 = 0.0000517759138449        (11.4707666, -25.7294829, -10.8169456)     (0.00288930, 0.00114527, 0.00039677)
Pluto     m_5 = 1/(1.3 · 10^8)            (-15.5387357, -25.2225594, -3.1902382)     (0.00276725, -0.00170702, -0.00136504)


[Figure with four panels: explicit Euler, h = 10; implicit Euler, h = 10;
symplectic Euler, h = 100; Störmer-Verlet, h = 200]

Fig. 2.4. Solutions of the outer solar system


To this system we apply the explicit and implicit Euler methods with step size
h = 10, the symplectic Euler and the Störmer-Verlet method with much larger
step sizes h = 100 and h = 200, respectively, all over a time period of 200 000
days. The numerical solution (see Fig. 2.4) behaves similarly to that for the Kepler
problem. With the explicit Euler method the planets have increasing energy, they
spiral outwards, and Jupiter approaches Saturn, which leaves the plane of the two-body
motion. With the implicit Euler method the planets (first Jupiter and then Saturn)
fall into the sun and are thrown far away. Both the symplectic Euler method and
the Störmer-Verlet scheme show the correct behaviour. An integration over a much
longer time of, say, several million years does not deteriorate this behaviour. Let us
remark that Sussman & Wisdom (1992) have integrated the outer solar system with
special geometric integrators.
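The experiment described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the Fortran code used for the
figures: the function names are hypothetical, velocities $v_i = p_i/m_i$ are used
in place of momenta, and the Störmer-Verlet scheme is written in its usual
kick-drift-kick form. It compares the relative energy error of the explicit Euler
method (h = 10) and of the Störmer-Verlet scheme (h = 200) after 200 000 days.

```python
import math

G = 2.95912208286e-4  # gravitational constant (units: A.U., earth days, solar masses)

# Sun (including the inner planets), Jupiter, Saturn, Uranus, Neptune, Pluto
MASSES = [1.00000597682, 0.000954786104043, 0.000285583733151,
          0.0000437273164546, 0.0000517759138449, 1.0 / 1.3e8]
POSITIONS = [[0.0, 0.0, 0.0],
             [-3.5023653, -3.8169847, -1.5507963],
             [9.0755314, -3.0458353, -1.6483708],
             [8.3101420, -16.2901086, -7.2521278],
             [11.4707666, -25.7294829, -10.8169456],
             [-15.5387357, -25.2225594, -3.1902382]]
VELOCITIES = [[0.0, 0.0, 0.0],
              [0.00565429, -0.00412490, -0.00190589],
              [0.00168318, 0.00483525, 0.00192462],
              [0.00354178, 0.00137102, 0.00055029],
              [0.00288930, 0.00114527, 0.00039677],
              [0.00276725, -0.00170702, -0.00136504]]


def accelerations(q):
    """Gravitational accelerations -(1/m_i) dH/dq_i for the Hamiltonian (2.12)."""
    acc = [[0.0, 0.0, 0.0] for _ in q]
    for i in range(len(q)):
        for j in range(i):
            d = [q[i][k] - q[j][k] for k in range(3)]
            r3 = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2) ** 3
            for k in range(3):
                acc[i][k] -= G * MASSES[j] * d[k] / r3
                acc[j][k] += G * MASSES[i] * d[k] / r3
    return acc


def energy(q, v):
    """Hamiltonian (2.12), written with velocities v_i = p_i / m_i."""
    kinetic = 0.5 * sum(m * sum(c * c for c in vi) for m, vi in zip(MASSES, v))
    potential = 0.0
    for i in range(len(q)):
        for j in range(i):
            potential -= G * MASSES[i] * MASSES[j] / math.dist(q[i], q[j])
    return kinetic + potential


def stormer_verlet(q, v, h, steps):
    q = [row[:] for row in q]
    v = [row[:] for row in v]
    acc = accelerations(q)
    for _ in range(steps):
        for i in range(len(q)):
            for k in range(3):
                v[i][k] += 0.5 * h * acc[i][k]  # half step for the velocity
                q[i][k] += h * v[i][k]          # full step for the position
        acc = accelerations(q)
        for i in range(len(q)):
            for k in range(3):
                v[i][k] += 0.5 * h * acc[i][k]  # second half step for the velocity
    return q, v


def explicit_euler(q, v, h, steps):
    for _ in range(steps):
        acc = accelerations(q)
        q = [[qi[k] + h * vi[k] for k in range(3)] for qi, vi in zip(q, v)]
        v = [[vi[k] + h * ai[k] for k in range(3)] for vi, ai in zip(v, acc)]
    return q, v


def relative_energy_error(method, h, t_end=200000.0):
    h0 = energy(POSITIONS, VELOCITIES)
    q, v = method(POSITIONS, VELOCITIES, h, int(round(t_end / h)))
    return abs(energy(q, v) - h0) / abs(h0)
```

Calling `relative_energy_error(stormer_verlet, 200.0)` and
`relative_energy_error(explicit_euler, 10.0)` should show the energy drift of the
explicit Euler method to be far larger, although its step size is 20 times smaller.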


I.3 The Hénon-Heiles Model

... because: (1) it is analytically simple; this makes the computation of
the trajectories easy; (2) at the same time, it is sufficiently complicated to
give trajectories which are far from trivial. (Hénon & Heiles 1964)

The Hénon-Heiles model was created for describing stellar motion, followed for a
very long time, inside the gravitational potential $U_0(r, z)$ of a galaxy with cylindrical
symmetry (Hénon & Heiles 1964). Extensive numerical experiments were meant to
help answer the question of whether there exists, besides the known invariants H and L,
a third invariant. Despite many attempts at analytical calculation over several
decades, no such formula had been found.

After a reduction of the dimension, a Hamiltonian in two degrees of freedom of
the form

$$ H(p,q) = \frac12 (p_1^2 + p_2^2) + U(q) \tag{3.1} $$

is obtained and the question is whether such an equation has a second invariant. Here,
Hénon and Heiles put aside the astronomical origin of the problem and choose

$$ U(q) = \frac12 (q_1^2 + q_2^2) + q_1^2 q_2 - \frac13 q_2^3 \tag{3.2} $$

(see citation). The potential U is represented in Fig. 3.1. When U approaches $\frac16$, the
level curves of U tend to an equilateral triangle, whose vertices are saddle points
of U. The corresponding system



Fig. 3.1. Potential of the Hénon-Heiles Model and a solution

Fig. 3.2. Poincaré cuts for $q_1 = 0$, $p_1 > 0$ of the Hénon-Heiles Model for $H = \frac{1}{12}$ (6 orbits,
left) and $H = \frac18$ (1 orbit, right)

[Figure with two panels: Explicit Euler, $h = 10^{-5}$; Implicit Euler, $h = 10^{-5}$;
points marked in bold: $P_{8000}, \ldots, P_{8328}$ and $P_1, \ldots, P_{400}$]

Fig. 3.3. Poincaré cuts for numerical methods, one orbit each; explicit Euler (left), implicit
Euler (right). Same initial data as in Fig. 3.2


$$ \ddot q_1 = -q_1 - 2 q_1 q_2, \qquad \ddot q_2 = -q_2 - q_1^2 + q_2^2 \tag{3.3} $$

has solutions with nontrivial properties. For given initial values with $H(p_0, q_0) < \frac16$
and $q_0$ inside the triangle $U \le \frac16$, the solution stays there and moves somehow like
a mass point gliding on this surface (see Fig. 3.1, right).

Poincaré Cuts. We first fix the energy $H_0$ and put $q_{10} = 0$. Then for any point
$P_0 = (q_{20}, p_{20})$, we obtain $p_{10}$ from (3.1) as $p_{10} = \sqrt{2H_0 - 2U_0 - p_{20}^2}$, where we
choose the positive root. We then follow the solution until it hits again the surface
$q_1 = 0$ in the positive direction $p_1 > 0$ and obtain a point $P_1 = (q_{21}, p_{21})$; in the
same way we compute $P_2 = (q_{22}, p_{22})$, etc. For the same initial values as in Fig. 3.1
and with $H_0 = \frac{1}{12}$, the solution for $0 \le t \le 300\,000$ gives 46 865 Poincaré cuts
which are all displayed in Fig. 3.2 (left). They seem to lie exactly on a curve, as do
the orbits for 5 other choices of initial values. This picture thus shows “convincing
evidence” for the existence of a second invariant, for which Gustavson (1966) has
derived a formal expansion, whose first terms represent perfectly these curves.

Fig. 3.4. Global error of numerical methods for nearly quasiperiodic and for chaotic solutions;
same initial data as in Fig. 3.2

“But here comes the surprise” (Hénon-Heiles, p. 76): Fig. 3.2 shows to the right
the same picture in the $(q_2, p_2)$ plane for a somewhat higher energy $H = \frac18$. The
motion turns completely to chaos and all hope for a second invariant disappears.
Actually, Gustavson’s series does not converge.

Numerical Experiments. We now apply numerical methods, the explicit Euler
method to the low energy initial values $H = \frac{1}{12}$ (Fig. 3.3, left), and the implicit
Euler method to the high energy initial values (Fig. 3.3, right), both methods with a
very small step size $h = 10^{-5}$. As we expect from our previous experiments,
the explicit Euler method tends to increase the energy and turns order into chaos,
while the implicit Euler method tends to decrease it and turns chaos into order. The
Störmer-Verlet method (not shown) behaves as the exact solution even for step sizes
as large as $h = 10^{-1}$.
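The energy behaviour described in this paragraph can be reproduced with a small
Python sketch. The starting point $(q_2, p_2) = (0.1, 0)$ on the section $q_1 = 0$
is an assumption made for illustration (the initial data of Fig. 3.1 are not
listed in the text), and all function names are hypothetical. The initial
momentum $p_1$ is obtained from the energy by taking the positive root.

```python
import math


def potential(q1, q2):
    # Henon-Heiles potential (3.2)
    return 0.5 * (q1 * q1 + q2 * q2) + q1 * q1 * q2 - q2 ** 3 / 3.0


def grad_potential(q1, q2):
    return q1 + 2.0 * q1 * q2, q2 + q1 * q1 - q2 * q2


def hamiltonian(state):
    p1, p2, q1, q2 = state
    return 0.5 * (p1 * p1 + p2 * p2) + potential(q1, q2)


def initial_state(h0, q2, p2):
    """Point on the section q1 = 0 with energy h0 and p1 > 0 (positive root)."""
    p1 = math.sqrt(2.0 * h0 - 2.0 * potential(0.0, q2) - p2 * p2)
    return (p1, p2, 0.0, q2)


def explicit_euler(state, h, steps):
    p1, p2, q1, q2 = state
    for _ in range(steps):
        g1, g2 = grad_potential(q1, q2)
        # the right-hand side is evaluated with the old values of p and q
        p1, p2, q1, q2 = p1 - h * g1, p2 - h * g2, q1 + h * p1, q2 + h * p2
    return (p1, p2, q1, q2)


def stormer_verlet(state, h, steps):
    p1, p2, q1, q2 = state
    g1, g2 = grad_potential(q1, q2)
    for _ in range(steps):
        p1 -= 0.5 * h * g1
        p2 -= 0.5 * h * g2
        q1 += h * p1
        q2 += h * p2
        g1, g2 = grad_potential(q1, q2)
        p1 -= 0.5 * h * g1
        p2 -= 0.5 * h * g2
    return (p1, p2, q1, q2)


start = initial_state(1.0 / 12.0, 0.1, 0.0)
drift_euler = hamiltonian(explicit_euler(start, 1e-3, 10000)) - hamiltonian(start)
drift_verlet = hamiltonian(stormer_verlet(start, 0.1, 10000)) - hamiltonian(start)
```

In this sketch `drift_euler` is positive (the energy grows), while `drift_verlet`
stays small although the Störmer-Verlet step size is a hundred times larger.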

In our next experiment we study the global error (see Fig. 3.4), once for the case
of the nearly quasiperiodic orbit ($H = \frac{1}{12}$) and once for the chaotic one ($H = \frac18$),
both for the explicit Euler, the symplectic Euler, and the Störmer-Verlet scheme.
It may come as a surprise that only in the first case do we have the same behaviour
(linear or quadratic growth) as in Fig. 2.3 for the Kepler problem. In the second case
($H = \frac18$) the global error grows exponentially for all methods, and the explicit Euler
method is worst.

Study of a Mapping. The passage from a point $P_i$ to the next one $P_{i+1}$ (as
explained for the left picture of Fig. 3.2) can be considered as a mapping
$\Phi : P_i \mapsto P_{i+1}$, and the sequence of points $P_0, P_1, P_2, \ldots$ are just the iterates
of this mapping. This mapping is represented for the two energy levels $H = \frac{1}{12}$ and
$H = \frac18$ in Fig. 3.5 and its study allows us to better understand the behaviour of the
orbits. We see no significant difference between the two cases; for larger H the
deformations are simply more violent and correspond to larger eigenvalues of the
Jacobian of $\Phi$. In both cases we have seven fixed points, which correspond to periodic
solutions of the system (3.3). Four of them are stable and lie inside the white islands
of Fig. 3.2.

Fig. 3.5. The Poincaré map $\Phi : P_0 \to P_1$ for the Hénon-Heiles Model (the legend marks
focus-type and saddle-type fixed points)


I.4 Molecular Dynamics

We do not need exact classical trajectories to do this, but must lay great 
emphasis on energy conservation as being of primary importance for this 
reason. (M.P. Allen & D.J. Tildesley 1987) 

Molecular dynamics requires the solution of Hamiltonian systems (1.10), where the
total energy is given by

$$ H(p,q) = \frac12 \sum_{i=1}^{N} \frac{1}{m_i}\, p_i^T p_i + \sum_{i=2}^{N} \sum_{j=1}^{i-1} V_{ij}\bigl(\|q_i - q_j\|\bigr), \tag{4.1} $$

and $V_{ij}(r)$ are given potential functions. Here, $q_i$ and $p_i$ denote the positions and
momenta of atoms and $m_i$ is the atomic mass of the $i$th atom. We remark that the
outer solar system (2.12) is such an N-body system with $V_{ij}(r) = -G m_i m_j / r$. In
molecular dynamics the Lennard-Jones potential


$$ V_{ij}(r) = 4 \varepsilon_{ij} \Bigl( \Bigl(\frac{\sigma_{ij}}{r}\Bigr)^{12} - \Bigl(\frac{\sigma_{ij}}{r}\Bigr)^{6} \Bigr) \tag{4.2} $$

is very popular ($\varepsilon_{ij}$ and $\sigma_{ij}$ are suitable constants depending on the atoms).
This potential has an absolute minimum at distance $r = \sigma_{ij} \sqrt[6]{2}$. The force due to
this potential strongly repels the atoms when they are closer than this value,
and they attract each other when they are farther away.

[Figure: graph of the Lennard-Jones potential]
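The location of the minimum and the sign change of the force can be checked
numerically. The following Python sketch is an illustration only; the function
names are hypothetical, and the default values of `eps` and `sigma` are
arbitrary placeholders.

```python
def lennard_jones(r, eps=1.0, sigma=1.0):
    """Lennard-Jones potential (4.2) for a single pair of atoms."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lennard_jones_force(r, eps=1.0, sigma=1.0):
    """Radial force -dV/dr; positive values mean repulsion."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


# At r_min = sigma * 2**(1/6) the force vanishes and the potential equals -eps.
r_min = 2.0 ** (1.0 / 6.0)
depth = lennard_jones(r_min)
```

For distances below `r_min` the function `lennard_jones_force` returns a
positive (repelling) value, and for larger distances a negative (attracting) one.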

Numerical Experiments with a Frozen Argon Crystal. As in Biesiadecki & Skeel
(1993) we consider the interaction of seven argon atoms in a plane, where six of
them are arranged symmetrically around a centre atom. As a mathematical model we
take the Hamiltonian (4.1) with

$$ N = 7, \quad m_i = m = 66.34 \cdot 10^{-27}\ [\mathrm{kg}], \quad \varepsilon_{ij} = \varepsilon = 119.8\, k_B\ [\mathrm{J}], \quad \sigma_{ij} = \sigma = 0.341\ [\mathrm{nm}], $$

where $k_B = 1.380658 \cdot 10^{-23}$ [J/K] is Boltzmann’s constant (see Allen & Tildesley
(1987), page 21). As units for our calculations we take masses in [kg], distances in
nanometers (1 [nm] = $10^{-9}$ [m]), and times in nanoseconds (1 [nsec] = $10^{-9}$ [sec]).
Initial positions (in [nm]) and initial velocities (in [nm/nsec]) are given in Table 4.1.
They are chosen such that neighbouring atoms have a distance that is close to the
one with lowest potential energy, and such that the total momentum is zero and
therefore the centre of gravity does not move. The energy at the initial position is
$H(p_0, q_0) \approx -1260.2\, k_B$ [J].

For computations in molecular dynamics one is usually not interested in the
trajectories of the atoms, but one aims at macroscopic quantities such as temperature,
pressure, internal energy, etc. Here we consider the total energy, given by the
Hamiltonian, and the temperature which can be calculated from the formula (see
Allen & Tildesley (1987), page 46)

$$ T = \frac{1}{2 N k_B} \sum_{i=1}^{N} m_i \|\dot q_i\|^2. \tag{4.3} $$

Table 4.1. Initial values for the simulation of a frozen argon crystal

atom          1       2       3       4       5       6       7

position     0.00    0.02    0.34    0.36   -0.02   -0.35   -0.31
             0.00    0.39    0.17   -0.21   -0.40   -0.16    0.21

velocity    -30      50     -70      90      80     -40     -80
            -20     -90     -60      40      90     100     -60

[Figure: upper panels show the total energy, lower panels the temperature; recoverable
curve labels: explicit Euler, h = 0.5 [fsec]; symplectic Euler, h = 10 [fsec];
Verlet, h = 10, 40 and 80 [fsec]]

Fig. 4.1. Computed total energy and temperature of the argon crystal

We apply the explicit and symplectic Euler methods and also the Verlet method
to this problem. Observe that for a Hamiltonian such as (4.1) all three methods
are explicit, and all of them need only one force evaluation per integration step. In
Fig. 4.1 we present the numerical results of our experiments. The integrations are
done over an interval of length 0.2 [nsec]. The step sizes are indicated in femtoseconds
(1 [fsec] = $10^{-6}$ [nsec]).

The two upper pictures show the values $(H(p_n, q_n) - H(p_0, q_0))/k_B$ as a function
of time $t_n = nh$. For the exact solution, this value is precisely zero for all times.
As in earlier experiments we see that the symplectic Euler method is qualitatively
correct, whereas the numerical solution of the explicit Euler method, although
computed with a much smaller step size, is completely useless (see the citation at
the beginning of this section). The Verlet method is qualitatively correct and gives
much more accurate results than the symplectic Euler method (we shall see later
that the Verlet method is of order 2). The two computations with the Verlet method
show that the energy error decreases by a factor of 4 if the step size is reduced by a
factor of 2 (second order convergence).

The two lower pictures of Fig. 4.1 show the numerical values of the temperature
difference $T - T_0$ with T given by (4.3) and $T_0 \approx 22.72$ [K] (initial temperature).
In contrast to the total energy, this is not an exact invariant, but for our problem it
fluctuates around a constant value. The explicit Euler method gives wrong results,
but the symplectic Euler and the Verlet methods show the desired behaviour. This
time a reduction of the step size does not reduce the amplitude of the oscillations,
which indicates that the fluctuation of the exact temperature is of the same size.
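The initial temperature quoted above follows directly from formula (4.3) and the
velocities of Table 4.1. The short Python sketch below recomputes it; it assumes
that the two velocity rows of the table are the x- and y-components of each atom
(the sum of squares does not depend on this pairing) and uses the fact that
1 [nm/nsec] equals 1 [m/sec]. The names are hypothetical.

```python
K_B = 1.380658e-23   # Boltzmann's constant [J/K]
MASS = 66.34e-27     # mass of one argon atom [kg]

# Initial velocities of the seven atoms in [nm/nsec] = [m/sec] (Table 4.1)
VELOCITIES = [(-30, -20), (50, -90), (-70, -60), (90, 40),
              (80, 90), (-40, 100), (-80, -60)]


def temperature(velocities, mass=MASS):
    """Temperature from formula (4.3) for atoms of equal mass moving in a plane."""
    n = len(velocities)
    twice_kinetic = sum(mass * (vx * vx + vy * vy) for vx, vy in velocities)
    return twice_kinetic / (2.0 * n * K_B)


T0 = temperature(VELOCITIES)  # close to the value 22.72 [K] quoted in the text
total_momentum = (MASS * sum(vx for vx, _ in VELOCITIES),
                  MASS * sum(vy for _, vy in VELOCITIES))
```

The total momentum evaluates to zero in both components, in agreement with the
statement that the centre of gravity does not move.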


I.5 Highly Oscillatory Problems

In this section we discuss a system with almost-harmonic high-frequency oscillations.
We show numerical phenomena of methods applied with step sizes that are
not small compared to the period of the fastest oscillations.


I.5.1 A Fermi-Pasta-Ulam Problem

... dealing with the behavior of certain nonlinear physical systems where
the non-linearity is introduced as a perturbation to a primarily linear problem.
The behavior of the systems is to be studied for times which are long
compared to the characteristic periods of the corresponding linear problems.
(E. Fermi, J. Pasta, S. Ulam 1955)

In the early 1950s MANIAC-I had just been completed and sat poised
for an attack on significant problems. ... Fermi suggested that it would
be highly instructive to integrate the equations of motion numerically for
a judiciously chosen, one-dimensional, harmonic chain of mass points
weakly perturbed by nonlinear forces. (J. Ford 1992)

The problem of Fermi, Pasta & Ulam (1955) is a simple model for simulations in
statistical mechanics which revealed highly unexpected dynamical behaviour. We
consider a modification consisting of a chain of 2m mass points, connected with
alternating soft nonlinear and stiff linear springs, and fixed at the end points (see
Galgani, Giorgilli, Martinoli & Vanzini (1992) and Fig. 5.1). The variables
$q_1, \ldots, q_{2m}$ ($q_0 = q_{2m+1} = 0$) stand for the displacements of the mass points, and
$p_i = \dot q_i$ for their velocities. The motion is described by a Hamiltonian system with
total energy

$$ H(p,q) = \frac12 \sum_{i=1}^{m} \bigl(p_{2i-1}^2 + p_{2i}^2\bigr) + \frac{\omega^2}{4} \sum_{i=1}^{m} (q_{2i} - q_{2i-1})^2 + \sum_{i=0}^{m} (q_{2i+1} - q_{2i})^4, $$

where $\omega$ is assumed to be large.

Fig. 5.1. Chain with alternating soft nonlinear and stiff linear springs

It is quite natural to introduce the new variables

$$ x_{0,i} = (q_{2i} + q_{2i-1})/\sqrt2, \qquad x_{1,i} = (q_{2i} - q_{2i-1})/\sqrt2, $$
$$ y_{0,i} = (p_{2i} + p_{2i-1})/\sqrt2, \qquad y_{1,i} = (p_{2i} - p_{2i-1})/\sqrt2, $$

where $x_{0,i}$ ($i = 1, \ldots, m$) represents a scaled displacement of the $i$th stiff spring,
$x_{1,i}$ a scaled expansion (or compression) of the $i$th stiff spring, and $y_{0,i}, y_{1,i}$ their
velocities (or momenta). With this change of coordinates, the motion in the new
variables is again described by a Hamiltonian system, with

$$ H(y,x) = \frac12 \sum_{i=1}^{m} \bigl(y_{0,i}^2 + y_{1,i}^2\bigr) + \frac{\omega^2}{2} \sum_{i=1}^{m} x_{1,i}^2 + \frac14 \Bigl( (x_{0,1} - x_{1,1})^4 + \sum_{i=1}^{m-1} \bigl(x_{0,i+1} - x_{1,i+1} - x_{0,i} - x_{1,i}\bigr)^4 + (x_{0,m} + x_{1,m})^4 \Bigr). \tag{5.2} $$

Besides the fact that the equations of motion are Hamiltonian, so that the total energy
is exactly conserved, they have a further interesting feature. Let

$$ I_j(x_{1,j}, y_{1,j}) = \frac12 \bigl(y_{1,j}^2 + \omega^2 x_{1,j}^2\bigr) \tag{5.3} $$

denote the energy of the $j$th stiff spring. It turns out that there is an exchange of
energy between the stiff springs, but the total oscillatory energy $I = I_1 + \ldots + I_m$
remains close to a constant value, in fact, $I(x(t), y(t)) = I(x(0), y(0)) + O(\omega^{-1})$.
For an illustration of this property, we choose m = 3 (as in Fig. 5.1), $\omega = 50$,

$$ x_{0,1}(0) = 1, \quad y_{0,1}(0) = 1, \quad x_{1,1}(0) = \omega^{-1}, \quad y_{1,1}(0) = 1, $$

and zero for the remaining initial values. Fig. 5.2 displays the energies $I_1, I_2, I_3$
of the stiff springs together with the total oscillatory energy $I = I_1 + I_2 + I_3$ as a
function of time. The solution has been computed very carefully with high accuracy,
so that the displayed oscillations can be considered as exact.

Fig. 5.2. Exchange of energy in the exact solution of the Fermi-Pasta-Ulam model. The
picture to the right is an enlargement of the narrow rectangle in the left-hand picture


I.5.2 Application of Classical Integrators

Which of the methods of the foregoing sections produce qualitatively correct
approximations when the product of the step size h with the high frequency $\omega$ is
relatively large?

Linear Stability Analysis. To get an idea of the maximum admissible step size,
we neglect the quartic term in the Hamiltonian (5.2), so that the differential equation
splits into the two-dimensional problems $\dot y_{0,i} = 0$, $\dot x_{0,i} = y_{0,i}$ and

$$ \dot y_{1,i} = -\omega^2 x_{1,i}, \qquad \dot x_{1,i} = y_{1,i}. \tag{5.4} $$

Omitting the subscripts, the solution of (5.4) is

$$ \begin{pmatrix} y(t) \\ \omega x(t) \end{pmatrix} = \begin{pmatrix} \cos\omega t & -\sin\omega t \\ \sin\omega t & \cos\omega t \end{pmatrix} \begin{pmatrix} y(0) \\ \omega x(0) \end{pmatrix}. $$

The numerical solution of a one-step method applied to (5.4) yields

$$ \begin{pmatrix} y_{n+1} \\ \omega x_{n+1} \end{pmatrix} = M(h\omega) \begin{pmatrix} y_n \\ \omega x_n \end{pmatrix}, \tag{5.5} $$

and the eigenvalues $\lambda_i$ of $M(h\omega)$ determine the long-time behaviour of the numerical
solution. Stability (i.e., boundedness of the solution of (5.5)) requires the eigenvalues
to be less than or equal to one in modulus. For the explicit Euler method
we have $\lambda_{1,2} = 1 \pm ih\omega$, so that the energy $I_n = (y_n^2 + \omega^2 x_n^2)/2$ increases as
$(1 + h^2\omega^2)^{n}$. For the implicit Euler method we have $\lambda_{1,2} = (1 \pm ih\omega)^{-1}$, and
the energy decreases as $(1 + h^2\omega^2)^{-n}$. For the implicit midpoint rule, the matrix
$M(h\omega)$ is orthogonal and therefore $I_n$ is exactly preserved for all h and for all
times. Finally, for the symplectic Euler method and for the Störmer-Verlet scheme
we have

$$ M(h\omega) = \begin{pmatrix} 1 & -h\omega \\ h\omega & 1 - h^2\omega^2 \end{pmatrix}, \qquad M(h\omega) = \begin{pmatrix} 1 - \frac{h^2\omega^2}{2} & -h\omega\bigl(1 - \frac{h^2\omega^2}{4}\bigr) \\ h\omega & 1 - \frac{h^2\omega^2}{2} \end{pmatrix}, $$

respectively. For both matrices, the characteristic polynomial is
$\lambda^2 - (2 - h^2\omega^2)\lambda + 1$, so that the eigenvalues are of modulus one if and only if
$|h\omega| \le 2$.
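The stability statements above can be checked numerically by computing the
eigenvalues of the 2×2 propagation matrices as functions of $z = h\omega$. The
Python sketch below is an illustration only; the function names are hypothetical,
and the matrices are those obtained by applying each method to (5.4) in the
variables $(y, \omega x)$. Since the energy $I_n$ is quadratic in $(y_n, \omega x_n)$,
the squared modulus of an eigenvalue gives its growth factor per step.

```python
import cmath
import math


def eigenvalues(m):
    """Eigenvalues of a real 2x2 matrix, from its trace and determinant."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def m_explicit_euler(z):
    return ((1.0, -z), (z, 1.0))


def m_implicit_euler(z):
    s = 1.0 / (1.0 + z * z)
    return ((s, -z * s), (z * s, s))


def m_symplectic_euler(z):
    return ((1.0, -z), (z, 1.0 - z * z))


def m_stormer_verlet(z):
    return ((1.0 - z * z / 2.0, -z * (1.0 - z * z / 4.0)),
            (z, 1.0 - z * z / 2.0))


def spectral_radius(matrix_function, z):
    return max(abs(lam) for lam in eigenvalues(matrix_function(z)))
```

For example, `spectral_radius(m_stormer_verlet, 1.5)` equals one up to rounding,
whereas `spectral_radius(m_symplectic_euler, 2.5)` is larger than one, so the
numerical solution is unbounded for that value of $h\omega$.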

Numerical Experiments. We apply several methods to the Fermi-Pasta-Ulam
(FPU) problem, with $\omega = 50$ and initial data as given in Sect. I.5.1. The explicit
and implicit Euler methods give completely wrong solutions even for very small
step sizes. Fig. 5.3 presents the numerical results for $H, I, I_1, I_2, I_3$ obtained with
the implicit midpoint rule, the symplectic Euler, and the Störmer-Verlet scheme.
For the small step size h = 0.001 all methods give satisfactory results, although the
energy exchange is not reproduced accurately over long times. The Hamiltonian H
and the total oscillatory energy I are well conserved over much longer time intervals.
The larger step size h = 0.03 has been chosen such that $h\omega = 1.5$ is close
to the stability limit of the symplectic Euler and the Störmer-Verlet methods. The
values of H and I are still bounded over very long time intervals, but the oscillations
do not represent the true behaviour. Moreover, the average value of I is no longer
close to 1, as it is for the exact solution. These phenomena call for an explanation,
and for numerical methods with an improved behaviour (see Chap. XIII).

Fig. 5.3. Numerical solution for the FPU problem (5.2) with data as in Sect. I.5.1, obtained
with the implicit midpoint rule (left), symplectic Euler (middle), and Störmer-Verlet scheme
(right); the upper pictures use h = 0.001, the lower pictures h = 0.03; the first four pictures
show the Hamiltonian H − 0.8 and the oscillatory energies $I_1, I_2, I_3, I$; the last two pictures
only show $I_2$ and $I$


I.6 Exercises

1. Show that the Lotka-Volterra problem (1.1) in logarithmic scale, i.e., by putting 
p = log u and q = log v, becomes a Hamiltonian system with the function (1.4) 
as Hamiltonian (see Fig. 6.1). 



Fig. 6.1. Area preservation in logarithmic scale of the Lotka-Volterra flow

2. Apply the symplectic Euler method (or the implicit midpoint rule) to problems
such as

$$ \begin{pmatrix} \dot u \\ \dot v \end{pmatrix} = \begin{pmatrix} (v-2)/v \\ (1-u)/u \end{pmatrix}, \qquad \begin{pmatrix} \dot u \\ \dot v \end{pmatrix} = \begin{pmatrix} u^2 v (v-2) \\ v^2 u (1-u) \end{pmatrix} $$

with various initial conditions. Both problems have the same first integral (1.4)
as the Lotka-Volterra problem and therefore their solutions are also periodic.
Do the numerical solutions also show this behaviour?

3. A general two-body problem (sun and planet) is given by the Hamiltonian

$$ H(p, p_S, q, q_S) = \frac{1}{2M}\, p_S^T p_S + \frac{1}{2m}\, p^T p - \frac{G m M}{\|q - q_S\|}, $$

where $q_S, q \in \mathbb{R}^3$ are the positions of the sun (mass M) and the planet (mass
m), $p_S, p \in \mathbb{R}^3$ are their momenta, and G is the gravitational constant.

a) Prove: in heliocentric coordinates $Q := q - q_S$, the equations of motion are

$$ \ddot Q = -G(M+m) \frac{Q}{\|Q\|^3}. $$

b) Prove that $\frac{d}{dt}\bigl(Q(t) \times \dot Q(t)\bigr) = 0$, so that $Q(t)$ stays for all times t in the
plane $E = \{q \,;\, d^T q = 0\}$, where $d = Q(0) \times \dot Q(0)$.

Conclusion. The coordinates corresponding to a basis in E satisfy the
two-dimensional equations (2.2).

4. In polar coordinates, the two-body problem (2.2) becomes

$$ \ddot r = -V'(r) \quad \text{with} \quad V(r) = \frac{L_0^2}{2r^2} - \frac1r, $$

which is independent of $\varphi$. The angle $\varphi(t)$ can be obtained by simple integration
from $\dot\varphi(t) = L_0/r^2(t)$.

5. Compute the period of the solution of the Kepler problem (2.2) and deduce 
from the result Kepler’s “third law”. 

Hint. Comparing Kepler’s second law (2.6) with the area of the ellipse gives
$\frac12 L_0 T = ab\pi$. Then apply (2.7). The result is $T = 2\pi (2|H_0|)^{-3/2} = 2\pi a^{3/2}$.

6. Deduce Kepler’s first law from (2.2) by the elegant method of Laplace (1799).
Hint. Multiplying (2.2) with (2.5) gives



and after integration $L_0 \dot q_1 = \frac{q_2}{r} + B$, $L_0 \dot q_2 = -\frac{q_1}{r} + A$, where A and B are
integration constants. Then eliminate $\dot q_1$ and $\dot q_2$ by multiplying these equations
by $q_2$ and $-q_1$ respectively and by subtracting them. The result is a quadratic
equation in $q_1$ and $q_2$.

7. Whatever the initial values for the Kepler problem are, $1 + 2H_0 L_0^2 \ge 0$ holds.
Hence, the value e is well defined by (2.9).

Hint. $L_0$ is the area of the parallelogram spanned by the vectors $q(0)$ and $\dot q(0)$.

8. Implementation of the Störmer-Verlet scheme. Explain why the use of the
one-step formulation (1.17) is numerically more stable than that of the two-term
recursion (1.15).

9. Runge-Lenz-Pauli vector. Prove that the function

$$ A(p,q) = \begin{pmatrix} p_1 \\ p_2 \\ 0 \end{pmatrix} \times \begin{pmatrix} 0 \\ 0 \\ q_1 p_2 - q_2 p_1 \end{pmatrix} - \frac{1}{\sqrt{q_1^2 + q_2^2}} \begin{pmatrix} q_1 \\ q_2 \\ 0 \end{pmatrix} $$

is a first integral of the Kepler problem, i.e., $A(p(t), q(t)) = Const$ along
solutions of the problem. However, it is not a first integral of the perturbed
Kepler problem of Exercise 12.

10. Add a column to Table 2.1 which shows the long-time behaviour of the error in
the Runge-Lenz-Pauli vector (see Exercise 9) for the various numerical
integrators.

11. For the Kepler problem, eliminate $(p_1, p_2)$ from the relations $H(p,q) = Const$,
$L(p,q) = Const$ and $A(p,q) = Const$. This gives a quadratic relation for
$(q_1, q_2)$ and proves that the solution lies on an ellipse, a parabola, or on a
hyperbola.

12. Study numerically the solution of the perturbed Kepler problem with Hamiltonian

$$ H(p_1, p_2, q_1, q_2) = \frac12 \bigl(p_1^2 + p_2^2\bigr) - \frac{1}{\sqrt{q_1^2 + q_2^2}} - \frac{\mu}{3\sqrt{(q_1^2 + q_2^2)^3}}, $$

where $\mu$ is a positive or negative small number. Among others, this problem describes
the motion of a planet in the Schwarzschild potential for Einstein’s general relativity
theory⁷. You will observe a precession of the perihelion, which, applied to the orbit
of Mercury, represented the historically first verification of Einstein’s theory (see
e.g., Birkhoff 1923, p. 261-264).

The precession can also be expressed analytically: the equation for $u = 1/r$ as
a function of $\varphi$, corresponding to (2.8), here becomes

$$ \frac{d^2 u}{d\varphi^2} + u = \frac1d + \mu u^2, \tag{6.1} $$

where $d = L_0^2$. Now compute the derivative of this solution with respect to $\mu$,
at $\mu = 0$ and $u = (1 + e\cos(\varphi - \varphi^*))/d$ after one period $t = 2\pi$. This leads to
$\eta = \mu (e/d^2) \cdot 2\pi \sin\varphi$ (see the small picture). Then, for small $\mu$, the precession
after one period is

$$ \Delta\varphi = \frac{2\pi\mu}{d}. \tag{6.2} $$

7 We are grateful to Prof. Ruth Durrer for helpful hints about this subject.



Chapter II. 

Numerical Integrators 


After having seen in Chap. I some simple numerical methods and a variety of
numerical phenomena that they exhibited, we now present more elaborate classes of
numerical methods. We start with Runge-Kutta and collocation methods, and we
introduce discontinuous collocation methods, which cover essentially all high-order
implicit Runge-Kutta methods of interest. We then treat partitioned Runge-Kutta
methods and Nyström methods, which can be applied to partitioned problems such
as Hamiltonian systems. Finally we present composition and splitting methods.


II.1 Runge-Kutta and Collocation Methods




Fig. 1.1. Carl David Tolmé Runge (left picture), born: 30 August 1856 in Bremen (Germany),
died: 3 January 1927 in Göttingen (Germany).

Wilhelm Martin Kutta (right picture), born: 3 November 1867 in Pitschen, Upper Silesia (now
Byczyna, Poland), died: 25 December 1944 in Fürstenfeldbruck (Germany)


Runge-Kutta methods form an important class of methods for the integration of
differential equations. A special subclass, the collocation methods, allows for a
particularly elegant access to order, symplecticity and continuous output.

II.1.1 Runge-Kutta Methods

In this section, we treat non-autonomous systems of first-order ordinary differential
equations

$$ \dot y = f(t, y), \qquad y(t_0) = y_0. \tag{1.1} $$

The integration of this equation gives $y(t_1) = y_0 + \int_{t_0}^{t_1} f(t, y(t))\, dt$, and replacing
the integral by the trapezoidal rule, we obtain

$$ y_1 = y_0 + \frac h2 \bigl( f(t_0, y_0) + f(t_1, y_1) \bigr). \tag{1.2} $$

This is the implicit trapezoidal rule, which, in addition to its historical importance
for computations in partial differential equations (Crank-Nicolson) and in
A-stability theory (Dahlquist), played a crucial role even earlier in the discovery of
Runge-Kutta methods. It was the starting point of Runge (1895), who “predicted”
the unknown $y_1$-value to the right by an Euler step, and obtained the first of the
following formulas (the second being the analogous formula for the midpoint rule)

$$ \begin{array}{ll} k_1 = f(t_0, y_0) & \qquad k_1 = f(t_0, y_0) \\ k_2 = f(t_0 + h,\; y_0 + h k_1) & \qquad k_2 = f\bigl(t_0 + \frac h2,\; y_0 + \frac h2 k_1\bigr) \\ y_1 = y_0 + \frac h2 (k_1 + k_2) & \qquad y_1 = y_0 + h k_2. \end{array} \tag{1.3} $$

These methods have a nice geometric interpretation (which is illustrated in the first
two pictures of Fig. 1.2 for a famous problem, the Riccati equation): they consist
of polygonal lines, which assume the slopes prescribed by the differential equation
evaluated at previous points.
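The two formulas of Runge can be sketched directly in Python. The example below
uses the Riccati equation $\dot y = t^2 + y^2$ with $y_0 = 0.46$ and $h = 1$ from
Fig. 1.2; the starting time $t_0 = 0$ is an assumption, since the caption does not
state it, and the function names are hypothetical.

```python
def riccati(t, y):
    """Right-hand side of the Riccati equation y' = t^2 + y^2."""
    return t * t + y * y


def runge_trapezoidal(f, t0, y0, h):
    """First formula of (1.3): Euler predictor inserted into the trapezoidal rule."""
    k1 = f(t0, y0)
    k2 = f(t0 + h, y0 + h * k1)
    return y0 + 0.5 * h * (k1 + k2)


def runge_midpoint(f, t0, y0, h):
    """Second formula of (1.3): Euler half step inserted into the midpoint rule."""
    k1 = f(t0, y0)
    k2 = f(t0 + 0.5 * h, y0 + 0.5 * h * k1)
    return y0 + h * k2


y1_trapezoidal = runge_trapezoidal(riccati, 0.0, 0.46, 1.0)
y1_midpoint = runge_midpoint(riccati, 0.0, 0.46, 1.0)
```

Both functions need only two evaluations of f per step and no solution of a
nonlinear equation, in contrast to the implicit rule (1.2).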

Idea of Heun (1900) and Kutta (1901): compute several polygonal lines, each starting
at $y_0$ and assuming the various slopes $k_j$ on portions of the integration interval,
which are proportional to some given constants $a_{ij}$; at the final point of each
polygon evaluate a new slope $k_i$. The last of these polygons, with constants $b_i$,
determines the numerical solution $y_1$ (see the third picture of Fig. 1.2). This idea leads to
the class of explicit Runge-Kutta methods, i.e., formula (1.4) below with $a_{ij} = 0$
for $i \le j$.

Fig. 1.2. Runge-Kutta methods for $\dot y = t^2 + y^2$, $y_0 = 0.46$, $h = 1$; dotted: exact solution

Much more important for our purpose are implicit Runge-Kutta methods, introduced
mainly in the work of Butcher (1963).

Definition 1.1. Let $b_i, a_{ij}$ ($i, j = 1, \ldots, s$) be real numbers and let $c_i = \sum_{j=1}^{s} a_{ij}$.
An s-stage Runge-Kutta method is given by

$$ \begin{aligned} k_i &= f\Bigl(t_0 + c_i h,\; y_0 + h \sum_{j=1}^{s} a_{ij} k_j\Bigr), \qquad i = 1, \ldots, s, \\ y_1 &= y_0 + h \sum_{i=1}^{s} b_i k_i. \end{aligned} \tag{1.4} $$

Here we allow a full matrix $(a_{ij})$ of non-zero coefficients. In this case, the slopes
$k_i$ can no longer be computed explicitly, and do not even necessarily exist. For
example, for the problem set-up of Fig. 1.2 the implicit trapezoidal rule has no
solution. However, the implicit function theorem assures that, for sufficiently small h,
the nonlinear system (1.4) for the values $k_1, \ldots, k_s$ has a locally unique solution
close to $k_i \approx f(t_0, y_0)$.
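The non-existence claim for the implicit trapezoidal rule can be verified by hand.
As a worked check, assume that the set-up of Fig. 1.2 starts at $t_0 = 0$ (the
caption only gives $f(t,y) = t^2 + y^2$, $y_0 = 0.46$ and $h = 1$, so $t_0 = 0$ is an
assumption). Then (1.2) is a quadratic equation for $y_1$ whose discriminant is
negative, so no real solution exists:

```latex
\begin{aligned}
y_1 &= 0.46 + \tfrac12 \bigl( 0.46^2 + 1 + y_1^2 \bigr)
\quad\Longleftrightarrow\quad
\tfrac12\, y_1^2 - y_1 + 1.0658 = 0, \\
\Delta &= 1 - 4 \cdot \tfrac12 \cdot 1.0658 = -1.1316 < 0 .
\end{aligned}
```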

Since Butcher’s work, the coefficients are usually displayed as follows:

$$ \begin{array}{c|ccc} c_1 & a_{11} & \cdots & a_{1s} \\ \vdots & \vdots & & \vdots \\ c_s & a_{s1} & \cdots & a_{ss} \\ \hline & b_1 & \cdots & b_s \end{array} $$

Definition 1.2. A Runge-Kutta method (or a general one-step method) has order p,
if for all sufficiently regular problems (1.1) the local error $y_1 - y(t_0 + h)$ satisfies

$$ y_1 - y(t_0 + h) = O(h^{p+1}) \quad \text{as } h \to 0. $$


To check the order of a Runge-Kutta method, one has to compute the Taylor
series expansions of $y(t_0 + h)$ and $y_1$ around $h = 0$. This leads to the following
algebraic conditions for the coefficients for orders 1, 2, and 3:

$$ \begin{array}{ll} \sum_i b_i = 1 & \text{for order 1;} \\ \text{in addition } \sum_i b_i c_i = 1/2 & \text{for order 2;} \\ \text{in addition } \sum_i b_i c_i^2 = 1/3 & \\ \text{and } \sum_{i,j} b_i a_{ij} c_j = 1/6 & \text{for order 3.} \end{array} \tag{1.6} $$

For higher orders, however, this problem represented a great challenge in the first
half of the 20th century. We shall present an elegant theory in Sect. III.1 which
allows order conditions to be derived.
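The conditions (1.6) are easy to check mechanically with exact rational
arithmetic. The Python sketch below is an illustration with hypothetical names;
it returns the largest order up to 3 whose conditions are satisfied. The
classical fourth-order tableau of Kutta used in the last example is not
displayed in the text above and is included here as an assumption.

```python
from fractions import Fraction as F


def order_up_to_3(a, b):
    """Largest p <= 3 such that the conditions (1.6) hold for orders 1, ..., p."""
    s = len(b)
    c = [sum(a[i]) for i in range(s)]  # c_i = sum_j a_ij
    conditions = [
        [sum(b) == 1],
        [sum(b[i] * c[i] for i in range(s)) == F(1, 2)],
        [sum(b[i] * c[i] ** 2 for i in range(s)) == F(1, 3),
         sum(b[i] * a[i][j] * c[j] for i in range(s) for j in range(s)) == F(1, 6)],
    ]
    order = 0
    for group in conditions:
        if not all(group):
            break
        order += 1
    return order


explicit_euler = ([[F(0)]], [F(1)])
implicit_euler = ([[F(1)]], [F(1)])
implicit_midpoint = ([[F(1, 2)]], [F(1)])
runge_midpoint = ([[F(0), F(0)], [F(1, 2), F(0)]], [F(0), F(1)])
classical_rk4 = ([[F(0), F(0), F(0), F(0)],
                  [F(1, 2), F(0), F(0), F(0)],
                  [F(0), F(1, 2), F(0), F(0)],
                  [F(0), F(0), F(1), F(0)]],
                 [F(1, 6), F(1, 3), F(1, 3), F(1, 6)])
```

With these tableaux, `order_up_to_3(*explicit_euler)` gives 1,
`order_up_to_3(*runge_midpoint)` gives 2, and `order_up_to_3(*classical_rk4)`
gives 3, the largest order this function tests.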

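Among the methods seen up to now, the explicit and implicit Euler methods

$$ \begin{array}{c|c} 0 & \\ \hline & 1 \end{array} \qquad\qquad \begin{array}{c|c} 1 & 1 \\ \hline & 1 \end{array} \tag{1.7} $$

are of order 1, the implicit trapezoidal and midpoint rules as well as both methods
of Runge

$$ \begin{array}{c|cc} 0 & & \\ 1 & 1/2 & 1/2 \\ \hline & 1/2 & 1/2 \end{array} \qquad\qquad \begin{array}{c|c} 1/2 & 1/2 \\ \hline & 1 \end{array} $$

are of order 2. The most successful methods during more than half a century were
the 4th order methods of Kutta: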





II.1.2 Collocation Methods 

The high speed computing machines make it possible to enjoy the advantages
of intricate methods. (P.C. Hammer & J.W. Hollingsworth 1955)


Collocation methods for ordinary differential equations have their origin, once
again, in the implicit trapezoidal rule (1.2): Hammer & Hollingsworth (1955)
discovered that this method can be interpreted as being generated by a quadratic
function “which agrees in direction with that indicated by the differential equation
at two points” $t_0$ and $t_1$ (see the picture to the right). This idea allows one to “see
much-used methods in a new light” and allows various generalizations (Guillou &
Soulé (1969), Wright (1970)). An interesting feature of collocation methods is that
we not only get a discrete set of approximations, but also a continuous approximation
to the solution.

Definition 1.3. Let $c_1, \ldots, c_s$ be distinct real numbers (usually $0 \le c_i \le 1$). The
collocation polynomial $u(t)$ is a polynomial of degree s satisfying

$$ \begin{aligned} u(t_0) &= y_0 \\ \dot u(t_0 + c_i h) &= f\bigl(t_0 + c_i h,\, u(t_0 + c_i h)\bigr), \qquad i = 1, \ldots, s, \end{aligned} \tag{1.9} $$

and the numerical solution of the collocation method is defined by $y_1 = u(t_0 + h)$.

For s = 1, the polynomial has to be of the form $u(t) = y_0 + (t - t_0) k$ with

$$ k = f(t_0 + c_1 h,\; y_0 + h c_1 k). $$

We see that the explicit and implicit Euler methods and the midpoint rule are
collocation methods with $c_1 = 0$, $c_1 = 1$ and $c_1 = 1/2$, respectively.

Fig. 1.3. Collocation solutions for the Lotka-Volterra problem (I.1.1); $u_0 = 0.2$, $v_0 = 3.3$;
methods of order 2: four steps with h = 0.4; method of order 4: two steps with h = 0.8;
dotted: exact solution

For s = 2 and $c_1 = 0$, $c_2 = 1$ we find, of course, the implicit trapezoidal rule. The
choice of Hammer & Hollingsworth for the collocation points is
$c_{1,2} = 1/2 \pm \sqrt3/6$, the Gaussian quadrature nodes (see the picture to the right). We
will see that the corresponding method is of order 4.

In Fig. 1.3 we illustrate the collocation idea with these methods for the
Lotka-Volterra problem (I.1.1). One can observe that, in spite of the extremely large
step sizes, the methods are quite satisfactory.

Theorem 1.4 (Guillou & Soule 1969, Wright 1970). The collocation method of 
Definition 1.3 is equivalent to the s-stage Runge-Kutta method (1.4) with coeffi¬ 
cients 

a ij = / £j(r)dr, bi = 

Jo 

where £{(r) is the Lagrange polynomial £i(r) = n z/ i(r - ci)/(ci - ci). 

Proof. Let u(t) be the collocation polynomial and define 


/ 


£i(r) dr , 


( 1 . 10 ) 



h := u(t 0 + Cih). 

By the Lagrange interpolation formula we have u(to + rh) = kj * and 

by integration we get 


u(t 0 + Cih ) = 2/q + h ^ kj 
j= i 


£j( r ) dr. 


Inserted into (1.9) this gives the first formula of the Runge-Kutta equation (1.4).
Integration from 0 to 1 yields the second one. □

The above proof can also be read in reverse order. This shows that a Runge-Kutta
method with coefficients given by (1.10) can be interpreted as a collocation
method. Since $\tau^{k-1} = \sum_{j=1}^{s} c_j^{k-1}\,\ell_j(\tau)$ for $k = 1,\ldots,s$,
the relations (1.10) are equivalent to the linear systems

$$
\begin{aligned}
C(q):\quad & \sum_{j=1}^{s} a_{ij}\, c_j^{k-1} = \frac{c_i^{k}}{k}, && k = 1,\ldots,q, \ \text{all } i \\
B(p):\quad & \sum_{i=1}^{s} b_i\, c_i^{k-1} = \frac{1}{k}, && k = 1,\ldots,p,
\end{aligned}
\tag{1.11}
$$

with $q = s$ and $p = s$. What is the order of a Runge-Kutta method whose
coefficients $b_i$, $a_{ij}$ are determined in this way?
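The linear systems (1.11) also give a direct way to compute the coefficients (1.10)
numerically. The following is a minimal sketch in Python with NumPy; the function name
`collocation_coefficients` is an illustrative choice and not part of the text. It solves
$C(s)$ and $B(s)$ for given distinct nodes and is applied to the nodes $c = (0, 1)$ and
$c_{1,2} = 1/2 \pm \sqrt{3}/6$ discussed above.

```python
import numpy as np

def collocation_coefficients(c):
    """Coefficients (A, b) of the collocation method with distinct nodes c.

    Solves the Vandermonde systems C(s) and B(s) of (1.11), which are
    equivalent to the integrals (1.10).
    """
    c = np.asarray(c, dtype=float)
    s = len(c)
    k = np.arange(1, s + 1)
    V = c[None, :] ** (k[:, None] - 1)   # V[k-1, j] = c_j^(k-1)
    R = c[:, None] ** k[None, :] / k     # R[i, k-1] = c_i^k / k
    A = np.linalg.solve(V, R.T).T        # C(s):  A V^T = R
    b = np.linalg.solve(V, 1.0 / k)      # B(s):  V b = (1/k)_k
    return A, b

# Implicit trapezoidal rule (c = 0, 1) and the method of Hammer & Hollingsworth
A_trap, b_trap = collocation_coefficients([0.0, 1.0])
r = np.sqrt(3.0) / 6.0
A_gauss, b_gauss = collocation_coefficients([0.5 - r, 0.5 + r])
print(A_trap, b_trap)
print(A_gauss, b_gauss)
```

Solving the Vandermonde systems avoids symbolic integration of the Lagrange polynomials;
for many stages the systems become ill-conditioned, so this sketch is only meant for small $s$.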

Compared to the enormous difficulties that the first explorers had in constructing 
Runge-Kutta methods of orders 5 and 6, and also compared to the difficult algebraic 
proofs of the first papers of Butcher, the following general theorem and its proof, 
discovered in this form by Guillou & Soule (1969), are surprisingly simple. 

Theorem 1.5 (Superconvergence). If the condition $B(p)$ holds for some $p \ge s$,
then the collocation method (Definition 1.3) has order $p$. This means that the
collocation method has the same order as the underlying quadrature formula.

Proof. We consider the collocation polynomial $u(t)$ as the solution of a perturbed
differential equation

$$
\dot u = f(t, u) + \delta(t) \tag{1.12}
$$

with defect $\delta(t) := \dot u(t) - f(t, u(t))$. Subtracting (1.1) from (1.12) we get
after linearization that

$$
\dot u(t) - \dot y(t) = \frac{\partial f}{\partial y}\bigl(t, y(t)\bigr)\bigl(u(t) - y(t)\bigr) + \delta(t) + r(t), \tag{1.13}
$$

where, for $t_0 \le t \le t_0 + h$, the remainder $r(t)$ is of size
$\mathcal{O}(\|u(t) - y(t)\|^2) = \mathcal{O}(h^{2s+2})$ by Lemma 1.6 below. The variation of
constants formula (see e.g., Hairer, Nørsett & Wanner (1993), p. 66) then yields

$$
y_1 - y(t_0 + h) = u(t_0 + h) - y(t_0 + h) = \int_{t_0}^{t_0+h} R(t_0 + h, s)\bigl(\delta(s) + r(s)\bigr)\,ds, \tag{1.14}
$$

where $R(t, s)$ is the resolvent of the homogeneous part of the differential
equation (1.13), i.e., the solution of the matrix differential equation
$\partial R(t,s)/\partial t = A(t)R(t,s)$, $R(s,s) = I$, with $A(t) = \partial f/\partial y\,(t, y(t))$.
The integral over $R(t_0 + h, s)\,r(s)$ gives a $\mathcal{O}(h^{2s+3})$ contribution. The main
idea now is to apply the quadrature formula $(b_i, c_i)_{i=1}^{s}$ to the integral over
$g(s) = R(t_0 + h, s)\,\delta(s)$; because the defect $\delta(s)$ vanishes at the collocation
points $t_0 + c_i h$ for $i = 1,\ldots,s$, this gives zero as the numerical result. Thus,
the integral is equal to the quadrature error, which is bounded by $h^{p+1}$ times a
bound of the $p$th derivative of the function $g(s)$. This derivative is bounded
independently of $h$, because by Lemma 1.6 all derivatives of the collocation
polynomial are bounded uniformly as $h \to 0$. Since, anyway, $p \le 2s$, we get
$y_1 - y(t_0 + h) = \mathcal{O}(h^{p+1})$ from (1.14). □

Lemma 1.6. The collocation polynomial $u(t)$ is an approximation of order $s$ to the
exact solution of (1.1) on the whole interval, i.e.,

$$
\|u(t) - y(t)\| \le C \cdot h^{s+1} \qquad \text{for } t \in [t_0, t_0 + h] \tag{1.15}
$$

and for sufficiently small $h$.

Moreover, the derivatives of $u(t)$ satisfy for $t \in [t_0, t_0 + h]$

$$
\|u^{(k)}(t) - y^{(k)}(t)\| \le C \cdot h^{s+1-k} \qquad \text{for } k = 0,\ldots,s.
$$

Proof. The collocation polynomial satisfies

$$
\dot u(t_0 + \tau h) = \sum_{i=1}^{s} f\bigl(t_0 + c_i h,\, u(t_0 + c_i h)\bigr)\,\ell_i(\tau),
$$

while the exact solution of (1.1) satisfies

$$
\dot y(t_0 + \tau h) = \sum_{i=1}^{s} f\bigl(t_0 + c_i h,\, y(t_0 + c_i h)\bigr)\,\ell_i(\tau) + h^{s} E(\tau, h),
$$

where the interpolation error $E(\tau, h)$ is bounded by
$\max_{t \in [t_0, t_0+h]} \|y^{(s+1)}(t)\| / s!$ and its derivatives satisfy

$$
\|E^{(k-1)}(\tau, h)\| \le \max_{t \in [t_0, t_0+h]} \frac{\|y^{(s+1)}(t)\|}{(s - k + 1)!} .
$$

This follows from the fact that, by Rolle's theorem, the differentiated polynomial
$\sum_{i=1}^{s} f\bigl(t_0 + c_i h, y(t_0 + c_i h)\bigr)\,\ell_i^{(k-1)}(\tau)$ can be interpreted as the
interpolation polynomial of $h^{k-1} y^{(k)}(t_0 + \tau h)$ at $s - k + 1$ points lying in
$[t_0, t_0 + h]$. Integrating the difference of the above two equations gives

$$
y(t_0 + \tau h) - u(t_0 + \tau h) = h \sum_{i=1}^{s} \Delta f_i \int_0^{\tau} \ell_i(\sigma)\,d\sigma + h^{s+1} \int_0^{\tau} E(\sigma, h)\,d\sigma \tag{1.16}
$$

with $\Delta f_i = f\bigl(t_0 + c_i h, y(t_0 + c_i h)\bigr) - f\bigl(t_0 + c_i h, u(t_0 + c_i h)\bigr)$. Using a
Lipschitz condition for $f(t, y)$, this relation yields

$$
\max_{t \in [t_0, t_0+h]} \|y(t) - u(t)\| \le h\,C\,L \max_{t \in [t_0, t_0+h]} \|y(t) - u(t)\| + \mathrm{Const} \cdot h^{s+1},
$$

implying the statement (1.15) for sufficiently small $h > 0$.

The proof of the second statement follows from

$$
h^{k}\bigl(y^{(k)}(t_0 + \tau h) - u^{(k)}(t_0 + \tau h)\bigr) = h \sum_{i=1}^{s} \Delta f_i\, \ell_i^{(k-1)}(\tau) + h^{s+1} E^{(k-1)}(\tau, h)
$$

by using a Lipschitz condition for $f(t, y)$ and the estimate (1.15). □

II.1.3 Gauss and Lobatto Collocation

Gauss Methods. If we take $c_1, \ldots, c_s$ as the zeros of the $s$th shifted Legendre
polynomial

$$
\frac{d^{s}}{dx^{s}}\Bigl(x^{s}(x-1)^{s}\Bigr),
$$

the interpolatory quadrature formula has order $p = 2s$, and by Theorem 1.5, the
Runge-Kutta (or collocation) method based on these nodes has the same order $2s$.
For $s = 1$ we obtain the implicit midpoint rule. The Runge-Kutta coefficients for
$s = 2$ (the method of Hammer & Hollingsworth 1955) and $s = 3$ are given in
Table 1.1. The proof of the order properties for general $s$ was a sensational result of
Butcher (1964a). At that time these methods were considered, at least by the editors
of Math. of Comput., to be purely academic without any practical value; 5 years
later their A-stability was discovered, 12 years later their B-stability, and 25 years
later their symplecticity. Thus, of all the papers in issue No. 85 of Math. of Comput.,
the one most important to us is the one for which publication was the most difficult.

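Table 1.1. Gauss methods of order 4 and 6

$$
\begin{array}{c|cc}
\frac12 - \frac{\sqrt3}{6} & \frac14 & \frac14 - \frac{\sqrt3}{6} \\[2pt]
\frac12 + \frac{\sqrt3}{6} & \frac14 + \frac{\sqrt3}{6} & \frac14 \\[2pt]
\hline
 & \frac12 & \frac12
\end{array}
\qquad\qquad
\begin{array}{c|ccc}
\frac12 - \frac{\sqrt{15}}{10} & \frac{5}{36} & \frac29 - \frac{\sqrt{15}}{15} & \frac{5}{36} - \frac{\sqrt{15}}{30} \\[2pt]
\frac12 & \frac{5}{36} + \frac{\sqrt{15}}{24} & \frac29 & \frac{5}{36} - \frac{\sqrt{15}}{24} \\[2pt]
\frac12 + \frac{\sqrt{15}}{10} & \frac{5}{36} + \frac{\sqrt{15}}{30} & \frac29 + \frac{\sqrt{15}}{15} & \frac{5}{36} \\[2pt]
\hline
 & \frac{5}{18} & \frac49 & \frac{5}{18}
\end{array}
$$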
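The statement that the Gauss nodes give quadrature order $p = 2s$ can be checked
numerically through the condition $B(p)$ of (1.11). The sketch below is an illustration
under stated assumptions: it takes the Gauss-Legendre nodes from NumPy's `leggauss`
(defined on $[-1, 1]$) and shifts them to $[0, 1]$; the helper names are hypothetical.

```python
import numpy as np

def gauss_nodes_weights(s):
    """Gauss nodes and weights on [0, 1] (zeros of the shifted Legendre polynomial)."""
    x, w = np.polynomial.legendre.leggauss(s)   # nodes and weights on [-1, 1]
    return (x + 1.0) / 2.0, w / 2.0

def quadrature_order(b, c, kmax=30, tol=1e-12):
    """Largest p for which B(p) holds: sum_i b_i c_i^(k-1) = 1/k for k = 1..p."""
    p = 0
    for k in range(1, kmax + 1):
        if abs(np.dot(b, c ** (k - 1)) - 1.0 / k) > tol:
            break
        p = k
    return p

for s in (1, 2, 3, 4):
    c, b = gauss_nodes_weights(s)
    print(s, quadrature_order(b, c))   # the order found should equal 2s
```

By Theorem 1.5 the order detected for the quadrature formula is also the order of the
corresponding collocation method.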


Radau Methods. Radau quadrature formulas have the highest possible order,
$2s - 1$, among quadrature formulas with either $c_1 = 0$ or $c_s = 1$. The corresponding
collocation methods for $c_s = 1$ are called Radau IIA methods. They play an
important role in the integration of stiff differential equations (see Hairer & Wanner
(1996), Sect. IV.8). However, they lack both symmetry and symplecticity, properties
that will be the subjects of later chapters in this book.

Lobatto IIIA Methods. Lobatto quadrature formulas have the highest possible order
with $c_1 = 0$ and $c_s = 1$. Under these conditions, the nodes must be the zeros of

$$
\frac{d^{s-2}}{dx^{s-2}}\Bigl(x^{s-1}(x-1)^{s-1}\Bigr) \tag{1.17}
$$

and the quadrature order is $p = 2s - 2$. The corresponding collocation methods are
called, for historical reasons, Lobatto IIIA methods. For $s = 2$ we have the implicit
trapezoidal rule. The coefficients for $s = 3$ and $s = 4$ are given in Table 1.2.

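Table 1.2. Lobatto IIIA methods of order 4 and 6

$$
\begin{array}{c|ccc}
0 & 0 & 0 & 0 \\[2pt]
\frac12 & \frac{5}{24} & \frac13 & -\frac{1}{24} \\[2pt]
1 & \frac16 & \frac23 & \frac16 \\[2pt]
\hline
 & \frac16 & \frac23 & \frac16
\end{array}
\qquad\qquad
\begin{array}{c|cccc}
0 & 0 & 0 & 0 & 0 \\[2pt]
\frac{5-\sqrt5}{10} & \frac{11+\sqrt5}{120} & \frac{25-\sqrt5}{120} & \frac{25-13\sqrt5}{120} & \frac{-1+\sqrt5}{120} \\[2pt]
\frac{5+\sqrt5}{10} & \frac{11-\sqrt5}{120} & \frac{25+13\sqrt5}{120} & \frac{25+\sqrt5}{120} & \frac{-1-\sqrt5}{120} \\[2pt]
1 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12} \\[2pt]
\hline
 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12}
\end{array}
$$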
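The nodes and weights of Table 1.2 can be recovered from the characterization (1.17).
The following sketch assumes NumPy's `Polynomial` class for the differentiation and
root finding; the function name is illustrative. The weights are obtained from the
condition $B(s)$ of (1.11).

```python
import numpy as np
from numpy.polynomial import Polynomial

def lobatto_nodes_weights(s):
    """Lobatto nodes (zeros of (1.17)) and weights (from B(s)); assumes s >= 2."""
    x = Polynomial([0.0, 1.0])
    p = (x * (x - 1.0)) ** (s - 1)              # x^(s-1) (x-1)^(s-1)
    c = np.sort(p.deriv(s - 2).roots().real)    # s real zeros, including 0 and 1
    k = np.arange(1, s + 1)
    V = c[None, :] ** (k[:, None] - 1)          # V[k-1, i] = c_i^(k-1)
    b = np.linalg.solve(V, 1.0 / k)
    return c, b

for s in (3, 4):
    c, b = lobatto_nodes_weights(s)
    print(s, c, b)
```

Combined with the collocation coefficients (1.10) on these nodes, this yields the full
tableaux of the Lobatto IIIA methods.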


II.1.4 Discontinuous Collocation Methods 

Collocation methods allow, as we have seen above, a very elegant proof of their 
order properties. By similar ideas, they also admit strikingly simple proofs for their 
A- and B-stability as well as for symplecticity, our subject in Chap. VI. However,
not all method classes are of collocation type. It is therefore interesting to define a 
modification of the collocation idea, which allows us to extend all the above proofs 
to much wider classes of methods. This definition will also lead, later, to important 
classes of partitioned methods. 

Definition 1.7. Let $c_2, \ldots, c_{s-1}$ be distinct real numbers (usually $0 \le c_i \le 1$),
and let $b_1$, $b_s$ be two arbitrary real numbers. The corresponding discontinuous
collocation method is then defined via a polynomial of degree $s - 2$ satisfying

$$
\begin{aligned}
u(t_0) &= y_0 - h b_1 \bigl(\dot u(t_0) - f(t_0, u(t_0))\bigr) \\
\dot u(t_0 + c_i h) &= f\bigl(t_0 + c_i h,\, u(t_0 + c_i h)\bigr), \qquad i = 2,\ldots,s-1, \\
y_1 &= u(t_1) - h b_s \bigl(\dot u(t_1) - f(t_1, u(t_1))\bigr).
\end{aligned}
\tag{1.18}
$$

The figure gives a geometric interpretation of the correction term in the first and
third formulas of (1.18). The motivation for this definition will become clear in the
proof of Theorem 1.9 below. Our first result shows that discontinuous collocation
methods are equivalent to implicit Runge-Kutta methods.

Theorem 1.8. The discontinuous collocation method of Definition 1.7 is equivalent
to an s-stage Runge-Kutta method (1.4) with coefficients determined by $c_1 = 0$,
$c_s = 1$, and

$$
a_{i1} = b_1, \quad a_{is} = 0 \quad \text{for } i = 1,\ldots,s, \qquad C(s-2) \ \text{and} \ B(s-2),
$$

with the conditions $C(q)$ and $B(p)$ of (1.11).

Proof. As in the proof of Theorem 1.4 we put $k_i := \dot u(t_0 + c_i h)$ (this time for
$i = 2,\ldots,s-1$), so that $\dot u(t_0 + \tau h) = \sum_{j=2}^{s-1} k_j\,\ell_j(\tau)$ by the Lagrange
interpolation formula. Here, $\ell_j(\tau)$ corresponds to $c_2, \ldots, c_{s-1}$ and is a polynomial
of degree $s - 3$. By integration and using the definition of $u(t_0)$ we get

$$
\begin{aligned}
u(t_0 + c_i h) &= u(t_0) + h \sum_{j=2}^{s-1} k_j \int_0^{c_i} \ell_j(\tau)\,d\tau \\
&= y_0 + h b_1 k_1 + h \sum_{j=2}^{s-1} k_j \Bigl( \int_0^{c_i} \ell_j(\tau)\,d\tau - b_1 \ell_j(0) \Bigr)
\end{aligned}
$$

with $k_1 = f(t_0, u(t_0))$. Inserted into (1.18) this gives the first formula of the
Runge-Kutta equation (1.4) with $a_{ij} = \int_0^{c_i} \ell_j(\tau)\,d\tau - b_1 \ell_j(0)$. As for collocation
methods, one checks that the $a_{ij}$ are uniquely determined by the condition $C(s-2)$.
The formula for $y_1$ is obtained similarly. □
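The characterization in Theorem 1.8 translates directly into a small linear-algebra
computation. The sketch below (Python with NumPy, function name hypothetical) fixes
$a_{i1} = b_1$, $a_{is} = 0$, and solves $C(s-2)$ and $B(s-2)$ for the remaining coefficients.
As an example it uses $c_2 = 1/2$ and $b_1 = b_3 = 1/6$, which are the values of the
$s = 3$ Lobatto IIIB method listed in Table 1.4.

```python
import numpy as np

def discontinuous_collocation(c_inner, b1, bs):
    """Runge-Kutta coefficients (c, A, b) characterized in Theorem 1.8.

    c_inner holds c_2, ..., c_{s-1}; b1 and bs are the two free weights.
    """
    c_inner = np.asarray(c_inner, dtype=float)
    c = np.concatenate(([0.0], c_inner, [1.0]))
    s, m = len(c), len(c_inner)
    k = np.arange(1, m + 1)
    first = (k == 1).astype(float)               # value of c_1^(k-1) = 0^(k-1)
    V = c_inner[None, :] ** (k[:, None] - 1)     # Vandermonde matrix of the inner nodes
    # B(s-2): b_1 0^(k-1) + sum_inner b_j c_j^(k-1) + b_s = 1/k
    b_inner = np.linalg.solve(V, 1.0 / k - b1 * first - bs)
    # C(s-2) with a_i1 = b_1 and a_is = 0
    R = c[:, None] ** k[None, :] / k - b1 * first
    A = np.zeros((s, s))
    A[:, 0] = b1
    A[:, 1:s - 1] = np.linalg.solve(V, R.T).T
    b = np.concatenate(([b1], b_inner, [bs]))
    return c, A, b

c, A, b = discontinuous_collocation([0.5], 1 / 6, 1 / 6)
print(A)
print(b)
```

Setting `b1 = bs = 0` in the same routine returns the ordinary collocation method on the
inner nodes, padded with two redundant stages.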


Table 1.3. Survey of discontinuous collocation methods

type                           characteristics                                   prominent examples
$b_1 = 0$, $b_s = 0$           $(s-2)$-stage collocation                         Gauss, Radau IIA, Lobatto IIIA
$b_1 = 0$, $b_s \ne 0$         $(s-1)$-stage with $a_{is} = 0$                   methods of Butcher (1964b)
$b_1 \ne 0$, $b_s = 0$         $(s-1)$-stage with $a_{i1} = b_1$                 Radau IA, Lobatto IIIC
$b_1 \ne 0$, $b_s \ne 0$       $s$-stage with $a_{i1} = b_1$, $a_{is} = 0$       Lobatto IIIB


If $b_1 = 0$ in Definition 1.7, the entire first column in the Runge-Kutta tableau
vanishes, so that the first stage can be removed, which leads to an equivalent method
with $s - 1$ stages. Similarly, if $b_s = 0$, we can remove the last stage. Therefore, we
have all classes of methods, which are "continuous" either to the left, or to the right,
or on both sides, as special cases in our definition.

In the case where $b_1 = b_s = 0$, the discontinuous collocation method (1.18) is
equivalent to the $(s-2)$-stage collocation method based on $c_2, \ldots, c_{s-1}$ (see
Table 1.3). The methods with $b_s = 0$ but $b_1 \ne 0$, which include the Radau IA and
Lobatto IIIC methods, are of interest for the solution of stiff differential equations
(Hairer & Wanner 1996). The methods with $b_1 = 0$ but $b_s \ne 0$, introduced by
Butcher (1964a, 1964b), are of historical interest. They were thought to be
computationally attractive, because their last stage is explicit. In the context of geometric
integration, much more important are methods for which both $b_1 \ne 0$ and $b_s \ne 0$.

Table 1.4. Lobatto IIIB methods of order 4 and 6

$$
\begin{array}{c|ccc}
0 & \frac16 & -\frac16 & 0 \\[2pt]
\frac12 & \frac16 & \frac13 & 0 \\[2pt]
1 & \frac16 & \frac56 & 0 \\[2pt]
\hline
 & \frac16 & \frac23 & \frac16
\end{array}
\qquad\qquad
\begin{array}{c|cccc}
0 & \frac{1}{12} & \frac{-1-\sqrt5}{24} & \frac{-1+\sqrt5}{24} & 0 \\[2pt]
\frac{5-\sqrt5}{10} & \frac{1}{12} & \frac{25+\sqrt5}{120} & \frac{25-13\sqrt5}{120} & 0 \\[2pt]
\frac{5+\sqrt5}{10} & \frac{1}{12} & \frac{25+13\sqrt5}{120} & \frac{25-\sqrt5}{120} & 0 \\[2pt]
1 & \frac{1}{12} & \frac{11-\sqrt5}{24} & \frac{11+\sqrt5}{24} & 0 \\[2pt]
\hline
 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12}
\end{array}
$$

Lobatto IIIB Methods (Table 1.4). We consider the quadrature formulas whose
nodes are the zeros of (1.17). We have $c_1 = 0$ and $c_s = 1$. Based on $c_2, \ldots, c_{s-1}$
and $b_1$, $b_s$ we consider the discontinuous collocation method. This class of methods
is called Lobatto IIIB (Ehle 1969), and it plays an important role in geometric
integration in conjunction with the Lobatto IIIA methods of Sect. II.1.3 (see
Theorem IV.2.3 and Theorem VI.4.5). These methods are of order $2s - 2$, as the following
result shows.

Theorem 1.9 (Superconvergence). The discontinuous collocation method of
Definition 1.7 has the same order as the underlying quadrature formula.

Proof. We follow the lines of the proof of Theorem 1.5. With the polynomial $u(t)$
of Definition 1.7, and with the defect

$$
\delta(t) := \dot u(t) - f(t, u(t))
$$

we get (1.13) after linearization. The variation of constants formula then yields

$$
u(t_0 + h) - y(t_0 + h) = R(t_0 + h, t_0)\bigl(u(t_0) - y_0\bigr) + \int_{t_0}^{t_0+h} R(t_0 + h, s)\bigl(\delta(s) + r(s)\bigr)\,ds,
$$

which corresponds to (1.14) if $u(t_0) = y_0$. As a consequence of Lemma 1.10 below
(with $k = 0$), the integral over $R(t_0 + h, s)\,r(s)$ gives a $\mathcal{O}(h^{2s-1})$ contribution.
Since the defect $\delta(t_0 + c_i h)$ vanishes only for $i = 2,\ldots,s-1$, an application of the
quadrature formula to $R(t_0 + h, s)\,\delta(s)$ yields
$h b_1 R(t_0 + h, t_0)\,\delta(t_0) + h b_s\,\delta(t_0 + h)$ in addition to the quadrature error,
which is $\mathcal{O}(h^{p+1})$. Collecting terms suitably, we obtain

$$
u(t_1) - h b_s \delta(t_1) - y(t_1) = R(t_1, t_0)\bigl(u(t_0) + h b_1 \delta(t_0) - y_0\bigr) + \mathcal{O}(h^{p+1}) + \mathcal{O}(h^{2s-1}),
$$

which, after using the definitions of $u(t_0)$ and $u(t_1)$, proves
$y_1 - y(t_1) = \mathcal{O}(h^{p+1}) + \mathcal{O}(h^{2s-1})$. □

Lemma 1.10. The polynomial $u(t)$ of the discontinuous collocation method (1.18)
satisfies for $t \in [t_0, t_0 + h]$ and for sufficiently small $h$

$$
\|u^{(k)}(t) - y^{(k)}(t)\| \le C \cdot h^{s-1-k} \qquad \text{for } k = 0,\ldots,s-2.
$$

Proof. The proof is essentially the same as that for Lemma 1.6. In the formulas for
$\dot u(t_0 + \tau h)$ and $\dot y(t_0 + \tau h)$, the sum has to be taken from $i = 2$ to $i = s - 1$.
Moreover, all $h^{s}$ become $h^{s-2}$. In (1.16) one has an additional term

$$
y_0 - u(t_0) = h b_1 \bigl(\dot u(t_0) - f(t_0, u(t_0))\bigr),
$$

which, however, is just an interpolation error of size $\mathcal{O}(h^{s-1})$ and can be included
in $\mathrm{Const} \cdot h^{s-1}$. □


II.2 Partitioned Runge-Kutta Methods 

Some interesting numerical methods introduced in Chap. I (symplectic Euler and
the Störmer-Verlet method) do not belong to the class of Runge-Kutta methods.
They are important examples of so-called partitioned Runge-Kutta methods. In this
section we consider differential equations in the partitioned form

$$
\dot y = f(y, z), \qquad \dot z = g(y, z), \tag{2.1}
$$

where $y$ and $z$ may be vectors of different dimensions.

II.2.1 Definition and First Examples 

The idea is to take two different Runge-Kutta methods, and to treat the $y$-variables
with the first method $(a_{ij}, b_i)$, and the $z$-variables with the second method
$(\hat a_{ij}, \hat b_i)$.

Definition 2.1. Let $b_i$, $a_{ij}$ and $\hat b_i$, $\hat a_{ij}$ be the coefficients of two Runge-Kutta
methods. A partitioned Runge-Kutta method for the solution of (2.1) is given by

$$
\begin{aligned}
k_i &= f\Bigl(y_0 + h \sum_{j=1}^{s} a_{ij} k_j,\; z_0 + h \sum_{j=1}^{s} \hat a_{ij} \ell_j\Bigr), \\
\ell_i &= g\Bigl(y_0 + h \sum_{j=1}^{s} a_{ij} k_j,\; z_0 + h \sum_{j=1}^{s} \hat a_{ij} \ell_j\Bigr), \\
y_1 &= y_0 + h \sum_{i=1}^{s} b_i k_i, \qquad z_1 = z_0 + h \sum_{i=1}^{s} \hat b_i \ell_i .
\end{aligned}
\tag{2.2}
$$

Methods of this type were originally proposed by Hofer in 1976 and by Griepentrog
in 1978 for problems with stiff and nonstiff parts (see Hairer, Nørsett & Wanner
(1993), Sect. II.15). Their importance for Hamiltonian systems (see the examples of
Chap. I) has been discovered only in the last decade.

An interesting example is the symplectic Euler method (I.1.9), where the implicit
Euler method $b_1 = 1$, $a_{11} = 1$ is combined with the explicit Euler method
$\hat b_1 = 1$, $\hat a_{11} = 0$. The Störmer-Verlet method (I.1.17) is of the form (2.2) with
coefficients given in Table 2.1.


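Table 2.1. Störmer-Verlet as a partitioned Runge-Kutta method

$$
\begin{array}{c|cc}
0 & 0 & 0 \\
1 & 1/2 & 1/2 \\
\hline
 & 1/2 & 1/2
\end{array}
\qquad\qquad
\begin{array}{c|cc}
1/2 & 1/2 & 0 \\
1/2 & 1/2 & 0 \\
\hline
 & 1/2 & 1/2
\end{array}
$$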
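To see that the pair of Table 2.1 really reproduces the Störmer-Verlet scheme, one can
implement (2.2) literally and compare. The sketch below is restricted to scalar $y$ and $z$
and solves the stage equations by plain fixed-point iteration, which is an assumption of
this illustration (adequate for small $h$) and not a method prescribed by the text. The test
problem, the pendulum $\dot y = z$, $\dot z = -\sin y$, is likewise chosen only for illustration.

```python
import numpy as np

def prk_step(f, g, y0, z0, h, A, b, Ah, bh, sweeps=50):
    """One step of the partitioned Runge-Kutta method (2.2) for scalar y, z."""
    s = len(b)
    k = np.zeros(s)
    l = np.zeros(s)
    for _ in range(sweeps):            # fixed-point iteration on the stages
        Y = y0 + h * A @ k             # y_0 + h sum_j a_ij k_j
        Z = z0 + h * Ah @ l            # z_0 + h sum_j ahat_ij l_j
        k = np.array([f(Y[i], Z[i]) for i in range(s)])
        l = np.array([g(Y[i], Z[i]) for i in range(s)])
    return y0 + h * b @ k, z0 + h * bh @ l

def verlet_step(force, q0, v0, h):
    """Stormer-Verlet step in its usual explicit three-line form."""
    v_half = v0 + 0.5 * h * force(q0)
    q1 = q0 + h * v_half
    return q1, v_half + 0.5 * h * force(q1)

# Coefficients of Table 2.1
A = np.array([[0.0, 0.0], [0.5, 0.5]]);  b = np.array([0.5, 0.5])
Ah = np.array([[0.5, 0.0], [0.5, 0.0]]); bh = np.array([0.5, 0.5])

y1, z1 = prk_step(lambda y, z: z, lambda y, z: -np.sin(y), 1.0, 0.5, 0.1, A, b, Ah, bh)
q1, v1 = verlet_step(lambda q: -np.sin(q), 1.0, 0.5, 0.1)
print(y1 - q1, z1 - v1)   # both differences should be at rounding level
```

For this particular pair the stage equations are in fact explicit, so the iteration becomes
stationary after a few sweeps.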


The theory of Runge-Kutta methods can be extended in a straightforward manner
to partitioned methods. Since (2.2) is a one-step method $(y_1, z_1) = \Phi_h(y_0, z_0)$,
the Definition 1.2 of the order applies directly. Considering problems $\dot y = f(y)$,
$\dot z = g(z)$ without any coupling terms, we see that the order of (2.2) cannot exceed
$\min(p, \hat p)$, where $p$ and $\hat p$ are the orders of the two methods.

Conditions for Order Two. Expanding the exact solution of (2.1) and the numerical
solution (2.2) into Taylor series, we see that the method is of order 2 if the
coupling conditions

$$
\sum_{ij} b_i \hat a_{ij} = 1/2, \qquad \sum_{ij} \hat b_i a_{ij} = 1/2 \tag{2.3}
$$

are satisfied in addition to the usual Runge-Kutta order conditions for order 2. The
method of Table 2.1 satisfies these conditions, and it is therefore of order 2. We also
remark that (2.3) is automatically satisfied by partitioned methods that are based on
the same quadrature nodes, i.e.,

$$
c_i = \hat c_i \qquad \text{for all } i \tag{2.4}
$$

where, as usual, $c_i = \sum_j a_{ij}$ and $\hat c_i = \sum_j \hat a_{ij}$.

Conditions for Order Three. The conditions for order three already become quite
complicated, unless (2.4) is satisfied. In this case, we obtain the additional conditions

$$
\sum_{ij} b_i \hat a_{ij} c_j = 1/6, \qquad \sum_{ij} \hat b_i a_{ij} c_j = 1/6. \tag{2.5}
$$

The order conditions for higher order will be discussed in Sect. III.2.2. It turns out
that the number of coupling conditions increases very fast with order, and the proofs
for high order are often very cumbersome. There is, however, a very elegant proof of
the order for the partitioned method which is the most important one in connection
with "geometric integration", as we shall see now.

II.2.2 Lobatto IIIA-IIIB Pairs

These methods generalize the Störmer-Verlet method to arbitrary order. Indeed, the
left method of Table 2.1 is the trapezoidal rule, which is the Lobatto IIIA method
with $s = 2$, and the method to the right is equivalent to the midpoint rule and, apart
from the values of the $c_i$, is the Lobatto IIIB method with $s = 2$. Sun (1993b) and
Jay (1996) discovered that for general $s$ the combination of the Lobatto IIIA and
IIIB methods is suitable for Hamiltonian systems. The coefficients of the methods
for $s = 3$ are given in Table 2.2. Using the idea of discontinuous collocation, we
give a direct proof of the order for this pair of methods.


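Table 2.2. Coefficients of the 3-stage Lobatto IIIA-IIIB pair

$$
\begin{array}{c|ccc}
0 & 0 & 0 & 0 \\
1/2 & 5/24 & 1/3 & -1/24 \\
1 & 1/6 & 2/3 & 1/6 \\
\hline
 & 1/6 & 2/3 & 1/6
\end{array}
\qquad\qquad
\begin{array}{c|ccc}
0 & 1/6 & -1/6 & 0 \\
1/2 & 1/6 & 1/3 & 0 \\
1 & 1/6 & 5/6 & 0 \\
\hline
 & 1/6 & 2/3 & 1/6
\end{array}
$$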
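The coefficients of Table 2.2 can be checked against the coupling conditions (2.3) and,
since both methods share the Lobatto nodes as required by (2.4), against the additional
conditions (2.5). The following sketch only evaluates these sums; the dictionary keys are
labels chosen here for readability.

```python
import numpy as np

A = np.array([[0, 0, 0], [5/24, 1/3, -1/24], [1/6, 2/3, 1/6]])   # Lobatto IIIA
Ah = np.array([[1/6, -1/6, 0], [1/6, 1/3, 0], [1/6, 5/6, 0]])    # Lobatto IIIB
b = np.array([1/6, 2/3, 1/6])
bh = b.copy()

c = A.sum(axis=1)      # c_i    = sum_j a_ij
ch = Ah.sum(axis=1)    # chat_i = sum_j ahat_ij

conditions = {
    "(2.4) c_i = chat_i":             np.allclose(c, ch),
    "(2.3) sum b_i ahat_ij = 1/2":    np.isclose(b @ Ah.sum(axis=1), 1/2),
    "(2.3) sum bhat_i a_ij = 1/2":    np.isclose(bh @ A.sum(axis=1), 1/2),
    "(2.5) sum b_i ahat_ij c_j = 1/6": np.isclose(b @ Ah @ c, 1/6),
    "(2.5) sum bhat_i a_ij c_j = 1/6": np.isclose(bh @ A @ c, 1/6),
}
for name, ok in conditions.items():
    print(name, ok)
```

These conditions only confirm order three for the coupling terms; the full order
$2s - 2 = 4$ is the content of the theorem that follows.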


Theorem 2.2. The partitioned Runge-Kutta method composed of the s-stage
Lobatto IIIA and the s-stage Lobatto IIIB method, is of order $2s - 2$.

Proof. Let $c_1 = 0, c_2, \ldots, c_{s-1}, c_s = 1$ and $b_1, \ldots, b_s$ be the nodes and weights of
the Lobatto quadrature. The partitioned Runge-Kutta method based on the Lobatto
IIIA-IIIB pair can be interpreted as the discontinuous collocation method

$$
\begin{aligned}
u(t_0) &= y_0 \\
v(t_0) &= z_0 - h b_1 \bigl(\dot v(t_0) - g(u(t_0), v(t_0))\bigr) \\
\dot u(t_0 + c_i h) &= f\bigl(u(t_0 + c_i h), v(t_0 + c_i h)\bigr), && i = 1,\ldots,s \\
\dot v(t_0 + c_i h) &= g\bigl(u(t_0 + c_i h), v(t_0 + c_i h)\bigr), && i = 2,\ldots,s-1 \\
y_1 &= u(t_1) \\
z_1 &= v(t_1) - h b_s \bigl(\dot v(t_1) - g(u(t_1), v(t_1))\bigr),
\end{aligned}
\tag{2.6}
$$

where $u(t)$ and $v(t)$ are polynomials of degree $s$ and $s - 2$, respectively. This is seen
as in the proofs of Theorem 1.4 and Theorem 1.8. The superconvergence (order
$2s - 2$) is obtained with exactly the same proof as for Theorem 1.9, where the
functions $u(t)$ and $y(t)$ have to be replaced with $(u(t), v(t))^T$ and $(y(t), z(t))^T$,
etc. Instead of Lemma 1.10 we use the estimates (for $t \in [t_0, t_0 + h]$)

$$
\begin{aligned}
\|u^{(k)}(t) - y^{(k)}(t)\| &\le c \cdot h^{s-k} && \text{for } k = 0,\ldots,s, \\
\|v^{(k)}(t) - z^{(k)}(t)\| &\le c \cdot h^{s-1-k} && \text{for } k = 0,\ldots,s-2,
\end{aligned}
$$

which can be proved by following the lines of the proofs of Lemma 1.6 and
Lemma 1.10. □

II.2.3 Nyström Methods

Since the direct application of Runge's method to the important case of
second-order differential equations has not been treated so far ...
(E.J. Nyström 1925)

Second-order differential equations

$$
\ddot y = g(t, y, \dot y) \tag{2.7}
$$

form an important class of problems. Most of the differential equations in Chap. I
are of this form (e.g., the Kepler problem, the outer solar system, problems in
molecular dynamics). This is mainly due to Newton's law that forces are proportional
to second derivatives (acceleration). Introducing a new variable $z = \dot y$ for the first
derivative, the problem (2.7) becomes equivalent to the partitioned system

$$
\dot y = z, \qquad \dot z = g(t, y, z). \tag{2.8}
$$

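A partitioned Runge-Kutta method (2.2) applied to this system yields

$$
\begin{aligned}
k_i &= z_0 + h \sum_{j=1}^{s} \hat a_{ij} \ell_j, \\
\ell_i &= g\Bigl(t_0 + c_i h,\; y_0 + h \sum_{j=1}^{s} a_{ij} k_j,\; z_0 + h \sum_{j=1}^{s} \hat a_{ij} \ell_j\Bigr), \\
y_1 &= y_0 + h \sum_{i=1}^{s} b_i k_i, \qquad z_1 = z_0 + h \sum_{i=1}^{s} \hat b_i \ell_i .
\end{aligned}
\tag{2.9}
$$

If we insert the formula for $k_i$ into the others, we obtain Definition 2.3 with

$$
\bar a_{ij} = \sum_{k=1}^{s} a_{ik} \hat a_{kj}, \qquad \bar b_i = \sum_{k=1}^{s} b_k \hat a_{ki}. \tag{2.10}
$$

Definition 2.3. Let $c_i$, $\bar b_i$, $\bar a_{ij}$ and $\hat b_i$, $\hat a_{ij}$ be real coefficients. A Nyström
method for the solution of (2.7) is given by

$$
\begin{aligned}
\ell_i &= g\Bigl(t_0 + c_i h,\; y_0 + c_i h \dot y_0 + h^2 \sum_{j=1}^{s} \bar a_{ij} \ell_j,\; \dot y_0 + h \sum_{j=1}^{s} \hat a_{ij} \ell_j\Bigr), \\
y_1 &= y_0 + h \dot y_0 + h^2 \sum_{i=1}^{s} \bar b_i \ell_i, \qquad \dot y_1 = \dot y_0 + h \sum_{i=1}^{s} \hat b_i \ell_i .
\end{aligned}
\tag{2.11}
$$

For the important special case $\ddot y = g(t, y)$, where the vector field does not depend
on the velocity, the coefficients $\hat a_{ij}$ need not be specified. A Nyström method is
of order $p$ if $y_1 - y(t_0 + h) = \mathcal{O}(h^{p+1})$ and $\dot y_1 - \dot y(t_0 + h) = \mathcal{O}(h^{p+1})$. It is not
sufficient to consider $y_1$ alone. The order conditions will be discussed in Sect. III.2.3.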
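The reduction (2.10) is a pair of matrix products and can be carried out mechanically.
As an illustration we apply it to the pair of Table 2.1 (coefficients restated in the code)
and take one step of the resulting Nyström method (2.11) for a problem of the form
$\ddot y = g(t, y)$. The function name and the test equation $\ddot y = -y$ are choices made for
this sketch.

```python
import numpy as np

A = np.array([[0.0, 0.0], [0.5, 0.5]])    # method applied to y (trapezoidal rule)
b = np.array([0.5, 0.5])
Ah = np.array([[0.5, 0.0], [0.5, 0.0]])   # method applied to z
bh = np.array([0.5, 0.5])

c = A.sum(axis=1)
A_bar = A @ Ah        # abar_ij = sum_k a_ik ahat_kj
b_bar = b @ Ah        # bbar_i  = sum_k b_k ahat_ki

def nystrom_step(g, t0, y0, v0, h):
    """One step of (2.11) for y'' = g(t, y).

    The loop is explicit because A_bar is strictly lower triangular here.
    """
    l = np.zeros(len(c))
    for i in range(len(c)):
        l[i] = g(t0 + c[i] * h, y0 + c[i] * h * v0 + h**2 * A_bar[i] @ l)
    return y0 + h * v0 + h**2 * b_bar @ l, v0 + h * bh @ l

print(c, A_bar, b_bar)
y1, v1 = nystrom_step(lambda t, y: -y, 0.0, 1.0, 0.0, 0.1)
print(y1, v1)
```

Only the combined coefficients $\bar a_{ij}$, $\bar b_i$ and the weights $\hat b_i$ enter the step, in line
with the remark that $\hat a_{ij}$ need not be specified when $g$ does not depend on the velocity.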

Notice that the Störmer-Verlet scheme (I.1.17) is a Nyström method for problems
of the form $\ddot y = g(t, y)$. We have $s = 2$, and the coefficients are $c_1 = 0$, $c_2 = 1$,
$\bar a_{11} = \bar a_{12} = \bar a_{22} = 0$, $\bar a_{21} = 1/2$, $\bar b_1 = 1/2$, $\bar b_2 = 0$, and
$\hat b_1 = \hat b_2 = 1/2$. With $q_{n+1/2} = q_n + \frac{h}{2} v_{n+1/2}$ the step
$(q_{n-1/2}, v_{n-1/2}) \mapsto (q_{n+1/2}, v_{n+1/2})$ of (I.1.17) becomes a one-stage Nyström
method with $c_1 = 1/2$, $\bar a_{11} = 0$, $\bar b_1 = 1/2$, $\hat b_1 = 1$ (the value $\bar b_1 = 1/2$
follows from $q_{n+1/2} = q_{n-1/2} + h v_{n-1/2} + \frac{h^2}{2} g(q_n)$).

II.3 The Adjoint of a Method

We shall see in Chap. V that symmetric numerical methods have many important
properties. The key for understanding symmetry is the concept of the adjoint
method.

The flow $\varphi_t$ of an autonomous differential equation

$$
\dot y = f(y), \qquad y(t_0) = y_0 \tag{3.1}
$$

satisfies $\varphi_{-t}^{-1} = \varphi_t$. This property is not, in general, shared by the one-step map
$\Phi_h$ of a numerical method. An illustration is presented in the upper picture of
Fig. 3.1 (a), where we see that the one-step map $\Phi_h$ for the explicit Euler method
is different from the inverse of $\Phi_{-h}$, which is the implicit Euler method.

Definition 3.1. The adjoint method $\Phi_h^*$ of a method $\Phi_h$ is the inverse map of the
original method with reversed time step $-h$, i.e.,

$$
\Phi_h^* := \Phi_{-h}^{-1} \tag{3.2}
$$

(see Fig. 3.1 (b)). In other words, $y_1 = \Phi_h^*(y_0)$ is implicitly defined by
$\Phi_{-h}(y_1) = y_0$. A method for which $\Phi_h^* = \Phi_h$ is called symmetric.



The consideration of adjoint methods evolved independently from the study of
symmetric integrators (Stetter (1973), p. 125, Wanner (1973)) and from the aim of
constructing and analyzing stiff integrators from explicit ones (Cash (1975) calls
them "the backward version" which were the first example of mono-implicit methods
and Scherer (1977) calls them "reflected methods").

The adjoint method satisfies the usual properties such as $(\Phi_h^*)^* = \Phi_h$ and
$(\Phi_h \circ \Psi_h)^* = \Psi_h^* \circ \Phi_h^*$ for any two one-step methods $\Phi_h$ and $\Psi_h$. The implicit
Euler method is the adjoint of the explicit Euler method. The implicit midpoint rule is
symmetric (see the lower picture of Fig. 3.1 (a)), and the trapezoidal rule and the
Störmer-Verlet method are also symmetric.
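Definition 3.1 can be turned into a generic construction: given any one-step map, its
adjoint is obtained by solving $\Phi_{-h}(y_1) = y_0$ for $y_1$. The sketch below does this by
fixed-point iteration, which is an assumption of the illustration (it requires $h$ times the
Lipschitz constant of $f$ to be small); the scalar test equation $\dot y = -y^2$ is also chosen
only for illustration.

```python
def explicit_euler(f, h, y):
    return y + h * f(y)

def adjoint(method, sweeps=200):
    """Return the adjoint of `method`, i.e. y1 defined by method(f, -h, y1) = y0."""
    def step(f, h, y0):
        y1 = y0
        for _ in range(sweeps):
            # correct y1 by the residual of  method(f, -h, y1) = y0
            y1 = y1 + (y0 - method(f, -h, y1))
        return y1
    return step

implicit_euler = adjoint(explicit_euler)

f = lambda y: -y * y
h, y0 = 0.1, 1.0
y1 = implicit_euler(f, h, y0)
print(y1, y0 + h * f(y1))                       # y1 satisfies the implicit Euler relation
print(adjoint(implicit_euler)(f, h, y0),        # adjoint of the adjoint ...
      explicit_euler(f, h, y0))                 # ... is the explicit Euler method again
```

For the explicit Euler method the residual correction reduces to $y_1 \leftarrow y_0 + h f(y_1)$,
the usual fixed-point iteration for the implicit Euler method.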

The following theorem shows that the adjoint method has the same order as the 
original method, and, with a possible sign change, also the same leading error term. 

Theorem 3.2. Let $\varphi_t$ be the exact flow of (3.1) and let $\Phi_h$ be a one-step method of
order $p$ satisfying

$$
\Phi_h(y_0) = \varphi_h(y_0) + C(y_0)\,h^{p+1} + \mathcal{O}(h^{p+2}). \tag{3.3}
$$

The adjoint method $\Phi_h^*$ then has the same order $p$ and we have

$$
\Phi_h^*(y_0) = \varphi_h(y_0) + (-1)^p\, C(y_0)\,h^{p+1} + \mathcal{O}(h^{p+2}). \tag{3.4}
$$

If the method is symmetric, its (maximal) order is even.

Proof. The idea of the proof is exhibited in drawing (c) of Fig. 3.1. From a given
initial value $y_0$ we compute $\varphi_h(y_0)$ and $y_1 = \Phi_h^*(y_0)$, whose difference $e^*$ is the
local error of $\Phi_h^*$. This error is then "projected back" by $\Phi_{-h}$ to become $e$. We see
that $-e$ is the local error of $\Phi_{-h}$, i.e., by hypothesis (3.3),

$$
e = (-1)^p\, C(\varphi_h(y_0))\,h^{p+1} + \mathcal{O}(h^{p+2}). \tag{3.5}
$$

Since $\varphi_h(y_0) = y_0 + \mathcal{O}(h)$ and $e = (I + \mathcal{O}(h))\,e^*$, it follows that

$$
e^* = (-1)^p\, C(y_0)\,h^{p+1} + \mathcal{O}(h^{p+2})
$$

which proves (3.4). The statement for symmetric methods is an immediate consequence
of this result, because $\Phi_h = \Phi_h^*$ implies $C(y_0) = (-1)^p C(y_0)$, and therefore
$C(y_0)$ can be different from zero only for even $p$. □


II.4 Composition Methods 


The idea of composing methods has some tradition in several variants: composition
of different Runge-Kutta methods with the same step size leading to the Butcher
group, which is treated in Sect. III.1.3; cyclic composition of multistep methods for
breaking the "Dahlquist barrier" (see Stetter (1973), p. 216); composition of low
order Runge-Kutta methods for increasing stability for stiff problems (Gentzsch &
Schlüter (1978), Iserles (1984)). In the following, we consider the composition of a
given basic one-step method (and, possibly, its adjoint method) with different step
sizes. The aim is to increase the order while preserving some desirable properties
of the basic method. This idea has mainly been developed in the papers of Suzuki
(1990), Yoshida (1990), and McLachlan (1995).

Let $\Phi_h$ be a basic method and $\gamma_1, \ldots, \gamma_s$ real numbers. Then we call its
composition with step sizes $\gamma_1 h, \gamma_2 h, \ldots, \gamma_s h$, i.e.,

$$
\Psi_h = \Phi_{\gamma_s h} \circ \ldots \circ \Phi_{\gamma_1 h}, \tag{4.1}
$$

the corresponding composition method (see Fig. 4.1 (a)).

Theorem 4.1. Let $\Phi_h$ be a one-step method of order $p$. If

$$
\begin{aligned}
\gamma_1 + \ldots + \gamma_s &= 1 \\
\gamma_1^{p+1} + \ldots + \gamma_s^{p+1} &= 0,
\end{aligned}
\tag{4.2}
$$

then the composition method (4.1) is at least of order $p + 1$.

Fig. 4.1. Composition of method $\Phi_h$ with three step sizes

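Proof. The proof is presented in Fig. 4.1 (b) for $s = 3$. It is very similar to the proof
of Theorem 3.2. By hypothesis

$$
\begin{aligned}
e_1 &= C(y_0) \cdot \gamma_1^{p+1} h^{p+1} + \mathcal{O}(h^{p+2}) \\
e_2 &= C(y_1) \cdot \gamma_2^{p+1} h^{p+1} + \mathcal{O}(h^{p+2}) \\
e_3 &= C(y_2) \cdot \gamma_3^{p+1} h^{p+1} + \mathcal{O}(h^{p+2}).
\end{aligned}
\tag{4.3}
$$

We have, as before, $y_i = y_0 + \mathcal{O}(h)$ and $E_i = (I + \mathcal{O}(h))\,e_i$ for all $i$ and obtain, for
$\sum \gamma_i = 1$,

$$
\Psi_h(y_0) - \varphi_h(y_0) = E_1 + E_2 + E_3 = C(y_0)\bigl(\gamma_1^{p+1} + \gamma_2^{p+1} + \gamma_3^{p+1}\bigr) h^{p+1} + \mathcal{O}(h^{p+2})
$$

which shows that under conditions (4.2) the $\mathcal{O}(h^{p+1})$-term vanishes. □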

Example 4.2 (The Triple Jump). Equations (4.2) have no real solution for odd $p$.
Therefore, the order increase is only possible for even $p$. In this case, the smallest
$s$ which allows a solution is $s = 3$. We then have some freedom for solving the
two equations. If we impose symmetry $\gamma_1 = \gamma_3$, then we obtain (Creutz & Gocksch
1989, Forest 1989, Suzuki 1990, Yoshida 1990)

$$
\gamma_1 = \gamma_3 = \frac{1}{2 - 2^{1/(p+1)}}, \qquad \gamma_2 = -\frac{2^{1/(p+1)}}{2 - 2^{1/(p+1)}}. \tag{4.4}
$$

This procedure can be repeated: we start with a symmetric method of order 2, apply
(4.4) with $p = 2$ to obtain order 3; due to the symmetry of the $\gamma$'s this new method
is in fact of order 4 (see Theorem 3.2). With this new method we repeat (4.4) with
$p = 4$ and obtain a symmetric 9-stage composition method of order 6, then with
$p = 6$ a 27-stage symmetric composition method of order 8, and so on. One obtains
in this way any order, however, at the price of a terrible zig-zag of the step points
(see Fig. 4.2).
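The coefficients (4.4) and their repeated application are easy to tabulate. The short sketch
below (function names are illustrative) computes the Triple Jump factors, checks the two
conditions (4.2), and builds the nested step-size sequences of the 9- and 27-stage methods.

```python
def triple_jump(p):
    """Step-size factors (4.4) for a basic method of even order p."""
    r = 2.0 ** (1.0 / (p + 1))
    return [1.0 / (2.0 - r), -r / (2.0 - r), 1.0 / (2.0 - r)]

def iterate(factors, orders):
    """Nested composition: orders = [2, 4] gives the 9-stage method of order 6."""
    seq = [1.0]
    for p in orders:
        seq = [g * x for g in factors(p) for x in seq]
    return seq

g = triple_jump(2)
print(g, sum(g), sum(x ** 3 for x in g))   # conditions (4.2) for p = 2
seq9 = iterate(triple_jump, [2, 4])
seq27 = iterate(triple_jump, [2, 4, 6])
print(len(seq9), len(seq27), min(seq27), max(seq27))
```

The largest and smallest entries of the nested sequences quantify the zig-zag of the step
points: the substeps overshoot the interval $[0, h]$ in both directions.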



Fig. 4.2. The Triple Jump of order 4 and its iterates of orders 6 and 8

Example 4.3 (Suzuki's Fractals). If one desires methods with smaller values of
$\gamma_i$, one has to increase $s$ even more. For example, for $s = 5$ the best solution of
(4.2) has the sign structure $+ + - + +$ with $\gamma_1 = \gamma_2$ (see Exercise 7). This leads to
(Suzuki 1990)

$$
\gamma_1 = \gamma_2 = \gamma_4 = \gamma_5 = \frac{1}{4 - 4^{1/(p+1)}}, \qquad \gamma_3 = -\frac{4^{1/(p+1)}}{4 - 4^{1/(p+1)}}. \tag{4.5}
$$

The repetition of this algorithm for $p = 2, 4, 6, \ldots$ leads to a fractal structure of the
step points (see Fig. 4.3).





Fig. 4.3. Suzuki’s “fractal” composition methods 


Composition with the Adjoint Method. If we replace the composition (4.1) by the
more general formula

$$
\Psi_h = \Phi_{\alpha_s h} \circ \Phi^*_{\beta_s h} \circ \ldots \circ \Phi^*_{\beta_2 h} \circ \Phi_{\alpha_1 h} \circ \Phi^*_{\beta_1 h}, \tag{4.6}
$$

the condition for order $p + 1$ becomes, by using the result (3.4) and a similar proof
as above,

$$
\begin{aligned}
\beta_1 + \alpha_1 + \beta_2 + \ldots + \beta_s + \alpha_s &= 1 \\
(-1)^p \beta_1^{p+1} + \alpha_1^{p+1} + (-1)^p \beta_2^{p+1} + \ldots + (-1)^p \beta_s^{p+1} + \alpha_s^{p+1} &= 0.
\end{aligned}
\tag{4.7}
$$

This allows an order increase for odd $p$ as well. In particular, we see at once the
solution $\alpha_1 = \beta_1 = 1/2$ for $p = s = 1$, which turns every consistent one-step
method of order 1 into a second-order symmetric method

$$
\Psi_h = \Phi_{h/2} \circ \Phi^*_{h/2}. \tag{4.8}
$$

Example 4.4. If $\Phi_h$ is the explicit (resp. implicit) Euler method, then $\Psi_h$ in (4.8)
becomes the implicit midpoint (resp. trapezoidal) rule.
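Example 4.4 can be verified numerically on a scalar nonlinear problem. In the sketch
below all implicit relations are solved by fixed-point iteration, an assumption made for
brevity, and $\dot y = -y^2$ serves as an illustrative test equation (on a linear problem the
midpoint and trapezoidal rules would coincide and the check would be less telling).

```python
def explicit_euler(f, h, y):
    return y + h * f(y)

def implicit_euler(f, h, y0, sweeps=200):
    y = y0
    for _ in range(sweeps):              # solves y = y0 + h f(y)
        y = y0 + h * f(y)
    return y

def midpoint(f, h, y0, sweeps=200):
    y = y0
    for _ in range(sweeps):              # solves y = y0 + h f((y0 + y)/2)
        y = y0 + h * f(0.5 * (y0 + y))
    return y

def trapezoidal(f, h, y0, sweeps=200):
    y = y0
    for _ in range(sweeps):              # solves y = y0 + h/2 (f(y0) + f(y))
        y = y0 + 0.5 * h * (f(y0) + f(y))
    return y

f = lambda y: -y * y
h, y0 = 0.1, 1.0
# (4.8) with Phi = explicit Euler: the adjoint (implicit Euler) acts first
psi_a = explicit_euler(f, h / 2, implicit_euler(f, h / 2, y0))
# (4.8) with Phi = implicit Euler: the adjoint (explicit Euler) acts first
psi_b = implicit_euler(f, h / 2, explicit_euler(f, h / 2, y0))
print(psi_a - midpoint(f, h, y0), psi_b - trapezoidal(f, h, y0))
```

Both differences are at rounding level, while `psi_a` and `psi_b` themselves differ by a
quantity of size $\mathcal{O}(h^3)$.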

Example 4.5. In a second-order problem $\dot q = p$, $\dot p = g(q)$, if $\Phi_h$ is the
symplectic Euler method, which discretizes $q$ by the implicit Euler and $p$ by the
explicit Euler method, then the composed method $\Psi_h$ in (4.8) is the Störmer-Verlet
method (I.1.17).

A Numerical Example. To demonstrate the numerical performance of the above
methods, we choose the Kepler problem (I.2.2) with $e = 0.6$ and the initial values
from (I.2.11). As integration interval we choose $[0, 7.5]$, a bit more than one
revolution. The exact solution is obtained by carefully evaluating the integral (I.2.10),
which gives

$$
\varphi = 8.67002632314281495159108828552, \tag{4.9}
$$

with the help of which we compute $r$, $\dot\varphi$, $\dot r$ from (I.2.8) and (I.2.6). This gives

$$
\begin{aligned}
q_1 &= -0.828164402690770818204757585370 \\
q_2 &= \phantom{-}0.778898095658635447081654480796 \\
p_1 &= -0.856384715343395351524486215030 \\
p_2 &= -0.160552150799838435254419104102 .
\end{aligned}
$$

As the basic method we use the Verlet scheme and compare in Fig. 4.4 the
performances of the composition sequences of the Triple Jump (4.4) and those of Suzuki
(4.5) for a large number of different equidistant basic step sizes and for orders
$p = 4, 6, 8, 10, 12$. Each basic step is then divided into 3, 9, 27, 81, 243 respectively
5, 25, 125, 625, 3125 composition steps and the maximal final error is compared
with the total number of function evaluations in double logarithmic scales. For each
method and order, all the points lie asymptotically on a straight line with slope $-p$.
Therefore, theoretically, a higher order method will become superior when the
precision requirements become sufficiently high. But we see that for orders 10 and 12
these "break even points" are far beyond any precision of practical interest, after
some 40 or 50 digits. We also observe that the wild zig-zag of the Triple Jump (4.4)
is a more serious handicap than the enormous number of small steps of the Suzuki
sequence (4.5).
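A small part of this experiment can be reproduced in a few lines. The sketch below
compares the Verlet scheme with its Triple Jump composition of order 4 on the interval
$[0, 7.5]$, using the reference values of $q_1, q_2, p_1, p_2$ given above. The initial values are
assumed here to be $q(0) = (1 - e, 0)$, $p(0) = (0, \sqrt{(1+e)/(1-e)})$; they are not restated in
this section, but they are consistent with the energy and angular momentum of the
reference solution. Step numbers and function names are illustrative choices.

```python
import numpy as np

def verlet(q, p, h):
    """One Stormer-Verlet step for the Kepler problem q'' = -q/|q|^3."""
    p = p - 0.5 * h * q / np.linalg.norm(q) ** 3
    q = q + h * p
    p = p - 0.5 * h * q / np.linalg.norm(q) ** 3
    return q, p

def triple_jump(p):
    r = 2.0 ** (1.0 / (p + 1))
    return [1.0 / (2.0 - r), -r / (2.0 - r), 1.0 / (2.0 - r)]

def final_error(n_steps, gammas, t_end=7.5, e=0.6):
    """Maximal error at t_end of the composition with factors gammas."""
    q = np.array([1.0 - e, 0.0])                          # assumed initial values
    p = np.array([0.0, np.sqrt((1.0 + e) / (1.0 - e))])
    h = t_end / n_steps
    for _ in range(n_steps):
        for g in gammas:
            q, p = verlet(q, p, g * h)
    ref = np.array([-0.828164402690770818, 0.778898095658635447,
                    -0.856384715343395352, -0.160552150799838435])
    return np.max(np.abs(np.concatenate((q, p)) - ref))

for name, gammas in (("Verlet", [1.0]), ("Triple Jump", triple_jump(2))):
    e1, e2 = final_error(1000, gammas), final_error(2000, gammas)
    print(name, e1, e2, np.log2(e1 / e2))   # observed order from halving h
```

The last column estimates the slope discussed above; it should be close to 2 for the basic
method and close to 4 for the Triple Jump.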

For later reference we have also included, in black symbols, the results obtained 
by the two methods (V.3.11) and (V.3.13) of orders 6 and 8, respectively, which will 
be the outcome of a more elaborate order theory of Chap. III. 



Fig. 4.4. Numerical results of the Triple Jump and Suzuki step sequences (grey symbols)
compared to optimal methods (black symbols)

II.5 Splitting Methods

The splitting idea yields an approach that is completely different from Runge-Kutta
methods. One decomposes the vector field into integrable pieces and treats them
separately.

Fig. 5.1. A splitting of a vector field


We consider an arbitrary system $\dot y = f(y)$ in $\mathbb{R}^n$, and suppose that the vector
field is "split" as (see Fig. 5.1)

$$
\dot y = f^{[1]}(y) + f^{[2]}(y). \tag{5.1}
$$

If then, by chance, the exact flows $\varphi_t^{[1]}$ and $\varphi_t^{[2]}$ of the systems $\dot y = f^{[1]}(y)$ and
$\dot y = f^{[2]}(y)$ can be calculated explicitly, we can, from a given initial value $y_0$, first
solve the first system to obtain a value $y_{1/2}$, and from this value integrate the second
system to obtain $y_1$. In this way we have introduced the numerical methods

$$
\Phi_h^* = \varphi_h^{[2]} \circ \varphi_h^{[1]}, \qquad \Phi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]} \tag{5.2}
$$

where one is the adjoint of the other. These formulas are often called the Lie-Trotter
splitting (Trotter 1959). By Taylor expansion we find that
$(\varphi_h^{[1]} \circ \varphi_h^{[2]})(y_0) = \varphi_h(y_0) + O(h^2)$, so that both methods
give approximations of order 1 to the solution of (5.1). Another idea is to use a symmetric
version and put

$$\Phi_h^{[S]} = \varphi_{h/2}^{[1]} \circ \varphi_h^{[2]} \circ \varphi_{h/2}^{[1]}, \tag{5.3}$$


which is known as the Strang splitting 1 (Strang 1968), and sometimes as the
Marchuk splitting (Marchuk 1968). By breaking up in (5.3)
$\varphi_h^{[2]} = \varphi_{h/2}^{[2]} \circ \varphi_{h/2}^{[2]}$, we see that the Strang
splitting $\Phi_h^{[S]} = \Phi_{h/2} \circ \Phi_{h/2}^*$ is the composition of the Lie-Trotter
method and its adjoint with halved step sizes. The Strang splitting formula is therefore
symmetric and of order 2 (see (4.8)).

1 The article Strang (1968) deals with spatial discretizations of partial differential equations
such as $u_t = Au_x + Bu_y$. There, the functions $f^{[i]}$ typically contain differences in only
one spatial direction.

Example 5.1 (The Symplectic Euler and the Störmer-Verlet Schemes). Suppose we have a
Hamiltonian system with separable Hamiltonian $H(p, q) = T(p) + U(q)$. We consider this as
the sum of two Hamiltonians, the first one depending only on $p$, the second one only on $q$.
The corresponding Hamiltonian systems

$$\begin{aligned} \dot p &= 0 \\ \dot q &= T_p(p) \end{aligned} \qquad\text{and}\qquad \begin{aligned} \dot p &= -U_q(q) \\ \dot q &= 0 \end{aligned} \tag{5.4}$$

can be solved without problem to yield

$$\begin{aligned} p(t) &= p_0 \\ q(t) &= q_0 + t\,T_p(p_0) \end{aligned} \qquad\text{and}\qquad \begin{aligned} p(t) &= p_0 - t\,U_q(q_0) \\ q(t) &= q_0. \end{aligned} \tag{5.5}$$

Denoting the flows of these two systems by $\varphi_t^T$ and $\varphi_t^U$, we see that the
symplectic Euler method (I.1.9) is just the composition $\varphi_h^T \circ \varphi_h^U$.
Furthermore, the adjoint of the symplectic Euler method is $\varphi_h^U \circ \varphi_h^T$, and
by Example 4.5 the Verlet scheme is
$\varphi_{h/2}^U \circ \varphi_h^T \circ \varphi_{h/2}^U$, the Strang splitting (5.3).
Anticipating the results of Chap. VI, the flows $\varphi_h^T$ and $\varphi_h^U$ are both
symplectic transformations, and, since the composition of symplectic maps is again symplectic,
this gives an elegant proof of the symplecticity of the “symplectic” Euler method and the
Verlet scheme.
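The compositions of Example 5.1 can be tried out numerically. The following is a minimal sketch, assuming the pendulum with $T(p) = p^2/2$ and $U(q) = -\cos q$ as test problem; the function names (`flow_T`, `flow_U`, `observed_order`, and so on) are illustrative choices and not notation from the text.

```python
import math

def flow_T(t, p, q):
    """Exact flow of p' = 0, q' = T_p(p) for the assumed T(p) = p**2 / 2."""
    return p, q + t * p

def flow_U(t, p, q):
    """Exact flow of p' = -U_q(q), q' = 0 for the assumed U(q) = -cos(q)."""
    return p - t * math.sin(q), q

def symplectic_euler(h, p, q):
    """phi^T_h applied after phi^U_h (a Lie-Trotter splitting)."""
    return flow_T(h, *flow_U(h, p, q))

def adjoint_euler(h, p, q):
    """phi^U_h applied after phi^T_h, the adjoint of symplectic_euler."""
    return flow_U(h, *flow_T(h, p, q))

def verlet(h, p, q):
    """Strang splitting phi^U_{h/2} o phi^T_h o phi^U_{h/2}."""
    p, q = flow_U(h / 2, p, q)
    p, q = flow_T(h, p, q)
    return flow_U(h / 2, p, q)

def integrate(step, n, p, q, t_end=1.0):
    """Apply n steps of size t_end / n."""
    h = t_end / n
    for _ in range(n):
        p, q = step(h, p, q)
    return p, q

def observed_order(step, n=100, p0=0.0, q0=1.0):
    """Estimate the order from runs with n, 2n and 4n steps."""
    y1, y2, y4 = (integrate(step, k * n, p0, q0) for k in (1, 2, 4))
    return math.log2(math.dist(y1, y2) / math.dist(y2, y4))
```

Calling `observed_order(symplectic_euler)` and `observed_order(verlet)` gives estimates close to 1 and 2, respectively, and one Verlet step of size `h` agrees up to rounding with `adjoint_euler(h/2, ...)` applied after `symplectic_euler(h/2, ...)`.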


General Splitting Procedure. In a similar way to the general idea of composition
methods (4.6), we can form with arbitrary coefficients $a_1, b_1, a_2, \ldots, a_m, b_m$
(where possibly $a_1$ or $b_m$, or both, are zero)

$$\Psi_h = \varphi_{b_m h}^{[2]} \circ \varphi_{a_m h}^{[1]} \circ \varphi_{b_{m-1} h}^{[2]} \circ \ldots \circ \varphi_{a_2 h}^{[1]} \circ \varphi_{b_1 h}^{[2]} \circ \varphi_{a_1 h}^{[1]} \tag{5.6}$$

and try to increase the order of the scheme by suitably determining the free coefficients.
An early contribution to this subject is the article of Ruth (1983), where, for
the special case (5.4), a method (5.6) of order 3 with $m = 3$ is constructed. Forest
& Ruth (1990) and Candy & Rozmus (1991) extend Ruth’s technique and construct
methods of order 4. One of their methods is just (4.1) with $\gamma_1, \gamma_2, \gamma_3$ given
by (4.4) ($p = 2$) and $\Phi_h$ from (5.3). A systematic study of such methods started with the
articles of Suzuki (1990, 1992) and Yoshida (1990).

A close connection between the theories of splitting methods (5.6) and of composition
methods (4.6) was discovered by McLachlan (1995). Indeed, if we put
$\beta_1 = a_1$ and break up $\varphi_{b_1 h}^{[2]} = \varphi_{\alpha_1 h}^{[2]} \circ \varphi_{\beta_1 h}^{[2]}$
(group property of the exact flow), where $\alpha_1$ is given in (5.8), further
$\varphi_{a_2 h}^{[1]} = \varphi_{\beta_2 h}^{[1]} \circ \varphi_{\alpha_1 h}^{[1]}$ and so on
(cf. Fig. 5.2), we see, using (5.2), that $\Psi_h$ of (5.6) is identical with $\Psi_h$ of (4.6), where

$$\Phi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]} \quad\text{so that}\quad \Phi_h^* = \varphi_h^{[2]} \circ \varphi_h^{[1]}. \tag{5.7}$$

A necessary and sufficient condition for the existence of $\alpha_i$ and $\beta_i$ satisfying (5.8)
is that $\sum a_i = \sum b_i$, which is the consistency condition anyway for method (5.6).
$$\begin{aligned} a_1 &= \beta_1 \\ b_1 &= \beta_1 + \alpha_1 \\ a_2 &= \alpha_1 + \beta_2 \\ b_2 &= \beta_2 + \alpha_2 \\ a_3 &= \alpha_2 + \beta_3 \\ b_3 &= \beta_3 \end{aligned} \tag{5.8}$$

Fig. 5.2. Equivalence of splitting and composition methods
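The bookkeeping behind (5.8) can be written out as a small conversion routine. This is only a sketch: it assumes that the composition (4.6) has the form $\Psi_h = \Phi_{\alpha_s h} \circ \Phi^*_{\beta_s h} \circ \ldots \circ \Phi_{\alpha_1 h} \circ \Phi^*_{\beta_1 h}$ with $\Phi_h$ from (5.7), which is consistent with the pattern of (5.8); the function name `splitting_coefficients` is hypothetical.

```python
from fractions import Fraction

def splitting_coefficients(alpha, beta):
    """Map composition coefficients (alpha_i, beta_i), i = 1..s, to the
    splitting coefficients a_1..a_{s+1} and b_1..b_s in the pattern of (5.8).

    Assumption: Psi_h = Phi_{alpha_s h} o Phi*_{beta_s h} o ... o
    Phi_{alpha_1 h} o Phi*_{beta_1 h}, with Phi_h = phi^[1]_h o phi^[2]_h.
    """
    s = len(alpha)
    # neighbouring flows of f^[1] merge: a_1 = beta_1, a_{i+1} = alpha_i + beta_{i+1}
    a = [beta[0]] + [alpha[i - 1] + beta[i] for i in range(1, s)] + [alpha[s - 1]]
    # neighbouring flows of f^[2] merge: b_i = beta_i + alpha_i
    b = [beta[i] + alpha[i] for i in range(s)]
    return a, b

# Strang splitting written as Phi_{h/2} o Phi*_{h/2}
half = Fraction(1, 2)
a_strang, b_strang = splitting_coefficients([half], [half])
```

For the Strang splitting this returns `a = [1/2, 1/2]` and `b = [1]`, in agreement with (5.3), and in every case the two coefficient lists have the same sum.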


Combining Exact and Numerical Flows. It may happen that the differential equation
$\dot y = f(y)$ can be split according to (5.1), such that only the flow of, say,
$\dot y = f^{[1]}(y)$ can be computed exactly. If $f^{[1]}(y)$ constitutes the dominant part of
the vector field, it is natural to search for integrators that exploit this information.
The above interpretation of splitting methods as composition methods allows us to
construct such integrators. We just consider

$$\Phi_h = \varphi_h^{[1]} \circ \Phi_h^{[2]}, \qquad \Phi_h^* = \Phi_h^{[2]*} \circ \varphi_h^{[1]} \tag{5.9}$$

as the basis of the composition method (4.6). Here $\varphi_t^{[1]}$ is the exact flow of
$\dot y = f^{[1]}(y)$, and $\Phi_h^{[2]}$ is some first-order integrator applied to
$\dot y = f^{[2]}(y)$. Since $\Phi_h$ of (5.9) is consistent with (5.1), the resulting method (4.6)
has the desired high order. It is given by

$$\Psi_h = \varphi_{\alpha_s h}^{[1]} \circ \Phi_{\alpha_s h}^{[2]} \circ \Phi_{\beta_s h}^{[2]*} \circ \varphi_{(\beta_s + \alpha_{s-1}) h}^{[1]} \circ \Phi_{\alpha_{s-1} h}^{[2]} \circ \ldots \circ \Phi_{\beta_1 h}^{[2]*} \circ \varphi_{\beta_1 h}^{[1]}. \tag{5.10}$$

Notice that replacing $\varphi_t^{[2]}$ with a low-order approximation $\Phi_t^{[2]}$ in (5.6)
would not retain the high order of the composition, because $\Phi_t^{[2]}$ does not satisfy the
group property.

Splitting into More than Two Vector Fields. Consider a differential equation

$$\dot y = f^{[1]}(y) + f^{[2]}(y) + \ldots + f^{[N]}(y), \tag{5.11}$$

where we assume that the flows $\varphi_t^{[j]}$ of the individual problems
$\dot y = f^{[j]}(y)$ can be computed exactly. In this case there are many possibilities for
extending (5.6) and for writing the method as a composition of
$\varphi_{a_j h}^{[1]}, \varphi_{b_j h}^{[2]}, \varphi_{c_j h}^{[3]}, \ldots$. This makes
it difficult to find optimal compositions of high order. A simple and efficient way is
to consider the first-order method

$$\Phi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]} \circ \ldots \circ \varphi_h^{[N]}$$

together with its adjoint as the basis of the composition (4.6). Without any additional
effort this yields splitting methods for (5.11) of arbitrary high order.
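A sketch of this construction for $N = 3$ is given below. The three-way split of $\dot p = -q - q^3$, $\dot q = p$ is a hypothetical test problem, and `make_splitting` is an illustrative helper name; the only facts used are that the adjoint of a composition of exact flows is the composition in reverse order, and that $\Phi_{h/2} \circ \Phi^*_{h/2}$ is symmetric.

```python
def make_splitting(flows):
    """Given exact flows [phi1, ..., phiN], each called as phi(t, y), return
    Phi_h = phi1_h o ... o phiN_h, its adjoint Phi*_h = phiN_h o ... o phi1_h,
    and the symmetric composition Phi_{h/2} o Phi*_{h/2}."""
    def phi(h, y):
        for flow in reversed(flows):   # the rightmost flow acts first
            y = flow(h, y)
        return y

    def phi_adjoint(h, y):
        for flow in flows:             # reversed order of composition
            y = flow(h, y)
        return y

    def symmetric(h, y):
        return phi(h / 2, phi_adjoint(h / 2, y))

    return phi, phi_adjoint, symmetric

# Hypothetical split, with the state stored as y = (p, q):
flows = [
    lambda t, y: (y[0], y[1] + t * y[0]),        # q' = p
    lambda t, y: (y[0] - t * y[1], y[1]),        # p' = -q
    lambda t, y: (y[0] - t * y[1] ** 3, y[1]),   # p' = -q**3
]
phi, phi_adjoint, symmetric = make_splitting(flows)
```

Up to rounding, `phi_adjoint(h, phi(-h, y))` returns `y`, which is the defining property of the adjoint, and `symmetric(h, symmetric(-h, y))` returns `y` as well.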


II.6 Exercises 

1. Compute all collocation methods with $s = 2$ as a function of $c_1$ and $c_2$. Which
of them are of order 3, which of order 4?

2. Prove that the collocation solution plotted in the right picture of Fig. 1.3 is composed
of arcs of parabolas.

3. Let $b_1 = b_4 = 1/8$, $c_2 = 1/3$, $c_3 = 2/3$, and consider the corresponding
discontinuous collocation method. Determine its order and find the coefficients
of the equivalent Runge-Kutta method.

4. Show that each of the symplectic Euler methods in (I.1.9) is the adjoint of the
other.

5. (Additive Runge-Kutta methods). Let $b_i, a_{ij}$ and $\hat b_i, \hat a_{ij}$ be the
coefficients of two Runge-Kutta methods. An additive Runge-Kutta method for the solution
of $\dot y = f^{[1]}(y) + f^{[2]}(y)$ is given by

$$k_i = f^{[1]}\Big(y_0 + h \sum_{j=1}^{s} a_{ij} k_j\Big) + f^{[2]}\Big(y_0 + h \sum_{j=1}^{s} \hat a_{ij} k_j\Big)$$

$$y_1 = y_0 + h \sum_{i=1}^{s} b_i k_i.$$

Show that this can be interpreted as a partitioned Runge-Kutta method (2.2)
applied to

$$\dot y = f^{[1]}(y) + f^{[2]}(z), \qquad \dot z = f^{[1]}(y) + f^{[2]}(z)$$

with $y(0) = z(0) = y_0$. Notice that $y(t) = z(t)$.

6. Let $\Phi_h$ denote the Störmer-Verlet scheme, and consider the composition

$$\Phi_{\gamma_{2k+1} h} \circ \Phi_{\gamma_{2k} h} \circ \ldots \circ \Phi_{\gamma_2 h} \circ \Phi_{\gamma_1 h}$$

with $\gamma_1 = \ldots = \gamma_k = \gamma_{k+2} = \ldots = \gamma_{2k+1}$. Compute $\gamma_1$ and
$\gamma_{k+1}$ such that the composition gives a method of order 4. For several differential
equations (pendulum, Kepler problem) study the global error of a constant step size
implementation as a function of $k$.

7. Consider the composition method (4.1) with $s = 5$, $\gamma_5 = \gamma_1$, and $\gamma_4 = \gamma_2$.
Among the solutions of

$$2\gamma_1 + 2\gamma_2 + \gamma_3 = 1, \qquad 2\gamma_1^3 + 2\gamma_2^3 + \gamma_3^3 = 0$$

find the one that minimizes $|2\gamma_1^5 + 2\gamma_2^5 + \gamma_3^5|$.

Remark. This property motivates the choice of the $\gamma_i$ in (4.5).



Chapter III. 

Order Conditions, Trees and B-Series 


In this chapter we present a compact theory of the order conditions of the methods
presented in Chap. II, in particular Runge-Kutta methods, partitioned Runge-Kutta
methods, and composition methods, by using the notion of rooted trees and
B-series. These ideas lead to algebraic structures which have recently found interesting
applications in quantum field theory. The chapter terminates with the
Baker-Campbell-Hausdorff formula, which allows another access to the order properties
of composition and splitting methods.

Some parts of this chapter are rather short, but nevertheless self-contained. For
more detailed presentations we refer to the monographs of Butcher (1987), of Hairer,
Nørsett & Wanner (1993), and of Hairer & Wanner (1996). Readers mainly interested
in geometric properties of numerical integrators may continue with Chapters IV, V or VI
before returning to the technically more difficult jungle of trees.


III.1 Runge-Kutta Order Conditions and B-Series

Even the standard notation has been found to be too heavy in dealing with
fourth and higher order processes, ... (R.H. Merson 1957)

In this section we derive the order conditions of Runge-Kutta methods by comparing
the Taylor series of the exact solution of (1.1) with that of the numerical
solution. The computation is much simplified, first by considering an autonomous
system of equations (Gill 1951), and second, by the use of rooted trees (connected
graphs without cycles and a distinguished vertex; Merson 1957). The theory has
been developed by Butcher in the years 1963-72 (see Butcher (1987), Sect. 30) and
by Hairer & Wanner in 1973-74 (see Hairer, Nørsett & Wanner (1993), Sections II.2
and II.12). Here we give new simplified proofs.

III.1.1 Derivation of the Order Conditions

We consider an autonomous problem

$$\dot y = f(y), \qquad y(t_0) = y_0, \tag{1.1}$$

where $f : \mathbb{R}^n \to \mathbb{R}^n$ is sufficiently differentiable. A problem
$\dot y = f(t, y)$ can be brought into this form by appending the equation $\dot t = 1$.
We develop the subsequent theory in four steps.
Er sagte es klar und angenehm,
was erstens, zweitens und drittens käm. (W. Busch, Jobsiade 1872)

First Step. We compute the higher derivatives of the solution $y$ at the initial point
$t_0$. For this, we have from (1.1)

$$y^{(q)} = \big(f(y)\big)^{(q-1)} \tag{1.2}$$

and compute the latter derivatives by using the chain rule, the product rule, the
symmetry of partial derivatives, and the notation $f'(y)$ for the derivative as a linear
map (the Jacobian), $f''(y)$ the second derivative as a bilinear map and similarly for
higher derivatives. This gives

$$\begin{aligned}
\dot y &= f(y) \\
\ddot y &= f'(y)\,\dot y \\
y^{(3)} &= f''(y)(\dot y, \dot y) + f'(y)\,\ddot y \\
y^{(4)} &= f'''(y)(\dot y, \dot y, \dot y) + 3 f''(y)(\ddot y, \dot y) + f'(y)\,y^{(3)} \\
y^{(5)} &= f^{(4)}(y)(\dot y, \dot y, \dot y, \dot y) + 6 f'''(y)(\ddot y, \dot y, \dot y) + 4 f''(y)(y^{(3)}, \dot y) \\
&\quad + 3 f''(y)(\ddot y, \ddot y) + f'(y)\,y^{(4)},
\end{aligned} \tag{1.3}$$

and so on. The coefficients 3, 6, 4, 3, ... appearing in these expressions have a certain
combinatorial meaning (number of partitions of a set of $q - 1$ elements), but for
the moment we need not know their values.

Second Step. We insert in (1.3) recursively the computed derivatives $\dot y, \ddot y, \ldots$
into the right side of the subsequent formulas. This gives for the first few

$$\begin{aligned}
\dot y &= f \\
\ddot y &= f'f \\
y^{(3)} &= f''(f, f) + f'f'f \\
y^{(4)} &= f'''(f, f, f) + 3 f''(f'f, f) + f'f''(f, f) + f'f'f'f,
\end{aligned} \tag{1.4}$$

where the arguments $(y)$ have been suppressed. The expressions which appear in
these formulas, denoted by $F(\tau)$, will be called the elementary differentials. We
represent each of them by a suitable graph $\tau$ (a rooted tree) as follows:

Each $f$ becomes a vertex, a first derivative $f'$ becomes a vertex with one branch, and
a $k$th derivative $f^{(k)}$ becomes a vertex with $k$ branches pointing upwards. The
arguments of the $k$-linear mapping $f^{(k)}(y)$ correspond to trees that are attached
on the upper ends of these branches. The tree $[[\bullet], \bullet]$, for example, corresponds
to $f''(f'f, f)$. Other trees are plotted in Table 1.1. In the above process, each
insertion of an already known derivative consists of grafting the corresponding trees upon
a new root as in Definition 1.1 below, and inserting the corresponding elementary
differentials as arguments of $f^{(m)}(y)$ as in Definition 1.2.
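Table 1.1. Trees, elementary differentials, and coefficients

$|\tau|$   $\tau$                            $\alpha(\tau)$   $F(\tau)$        $\gamma(\tau)$   $\phi(\tau)$                              $\sigma(\tau)$
1          $\bullet$                         1                $f$              1                $\sum_i b_i$                              1
2          $[\bullet]$                       1                $f'f$            2                $\sum_{ij} b_i a_{ij}$                    1
3          $[\bullet, \bullet]$              1                $f''(f, f)$      3                $\sum_{ijk} b_i a_{ij} a_{ik}$            2
3          $[[\bullet]]$                     1                $f'f'f$          6                $\sum_{ijk} b_i a_{ij} a_{jk}$            1
4          $[\bullet, \bullet, \bullet]$     1                $f'''(f, f, f)$  4                $\sum_{ijkl} b_i a_{ij} a_{ik} a_{il}$    6
4          $[[\bullet], \bullet]$            3                $f''(f'f, f)$    8                $\sum_{ijkl} b_i a_{ij} a_{ik} a_{jl}$    1
4          $[[\bullet, \bullet]]$            1                $f'f''(f, f)$    12               $\sum_{ijkl} b_i a_{ij} a_{jk} a_{jl}$    2
4          $[[[\bullet]]]$                   1                $f'f'f'f$        24               $\sum_{ijkl} b_i a_{ij} a_{jk} a_{kl}$    1
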


Definition 1.1 (Trees). The set of (rooted) trees $T$ is recursively defined as follows:

a) the graph $\bullet$ with only one vertex (called the root) belongs to $T$;

b) if $\tau_1, \ldots, \tau_m \in T$, then the graph obtained by grafting the roots of
$\tau_1, \ldots, \tau_m$ to a new vertex also belongs to $T$. It is denoted by

$$\tau = [\tau_1, \ldots, \tau_m],$$

and the new vertex is the root of $\tau$.

We further denote by $|\tau|$ the order of $\tau$ (the number of vertices), and by $\alpha(\tau)$ the
coefficients appearing in the formulas (1.4). We remark that some of the trees among
$\tau_1, \ldots, \tau_m$ may be equal and that $\tau$ does not depend on the ordering of
$\tau_1, \ldots, \tau_m$. For example, we do not distinguish between $[[\bullet], \bullet]$ and
$[\bullet, [\bullet]]$.


Definition 1.2 (Elementary Differentials). For a tree $\tau \in T$ the elementary differential
is a mapping $F(\tau) : \mathbb{R}^n \to \mathbb{R}^n$, defined recursively by
$F(\bullet)(y) = f(y)$ and

$$F(\tau)(y) = f^{(m)}(y)\big(F(\tau_1)(y), \ldots, F(\tau_m)(y)\big) \quad\text{for}\quad \tau = [\tau_1, \ldots, \tau_m].$$

Examples of these constructions and the corresponding coefficients are seen in
Table 1.1. With these definitions, we obtain from (1.4):

Theorem 1.3. The $q$th derivative of the exact solution is given by

$$y^{(q)}(t_0) = \sum_{|\tau| = q} \alpha(\tau)\,F(\tau)(y_0), \tag{1.5}$$

where $\alpha(\tau)$ are positive integer coefficients. □
Third Step. We now turn to the numerical solution of the Runge-Kutta method
(II.1.4), which, by putting $hk_i = g_i$, we write as

$$g_i = h f(u_i) \tag{1.6}$$

and

$$u_i = y_0 + \sum_j a_{ij}\,g_j, \qquad y_1 = y_0 + \sum_i b_i\,g_i, \tag{1.7}$$

where $u_i$, $g_i$ and $y_1$ are functions of $h$. We develop the derivatives of (1.6), by
Leibniz’ rule, and obtain $g_i^{(q)} = h\,\big(f(u_i)\big)^{(q)} + q \cdot \big(f(u_i)\big)^{(q-1)}$.
This gives, for $h = 0$,

$$g_i^{(q)} = q \cdot \big(f(u_i)\big)^{(q-1)}, \tag{1.8}$$

the same expression as in (1.2), with $y$ just replaced by $u_i$ and with an extra factor
$q$. Consequently, exactly as in (1.3),

$$\begin{aligned}
\dot g_i &= 1 \cdot f(y_0) \\
\ddot g_i &= 2 \cdot f'(y_0)\,\dot u_i \\
g_i^{(3)} &= 3 \cdot \big(f''(y_0)(\dot u_i, \dot u_i) + f'(y_0)\,\ddot u_i\big) \\
g_i^{(4)} &= 4 \cdot \big(f'''(y_0)(\dot u_i, \dot u_i, \dot u_i) + 3 f''(y_0)(\ddot u_i, \dot u_i) + f'(y_0)\,u_i^{(3)}\big) \\
g_i^{(5)} &= 5 \cdot \big(f^{(4)}(y_0)(\dot u_i, \dot u_i, \dot u_i, \dot u_i) + 6 f'''(y_0)(\ddot u_i, \dot u_i, \dot u_i) + 4 f''(y_0)(u_i^{(3)}, \dot u_i) \\
&\qquad + 3 f''(y_0)(\ddot u_i, \ddot u_i) + f'(y_0)\,u_i^{(4)}\big),
\end{aligned} \tag{1.9}$$

and so on. Here, the derivatives of $g_i$ and $u_i$ are evaluated at $h = 0$.

Fourth Step. We now insert recursively the derivatives $\dot u_i, \ddot u_i, \ldots$ into (1.9).
This will give the next higher derivative of $g_i$, and, using

$$u_i^{(q)} = \sum_j a_{ij} \cdot g_j^{(q)}, \tag{1.10}$$

which follows from (1.7), also the next higher derivative of $u_i$. This process begins
as

$$\begin{aligned}
\dot g_i &= 1 \cdot f & \dot u_i &= 1 \cdot \Big(\sum_j a_{ij}\Big) f \\
\ddot g_i &= (1 \cdot 2) \Big(\sum_j a_{ij}\Big) f'f & \ddot u_i &= (1 \cdot 2) \Big(\sum_{jk} a_{ij} a_{jk}\Big) f'f
\end{aligned} \tag{1.11}$$

and so on. If we compare these formulas with the first lines of (1.4), we see that the
results are precisely the same, apart from the extra factors. We denote the integer
factors $1, 1 \cdot 2, \ldots$ by $\gamma(\tau)$ and the factors containing the $a_{ij}$’s by
$g_i(\tau)$ and $u_i(\tau)$, respectively. We obtain by induction that the same happens in
general, i.e. that, in contrast to (1.5),

$$\begin{aligned}
g_i^{(q)}\big|_{h=0} &= \sum_{|\tau| = q} \gamma(\tau) \cdot g_i(\tau) \cdot \alpha(\tau)\,F(\tau)(y_0) \\
u_i^{(q)}\big|_{h=0} &= \sum_{|\tau| = q} \gamma(\tau) \cdot u_i(\tau) \cdot \alpha(\tau)\,F(\tau)(y_0),
\end{aligned} \tag{1.12}$$

where $\alpha(\tau)$ and $F(\tau)$ are the same quantities as before. This is seen by continuing
the insertion process of the derivatives $u_i^{(q)}$ into the right-hand side of (1.9). For
example, if $\dot u_i$ and $\ddot u_i$ are inserted into $3 f''(\ddot u_i, \dot u_i)$, we will obtain
the corresponding expression as in (1.4), multiplied by the two extra factors
$u_i([\bullet])$, brought in by $\ddot u_i$, and $u_i(\bullet)$ from $\dot u_i$. For a general tree
$\tau = [\tau_1, \ldots, \tau_m]$ this will be

$$g_i(\tau) = u_i(\tau_1) \cdot \ldots \cdot u_i(\tau_m). \tag{1.13}$$

Second, the factors $\gamma([\bullet])$ and $\gamma(\bullet)$ will receive the additional factor
$q = |\tau|$ from (1.9), i.e., we will have in general

$$\gamma(\tau) = |\tau|\,\gamma(\tau_1) \cdot \ldots \cdot \gamma(\tau_m). \tag{1.14}$$

Then, by (1.10),

$$u_i(\tau) = \sum_j a_{ij}\,g_j(\tau) = \sum_j a_{ij} \cdot u_j(\tau_1) \cdot \ldots \cdot u_j(\tau_m). \tag{1.15}$$

This formula can be re-used repeatedly, as long as some of the trees $\tau_1, \ldots, \tau_m$ are
of order $> 1$. Finally, we have from the last formula of (1.7), that the coefficients for
the numerical solution, which we denote by $\phi(\tau)$ and call the elementary weights,
satisfy

$$\phi(\tau) = \sum_i b_i\,g_i(\tau). \tag{1.16}$$

We summarize the result as follows:

Theorem 1.4. The derivatives of the numerical solution of a Runge-Kutta method
(II.1.4), for $h = 0$, are given by

$$y_1^{(q)}\big|_{h=0} = \sum_{|\tau| = q} \gamma(\tau) \cdot \phi(\tau) \cdot \alpha(\tau)\,F(\tau)(y_0), \tag{1.17}$$

where $\alpha(\tau)$ and $F(\tau)$ are the same as in Theorem 1.3, the coefficients $\gamma(\tau)$ satisfy
$\gamma(\bullet) = 1$ and (1.14). The elementary weights $\phi(\tau)$ are obtained from the tree
$\tau$ as follows: attach to every vertex a summation letter (“$i$” to the root), then $\phi(\tau)$
is the sum, over all summation indices, of a product composed of $b_i$, and factors $a_{jk}$ for
each vertex “$j$” directly connected with “$k$” by an upwards directed branch.

Proof. Repeated application of (1.15) followed by (1.16) shows that the elementary
weight $\phi(\tau)$ is the collection of $b_i$ from (1.16) and all $a_{jk}$ of (1.15). □
Theorem 1.5. The Runge-Kutta method has order $p$ if and only if

$$\phi(\tau) = \frac{1}{\gamma(\tau)} \quad\text{for}\quad |\tau| \le p. \tag{1.18}$$

Proof. The comparison of Theorem 1.3 with Theorem 1.4 proves the sufficiency
of condition (1.18). The necessity of (1.18) follows from the independence of the
elementary differentials (see e.g., Hairer, Nørsett & Wanner (1993), Exercise 4 of
Sect. II.2). □
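The recursions (1.13), (1.15), (1.16) and (1.14) translate directly into a short program. The sketch below encodes a tree as the tuple of its subtrees, so that `()` is the single vertex; this encoding and the function names are illustrative assumptions. The classical fourth-order Runge-Kutta method is used as a test case for (1.18).

```python
from fractions import Fraction
from math import prod

def order(t):
    """|tau|: number of vertices of the tree t (a tuple of subtrees)."""
    return 1 + sum(order(c) for c in t)

def gamma(t):
    """gamma(tau) from (1.14), with gamma of the single vertex equal to 1."""
    return order(t) * prod(gamma(c) for c in t)

def stage_weights(t, A):
    """List of g_i(tau) over all stages i, using (1.13) and (1.15)."""
    s = len(A)
    g = [Fraction(1)] * s                      # empty product for the single vertex
    for child in t:
        gc = stage_weights(child, A)
        u = [sum(A[i][j] * gc[j] for j in range(s)) for i in range(s)]  # (1.15)
        g = [g[i] * u[i] for i in range(s)]                               # (1.13)
    return g

def phi(t, A, b):
    """Elementary weight phi(tau) from (1.16)."""
    g = stage_weights(t, A)
    return sum(b[i] * g[i] for i in range(len(b)))

LEAF = ()
TREES_UP_TO_4 = [
    LEAF, (LEAF,), (LEAF, LEAF), ((LEAF,),),
    (LEAF, LEAF, LEAF), (LEAF, (LEAF,)), ((LEAF, LEAF),), (((LEAF,),),),
]

# Classical Runge-Kutta method of order 4
h2 = Fraction(1, 2)
A_RK4 = [[0, 0, 0, 0], [h2, 0, 0, 0], [0, h2, 0, 0], [0, 0, 1, 0]]
B_RK4 = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 3), Fraction(1, 6)]

rk4_is_order_4 = all(phi(t, A_RK4, B_RK4) == Fraction(1, gamma(t))
                     for t in TREES_UP_TO_4)
```

With exact rational arithmetic the eight conditions up to order 4 can be compared by equality, without a tolerance.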

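Example 1.6. For the following tree of order 9,
$\tau = [\bullet, [\bullet], [\bullet, [\bullet, \bullet]]]$, we have

$$\phi(\tau) = \sum_{i,j,k,l,m,n,p,q,r} b_i\,a_{ij} a_{jm} a_{in} a_{ik} a_{kl} a_{lq} a_{lr} a_{kp}$$

or, by using $\sum_j a_{ij} = c_i$,

$$\phi(\tau) = \sum_{i,j,k,l} b_i c_i\,a_{ij} c_j\,a_{ik} c_k\,a_{kl} c_l^2, \qquad \gamma(\tau) = 270.$$

The quantities $\phi(\tau)$ and $\gamma(\tau)$ for all trees up to order 4 are given in Table 1.1. This
also verifies the formulas (II.1.6) stated previously.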

III.1.2 B-Series

We now introduce the concept of B-series, which gives further insight into the behaviour
of numerical methods and allows extensions to more general classes of
methods.

Motivated by formulas (1.12) and (1.17) above, we consider the corresponding
series as the objects of our study. This means, we study power series in $h^{|\tau|}$
containing elementary differentials $F(\tau)$ and arbitrary coefficients which are now written
in the form $a(\tau)$. Such series will be called B-series. To move from (1.6) to (1.13) we
need to prove a result stating that a B-series inserted into $hf(\cdot)$ is again a B-series.
We start with

$$B(a, y) = y + a(\bullet)\,h f(y) + a([\bullet])\,h^2 (f'f)(y) + \ldots = y + \delta, \tag{1.19}$$

and get by Taylor expansion

$$hf\big(B(a, y)\big) = hf(y + \delta) = hf(y) + hf'(y)\,\delta + \frac{h}{2!} f''(y)(\delta, \delta) + \ldots. \tag{1.20}$$

Inserting $\delta$ from (1.19) and multiplying out, we obtain the expression

$$\begin{aligned}
hf\big(B(a, y)\big) &= hf + h^2 a(\bullet)\,f'f + h^3 a([\bullet])\,f'f'f + \frac{h^3}{2!} a(\bullet)^2 f''(f, f) \\
&\quad + h^4 a(\bullet)\,a([\bullet])\,f''(f'f, f) + \ldots.
\end{aligned} \tag{1.21}$$

This beautiful formula is not yet perfect for two reasons. First, there is a denominator
$2!$ in the fourth term. The origin of this lies in the symmetry of the tree
$[\bullet, \bullet]$. We thus introduce the symmetry coefficients of Definition 1.7 (following
Butcher 1987, Theorem 144A). Second, there is no first term $y$. We therefore allow the
factor $a(\emptyset)$ in Definition 1.8.

Definition 1.7 (Symmetry coefficients). The symmetry coefficients $\sigma(\tau)$ are defined
by $\sigma(\bullet) = 1$ and, for $\tau = [\tau_1, \ldots, \tau_m]$,

$$\sigma(\tau) = \sigma(\tau_1) \cdot \ldots \cdot \sigma(\tau_m) \cdot \mu_1!\,\mu_2! \cdot \ldots, \tag{1.22}$$

where the integers $\mu_1, \mu_2, \ldots$ count equal trees among $\tau_1, \ldots, \tau_m$.

Definition 1.8 (B-Series). For a mapping $a : T \cup \{\emptyset\} \to \mathbb{R}$ a formal series of the
form

$$B(a, y) = a(\emptyset)\,y + \sum_{\tau \in T} \frac{h^{|\tau|}}{\sigma(\tau)}\,a(\tau)\,F(\tau)(y) \tag{1.23}$$

is called a B-series. 1

The main results of the theory of B-series have their origin in the paper of
Butcher (1972), although series expansions were not used there. B-series were then
introduced by Hairer & Wanner (1974). The normalization used in Definition 1.8
is due to Butcher & Sanz-Serna (1996). The following fundamental lemma gives a
second way of finding the order conditions.

1 In this section we are not concerned about the convergence of the series. We shall see
later in Chap. IX that the series converges for sufficiently small $h$, if $a(\tau)$ satisfies an
inequality $|a(\tau)| \le \gamma(\tau)\,c\,d^{|\tau|}$ and if $f(y)$ is an analytic function. If $f(y)$ is only
$k$-times differentiable, then all formulas of this section remain valid for the truncated
B-series $\sum_{\tau \in T, |\tau| \le k} \cdot/\cdot$ with a suitable remainder term of size $O(h^{k+1})$ added.

Lemma 1.9. Let $a : T \cup \{\emptyset\} \to \mathbb{R}$ be a mapping satisfying $a(\emptyset) = 1$. Then the
corresponding B-series inserted into $hf(\cdot)$ is again a B-series. That is

$$hf\big(B(a, y)\big) = B(a', y), \tag{1.24}$$

where $a'(\emptyset) = 0$, $a'(\bullet) = 1$, and

$$a'(\tau) = a(\tau_1) \cdot \ldots \cdot a(\tau_m) \quad\text{for}\quad \tau = [\tau_1, \ldots, \tau_m]. \tag{1.25}$$

Proof. Since $a(\emptyset) = 1$ we have $B(a, y) = y + O(h)$, so that $hf(B(a, y))$ can be
expanded into a Taylor series around $y$. As in formulas (1.20) and (1.21), we get

$$\begin{aligned}
hf\big(B(a, y)\big) &= h \sum_{m \ge 0} \frac{1}{m!} f^{(m)}(y) \big(B(a, y) - y\big)^m \\
&= h \sum_{m \ge 0} \frac{1}{m!} \sum_{\tau_1 \in T} \cdots \sum_{\tau_m \in T} \frac{h^{|\tau_1| + \ldots + |\tau_m|}}{\sigma(\tau_1) \cdot \ldots \cdot \sigma(\tau_m)} \cdot a(\tau_1) \cdot \ldots \cdot a(\tau_m) \\
&\qquad\qquad \cdot f^{(m)}(y)\big(F(\tau_1)(y), \ldots, F(\tau_m)(y)\big) \\
&= \sum_{m \ge 0} \sum_{\tau_1 \in T} \cdots \sum_{\tau_m \in T} \frac{h^{|\tau|}}{\sigma(\tau)} \cdot \frac{\mu_1!\,\mu_2! \cdot \ldots}{m!} \cdot a'(\tau)\,F(\tau)(y) \qquad\text{with } \tau = [\tau_1, \ldots, \tau_m] \\
&= \sum_{\tau \in T} \frac{h^{|\tau|}}{\sigma(\tau)}\,a'(\tau)\,F(\tau)(y) = B(a', y).
\end{aligned}$$

The last equality follows from the fact that there are $\binom{m}{\mu_1, \mu_2, \ldots}$ possibilities
for writing the tree $\tau$ in the form $\tau = [\tau_1, \ldots, \tau_m]$. For example, the trees
$[\bullet, \bullet, [\bullet]]$, $[\bullet, [\bullet], \bullet]$ and $[[\bullet], \bullet, \bullet]$ appear as
different terms in the upper sum, but only as one term in the lower sum. □


Back to the Order Conditions. We present now a new derivation of the order
conditions that is solely based on B-series and on Lemma 1.9. Let a Runge-Kutta
method, say formulas (1.6) and (1.7), be given. All quantities in the defining formulas
are set up as B-series, $g_i = B(g_i, y_0)$, $u_i = B(u_i, y_0)$, $y_1 = B(\phi, y_0)$. Then
linearity and/or Lemma 1.9 translate the formulas of the method into corresponding
formulas for the coefficients (1.13), (1.15), and (1.16). This recursively
justifies the ansatz as B-series.

Assuming the exact solution to be a B-series $B(e, y_0)$, a term-by-term derivation
of this series and an application of Lemma 1.9 to (1.1) yields

$$e(\tau) = \frac{1}{|\tau|}\,e(\tau_1) \cdot \ldots \cdot e(\tau_m).$$

Together with definition (1.14) of $\gamma(\tau)$ we thus obtain

$$e(\tau) = \frac{1}{\gamma(\tau)}. \tag{1.26}$$

A comparison of the coefficients of the B-series $y_1 = B(\phi, y_0)$ with those of the
exact solution gives (1.18) and proves Theorem 1.5 again.

Comparing the B-series $B(e, y_0)$ for the exact solution with Theorem 1.3, we
get as a byproduct the formula

$$\alpha(\tau) = \frac{|\tau|!}{\sigma(\tau) \cdot \gamma(\tau)}. \tag{1.27}$$

If the available tools are enriched by the more general composition law of Theorem 1.10
below, this procedure can be applied to yet larger classes of methods.
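As a check of (1.14), (1.22) and (1.27), the coefficients of Table 1.1 can be recomputed mechanically. In this sketch a tree is encoded as the sorted tuple of its subtrees, with `()` for the single vertex; the encoding and the function names are illustrative assumptions.

```python
from collections import Counter
from math import factorial, prod

def order(t):
    """|tau|: number of vertices."""
    return 1 + sum(order(c) for c in t)

def gamma(t):
    """gamma(tau) from (1.14)."""
    return order(t) * prod(gamma(c) for c in t)

def sigma(t):
    """Symmetry coefficient sigma(tau) from (1.22); equal subtrees are
    counted with a Counter, which works because subtrees are stored in a
    canonical (sorted) form."""
    multiplicities = Counter(t).values()
    return prod(sigma(c) for c in t) * prod(factorial(mu) for mu in multiplicities)

def alpha(t):
    """alpha(tau) from (1.27)."""
    return factorial(order(t)) // (sigma(t) * gamma(t))

LEAF = ()
TREES_UP_TO_4 = [
    LEAF, (LEAF,), (LEAF, LEAF), ((LEAF,),),
    (LEAF, LEAF, LEAF), (LEAF, (LEAF,)), ((LEAF, LEAF),), (((LEAF,),),),
]

# rows (|tau|, alpha, gamma, sigma), in the order of Table 1.1
table = [(order(t), alpha(t), gamma(t), sigma(t)) for t in TREES_UP_TO_4]
```

The rows of `table` reproduce the columns $|\tau|$, $\alpha(\tau)$, $\gamma(\tau)$ and $\sigma(\tau)$ of Table 1.1.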
III.1.3 Composition of Methods

The order theory for the composition of methods goes back to 1969, when Butcher
used it to circumvent the order barrier for explicit 5th order 5 stage methods. It led to
the seminal publication of Butcher (1972), where the general composition formula
in (1.34) was expressed recursively.

Composition of Runge-Kutta Methods. Suppose that, starting from an initial
value $y_0$, we compute a numerical solution $y_1$ using a Runge-Kutta method with
coefficients $a_{ij}$, $b_i$ and step size $h$. Then, continuing from $y_1$, we compute a value
$y_2$ using another method with coefficients $a_{ij}^*$, $b_i^*$ and the same step size. This
composition of two methods is now considered as a single method (with coefficients
$\hat a_{ij}$, $\hat b_i$). The problem is to derive the order properties of this new method, in
particular to express the elementary weights $\hat\phi(\tau)$ in terms of those of the original two
methods.

If the value $y_1$ from the first method is inserted into the starting value for the
second method, one sees that the coefficients of the combined method are given by
(here written for two-stage methods)

$$\begin{array}{cccc}
\hat a_{11} & \hat a_{12} & & \\
\hat a_{21} & \hat a_{22} & & \\
\hat a_{31} & \hat a_{32} & \hat a_{33} & \hat a_{34} \\
\hat a_{41} & \hat a_{42} & \hat a_{43} & \hat a_{44} \\ \hline
\hat b_1 & \hat b_2 & \hat b_3 & \hat b_4
\end{array}
\quad = \quad
\begin{array}{cccc}
a_{11} & a_{12} & & \\
a_{21} & a_{22} & & \\
b_1 & b_2 & a_{11}^* & a_{12}^* \\
b_1 & b_2 & a_{21}^* & a_{22}^* \\ \hline
b_1 & b_2 & b_1^* & b_2^*
\end{array} \tag{1.28}$$

and our problem is to compute the elementary weights of this scheme.

Derivation. The idea is to write the sum for $\hat\phi(\tau)$, say for the tree
$\tau = [[\bullet], \bullet]$, in full detail

$$\hat\phi(\tau) = \sum_{i=1}^{4} \sum_{j=1}^{4} \sum_{k=1}^{4} \sum_{l=1}^{4} \hat b_i\,\hat a_{ij}\,\hat a_{ik}\,\hat a_{kl}$$

and to split each sum into the two different index sets. This leads to $2^{|\tau|}$ different
expressions

$$\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} \sum_{l=1}^{2} \cdot/\cdot \;+\; \sum_{i=3}^{4} \sum_{j=1}^{2} \sum_{k=1}^{2} \sum_{l=1}^{2} \cdot/\cdot \;+\; \sum_{i=1}^{2} \sum_{j=3}^{4} \sum_{k=1}^{2} \sum_{l=1}^{2} \cdot/\cdot \;+\; \ldots.$$

We symbolize each expression by drawing the corresponding vertex of $\tau$ as a bullet for the
first index set and as a star for the second. However, due to the zero pattern in the matrix
in (1.28) (the upper right corner is missing), each term with “star above bullet” can be
omitted, since the corresponding $\hat a_{ij}$’s are zero. So the only combinations to be
considered are those of Fig. 1.1. We finally insert the quantities from the right tableau
in (1.28),

$$\begin{aligned}
\hat\phi(\tau) &= \sum b_i a_{ij} a_{ik} a_{kl} + \sum b_i^* b_j b_k a_{kl} + \sum b_i^* a_{ij}^* b_k a_{kl} + \sum b_i^* b_j a_{ik}^* b_l \\
&\quad + \sum b_i^* a_{ij}^* a_{ik}^* b_l + \sum b_i^* b_j a_{ik}^* a_{kl}^* + \sum b_i^* a_{ij}^* a_{ik}^* a_{kl}^*,
\end{aligned} \tag{1.29}$$

Fig. 1.1. Combinations with nonzero product

and we observe that each factor of the type $b_j$ interrupts the summation, so that the
terms decompose into factors of elementary weights of the individual methods as
follows:

$$\begin{aligned}
\hat\phi([[\bullet], \bullet]) &= \phi([[\bullet], \bullet]) + \phi^*(\bullet) \cdot \phi(\bullet)\,\phi([\bullet]) + \phi^*([\bullet]) \cdot \phi([\bullet]) + \phi^*([\bullet]) \cdot \phi(\bullet)^2 \\
&\quad + \phi^*([\bullet, \bullet]) \cdot \phi(\bullet) + \phi^*([[\bullet]]) \cdot \phi(\bullet) + \phi^*([[\bullet], \bullet]).
\end{aligned}$$

The trees composed of the “star” nodes of $\tau$ in Fig. 1.1 constitute all possible “subtrees”
$\theta$ (from the empty tree to $\tau$ itself) having the same root as $\tau$. This is the key
for understanding the general result.

Ordered Trees. In order to formalize the procedure of Fig. 1.1, we introduce the
set $OT$ of ordered trees recursively as follows: $\bullet \in OT$, and

$$\text{if } \omega_1, \ldots, \omega_m \in OT, \text{ then also the ordered } m\text{-tuple } (\omega_1, \ldots, \omega_m) \in OT. \tag{1.30}$$

As the name suggests, in the graphical representation of an ordered tree the order of
the branches leaving a vertex cannot be permuted. Neglecting the ordering, a tree
$\tau \in T$ can be considered as an equivalence class of ordered trees, denoted
$\tau = \bar\omega$.

For example, the tree of Fig. 1.1 has two orderings, namely $((\bullet), \bullet)$ and
$(\bullet, (\bullet))$. We denote by $\nu(\tau)$ the number of possible orderings of the tree $\tau$.
It is given by $\nu(\bullet) = 1$ and

$$\nu(\tau) = \frac{m!}{\mu_1!\,\mu_2! \cdot \ldots}\,\nu(\tau_1) \cdot \ldots \cdot \nu(\tau_m) \tag{1.31}$$

for $\tau = [\tau_1, \ldots, \tau_m]$, where the integers $\mu_1, \mu_2, \ldots$ are the numbers of equal
trees among $\tau_1, \ldots, \tau_m$. This number is closely related to the symmetry coefficient
$\sigma(\tau)$, because the product $\kappa(\tau) = \sigma(\tau)\,\nu(\tau)$ satisfies the recurrence relation

$$\kappa(\tau) = m!\,\kappa(\tau_1) \cdot \ldots \cdot \kappa(\tau_m). \tag{1.32}$$

We introduce the set $OST(\omega)$ of ordered subtrees of an ordered tree $\omega \in OT$ by

$$\begin{aligned}
OST(\bullet) &= \{\emptyset, \bullet\} \\
OST(\omega) &= \{\emptyset\} \cup \{(\theta_1, \ldots, \theta_m) \,;\, \theta_i \in OST(\omega_i)\} \quad\text{for}\quad \omega = (\omega_1, \ldots, \omega_m).
\end{aligned} \tag{1.33}$$

Each ordered subtree $\theta \in OST(\omega)$ is naturally associated with a tree $\bar\theta \in T$
obtained by neglecting the ordering and the $\emptyset$-components of $\theta$. For every tree
$\tau \in T$ we choose, once and for all, an ordering. We denote this ordered tree by
$\omega(\tau)$, and we put $OST(\tau) = OST(\omega(\tau))$.
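For the tree of Fig. 1.1, considered as an ordered tree, the ordered subtrees correspond
to the trees composed of the “star” nodes.

The General Rule. The general composition rule now becomes visible: for
$\theta \in OST(\omega)$ we denote by $\omega \setminus \theta$ the “forest” collecting the trees left
over when $\theta$ has been removed from the ordered tree $\omega$. For brevity we set
$\tau \setminus \theta := \omega(\tau) \setminus \theta$. With the conventions
$\phi^*(\theta) = \phi^*(\bar\theta)$ and $\phi^*(\emptyset) = 1$ we then have

$$\hat\phi(\tau) = \sum_{\theta \in OST(\tau)} \phi^*(\theta) \cdot \prod_{\delta \in \tau \setminus \theta} \phi(\delta). \tag{1.34}$$

This composition formula for the trees up to order 3 reads:

$$\begin{aligned}
\hat\phi(\bullet) &= \phi^*(\emptyset) \cdot \phi(\bullet) + \phi^*(\bullet) \\
\hat\phi([\bullet]) &= \phi^*(\emptyset) \cdot \phi([\bullet]) + \phi^*(\bullet) \cdot \phi(\bullet) + \phi^*([\bullet]) \\
\hat\phi([\bullet, \bullet]) &= \phi^*(\emptyset) \cdot \phi([\bullet, \bullet]) + \phi^*(\bullet) \cdot \phi(\bullet)^2 + 2\,\phi^*([\bullet]) \cdot \phi(\bullet) + \phi^*([\bullet, \bullet]) \\
\hat\phi([[\bullet]]) &= \phi^*(\emptyset) \cdot \phi([[\bullet]]) + \phi^*(\bullet) \cdot \phi([\bullet]) + \phi^*([\bullet]) \cdot \phi(\bullet) + \phi^*([[\bullet]])
\end{aligned}$$

The tree $\tau = [\bullet, \bullet]$ has the subtrees displayed in Fig. 1.2. It contains symmetries
in that the third and fourth subtrees are topologically equivalent. This explains the factor 2
in the expression for the elementary weight.

Fig. 1.2. A tree with symmetry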


III.1.4 Composition of B-Series

We now extend the above composition law to general B-series, i.e., we insert the
B-series themselves into each other, as sketched in Fig. 1.3. This allows us to
generalize Lemma 1.9 (because $hf(y)$ is a special B-series).

Fig. 1.3. Composition of B-series: $y_0 \mapsto y_1 = B(a, y_0) \mapsto y_2 = B(b, y_1) = B(ab, y_0)$

We start with an observation of Murua (see, e.g., Murua & Sanz-Serna (1999),
p. 1083), namely that the proof of Lemma 1.9 remains the same if the function $hf(y)$
is replaced with any other function $hg(y)$; in this case (1.21) is replaced with

$$\begin{aligned}
hg\big(B(a, y)\big) &= hg + h^2 a(\bullet)\,g'f + h^3 a([\bullet])\,g'f'f + \frac{h^3}{2!} a(\bullet)^2 g''(f, f) \\
&\quad + h^4 a(\bullet)\,a([\bullet])\,g''(f'f, f) + \ldots.
\end{aligned} \tag{1.35}$$

Such series will reappear in Sect. III.3.1 below. Extending this idea further to, say,
$f''(y)(v_1, v_2)$, where $v_1, v_2$ are two fixed vectors, we obtain

$$\begin{aligned}
hf''\big(B(a, y)\big)(v_1, v_2) &= hf''(v_1, v_2) + h^2 a(\bullet)\,f'''(v_1, v_2, f) \\
&\quad + h^3 a([\bullet])\,f'''(v_1, v_2, f'f) + \frac{h^3}{2!} a(\bullet)^2 f''''(v_1, v_2, f, f) \\
&\quad + h^4 a(\bullet)\,a([\bullet])\,f''''(v_1, v_2, f'f, f) + \ldots.
\end{aligned} \tag{1.36}$$

This idea will lead to a direct proof of the following theorem of Hairer & Wanner
(1974).

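Theorem 1.10. Let $a : T \cup \{\emptyset\} \to \mathbb{R}$ be a mapping satisfying $a(\emptyset) = 1$ and let
$b : T \cup \{\emptyset\} \to \mathbb{R}$ be arbitrary. Then the B-series $B(a, y)$ inserted into
$B(b, \cdot)$ is again a B-series

$$B\big(b, B(a, y)\big) = B(ab, y), \tag{1.37}$$

where the group operation $ab(\tau)$ is as in (1.34), i.e.,

$$ab(\tau) = \sum_{\theta \in OST(\tau)} b(\theta) \cdot a(\tau \setminus \theta) \quad\text{with}\quad a(\tau \setminus \theta) = \prod_{\delta \in \tau \setminus \theta} a(\delta). \tag{1.38}$$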

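Proof. (a) In part (c) below we prove by induction on $|\vartheta|$, $\vartheta \in T$, that

$$\frac{h^{|\vartheta|}}{\sigma(\vartheta)}\,F(\vartheta)\big(B(a, y)\big) = \sum_{(\tau, \theta) \in A(\vartheta)} \frac{h^{|\tau|}}{\sigma(\tau)}\,a(\tau \setminus \theta)\,F(\tau)(y), \tag{1.39}$$

where

$$A(\vartheta) = \{(\tau, \theta) \,;\, \tau \in T,\ \theta \in OST(\tau),\ \bar\theta = \vartheta\}.$$

Multiplying (1.39) by $b(\vartheta)$ and summing over all $\vartheta \in T$ yields the statement
(1.37)-(1.38), because

$$\sum_{\vartheta \in T} \sum_{(\tau, \theta) \in A(\vartheta)} \cdot/\cdot = \sum_{\tau \in T} \sum_{\theta \in OST(\tau)} \cdot/\cdot.$$

(b) Choosing a different ordering of $\tau$ in the definition of $OST(\tau)$ yields the
same sum in (1.39). Therefore (1.39) is equivalent to

$$\frac{h^{|\vartheta|}}{\sigma(\vartheta)}\,F(\vartheta)\big(B(a, y)\big) = \sum_{(\omega, \theta) \in \Omega(\vartheta)} \frac{h^{|\omega|}}{\sigma(\omega)\,\nu(\omega)}\,a(\omega \setminus \theta)\,F(\omega)(y), \tag{1.40}$$

where

$$\Omega(\vartheta) = \{(\omega, \theta) \,;\, \omega \in OT,\ \theta \in OST(\omega),\ \bar\theta = \vartheta\},$$

and $\nu(\tau)$ is the number of orderings of the tree $\tau$, see (1.31). Functions defined on
trees are naturally extended to ordered trees. In (1.40) we use $|\omega| = |\tau|$,
$\sigma(\omega) = \sigma(\tau)$, $\nu(\omega) = \nu(\tau)$, $a(\omega \setminus \theta) = a(\tau \setminus \theta)$, and
$F(\omega)(y) = F(\tau)(y)$ for $\bar\omega = \tau$.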
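(c) For $\vartheta = \bullet$ and $\omega = (\omega_1, \ldots, \omega_m)$ we have
$a(\omega \setminus \theta) = a(\omega_1) \cdot \ldots \cdot a(\omega_m)$ if $\bar\theta = \bullet$. Since we
have a one-to-one correspondence $(\omega, \theta) \leftrightarrow \omega$ between $\Omega(\bullet)$ and
$OT$, and since the expression in the sum of (1.40) is independent of the ordering of
$\omega$, formula (1.40) is precisely Lemma 1.9.

To prove (1.40) for a general tree $\vartheta = [\vartheta_1, \ldots, \vartheta_l]$ we apply the idea
put forward in (1.36) to $hf^{(l)}\big(B(a, y)\big)(v_1, \ldots, v_l)$ with fixed
$v_1, \ldots, v_l$, and obtain as in the proof of Lemma 1.9

$$\begin{aligned}
hf^{(l)}\big(B(a, y)\big)(v_1, \ldots, v_l) &= \sum_{m \ge 0} \frac{1}{m!} \sum_{\tau_{l+1} \in T} \cdots \sum_{\tau_{l+m} \in T} \frac{h^{|\tau_{l+1}| + \ldots + |\tau_{l+m}| + 1}}{\sigma(\tau_{l+1}) \cdot \ldots \cdot \sigma(\tau_{l+m})} \\
&\quad \cdot a(\tau_{l+1}) \cdot \ldots \cdot a(\tau_{l+m}) \cdot f^{(l+m)}(y)\big(v_1, \ldots, v_l, F(\tau_{l+1})(y), \ldots, F(\tau_{l+m})(y)\big).
\end{aligned}$$

Changing the sums over trees to sums over ordered trees we obtain

$$\begin{aligned}
hf^{(l)}\big(B(a, y)\big)(v_1, \ldots, v_l) &= \sum_{m \ge 0} \frac{1}{m!} \sum_{\omega_{l+1} \in OT} \cdots \sum_{\omega_{l+m} \in OT} \frac{h^{|\omega_{l+1}| + \ldots + |\omega_{l+m}| + 1}}{\kappa(\omega_{l+1}) \cdot \ldots \cdot \kappa(\omega_{l+m})} \\
&\quad \cdot a(\omega_{l+1}) \cdot \ldots \cdot a(\omega_{l+m}) \cdot f^{(l+m)}(y)\big(v_1, \ldots, v_l, F(\omega_{l+1})(y), \ldots, F(\omega_{l+m})(y)\big).
\end{aligned}$$

We insert $v_j = \frac{h^{|\vartheta_j|}}{\sigma(\vartheta_j)}\,F(\vartheta_j)\big(B(a, y)\big)$ into this relation, and
we apply our induction hypothesis

$$v_j = \frac{h^{|\vartheta_j|}}{\sigma(\vartheta_j)}\,F(\vartheta_j)\big(B(a, y)\big) = \sum_{(\omega_j, \theta_j) \in \Omega(\vartheta_j)} \frac{h^{|\omega_j|}}{\kappa(\omega_j)}\,a(\omega_j \setminus \theta_j)\,F(\omega_j)(y).$$

We then use the recursive definitions of $\sigma(\vartheta)$ and $F(\vartheta)(y)$ on the left-hand side.
On the right-hand side we use the multilinearity of $f^{(l+m)}$, the recursive definitions of
$|\omega|$, $\kappa(\omega)$, $F(\omega)(y)$ for $\omega = (\omega_1, \ldots, \omega_{l+m})$, and the facts that

$$a(\omega \setminus \theta) = a(\omega_1 \setminus \theta_1) \cdot \ldots \cdot a(\omega_l \setminus \theta_l) \cdot a(\omega_{l+1}) \cdot \ldots \cdot a(\omega_{l+m})$$

and

$$\sum_{(\omega_1, \theta_1) \in \Omega(\vartheta_1)} \cdots \sum_{(\omega_l, \theta_l) \in \Omega(\vartheta_l)} \ \sum_{\omega_{l+1} \in OT} \cdots \sum_{\omega_{l+m} \in OT} \cdot/\cdot \;=\; \frac{m!\,\mu_1!\,\mu_2! \cdot \ldots}{(l + m)!} \sum_{(\omega, \theta) \in \Omega_{l+m}(\vartheta)} \cdot/\cdot,$$

where $\mu_1, \mu_2, \ldots$ count equal trees among $\vartheta_1, \ldots, \vartheta_l$, and
$\Omega_{l+m}(\vartheta)$ consists of those pairs $(\omega, \theta) \in \Omega(\vartheta)$ for which $\omega$ is of
the form $\omega = (\omega_1, \ldots, \omega_{l+m})$. The factorials appear, because to every
$(l + m)$-tuple of the left-hand sum correspond $\binom{l+m}{m,\,\mu_1,\,\mu_2,\,\ldots}$ elements in
$\Omega_{l+m}(\vartheta)$, obtained by permuting the order. This yields
formula (1.40) and hence (1.39). □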
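Example 1.11. The composition laws for the trees of order $\le 4$ are, written in the
bracket notation $[\tau_1,\ldots,\tau_m]$ of Definition 1.1,

$$\begin{aligned}
ab(\bullet) &= b(\emptyset)\cdot a(\bullet) + b(\bullet)\\
ab([\bullet]) &= b(\emptyset)\cdot a([\bullet]) + b(\bullet)\cdot a(\bullet) + b([\bullet])\\
ab([\bullet,\bullet]) &= b(\emptyset)\cdot a([\bullet,\bullet]) + b(\bullet)\cdot a(\bullet)^2 + 2b([\bullet])\cdot a(\bullet) + b([\bullet,\bullet])\\
ab([[\bullet]]) &= b(\emptyset)\cdot a([[\bullet]]) + b(\bullet)\cdot a([\bullet]) + b([\bullet])\cdot a(\bullet) + b([[\bullet]])\\
ab([\bullet,\bullet,\bullet]) &= b(\emptyset)\cdot a([\bullet,\bullet,\bullet]) + b(\bullet)\cdot a(\bullet)^3 + 3b([\bullet])\cdot a(\bullet)^2 + 3b([\bullet,\bullet])\cdot a(\bullet) + b([\bullet,\bullet,\bullet])\\
ab([\bullet,[\bullet]]) &= b(\emptyset)\cdot a([\bullet,[\bullet]]) + b(\bullet)\cdot a(\bullet)a([\bullet]) + b([\bullet])\cdot a([\bullet]) + b([\bullet])\cdot a(\bullet)^2\\
 &\quad + b([\bullet,\bullet])\cdot a(\bullet) + b([[\bullet]])\cdot a(\bullet) + b([\bullet,[\bullet]])\\
ab([[\bullet,\bullet]]) &= b(\emptyset)\cdot a([[\bullet,\bullet]]) + b(\bullet)\cdot a([\bullet,\bullet]) + b([\bullet])\cdot a(\bullet)^2 + 2b([[\bullet]])\cdot a(\bullet) + b([[\bullet,\bullet]])\\
ab([[[\bullet]]]) &= b(\emptyset)\cdot a([[[\bullet]]]) + b(\bullet)\cdot a([[\bullet]]) + b([\bullet])\cdot a([\bullet]) + b([[\bullet]])\cdot a(\bullet) + b([[[\bullet]]])
\end{aligned}$$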
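The rows of Example 1.11 can be reproduced mechanically from the composition law (1.38). The following Python sketch is only an illustration under our own assumptions: a tree is stored as a nested tuple of its branches (the single vertex is `()`), the empty tree is represented by `None`, and the names `canon`, `rooted_cuts` and `compose` are hypothetical helpers, not notation from the text. The function enumerates the ordered subtrees $\theta$ containing the root together with the cut-off forest $\tau\setminus\theta$, and adds the term for $\theta=\emptyset$ separately.

```python
from itertools import product

def canon(t):
    """Canonical form of a tree: branches sorted, so branch order is ignored."""
    return tuple(sorted(canon(c) for c in t))

def rooted_cuts(t):
    """Yield (theta, pieces): theta is an ordered subtree of t containing the root,
    pieces is the list of trees of the forest t \\ theta that were cut off."""
    options = []
    for child in t:
        # each branch is either cut off entirely or contributes a rooted subtree
        options.append([(None, [child])] + list(rooted_cuts(child)))
    for choice in product(*options):
        kept = tuple(th for th, _ in choice if th is not None)
        pieces = [p for _, ps in choice for p in ps]
        yield kept, pieces

def compose(a, b, t):
    """ab(t) as in (1.38); a and b are callables on canonical trees, b(None) = b(empty tree)."""
    total = b(None) * a(canon(t))            # the term theta = empty tree
    for theta, pieces in rooted_cuts(t):
        term = b(canon(theta))
        for p in pieces:
            term *= a(canon(p))
        total += term
    return total

# arbitrary integer values, chosen only to make the individual terms visible
A = {(): 2, ((),): 3, ((), ()): 5, (((),),): 7}
B = {None: 1, (): 11, ((),): 13, ((), ()): 17, (((),),): 19}
print(compose(A.get, B.get, ((), ())), compose(A.get, B.get, (((),),)))
```

For the tree $[\bullet,\bullet]$ the four choices "cut or keep" for the two branches produce exactly the terms $b(\bullet)a(\bullet)^2$, twice $b([\bullet])a(\bullet)$, and $b([\bullet,\bullet])$ of the third row.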

Remark 1.12. The composition law (1.38) can alternatively be obtained from the 
corresponding formula (1.34) for Runge-Kutta methods by using the fact that B- 
series which represent Runge-Kutta methods are “dense” in the space of all B-series 
(see Theorem 306A of Butcher 1987). 

III.1.5 The Butcher Group

John C. Butcher, born 31 March 1933 in Auckland (New Zealand)

The composition law (1.38) can be turned into a group operation, by introducing
a unit element

$$e(\emptyset) = 1, \qquad e(\tau) = 0 \ \text{ for } \tau\in T, \tag{1.41}$$

and by computing the inverse element of a given $a$. This is obtained recursively from
the table of Example 1.11, by requiring $a\,a^{-1}(\tau) = 0$ and by inserting the previously
known values of $a^{-1}(\vartheta)$. This gives for the first orders

$$\begin{aligned}
a^{-1}(\bullet) &= -a(\bullet)\\
a^{-1}([\bullet]) &= -a([\bullet]) + a(\bullet)^2\\
a^{-1}([\bullet,\bullet]) &= -a([\bullet,\bullet]) + 2a([\bullet])\,a(\bullet) - a(\bullet)^3\\
a^{-1}([[\bullet]]) &= -a([[\bullet]]) + 2a([\bullet])\,a(\bullet) - a(\bullet)^3.
\end{aligned} \tag{1.42}$$
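As a worked check of this recursion (our own computation, written in the bracket notation of Definition 1.1 and using only the rows of Example 1.11 with $b = a^{-1}$ and $a^{-1}(\emptyset) = 1$), the first two rows give $a^{-1}(\bullet) = -a(\bullet)$ and $a^{-1}([\bullet]) = -a([\bullet]) + a(\bullet)^2$, and the row for $[\bullet,\bullet]$ then yields the third-order entry:

```latex
\begin{align*}
0 &= a\,a^{-1}([\bullet,\bullet]) \\
  &= a([\bullet,\bullet]) + a^{-1}(\bullet)\,a(\bullet)^2
     + 2\,a^{-1}([\bullet])\,a(\bullet) + a^{-1}([\bullet,\bullet]) \\
  &= a([\bullet,\bullet]) - a(\bullet)^3
     + 2\bigl(-a([\bullet]) + a(\bullet)^2\bigr)\,a(\bullet) + a^{-1}([\bullet,\bullet]) \\
  &= a([\bullet,\bullet]) - 2\,a([\bullet])\,a(\bullet) + a(\bullet)^3 + a^{-1}([\bullet,\bullet]),
\end{align*}
% hence
\[
a^{-1}([\bullet,\bullet]) = -a([\bullet,\bullet]) + 2\,a([\bullet])\,a(\bullet) - a(\bullet)^3 .
\]
```

The same steps applied to the row for $[[\bullet]]$ give $a^{-1}([[\bullet]]) = -a([[\bullet]]) + 2a([\bullet])a(\bullet) - a(\bullet)^3$.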
We can distinguish several realizations of this group:

$G_{RK}$  the set of Runge-Kutta schemes with composition (1.28);

$G_{EW}$  the set of elementary weights of Runge-Kutta schemes with the composition
law (1.34);

$G_{TM}$  the set of tree mappings $a : T\cup\{\emptyset\}\to\mathbb{R}$ satisfying $a(\emptyset) = 1$ with
composition (1.38);

$G_{BS}$  the set of B-series (1.23) satisfying $a(\emptyset) = 1$ with composition (1.37).

A technical difficulty concerns the group $G_{RK}$, where "reducible" schemes must be
identified (by deleting unnecessary stages or by combining stages that give identical
results) to the same "irreducible" method (see Butcher (1972), or Butcher & Wanner
(1996), p. 140). The definition of $\phi(\tau)$ in Theorem 1.4 describes a group isomorphism
from $G_{RK}$ to $G_{EW}$; further, $G_{EW}$ is a subgroup of $G_{TM}$ and Theorem 1.10
shows that formula (1.23) constitutes a group homomorphism from $G_{TM}$ to $G_{BS}$.
Because the elementary differentials are independent (see, e.g., Hairer, Nørsett &
Wanner (1993), Exercise 4 of Sect. II.2), the last two groups are isomorphic. The
group $G_{RK}$ can also be extended by allowing "continuous" Runge-Kutta schemes
with "infinitely many stages" (see Butcher (1972), or Butcher & Wanner (1996),
p. 141). The term "Butcher group" was introduced by Hairer & Wanner (1974).

This paper tells the story of a mathematical object that was created by 
John Butcher in 1972 and was rediscovered by Alain Connes, Henri 
Moscovici and Dirk Kreimer in 1998. (Ch. Brouder 2004) 


Connection with Hopf Algebras and Quantum Field Theory. A surprising connection
between Runge-Kutta theory and renormalization in quantum field theory
has been discovered by Brouder (2000). One denotes by a Hopf algebra a graded
algebra which, besides the usual product, also possesses a coproduct, a tool used by
H. Hopf (1941)² in his topological classification of certain manifolds. Hopf algebras
generated by families of rooted trees proved to be extremely useful for simplifying
the intricate combinatorics of renormalization (Kreimer 1998). Kreimer's Hopf algebra
$\mathcal H$ is the space generated by linear combinations of families of rooted trees
and the coproduct is a mapping $\Delta : \mathcal H\to\mathcal H\otimes\mathcal H$ which is, for the first trees, given
by

$$\begin{aligned}
\Delta(\bullet) &= \bullet\otimes 1 + 1\otimes\bullet\\
\Delta([\bullet]) &= [\bullet]\otimes 1 + \bullet\otimes\bullet + 1\otimes[\bullet]\\
\Delta([\bullet,\bullet]) &= [\bullet,\bullet]\otimes 1 + \bullet\bullet\otimes\bullet + 2\,\bullet\otimes[\bullet] + 1\otimes[\bullet,\bullet]\\
\Delta([[\bullet]]) &= [[\bullet]]\otimes 1 + [\bullet]\otimes\bullet + \bullet\otimes[\bullet] + 1\otimes[[\bullet]].
\end{aligned} \tag{1.43}$$

It can be clearly seen that this algebraic structure is precisely the one underlying
the composition law of Example 1.11, so that the Butcher group $G_{TM}$ becomes the
corresponding character group. The so-called antipodes of trees $\tau\in\mathcal H$, denoted by
$S(\tau)$, are for the first trees

$$\begin{aligned}
S(\bullet) &= -\bullet\\
S([\bullet]) &= -[\bullet] + \bullet\bullet\\
S([\bullet,\bullet]) &= -[\bullet,\bullet] + 2\,[\bullet]\,\bullet - \bullet\bullet\bullet\\
S([[\bullet]]) &= -[[\bullet]] + 2\,[\bullet]\,\bullet - \bullet\bullet\bullet
\end{aligned} \tag{1.44}$$

and, apparently, describe the inverse element (1.42) in the Butcher group.

² Not to be confused with E. Hopf, the discoverer of the "Hopf bifurcation".


III.2 Order Conditions for Partitioned Runge-Kutta 
Methods 

We now apply the ideas of the previous section to the creation of the order conditions 
for partitioned Runge-Kutta methods (II.2.2) of Sect. II.2. These results can then 
also be applied to Nystrom methods. 

III.2.1 Bi-Coloured Trees and P-Series 

Let us consider a partitioned system

$$\dot y = f(y,z), \qquad \dot z = g(y,z) \tag{2.1}$$

(non-autonomous problems can be brought into this form by appending $\dot t = 1$).
We start by computing the derivatives of its exact solution, which are to be inserted
into the Taylor series expansion. By analogy with (1.4) we obtain in this case the
derivatives of $y$ at $t_0$ as follows:

$$\begin{aligned}
\dot y &= f\\
\ddot y &= f_y f + f_z g\\
y^{(3)} &= f_{yy}(f,f) + 2f_{yz}(f,g) + f_{zz}(g,g) + f_y f_y f + f_y f_z g + f_z g_y f + f_z g_z g.
\end{aligned} \tag{2.2}$$

Here, $f_y, f_z, f_{yz},\ldots$ denote partial derivatives and all terms are to be evaluated at
$(y_0,z_0)$. Similar expressions are obtained for the derivatives of $z(t)$.

The terms occurring in these expressions are again called the elementary
differentials $F(\tau)(y,z)$. For their graphical representation as a tree $\tau$, we distinguish
between "black" vertices for representing an $f$ and "white" vertices for a $g$.
Upwards pointing branches represent partial derivatives, with respect to $y$ if the
branch leads to a black vertex, and with respect to $z$ if it leads to a white
vertex. With this convention, the graph to the right corresponds to the expression
$f_{zy}(g_{yz}(f,g),f)$ (see Table 2.1 for more examples).

We denote by $TP$ the set of graphs obtained by the above procedure, and we
call them (rooted) bi-coloured trees. The first graphs are $\bullet$ and $\circ$. By analogy with
Definition 1.1, we denote by
Table 2.1. Bi-coloured trees, elementary differentials, and coefficients

  |τ|   τ             α(τ)   F(τ)          γ(τ)   φ(τ)                        σ(τ)
   1    •              1     f              1     Σ_i b_i                      1
   2    [•]_y          1     f_y f          2     Σ_ij b_i a_ij                1
   2    [∘]_y          1     f_z g          2     Σ_ij b_i â_ij                1
   3    [•,•]_y        1     f_yy(f,f)      3     Σ_ijk b_i a_ij a_ik          2
   3    [•,∘]_y        2     f_yz(f,g)      3     Σ_ijk b_i a_ij â_ik          1
   3    [∘,∘]_y        1     f_zz(g,g)      3     Σ_ijk b_i â_ij â_ik          2
   3    [[•]_y]_y      1     f_y f_y f      6     Σ_ijk b_i a_ij a_jk          1
   3    [[∘]_y]_y      1     f_y f_z g      6     Σ_ijk b_i a_ij â_jk          1
   3    [[•]_z]_y      1     f_z g_y f      6     Σ_ijk b_i â_ij a_jk          1
   3    [[∘]_z]_y      1     f_z g_z g      6     Σ_ijk b_i â_ij â_jk          1
   1    ∘              1     g              1     Σ_i b̂_i                      1
   2    [•]_z          1     g_y f          2     Σ_ij b̂_i a_ij                1
  etc.  etc.                 etc.                 etc.

$$[\tau_1,\ldots,\tau_m]_y \quad\text{and}\quad [\tau_1,\ldots,\tau_m]_z, \qquad \tau_1,\ldots,\tau_m\in TP$$

the bi-coloured trees obtained by connecting the roots of $\tau_1,\ldots,\tau_m$ to a new root,
which is $\bullet$ in the first case, and $\circ$ in the second. Furthermore, we denote by $TP_y$
and $TP_z$ the subsets of $TP$ which are formed by trees with black and white roots,
respectively. Hence, the trees of $TP_y$ correspond to derivatives of $y(t)$, whereas
those of $TP_z$ correspond to derivatives of $z(t)$.

As in Definition 1.2 we denote the number of vertices of $\tau\in TP$ by $|\tau|$, the
order of $\tau$. The symmetry coefficient $\sigma(\tau)$ is again defined by

$$\sigma(\bullet) = \sigma(\circ) = 1,$$

and, for $\tau = [\tau_1,\ldots,\tau_m]_y$ or $\tau = [\tau_1,\ldots,\tau_m]_z$, by

$$\sigma(\tau) = \sigma(\tau_1)\cdot\ldots\cdot\sigma(\tau_m)\cdot\mu_1!\,\mu_2!\cdots, \tag{2.3}$$

where the integers $\mu_1,\mu_2,\ldots$ count equal trees among $\tau_1,\ldots,\tau_m\in TP$. This is
formally the same definition as in Sect. III.1. Observe, however, that $\sigma(\tau)$ depends
on the colouring of the vertices. For example, we have $\sigma([\bullet,\bullet]_y) = 2$, but $\sigma([\bullet,\circ]_y) = 1$.
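The dependence of the symmetry coefficient on the colouring is easy to check by machine. The following Python sketch is an illustration only; the encoding of a bi-coloured tree as a pair `(colour, children)` with colour `"y"` for a black and `"z"` for a white vertex, and the helper names `tree`, `order` and `sigma`, are our own assumptions.

```python
from collections import Counter
from math import factorial

def tree(colour, *children):
    """Bi-coloured tree with root colour "y" (black) or "z" (white); branches are sorted."""
    return (colour, tuple(sorted(children)))

def order(t):
    """|tau|, the number of vertices."""
    return 1 + sum(order(c) for c in t[1])

def sigma(t):
    """Symmetry coefficient (2.3): equal branches must agree in shape AND colouring."""
    result = 1
    for child, mult in Counter(t[1]).items():
        result *= sigma(child) ** mult * factorial(mult)
    return result

black, white = tree("y"), tree("z")
print(sigma(tree("y", black, black)), sigma(tree("y", black, white)))
```

Two branches count as equal only if their colourings coincide, which is why the mixed tree $[\bullet,\circ]_y$ has a smaller symmetry coefficient than $[\bullet,\bullet]_y$.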
By analogy with Definition 1.8 we have: 
Definition 2.1 (P-Series). For a mapping $a : TP\cup\{\emptyset_y,\emptyset_z\}\to\mathbb{R}$ a series of the
form

$$P\bigl(a,(y,z)\bigr) = \begin{pmatrix} a(\emptyset_y)\,y + \sum_{\tau\in TP_y}\frac{h^{|\tau|}}{\sigma(\tau)}\,a(\tau)\,F(\tau)(y,z)\\[1ex] a(\emptyset_z)\,z + \sum_{\tau\in TP_z}\frac{h^{|\tau|}}{\sigma(\tau)}\,a(\tau)\,F(\tau)(y,z) \end{pmatrix}$$

is called a P-series.

The following results correspond to Lemma 1.9 and formula (1.26). They are
obtained in exactly the same manner as the corresponding results for non-partitioned
Runge-Kutta methods (Sect. III.1). We therefore omit their proofs.

Lemma 2.2. Let $a : TP\cup\{\emptyset_y,\emptyset_z\}\to\mathbb{R}$ satisfy $a(\emptyset_y) = a(\emptyset_z) = 1$. Then

$$h\begin{pmatrix} f\bigl(P(a,(y,z))\bigr)\\ g\bigl(P(a,(y,z))\bigr)\end{pmatrix} = P\bigl(a',(y,z)\bigr),$$

where $a'(\emptyset_y) = a'(\emptyset_z) = 0$, $a'(\bullet) = a'(\circ) = 1$, and

$$a'(\tau) = a(\tau_1)\cdot\ldots\cdot a(\tau_m), \tag{2.4}$$

if either $\tau = [\tau_1,\ldots,\tau_m]_y$ or $\tau = [\tau_1,\ldots,\tau_m]_z$. □

Theorem 2.3 (P-Series of Exact Solution). The exact solution of (2.1) is a P-series
$(y(t_0+h), z(t_0+h)) = P(e,(y_0,z_0))$, where $e(\emptyset_y) = e(\emptyset_z) = 1$ and

$$e(\tau) = \frac{1}{\gamma(\tau)} \qquad\text{for all } \tau\in TP, \tag{2.5}$$

where the $\gamma(\tau)$ have the same values as for mono-coloured trees.


III.2.2 Order Conditions for Partitioned Runge-Kutta Methods 

The next result corresponds to Theorem 1.4 and is a consequence of Lemma 2.2. 


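Theorem 2.4 (P-Series of Numerical Solution). The numerical solution of a partitioned
Runge-Kutta method (II.2.2) is a P-series $(y_1,z_1) = P(\phi,(y_0,z_0))$, where
$\phi(\emptyset_y) = \phi(\emptyset_z) = 1$ and

$$\phi(\tau) = \begin{cases} \sum_{i=1}^{s} b_i\,\phi_i(\tau) & \text{for } \tau\in TP_y\\[0.5ex] \sum_{i=1}^{s} \hat b_i\,\phi_i(\tau) & \text{for } \tau\in TP_z. \end{cases} \tag{2.6}$$

The expression $\phi_i(\tau)$ is defined by $\phi_i(\bullet) = \phi_i(\circ) = 1$ and by

$$\phi_i(\tau) = \psi_i(\tau_1)\cdot\ldots\cdot\psi_i(\tau_m) \quad\text{with}\quad \psi_i(\tau_k) = \begin{cases} \sum_{j_k=1}^{s} a_{ij_k}\,\phi_{j_k}(\tau_k) & \text{if } \tau_k\in TP_y\\[0.5ex] \sum_{j_k=1}^{s} \hat a_{ij_k}\,\phi_{j_k}(\tau_k) & \text{if } \tau_k\in TP_z \end{cases} \tag{2.7}$$

for $\tau = [\tau_1,\ldots,\tau_m]_y$ or $\tau = [\tau_1,\ldots,\tau_m]_z$.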










Proof. These formulas result from Lemma 2.2 by writing $(hk_i, h\ell_i)$ from the formulas
(II.2.2) as a P-series $(hk_i, h\ell_i) = P(\phi_i,(y_0,z_0))$ so that

$$\Bigl(h\sum_j a_{ij}k_j,\ h\sum_j \hat a_{ij}\ell_j\Bigr) = P\bigl(\psi_i,(y_0,z_0)\bigr)$$

is also a P-series. Observe that equation (2.6) corresponds to (1.16) (where $g_i$ has to
be replaced with $\phi_i$) and that formula (2.7) comprises (1.13) and (1.15), where we
now write $\psi_i$ instead of $u_i$. □

The expressions $\phi(\tau)$ are shown in Table 2.1 for all trees in $TP_y$ up to order
$|\tau|\le 3$. A similar table must be added for trees in $TP_z$, where all roots are white
and all $b_i$ are replaced with $\hat b_i$. The general rule is the following: attach to every
vertex a summation index. Then, the expression $\phi(\tau)$ is a sum over all summation
indices with the summand being a product of $b_i$ or $\hat b_i$ (depending on whether the
root "$i$" is black or white) and of $a_{jk}$ (if "$k$" is black) or $\hat a_{jk}$ (if "$k$" is white), for
each vertex "$k$" directly above "$j$".

Theorem 2.5 (Order Conditions). A partitioned Runge-Kutta method (II.2.2) has
order $r$, i.e., $y_1 - y(t_0+h) = O(h^{r+1})$, $z_1 - z(t_0+h) = O(h^{r+1})$, if and only if

$$\phi(\tau) = \frac{1}{\gamma(\tau)} \qquad\text{for } \tau\in TP_y\cup TP_z \text{ with } |\tau|\le r. \tag{2.8}$$

Proof. This corresponds to Theorem 1.5 and is seen by comparing the expansions
of Theorems 2.4 and 2.3. □
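The recursion (2.6)-(2.7) and the order conditions (2.8) lend themselves to a direct check. The Python sketch below is an illustration under our own assumptions: bi-coloured trees are encoded as `(colour, children)` with `"y"` for black and `"z"` for white, all function names are hypothetical, and the coefficients used as test data are those of the two-stage Lobatto IIIA-IIIB pair, chosen by us as an example of a partitioned method of order 2.

```python
from fractions import Fraction as Fr

def forests(n):
    """All multisets (sorted tuples) of bi-coloured trees with n vertices in total."""
    if n == 0:
        return {()}
    out = set()
    for k in range(1, n + 1):
        for t in trees_of_order(k):
            for f in forests(n - k):
                out.add(tuple(sorted(f + (t,))))
    return out

def trees_of_order(n):
    """All bi-coloured trees with exactly n vertices."""
    return sorted((colour, f) for colour in "yz" for f in forests(n - 1))

def order(t):
    return 1 + sum(order(c) for c in t[1])

def gamma(t):
    g = order(t)
    for c in t[1]:
        g *= gamma(c)
    return g

def phi_i(t, A, Ahat):
    """The vector (phi_i(t))_i of (2.7)."""
    s = len(A)
    vals = [Fr(1)] * s
    for child in t[1]:
        M = A if child[0] == "y" else Ahat   # a_ij above a black vertex, hat a_ij above a white one
        inner = phi_i(child, A, Ahat)
        vals = [vals[i] * sum(M[i][j] * inner[j] for j in range(s)) for i in range(s)]
    return vals

def phi(t, A, Ahat, b, bhat):
    """phi(t) of (2.6): weights b for a black root, hat b for a white root."""
    w = b if t[0] == "y" else bhat
    return sum(wi * pi for wi, pi in zip(w, phi_i(t, A, Ahat)))

def satisfies_order(r, A, Ahat, b, bhat):
    """Check (2.8) for all bi-coloured trees with at most r vertices."""
    return all(phi(t, A, Ahat, b, bhat) == Fr(1, gamma(t))
               for n in range(1, r + 1) for t in trees_of_order(n))

half = Fr(1, 2)
A = [[0, 0], [half, half]]        # assumed example: Lobatto IIIA, s = 2
Ahat = [[half, 0], [half, 0]]     # assumed example: Lobatto IIIB, s = 2
b = bhat = [half, half]
print(satisfies_order(2, A, Ahat, b, bhat), satisfies_order(3, A, Ahat, b, bhat))
```

Exact rational arithmetic is used so that the comparison with $1/\gamma(\tau)$ is not affected by rounding; the enumeration is only meant for small orders.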

Example 2.6. We see that not only does every individual Runge-Kutta method have 
to be of order r, but also the so-called coupling conditions between the coefficients 
of both methods must hold. The order conditions mentioned above (see formulas 

(II.2.3) and (II.2.5)) correspond to the trees ^ and ^. For the tree sketched 

below we obtain 

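$$\sum_{i,j,k,l,m,n,p,q,r} b_i\,a_{ij}\,\hat a_{jm}\,a_{in}\,\hat a_{ik}\,a_{kl}\,\hat a_{lq}\,\hat a_{lr}\,a_{kp} = \frac{1}{9\cdot 2\cdot 5\cdot 3}$$

or, by using $\sum_j a_{ij} = c_i$ and $\sum_j \hat a_{ij} = \hat c_i$,

$$\sum_{i,j,k,l} b_i\,a_{ij}\,\hat c_j\,c_i\,\hat a_{ik}\,a_{kl}\,\hat c_l^{\,2}\,c_k = \frac{1}{270}.$$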

III.2.3 Order Conditions for Nyström Methods

A "modern" order theory for Nyström methods (II.2.11) of Sect. II.2.3 was first
given in 1976 by Hairer & Wanner (see Sect. II.14 of Hairer, Nørsett & Wanner
1993). Later it turned out that these conditions are obtained easily by applying the
theory of partitioned Runge-Kutta methods to the system

$$\dot y = z, \qquad \dot z = g(y,z), \tag{2.9}$$

which is of the form (2.1). This function has the partial derivative $f_z = I$ and all
other derivatives of $f$ are zero. As a consequence, many elementary differentials are
zero and the corresponding order conditions can be omitted. The only trees remaining
are those for which

"black vertices have at most one son and this son must be white".   (2.10)


Example 2.7. The tree sketched below apparently satisfies condition (2.10) and the 
corresponding order condition becomes, by Theorem 2.4 and formula (2.8), 


E 


hi 


1 i&ij CLjkQ j kmQ j knQ j kp ^jq^qr^rs^ji^it^tu^tv 


1 

13 - 12 - 4 - 3 - 2 - 4-3 ' 


Due to property (2.10), each $a_{ik}$ inside the tree comes with a
corresponding $\hat a_{kj}$, and by (2.10), both factors contract to an
$\bar a_{ij}$; similarly, the black root is only connected to one white
vertex, the corresponding $b_i\hat a_{ij}$ simplifies to $\bar b_j$. We thus get

bjajkC k CkCLj q a qs aj t c t = — ^455 * 

j,k,q,s,t 



Each of the above order conditions for a tree in TP y has a “twin” in TP Z of one 
order lower with the root cut off. For the above example this twin becomes 

hjdjk c k C k a jq a qs a jt c t = 345 ^ ‘ 

j,k,q,s,t 

We need only consider the trees in $TP_z$ if

$$\bar b_i = \hat b_i(1 - c_i)$$

is satisfied (see Lemma II.14.13 of Hairer, Nørsett & Wanner (1993), Sect. II.14).

Remark 2.8. Strictly speaking, the theory of partitioned methods is applicable to
Nyström methods only if the matrix $(\hat a_{ij})$ is invertible. However, since we arrive at
expansions with a finite number of algebraic conditions, we can recover the singular
case by a continuous perturbation of the coefficients.

Equations without Friction. Although condition (2.10) already eliminates many
order conditions, Nyström methods for the general problem $\ddot y = g(y,\dot y)$ cannot be
much better than an excellent Runge-Kutta method applied pairwise to system (2.9).
There is, however, an important special case where much more progress is possible,
namely equations of the type

$$\ddot y = g(y), \tag{2.11}$$

which corresponds to motion without friction. In this case, the function for $\dot z$ in (2.9)
is independent of $z$, and in addition to (2.10) we have a second condition, namely

"white vertices have only black sons".   (2.12)

Both conditions reduce the remaining trees drastically. Along each branch, there
occur alternating black and white vertices. Ramifications only happen at white
vertices. This case allows the construction of excellent numerical methods of high
orders. For example, the following 13 trees

[figure: the 13 bi-coloured trees satisfying (2.10) and (2.12)]

assure order 5, whereas ordinary Runge-Kutta theory requires 17 conditions for this
order. See Hairer, Nørsett & Wanner (1993), pages 291f, for tables, examples and
references.



III.3 Order Conditions for Composition Methods 

We have seen in the preceding chapter that composition methods of arbitrarily high 
order can be obtained with the use of Theorem II.4.1. However, as demonstrated in 
Fig. II.4.4, these methods are not attractive for high orders. This section is devoted 
to the derivation of order conditions, which then allow the construction of optimal 
high order composition methods. 

The order conditions for these methods are often derived via the Baker-Campbell- 
Hausdorff formula. This will be the subject of Sect. III.5 below. Only very recently, 
Murua & Sanz-Serna (1999) have found an elegant theory based on the idea of B- 
series. This paper has largely inspired the subsequent presentation. 

III.3.1 Introduction 

The principal tool in this section is the Taylor series expansion

$$\Phi_h(y) = y + h\,d_1(y) + h^2 d_2(y) + h^3 d_3(y) + \ldots \tag{3.1}$$

of the basic method. The only hypothesis which we require for this method is
consistency, i.e., that

$$d_1(y) = f(y). \tag{3.2}$$

All other functions $d_i(y)$ are arbitrary.

The underlying idea for obtaining the expansions for composition methods is, in
fact, very simple: we just insert the series (3.1), with varying values of $h$, into itself.
All our experience from Sect. III.1.2 with the insertion of a B-series into a function
will certainly be helpful. We demonstrate this for the case of the composition
$\Psi_h = \Phi_{\alpha_2 h}\circ\Phi_{\alpha_1 h}$. Applied to an initial value $y_0$, this gives with (3.1)

$$\begin{aligned}
y_1 &= \Phi_{\alpha_1 h}(y_0) = y_0 + h\alpha_1 d_1(y_0) + h^2\alpha_1^2 d_2(y_0) + \ldots\\
y_2 &= \Phi_{\alpha_2 h}(y_1) = y_1 + h\alpha_2 d_1(y_1) + h^2\alpha_2^2 d_2(y_1) + \ldots\,.
\end{aligned} \tag{3.3}$$

We now insert the first series into the second, in the same way as we did in (1.35).
Then, for example, the term $d_2(y_1)$ becomes

$$y_2 = \ldots + h^2\alpha_2^2\,d_2(y_0) + h^3\alpha_2^2\alpha_1\,d_2'(y_0)\,d_1(y_0) + h^4\alpha_2^2\alpha_1^2\,d_2'(y_0)\,d_2(y_0) + \frac{h^4}{2}\,\alpha_2^2\alpha_1^2\,d_2''(y_0)\bigl(d_1(y_0),d_1(y_0)\bigr) + \ldots \tag{3.4}$$


We see that we arrive at “generalized” B-series, where the elementary differentials 
contain not only one function, but are composed of infinitely many functions and 
their derivatives. We symbolize the four terms written in (3.4) by the trees 



This leads us to the following definition. 


Definition 3.1 (∞-Trees, $B_\infty$-series). We extend Definitions 1.1 and 1.2 to $T_\infty$,
the set of all rooted trees where each vertex bears a positive integer without any
further restriction, and use the notation

  ①, ②, ③, ...  the trees with one vertex;

  $\tau = [\tau_1,\ldots,\tau_m]_i$  the tree formed by a new root ⓘ connected to $\tau_1,\ldots,\tau_m$;

  $F(ⓘ)(y) = d_i(y)$;

  $F(\tau)(y) = d_i^{(m)}(y)\bigl(F(\tau_1)(y),\ldots,F(\tau_m)(y)\bigr)$  for $\tau$ as above;

  $|\tau| = 1 + |\tau_1| + \ldots + |\tau_m|$,  the number of vertices of $\tau$;

  $\|\tau\| = i + \|\tau_1\| + \ldots + \|\tau_m\|$,  the sum of the labels of $\tau$;

  $\sigma(\tau) = \mu_1!\,\mu_2!\cdot\ldots\cdot\sigma(\tau_1)\cdot\ldots\cdot\sigma(\tau_m)$,
      where $\mu_1,\mu_2,\ldots$ count equal trees among $\tau_1,\ldots,\tau_m$,
      the symmetry coefficient respecting the labels;

  $i(\tau) = i$,  the label of the root of $\tau$.

For a map $a : T_\infty\cup\{\emptyset\}\to\mathbb{R}$ we write

$$B_\infty(a,y) = a(\emptyset)\,y + \sum_{\tau\in T_\infty}\frac{h^{\|\tau\|}}{\sigma(\tau)}\,a(\tau)\,F(\tau)(y), \tag{3.5}$$

which extends the notion of B-series to the new situation.
Example 3.2. For the tree $\tau = [\tau_1,\tau_2]_4$, where $\tau_1 = ①$ and $\tau_2 = [⑤,⑥,⑥]_7$, we have

$$\begin{aligned}
&F(\tau)(y) = d_4''(y)\Bigl(d_1(y),\ d_7'''(y)\bigl(d_5(y),d_6(y),d_6(y)\bigr)\Bigr)\\
&\tau = [①,[⑤,⑥,⑥]_7]_4,\quad |\tau| = 6,\quad \|\tau\| = 29,\quad \sigma(\tau) = 2,\quad i(\tau) = 4.
\end{aligned} \tag{3.6}$$
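The quantities of Definition 3.1 can be evaluated recursively. The short Python sketch below is an illustration under our own assumptions (an ∞-tree is stored as a pair `(label, children)`, and the function names are hypothetical); it reproduces the values listed for the tree of Example 3.2.

```python
from collections import Counter
from math import factorial

def tree(label, *children):
    """Infinity-tree [children]_label; branches are stored sorted so that equal trees compare equal."""
    return (label, tuple(sorted(children)))

def num_vertices(t):
    """|tau|."""
    return 1 + sum(num_vertices(c) for c in t[1])

def weight(t):
    """||tau||, the sum of all labels."""
    return t[0] + sum(weight(c) for c in t[1])

def sigma(t):
    """Symmetry coefficient respecting the labels."""
    result = 1
    for child, mult in Counter(t[1]).items():
        result *= sigma(child) ** mult * factorial(mult)
    return result

def root_label(t):
    """i(tau)."""
    return t[0]

tau = tree(4, tree(1), tree(7, tree(5), tree(6), tree(6)))
print(num_vertices(tau), weight(tau), sigma(tau), root_label(tau))
```

The only symmetry of this tree is the exchange of the two leaves labelled 6, which accounts for the factor $2!$ in $\sigma(\tau)$.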

The above calculations for (3.4) are governed by the following lemma.

Lemma 3.3. For a series $B_\infty(a,y)$ with $a(\emptyset) = 1$ we have

$$h^i\,d_i\bigl(B_\infty(a,y)\bigr) = \sum_{\tau\in T_\infty,\ i(\tau)=i}\frac{h^{\|\tau\|}}{\sigma(\tau)}\,a'(\tau)\,F(\tau)(y), \tag{3.7}$$

where $a'(ⓘ) = 1$ and

$$a'(\tau) = a(\tau_1)\cdot\ldots\cdot a(\tau_m) \qquad\text{for } \tau = [\tau_1,\ldots,\tau_m]_i. \tag{3.8}$$

Proof. This is a straightforward extension of Lemma 1.9 with exactly the same
proof. □

The preceding lemma leads directly to the order conditions for composition 
methods. However, if we continue with compositions of the type (II.4.1), we arrive 
at conditions without real solutions. We therefore turn to compositions including the 
adjoint method as well. 


III.3.2 The General Case 

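As in (II.4.6), we consider

$$\Psi_h = \Phi_{\alpha_s h}\circ\Phi^*_{\beta_s h}\circ\ldots\circ\Phi_{\alpha_2 h}\circ\Phi^*_{\beta_2 h}\circ\Phi_{\alpha_1 h}\circ\Phi^*_{\beta_1 h} \tag{3.9}$$

and we obtain with the help of the above lemma the corresponding $B_\infty$-series.

Lemma 3.4 (Recurrence Relations). The following compositions are $B_\infty$-series

$$\begin{aligned}
\bigl(\Phi^*_{\beta_k h}\circ\Phi_{\alpha_{k-1}h}\circ\ldots\circ\Phi_{\alpha_1 h}\circ\Phi^*_{\beta_1 h}\bigr)(y) &= B_\infty(b_k,y)\\
\bigl(\Phi_{\alpha_k h}\circ\Phi^*_{\beta_k h}\circ\ldots\circ\Phi_{\alpha_1 h}\circ\Phi^*_{\beta_1 h}\bigr)(y) &= B_\infty(a_k,y).
\end{aligned} \tag{3.10}$$

Their coefficients are recursively given by $a_k(\emptyset) = 1$, $b_k(\emptyset) = 1$, $a_0(\tau) = 0$ for all
$\tau\in T_\infty$, and

$$\begin{aligned}
b_k(\tau) &= a_{k-1}(\tau) - (-\beta_k)^{i(\tau)}\,b_k'(\tau)\\
a_k(\tau) &= b_k(\tau) + \alpha_k^{i(\tau)}\,b_k'(\tau).
\end{aligned} \tag{3.11}$$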
Proof. The coefficients $a_0(\tau)$ correspond to the identity map $B_\infty(a_0,y) = y$. The
second formula of (3.11) follows from

$$B_\infty(a_k,y) = \Phi_{\alpha_k h}\bigl(B_\infty(b_k,y)\bigr) = B_\infty(b_k,y) + \sum_{i\ge 1} h^i\alpha_k^i\,d_i\bigl(B_\infty(b_k,y)\bigr),$$

and from an application of Lemma 3.3.

The relation $B_\infty(b_k,y) = \Phi^*_{\beta_k h}\bigl(B_\infty(a_{k-1},y)\bigr)$, which involves the adjoint
method, needs a little trick: we write it as $B_\infty(a_{k-1},y) = \Phi_{-\beta_k h}\bigl(B_\infty(b_k,y)\bigr)$
(remember that $\Phi^*_h = \Phi_{-h}^{-1}$), apply Lemma 3.3 again, and reverse the formula. This
gives the first equation of (3.11). □

Adding the equations of (3.11), we get

$$a_k(\tau) = a_{k-1}(\tau) + \bigl(\alpha_k^{i(\tau)} - (-\beta_k)^{i(\tau)}\bigr)\,b_k'(\tau). \tag{3.12}$$

Because of $b_k'(ⓘ) = 1$, we obtain

$$\begin{aligned}
a_k(ⓘ) &= \sum_{\ell=1}^{k}\bigl(\alpha_\ell^i - (-\beta_\ell)^i\bigr)\\
b_k(ⓘ) &= \sum_{\ell=1}^{k-1}\alpha_\ell^i - \sum_{\ell=1}^{k}(-\beta_\ell)^i = {\sum_{\ell=1}^{k}}'\bigl(\alpha_\ell^i - (-\beta_\ell)^i\bigr).
\end{aligned} \tag{3.13}$$

The fact that, for $b_k(ⓘ)$, the sum of $(-\beta_\ell)^i$ is from 1 to $k$, but the sum of $\alpha_\ell^i$ is only
from 1 to $k-1$, has been indicated by a prime attached to the summation symbol.
Continuing to apply the formulas (3.11) and (3.12) to more and more complicated 
trees, we quickly understand the general rule for the coefficients of an arbitrary tree. 

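Example 3.5. The tree $\tau$ in (3.6) gives

$$a_s(\tau) = \sum_{k=1}^{s}(\alpha_k^4 - \beta_k^4)\ {\sum_{\ell=1}^{k}}'(\alpha_\ell + \beta_\ell)\ {\sum_{m=1}^{k}}'(\alpha_m^7 + \beta_m^7)\Bigl({\sum_{n=1}^{m}}'(\alpha_n^5 + \beta_n^5)\Bigr)\Bigl({\sum_{p=1}^{m}}'(\alpha_p^6 - \beta_p^6)\Bigr)^2. \tag{3.14}$$

The Order Conditions. The exact solution of $\dot y = f(y)$ is a B-series $y(t_0+h) =
B(e,y_0)$ (see (1.26)). Since $d_1(y) = f(y)$, every B-series is also a $B_\infty$-series with
$e(\tau) = 0$ for trees with at least one label different from 1. Therefore, we also have
$y(t_0+h) = B_\infty(e,y_0)$, where the coefficients $e(\tau)$ satisfy $e(①) = 1$, $e(\tau) = 0$ if
$i(\tau) > 1$, and

$$e(\tau) = \frac{1}{|\tau|}\,e(\tau_1)\cdot\ldots\cdot e(\tau_m) \qquad\text{for } \tau = [\tau_1,\ldots,\tau_m]_1. \tag{3.15}$$
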
Theorem 3.6. The composition method $\Psi_h(y) = B_\infty(a_s,y)$ of (3.9) has order $p$ if

$$a_s(\tau) = e(\tau) \qquad\text{for } \tau\in T_\infty \text{ with } \|\tau\|\le p. \tag{3.16}$$

Proof. This follows from a comparison of the $B_\infty$-series for the numerical and the
exact solution. For the necessity of (3.16), the independence of the elementary
differentials has to be studied as in Exercise 3. □
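The recurrence (3.11) and the coefficients (3.15) are straightforward to program, which gives a direct way of testing the conditions (3.16) for a given set of $\alpha_k$, $\beta_k$. The following Python sketch is an illustration under our own assumptions: ∞-trees are stored as `(label, children)`, the function names are hypothetical, and the two coefficient sets used as examples (one half-step pair, and the "triple jump" applied to half-step pairs) are our choice of test data.

```python
from fractions import Fraction as Fr

def tree(label, *children):
    return (label, tuple(sorted(children)))

def size(t):
    return 1 + sum(size(c) for c in t[1])

def coefficients(t, alpha, beta):
    """Lists a, b with a[k] = a_k(t), b[k] = b_k(t) for k = 1..s, from (3.11); a[0] = a_0(t) = 0."""
    s = len(alpha)
    child_b = [coefficients(c, alpha, beta)[1] for c in t[1]]
    a, b = [0] * (s + 1), [None] * (s + 1)
    for k in range(1, s + 1):
        bprime = 1                          # b_k'(t): product of b_k over the subtrees, see (3.8)
        for cb in child_b:
            bprime *= cb[k]
        b[k] = a[k - 1] - (-beta[k - 1]) ** t[0] * bprime
        a[k] = b[k] + alpha[k - 1] ** t[0] * bprime
    return a, b

def a_s(t, alpha, beta):
    return coefficients(t, alpha, beta)[0][-1]

def e(t):
    """Coefficients of the exact solution, following (3.15)."""
    if t[0] != 1:
        return Fr(0)
    value = Fr(1, size(t))
    for c in t[1]:
        value *= e(c)
    return value

one = tree(1)
half = [Fr(1, 2)]                           # s = 1, alpha_1 = beta_1 = 1/2
print(a_s(tree(1, one), half, half), e(tree(1, one)))
print(a_s(tree(1, one, one), half, half), e(tree(1, one, one)))

g = 1 / (2 - 2 ** (1 / 3))                  # triple jump: gamma = (g, 1 - 2g, g)
al = be = [x / 2 for x in (g, 1 - 2 * g, g)]
print(a_s(tree(1, one, one), al, be))
```

With one half-step pair the condition for $[①]_1$ holds but the one for $[①,①]_1$ fails, in agreement with order 2; for the triple jump the value for $[①,①]_1$ agrees with $e(\tau) = 1/3$ up to rounding.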

III.3.3 Reduction of the Order Conditions 

The order conditions of the foregoing section are indeed beautiful, but for the moment
they are not of much use, because of the enormous number of trees in $T_\infty$ of
a certain order. For example, there are 166 trees in $T_\infty$ with $\|\tau\|\le 6$. Fortunately,
the equations are not all independent, as we shall see now.

Definition 3.7 (Butcher 1972, Murua & Sanz-Serna 1999). For two trees in $T_\infty$,
$u = [u_1,\ldots,u_m]_i$ and $v = [v_1,\ldots,v_l]_j$, we denote

$$u\circ v := [u_1,\ldots,u_m,v]_i, \qquad u\times v := [u_1,\ldots,u_m,v_1,\ldots,v_l]_{i+j} \tag{3.17}$$

and call them the Butcher product and merging product, respectively (see Fig. 3.1).




Fig. 3.1. The Butcher product and the merging product 


The merging product is associative and commutative, the Butcher product is
neither of the two. To simplify the notation, we write products of several factors
without parentheses, when we mean evaluation from left to right:

$$u\circ v_1\circ v_2\circ\ldots\circ v_s = \bigl(((u\circ v_1)\circ v_2)\circ\ldots\bigr)\circ v_s. \tag{3.18}$$

Here the factors $v_1,\ldots,v_s$ can be freely permuted.

All subsequent results concern properties of $a_k(\tau)$ as well as $b_k(\tau)$, valid for all
$k$. To avoid writing all formulas twice, we replace $a_k(\tau)$ and $b_k(\tau)$ everywhere by
a neutral symbol $c(\tau)$.
Lemma 3.8 (Switching Lemma). All $a_k$, $b_k$ of Lemma 3.4 satisfy, for all $u,v\in
T_\infty$, the relation

$$c(u\circ v) + c(v\circ u) = c(u)\cdot c(v) - c(u\times v). \tag{3.19}$$

Proof. The recursion formulas (3.11) are of the form

$$a(\tau) = b(\tau) + \alpha^{i(\tau)}\,b'(\tau). \tag{3.20}$$

We arrange this formula, for all five trees of Fig. 3.1, as follows:

$$\begin{aligned}
&a(u\circ v) + a(v\circ u) + a(u\times v) - a(u)a(v)\\
&\quad = b(u\circ v) + b(v\circ u) + b(u\times v) - b(u)b(v)\\
&\qquad + \alpha^{i(u)}b'(u\circ v) + \alpha^{i(v)}b'(v\circ u) + \alpha^{i(u)+i(v)}b'(u\times v)\\
&\qquad - \alpha^{i(u)}b'(u)b(v) - \alpha^{i(v)}b'(v)b(u) - \alpha^{i(u)+i(v)}b'(u)b'(v).
\end{aligned}$$

Because of $b'(u\circ v) = b'(u)b(v)$ and $b'(u\times v) = b'(u)b'(v)$, the last two rows
cancel, hence

$$a(\tau) \text{ satisfies (3.19)} \iff b(\tau) \text{ satisfies (3.19)}. \tag{3.21}$$

Thus, beginning with $a_0$, then $b_1$, then $a_1$, etc., all $a_k$ and $b_k$ must satisfy (3.19). □
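The relation (3.19) can be confirmed numerically for arbitrary coefficients. The Python sketch below is an illustration under our own assumptions: it stores ∞-trees as `(label, children)`, implements the two products of (3.17) and the recurrence (3.11) with hypothetical function names, and evaluates the defect of (3.19) in exact rational arithmetic for all $a_k$ and $b_k$; the particular values of $\alpha_k$, $\beta_k$ are arbitrary test data.

```python
from fractions import Fraction as Fr

def tree(label, *children):
    return (label, tuple(sorted(children)))

def butcher(u, v):
    """Butcher product u o v of (3.17): v becomes an additional branch at the root of u."""
    return (u[0], tuple(sorted(u[1] + (v,))))

def merge(u, v):
    """Merging product u x v of (3.17): the roots are merged and their labels added."""
    return (u[0] + v[0], tuple(sorted(u[1] + v[1])))

def coefficients(t, alpha, beta):
    """(a, b) with a[k] = a_k(t) and b[k] = b_k(t), k = 1..s, from the recurrence (3.11)."""
    s = len(alpha)
    child_b = [coefficients(c, alpha, beta)[1] for c in t[1]]
    a, b = [0] * (s + 1), [None] * (s + 1)
    for k in range(1, s + 1):
        bprime = 1
        for cb in child_b:
            bprime *= cb[k]
        b[k] = a[k - 1] - (-beta[k - 1]) ** t[0] * bprime
        a[k] = b[k] + alpha[k - 1] ** t[0] * bprime
    return a, b

def switching_defect(u, v, alpha, beta):
    """Largest |c(u o v) + c(v o u) - c(u) c(v) + c(u x v)| over c = a_k, b_k, k = 1..s."""
    cu, cv = coefficients(u, alpha, beta), coefficients(v, alpha, beta)
    cuv = coefficients(butcher(u, v), alpha, beta)
    cvu = coefficients(butcher(v, u), alpha, beta)
    cm = coefficients(merge(u, v), alpha, beta)
    worst = 0
    for which in (0, 1):                     # 0: the a_k, 1: the b_k
        for k in range(1, len(alpha) + 1):
            d = cuv[which][k] + cvu[which][k] - cu[which][k] * cv[which][k] + cm[which][k]
            worst = max(worst, abs(d))
    return worst

alpha = [Fr(1, 3), Fr(-1, 5), Fr(2, 7)]      # arbitrary test values
beta = [Fr(1, 2), Fr(3, 4), Fr(-1, 6)]
u = tree(2, tree(1))
v = tree(1, tree(3), tree(1))
print(switching_defect(u, v, alpha, beta))
```

Since the lemma holds for every choice of $\alpha_k$, $\beta_k$, no consistency or order assumption on the coefficients is needed in this check.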

The Switching Lemma 3.8 reduces considerably the number of order conditions.
Since the right-hand expression involves only trees with $|\tau| < |u\circ v|$, and since
relation (3.19) is also satisfied by $e(\tau)$, an induction argument shows that the order
conditions (3.16) for the trees $u\circ v$ and $v\circ u$ are equivalent. The operation
$u\circ v\mapsto v\circ u$ consists simply in switching the root from one vertex to the next. By repeating
this argument, we see that we can freely move the root inside the graph, and of all
these trees, only one needs to be retained. For order 6, for example, there remain 68
conditions out of the original 166.

Our next results show how relation (3.19) also generates a considerable amount
of reductions of the order conditions. These ideas (for the special situation of
symplectic methods) have already been exploited by Calvo & Hairer (1995b).

Lemma 3.9. Assume that all $b_k$ of Lemma 3.4 satisfy a relation of the form

$$\sum_{i=1}^{N} A_i\prod_{j=1}^{m_i} c(u_{ij}) = 0 \tag{3.22}$$

with all $m_i > 0$. Then, for any tree $w$, all $a_k$ and $b_k$ satisfy the relation

$$\sum_{i=1}^{N} A_i\,c(w\circ u_{i1}\circ u_{i2}\circ\ldots\circ u_{i,m_i}) = 0. \tag{3.23}$$

Proof. The relation (3.20), written for the tree $w\circ u_{i1}\circ u_{i2}\circ\ldots\circ u_{i,m_i}$, is

$$a(w\circ u_{i1}\circ\ldots\circ u_{i,m_i}) = b(w\circ u_{i1}\circ\ldots\circ u_{i,m_i}) + \alpha^{i(w)}\,b'(w)\,b(u_{i1})\cdot\ldots\cdot b(u_{i,m_i}).$$

Multiplying with $A_i$ and summing over $i$, this shows that, under the hypothesis
(3.22) for $b$, the relation (3.23) holds for $b$ if and only if it holds for $a$. The coefficients
$a_0(\tau) = 0$ for the identity map satisfy (3.22) and (3.23) because $m_i > 0$.
Starting from this, we again conclude (3.23) recursively for all $a_k$ and $b_k$. □

The following lemma³ extends formula (3.19) to the case of several factors.

Lemma 3.10. For any three trees $u$, $v$, $w$ all $a_k$, $b_k$ of Lemma 3.4 satisfy a relation

$$c(u\circ v\circ w) + c(v\circ u\circ w) + c(w\circ u\circ v) = c(u)\cdot c(v)\cdot c(w) + \ldots, \tag{3.24}$$

where the dots indicate a linear combination of products $\prod_j c(v_j)$ with $|v_1| + |v_2| +
\ldots < |u| + |v| + |w|$ and, for each term, at least one of the $v_j$ possesses a label
larger than one. The general formula, for $m$ trees $u_1,\ldots,u_m$, is

$$\sum_{i=1}^{m} c(u_i\circ u_1\circ\ldots\circ u_{i-1}\circ u_{i+1}\circ\ldots\circ u_m) = \prod_{i=1}^{m} c(u_i) + \ldots\,. \tag{3.25}$$

Proof. We apply Lemma 3.9 to (3.19) and obtain

$$c(w\circ(u\circ v)) + c(w\circ(v\circ u)) = c(w\circ u\circ v) - c(w\circ(u\times v)). \tag{3.26}$$

Next, we apply the Switching Lemma 3.8 to the trees to the left and get

$$\begin{aligned}
c(w\circ(u\circ v)) + c(u\circ v\circ w) &= c(w)\cdot c(u\circ v) - c(w\times(u\circ v))\\
c(w\circ(v\circ u)) + c(v\circ u\circ w) &= c(w)\cdot c(v\circ u) - c(w\times(v\circ u)).
\end{aligned}$$

Adding these formulas and subtracting (3.26) gives

$$c(u\circ v\circ w) + c(v\circ u\circ w) + c(w\circ u\circ v) = c(w)\bigl(c(u\circ v) + c(v\circ u)\bigr) + \ldots$$

which becomes (3.24) after another use of the Switching Lemma. Thereby, everything
which goes into "$+\ldots$" contains somewhere a merging product, whose roots
introduce necessarily labels larger than one.

Continuing like this, we get recursively (3.25) for all $m$. □

In order that the further simplifications do not turn into chaos, we fix, once and
for all, a total order relation (written $<$) on $T_\infty$, where we only require that the
order respects the number of vertices, i.e., that

$$u < v \quad\text{whenever}\quad |u| < |v|. \tag{3.27}$$

Similar to the strategy introduced by Hall (1950) for simplifying bracket expressions
in Lie algebras, we define the following subset of $T_\infty$.

³ Due to A. Murua, private communication, Feb. 2001.
Definition 3.11 (Hall Set). The Hall set corresponding to an order relation (3.27)
is a subset $\mathcal H\subset T_\infty$ defined by

  ⓘ $\in\mathcal H$ for $i = 1,2,3,\ldots$

  $\tau\in\mathcal H$ $\iff$ there exist $u,v\in\mathcal H$, $u > v$, such that $\tau = u\circ v$.

Example 3.12. The trees in the subsequent table are ordered from left to right with
respect to $|\tau|$, and from top to bottom within fixed $|\tau|$. There remain finally 22
conditions for order 6.


A Hall set $\mathcal H$ with $\|\tau\|\le 6$:

[table: the 22 trees of the Hall set with $\|\tau\|\le 6$, followed by examples of trees $\tau = u\circ v$ that are not in $\mathcal H$, excluded because $u = v$, because $u < v$, or because $u$ is not in $\mathcal H$; the tree graphics are not reproduced here]
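The numbers quoted in this section, 166 trees in $T_\infty$ with $\|\tau\|\le 6$ and 22 remaining conditions for order 6, can be reproduced by enumeration. The Python sketch below is an illustration under our own assumptions: ∞-trees are stored as `(label, children)`, the function names are hypothetical, and the total order used (first by number of vertices, ties broken by comparing the stored tuples) is just one admissible choice satisfying (3.27), not necessarily the one behind the table of this example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def trees(w):
    """All infinity-trees tau with ||tau|| == w, stored as (label, sorted tuple of branches)."""
    return tuple(sorted((i, f) for i in range(1, w + 1) for f in forests(w - i)))

@lru_cache(maxsize=None)
def forests(w):
    """All multisets (sorted tuples) of infinity-trees whose labels sum to w."""
    if w == 0:
        return frozenset({()})
    out = set()
    for k in range(1, w + 1):
        for t in trees(k):
            for f in forests(w - k):
                out.add(tuple(sorted(f + (t,))))
    return frozenset(out)

def size(t):
    return 1 + sum(size(c) for c in t[1])

def key(t):
    """One admissible total order (3.27): number of vertices first, then the tuple itself."""
    return (size(t), t)

@lru_cache(maxsize=None)
def in_hall(t):
    """Definition 3.11: single vertices, and trees u o v with u, v in the Hall set and u > v."""
    label, children = t
    if not children:
        return True
    for idx, v in enumerate(children):
        u = (label, children[:idx] + children[idx + 1:])
        if in_hall(u) and in_hall(v) and key(u) > key(v):
            return True
    return False

all_trees = [t for w in range(1, 7) for t in trees(w)]
hall = [t for t in all_trees if in_hall(t)]
print(len(all_trees), len(hall))
```

Every decomposition $\tau = u\circ v$ is obtained by detaching one branch $v$ from the root, so it suffices to try each branch in turn.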


Theorem 3.13 (Murua & Sanz-Serna 1999). For each $\tau\in T_\infty$ there are constants
$A_i$, integers $m_i$ and trees $u_{ij}\in\mathcal H$ such that for all $a_k$, $b_k$ of Lemma 3.4 we have

$$c(\tau) = \sum_{i=1}^{N} A_i\prod_{j=1}^{m_i} c(u_{ij}), \qquad |u_{i1}| + \ldots + |u_{i,m_i}| \le |\tau|. \tag{3.28}$$

Proof. We proceed by induction on $|\tau|$. For $\tau$ = ⓘ the statement is trivial, because
ⓘ $\in\mathcal H$. We thus consider $\tau\in T_\infty$ with $|\tau|\ge 2$, write it as $\tau = u\circ v$, and conclude
through the following two steps.

First Step. We apply the induction hypothesis (3.28) to $v$, i.e.,

$$c(v) = \sum_i B_i\prod_j c(v_{ij}), \qquad v_{ij}\in\mathcal H. \tag{3.29}$$

To this, we apply Lemma 3.9 followed by the Switching Lemma 3.8:

$$\begin{aligned}
c(\tau) = c(u\circ v) &= \sum_i B_i\,c(u\circ v_{i1}\circ v_{i2}\circ\ldots\circ v_{i,n_i})\\
&= -\sum_i B_i\,c\bigl(v_{i,n_i}\circ(u\circ v_{i1}\circ\ldots\circ v_{i,n_i-1})\bigr) + \ldots\,.
\end{aligned}$$



The "$\ldots$" indicate terms containing trees to which we can apply our induction
hypothesis. Inside the above expressions, we apply the induction hypothesis to the
trees $u\circ v_{i1}\circ\ldots\circ v_{i,n_i-1}$, followed once again by Lemma 3.9. We arrive at a huge
double sum which constitutes a linear combination of expressions of the form

$$c(u_1\circ u_2\circ\ldots\circ u_m) \tag{3.30}$$

and of terms "$+\ldots$" covered by the induction hypothesis. The point of the above
dodges was to make sure that all $u_1,u_2,\ldots,u_m$ are in $\mathcal H$.

Second Step. It remains to reduce an expression (3.30) to the form required by
(3.28). The trees $u_2,\ldots,u_m$ can be permuted arbitrarily; we arrange them in
increasing order $u_2\le\ldots\le u_m$.

Case 1. If $u_1 > u_2$, then by definition $u_1\circ u_2 =: w\in\mathcal H$ and we absorb the
second factor into the first and obtain a product $w\circ u_3\circ\ldots\circ u_m$ with fewer factors.

Case 2. If $u_1 < u_2\le\ldots$, we shuffle the factors with the help of Lemma 3.10
and obtain for (3.30) the expression

$$-\sum_{i=2}^{m} c(u_i\circ u_1\circ\ldots) + \prod_{i=1}^{m} c(u_i) + \ldots\,.$$

With the first terms we return to Case 1, the second term is precisely as in (3.28),
and the terms "$+\ldots$" are covered by the induction hypothesis.

Case 3. Now let $u_1 = u_2 <\ldots$. In this case, the formula (3.25) of Lemma 3.10
contains the term (3.30) twice. We group both together, so that (3.30) becomes

$$-\frac12\sum_{i=3}^{m} c(u_i\circ u_1\circ u_2\circ\ldots) + \frac12\prod_{i=1}^{m} c(u_i) + \ldots$$

and we go back to Case 1. If the first three trees are equal, we group three equal
terms together and so on.

The whole reduction process is repeated until all Butcher products have
disappeared. □

Theorem 3.14 (Murua & Sanz-Serna 1999). The composition method $\Psi_h(y) =
B_\infty(a_s,y)$ of (3.9) has order $p$ if and only if

$$a_s(\tau) = e(\tau) \qquad\text{for } \tau\in\mathcal H \text{ with } \|\tau\|\le p.$$

The coefficients $e(\tau)$ are those of Theorem 3.6.

Proof. We have seen in Sect. II.4 that composition methods of arbitrarily high order 
exist. Since the coefficients Ai of (3.28) do not depend on the mapping c(r), this 
together with Theorem 3.6 implies that the relation (3.28) is also satisfied by the 
mapping e for the exact solution. This proves the statement. □ 
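Example 3.15. The order conditions for orders $p = 1,\ldots,4$ become, with the trees
of Example 3.12 and the rule of (3.14), as follows:

$$\begin{aligned}
\text{Order 1:}\quad & ① & \sum_{k=1}^{s}(\alpha_k + \beta_k) &= 1\\
\text{Order 2:}\quad & ② & \sum_{k=1}^{s}(\alpha_k^2 - \beta_k^2) &= 0\\
\text{Order 3:}\quad & ③ & \sum_{k=1}^{s}(\alpha_k^3 + \beta_k^3) &= 0\\
 & [①]_2 & \sum_{k=1}^{s}(\alpha_k^2 - \beta_k^2)\,{\sum_{\ell=1}^{k}}'(\alpha_\ell + \beta_\ell) &= 0\\
\text{Order 4:}\quad & ④ & \sum_{k=1}^{s}(\alpha_k^4 - \beta_k^4) &= 0\\
 & [①]_3 & \sum_{k=1}^{s}(\alpha_k^3 + \beta_k^3)\,{\sum_{\ell=1}^{k}}'(\alpha_\ell + \beta_\ell) &= 0\\
 & [①,①]_2 & \sum_{k=1}^{s}(\alpha_k^2 - \beta_k^2)\Bigl({\sum_{\ell=1}^{k}}'(\alpha_\ell + \beta_\ell)\Bigr)^2 &= 0
\end{aligned} \tag{3.31}$$

where, as above, a prime attached to a summation symbol indicates that the sum of
$\alpha_\ell^i$ is only from 1 to $k-1$, whereas the sum of $(-\beta_\ell)^i$ is from 1 to $k$. Similarly, the
remaining trees of Example 3.12 with $\|\tau\| = 5$ and $\|\tau\| = 6$ give the additional
conditions for order 5 and 6.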
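The seven conditions up to order 4 are simple enough to be evaluated directly. The following Python sketch is an illustration under our own assumptions: the function names are hypothetical, the function returns the residuals "left-hand side minus right-hand side" grouped by order, and the two coefficient sets (a single half-step pair, and the triple jump with $\gamma_1 = \gamma_3 = 1/(2-2^{1/3})$ applied with $\alpha_k = \beta_k = \gamma_k/2$) are our choice of test data.

```python
from fractions import Fraction as Fr

def primed_sum(k, alpha, beta, i):
    """Primed sum of (3.13) for 1-based k: alpha_l^i for l < k, minus (-beta_l)^i for l <= k."""
    return sum(alpha[l] ** i for l in range(k - 1)) - sum((-beta[l]) ** i for l in range(k))

def order_condition_residuals(alpha, beta):
    """Residuals of the conditions of Example 3.15, grouped by order."""
    s = len(alpha)

    def d(k, i):                      # alpha_k^i - (-beta_k)^i with 0-based k
        return alpha[k] ** i - (-beta[k]) ** i

    def S1(k):                        # primed sum of (alpha_l + beta_l) up to the (k+1)-th pair
        return primed_sum(k + 1, alpha, beta, 1)

    return {
        1: [sum(d(k, 1) for k in range(s)) - 1],
        2: [sum(d(k, 2) for k in range(s))],
        3: [sum(d(k, 3) for k in range(s)),
            sum(d(k, 2) * S1(k) for k in range(s))],
        4: [sum(d(k, 4) for k in range(s)),
            sum(d(k, 3) * S1(k) for k in range(s)),
            sum(d(k, 2) * S1(k) ** 2 for k in range(s))],
    }

print(order_condition_residuals([Fr(1, 2)], [Fr(1, 2)]))

g = 1 / (2 - 2 ** (1 / 3))
al = be = [x / 2 for x in (g, 1 - 2 * g, g)]
print(order_condition_residuals(al, be))
```

For the single half-step pair the residuals of orders 1 and 2 vanish while the one for ③ does not; for the triple jump all seven residuals are zero up to rounding errors.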

We shall see in Sect. V.3 how further reductions and numerical values are obtained
under various assumptions of symmetry.


III.3.4 Order Conditions for Splitting Methods 


Splitting methods, introduced in Sect. II.5, are based on differential equations of the form

\[ \dot y = f_1(y) + f_2(y), \qquad (3.32) \]

where the flows $\varphi_t^{[1]}$ and $\varphi_t^{[2]}$ of the systems $\dot y = f_1(y)$ and $\dot y = f_2(y)$ are assumed to be known exactly. In this situation, the method

\[ \Phi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]} \]

is of first order and, together with its adjoint $\Phi_h^* = \varphi_h^{[2]} \circ \varphi_h^{[1]}$, can be used as the basic method in the composition (3.9). This yields

\[ \Psi_h = \varphi^{[1]}_{a_{s+1}h} \circ \varphi^{[2]}_{b_s h} \circ \varphi^{[1]}_{a_s h} \circ \cdots \circ \varphi^{[2]}_{b_2 h} \circ \varphi^{[1]}_{a_2 h} \circ \varphi^{[2]}_{b_1 h} \circ \varphi^{[1]}_{a_1 h} \qquad (3.33) \]

where

\[ b_i = \alpha_i + \beta_i, \qquad a_i = \alpha_{i-1} + \beta_i \qquad (3.34) \]

with the conventions $\alpha_0 = 0$ and $\beta_{s+1} = 0$. Consequently, the splitting method (3.33) is a special case of (3.9) and we have the following obvious result.

Theorem 3.16. Suppose that the composition method (3.9) is of order $p$ for all basic methods $\Phi_h$; then the splitting method (3.33) with $a_i$, $b_i$ given by (3.34) is of the same order $p$. □
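The coefficient map (3.34) behind Theorem 3.16 can be sketched in a few lines. This is an illustration only; the helper name `splitting_coefficients` is hypothetical.

```python
from fractions import Fraction


def splitting_coefficients(alpha, beta):
    """Map composition coefficients (alpha_i, beta_i) to (a_i, b_i) of (3.33) via (3.34)."""
    s = len(alpha)
    alpha_ext = [Fraction(0)] + list(alpha)   # convention alpha_0 = 0
    beta_ext = list(beta) + [Fraction(0)]     # convention beta_{s+1} = 0
    # a_i = alpha_{i-1} + beta_i for i = 1, ..., s+1 (list index i-1)
    a = [alpha_ext[i] + beta_ext[i] for i in range(s + 1)]
    # b_i = alpha_i + beta_i for i = 1, ..., s
    b = [alpha[i] + beta[i] for i in range(s)]
    return a, b


half = Fraction(1, 2)
a, b = splitting_coefficients([half], [half])
print(a, b)
```

For $s = 1$ and $\alpha_1 = \beta_1 = 1/2$ this gives $a = (1/2, 1/2)$, $b = (1)$, i.e. the Strang splitting $\varphi^{[1]}_{h/2} \circ \varphi^{[2]}_{h} \circ \varphi^{[1]}_{h/2}$.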

We now want to establish the reciprocal result. To every consistent splitting method (3.33), i.e., with coefficients satisfying $\sum_i a_i = \sum_i b_i = 1$, there exist unique $\alpha_i$, $\beta_i$ such that (3.34) holds. Does the corresponding composition method have the same order?

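Theorem 3.17. If a consistent splitting method (3.33) is of order $p$ at least for problems of the form (3.32) with the integrable splitting

\[ f_1(y) = \begin{pmatrix} g_1(y_2) \\ 0 \end{pmatrix}, \qquad f_2(y) = \begin{pmatrix} 0 \\ g_2(y_1) \end{pmatrix}, \qquad \text{where } y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad (3.35) \]

then the corresponding composition method has the same order $p$ for an arbitrary basic method $\Phi_h$.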

Proof. McLachlan (1995) proves this result in the setting of Lie algebras. We give here a proof using the tools of this section.

a) The flows corresponding to the two vector fields $f_1$ and $f_2$ of (3.35) are $\varphi_t^{[1]}(y) = y + t f_1(y)$ and $\varphi_t^{[2]}(y) = y + t f_2(y)$, respectively. Consequently, the method $\Phi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]}$ can be written in the form (3.1) with

\[ d_1(y) = f_1(y) + f_2(y), \qquad d_{k+1}(y) = \frac{1}{k!}\, f_1^{(k)}(y)\bigl(f_2(y), \ldots, f_2(y)\bigr). \qquad (3.36) \]

The idea is to construct, for every tree $\tau \in H$, functions $g_1(y_2)$ and $g_2(y_1)$ such that the first component of $F(\tau)(0)$ is non-zero whereas the first component of $F(\sigma)(0)$ vanishes for all $\sigma \in T_\infty$ different from $\tau$. This construction will be explained in part (b) below. Since the local error of the composition method is a $B_\infty$-series with coefficients $a_s(\tau) - e(\tau)$, this implies that the order conditions for $\tau \in H$ with $\|\tau\| \le p$ are necessary already for this very special class of problems. Theorem 3.14 thus proves the statement.

b) For the construction of the functions $g_1(y_2)$ and $g_2(y_1)$ we have to understand the structure of $F(\tau)(y)$ with $d_k(y)$ given by (3.36). Consider for example the tree $\tau \in H$ of Fig. 3.2, for which we have $F(\tau)(y) = d_2''(y)\bigl(d_1(y), d_3(y)\bigr)$. Inserting $d_k(y)$ from (3.36), we get by Leibniz' rule a linear combination of eight expressions ($i \in \{1,2\}$)

\[
\begin{array}{ll}
f_1'''\bigl(f_2, f_i, f_1''(f_2, f_2)\bigr), & f_1''\bigl(f_2' f_i, f_1''(f_2, f_2)\bigr), \\[4pt]
f_1''\bigl(f_i, f_2' f_1''(f_2, f_2)\bigr), & f_1' f_2''\bigl(f_i, f_1''(f_2, f_2)\bigr),
\end{array}
\]

Fig. 3.2. Trees for illustrating the equivalence of the order conditions between composition and splitting methods

each of which can be identified with a bi-coloured tree (see Sect. III.2.1, a vertex • corresponds to $f_1$ and ∘ to $f_2$). The trees corresponding to these expressions with $i = 1$ are shown in Fig. 3.2. Due to the special form of $d_k(y)$ in (3.36) and due to the fact that in trees of the Hall set $H$ the vertex ① can appear only at the end of a branch, there is always at least one bi-coloured tree where the vertices • are separated by those of ∘ and vice versa. We now select such a tree, denoted by $\tau_b$, and we label the black and white vertices with $\{1, 2, \ldots\}$. We then let $y_1 = (y_1^1, \ldots, y_n^1)^T$ and $y_2 = (y_1^2, \ldots, y_m^2)^T$, where $n$ and $m$ are the numbers of vertices • and ∘ in $\tau_b$, respectively. Inspired by "Exercise 4" of Hairer, Nørsett & Wanner (1993), page 155, we define the $i$th component of $g_1(y_2)$ as the product of all $y_j^2$ where $j$ runs through the labels of the vertices ∘ directly above the vertex • with label $i$. The function $g_2(y_1)$ is defined similarly. For the example of Fig. 3.2, the tree
U yields / 2 \ 

51(2/2) = I 2/22/3 I > 52(2/1) = 

One can check that with this construction the bi-coloured tree $\tau_b$ is the only one for which the first component of the elementary differential evaluated at $y = 0$ is different from zero. This in turn implies that among all trees of $T_\infty$ only the tree $\tau$ has a non-vanishing first component in its elementary differential. □

Necessity of Negative Steps for Higher Order. One notices that all the composition methods (II.4.6) of order higher than two with $\Phi_h$ given by (II.5.7) lead to a splitting (II.5.6) where at least one of the coefficients $a_i$ and $b_i$ is negative. This may be undesirable, especially when the flow $\varphi_t^{[1]}$ originates from a partial differential equation that is ill-posed for negative time progression. The following result has been proved independently by Sheng (1989) and Suzuki (1991) (see also Goldman & Kaper (1996)). We present the elegant proof found by Blanes & Casas (2005).

Theorem 3.18. If the splitting method (II.5.6) is of order $p \ge 3$ for general $f^{[1]}$ and $f^{[2]}$, then at least one of the $a_i$ and at least one of the $b_i$ are strictly negative.

Proof. The condition in equation (3.31) for the tree ③ reads

\[ \sum_{k=1}^{s} (\alpha_k^3 + \beta_k^3) = 0 \qquad \text{or also} \qquad \sum_{k=1}^{s+1} (\alpha_{k-1}^3 + \beta_k^3) = 0 \]

(remember that $\alpha_0 = 0$ and $\beta_{s+1} = 0$). Now apply the fact that $x^3 + y^3 < 0$ implies $x + y < 0$ and conclude with formulas (3.34). □
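The last step of this proof is terse. Spelled out, and using the consistency condition $\sum_k (\alpha_k + \beta_k) = 1$ (the order-1 condition of (3.31)) as an additional ingredient, the argument might run as follows. If every term $\alpha_k^3 + \beta_k^3$ were zero, then $\alpha_k = -\beta_k$ for all $k$, which contradicts consistency; since the terms sum to zero, at least one of them is negative:

\[
\sum_{k=1}^{s} (\alpha_k^3 + \beta_k^3) = 0
\;\Longrightarrow\;
\exists\, k:\ \alpha_k^3 + \beta_k^3 < 0
\;\Longrightarrow\;
b_k = \alpha_k + \beta_k < 0 .
\]

The same reasoning applied to the shifted sum, whose terms also satisfy $\sum_{k=1}^{s+1} (\alpha_{k-1} + \beta_k) = 1$, gives

\[
\sum_{k=1}^{s+1} (\alpha_{k-1}^3 + \beta_k^3) = 0
\;\Longrightarrow\;
\exists\, k:\ \alpha_{k-1}^3 + \beta_k^3 < 0
\;\Longrightarrow\;
a_k = \alpha_{k-1} + \beta_k < 0 .
\]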
III.4 The Baker-Campbell-Hausdorff Formula

This section treats the Baker-Campbell-Hausdorff (short BCH or CBH) formula on 
the composition of exponentials. It was proposed in 1898 by J.E. Campbell and 
proved independently by Baker (1905) and Hausdorff (1906). This formula will 
provide an alternative approach to the order conditions of composition (Sect. II.4) 
and splitting methods (Sect. II.5). For its derivation we shall use the inverse of the 
derivative of the exponential function. 

III.4.1 Derivative of the Exponential and Its Inverse 

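Elegant formulas for the derivative of exp and for its inverse can be obtained by the use of matrix commutators $[\Omega, A] = \Omega A - A \Omega$. If we suppose $\Omega$ fixed, this expression defines a linear operator $A \mapsto [\Omega, A]$

\[ \mathrm{ad}_\Omega(A) = [\Omega, A], \qquad (4.1) \]

which is called the adjoint operator (see Varadarajan (1974), Sect. 2.13). Let us start by computing the derivatives of $\Omega^k$. The product rule for differentiation becomes

\[ \Bigl( \frac{d}{d\Omega}\, \Omega^k \Bigr) H = H \Omega^{k-1} + \Omega H \Omega^{k-2} + \ldots + \Omega^{k-1} H, \qquad (4.2) \]

and this equals $k H \Omega^{k-1}$ if $\Omega$ and $H$ commute. Therefore, it is natural to write (4.2) as $k H \Omega^{k-1}$ to which are added correction terms involving commutators and iterated commutators. In the cases $k = 2$ and $k = 3$ we have

\[
\begin{aligned}
H\Omega + \Omega H &= 2 H \Omega + \mathrm{ad}_\Omega(H) \\
H\Omega^2 + \Omega H \Omega + \Omega^2 H &= 3 H \Omega^2 + 3 \bigl(\mathrm{ad}_\Omega(H)\bigr)\Omega + \mathrm{ad}_\Omega^2(H),
\end{aligned}
\]

where $\mathrm{ad}_\Omega^i$ denotes the iterated application of the linear operator $\mathrm{ad}_\Omega$. With the convention $\mathrm{ad}_\Omega^0(H) = H$ we obtain by induction on $k$ that

\[ \Bigl( \frac{d}{d\Omega}\, \Omega^k \Bigr) H = \sum_{i=0}^{k-1} \binom{k}{i+1} \bigl( \mathrm{ad}_\Omega^i(H) \bigr)\, \Omega^{k-i-1}. \qquad (4.3) \]

This is seen by applying Leibniz' rule to $\Omega^{k+1} = \Omega \cdot \Omega^k$ and by using the identity

\[ \Omega \bigl( \mathrm{ad}_\Omega^i(H) \bigr) = \bigl( \mathrm{ad}_\Omega^i(H) \bigr) \Omega + \mathrm{ad}_\Omega^{i+1}(H). \]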

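Lemma 4.1. The derivative of $\exp \Omega = \sum_{k \ge 0} \frac{1}{k!}\, \Omega^k$ is given by

\[ \Bigl( \frac{d}{d\Omega} \exp \Omega \Bigr) H = \bigl( \mathrm{dexp}_\Omega(H) \bigr) \exp \Omega, \]

where

\[ \mathrm{dexp}_\Omega(H) = \sum_{k \ge 0} \frac{1}{(k+1)!}\, \mathrm{ad}_\Omega^k(H). \qquad (4.4) \]

The series (4.4) converges for all matrices $\Omega$.

Proof. Multiplying (4.3) by $(k!)^{-1}$ and summing, then exchanging the sums and putting $j = k - i - 1$ yields

\[
\Bigl( \frac{d}{d\Omega} \exp \Omega \Bigr) H
= \sum_{k \ge 0} \frac{1}{k!} \sum_{i=0}^{k-1} \binom{k}{i+1} \bigl( \mathrm{ad}_\Omega^i(H) \bigr)\, \Omega^{k-i-1}
= \sum_{i \ge 0} \sum_{j \ge 0} \frac{1}{(i+1)!\, j!} \bigl( \mathrm{ad}_\Omega^i(H) \bigr)\, \Omega^{j}.
\]

The convergence of the series follows from the boundedness of the linear operator $\mathrm{ad}_\Omega$ (we have $\|\mathrm{ad}_\Omega\| \le 2 \|\Omega\|$). □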
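Lemma 4.1 can be checked numerically by comparing a central finite difference of $\exp$ in the direction $H$ with the truncated series (4.4). The sketch below is only an illustration: the small matrix helpers (`matmul`, `add`, `comm`, `expm`, `dexp`) are written here for the purpose and the test matrices are arbitrary choices.

```python
import math


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(X, Y, c=1.0):
    """Return X + c*Y."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def comm(X, Y):
    """Matrix commutator [X, Y] = XY - YX."""
    return add(matmul(X, Y), matmul(Y, X), -1.0)


def expm(X, terms=40):
    """Matrix exponential by its Taylor series (adequate for small matrices of moderate norm)."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, X)]
        result = add(result, term)
    return result


def dexp(Omega, H, terms=40):
    """Truncated series (4.4): sum over k of ad_Omega^k(H) / (k+1)!."""
    total = [[0.0] * len(H) for _ in H]
    ad_k = H
    for k in range(terms):
        total = add(total, ad_k, 1.0 / math.factorial(k + 1))
        ad_k = comm(Omega, ad_k)
    return total


Omega = [[0.3, -1.0], [0.5, 0.2]]
H = [[1.0, 2.0], [-0.5, 0.7]]
eps = 1e-6
diff = add(expm(add(Omega, H, eps)), expm(add(Omega, H, -eps)), -1.0)
finite_difference = [[v / (2 * eps) for v in row] for row in diff]
formula = matmul(dexp(Omega, H), expm(Omega))
err = max(abs(f - g) for rf, rg in zip(finite_difference, formula) for f, g in zip(rf, rg))
print(err)
```

The discrepancy `err` should be at the level of the finite-difference error, far below the size of the entries themselves.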

Lemma 4.2 (Baker 1905). If the eigenvalues of the linear operator $\mathrm{ad}_\Omega$ are different from $2\ell\pi i$ with $\ell \in \{\pm 1, \pm 2, \ldots\}$, then $\mathrm{dexp}_\Omega$ is invertible. Furthermore, we have for $\|\Omega\| < \pi$ that

\[ \mathrm{dexp}_\Omega^{-1}(H) = \sum_{k \ge 0} \frac{B_k}{k!}\, \mathrm{ad}_\Omega^k(H), \qquad (4.5) \]

where $B_k$ are the Bernoulli numbers, defined by $\sum_{k \ge 0} (B_k / k!)\, x^k = x / (e^x - 1)$.

Proof. The eigenvalues of $\mathrm{dexp}_\Omega$ are $\mu = \sum_{k \ge 0} \lambda^k / (k+1)! = (e^\lambda - 1)/\lambda$, where $\lambda$ is an eigenvalue of $\mathrm{ad}_\Omega$. By our assumption, the values $\mu$ are non-zero, so that $\mathrm{dexp}_\Omega$ is invertible. By definition of the Bernoulli numbers, the composition of (4.5) with (4.4) gives the identity. Convergence for $\|\Omega\| < \pi$ follows from $\|\mathrm{ad}_\Omega\| \le 2\|\Omega\|$ and from the fact that the radius of convergence of the series for $x/(e^x - 1)$ is $2\pi$. □
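The Bernoulli numbers in (4.5) and the claim that (4.5) composed with (4.4) gives the identity can be illustrated with exact rational arithmetic. The sketch below uses the standard recurrence $\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0$ for $m \ge 1$, which is an assumption of this illustration (it is equivalent to the generating function $x/(e^x - 1)$, but is not stated in the text); the function name `bernoulli` is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli(n_max):
    """B_0, ..., B_n_max with generating function x/(e^x - 1), so that B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        # sum_{j=0}^{m} C(m+1, j) B_j = 0 determines B_m from B_0, ..., B_{m-1}
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-acc / (m + 1))
    return B


B = bernoulli(8)

# Cauchy product of the scalar series behind (4.5) and (4.4):
# (sum_k B_k/k! x^k) * (sum_j x^j/(j+1)!) should be identically 1.
product = [
    sum(B[k] / factorial(k) * Fraction(1, factorial(n - k + 1)) for k in range(n + 1))
    for n in range(9)
]
print(B)
print(product)
```

The product series has constant term 1 and all higher coefficients zero, which is the scalar counterpart of the statement that the operators in (4.5) and (4.4) are inverse to each other.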


III.4.2 The BCH Formula 

Let $A$ and $B$ be two arbitrary (in general non-commuting) matrices. The problem is to find a matrix $C(t)$, such that

\[ \exp(tA) \exp(tB) = \exp C(t). \qquad (4.6) \]

In order to get a first idea of the form of $C(t)$, we develop the expression to the left in a series: $\exp(tA)\exp(tB) = I + t(A + B) + \frac{t^2}{2}(A^2 + 2AB + B^2) + O(t^3) =: I + X$. For sufficiently small $t$ (hence $\|X\|$ is small), the series expansion of the logarithm $\log(I + X) = X - X^2/2 + \ldots$ yields a matrix $C(t) = \log(I + X) = t(A + B) + \frac{t^2}{2}\bigl(A^2 + 2AB + B^2 - (A + B)^2\bigr) + O(t^3)$, which satisfies (4.6). This series has a positive radius of convergence, because it is obtained by elementary operations of convergent series.
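As a small worked step, the $t^2$ coefficient found above already has the form of a commutator:

\[
A^2 + 2AB + B^2 - (A + B)^2 = 2AB - AB - BA = [A, B],
\qquad\text{so that}\qquad
C(t) = t(A + B) + \frac{t^2}{2}\,[A, B] + O(t^3).
\]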

The main problem of the derivation of the BCH formula is to get explicit formulas for the coefficients of the series for $C(t)$, and to express the coefficients of $t^2, t^3, \ldots$ in terms of commutators. With the help of the following lemma, recurrence relations for these coefficients will be obtained, which allow for an easy computation of the first terms.

John Edward Campbell 4 Henry Frederick Baker 5 Felix Hausdorff 6


Lemma 4.3. Let $A$ and $B$ be (non-commuting) matrices. Then, (4.6) holds, where $C(t)$ is the solution of the differential equation

\[ \dot C = A + B + \frac{1}{2}\,[A - B, C] + \sum_{k \ge 2} \frac{B_k}{k!}\, \mathrm{ad}_C^k (A + B) \qquad (4.7) \]

with initial value $C(0) = 0$. Recall that $\mathrm{ad}_C A = [C, A] = CA - AC$, and that $B_k$ denote the Bernoulli numbers as in Lemma 4.2.


Proof. We follow Varadarajan (1974), Sect. 2.15, and we consider for small $s$ and $t$ a smooth matrix function $Z(s,t)$ such that

\[ \exp(sA) \exp(tB) = \exp Z(s,t). \qquad (4.8) \]

Using Lemma 4.1, the derivative of (4.8) with respect to $s$ is

\[ A \exp(sA) \exp(tB) = \mathrm{dexp}_{Z(s,t)}\Bigl( \frac{\partial Z}{\partial s}(s,t) \Bigr) \exp Z(s,t), \]

so that

\[ \frac{\partial Z}{\partial s} = \mathrm{dexp}_Z^{-1}(A) = A - \frac{1}{2}\,[Z, A] + \sum_{k \ge 2} \frac{B_k}{k!}\, \mathrm{ad}_Z^k(A). \qquad (4.9) \]

We next take the inverse of (4.8)

\[ \exp(-tB) \exp(-sA) = \exp\bigl(-Z(s,t)\bigr), \]

and differentiate this relation with respect to $t$. As above we get

\[ \frac{\partial Z}{\partial t} = \mathrm{dexp}_{-Z}^{-1}(B) = B + \frac{1}{2}\,[Z, B] + \sum_{k \ge 2} \frac{B_k}{k!}\, \mathrm{ad}_Z^k(B), \qquad (4.10) \]

because $\mathrm{ad}_{-Z}^k(B) = (-1)^k\, \mathrm{ad}_Z^k(B)$ and the Bernoulli numbers satisfy $B_k = 0$ for odd $k > 2$. A comparison of (4.6) with (4.8) gives $C(t) = Z(t,t)$. The stated differential equation for $C(t)$ therefore follows from $\dot C(t) = \frac{\partial Z}{\partial s}(t,t) + \frac{\partial Z}{\partial t}(t,t)$, and from adding the relations (4.9) and (4.10). □

4 John Edward Campbell, born: 27 May 1862 in Lisburn, Co Antrim (Ireland), died: 1 October 1924 in Oxford (England).

5 Henry Frederick Baker, born: 3 July 1866 in Cambridge (England), died: 17 March 1956 in Cambridge.

6 Felix Hausdorff, born: 8 November 1869 in Breslau, Silesia (now Wrocław, Poland), died: 26 January 1942 in Bonn (Germany).

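Using Lemma 4.3 we can compute the first Taylor coefficients of $C(t)$,

\[ \exp(tA) \exp(tB) = \exp\bigl( tC_1 + t^2 C_2 + t^3 C_3 + t^4 C_4 + t^5 C_5 + \ldots \bigr). \qquad (4.11) \]

Inserting this expansion of $C(t)$ into (4.7) and comparing like powers of $t$ gives

\[
\begin{aligned}
C_1 &= A + B \\
C_2 &= \frac{1}{4}\,[A - B, A + B] = \frac{1}{2}\,[A, B] \\
C_3 &= \frac{1}{6}\,\Bigl[A - B, \frac{1}{2}[A, B]\Bigr] = \frac{1}{12}\,\bigl[A, [A, B]\bigr] + \frac{1}{12}\,\bigl[B, [B, A]\bigr] \\
C_4 &= \ldots = \frac{1}{24}\,\bigl[A, [B, [B, A]]\bigr] \\
C_5 &= \ldots = -\frac{1}{720}\,\bigl[A, [A, [A, [A, B]]]\bigr] - \frac{1}{720}\,\bigl[B, [B, [B, [B, A]]]\bigr] \\
&\qquad\quad + \frac{1}{360}\,\bigl[A, [B, [B, [B, A]]]\bigr] + \frac{1}{360}\,\bigl[B, [A, [A, [A, B]]]\bigr] \\
&\qquad\quad + \frac{1}{120}\,\bigl[A, [A, [B, [B, A]]]\bigr] + \frac{1}{120}\,\bigl[B, [B, [A, [A, B]]]\bigr].
\end{aligned}
\qquad (4.12)
\]

Here, the dots ... in the formulas for $C_4$ and $C_5$ indicate simplifications with the help of the Jacobi identity

\[ [A, [B, C]] + [C, [A, B]] + [B, [C, A]] = 0, \qquad (4.13) \]

which is verified by straightforward calculation. For higher order the expressions soon become very complicated.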
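The coefficients (4.12) can be tested numerically: if they are correct, the difference between $\exp(tA)\exp(tB)$ and the exponential of the series truncated after $t^5 C_5$ should behave like $O(t^6)$. The following sketch is an illustration under stated assumptions: the matrix helpers and the two $3 \times 3$ test matrices are arbitrary choices made here, and the matrix exponential is a plain Taylor sum.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lincomb(*pairs):
    """Sum of c * X over the given (c, X) pairs."""
    n = len(pairs[0][1])
    return [[sum(c * X[i][j] for c, X in pairs) for j in range(n)] for i in range(n)]


def comm(X, Y):
    return lincomb((1.0, matmul(X, Y)), (-1.0, matmul(Y, X)))


def nested(*mats):
    """Right-nested commutator [X1, [X2, [..., Xn]]]."""
    out = mats[-1]
    for X in reversed(mats[:-1]):
        out = comm(X, out)
    return out


def expm(X, terms=30):
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = result
    for k in range(1, terms):
        term = lincomb((1.0 / k, matmul(term, X)))
        result = lincomb((1.0, result), (1.0, term))
    return result


def bch_error(A, B, t):
    """Max-norm difference between exp(tA)exp(tB) and exp of the series (4.11) up to t^5."""
    C = [
        lincomb((1.0, A), (1.0, B)),
        lincomb((0.5, comm(A, B))),
        lincomb((1 / 12, nested(A, A, B)), (1 / 12, nested(B, B, A))),
        lincomb((1 / 24, nested(A, B, B, A))),
        lincomb((-1 / 720, nested(A, A, A, A, B)), (-1 / 720, nested(B, B, B, B, A)),
                (1 / 360, nested(A, B, B, B, A)), (1 / 360, nested(B, A, A, A, B)),
                (1 / 120, nested(A, A, B, B, A)), (1 / 120, nested(B, B, A, A, B))),
    ]
    series = lincomb(*((t ** (k + 1), Ck) for k, Ck in enumerate(C)))
    lhs = matmul(expm(lincomb((t, A))), expm(lincomb((t, B))))
    rhs = expm(series)
    return max(abs(x - y) for rx, ry in zip(lhs, rhs) for x, y in zip(rx, ry))


A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.5]]
B = [[0.2, 0.0, 1.0], [1.0, -0.3, 0.0], [0.0, 0.7, 0.0]]
e1, e2 = bch_error(A, B, 0.1), bch_error(A, B, 0.05)
print(e1, e2, e1 / e2)
```

Halving $t$ should reduce the error by a factor of roughly $2^6$; a markedly smaller ratio would point to a wrong coefficient among $C_1, \ldots, C_5$.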

The Symmetric BCH Formula. For the construction of symmetric splitting methods it is convenient to use a formula for the composition

\[ \exp\Bigl(\frac{t}{2} A\Bigr) \exp(tB) \exp\Bigl(\frac{t}{2} A\Bigr) = \exp\bigl( tS_1 + t^3 S_3 + t^5 S_5 + \ldots \bigr). \qquad (4.14) \]

Since the inverse of the left-hand side is obtained by changing the sign of $t$, the same must be true for the right-hand side. This explains why only odd powers of $t$ are present in (4.14). Applying the BCH formula (4.11) to $\exp(\frac{t}{2}A)\exp(\frac{t}{2}B) = \exp C(t)$ and a second time to $\exp\bigl(C(t)\bigr)\exp\bigl(-C(-t)\bigr)$ yields for the coefficients of (4.14) (Yoshida 1990)

\[
\begin{aligned}
S_1 &= A + B \\
S_3 &= -\frac{1}{24}\,\bigl[A, [A, B]\bigr] + \frac{1}{12}\,\bigl[B, [B, A]\bigr] \\
S_5 &= \frac{7}{5760}\,\bigl[A, [A, [A, [A, B]]]\bigr] - \frac{1}{720}\,\bigl[B, [B, [B, [B, A]]]\bigr] \\
&\quad + \frac{1}{360}\,\bigl[A, [B, [B, [B, A]]]\bigr] + \frac{1}{360}\,\bigl[B, [A, [A, [A, B]]]\bigr] \\
&\quad - \frac{1}{480}\,\bigl[A, [A, [B, [B, A]]]\bigr] + \frac{1}{120}\,\bigl[B, [B, [A, [A, B]]]\bigr].
\end{aligned}
\]


III.5 Order Conditions via the BCH Formula 

Using the BCH formula we present an alternative approach to the order conditions of splitting and composition methods. The main idea is to write the flow of a differential equation formally as the exponential of the Lie derivative.


III.5.1 Calculus of Lie Derivatives 


For a differential equation

\[ \dot y = f^{[1]}(y) + f^{[2]}(y), \]

it is convenient to study the composition of the flows $\varphi_t^{[1]}$ and $\varphi_t^{[2]}$ of the systems

\[ \dot y = f^{[1]}(y), \qquad \dot y = f^{[2]}(y), \qquad (5.1) \]

respectively. We introduce the differential operators (Lie derivative)

\[ D_i = \sum_j f_j^{[i]}(y)\, \frac{\partial}{\partial y_j}, \]

which means that for differentiable functions $F : \mathbb{R}^n \to \mathbb{R}^m$ we have

\[ D_i F(y) = F'(y)\, f^{[i]}(y). \qquad (5.2) \]

Wolfgang Gröbner 7

7 Wolfgang Gröbner, born: 11 February 1899 in Gossensass, South Tyrol (now Italy), died: 10 August 1980 in Innsbruck.

It follows from the chain rule that, for the solutions $\varphi_t^{[i]}(y_0)$ of (5.1),

\[ \frac{d}{dt}\, F\bigl(\varphi_t^{[i]}(y_0)\bigr) = (D_i F)\bigl(\varphi_t^{[i]}(y_0)\bigr), \qquad (5.3) \]

and applying this operator iteratively we get

\[ \frac{d^k}{dt^k}\, F\bigl(\varphi_t^{[i]}(y_0)\bigr) = (D_i^k F)\bigl(\varphi_t^{[i]}(y_0)\bigr). \qquad (5.4) \]

Consequently, the Taylor series of $F\bigl(\varphi_t^{[i]}(y_0)\bigr)$, developed at $t = 0$, becomes

\[ F\bigl(\varphi_t^{[i]}(y_0)\bigr) = \sum_{k \ge 0} \frac{t^k}{k!}\, (D_i^k F)(y_0) = \exp(tD_i) F(y_0). \qquad (5.5) \]

Now, putting $F(y) = \mathrm{Id}(y) = y$, the identity map, this is the Taylor series of the solution itself

\[ \varphi_t^{[i]}(y_0) = \sum_{k \ge 0} \frac{t^k}{k!}\, (D_i^k\, \mathrm{Id})(y_0) = \exp(tD_i)\, \mathrm{Id}(y_0). \qquad (5.6) \]

If the functions $f^{[i]}(y)$ are not analytic, but only $N$-times continuously differentiable, the series (5.6) has to be truncated and a $O(h^N)$ remainder term has to be included.
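The Lie series (5.6) can be made concrete for a scalar polynomial vector field, where $D F = F' f$ from (5.2) is a polynomial operation. The sketch below is an illustration only: the test problem $\dot y = y^2$ with exact solution $y_0/(1 - t y_0)$ is chosen here, and the helper names are hypothetical. Polynomials are stored as coefficient lists in increasing degree.

```python
from fractions import Fraction


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_diff(p):
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]


def poly_eval(p, y):
    return sum(c * y ** k for k, c in enumerate(p))


def lie_series(f, y0, t, order):
    """Truncated series (5.6) for the scalar problem y' = f(y) with polynomial f."""
    F = [Fraction(0), Fraction(1)]          # F(y) = Id(y) = y
    total, weight = Fraction(0), Fraction(1)  # weight = t^k / k!
    for k in range(order + 1):
        total += weight * poly_eval(F, y0)
        F = poly_mul(poly_diff(F), f)       # apply the Lie derivative: D F = F' f
        weight *= Fraction(t) / (k + 1)
    return total


f = [Fraction(0), Fraction(0), Fraction(1)]   # f(y) = y^2
y0, t = Fraction(1, 2), Fraction(1, 10)
approx = lie_series(f, y0, t, order=10)
exact = y0 / (1 - t * y0)
print(float(approx), float(exact))
```

For this problem $D^k\,\mathrm{Id}(y) = k!\, y^{k+1}$, so the truncated series is a geometric sum and differs from the exact solution only by the factor $1 - (t y_0)^{11}$.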

Lemma 5.1 (Gröbner 1960). Let $\varphi_s^{[1]}$ and $\varphi_t^{[2]}$ be the flows of the differential equations $\dot y = f^{[1]}(y)$ and $\dot y = f^{[2]}(y)$, respectively. For their composition we then have

\[ \bigl( \varphi_t^{[2]} \circ \varphi_s^{[1]} \bigr)(y_0) = \exp(sD_1) \exp(tD_2)\, \mathrm{Id}(y_0). \]

Proof. This is precisely formula (5.5) with $i = 1$, $t$ replaced with $s$, and with $F(y) = \varphi_t^{[2]}(y) = \exp(tD_2)\, \mathrm{Id}(y)$. □

Remark 5.2. Notice that the indices 1 and 2 as well as $s$ and $t$ to the left and right in the identity of Lemma 5.1 are permuted. Gröbner calls this phenomenon, which sometimes leads to some confusion in the literature, the "Vertauschungssatz".

Remark 5.3. The statement of Lemma 5.1 can be extended to more than two flows. If $\varphi_t^{[j]}$ is the flow of a differential equation $\dot y = f^{[j]}(y)$, then we have

\[ \bigl( \varphi_u^{[m]} \circ \cdots \circ \varphi_t^{[2]} \circ \varphi_s^{[1]} \bigr)(y_0) = \exp(sD_1) \exp(tD_2) \cdot \ldots \cdot \exp(uD_m)\, \mathrm{Id}(y_0). \]

This follows by induction on $m$.

In general, the two operators $D_1$ and $D_2$ do not commute, so that the composition $\exp(tD_1)\exp(tD_2)\,\mathrm{Id}(y_0)$ is different from $\exp\bigl(t(D_1 + D_2)\bigr)\mathrm{Id}(y_0)$, which represents the solution $\varphi_t(y_0)$ of $\dot y = f(y) = f^{[1]}(y) + f^{[2]}(y)$. The relation of Lemma 5.1 suggests the use of the BCH formula. However, $D_1$ and $D_2$ are unbounded differential operators so that the series expansions that appear cannot be expected to converge. A formal application of the BCH formula with $tA$ and $tB$ replaced with $sD_1$ and $tD_2$, respectively, yields

\[ \exp(sD_1) \exp(tD_2) = \exp\bigl(D(s,t)\bigr), \qquad (5.7) \]

where the differential operator $D(s,t)$ is obtained from (4.11) as

\[
\begin{aligned}
D(s,t) = {} & sD_1 + tD_2 + \frac{st}{2}\,[D_1, D_2] + \frac{s^2 t}{12}\,\bigl[D_1, [D_1, D_2]\bigr] \\
& + \frac{s t^2}{12}\,\bigl[D_2, [D_2, D_1]\bigr] + \frac{s^2 t^2}{24}\,\bigl[D_1, [D_2, [D_2, D_1]]\bigr] + \ldots .
\end{aligned}
\qquad (5.8)
\]

The Lie bracket for differential operators is calculated exactly as for matrices, namely, $[D_1, D_2] = D_1 D_2 - D_2 D_1$. But how can we interpret (5.7) rigorously? Expanding both sides in Taylor series we see that

\[ \exp(sD_1) \exp(tD_2) = I + sD_1 + tD_2 + \frac{1}{2}\bigl( s^2 D_1^2 + 2st\, D_1 D_2 + t^2 D_2^2 \bigr) + \ldots \qquad (5.9) \]

and

\[
\begin{aligned}
\exp\bigl(D(s,t)\bigr) &= I + D(s,t) + \frac{1}{2}\, D(s,t)^2 + \ldots \\
&= I + sD_1 + tD_2 + \frac{1}{2}\bigl( (sD_1 + tD_2)^2 + st\,[D_1, D_2] \bigr) + \ldots .
\end{aligned}
\]

By derivation of the BCH formula we have a formal identity, i.e., both series have exactly the same coefficients. Moreover, every finite truncation of the series can be applied without any difficulties to sufficiently differentiable functions $F(y)$. Consequently, for $N$-times differentiable functions the relation (5.7) holds true, if both sides are replaced by their truncated Taylor series and if a $O(h^N)$ remainder is added ($h = \max(|s|, |t|)$).


III.5.2 Lie Brackets and Commutativity 

If we apply $D_2$ to a function $F$, followed by an application of $D_1$, we will obtain partial derivatives of $F$ of first and second orders. However, if we subtract from this the same expression with $D_1$ and $D_2$ reversed, the second derivatives will cancel (this was already remarked upon by Jacobi (1862), p. 39: "differentialia partialia secunda functionis $f$ non continere") and we see that the Lie bracket

\[ [D_1, D_2] = D_1 D_2 - D_2 D_1 = \sum_i \Bigl( \sum_j \Bigl( \frac{\partial f_i^{[2]}}{\partial y_j}\, f_j^{[1]} - \frac{\partial f_i^{[1]}}{\partial y_j}\, f_j^{[2]} \Bigr) \Bigr) \frac{\partial}{\partial y_i} \qquad (5.10) \]

is again a linear differential operator. So, from two vector fields $f^{[1]}$ and $f^{[2]}$ we obtain a third vector field $f^{[3]}$.

The geometric meaning of the new vector field can be deduced from Lemma 5.1. We see by subtracting (5.9) from itself, once as it stands and once with $sD_1$ and $tD_2$ permuted, that

\[ \varphi_t^{[2]} \circ \varphi_s^{[1]}(y_0) - \varphi_s^{[1]} \circ \varphi_t^{[2]}(y_0) = st\,[D_1, D_2]\,\mathrm{Id}(y_0) + \ldots = st\, f^{[3]}(y_0) + \ldots \qquad (5.11) \]

(see the picture), where "+ ..." are terms of order $\ge 3$. This leads us to the following result.

Lemma 5.4. Let $f^{[1]}(y)$ and $f^{[2]}(y)$ be defined on an open set. The corresponding flows $\varphi_s^{[1]}$ and $\varphi_t^{[2]}$ commute everywhere for all sufficiently small $s$ and $t$, if and only if

\[ [D_1, D_2] = 0. \qquad (5.12) \]

Proof. The "only if" part is clear from (5.11). For proving the "if" part, we take $s$ and $t$ fixed, and subdivide, for a given $n$, the integration intervals into $n$ equidistant parts $\Delta s = s/n$ and $\Delta t = t/n$. This allows us to transform the solution $\varphi_t^{[2]} \circ \varphi_s^{[1]}(y_0)$ by a discrete homotopy in $n^2$ steps into the solution $\varphi_s^{[1]} \circ \varphi_t^{[2]}(y_0)$, each time appending a small rectangle of size $O(n^{-2})$. If we denote such an intermediate stage by

\[ \Gamma_k = \ldots \circ \varphi^{[2]}_{j_2 \Delta t} \circ \varphi^{[1]}_{i_2 \Delta s} \circ \varphi^{[2]}_{j_1 \Delta t} \circ \varphi^{[1]}_{i_1 \Delta s}(y_0), \]

then we have $\Gamma_0 = \varphi_t^{[2]} \circ \varphi_s^{[1]}(y_0)$ and $\Gamma_{n^2} = \varphi_s^{[1]} \circ \varphi_t^{[2]}(y_0)$ (see Fig. 5.1). Now, for $n \to \infty$, we have the estimate

\[ |\Gamma_{k+1} - \Gamma_k| \le O(n^{-3}), \]

because the error terms in (5.11) are of order 3 at least, and because of the differentiability of the solutions with respect to initial values. Thus, by the triangle inequality $|\Gamma_{n^2} - \Gamma_0| \le O(n^{-1})$ and the result is proved. □
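The leading term in (5.11) can be observed numerically for two vector fields whose flows are known in closed form. The example below is chosen for this illustration and is not from the text: $f^{[1]}(y) = (y_2, 0)^T$ and $f^{[2]}(y) = (0, -\sin y_1)^T$, for which the bracket vector field read off from (5.10) is $f^{[3]}(y) = (\sin y_1, -y_2 \cos y_1)^T$.

```python
import math


def flow1(s, y):
    """Exact flow of f1(y) = (y2, 0)."""
    return (y[0] + s * y[1], y[1])


def flow2(t, y):
    """Exact flow of f2(y) = (0, -sin(y1))."""
    return (y[0], y[1] - t * math.sin(y[0]))


def f3(y):
    """Vector field of [D1, D2] according to (5.10): f2'(y) f1(y) - f1'(y) f2(y)."""
    return (math.sin(y[0]), -y[1] * math.cos(y[0]))


y0 = (0.8, 0.6)
s = t = 1e-3
u = flow2(t, flow1(s, y0))   # phi_t^[2] o phi_s^[1]
v = flow1(s, flow2(t, y0))   # phi_s^[1] o phi_t^[2]
defect = tuple((ui - vi) / (s * t) for ui, vi in zip(u, v))
target = f3(y0)
print(defect, target)
```

The scaled defect agrees with $f^{[3]}(y_0)$ up to terms of size $O(\max(|s|, |t|))$, as (5.11) predicts.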



Fig. 5.1. Estimation of commuting solutions

III.5.3 Splitting Methods


We follow the approach of Yoshida (1990) for obtaining the order conditions of splitting methods (II.5.6). The idea is the following: with the use of Lemma 5.1 we write the method as a product of exponentials, then we apply formally the Baker-Campbell-Hausdorff formula to get one exponential of a series in powers of $h$. Finally, we compare this series with $h(D_1 + D_2)$, which corresponds to the exact solution of (5.1).

The splitting method (II.5.6), viz.,

\[ \Psi_h = \varphi^{[2]}_{b_m h} \circ \varphi^{[1]}_{a_m h} \circ \varphi^{[2]}_{b_{m-1} h} \circ \cdots \circ \varphi^{[1]}_{a_2 h} \circ \varphi^{[2]}_{b_1 h} \circ \varphi^{[1]}_{a_1 h}, \qquad (5.13) \]

is a composition of expressions $\varphi^{[2]}_{b_j h} \circ \varphi^{[1]}_{a_j h}$ which, by Lemma 5.1 and by (5.7), can be written as an exponential

\[
\begin{aligned}
\varphi^{[2]}_{b_j h} \circ \varphi^{[1]}_{a_j h} = \exp\bigl( & a_j h E^1_1 + b_j h E^1_2 + a_j b_j h^2 E^2_1 \\
& + a_j^2 b_j h^3 E^3_1 + a_j b_j^2 h^3 E^3_2 + a_j^2 b_j^2 h^4 E^4_2 + \ldots \bigr)\, \mathrm{Id},
\end{aligned}
\qquad (5.14)
\]

where we use the abbreviations

\[
\begin{aligned}
& E^1_1 = D_1, \qquad E^1_2 = D_2, \qquad E^2_1 = \frac{1}{2}\,[D_1, D_2], \qquad E^3_1 = \frac{1}{12}\,\bigl[D_1, [D_1, D_2]\bigr], \\
& E^3_2 = \frac{1}{12}\,\bigl[D_2, [D_2, D_1]\bigr], \qquad E^4_2 = \frac{1}{24}\,\bigl[D_1, [D_2, [D_2, D_1]]\bigr],
\end{aligned}
\]

and the dots indicate $O(h^5)$ expressions.

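We next define $\Psi^{(j)}$ recursively by

\[ \Psi^{(0)} = \mathrm{Id}, \qquad \Psi^{(j)} = \varphi^{[2]}_{b_j h} \circ \varphi^{[1]}_{a_j h} \circ \Psi^{(j-1)}, \qquad (5.15) \]

so that $\Psi^{(m)}$ is equal to our method (5.13). Aiming to write $\Psi^{(j)}$ also as an exponential of differential operators, we are confronted with computing commutators of the expressions $E^k_\ell$. We see that $[E^1_1, E^1_2] = 2E^2_1$, $[E^1_1, E^2_1] = 6E^3_1$, $[E^1_2, E^2_1] = -6E^3_2$, $[E^1_1, E^3_2] = 2E^4_2$ and $[E^1_2, E^3_1] = -2E^4_2$ as a consequence of the Jacobi identity (4.13). But the other commutators cannot be expressed in terms of the $E^k_\ell$ defined so far. We therefore introduce

\[ E^4_1 = \frac{1}{24}\,\bigl[D_1, [D_1, [D_1, D_2]]\bigr], \qquad E^4_3 = \frac{1}{24}\,\bigl[D_2, [D_2, [D_2, D_1]]\bigr]. \]

This allows us to formulate the following result.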
This allows us to formulate the following result. 

Lemma 5.5. The method $\Psi^{(j)}$, defined by (5.15), can be formally written as

\[
\begin{aligned}
\Psi^{(j)} = \exp\bigl( & c^1_{1,j}\, h E^1_1 + c^1_{2,j}\, h E^1_2 + c^2_{1,j}\, h^2 E^2_1 + c^3_{1,j}\, h^3 E^3_1 \\
& + c^3_{2,j}\, h^3 E^3_2 + c^4_{1,j}\, h^4 E^4_1 + c^4_{2,j}\, h^4 E^4_2 + c^4_{3,j}\, h^4 E^4_3 + \ldots \bigr)\, \mathrm{Id},
\end{aligned}
\]

where all coefficients are zero for $j = 0$, and where for $j \ge 1$

\[
\begin{aligned}
c^1_{1,j} &= c^1_{1,j-1} + a_j, \qquad c^1_{2,j} = c^1_{2,j-1} + b_j, \\
c^2_{1,j} &= c^2_{1,j-1} + a_j b_j + c^1_{1,j-1} b_j - c^1_{2,j-1} a_j, \\
c^3_{1,j} &= c^3_{1,j-1} + a_j^2 b_j + 2 c^1_{1,j-1} a_j b_j - 3 c^2_{1,j-1} a_j \\
&\quad + (c^1_{1,j-1})^2 b_j - c^1_{1,j-1} c^1_{2,j-1} a_j + c^1_{2,j-1} a_j^2, \\
c^3_{2,j} &= c^3_{2,j-1} + a_j b_j^2 - 4 c^1_{2,j-1} a_j b_j + 3 c^2_{1,j-1} b_j \\
&\quad + (c^1_{2,j-1})^2 a_j - c^1_{1,j-1} c^1_{2,j-1} b_j + c^1_{1,j-1} b_j^2,
\end{aligned}
\]

and similar but more complicated formulas for $c^4_{\ell,j}$.

Proof. Due to the reversed order in Lemma 5.1 we have to compute $\exp(A)\exp(B)$, where $A$ is the argument of the exponential for $\Psi^{(j-1)}$ and $B$ is that of (5.14). The rest is a tedious but straightforward application of the BCH formula. One has to use repeatedly the formulas for the commutators of the $E^k_\ell$, stated before Lemma 5.5. □


Theorem 5.6. The splitting method (5.13) is of order $p$ if

\[ c^1_{1,m} = c^1_{2,m} = 1, \qquad c^k_{\ell,m} = 0 \quad \text{for } k = 2, \ldots, p \text{ and all } \ell. \qquad (5.16) \]

The coefficients $c^k_{\ell,m}$ are those defined in Lemma 5.5.

Proof. This is an immediate consequence of Lemma 5.5, because the conditions of order $p$ imply that the Taylor series expansion of $\Psi_h(y_0)$ coincides with that of the solution $\varphi_h(y_0) = \exp\bigl(h(D_1 + D_2)\bigr) y_0$ up to terms of size $O(h^p)$. □

A simplification in the order conditions arises for symmetric methods (5.13), that is, for coefficients satisfying $a_{m+1-i} = a_i$ and $b_{m-i} = b_i$ for all $i$ (and $b_m = 0$). By Theorem II.3.2, it is sufficient to consider the order conditions (5.16) for odd $k$ only.
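The recurrences of Lemma 5.5 up to the $h^3$ terms are straightforward to run in exact arithmetic. The sketch below is an illustration; the function name `splitting_bch_coefficients` is hypothetical, and the two test methods (Strang splitting with $a = (1/2, 1/2)$, $b = (1, 0)$, and the Lie-Trotter splitting with $a = b = (1)$) are chosen here as examples.

```python
from fractions import Fraction


def splitting_bch_coefficients(a, b):
    """Recurrences of Lemma 5.5 up to h^3; returns (c11, c12, c21, c31, c32) for j = m."""
    c11 = c12 = c21 = c31 = c32 = Fraction(0)
    for aj, bj in zip(a, b):
        n21 = c21 + aj * bj + c11 * bj - c12 * aj
        n31 = (c31 + aj**2 * bj + 2 * c11 * aj * bj - 3 * c21 * aj
               + c11**2 * bj - c11 * c12 * aj + c12 * aj**2)
        n32 = (c32 + aj * bj**2 - 4 * c12 * aj * bj + 3 * c21 * bj
               + c12**2 * aj - c11 * c12 * bj + c11 * bj**2)
        # All right-hand sides use the values of stage j-1, so update simultaneously.
        c11, c12, c21, c31, c32 = c11 + aj, c12 + bj, n21, n31, n32
    return c11, c12, c21, c31, c32


half = Fraction(1, 2)
strang = splitting_bch_coefficients([half, half], [Fraction(1), Fraction(0)])
lie_trotter = splitting_bch_coefficients([Fraction(1)], [Fraction(1)])
print(strang)
print(lie_trotter)
```

For the Strang splitting the $h^2$ coefficient vanishes (order 2), and the remaining $h^3$ term $-\tfrac12 E^3_1 + E^3_2 = -\tfrac1{24}[D_1,[D_1,D_2]] + \tfrac1{12}[D_2,[D_2,D_1]]$ coincides with $S_3$ of the symmetric BCH formula (4.14) for $A = D_1$, $B = D_2$. For the Lie-Trotter splitting the $h^2$ coefficient is non-zero, so the method is only of order 1.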


III.5.4 Composition Methods 

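We now consider composition methods (II.4.6), viz.,

\[ \Psi_h = \Phi_{\alpha_s h} \circ \Phi^*_{\beta_s h} \circ \cdots \circ \Phi^*_{\beta_2 h} \circ \Phi_{\alpha_1 h} \circ \Phi^*_{\beta_1 h}, \qquad (5.17) \]

where $\Phi_h$ is a first-order method for $\dot y = f(y)$ and $\Phi^*_h$ is its adjoint. We assume

\[ \Phi_h = \exp\bigl( hC_1 + h^2 C_2 + h^3 C_3 + \ldots \bigr)\, \mathrm{Id} \qquad (5.18) \]

with differential operators $C_i$, and such that $C_1$ is the Lie derivative operator corresponding to $\dot y = f(y)$. For the splitting method $\Phi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]}$ this follows from (5.14), and for general one-step methods this is a consequence of Sect. IX.1 on backward error analysis. The adjoint method then satisfies

\[ \Phi^*_h = \exp\bigl( hC_1 - h^2 C_2 + h^3 C_3 - \ldots \bigr)\, \mathrm{Id}. \qquad (5.19) \]

From now on the procedure is similar to that of Sect. III.5.3. We define $\Psi^{(j)}$ recursively by

\[ \Psi^{(0)} = \mathrm{Id}, \qquad \Psi^{(j)} = \Phi_{\alpha_j h} \circ \Phi^*_{\beta_j h} \circ \Psi^{(j-1)}, \qquad (5.20) \]

so that $\Psi^{(m)}$, with $m = s$, becomes (5.17). We apply the BCH formula to obtain

\[
\begin{aligned}
\Phi_{\alpha_j h} \circ \Phi^*_{\beta_j h}
&= \exp\bigl( \beta_j h C_1 - \beta_j^2 h^2 C_2 + \ldots \bigr) \exp\bigl( \alpha_j h C_1 + \alpha_j^2 h^2 C_2 + \ldots \bigr)\, \mathrm{Id} \\
&= \exp\Bigl( (\alpha_j + \beta_j) h E^1_1 + (\alpha_j^2 - \beta_j^2) h^2 E^2_1 \\
&\qquad\quad + (\alpha_j^3 + \beta_j^3) h^3 E^3_1 + \frac{1}{2}\, \alpha_j \beta_j (\alpha_j + \beta_j) h^3 E^3_2 + \ldots \Bigr)\, \mathrm{Id}
\end{aligned}
\]

where

\[ E^k_1 = C_k, \qquad E^3_2 = [C_1, C_2]. \]

We then have the following result.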

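Lemma 5.7. The method $\Psi^{(j)}$ of (5.20) can be formally written as

\[ \Psi^{(j)} = \exp\bigl( \gamma^1_{1,j}\, h E^1_1 + \gamma^2_{1,j}\, h^2 E^2_1 + \gamma^3_{1,j}\, h^3 E^3_1 + \gamma^3_{2,j}\, h^3 E^3_2 + \ldots \bigr)\, \mathrm{Id}, \]

where all coefficients are zero for $j = 0$, and where for $j = 1, \ldots, m$

\[
\begin{aligned}
\gamma^1_{1,j} &= \gamma^1_{1,j-1} + \alpha_j + \beta_j \\
\gamma^2_{1,j} &= \gamma^2_{1,j-1} + \alpha_j^2 - \beta_j^2 \\
\gamma^3_{1,j} &= \gamma^3_{1,j-1} + \alpha_j^3 + \beta_j^3 \\
\gamma^3_{2,j} &= \gamma^3_{2,j-1} + \frac{1}{2}\, \alpha_j \beta_j (\alpha_j + \beta_j) + \frac{1}{2}\, \gamma^1_{1,j-1} (\alpha_j^2 - \beta_j^2) - \frac{1}{2}\, \gamma^2_{1,j-1} (\alpha_j + \beta_j).
\end{aligned}
\]

Proof. Similar to Lemma 5.5, the result follows using the BCH formula. □

Theorem 5.8. The composition method (5.17) is of order $p$ if

\[ \gamma^1_{1,m} = 1, \qquad \gamma^k_{\ell,m} = 0 \quad \text{for } k = 2, \ldots, p \text{ and all } \ell. \qquad (5.21) \]

The coefficients $\gamma^k_{\ell,m}$ are those defined in Lemma 5.7. □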
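The recurrences of Lemma 5.7 can be evaluated in the same spirit. The following sketch is illustrative only: the function name is hypothetical, and the test coefficients are the "triple jump" values with $\alpha_k = \beta_k = \gamma_k/2$ (an assumption of this example, not a method discussed in this section) together with the simple choice $\alpha_1 = \beta_1 = 1/2$.

```python
def composition_bch_coefficients(alpha, beta):
    """Recurrences of Lemma 5.7; returns (g11, g21, g31, g32) after the last stage."""
    g11 = g21 = g31 = g32 = 0.0
    for a, b in zip(alpha, beta):
        new_g32 = (g32 + 0.5 * a * b * (a + b)
                   + 0.5 * g11 * (a**2 - b**2) - 0.5 * g21 * (a + b))
        # The right-hand sides use the values of stage j-1, so update simultaneously.
        g11, g21, g31, g32 = g11 + a + b, g21 + a**2 - b**2, g31 + a**3 + b**3, new_g32
    return g11, g21, g31, g32


# Assumed test case 1: triple jump, gamma_1 = gamma_3 = 1/(2 - 2^(1/3)), gamma_2 = 1 - 2*gamma_1.
g_outer = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
gammas = [g_outer, 1.0 - 2.0 * g_outer, g_outer]
triple_jump = composition_bch_coefficients([g / 2 for g in gammas], [g / 2 for g in gammas])

# Assumed test case 2: one stage with alpha_1 = beta_1 = 1/2.
one_stage = composition_bch_coefficients([0.5], [0.5])
print(triple_jump)
print(one_stage)
```

The triple jump satisfies all conditions (5.21) up to $k = 3$, whereas the one-stage composition leaves non-zero $h^3$ coefficients and is therefore only of order 2.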

It is interesting to see how these order conditions are related to those obtained with the use of trees. The conditions $\gamma^1_{1,m} = 1$ and $\gamma^2_{1,m} = \gamma^3_{1,m} = 0$ are identical to the first three order conditions of Example 3.15. The remaining condition for order 3, $\gamma^3_{2,m} = 0$, reads

\[
\begin{aligned}
& \sum_{k=1}^{m} \alpha_k \beta_k (\alpha_k + \beta_k) + \sum_{k=1}^{m} (\alpha_k^2 - \beta_k^2) \sum_{i=1}^{k-1} (\alpha_i + \beta_i) - \sum_{k=1}^{m} (\alpha_k + \beta_k) \sum_{i=1}^{k-1} (\alpha_i^2 - \beta_i^2) \\
& \qquad = \sum_{k=1}^{m} (\alpha_k^2 - \beta_k^2) \sideset{}{'}\sum_{i=1}^{k} (\alpha_i + \beta_i) - \sum_{k=1}^{m} (\alpha_k + \beta_k) \sideset{}{'}\sum_{i=1}^{k} (\alpha_i^2 - \beta_i^2) = 0.
\end{aligned}
\]

This condition is just the difference of the order conditions for the trees ② ∘ ① and ① ∘ ②, whose sum is zero by the Switching Lemma 3.8. Therefore the condition $\gamma^3_{2,m} = 0$ is equivalent to (though more complicated than) the fourth condition of Example 3.15.

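Symmetric Composition of Symmetric Methods. Consider now a composition

\[ \Psi_h = \Phi_{\gamma_m h} \circ \cdots \circ \Phi_{\gamma_2 h} \circ \Phi_{\gamma_1 h} \circ \Phi_{\gamma_2 h} \circ \cdots \circ \Phi_{\gamma_m h}, \qquad (5.22) \]

where $\Phi_h$ is a symmetric method that can be written as

\[ \Phi_h = \exp\bigl( hS_1 + h^3 S_3 + h^5 S_5 + \ldots \bigr)\, \mathrm{Id} \]

with $S_1$ the Lie derivative operator corresponding to $\dot y = f(y)$. For the Strang splitting $\Phi_h = \varphi^{[1]}_{h/2} \circ \varphi^{[2]}_{h} \circ \varphi^{[1]}_{h/2}$ such an expansion follows from the symmetric BCH formula (4.14), and for general symmetric one-step methods from Sect. IX.2. The derivation of the order conditions is similar to the above with $\Psi^{(j)}$ defined by

\[ \Psi^{(1)} = \Phi_{\gamma_1 h}, \qquad \Psi^{(j)} = \Phi_{\gamma_j h} \circ \Psi^{(j-1)} \circ \Phi_{\gamma_j h}, \]

so that $\Psi^{(m)}$ becomes (5.22).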

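Lemma 5.9. The method $\Psi^{(j)}$ can be formally written as

\[ \Psi^{(j)} = \exp\bigl( \sigma^1_{1,j}\, h E^1_1 + \sigma^3_{1,j}\, h^3 E^3_1 + \sigma^5_{1,j}\, h^5 E^5_1 + \sigma^5_{2,j}\, h^5 E^5_2 + \ldots \bigr)\, \mathrm{Id}, \]

where $E^k_1 = S_k$, $E^5_2 = \bigl[S_1, [S_1, S_3]\bigr]$, and where $\sigma^1_{1,1} = \gamma_1$, $\sigma^3_{1,1} = \gamma_1^3$, $\sigma^5_{1,1} = \gamma_1^5$, $\sigma^5_{2,1} = 0$ and

\[
\begin{aligned}
\sigma^1_{1,j} &= \sigma^1_{1,j-1} + 2\gamma_j, \qquad \sigma^3_{1,j} = \sigma^3_{1,j-1} + 2\gamma_j^3, \qquad \sigma^5_{1,j} = \sigma^5_{1,j-1} + 2\gamma_j^5, \\
\sigma^5_{2,j} &= \sigma^5_{2,j-1} + \frac{1}{6}\Bigl( \gamma_j^3 (\sigma^1_{1,j-1})^2 - \gamma_j\, \sigma^1_{1,j-1} \sigma^3_{1,j-1} - \gamma_j^2\, \sigma^3_{1,j-1} + \gamma_j^4\, \sigma^1_{1,j-1} \Bigr).
\end{aligned}
\]

Proof. The result is a consequence of the symmetric BCH formula (4.14) with $\gamma_j h S_1 + \gamma_j^3 h^3 S_3 + \ldots$ and $\sigma^1_{1,j-1} h E^1_1 + \sigma^3_{1,j-1} h^3 E^3_1 + \ldots$ in the roles of $\frac{t}{2} A$ and $tB$, respectively. □

Theorem 5.10. The composition method (5.22) is of order $p$ if

\[ \sigma^1_{1,m} = 1, \qquad \sigma^k_{\ell,m} = 0 \quad \text{for odd } k = 3, \ldots, p \text{ and all } \ell. \qquad (5.23) \]

The coefficients $\sigma^k_{\ell,m}$ are those defined in Lemma 5.9. □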
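A short sketch of the recurrences of Lemma 5.9 follows. It is hedged in two respects: the formula for $\sigma^5_{2,j}$ is implemented in the form obtained by inserting the stated roles of $\frac{t}{2}A$ and $tB$ into $S_3$ of (4.14), and the test coefficients (the "triple jump" with $m = 2$, $\gamma_2 = 1/(2 - 2^{1/3})$, $\gamma_1 = 1 - 2\gamma_2$) are an assumption of this example. The function name is hypothetical.

```python
def symmetric_composition_coefficients(gammas):
    """Recurrences of Lemma 5.9 for gammas = [gamma_1, ..., gamma_m] (gamma_1 is the middle step)."""
    g = gammas[0]
    s1, s3, s51, s52 = g, g**3, g**5, 0.0
    for g in gammas[1:]:
        new_s52 = s52 + (g**3 * s1**2 - g * s1 * s3 - g**2 * s3 + g**4 * s1) / 6.0
        # The right-hand sides use the values of stage j-1, so update simultaneously.
        s1, s3, s51, s52 = s1 + 2 * g, s3 + 2 * g**3, s51 + 2 * g**5, new_s52
    return s1, s3, s51, s52


g_outer = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
g_middle = 1.0 - 2.0 * g_outer
s1, s3, s51, s52 = symmetric_composition_coefficients([g_middle, g_outer])
print(s1, s3, s51, s52)
```

Here $\sigma^1_{1,2} = 1$ and $\sigma^3_{1,2} = 0$, so the conditions (5.23) hold for $p = 4$; the $h^5$ coefficients do not vanish, so these coefficients do not give order 6.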

Symmetric composition methods up to order 10 will be constructed and discussed in Sect. V.3.

III.6 Exercises

1. Find all trees of orders 5 and 6.

2. (A. Cayley 1857). Denote the number of trees of order $q$ by $a_q$. Prove that

\[ a_1 + a_2 x + a_3 x^2 + a_4 x^3 + \ldots = (1 - x)^{-a_1} (1 - x^2)^{-a_2} (1 - x^3)^{-a_3} \cdot \ldots . \]

    q   |  1   2   3   4   5    6    7     8     9    10
    a_q |  1   1   2   4   9   20   48   115   286   719


3. Independency of the elementary differentials: show that for every $\tau \in T$ there is a system (1.1) such that the first component of $F(\tau)(0)$ equals 1, and the first component of $F(u)(0)$ is zero for all trees $u \ne \tau$.

Hint. Consider a monotonic labelling of $\tau$, and define $\dot y_i$ as the product over all $y_j$, where $j$ runs through all labels of vertices that lie directly above the vertex "$i$". For the first labelling of the tree of Exercise 4 this would be $\dot y_1 = y_2 y_3$, $\dot y_2 = 1$, $\dot y_3 = y_4$, and $\dot y_4 = 1$.

4. Prove that the coefficient $\alpha(\tau)$ of Definition 1.2 is equal to the number of possible monotonic labellings of the vertices of $\tau$, starting with the label 1 for the root. For example, the tree [[•], •] has three different monotonic labellings.

In addition, deduce, from (1.22), the recursion formula

\[ \alpha(\tau) = \binom{|\tau| - 1}{|\tau_1|, \ldots, |\tau_m|}\, \alpha(\tau_1) \cdot \ldots \cdot \alpha(\tau_m)\, \frac{1}{\mu_1!\, \mu_2! \cdots}, \qquad (6.1) \]

where the integers $\mu_1, \mu_2, \ldots$ count equal trees among $\tau_1, \ldots, \tau_m$ and

\[ \binom{|\tau| - 1}{|\tau_1|, \ldots, |\tau_m|} = \frac{(|\tau| - 1)!}{|\tau_1|! \cdot \ldots \cdot |\tau_m|!} \]

denotes the multinomial coefficient.

Remark. In the theoretical physics literature, the coefficients $\alpha(\tau)$ are written CM($\tau$) and called "Connes-Moscovici weights".

5. If we denote by $N(\tau)$ the number of elements in $OST(\tau)$, then show that

\[ N(\bullet) = 2, \qquad N([\tau_1, \ldots, \tau_m]) = 1 + N(\tau_1) \cdot \ldots \cdot N(\tau_m). \]

Use this result to compute the number of subtrees of the Christmas tree decorating formula (1.34). Answer: 6865.

6. Prove that the elementary differentials for partitioned problems are independent. For a given tree $\tau \in TP$, find a problem (2.1) such that a certain component of $F(u)(p, q)$ vanishes for all $u \in TP$ except for $\tau$.

Hint. Consider the construction of Exercise 3, and define the partitioning of $y$ into $(p, q)$ according to the colours of the vertices.

7. The number of order conditions for partitioned Runge-Kutta methods (II.2.2) is $2a_r$ for order $r$, where $a_r$ is given by (see Hairer, Nørsett & Wanner (1993), page 311)

    r   |  1   2   3    4     5     6      7      8       9      10
    a_r |  1   2   7   26   107   458   2058   9498   44987   216598

Find a formula similar to that of Exercise 2.

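8. For the special second order differential equation $\ddot y = g(y)$, and for a Nyström method

\[
\ell_i = g\Bigl( y_0 + c_i h \dot y_0 + h^2 \sum_{j=1}^{s} a_{ij} \ell_j \Bigr), \qquad
y_1 = y_0 + h \dot y_0 + h^2 \sum_{i=1}^{s} \beta_i \ell_i, \qquad
\dot y_1 = \dot y_0 + h \sum_{i=1}^{s} b_i \ell_i,
\qquad (6.2)
\]

consider the simplifying assumptions

\[
\begin{aligned}
CN(\eta): \quad & \sum_{j=1}^{s} a_{ij}\, c_j^{k-2} = \frac{c_i^k}{k(k-1)}, \qquad k = 2, \ldots, \eta, \\
DN(\zeta): \quad & \sum_{i=1}^{s} b_i\, c_i^{k-2} a_{ij} = b_j \Bigl( \frac{c_j^k}{k(k-1)} - \frac{c_j}{k-1} + \frac{1}{k} \Bigr), \qquad k = 2, \ldots, \zeta.
\end{aligned}
\]

Prove that if the quadrature formula $(b_i, c_i)$ is of order $p$, if $\beta_i = b_i (1 - c_i)$ for all $i$, and if the simplifying assumptions $CN(\eta)$, $DN(\zeta)$ are satisfied with $2\eta + 2 \ge p$ and $\zeta + \eta \ge p$, then the Nyström method has order $p$.

9. Nyström methods of maximal order $2s$. Prove that there exists a one-parameter family of $s$-stage Nyström methods (6.2) for $\ddot y = g(y)$, which have order $2s$.
Hint. Consider the Gaussian quadrature formula and define the coefficients $a_{ij}$ by $CN(s)$ and by

\[ \sum_{i=1}^{s} b_i\, c_i^{k-2} a_{ij} = b_j \Bigl( \frac{c_j^k}{k(k-1)} - \frac{c_j}{k-1} + \frac{1}{k} \Bigr) \]

for $k = 2, \ldots, s$.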

10. Prove that the coefficient $C_4$ in the series (4.11) of the Baker-Campbell-Hausdorff formula is given by $C_4 = \bigl[A, [B, [B, A]]\bigr]/24$.

11. Prove that the series (4.11) converges for $|t| < \ln 2 / (\|A\| + \|B\|)$.

12. By Theorem 5.10 four order conditions have to be satisfied such that the symmetric composition method (5.22) is of order 6. Prove that these conditions are equivalent to the four conditions of Example V.3.15. (Care has to be taken due to the different meaning of the $\gamma_i$.)
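Two of the exercises above lend themselves to a quick computational cross-check, which does not replace the requested proofs. The sketch below (helper names are hypothetical) recovers the table of Exercise 2 directly from Cayley's product formula, and evaluates the recursion (6.1) of Exercise 4 with trees encoded as nested tuples, the empty tuple `()` standing for the single vertex •.

```python
from collections import Counter
from math import factorial


def rooted_tree_counts(n_max):
    """a_1, ..., a_n_max from the product formula of Exercise 2."""
    series = [1] + [0] * (n_max - 1)      # coefficients of x^0, ..., x^(n_max - 1)
    counts = []
    for k in range(1, n_max + 1):
        a_k = series[k - 1]               # factors with index >= k no longer change x^(k-1)
        counts.append(a_k)
        for _ in range(a_k):              # multiply by (1 - x^k)^(-a_k), one factor at a time
            for i in range(k, n_max):
                series[i] += series[i - k]
    return counts


def order(tree):
    return 1 + sum(order(child) for child in tree)


def canonical(tree):
    """Sort subtrees recursively so that equal trees have equal encodings."""
    return tuple(sorted(canonical(child) for child in tree))


def alpha(tree):
    """Number of monotonic labellings, computed with the recursion (6.1)."""
    children = [canonical(child) for child in tree]
    numerator = factorial(order(tree) - 1)
    denominator = 1
    for child in children:
        numerator *= alpha(child)
        denominator *= factorial(order(child))
    for multiplicity in Counter(children).values():
        denominator *= factorial(multiplicity)
    return numerator // denominator


counts = rooted_tree_counts(10)
example = (((),), ())                     # the tree [[•], •]
print(counts, alpha(example))
```

The computed counts reproduce the table of Exercise 2, and the tree [[•], •] receives the value 3 stated in Exercise 4.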




Chapter IV. 

Conservation of First Integrals and Methods 
on Manifolds 


This chapter deals with the conservation of invariants (first integrals) by numerical 
methods, and with numerical methods for differential equations on manifolds. Our 
investigation will follow two directions. We first investigate which of the methods 
introduced in Chap. II conserve invariants automatically. We shall see that most of 
them conserve linear invariants, a few of them quadratic invariants, and none of 
them conserves cubic or general nonlinear invariants. We then construct new classes 
of methods, which are adapted to known invariants and which force the numerical 
solution to satisfy them. In particular, we study projection methods and methods 
based on local coordinates of the manifold defined by the invariants. We discuss 
in some detail the case where the manifold is a Lie group. Finally, we consider 
differential equations on manifolds with orthogonality constraints, which often arise 
in numerical linear algebra. 


IV. 1 Examples of First Integrals 

Je nomme integrate une equation u — Const, telle que sa differentielle 
du — 0 soit verifiee identiquement par le systeme des equations differen- 
tielles proposees ... (C.G.J. Jacobi 1840, p. 350) 

We consider differential equations 

y = f{y), (i-i) 

where y is a vector or possibly a matrix. 

Definition 1.1. A non-constant function $I(y)$ is called a first integral of (1.1) if

$$I'(y) f(y) = 0 \quad \text{for all } y. \tag{1.2}$$

This implies that every solution $y(t)$ of (1.1) satisfies $I(y(t)) = I(y_0) = \mathrm{Const}$.
Synonymously with “first integral”, the terms invariant or conserved quantity or 
constant of motion are also used. 

In Chap. I we have seen many examples of differential equations with invariants.
For example, the Lotka-Volterra problem (I.1.1) has $I(u,v) = \ln u - u + 2\ln v - v$
as first integral. The pendulum equation (I.1.13) has $H(p,q) = p^2/2 - \cos q$, and the
Kepler problem (I.2.2) has two first integrals, namely $H$ and $L$ of (I.2.3) and (I.2.4).

Example 1.2 (Conservation of the Total Energy). Hamiltonian systems are of the
form

$$\dot p = -H_q(p,q), \qquad \dot q = H_p(p,q),$$

where $H_q = \nabla_q H = (\partial H/\partial q)^T$ and $H_p = \nabla_p H = (\partial H/\partial p)^T$ are the column
vectors of partial derivatives. The Hamiltonian function $H(p,q)$ is a first integral.
This follows at once from $H'(p,q) = (\partial H/\partial p,\ \partial H/\partial q)$ and

$$\frac{\partial H}{\partial p}\Bigl(-\frac{\partial H}{\partial q}\Bigr)^T + \frac{\partial H}{\partial q}\Bigl(\frac{\partial H}{\partial p}\Bigr)^T = 0.$$

Example 1.3 (Conservation of the Total Linear and Angular Momentum of
N-Body Systems). We consider a system of $N$ particles interacting pairwise with
potential forces which depend on the distances of the particles. This is formulated
as a Hamiltonian system with total energy (I.4.1), viz.,

$$H(p,q) = \frac12 \sum_{i=1}^N \frac{1}{m_i}\, p_i^T p_i + \sum_{i=2}^N \sum_{j=1}^{i-1} V_{ij}\bigl(\|q_i - q_j\|\bigr).$$

Here $q_i, p_i \in \mathbb{R}^3$ represent the position and momentum of the $i$th particle of mass
$m_i$, and $V_{ij}(r)$ ($i > j$) is the interaction potential between the $i$th and $j$th particle.
The equations of motion read

$$\dot q_i = \frac{1}{m_i}\, p_i, \qquad \dot p_i = \sum_{j=1}^N \nu_{ij}\,(q_i - q_j)$$

where, for $i > j$, we have $\nu_{ij} = \nu_{ji} = -V_{ij}'(r_{ij})/r_{ij}$ with $r_{ij} = \|q_i - q_j\|$, and $\nu_{ii}$ is
arbitrary, say $\nu_{ii} = 0$. The conservation of the total linear momentum $P = \sum_{i=1}^N p_i$
and the angular momentum $L = \sum_{i=1}^N q_i \times p_i$ is a consequence of the symmetry
relation $\nu_{ij} = \nu_{ji}$:

$$\frac{d}{dt}\sum_{i=1}^N p_i = \sum_{i=1}^N \sum_{j=1}^N \nu_{ij}\,(q_i - q_j) = 0$$

$$\frac{d}{dt}\sum_{i=1}^N q_i \times p_i = \sum_{i=1}^N \frac{1}{m_i}\, p_i \times p_i + \sum_{i=1}^N \sum_{j=1}^N q_i \times \nu_{ij}\,(q_i - q_j) = 0.$$

Example 1.4 (Conservation of Mass in Chemical Reactions). Suppose that three
substances A, B, C undergo a chemical reaction such as¹

$$\begin{array}{rcll}
A & \xrightarrow{\;0.04\;} & B & \text{(slow)}\\
B + B & \xrightarrow{\;3\cdot 10^7\;} & C + B & \text{(very fast)}\\
B + C & \xrightarrow{\;10^4\;} & A + C & \text{(fast).}
\end{array}$$

¹ This Robertson problem is very popular in testing codes for stiff differential equations.

We denote the masses (or concentrations) of the substances A, B, C by $y_1, y_2, y_3$,
respectively. By the mass action law this leads to the equations

$$\begin{array}{lrcl}
A: & \dot y_1 &=& -0.04\,y_1 + 10^4\,y_2 y_3\\
B: & \dot y_2 &=& 0.04\,y_1 - 10^4\,y_2 y_3 - 3\cdot 10^7\,y_2^2\\
C: & \dot y_3 &=& 3\cdot 10^7\,y_2^2.
\end{array}$$

We see that $\dot y_1 + \dot y_2 + \dot y_3 = 0$, hence the total mass $I(y) = y_1 + y_2 + y_3$ is an
invariant of the system.

As was noted by Shampine (1986), such linear invariants are generally conserved
by numerical integrators.

Theorem 1.5 (Conservation of Linear Invariants). All explicit and implicit
Runge-Kutta methods conserve linear invariants. Partitioned Runge-Kutta methods
(II.2.2) conserve linear invariants if $b_i = \hat b_i$ for all $i$, or if the invariant depends
only on $p$ or only on $q$.

Proof. Let $I(y) = d^T y$ with a constant vector $d$, so that $d^T f(y) = 0$ for all $y$.
In the case of Runge-Kutta methods we thus have $d^T k_i = 0$, and consequently
$d^T y_1 = d^T y_0 + h\, d^T\bigl(\sum_{i=1}^s b_i k_i\bigr) = d^T y_0$. The statement for partitioned methods is
proved similarly. □
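Theorem 1.5 can be observed numerically on the Robertson problem of Example 1.4. The following is a minimal sketch, not part of the original text: the function names, the initial value, the step size and the number of steps are illustrative assumptions, and the step size is chosen small only so that the explicit Euler method stays stable on this stiff problem.

```python
def robertson(y):
    """Right-hand side of the Robertson problem of Example 1.4."""
    y1, y2, y3 = y
    r1 = 0.04 * y1          # A -> B
    r2 = 1.0e4 * y2 * y3    # B + C -> A + C
    r3 = 3.0e7 * y2 * y2    # B + B -> C + B
    return [-r1 + r2, r1 - r2 - r3, r3]


def explicit_euler_step(f, y, h):
    """One explicit Euler step y + h*f(y)."""
    k = f(y)
    return [yi + h * ki for yi, ki in zip(y, k)]


def total_mass(y):
    """Linear invariant I(y) = y1 + y2 + y3."""
    return sum(y)


y = [1.0, 0.0, 0.0]   # illustrative initial value (assumption)
h = 1.0e-4            # small step, chosen only for stability (assumption)
for _ in range(1000):
    y = explicit_euler_step(robertson, y, h)

mass_drift = abs(total_mass(y) - 1.0)
print(mass_drift)     # only rounding errors are expected here
```

The accuracy of the individual components is poor for such a crude method, but the sum of the components is reproduced up to rounding, as the proof predicts.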

Next we consider differential equations of the form 

$$\dot Y = A(Y)\,Y \tag{1.3}$$

where Y can be a vector or a matrix (not necessarily a square matrix). We then have 
the following result. 

Theorem 1.6. If $A(Y)$ is skew-symmetric for all $Y$ (i.e., $A^T = -A$), then the
quadratic function $I(Y) = Y^T Y$ is an invariant. In particular, if the initial value $Y_0$
consists of orthonormal columns (i.e., $Y_0^T Y_0 = I$), then the columns of the solution
$Y(t)$ of (1.3) remain orthonormal for all $t$.

Proof. The derivative of $I(Y)$ is $I'(Y)H = Y^T H + H^T Y$. Thus, we have
$I'(Y)f(Y) = I'(Y)(A(Y)Y) = Y^T A(Y) Y + Y^T A(Y)^T Y$ for all $Y$, which vanishes
because $A(Y)$ is skew-symmetric. This proves the statement. □


Example 1.7 (Rigid Body). The motion of a free rigid body, whose centre of mass
is at the origin, is described by the Euler equations

$$\begin{aligned}
\dot y_1 &= a_1 y_2 y_3, & a_1 &= (I_2 - I_3)/(I_2 I_3),\\
\dot y_2 &= a_2 y_3 y_1, & a_2 &= (I_3 - I_1)/(I_3 I_1),\\
\dot y_3 &= a_3 y_1 y_2, & a_3 &= (I_1 - I_2)/(I_1 I_2),
\end{aligned} \tag{1.4}$$

where the vector $y = (y_1, y_2, y_3)^T$ represents the angular momentum in the
body frame, and $I_1, I_2, I_3$ are the principal moments of inertia (Euler (1758b); see
Sect. VII.5 for a detailed description). This problem can be written as

$$\begin{pmatrix} \dot y_1 \\ \dot y_2 \\ \dot y_3 \end{pmatrix}
= \begin{pmatrix} 0 & y_3/I_3 & -y_2/I_2 \\ -y_3/I_3 & 0 & y_1/I_1 \\ y_2/I_2 & -y_1/I_1 & 0 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}, \tag{1.5}$$

which is of the form (1.3) with a skew-symmetric matrix $A(Y)$. By Theorem 1.6,
$y_1^2 + y_2^2 + y_3^2$ is an invariant. A second quadratic invariant is

$$H(y_1, y_2, y_3) = \frac12\Bigl(\frac{y_1^2}{I_1} + \frac{y_2^2}{I_2} + \frac{y_3^2}{I_3}\Bigr),$$

which represents the kinetic energy.

Inspired by the cover page of Marsden & Ratiu (1999), we present in Fig. 1.1
the sphere with some of the solutions of (1.4) corresponding to $I_1 = 2$, $I_2 = 1$
and $I_3 = 2/3$. They lie on the intersection of the sphere with the ellipsoid given
by $H(y_1, y_2, y_3) = \mathrm{Const}$. In the left picture we have included the numerical
solution (30 steps) obtained by the implicit midpoint rule with step size $h = 0.3$ and
initial value $y_0 = (\cos(1.1), 0, \sin(1.1))^T$. It stays exactly on a solution curve. This
follows from the fact that the implicit midpoint rule preserves quadratic invariants
exactly (Sect. IV.2).

For the explicit Euler method (right picture of Fig. 1.1, 320 steps with h = 
0.05 and the same initial value) we see that the numerical solution shows a wrong 
qualitative behaviour (it should lie on a closed curve). The numerical solution even 
drifts away from the sphere. 
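The experiment described for Fig. 1.1 can be reproduced in a few lines. The sketch below uses the parameters quoted in the text ($I_1 = 2$, $I_2 = 1$, $I_3 = 2/3$, the same initial value, step sizes and step counts); the function names and the fixed-point iteration used to solve the implicit midpoint equations are assumptions made for illustration, not the authors' implementation.

```python
import math

I1, I2, I3 = 2.0, 1.0, 2.0 / 3.0   # moments of inertia used for Fig. 1.1


def f(y):
    """Euler equations (1.4) of the free rigid body."""
    a1 = (I2 - I3) / (I2 * I3)
    a2 = (I3 - I1) / (I3 * I1)
    a3 = (I1 - I2) / (I1 * I2)
    return [a1 * y[1] * y[2], a2 * y[2] * y[0], a3 * y[0] * y[1]]


def explicit_euler(y, h):
    k = f(y)
    return [yi + h * ki for yi, ki in zip(y, k)]


def implicit_midpoint(y, h, iterations=100):
    """Solve y1 = y + h*f((y + y1)/2) by fixed-point iteration (assumed solver)."""
    y1 = list(y)
    for _ in range(iterations):
        mid = [0.5 * (a + b) for a, b in zip(y, y1)]
        k = f(mid)
        y1 = [a + h * ki for a, ki in zip(y, k)]
    return y1


def norm2(y):
    """Quadratic invariant y1^2 + y2^2 + y3^2."""
    return sum(v * v for v in y)


def energy(y):
    """Kinetic energy H(y1, y2, y3)."""
    return 0.5 * (y[0] ** 2 / I1 + y[1] ** 2 / I2 + y[2] ** 2 / I3)


y0 = [math.cos(1.1), 0.0, math.sin(1.1)]

ym = list(y0)
for _ in range(30):                 # implicit midpoint rule, h = 0.3
    ym = implicit_midpoint(ym, 0.3)

ye = list(y0)
for _ in range(320):                # explicit Euler method, h = 0.05
    ye = explicit_euler(ye, 0.05)

print("midpoint:", norm2(ym) - 1.0, energy(ym) - energy(y0))
print("Euler:   ", norm2(ye) - 1.0, energy(ye) - energy(y0))
```

For the midpoint rule both defects should be at rounding level (up to the accuracy of the nonlinear solve), whereas for the explicit Euler method $\|y_n\|^2$ increases in every step because $y^T f(y) = 0$ implies $\|y_{n+1}\|^2 = \|y_n\|^2 + h^2\|f(y_n)\|^2$.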




Fig. 1.1. Solutions of the Euler equations (1.4) for the rigid body


IV.2 Quadratic Invariants

Quadratic invariants appear often in applications. Examples are the conservation
law of angular momentum in $N$-body systems (Example 1.3), the two invariants of
the rigid body motion (Example 1.7), and the invariant $Y^T Y$ of Theorem 1.6. We
therefore consider differential equations (1.1) and quadratic functions

$$Q(y) = y^T C y, \tag{2.1}$$

where $C$ is a symmetric square matrix. It is an invariant of (1.1) if $y^T C f(y) = 0$
for all $y$.


IV.2.1 Runge-Kutta Methods 

We shall give a complete characterization of Runge-Kutta methods which automatically
conserve all quadratic invariants. We first of all consider the Gauss collocation
methods.

Theorem 2.1. The Gauss methods of Sect. II.1.3 (collocation based on the shifted
Legendre polynomials) conserve quadratic invariants.

Proof. Let $u(t)$ be the collocation polynomial of the Gauss methods (Definition II.1.3).
Since $\frac{d}{dt} Q(u(t)) = 2\,u(t)^T C \dot u(t)$, it follows from $u(t_0) = y_0$ and
$u(t_0 + h) = y_1$ that

$$y_1^T C y_1 - y_0^T C y_0 = 2\int_{t_0}^{t_0+h} u(t)^T C \dot u(t)\, dt. \tag{2.2}$$

The integrand $u(t)^T C \dot u(t)$ is a polynomial of degree $2s - 1$, which is integrated
without error by the $s$-stage Gaussian quadrature formula. It therefore follows from
the collocation condition

$$u(t_0 + c_i h)^T C \dot u(t_0 + c_i h) = u(t_0 + c_i h)^T C f\bigl(u(t_0 + c_i h)\bigr) = 0$$

that the integral in (2.2) vanishes. □

Since the implicit midpoint rule is the special case $s = 1$ of the Gauss methods,
the preceding theorem explains its good behaviour for the rigid body simulation in
Fig. 1.1.

Theorem 2.2 (Cooper 1987). If the coefficients of a Runge-Kutta method satisfy

$$b_i a_{ij} + b_j a_{ji} = b_i b_j \quad \text{for all } i, j = 1, \ldots, s, \tag{2.3}$$

then it conserves quadratic invariants.²

² For irreducible methods, the conditions of Theorem 2.2 and Theorem 2.4 are also necessary
for the conservation of all quadratic invariants. This follows from the discussion in
Sect. VI.7.3.

Proof. The proof is the same as that for B-stability, given independently by Burrage
& Butcher and Crouzeix in 1979 (see Hairer & Wanner (1996), Sect. IV.12).

The relation $y_1 = y_0 + h\sum_{i=1}^s b_i k_i$ of Definition II.1.1 yields

$$y_1^T C y_1 = y_0^T C y_0 + h\sum_{i=1}^s b_i\, k_i^T C y_0 + h\sum_{j=1}^s b_j\, y_0^T C k_j + h^2\sum_{i,j=1}^s b_i b_j\, k_i^T C k_j. \tag{2.4}$$

We then write $k_i = f(Y_i)$ with $Y_i = y_0 + h\sum_{j=1}^s a_{ij} k_j$. The main idea is to
compute $y_0$ from this relation and to insert it into the central expressions of (2.4).
This yields (using the symmetry of $C$)

$$y_1^T C y_1 = y_0^T C y_0 + 2h\sum_{i=1}^s b_i\, Y_i^T C f(Y_i) + h^2\sum_{i,j=1}^s (b_i b_j - b_i a_{ij} - b_j a_{ji})\, k_i^T C k_j.$$

The condition (2.3) together with the assumption $y^T C f(y) = 0$, which states that
$y^T C y$ is an invariant of (1.1), imply $y_1^T C y_1 = y_0^T C y_0$. □
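Condition (2.3) is a finite set of algebraic identities and is easy to test for a given tableau. The sketch below is an illustration only; the helper name `cooper_defect` is hypothetical, and the coefficient tables are the standard ones for the implicit midpoint rule, the 2-stage Gauss method, the explicit Euler method and the trapezoidal rule.

```python
import math


def cooper_defect(A, b):
    """Largest |b_i a_ij + b_j a_ji - b_i b_j| over all i, j; zero iff (2.3) holds."""
    s = len(b)
    return max(
        abs(b[i] * A[i][j] + b[j] * A[j][i] - b[i] * b[j])
        for i in range(s)
        for j in range(s)
    )


r = math.sqrt(3.0) / 6.0
methods = {
    "implicit midpoint (Gauss, s=1)": ([[0.5]], [1.0]),
    "Gauss, s=2": ([[0.25, 0.25 - r], [0.25 + r, 0.25]], [0.5, 0.5]),
    "explicit Euler": ([[0.0]], [1.0]),
    "trapezoidal rule": ([[0.0, 0.0], [0.5, 0.5]], [0.5, 0.5]),
}

defects = {name: cooper_defect(A, b) for name, (A, b) in methods.items()}
for name, d in defects.items():
    print(f"{name:32s} defect = {d:.3e}")
```

A vanishing defect confirms the sufficient condition of Theorem 2.2. A non-zero defect shows only that (2.3) is violated; by the footnote to the theorem this excludes conservation of all quadratic invariants when the method is irreducible.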

The criterion (2.3) is very restrictive. One finds that among all collocation and
discontinuous collocation methods (Definition II.1.7) only the Gauss methods satisfy
this criterion (Exercise 6). On the other hand, it is possible to construct other
high-order Runge-Kutta methods satisfying (2.3). The key for such a construction is
the $W$-transformation (see Hairer & Wanner (1996), Sect. IV.5), which is exploited
in the articles of Sun (1993a) and Hairer & Leone (2000).


IV.2.2 Partitioned Runge-Kutta Methods 

We next consider partitioned Runge-Kutta methods for systems $\dot y = f(y,z)$,
$\dot z = g(y,z)$. Usually such methods cannot conserve general quadratic invariants
(Exercise 4). We therefore concentrate on quadratic invariants of the form

$$Q(y,z) = y^T D z, \tag{2.5}$$

where $D$ is a matrix of the appropriate dimensions. Observe that the angular
momentum of $N$-body systems (Example 1.3) is of this form.


Theorem 2.3 (Sun 1993b). The Lobatto IIIA - IIIB pair conserves all quadratic
invariants of the form (2.5). In particular, this is true for the Störmer-Verlet scheme
(see Sect. II.2.2).


Proof. Let $u(t)$ and $v(t)$ be the (discontinuous) collocation polynomials of the
Lobatto IIIA and Lobatto IIIB methods, respectively (see Sect. II.2.2). In analogy to
the proof of Theorem 2.1 we have

$$Q\bigl(u(t_0+h), v(t_0+h)\bigr) - Q\bigl(u(t_0), v(t_0)\bigr) = \int_{t_0}^{t_0+h} \Bigl(Q\bigl(\dot u(t), v(t)\bigr) + Q\bigl(u(t), \dot v(t)\bigr)\Bigr)\, dt. \tag{2.6}$$

Since $u(t)$ is of degree $s$ and $v(t)$ of degree $s-2$, the integrand of (2.6) is a
polynomial of degree $2s-3$. Hence, an application of the Lobatto quadrature yields the
exact result. Using the fact that $Q(y,z)$ is an invariant of the differential equation,
i.e., $Q(f(y,z), z) + Q(y, g(y,z)) \equiv 0$, we thus obtain for the integral in (2.6)

$$h b_1\, Q\bigl(u(t_0), \delta(t_0)\bigr) + h b_s\, Q\bigl(u(t_0+h), \delta(t_0+h)\bigr),$$

where $\delta(t) = \dot v(t) - g(u(t), v(t))$ denotes the defect. It now follows from $u(t_0) = y_0$,
$u(t_0+h) = y_1$ (definition of Lobatto IIIA) and from $v(t_0) = z_0 - h b_1 \delta(t_0)$,
$v(t_0+h) = z_1 + h b_s \delta(t_0+h)$ (definition of Lobatto IIIB) that
$Q(y_1, z_1) - Q(y_0, z_0) = 0$, which proves the theorem. □
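As an illustration of Theorem 2.3, the following sketch applies the Störmer-Verlet scheme to the (unperturbed) two-dimensional Kepler problem and monitors the angular momentum $L = q_1 p_2 - q_2 p_1$, which is of the form (2.5), together with the energy, which is not quadratic. The choice of test problem, the step size and all function names are assumptions made for this example; the initial values are those quoted later for Example 4.3.

```python
import math


def grad_U(q):
    """Gradient of the Kepler potential U(q) = -1/|q| in two dimensions."""
    r3 = (q[0] ** 2 + q[1] ** 2) ** 1.5
    return [q[0] / r3, q[1] / r3]


def stormer_verlet(p, q, h):
    """One Stormer-Verlet step for q' = p, p' = -grad U(q)."""
    g = grad_U(q)
    p_half = [p[0] - 0.5 * h * g[0], p[1] - 0.5 * h * g[1]]
    q_new = [q[0] + h * p_half[0], q[1] + h * p_half[1]]
    g = grad_U(q_new)
    p_new = [p_half[0] - 0.5 * h * g[0], p_half[1] - 0.5 * h * g[1]]
    return p_new, q_new


def angular_momentum(p, q):
    return q[0] * p[1] - q[1] * p[0]


def hamiltonian(p, q):
    return 0.5 * (p[0] ** 2 + p[1] ** 2) - 1.0 / math.hypot(q[0], q[1])


e = 0.6                                        # eccentricity
q = [1.0 - e, 0.0]
p = [0.0, math.sqrt((1.0 + e) / (1.0 - e))]
L0, H0 = angular_momentum(p, q), hamiltonian(p, q)

max_dL = max_dH = 0.0
for _ in range(2000):
    p, q = stormer_verlet(p, q, 0.01)          # step size is an assumption
    max_dL = max(max_dL, abs(angular_momentum(p, q) - L0))
    max_dH = max(max_dH, abs(hamiltonian(p, q) - H0))

print(max_dL, max_dH)
```

The angular momentum defect should stay at rounding level, while the energy shows a small but clearly visible error, since it is not a quadratic invariant of the form (2.5).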

Exchanging the role of the IIIA and IIIB methods also leads to an integrator 
that preserves quadratic invariants of the form (2.5). The following characterization 
extends Theorem 2.2 to partitioned Runge-Kutta methods. 

Theorem 2.4. If the coefficients of a partitioned Runge-Kutta method (II.2.2) satisfy

$$b_i \hat a_{ij} + \hat b_j a_{ji} = b_i \hat b_j \quad \text{for } i, j = 1, \ldots, s, \tag{2.7}$$

$$b_i = \hat b_i \quad \text{for } i = 1, \ldots, s, \tag{2.8}$$

then it conserves quadratic invariants of the form (2.5).

If the partitioned differential equation is of the special form $\dot y = f(z)$, $\dot z = g(y)$,
then condition (2.7) alone implies that invariants of the form (2.5) are conserved.

Proof. The proof is nearly identical to that of Theorem 2.2. Instead of (2.4) we get

$$y_1^T D z_1 = y_0^T D z_0 + h\sum_{i=1}^s b_i\, k_i^T D z_0 + h\sum_{j=1}^s \hat b_j\, y_0^T D \ell_j + h^2\sum_{i,j=1}^s b_i \hat b_j\, k_i^T D \ell_j.$$

Denoting by $(Y_i, Z_i)$ the arguments of $k_i = f(Y_i, Z_i)$ and $\ell_i = g(Y_i, Z_i)$, the same
trick as in the proof of Theorem 2.2 gives

$$\begin{aligned}
y_1^T D z_1 = y_0^T D z_0 &+ h\sum_{i=1}^s b_i\, f(Y_i, Z_i)^T D Z_i + h\sum_{j=1}^s \hat b_j\, Y_j^T D g(Y_j, Z_j)\\
&+ h^2\sum_{i,j=1}^s (b_i \hat b_j - b_i \hat a_{ij} - \hat b_j a_{ji})\, k_i^T D \ell_j.
\end{aligned} \tag{2.9}$$

Since (2.5) is an invariant, we have $f(y,z)^T D z + y^T D g(y,z) = 0$ for all $y$ and $z$.
Consequently, the two conditions (2.7) and (2.8) imply $y_1^T D z_1 = y_0^T D z_0$.

For the special case where $f$ depends only on $z$ and $g$ only on $y$, the assumption
$f(z)^T D z + y^T D g(y) = 0$ (for all $y, z$) implies that $f(z)^T D z = -y^T D g(y) = \mathrm{Const}$.
Therefore, condition (2.8) is no longer necessary for the proof of the statement. □


IV.2.3 Nyström Methods

An important class of partitioned differential equations is $\dot y = z$, $\dot z = g(y)$ or,
equivalently,

$$\ddot y = g(y). \tag{2.10}$$

Many examples of Chap. I are of this form, in particular the $N$-body problem of
Example 1.3 for which the angular momentum is a quadratic first integral. Nyström
methods (Definition II.2.3),

$$\begin{aligned}
\ell_i &= g\Bigl(y_0 + c_i h \dot y_0 + h^2\sum_{j=1}^s a_{ij}\ell_j\Bigr),\\
y_1 &= y_0 + h\dot y_0 + h^2\sum_{i=1}^s \beta_i \ell_i, \qquad \dot y_1 = \dot y_0 + h\sum_{i=1}^s b_i \ell_i,
\end{aligned} \tag{2.11}$$

are adapted to the numerical solution of (2.10) and it is interesting to investigate
which methods within this class can conserve quadratic invariants.


Theorem 2.5. If the coefficients of the Nyström method (2.11) satisfy

$$\begin{aligned}
\beta_i &= b_i(1 - c_i) && \text{for } i = 1, \ldots, s,\\
b_i(\beta_j - a_{ij}) &= b_j(\beta_i - a_{ji}) && \text{for } i, j = 1, \ldots, s,
\end{aligned} \tag{2.12}$$

then it conserves all quadratic invariants of the form $y^T D \dot y$.

Proof. The quadratic form $Q(y, \dot y) = y^T D \dot y$ is a first integral of (2.10) if and only
if

$$\dot y^T D \dot y + y^T D g(y) = 0 \quad \text{for all } y, \dot y \in \mathbb{R}^n. \tag{2.13}$$

This implies that $D$ is skew-symmetric and that $y^T D g(y) = 0$.

In the same way as for the proofs of Theorems 2.2 and 2.4 we now compute
$y_1^T D \dot y_1$ using the formulas of (2.11) and we substitute $y_0$ by
$Y_i - c_i h \dot y_0 - h^2\sum_{j=1}^s a_{ij}\ell_j$, where $Y_i$ denotes the argument of $g$ in (2.11). This yields

$$\begin{aligned}
y_1^T D \dot y_1 = y_0^T D \dot y_0 &+ h\,\dot y_0^T D \dot y_0 + h\sum_{i=1}^s b_i\, Y_i^T D \ell_i\\
&+ h^2\sum_{i=1}^s \beta_i\, \ell_i^T D \dot y_0 + h^2\sum_{i=1}^s b_i(1 - c_i)\, \dot y_0^T D \ell_i\\
&+ h^3\sum_{i,j=1}^s b_i(\beta_j - a_{ij})\, \ell_j^T D \ell_i.
\end{aligned}$$

Using the skew-symmetry of $D$ and $Y_i^T D \ell_i = Y_i^T D g(Y_i) = 0$, condition (2.12)
implies the conservation property $y_1^T D \dot y_1 = y_0^T D \dot y_0$. □

Remark 2.6 (Composition Methods). If a method $\Phi_h$ conserves quadratic invariants
(e.g., the mid-point rule by Theorem 2.1 or the Störmer-Verlet scheme by Theorem 2.3
or a Nyström method of Theorem 2.5), then so does the composition method

$$\Psi_h = \Phi_{\gamma_s h} \circ \cdots \circ \Phi_{\gamma_1 h}. \tag{2.14}$$

This obvious property is one of the most important motivations for considering
composition methods.


IV.3 Polynomial Invariants 

We consider two classes of problems with polynomial invariants of degree higher
than two. First, we treat linear problems for which the determinant of the resolvent is
an invariant, and we show that (partitioned) Runge-Kutta methods cannot conserve
them automatically. Second, we study isospectral flows.

IV.3.1 The Determinant as a First Integral 

We consider quasi-linear problems 

$$\dot Y = A(Y)\,Y, \qquad Y(0) = Y_0 \tag{3.1}$$

where $Y$ and $A(Y)$ are $n \times n$ matrices. In the following we denote the trace of a
matrix $A = (a_{ij})_{i,j=1}^n$ by $\operatorname{trace} A = \sum_{i=1}^n a_{ii}$.

Lemma 3.1. If $\operatorname{trace} A(Y) = 0$ for all $Y$, then $g(Y) := \det Y$ is an invariant of
the matrix differential equation (3.1).

Proof. It follows from

$$\det(Y + \varepsilon A Y) = \det(I + \varepsilon A)\det Y = \bigl(1 + \varepsilon \operatorname{trace} A + O(\varepsilon^2)\bigr)\det Y$$

that $g'(Y)(AY) = \operatorname{trace} A \cdot \det Y$ (this is the Abel-Liouville-Jacobi-Ostrogradskii
identity). Hence, the determinant $g(Y) = \det Y$ is an invariant of the differential
equation (3.1) if $\operatorname{trace} A(Y) = 0$ for all $Y$. □

Since $\det Y$ represents the volume of the parallelepiped generated by the
columns of the matrix $Y$, the conservation of the invariant $g(Y) = \det Y$ is related
to volume preservation. This topic will be further discussed in Sect. VI.9. Here, we
consider $\det Y$ as a polynomial invariant of degree $n$, and we investigate whether
Runge-Kutta methods can automatically conserve this invariant for $n \ge 3$. The key
lemma for this study is the following.

Lemma 3.2 (Feng Kang & Shang Zai-jiu 1995). Let $R(z)$ be a differentiable
function defined in a neighbourhood of $z = 0$, and assume that $R(0) = 1$ and
$R'(0) = 1$. Then, we have for $n \ge 3$

$$\det R(A) = 1 \quad \text{for all } n \times n \text{ matrices } A \text{ satisfying } \operatorname{trace} A = 0, \tag{3.2}$$

if and only if $R(z) = \exp(z)$.

Proof. The “if” part follows from Lemma 3.1, because for constant $A$ the solution
of $\dot Y = AY$, $Y(0) = I$ is given by $Y(t) = \exp(At)$.

For the proof of the “only if” part, we consider diagonal matrices of the form
$A = \operatorname{diag}(\mu, \nu, -(\mu + \nu), 0, \ldots, 0)$, which have $\operatorname{trace} A = 0$, and for which

$$R(A) = \operatorname{diag}\bigl(R(\mu), R(\nu), R(-(\mu + \nu)), R(0), \ldots, R(0)\bigr).$$

The assumptions $R(0) = 1$ and (3.2) imply

$$R(\mu)R(\nu)R(-(\mu + \nu)) = 1 \tag{3.3}$$

for all $\mu, \nu$ close to 0. Putting $\nu = 0$, this relation yields $R(\mu)R(-\mu) = 1$ for all $\mu$,
and therefore (3.3) can be written as

$$R(\mu)R(\nu) = R(\mu + \nu) \quad \text{for all } \mu, \nu \text{ close to } 0. \tag{3.4}$$

This functional equation can only be satisfied by the exponential function. This is
seen as follows: from (3.4) we have

$$\frac{R(\mu + \varepsilon) - R(\mu)}{\varepsilon} = R(\mu)\,\frac{R(\varepsilon) - R(0)}{\varepsilon}.$$

Taking the limit $\varepsilon \to 0$ we obtain $R'(\mu) = R(\mu)$, because $R'(0) = 1$. This implies
$R(\mu) = \exp(\mu)$. □


Theorem 3.3. For $n \ge 3$, no Runge-Kutta method can conserve all polynomial
invariants of degree $n$.

Proof. It is sufficient to consider linear problems $\dot Y = AY$ with constant matrix $A$
satisfying $\operatorname{trace} A = 0$, so that $g(Y) = \det Y$ is a polynomial invariant of degree
$n$. Applying a Runge-Kutta method to such a differential equation yields
$Y_1 = R(hA)Y_0$, where

$$R(z) = 1 + z\, b^T (I - zA)^{-1} \mathbb{1}$$

($b^T = (b_1, \ldots, b_s)$, $\mathbb{1} = (1, \ldots, 1)^T$ and $A = (a_{ij})$ is the matrix of Runge-Kutta
coefficients) is the so-called stability function. It is seen to be rational.
By Lemma 3.2 it is therefore not possible that $\det R(hA) = 1$ for all $A$ with
$\operatorname{trace} A = 0$. □
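The argument of Lemma 3.2 and Theorem 3.3 can be made concrete with the diagonal matrices used in the proof. The sketch below evaluates $\det R(A)$ for trace-free diagonal matrices with $n = 2$ and $n = 3$; the stability functions $R(z) = (1 + z/2)/(1 - z/2)$ of the implicit midpoint rule and $R(z) = 1 + z$ of the explicit Euler method follow from the formula for $R(z)$ above, while the helper names and the values of $\mu$, $\nu$ are illustrative assumptions.

```python
import math


def R_midpoint(z):
    """Stability function of the implicit midpoint rule (Gauss, s = 1)."""
    return (1.0 + 0.5 * z) / (1.0 - 0.5 * z)


def R_euler(z):
    """Stability function of the explicit Euler method."""
    return 1.0 + z


def det_R(R, diag):
    """det R(A) for A = diag(...): the product of R over the diagonal entries."""
    prod = 1.0
    for d in diag:
        prod *= R(d)
    return prod


mu, nu = 0.1, 0.2
A2 = [mu, -mu]               # trace-free, n = 2
A3 = [mu, nu, -(mu + nu)]    # trace-free, n = 3

for name, R in [("exp", math.exp), ("midpoint", R_midpoint), ("Euler", R_euler)]:
    print(name, det_R(R, A2) - 1.0, det_R(R, A3) - 1.0)
```

For $n = 2$ the determinant is a quadratic invariant, so the midpoint rule reproduces it; for $n = 3$ only the exponential gives $\det R(A) = 1$.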


This negative result motivates the search for new methods which can conserve
polynomial invariants (see Sects. IV.4, IV.8 and VI.9). We consider here another
interesting class of problems with polynomial invariants of degree higher than two.


IV.3.2 Isospectral Flows

Such flows are created by a matrix differential equation

$$\dot L = [B(L), L], \qquad L(0) = L_0 \tag{3.5}$$

where $L_0$ is a given symmetric matrix, $B(L)$ is skew-symmetric for all $L$, and
$[B, L] = BL - LB$ is the commutator of $B$ and $L$. Many interesting problems can
be written in this form. We just mention the Toda system, the continuous realization
of QR-type algorithms, projected gradient flows, and inverse eigenvalue problems
(see Chu (1992) and Calvo, Iserles & Zanna (1997) for long lists of references).

Lemma 3.4 (Lax 1968, Flaschka 1974). Let $L_0$ be symmetric and assume that
$B(L)$ is skew-symmetric for all $L$. Then, the solution $L(t)$ of (3.5) is a symmetric
matrix, and its eigenvalues are independent of $t$.

Proof. The symmetry of $L(t)$ follows from the fact that the commutator of a
skew-symmetric with a symmetric matrix gives a symmetric matrix.

To prove the isospectrality of the flow, we define $U(t)$ by

$$\dot U = B(L(t))\,U, \qquad U(0) = I. \tag{3.6}$$

Then, we have $(d/dt)(U^{-1} L U) = U^{-1}(\dot L - BL + LB)U = 0$, and hence
$U(t)^{-1} L(t) U(t) = L_0$ for all $t$, so that $L(t) = U(t) L_0 U(t)^{-1}$ is the solution
of (3.5). This proves the result. □

Note that, since $B(L)$ is skew-symmetric, the matrix $U(t)$ of (3.6) is orthogonal
by Theorem 1.6.

Lemma 3.4 shows that the characteristic polynomial $\det(L - \lambda I) = \sum_{i=0}^n a_i \lambda^i$
and hence the coefficients $a_i$ also are independent of $t$. These coefficients are all
polynomial invariants (e.g., $a_0 = \det L$, $a_{n-1} = \pm\operatorname{trace} L$). Because of Theorem 3.3
there is no hope that Runge-Kutta methods applied to (3.5) can conserve
these invariants automatically for $n \ge 3$.

Isospectral Methods. The proof of Lemma 3.4, however, suggests an interesting
approach for the numerical solution of (3.5). For $n = 0, 1, \ldots$ we solve numerically

$$\dot U = B(U L_n U^T)\,U, \qquad U(0) = I \tag{3.7}$$

and we put $L_{n+1} = \widehat U L_n \widehat U^T$, where $\widehat U$ is the numerical approximation $\widehat U \approx U(h)$
after one step (cf. Calvo, Iserles & Zanna 1999). If $B(L)$ is skew-symmetric for all
matrices $L$, then $U^T U$ is a quadratic invariant of (3.7) and the methods of Sect. IV.2
will produce an orthogonal $\widehat U$. Consequently, $L_{n+1}$ and $L_n$ have exactly the same
eigenvalues, and they remain symmetric.

Diele, Lopez & Politi (1998) suggest the use of the Cayley transform
$U = (I - Y)^{-1}(I + Y)$, which transforms (3.7) into

$$\dot Y = \tfrac12 (I - Y)\, B(U L_n U^T)\,(I + Y), \qquad Y(0) = 0,$$

and the orthogonality of $U$ into the skew-symmetry of $Y$ (see Lemma 8.8 below).
Since all (also explicit) Runge-Kutta methods preserve the skew-symmetry of $Y$,
which is a linear invariant, this yields an approach to explicit isospectral methods.
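The first of these two approaches, (3.7) combined with a method of Sect. IV.2, can be sketched as follows. The example uses the implicit midpoint rule for (3.7) and, as an assumed test case, the choice $B(L) = L_+ - L_+^T$ with $L_+$ the strictly upper triangular part of $L$ (the Toda-type flow of (3.13) with $f(x) = x$). The matrix $L_0$, the step size, the fixed-point solver and all function names are illustrative assumptions. Since only the standard library is used, the spectrum is monitored through the power sums $\operatorname{trace} L^k$, $k = 1, 2, 3$, which determine the eigenvalues of a $3 \times 3$ matrix.

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def toda_B(L):
    """Skew-symmetric B(L) = L_+ - L_+^T, L_+ = strictly upper triangular part."""
    n = len(L)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            B[i][j] = L[i][j]
            B[j][i] = -L[i][j]
    return B


def isospectral_step(L, h, iterations=80):
    """One step L_n -> L_{n+1}: implicit midpoint rule for (3.7), then U L U^T."""
    n = len(L)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U1 = [row[:] for row in eye]
    for _ in range(iterations):          # fixed-point iteration (assumed solver)
        Um = [[0.5 * (eye[i][j] + U1[i][j]) for j in range(n)] for i in range(n)]
        K = matmul(toda_B(matmul(Um, matmul(L, transpose(Um)))), Um)
        U1 = [[eye[i][j] + h * K[i][j] for j in range(n)] for i in range(n)]
    return matmul(U1, matmul(L, transpose(U1)))


def power_sums(L):
    """trace(L), trace(L^2), trace(L^3)."""
    L2 = matmul(L, L)
    L3 = matmul(L2, L)
    return [sum(M[i][i] for i in range(len(M))) for M in (L, L2, L3)]


L0 = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
L = [row[:] for row in L0]
for _ in range(100):
    L = isospectral_step(L, 0.05)

drift = max(abs(a - b) for a, b in zip(power_sums(L), power_sums(L0)))
moved = max(abs(L[i][j] - L0[i][j]) for i in range(3) for j in range(3))
print(drift, moved)
```

The matrix $L_n$ changes substantially along the flow, while the spectral invariants are reproduced up to rounding and the accuracy of the nonlinear solve, because the computed $\widehat U$ is orthogonal.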

Connection with the QR Algorithm. In a diversion from the main theme of this
section, we now show the relationship of the flow of (3.5) with the QR algorithm for
the symmetric eigenvalue problem. Starting from a real symmetric matrix $A_0$, the
basic QR algorithm (without shifts) computes a sequence of orthogonally similar
matrices $A_1, A_2, A_3, \ldots$, expected to converge towards a diagonal matrix carrying
the eigenvalues of $A_0$. Iteratively for $k = 0, 1, 2, \ldots$, one computes the QR
decomposition of $A_k$:

$$A_k = Q_k R_k$$

with $Q_k$ orthogonal, $R_k$ upper triangular (the decomposition becomes unique if the
diagonal elements of $R_k$ are taken positive). Then, $A_{k+1}$ is obtained by reversing
the order of multiplication:

$$A_{k+1} = R_k Q_k.$$

It is an easy exercise to show that $Q(k) = Q_0 Q_1 \cdots Q_{k-1}$ is the matrix in the
orthogonal similarity transformation between $A_0$ and $A_k$:

$$A_k = Q(k)^T A_0\, Q(k) \tag{3.8}$$

and the same matrix $Q(k)$ is the orthogonal factor in the QR decomposition of $A_0^k$:

$$A_0^k = Q(k) R(k). \tag{3.9}$$
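The relations (3.8) and (3.9) are easy to check numerically. The following sketch runs a few steps of the unshifted QR algorithm on a small symmetric positive definite matrix, using a Gram-Schmidt QR decomposition written out by hand so that only the standard library is needed. The test matrix and the helper names are assumptions; the triangular factor in (3.9) is taken to be $R(k) = R_{k-1} \cdots R_0$, which is what the "easy exercise" yields.

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def qr(A):
    """QR decomposition by modified Gram-Schmidt; the diagonal of R is positive."""
    n = len(A)
    cols = transpose(A)                      # columns of A
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = list(cols[j])
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = sum(a * a for a in v) ** 0.5
        q_cols.append([a / R[j][j] for a in v])
    return transpose(q_cols), R


def max_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


A0 = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]   # assumed test matrix
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

A, Qk, Rk, power = A0, eye, eye, eye
for _ in range(4):
    Q, R = qr(A)
    A = matmul(R, Q)            # A_{k+1} = R_k Q_k
    Qk = matmul(Qk, Q)          # Q(k) = Q_0 Q_1 ... Q_{k-1}
    Rk = matmul(R, Rk)          # R(k) = R_{k-1} ... R_0
    power = matmul(power, A0)   # A_0^k

err_similarity = max_diff(A, matmul(transpose(Qk), matmul(A0, Qk)))   # (3.8)
err_power = max_diff(power, matmul(Qk, Rk))                           # (3.9)
print(err_similarity, err_power)
```

Both defects should be at rounding level for this well-conditioned example.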

Consider now, for an arbitrary real function $f$ defined on the eigenvalues of a real
symmetric matrix $L_0$, the QR decomposition

$$\exp(t f(L_0)) = Q(t) R(t) \tag{3.10}$$

and define

$$L(t) := Q(t)^T L_0\, Q(t). \tag{3.11}$$

The relations (3.8) and (3.9) then show that for integer times $t = k$, the matrix
$\exp(f(L(k))) = Q(k)^T \exp(f(L_0))\, Q(k)$ coincides with the $k$th matrix in the QR
algorithm starting from $A_0 = \exp(f(L_0))$:

$$\exp(f(L(k))) = A_k. \tag{3.12}$$

Now, how is all this related to the system (3.5)? Differentiating (3.11) as in the
proof of Lemma 3.4 shows that $L(t)$ solves a differential equation of the form
$\dot L = [B, L]$ with the skew-symmetric matrix $B = -Q^T \dot Q$. At first sight, however, $B$ is a
function of $t$, not of $L$. On the other hand, differentiation of (3.10) yields (omitting
the argument $t$ where it is clear from the context)

$$f(L_0)\, Q R = f(L_0)\exp(t f(L_0)) = \exp(t f(L_0))\, f(L_0) = \dot Q R + Q \dot R,$$

and since $f(L) = Q^T f(L_0)\, Q$ by (3.11), this becomes

$$f(L) = Q^T \dot Q + \dot R R^{-1}.$$

Here the left-hand side is a symmetric matrix, and the right-hand side is the sum of a
skew-symmetric and an upper triangular matrix. It follows that the skew-symmetric
matrix $B = -Q^T \dot Q$ is given by

$$B(L) = f(L)_+ - f(L)_+^T, \tag{3.13}$$

where $f(L)_+$ denotes the part of $f(L)$ above the diagonal. Hence, $L(t)$ is the solution
of an autonomous system (3.5) with a skew-symmetric $B(L)$.

For $f(x) = x$ and assuming $L_0$ symmetric and tridiagonal, the flow of (3.5) with
(3.13) is known as the Toda flow. The QR iterates $A_0 = \exp(L_0), A_1, A_2, \ldots$ of the
exponential of $L_0$ are seen to be equal to the exponentials of the solution $L(t)$ of
the Toda equations at integer times: $A_k = \exp(L(k))$, a discovery of Symes (1982).
An interesting connection of the Toda equations with a mechanical system will be
discussed in Sect. X.1.5.

For $f(x) = \log x$, the above arguments show that the QR iteration itself, starting
from a positive definite symmetric tridiagonal matrix, is the evaluation $A_k = L(k)$
at integer times of a solution $L(t)$ of the differential equation (3.5) with $B$ given
by (3.13). This relationship was explored in a series of papers by Deift, Li, Nanda
& Tomei (1983, 1989, 1993).

Notwithstanding the mathematical beauty of this relationship, it must be remarked
that the practical QR algorithm (with shifts and deflation) follows a different
path.


IV.4 Projection Methods 

And if you are not willing, then I shall use force.

(J.W. Goethe, Der Erlkönig)

Suppose we have an $(n - m)$-dimensional submanifold of $\mathbb{R}^n$,

$$\mathcal{M} = \{y \,;\ g(y) = 0\} \tag{4.1}$$

($g : \mathbb{R}^n \to \mathbb{R}^m$), and a differential equation $\dot y = f(y)$ with the property that

$$y_0 \in \mathcal{M} \quad \text{implies} \quad y(t) \in \mathcal{M} \ \text{ for all } t. \tag{4.2}$$

We want to emphasize that this assumption is weaker than the requirement that
all components $g_i(y)$ of $g(y)$ are invariants in the sense of Definition 1.1. In fact,
assumption (4.2) is equivalent to $g'(y)f(y) = 0$ for $y \in \mathcal{M}$, whereas Definition 1.1
requires $g'(y)f(y) = 0$ for all $y \in \mathbb{R}^n$. In the situation of (4.2) we call $g(y)$ a weak
invariant, and we say that $\dot y = f(y)$ is a differential equation on the manifold $\mathcal{M}$.

Fig. 4.1. The implicit midpoint rule applied to the differential equation (4.3). The picture
shows the numerical values for $q_1^2 + q_2^2$ obtained with step size $h = 0.1$ (thick line) and
$h = 0.05$ (thin line)


Example 4.1. Consider the pendulum equation written in Cartesian coordinates:

$$\begin{aligned}
\dot q_1 &= p_1, & \dot p_1 &= -q_1 \lambda,\\
\dot q_2 &= p_2, & \dot p_2 &= -1 - q_2 \lambda,
\end{aligned} \tag{4.3}$$

where $\lambda = (p_1^2 + p_2^2 - q_2)/(q_1^2 + q_2^2)$. One can check by differentiation that
$q_1 p_1 + q_2 p_2$ (orthogonality of the position and velocity vectors) is an invariant in the sense
of Definition 1.1. However, $q_1^2 + q_2^2$ (length of the pendulum) is only a weak invariant.
The experiment of Fig. 4.1 shows that even methods which conserve quadratic first
integrals (cf. Sect. IV.2) do not conserve the quadratic weak invariant $q_1^2 + q_2^2$. No
numerical method that is allowed to evaluate the vector field $f(y)$ outside $\mathcal{M}$ can
be expected to conserve weak invariants exactly. This is one of the motivations for
considering the methods of this and the subsequent sections.
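The "check by differentiation" mentioned in Example 4.1 can be written out as follows. This is a short worked derivation added for illustration, using only the equations of the pendulum and the definition of $\lambda$, and assuming $(q_1, q_2) \ne 0$.

```latex
% Strong invariant: the derivative vanishes for ALL (p, q) with q \ne 0.
\frac{d}{dt}\,(q_1 p_1 + q_2 p_2)
  = p_1^2 + p_2^2 + q_1 \dot p_1 + q_2 \dot p_2
  = p_1^2 + p_2^2 - q_2 - \lambda\,(q_1^2 + q_2^2)
  = 0 .

% Weak invariant: the derivative vanishes only where q_1 p_1 + q_2 p_2 = 0.
\frac{d}{dt}\,(q_1^2 + q_2^2) = 2\,(q_1 p_1 + q_2 p_2) .
```

The second line shows why the length is only a weak invariant: it is constant along solutions that start with $q_1 p_1 + q_2 p_2 = 0$, but its derivative does not vanish identically in $(p, q)$.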

A natural approach to the numerical solution of differential equations on manifolds
is by projection (see e.g., Hairer & Wanner (1996), Sect. VII.2, Eich-Soellner
& Führer (1998), Sect. 5.3.3).

Algorithm 4.2 (Standard Projection Method). Assume that $y_n \in \mathcal{M}$. One step
$y_n \mapsto y_{n+1}$ is defined as follows (see Fig. 4.2):

• Compute $\tilde y_{n+1} = \Phi_h(y_n)$, where $\Phi_h$ is an arbitrary one-step method applied to
$\dot y = f(y)$;

• project the value $\tilde y_{n+1}$ onto the manifold $\mathcal{M}$ to obtain $y_{n+1} \in \mathcal{M}$.



Fig. 4.2. Illustration of the standard projection method 


For $y_n \in \mathcal{M}$ the distance of $\tilde y_{n+1}$ to the manifold $\mathcal{M}$ is of the size of the local
error, i.e., $O(h^{p+1})$. Therefore, the projection does not deteriorate the convergence
order of the method.

For the computation of $y_{n+1}$ we have to solve the constrained minimization
problem

$$\|y_{n+1} - \tilde y_{n+1}\| \to \min \quad \text{subject to} \quad g(y_{n+1}) = 0. \tag{4.4}$$

In the case of the Euclidean norm, a standard approach is to introduce Lagrange
multipliers $\lambda = (\lambda_1, \ldots, \lambda_m)^T$, and to consider the Lagrange function
$\mathcal{L}(y_{n+1}, \lambda) = \|y_{n+1} - \tilde y_{n+1}\|^2/2 - g(y_{n+1})^T \lambda$. The necessary condition
$\partial\mathcal{L}/\partial y_{n+1} = 0$ then leads to the system

$$\begin{aligned}
y_{n+1} &= \tilde y_{n+1} + g'(\tilde y_{n+1})^T \lambda\\
0 &= g(y_{n+1}).
\end{aligned} \tag{4.5}$$

We have replaced $y_{n+1}$ with $\tilde y_{n+1}$ in the argument of $g'(y)$ in order to save some
evaluations of $g'(y)$. Inserting the first relation of (4.5) into the second gives a
nonlinear equation for $\lambda$, which can be efficiently solved by simplified Newton iterations:

$$\Delta\lambda_i = -\bigl(g'(\tilde y_{n+1})\, g'(\tilde y_{n+1})^T\bigr)^{-1} g\bigl(\tilde y_{n+1} + g'(\tilde y_{n+1})^T \lambda_i\bigr), \qquad \lambda_{i+1} = \lambda_i + \Delta\lambda_i.$$

For the choice $\lambda_0 = 0$ the first increment $\Delta\lambda_0$ is of size $O(h^{p+1})$, so that the
convergence is usually extremely fast. Often, one simplified Newton iteration is sufficient.
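Algorithm 4.2 with the simplified Newton iteration can be sketched for a single scalar constraint ($m = 1$) as follows. The example reuses the rigid body equations (1.4) with the sphere $g(y) = y^T y - 1 = 0$ as the manifold and the explicit Euler method as the basic one-step method; this choice of test problem, the tolerance, the iteration limit and all function names are assumptions made for illustration. Only the sphere is enforced here, not the energy.

```python
import math

I1, I2, I3 = 2.0, 1.0, 2.0 / 3.0


def f(y):
    """Euler equations (1.4) of the free rigid body."""
    a1 = (I2 - I3) / (I2 * I3)
    a2 = (I3 - I1) / (I3 * I1)
    a3 = (I1 - I2) / (I1 * I2)
    return [a1 * y[1] * y[2], a2 * y[2] * y[0], a3 * y[0] * y[1]]


def g(y):
    """Scalar constraint g(y) = y^T y - 1 defining the unit sphere."""
    return sum(v * v for v in y) - 1.0


def grad_g(y):
    """g'(y)^T for the sphere constraint."""
    return [2.0 * v for v in y]


def project(y_tilde, g, grad_g, tol=1e-14, max_iter=20):
    """Solve (4.5) for a scalar constraint by simplified Newton iterations."""
    d = grad_g(y_tilde)                 # g'(y~)^T, evaluated only once
    gg = sum(v * v for v in d)          # g'(y~) g'(y~)^T, a scalar for m = 1
    lam = 0.0                           # lambda_0 = 0
    for _ in range(max_iter):
        y = [a + lam * b for a, b in zip(y_tilde, d)]
        dlam = -g(y) / gg
        lam += dlam
        if abs(dlam) < tol:
            break
    return [a + lam * b for a, b in zip(y_tilde, d)]


def projected_euler_step(y, h):
    y_tilde = [a + h * k for a, k in zip(y, f(y))]   # basic method: explicit Euler
    return project(y_tilde, g, grad_g)


y = [math.cos(1.1), 0.0, math.sin(1.1)]
max_defect = 0.0
for _ in range(320):
    y = projected_euler_step(y, 0.05)
    max_defect = max(max_defect, abs(g(y)))

print(max_defect)
```

For $m > 1$ the scalar division by `gg` would become the solution of an $m \times m$ linear system with the matrix $g'(\tilde y_{n+1})\, g'(\tilde y_{n+1})^T$.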

Example 4.3. As a first example we consider the perturbed Kepler problem
(see Exercise I.12) with Hamiltonian function

$$H(p,q) = \frac12\,(p_1^2 + p_2^2) - \frac{1}{\sqrt{q_1^2 + q_2^2}} - \frac{0.005}{2\sqrt{(q_1^2 + q_2^2)^3}},$$

and initial values $q_1(0) = 1 - e$, $q_2(0) = 0$, $p_1(0) = 0$, $p_2(0) = \sqrt{(1+e)/(1-e)}$
(eccentricity $e = 0.6$) on the interval $0 \le t \le 200$. The exact
solution (plotted to the right) is approximately an ellipse that rotates slowly around
one of its foci. For this problem we know two first integrals: the Hamiltonian function
$H(p,q)$ and the angular momentum $L(p,q) = q_1 p_2 - q_2 p_1$.

We apply the explicit Euler method and the symplectic Euler method (I.1.9),
both with constant step size $h = 0.03$. The result is shown in Fig. 4.3. The numerical
solution of the explicit Euler method (without projection) is completely
wrong. The projection onto the manifold $\{H(p,q) = H(p_0, q_0)\}$ improves the numerical
solution, but it still has a wrong qualitative behaviour. Only projection onto
both invariants, $H(p,q) = \mathrm{Const}$ and $L(p,q) = \mathrm{Const}$, gives the correct behaviour.
The symplectic Euler method already shows the correct behaviour without
any projections (see Chap. IX for an explanation). Surprisingly, a projection onto
$H(p,q) = \mathrm{Const}$ destroys this behaviour, the numerical solution approaches the
centre and the simplified Newton iterations fail to converge beyond $t = 25.23$.
Projection onto both invariants re-establishes the correct behaviour.

Fig. 4.3. Numerical solutions obtained with and without projections
(panels: explicit Euler, h = 0.03; symplectic Euler, h = 0.03)

Fig. 4.4. Explicit Euler method with projections applied to the outer solar system, step size
$h = 10$ (days), interval $0 \le t \le 200\,000$
(panels: explicit Euler, projection onto H; explicit Euler, projection onto H and L)

Example 4.4 (Outer Solar System). Having had excellent experience
with projections onto $H$ and $L$ for the perturbed Kepler problem (Example 4.3),
let us apply the same idea to a more realistic problem in celestial mechanics. We
consider the outer solar system as described in Sect. I.2. The numerical solution
of the explicit Euler method applied with constant step size $h = 10$, once with
projection onto $H = \mathrm{Const}$ and once with projection onto $H = \mathrm{Const}$ and
$L = \mathrm{Const}$, is shown in Fig. 4.4 (observe that the conservation of the angular
momentum $L(p,q) = \sum_{i=1}^N q_i \times p_i$ consists of three first integrals). We see a slight
improvement in the orbits of Jupiter, Saturn and Uranus (compared to the explicit
Euler method without projections, see Fig. I.2.4), but the orbit of Neptune becomes
even worse. There is no doubt that this problem contains a structure which cannot
be correctly simulated by methods that only preserve the total energy $H$ and the
angular momentum $L$.

Example 4.5 (Volume Preservation). Consider the matrix differential equation
$\dot Y = A(Y)Y$, where $\mathrm{trace}\,A(Y) = 0$ for all $Y$. We know from Lemma 3.1 that
$g(Y) = \det Y$ is an invariant which cannot be automatically conserved by Runge-Kutta
methods. Here, we show how we can enforce this invariant by projection. Let
$\widetilde Y_{n+1}$ be the numerical approximation obtained with an arbitrary one-step method.
We consider the Frobenius norm $\|Y\|_F = \sqrt{\sum_{i,j} |y_{ij}|^2}$ for measuring the distance
to the manifold $\{Y \,;\, g(Y) = 0\}$. Using $g'(Y)(AY) = \mathrm{trace}\,A \,\det Y$ (see the proof
of Lemma 3.1) with $A$ chosen such that the product $AY$ contains only one non-zero
element, the projection step (4.5) is seen to become (Exercise 9)

$$ Y_{n+1} = \widetilde Y_{n+1} + \mu\, \widetilde Y_{n+1}^{-T} \tag{4.6} $$

with the scalar $\mu = \lambda \det \widetilde Y_{n+1}$. This leads to the scalar nonlinear equation
$\det\bigl(\widetilde Y_{n+1} + \mu \widetilde Y_{n+1}^{-T}\bigr) = \det Y_n$, for which simplified Newton iterations become

$$ \det\bigl(\widetilde Y_{n+1} + \mu_i \widetilde Y_{n+1}^{-T}\bigr) \Bigl( 1 + (\mu_{i+1} - \mu_i)\, \mathrm{trace}\bigl( (\widetilde Y_{n+1}^T \widetilde Y_{n+1})^{-1} \bigr) \Bigr) = \det Y_n . $$

If the QR-decomposition of $\widetilde Y_{n+1}$ is available from the computation of $\det \widetilde Y_{n+1}$,
the value of $\mathrm{trace}\bigl((\widetilde Y_{n+1}^T \widetilde Y_{n+1})^{-1}\bigr)$ can be computed efficiently with $O(n^3/3)$ flops
(see e.g., Golub & Van Loan (1989), Sect. 5.3.9).

The above projection is preferable to $Y_{n+1} = c\, \widetilde Y_{n+1}$, where $c \in \mathbb{R}$ is chosen
such that $\det Y_{n+1} = \det Y_n$. This latter projection is already ill-conditioned for
diagonal matrices with entries that differ by several magnitudes.
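
The projection (4.6) with its simplified Newton iteration can be sketched as follows for $2 \times 2$ matrices. This is only an illustration under stated assumptions: the helper names (`det2`, `inv_transpose2`, `project_det`) and the sample matrix are hypothetical, and the trace is evaluated directly as the squared Frobenius norm of $\widetilde Y_{n+1}^{-T}$ instead of via a QR decomposition.

```python
def det2(Y):
    """Determinant of a 2x2 matrix given as nested lists."""
    return Y[0][0] * Y[1][1] - Y[0][1] * Y[1][0]


def inv_transpose2(Y):
    """Return Y^{-T} for an invertible 2x2 matrix."""
    d = det2(Y)
    return [[Y[1][1] / d, -Y[1][0] / d],
            [-Y[0][1] / d, Y[0][0] / d]]


def project_det(Y_tilde, target_det, tol=1e-13, max_iter=50):
    """Find mu with det(Y_tilde + mu * Y_tilde^{-T}) = target_det, cf. (4.6)."""
    W = inv_transpose2(Y_tilde)
    # trace((Y~^T Y~)^{-1}) equals the squared Frobenius norm of Y~^{-T}
    tr = sum(w * w for row in W for w in row)
    mu = 0.0
    Y = [row[:] for row in Y_tilde]
    for _ in range(max_iter):
        d = det2(Y)
        if abs(d - target_det) <= tol * abs(target_det):
            break
        # simplified Newton update: d * (1 + (mu_new - mu) * tr) = target_det
        mu += (target_det / d - 1.0) / tr
        Y = [[Y_tilde[i][j] + mu * W[i][j] for j in range(2)] for i in range(2)]
    return Y, mu


# Hypothetical one-step result whose determinant has drifted away from 1.
Y_tilde = [[1.02, 0.10], [0.00, 0.99]]
Y_proj, mu = project_det(Y_tilde, target_det=1.0)
print(det2(Y_tilde), det2(Y_proj), mu)
```

Because the derivative is frozen at $\mu = 0$, each iteration costs only one determinant evaluation; for a small defect in the determinant the iteration contracts quickly.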

As a conclusion to the above numerical experiments we see that a projection 
can give excellent results, but can also destroy the good long-time behaviour of the 
solution if applied inappropriately. If the original method already preserves some 
structure, then projection to a subset of invariants may destroy the good long-time 
behaviour. An important modification for reversible differential equations (symmetric
projections) will be presented in Sect. V.4.1.


IV.5 Numerical Methods Based on Local Coordinates 

A second important class of methods for the numerical treatment of differential
equations on manifolds uses local coordinates. Before explaining the ideas, we find
it appropriate to discuss in more detail manifolds and differential equations on
manifolds.

IV.5.1 Manifolds and the Tangent Space

In Sect. IV.4 we assumed that locally (in a neighbourhood $U$ of $a \in \mathbb{R}^n$) a manifold
is given by constraints, i.e.,

$$ \mathcal{M} = \{ y \in U \,;\, g(y) = 0 \}, \tag{5.1} $$

where $g : U \to \mathbb{R}^m$ is differentiable, $g(a) = 0$, and $g'(a)$ has full rank $m$.

Here, we use local parameters to characterize a manifold. Let $\psi : V \to \mathbb{R}^n$ be
differentiable ($V \subset \mathbb{R}^{n-m}$ is a neighbourhood of $0$), $\psi(0) = a$, and assume that
$\psi'(0)$ has full rank $n - m$. Then, a manifold is locally given by

$$ \mathcal{M} = \{ y = \psi(z) \,;\, z \in V \} \tag{5.2} $$

provided that $V$ is sufficiently small, so that $\psi : V \to \psi(V)$ is bijective with
continuous inverse. The variables $z$ are called parameters or local coordinates of
the manifold.

As an example, consider the unit sphere which, in the form (5.1), is given by the
function $g(y_1, y_2, y_3) = y_1^2 + y_2^2 + y_3^2 - 1$. There are many possible choices of local
coordinates. Away from the equator (i.e., $y_3 = 0$), we can take $z = (z_1, z_2)^T :=
(y_1, y_2)^T$ and $\psi(z) = \bigl(z_1, z_2, \pm\sqrt{1 - z_1^2 - z_2^2}\bigr)^T$. Alternatively, we can consider
spherical coordinates $\psi(\alpha, \beta) = (\cos\alpha \sin\beta, \sin\alpha \sin\beta, \cos\beta)^T$ away from the
north and south poles (i.e., $y_1 = y_2 = 0$, $y_3 = \pm 1$).
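
As a quick check of the two parametrizations of the unit sphere, the following sketch evaluates the constraint $g$ on their images. The function names `psi_graph` and `psi_spherical` and the sample arguments are illustrative assumptions, not notation from the text.

```python
import math


def g(y):
    """Constraint function of the unit sphere, cf. (5.1)."""
    return y[0] ** 2 + y[1] ** 2 + y[2] ** 2 - 1.0


def psi_graph(z, sign=1.0):
    """Chart z = (y1, y2); valid away from the equator y3 = 0."""
    z1, z2 = z
    return (z1, z2, sign * math.sqrt(1.0 - z1 * z1 - z2 * z2))


def psi_spherical(alpha, beta):
    """Spherical coordinates; valid away from the poles y3 = +1 and y3 = -1."""
    return (math.cos(alpha) * math.sin(beta),
            math.sin(alpha) * math.sin(beta),
            math.cos(beta))


y_a = psi_graph((0.3, -0.4))            # upper hemisphere
y_b = psi_graph((0.3, -0.4), sign=-1.0)  # lower hemisphere
y_c = psi_spherical(0.7, 1.1)
print(g(y_a), g(y_b), g(y_c))
```

Both maps land on the sphere up to rounding; the sign argument selects which hemisphere the first chart covers.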

The tangent to a curve (or the tangent plane to a surface) is an affine space
passing through the contact point $a \in \mathcal{M}$. It is convenient to place the origin at $a$,
so that we obtain a vector space. More precisely, for a manifold $\mathcal{M}$ we define the
tangent space at $a \in \mathcal{M}$ as

$$ T_a\mathcal{M} = \Bigl\{ v \in \mathbb{R}^n \;\Big|\; \text{there exists a differentiable path } \gamma : (-\varepsilon, \varepsilon) \to \mathbb{R}^n \text{ with } \gamma(t) \in \mathcal{M} \text{ for all } t,\ \gamma(0) = a,\ \dot\gamma(0) = v \Bigr\}. \tag{5.3} $$

Lemma 5.1. If the manifold $\mathcal{M}$ is given by (5.1), where $g : U \to \mathbb{R}^m$ is differentiable,
$g(a) = 0$, and $g'(a)$ has full rank $m$, then we have

$$ T_a\mathcal{M} = \ker g'(a) = \{ v \in \mathbb{R}^n \mid g'(a)v = 0 \}. \tag{5.4} $$

If $\mathcal{M}$ is given by (5.2), where $\psi : V \to \mathbb{R}^n$ is differentiable, $\psi(0) = a$, and $\psi'(0)$
has full rank $n - m$, then we have

$$ T_a\mathcal{M} = \mathrm{Im}\, \psi'(0) = \{ \psi'(0)w \mid w \in \mathbb{R}^{n-m} \}. \tag{5.5} $$

Proof. a) For a path $\gamma(t)$ satisfying $\gamma(0) = a$ and $g(\gamma(t)) = 0$ it follows by
differentiation that $g'(a)\dot\gamma(0) = 0$. Consequently, we have $T_a\mathcal{M} \subset \ker g'(a)$.

Consider now the function $F(t, u) = g(a + tv + g'(a)^T u)$. We have $F(0, 0) = 0$
and an invertible $\partial F/\partial u\,(0, 0) = g'(a)g'(a)^T$, so that by the implicit function
theorem the relation $F(t, u) = 0$ can be solved locally for $u = u(t)$. If $v \in \ker g'(a)$,
it follows that $\dot u(0) = 0$, and the path $\gamma(t) = a + tv + g'(a)^T u(t)$ satisfies all
requirements of (5.3), so that also $T_a\mathcal{M} \supset \ker g'(a)$.

b) Assume $\mathcal{M}$ to be given by (5.2). For an arbitrary $\eta : (-\varepsilon, \varepsilon) \to \mathbb{R}^{n-m}$ satisfying
$\eta(0) = 0$, the path $\gamma(t) = \psi(\eta(t))$ lies in $\mathcal{M}$ and satisfies $\dot\gamma(0) = \psi'(0)\dot\eta(0)$. This
proves $\mathrm{Im}\,\psi'(0) \subset T_a\mathcal{M}$.

The assumption on the rank of $\psi'(0)$ implies that, after a reordering of the
components, we have $\psi(z) = (\psi_1(z), \psi_2(z))^T$, where $\psi_1(z)$ is a local diffeomorphism
(by the inverse function theorem). We show that every smooth path $\gamma(t)$ in
$\mathcal{M}$ can be written as $\gamma(t) = \psi(\eta(t))$ with some smooth $\eta(t)$. This then implies
$T_a\mathcal{M} \subset \mathrm{Im}\,\psi'(0)$. To prove this we split $\gamma(t) = (\gamma_1(t), \gamma_2(t))^T$ according to the
partitioning of $\psi$, and we define $\eta(t) = \psi_1^{-1}(\gamma_1(t))$. Since for $\gamma(t) \in \mathcal{M}$ the second
part $\gamma_2(t)$ is uniquely determined by $\gamma_1(t)$, this proves $\gamma(t) = \psi(\eta(t))$. □

The proof of the preceding lemma shows the equivalence of the representations (5.1)
and (5.2) of manifolds in $\mathbb{R}^n$. Let $\mathcal{M}$ be given by (5.1), and assume that the columns
of $Q$ form an orthonormal basis of $T_a\mathcal{M}$. As in part (a) of the proof of Lemma 5.1 the
condition $g(a + Qz + g'(a)^T u) = 0$ defines locally (close to $z = 0$) a function $u(z)$
which satisfies $u(0) = 0$ and $u'(0) = 0$. Hence, the manifold $\mathcal{M}$ is also given by (5.2)
with the function $\psi(z) = a + Qz + g'(a)^T u(z)$.

On the other hand, let $\mathcal{M}$ be given by (5.2). Part (b) of the proof of Lemma 5.1
shows that $y = \psi(z)$ can be partitioned into $y_1 = \psi_1(z)$ and $y_2 = \psi_2(z)$, where
$\psi_1$ is a local diffeomorphism. Consequently, $\mathcal{M}$ is also given by (5.1) with $g(y) =
y_2 - \psi_2(\psi_1^{-1}(y_1))$.

IV.5.2 Differential Equations on Manifolds 

In Sect. IV.4 we introduced differential equations on a manifold as problems
satisfying (4.2). With the help of Lemma 5.1 we are now in a position to characterize
such problems without knowledge of the solutions.

Theorem 5.2. Let $\mathcal{M}$ be a submanifold of $\mathbb{R}^n$. The problem $\dot y = f(y)$ is a differential
equation on the manifold $\mathcal{M}$ (i.e., it satisfies (4.2)) if and only if

$$ f(y) \in T_y\mathcal{M} \quad \text{for all } y \in \mathcal{M}. \tag{5.6} $$

Proof. The necessity of (5.6) follows from the definition of $T_y\mathcal{M}$, because the exact
solution of the differential equation lies in $\mathcal{M}$ and has $f(y)$ as derivative.

To prove the sufficiency, we assume (5.6) and let $\mathcal{M}$ be given locally, near $y_0$,
by a parametrization $y = \psi(z)$ as in (5.2). We try to write the solution of $\dot y = f(y)$,
$y(0) = y_0 = \psi(z_0)$ as $y(t) = \psi(z(t))$. If this is at all possible, then $z(t)$ must
satisfy

$$ \psi'(z)\dot z = f(\psi(z)) $$

which, by assumption (5.6) and the second part of Lemma 5.1, is equivalent to

$$ \dot z = \psi'(z)^+ f(\psi(z)), \tag{5.7} $$

where $A^+ = (A^T A)^{-1} A^T$ denotes the pseudo-inverse of a matrix with full column
rank. Conversely, define $z(t)$ as the solution of (5.7) with $z(0) = z_0$, which is known
to exist locally in $t$ by the standard existence and uniqueness theory of ordinary
differential equations on $\mathbb{R}^{n-m}$. Then $y(t) = \psi(z(t))$ is the solution of $\dot y = f(y)$ with
$y(0) = y_0$. Hence, the solution $y(t)$ remains in $\mathcal{M}$. □

We remark that the sufficiency proof of Theorem 5.2 only requires the function
$f(y)$ to be defined on $\mathcal{M}$. Due to the equivalence of $\dot y = f(y)$ with (5.7) the problem
is transported to the space of local coordinates. The standard local theory for
ordinary differential equations on a Euclidean space (existence and uniqueness of
solutions, ...) can thus be extended in a straightforward way to differential equations
on manifolds, i.e., $\dot y = f(y)$ with $f : \mathcal{M} \to \mathbb{R}^n$ satisfying (5.6).

IV.5.3 Numerical Integrators on Manifolds 

Whereas the projection methods of Sect. IV.4 require the function $f(y)$ of the
differential equation to be defined in a neighbourhood of $\mathcal{M}$ (see Fig. 4.2), the numerical
methods of this section evaluate $f(y)$ only on the manifold $\mathcal{M}$. The idea is to apply
the numerical integrator in the parameter space rather than in the space where $\mathcal{M}$ is
embedded.

Algorithm 5.3 (Local Coordinates Approach). Assume that $y_n \in \mathcal{M}$ and that $\psi$
is a local parametrization of $\mathcal{M}$ satisfying $\psi(z_n) = y_n$. One step $y_n \mapsto y_{n+1}$ is
defined as follows (see Fig. 5.1):

• Compute $\widetilde z_{n+1} = \Phi_h(z_n)$, the result of the numerical method $\Phi_h$ applied to (5.7);

• define the numerical solution by $y_{n+1} = \psi(\widetilde z_{n+1})$.

It is important to remark that the parametrization $y = \psi(z)$ can be changed at every
step.



Fig. 5.1. The numerical solution of differential equations on manifolds via local coordinates

As indicated at the beginning of Sect. IV.5.1, there are many possible choices
of local coordinates. Consider the pendulum equation of Example 4.1, where $\mathcal{M} =
\{(q_1, q_2, p_1, p_2) \mid q_1^2 + q_2^2 = 1,\ q_1 p_1 + q_2 p_2 = 0\}$. A standard parametrization here
is $q_1 = \sin\alpha$, $q_2 = -\cos\alpha$, $p_1 = \omega\cos\alpha$, and $p_2 = \omega\sin\alpha$. In the new coordinates
$(\alpha, \omega)$ the problem becomes simply $\dot\alpha = \omega$, $\dot\omega = -\sin\alpha$. Other typical choices are
the exponential map $\psi(Z) = \exp(Z)$ for differential equations on Lie groups, and
the Cayley transform $\psi(Z) = (I - Z)^{-1}(I + Z)$ for quadratic Lie groups. This will
be studied in more detail in Sect. IV.8 below. Here we discuss two commonly used
choices which do not use a special structure of the manifold.
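
For the pendulum parametrization just described, Algorithm 5.3 can be sketched as below: the explicit Euler method is applied in the coordinates $(\alpha, \omega)$ and the result is mapped back with $\psi$. The choice of the explicit Euler method, the step size and the initial values are illustrative assumptions.

```python
import math


def psi(alpha, omega):
    """Parametrization (alpha, omega) -> (q1, q2, p1, p2) of the pendulum manifold."""
    return (math.sin(alpha), -math.cos(alpha),
            omega * math.cos(alpha), omega * math.sin(alpha))


def euler_step(alpha, omega, h):
    """Explicit Euler applied to alpha' = omega, omega' = -sin(alpha)."""
    return alpha + h * omega, omega - h * math.sin(alpha)


def constraints(y):
    """Residuals of q1^2 + q2^2 = 1 and q1 p1 + q2 p2 = 0."""
    q1, q2, p1, p2 = y
    return q1 * q1 + q2 * q2 - 1.0, q1 * p1 + q2 * p2


alpha, omega, h = 1.0, 0.0, 0.01      # assumed initial values and step size
for _ in range(1000):
    alpha, omega = euler_step(alpha, omega, h)   # integrate in parameter space
y = psi(alpha, omega)                            # map back onto the manifold
c1, c2 = constraints(y)
print(y, c1, c2)
```

Whatever the accuracy of the integrator in $(\alpha, \omega)$, the mapped point satisfies both constraints up to rounding, since every $\psi(\alpha, \omega)$ lies on the manifold.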

Generalized Coordinate Partitioning. We assume that the manifold is given by
(5.1). If $g : \mathbb{R}^n \to \mathbb{R}^m$ has a Jacobian with full rank $m$ at $y = a$, we can find a
partitioning $y = (y_1, y_2)$, such that $\partial g/\partial y_2\,(a)$ is invertible. In this case we can choose
the components of $y_1$ as local coordinates. The function $y = \psi(z)$ is then given by
$y_1 = z$ and $y_2 = \psi_2(z)$, where $\psi_2(z)$ is implicitly defined by $g(z, \psi_2(z)) = 0$.
This approach has been promoted by Wehage & Haug (1982) in the context of
constrained mechanical systems, and the partitioning is found by Gaussian elimination
with full pivoting applied to the matrix $g'(a)$. Another way of finding the partitioning
is by the use of the QR decomposition with column change.

Tangent Space Parametrization. Let the manifold $\mathcal{M}$ be given by (5.1), and
collect the vectors of an orthonormal basis of $T_a\mathcal{M}$ in the matrix $Q$. We then consider
the parametrization

$$ \psi_a(z) = a + Qz + g'(a)^T u(z), \tag{5.8} $$

where $u(z)$ is defined by $g(\psi_a(z)) = 0$, exactly as in the discussion after the proof
of Lemma 5.1. Differentiating (5.8) yields

$$ \bigl(Q + g'(a)^T u'(z)\bigr)\dot z = \dot y = f(y) = f(\psi_a(z)). $$

Since $Q^T Q = I$ and $g'(a)Q = 0$, this relation is equivalent to the differential
equation

$$ \dot z = Q^T f(\psi_a(z)), \tag{5.9} $$

which corresponds to (5.7). If we apply a numerical method to (5.9), every function
evaluation requires the projection of an element of the tangent space onto the
manifold. This procedure is illustrated in Fig. 5.1, and was originally proposed by Potra &
Rheinboldt (1991) for the solution of the Euler-Lagrange equations of constrained
multibody systems (see also Hairer & Wanner (1996), p. 476).
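
A minimal sketch of the tangent space parametrization, under simplifying assumptions: the manifold is the unit circle $g(y) = y_1^2 + y_2^2 - 1$, the vector field is the rotation $f(y) = (-y_2, y_1)$, and the integrator in $z$ is the explicit midpoint rule. For this particular $g$ the function $u(z)$ in (5.8) has the closed form $u(z) = (\sqrt{1 - z^2} - 1)/2$ (valid for $|a| = 1$); in general it would have to be computed by a Newton-type iteration. All names are illustrative.

```python
import math


def f(y):
    """Assumed vector field on the unit circle (tangent at every point)."""
    return (-y[1], y[0])


def psi(a, z):
    """psi_a(z) = a + Q z + g'(a)^T u(z) for g(y) = y1^2 + y2^2 - 1, |a| = 1."""
    Q = (-a[1], a[0])                          # orthonormal basis of T_a M
    u = 0.5 * (math.sqrt(1.0 - z * z) - 1.0)   # solves g(psi_a(z)) = 0, u(0) = 0
    return (a[0] + Q[0] * z + 2.0 * a[0] * u,
            a[1] + Q[1] * z + 2.0 * a[1] * u)


def rhs(a, z):
    """Right-hand side Q^T f(psi_a(z)) of (5.9)."""
    Q = (-a[1], a[0])
    fy = f(psi(a, z))
    return Q[0] * fy[0] + Q[1] * fy[1]


def step(a, h):
    """One explicit midpoint step for (5.9), started at z = 0, then mapped back."""
    k1 = rhs(a, 0.0)
    k2 = rhs(a, 0.5 * h * k1)
    return psi(a, h * k2)


y = (1.0, 0.0)
for _ in range(200):
    y = step(y, 0.1)        # the parametrization is re-centred at a = y_n each step
defect = y[0] ** 2 + y[1] ** 2 - 1.0
print(y, defect)
```

Each evaluation of `rhs` first maps the tangent-space point back onto the circle, which is the projection mentioned in the text.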
IV.6 Differential Equations on Lie Groups

Theorem 1.6 and Lemma 3.1 are particular cases of a more general result which can
be conveniently formulated with the concept of Lie groups and Lie algebras (see Olver
(1986) and Varadarajan (1974) for an introduction to these subjects).

A Lie group is a group $G$ which is a differentiable manifold, and for which the
product is a differentiable mapping $G \times G \to G$. We restrict our considerations to
matrix Lie groups, that is, Lie groups which are subgroups of $\mathrm{GL}(n)$, the group of
invertible $n \times n$ matrices with the usual matrix product as the group operation.

Example 6.1. An important example of a Lie group is the group

$$ \mathrm{O}(n) = \{ Y \in \mathrm{GL}(n) \mid Y^T Y = I \} $$

of all orthogonal matrices. It is the zero set of $g(Y) = Y^T Y - I$, where we consider
$g$ as a mapping from the set of all $n \times n$ matrices (i.e., $\mathbb{R}^{n \cdot n}$) to the set of all
symmetric matrices (which can be identified with $\mathbb{R}^{n(n+1)/2}$). The derivative $g'(Y)$
is surjective for $Y \in \mathrm{O}(n)$, because for any symmetric matrix $K$ the choice $H =
YK/2$ solves the equation $g'(Y)H = K$. Therefore, the matrix $g'(Y)$ has full rank
(cf. (5.1)) so that $\mathrm{O}(n)$ defines a differentiable manifold of dimension $n^2 - n(n +
1)/2 = n(n - 1)/2$. The set $\mathrm{O}(n)$ is also a group with unit element $I$ (the identity).
Since the matrix multiplication is a differentiable mapping, $\mathrm{O}(n)$ is a Lie group.

Table 6.1 lists further prominent examples. The matrix $J$ appearing in the
definition of the symplectic group is the matrix determining the symplectic structure on
$\mathbb{R}^n$ (see Sect. VI.2).

As the following lemma shows, the tangent space $\mathfrak{g} = T_I G$ at the identity $I$ of
a matrix Lie group $G$ is closed under forming commutators of its elements. This
makes $\mathfrak{g}$ an algebra, the Lie algebra of the Lie group $G$.

Lemma 6.2 (Lie Bracket and Lie Algebra). Let $G$ be a matrix Lie group and let
$\mathfrak{g} = T_I G$ be the tangent space at the identity. The Lie bracket (or commutator)

$$ [A, B] = AB - BA \tag{6.1} $$

defines an operation $\mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$ which is bilinear, skew-symmetric
($[A, B] = -[B, A]$), and satisfies the Jacobi identity

$$ [A, [B, C]] + [C, [A, B]] + [B, [C, A]] = 0. \tag{6.2} $$

3 Marius Sophus Lie, born: 17 December 1842 in Nordfjordeid (Norway), died: 18 February 
1899. 


[Portrait: Marius Sophus Lie, see footnote 3]

Table 6.1. Some matrix Lie groups and their corresponding Lie algebras

  Lie group                                                      Lie algebra

  $\mathrm{GL}(n) = \{Y \mid \det Y \ne 0\}$                     $\mathfrak{gl}(n) = \{A \mid \text{arbitrary matrix}\}$
    general linear group                                           Lie algebra of $n \times n$ matrices

  $\mathrm{SL}(n) = \{Y \mid \det Y = 1\}$                       $\mathfrak{sl}(n) = \{A \mid \mathrm{trace}(A) = 0\}$
    special linear group                                           special linear Lie algebra

  $\mathrm{O}(n) = \{Y \mid Y^T Y = I\}$                         $\mathfrak{so}(n) = \{A \mid A^T + A = 0\}$
    orthogonal group                                               skew-symmetric matrices

  $\mathrm{SO}(n) = \{Y \in \mathrm{O}(n) \mid \det Y = 1\}$     $\mathfrak{so}(n) = \{A \mid A^T + A = 0\}$
    special orthogonal group                                       skew-symmetric matrices

  $\mathrm{Sp}(n) = \{Y \mid Y^T J Y = J\}$                      $\mathfrak{sp}(n) = \{A \mid JA + A^T J = 0\}$
    symplectic group

Proof. By definition of the tangent space, for $A, B \in \mathfrak{g}$, there exist differentiable
paths $\alpha(t)$, $\beta(t)$ ($|t| < \varepsilon$) in $G$ such that $\alpha(t) = I + tA(t)$ with a continuous function
$A(t)$ with $A(0) = A$, and similarly $\beta(t) = I + tB(t)$ with $B(0) = B$. Now consider
the path $\gamma(t)$ in $G$ defined by

$$ \gamma(t) = \alpha(\sqrt t)\, \beta(\sqrt t)\, \alpha(\sqrt t)^{-1} \beta(\sqrt t)^{-1}, \qquad t \ge 0 . $$

An elementary computation then yields

$$ \gamma(t) = I + t[A, B] + o(t). $$

With the extension $\gamma(t) = \gamma(-t)^{-1}$ for negative $t$, this is a differentiable path in
$G$ satisfying $\gamma(0) = I$ and $\dot\gamma(0) = [A, B]$. Hence $[A, B] \in \mathfrak{g}$ by definition of the
tangent space. The properties of the Lie bracket can be verified in a straightforward
way. □
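
The statements of Lemma 6.2 can be checked numerically for the skew-symmetric $3 \times 3$ matrices (the Lie algebra of the orthogonal group in Table 6.1). The sketch below uses integer entries so that all identities hold exactly; the `hat` map, which identifies a vector of $\mathbb{R}^3$ with a skew-symmetric matrix, and the sample vectors are illustrative assumptions.

```python
def hat(w):
    """Skew-symmetric 3x3 matrix associated with the vector w."""
    return [[0, -w[2], w[1]],
            [w[2], 0, -w[0]],
            [-w[1], w[0], 0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(3)] for i in range(3)]


def bracket(A, B):
    """Lie bracket [A, B] = AB - BA, cf. (6.1)."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]


def is_skew(A):
    return all(A[i][j] == -A[j][i] for i in range(3) for j in range(3))


A, B, C = hat((1, 2, 3)), hat((-2, 0, 5)), hat((4, -1, 1))
AB = bracket(A, B)
jacobi = add(add(bracket(A, bracket(B, C)), bracket(C, bracket(A, B))),
             bracket(B, bracket(C, A)))
print(AB, is_skew(AB), jacobi)
```

The commutator of two skew-symmetric matrices is again skew-symmetric, and the Jacobi sum (6.2) vanishes identically.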

Example 6.3. Consider again the orthogonal group $\mathrm{O}(n)$. Since the derivative of
$g(Y) = Y^T Y - I$ at the identity is $g'(I)H = I^T H + H^T I = H + H^T$, it follows
from the first part of Lemma 5.1 that the Lie algebra corresponding to $\mathrm{O}(n)$ consists
of all skew-symmetric matrices. The right column of Table 6.1 gives the Lie algebras
of the other Lie groups listed there.

The following basic lemma shows that the exponential map yields a local
parametrization of the Lie group near the identity, with the Lie algebra (a linear space)
as the parameter space.

Lemma 6.4 (Exponential Map). Consider a matrix Lie group $G$ and its Lie algebra
$\mathfrak{g}$. The matrix exponential is a map

$$ \exp : \mathfrak{g} \to G, $$

i.e., for $A \in \mathfrak{g}$ we have $\exp(A) \in G$. Moreover, $\exp$ is a local diffeomorphism in a
neighbourhood of $A = 0$.

Proof. For $A \in \mathfrak{g}$, it follows from the definition of the tangent space $\mathfrak{g} = T_I G$ that
there exists a differentiable path $\alpha(t)$ in $G$ satisfying $\alpha(0) = I$ and $\dot\alpha(0) = A$. For
a fixed $Y \in G$, the path $\gamma(t) := \alpha(t)Y$ is in $G$ and satisfies $\gamma(0) = Y$ and $\dot\gamma(0) =
AY$. Consequently, $AY \in T_Y G$ and $\dot Y = AY$ defines a differential equation on the
manifold $G$. The solution $Y(t) = \exp(tA)$ is therefore in $G$ for all $t$.

Since $\exp(H) - \exp(0) = H + O(H^2)$, the derivative of the exponential map
at $A = 0$ is the identity, and it follows from the inverse function theorem that $\exp$ is
a local diffeomorphism close to $A = 0$. □
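
Lemma 6.4 can be illustrated numerically: the exponential of a skew-symmetric matrix is orthogonal, and the exponential of a trace-free matrix has determinant one, whereas the additive update $I + A$ leaves the group. The sketch assumes a simple Taylor series with scaling and squaring for the matrix exponential (adequate for these small examples, not a production routine); the function names and sample matrices are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(A, terms=18, squarings=6):
    """Matrix exponential: Taylor series of A / 2^squarings, then repeated squaring."""
    n = len(A)
    X = [[a / 2.0 ** squarings for a in row] for row in A]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[t / k for t in row] for row in matmul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def orthogonality_defect(Y):
    """Largest entry of |Y^T Y - I|."""
    n = len(Y)
    return max(abs(sum(Y[k][i] * Y[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


def det2(Y):
    return Y[0][0] * Y[1][1] - Y[0][1] * Y[1][0]


A = [[0.0, -0.7, -1.2], [0.7, 0.0, -0.3], [1.2, 0.3, 0.0]]   # skew-symmetric
Q = expm(A)                                                    # lies in O(3)
euler = [[(1.0 if i == j else 0.0) + A[i][j] for j in range(3)] for i in range(3)]
B = [[0.5, 2.0], [1.0, -0.5]]                                  # trace zero
S = expm(B)                                                    # lies in SL(2)
print(orthogonality_defect(Q), orthogonality_defect(euler), det2(S))
```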

The proof of Lemma 6.4 shows that for a matrix Lie group $G$ the tangent space
at $Y \in G$ has the form

$$ T_Y G = \{ AY \mid A \in \mathfrak{g} \}. \tag{6.3} $$

By Theorem 5.2, differential equations on a matrix Lie group (considered as a
manifold) can therefore be written as

$$ \dot Y = A(Y)Y \tag{6.4} $$

where $A(Y) \in \mathfrak{g}$ for all $Y \in G$. The following theorem summarizes this discussion,
and extends the statements of Theorem 1.6 and Lemma 3.1 to more general matrix
Lie groups.

Theorem 6.5. Let $G$ be a matrix Lie group and $\mathfrak{g}$ its Lie algebra. If $A(Y) \in \mathfrak{g}$ for
all $Y \in G$ and if $Y_0 \in G$, then the solution of (6.4) satisfies $Y(t) \in G$ for all $t$. □

If in addition $A(Y) \in \mathfrak{g}$ for all matrices $Y$, and if

$$ G = \{ Y \mid g(Y) = \mathrm{Const} \} $$

is one of the Lie groups of Table 6.1, then $g(Y)$ is an invariant of the differential
equation (6.4) in the sense of Definition 1.1.





IV.7 Methods Based on the Magnus Series Expansion 

Before we discuss the numerical solution of differential equations (6.4) on Lie groups,
let us give an explicit formula for the solution of linear matrix differential equations

$$ \dot Y = A(t)Y. \tag{7.1} $$

No assumption on the matrix $A(t)$ is made for the moment (apart from continuous
dependence on $t$). For the scalar case, the solution of (7.1) with $Y(0) = Y_0$ is given by

$$ Y(t) = \exp\Bigl( \int_0^t A(\tau)\, d\tau \Bigr) Y_0 . \tag{7.2} $$

Also in the case where the matrices $A(t)$ and $\int_0^t A(\tau)\, d\tau$ commute, (7.2) is the
solution of (7.1). In the general non-commutative case we follow the approach of
Magnus (1954) and we search for a matrix function $\Omega(t)$ such that

$$ Y(t) = \exp(\Omega(t))\, Y_0 $$

solves (7.1). The main ingredient for the solution will be the inverse of the derivative
of the matrix exponential. It has been studied in Sect. III.4, Lemma III.4.2, and is
given by

$$ \mathrm{dexp}_\Omega^{-1}(H) = \sum_{k \ge 0} \frac{B_k}{k!}\, \mathrm{ad}_\Omega^k(H), \tag{7.3} $$

where $B_k$ are the Bernoulli numbers, and $\mathrm{ad}_\Omega(A) = [\Omega, A] = \Omega A - A\Omega$ is the
adjoint operator introduced in (III.4.1).

Theorem 7.1 (Magnus 1954). The solution of the differential equation (7.1) can be
written as $Y(t) = \exp(\Omega(t))\, Y_0$ with $\Omega(t)$ defined by

$$ \dot\Omega = \mathrm{dexp}_\Omega^{-1}(A(t)), \qquad \Omega(0) = 0 . \tag{7.4} $$

As long as $\|\Omega(t)\| < \pi$, the convergence of the $\mathrm{dexp}_\Omega^{-1}$ expansion (7.3) is assured.

Proof. Comparing the derivative of $Y(t) = \exp(\Omega(t))\, Y_0$,

$$ \dot Y(t) = \Bigl( \frac{d}{d\Omega} \exp(\Omega(t)) \Bigr) \dot\Omega(t)\, Y_0 = \bigl( \mathrm{dexp}_{\Omega(t)}(\dot\Omega(t)) \bigr) \exp(\Omega(t))\, Y_0, $$

with (7.1) we obtain $A(t) = \mathrm{dexp}_{\Omega(t)}(\dot\Omega(t))$. Applying the inverse operator
$\mathrm{dexp}_\Omega^{-1}$ to this relation yields the differential equation (7.4) for $\Omega(t)$. The
statement on the convergence is a consequence of Lemma III.4.2. □

4 Wilhelm Magnus, born: 5 February 1907 in Berlin (Germany), died: 15 October 1990.

The first few Bernoulli numbers are $B_0 = 1$, $B_1 = -1/2$, $B_2 = 1/6$, $B_3 = 0$.
The differential equation (7.4) therefore becomes

$$ \dot\Omega = A(t) - \frac12 [\Omega, A(t)] + \frac1{12} \bigl[\Omega, [\Omega, A(t)]\bigr] + \ldots , $$

which is nonlinear in $\Omega$. Applying Picard fixed point iteration after integration yields

$$ \begin{aligned}
\Omega(t) = {} & \int_0^t A(\tau)\, d\tau - \frac12 \int_0^t \Bigl[ \int_0^\tau A(\sigma)\, d\sigma,\ A(\tau) \Bigr] d\tau \\
& + \frac14 \int_0^t \Bigl[ \int_0^\tau \Bigl[ \int_0^\sigma A(\mu)\, d\mu,\ A(\sigma) \Bigr] d\sigma,\ A(\tau) \Bigr] d\tau \\
& + \frac1{12} \int_0^t \Bigl[ \int_0^\tau A(\sigma)\, d\sigma,\ \Bigl[ \int_0^\tau A(\mu)\, d\mu,\ A(\tau) \Bigr] \Bigr] d\tau + \ldots ,
\end{aligned} \tag{7.5} $$

which is the so-called Magnus expansion. For smooth matrices $A(t)$ the remainder
in (7.5) is of size $O(t^5)$ so that the truncated series inserted into $Y(t) =
\exp(\Omega(t))\, Y_0$ gives an excellent approximation to the solution of (7.1) for small $t$.

Numerical Methods Based on the Magnus Expansion. Iserles & Nørsett (1999)
study the general form of the Magnus expansion (7.5), and they relate the iterated
integrals and the rational coefficients in (7.5) to binary trees. For a numerical
integration of

$$ \dot Y = A(t)Y, \qquad Y(t_0) = Y_0 \tag{7.6} $$

(where $Y$ is a matrix or a vector) they propose using $Y_{n+1} = \exp(h\Omega_n)Y_n$, where
$h\Omega_n$ is a suitable approximation of $\Omega(h)$ given by (7.5) with $A(t_n + \tau)$ instead of
$A(\tau)$. Of course, the Magnus expansion has to be truncated and the integrals have
to be approximated by numerical quadrature.

We follow here the collocation approach suggested by Zanna (1999). The idea
is to replace $A(t)$ locally by an interpolation polynomial

$$ \widehat A(t) = \sum_{i=1}^{s} \ell_i(t)\, A(t_n + c_i h), $$

where $\ell_i(t)$ denotes the Lagrange polynomial associated with the node $t_n + c_i h$, and
to solve $\dot Y = \widehat A(t)Y$ on $[t_n, t_n + h]$ by the use of the truncated series (7.5).

Theorem 7.2. Consider a quadrature formula $(b_i, c_i)_{i=1}^{s}$ of order $p \ge s$, and let
$Y(t)$ and $Z(t)$ be solutions of $\dot Y = A(t)Y$ and $\dot Z = \widehat A(t)Z$, respectively, satisfying
$Y(t_n) = Z(t_n)$. Then, $Z(t_n + h) - Y(t_n + h) = O(h^{p+1})$.

Proof. We write the differential equation for $Z$ as $\dot Z = A(t)Z + \bigl(\widehat A(t) - A(t)\bigr)Z$
and use the variation of constants formula to get

$$ Z(t_n + h) - Y(t_n + h) = \int_{t_n}^{t_n + h} R(t_n + h, \tau) \bigl( \widehat A(\tau) - A(\tau) \bigr) Z(\tau)\, d\tau . $$

Applying our quadrature formula to this integral gives zero as result, and the
remainder is of size $O(h^{p+1})$. Details of the proof are as for Theorem II.1.5. □

Example 7.3. As a first example, we use the midpoint rule ($c_1 = 1/2$, $b_1 = 1$). In
this case the interpolation polynomial is constant, and the method becomes

$$ Y_{n+1} = \exp\bigl( hA(t_n + h/2) \bigr) Y_n , \tag{7.7} $$

which is of order 2.

Example 7.4. The two-stage Gauss quadrature is given by $c_{1,2} = 1/2 \mp \sqrt3/6$,
$b_{1,2} = 1/2$. The interpolation polynomial is of degree one and we have to apply
(7.5) in order to get an approximation $Y_{n+1}$. Since we are interested in a fourth
order approximation, we can neglect the remainder term (indicated by ... in (7.5)).
Computing analytically the iterated integrals over products of $\ell_i(t)$ we obtain

$$ Y_{n+1} = \exp\Bigl( \frac h2 (A_1 + A_2) + \frac{\sqrt3\, h^2}{12} [A_2, A_1] \Bigr) Y_n , \tag{7.8} $$

where $A_1 = A(t_n + c_1 h)$ and $A_2 = A(t_n + c_2 h)$. This is a method of order four.
The terms of (7.5) with triple integrals give $O(h^4)$ expressions, whose leading term
vanishes by the symmetry of the method (Exercise V.7). Therefore, they need not
be considered.
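
The methods (7.7) and (7.8) can be sketched for the special case where $A(t)$ is a skew-symmetric $3 \times 3$ matrix, so that $Y$ stays on a sphere. The sketch assumes that $A(t)$ is stored as a vector $a(t)$ with $A(t)y = a(t) \times y$, that commutators correspond to cross products, and that the exponential is applied with the Rodrigues formula; the function `a` is a hypothetical test problem, not one from the text.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(w, y):
    """Apply exp(hat(w)) to y (Rodrigues formula), where hat(w) y = w x y."""
    theta = math.sqrt(w[0] ** 2 + w[1] ** 2 + w[2] ** 2)
    wy = cross(w, y)
    if theta < 1e-12:                       # exp(hat(w)) ~ I + hat(w)
        return tuple(y[i] + wy[i] for i in range(3))
    wwy = cross(w, wy)
    s = math.sin(theta) / theta
    c = (1.0 - math.cos(theta)) / theta ** 2
    return tuple(y[i] + s * wy[i] + c * wwy[i] for i in range(3))


def a(t):
    """Hypothetical coefficient: A(t) = hat(a(t)) is skew-symmetric."""
    return (math.cos(t), math.sin(t), t)


def magnus2_step(t, y, h):
    """Midpoint Magnus method (7.7)."""
    return rotate(tuple(h * c for c in a(t + 0.5 * h)), y)


def magnus4_step(t, y, h):
    """Fourth-order Magnus method (7.8) with the two Gauss nodes."""
    r = math.sqrt(3.0)
    a1 = a(t + (0.5 - r / 6.0) * h)
    a2 = a(t + (0.5 + r / 6.0) * h)
    comm = cross(a2, a1)                    # vector form of [A2, A1]
    w = tuple(0.5 * h * (a1[i] + a2[i]) + r * h * h / 12.0 * comm[i] for i in range(3))
    return rotate(w, y)


def integrate(step, n, T=1.0, y0=(1.0, 0.0, 0.0)):
    h, y = T / n, y0
    for k in range(n):
        y = step(k * h, y, h)
    return y


def dist(u, v):
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in range(3)))


ref = integrate(magnus4_step, 2000)          # fine-grid reference solution
y2, y4 = integrate(magnus2_step, 20), integrate(magnus4_step, 20)
err2, err4 = dist(y2, ref), dist(y4, ref)
print(err2, err4)
```

Both methods keep $\|Y_n\|$ constant up to rounding, because the argument of the exponential is again skew-symmetric; they differ only in accuracy.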

Theorem 7.2 allows us to obtain methods of arbitrarily high order. A straightforward
use of the expansion (7.5) yields an expression with a large number of commutators.
Munthe-Kaas & Owren (1999) and Blanes, Casas & Ros (2000a) construct
higher order methods with a reduced number of commutators. For example, for
order 6 the required number of commutators is reduced from 7 to 4.

Let us remark that all numerical methods of this section are of the form
$Y_{n+1} = \exp(h\Omega_n)Y_n$, where $\Omega_n$ is a linear combination of $A(t_n + c_i h)$ and of
their commutators. If $A(t) \in \mathfrak{g}$ for all $t$, then also $h\Omega_n$ lies in the Lie algebra $\mathfrak{g}$, so
that the numerical solution stays in the Lie group $G$ if $Y_0 \in G$ (this is a consequence
of Lemma 6.4).


IV.8 Lie Group Methods 

Consider a differential equation

$$ \dot Y = A(Y)Y, \qquad Y(0) = Y_0 \tag{8.1} $$

on a matrix Lie group $G$. This means that $Y_0 \in G$ and that $A(Y) \in \mathfrak{g}$ for all
$Y \in G$. Since this is a special case of differential equations on a manifold, projection
methods (Sect. IV.4) as well as methods based on local coordinates (Sect. IV.5) are
well suited for their numerical treatment. Here we present further approaches which
also yield approximations that lie on the manifold.

All numerical methods of this section can be extended in a straightforward way
to non-autonomous problems $\dot Y = A(t, Y)Y$ with $A(t, Y) \in \mathfrak{g}$ for all $t$ and all
$Y \in G$. Just to simplify the notation we restrict ourselves to the formulation (8.1).

IV.8.1 Crouch-Grossman Methods

The discipline of Lie-group methods owes a great deal to the pioneering
work of Peter Crouch and his co-workers ...

(A. Iserles, H.Z. Munthe-Kaas, S.P. Nørsett & A. Zanna 2000)

The numerical approximation of explicit Runge-Kutta methods is obtained by a
composition of the following two basic operations: (i) an evaluation of the vector
field $f(Y) = A(Y)Y$ and (ii) a computation of an update of the form $Y + haf(Z)$.
For example, the left method of (II.1.3) consists of the following steps: evaluate
$K_1 = f(Y_0)$; compute $\widetilde Y_1 = Y_0 + hK_1$; evaluate $K_2 = f(\widetilde Y_1)$; compute $Y_{1/2} =
Y_0 + \frac h2 K_1$; compute $Y_1 = Y_{1/2} + \frac h2 K_2$.

In the context of differential equations on Lie groups, these methods have the
disadvantage that, even when $Y \in G$ and $Z \in G$, the update $Y + haA(Z)Z$ is in
general not in the Lie group. The idea of Crouch & Grossman (1993) is to replace
the "update" operation with $\exp(haA(Z))Y$.

Definition 8.1. Let $b_i$, $a_{ij}$ ($i, j = 1, \ldots, s$) be real numbers. An explicit $s$-stage
Crouch-Grossman method is given by

$$ Y^{(i)} = \exp(h a_{i,i-1} K_{i-1}) \cdot \ldots \cdot \exp(h a_{i1} K_1)\, Y_n, \qquad K_i = A(Y^{(i)}), $$

$$ Y_{n+1} = \exp(h b_s K_s) \cdot \ldots \cdot \exp(h b_1 K_1)\, Y_n . $$

For example, the method of Runge described above ($s = 2$, $a_{21} = 1$, $b_1 = b_2 =
1/2$) leads to

$$ Y_{n+1} = \exp\Bigl( \frac h2 K_2 \Bigr) \exp\Bigl( \frac h2 K_1 \Bigr) Y_n , \tag{8.2} $$

where $K_1 = A(Y_n)$ and $K_2 = A(\exp(hK_1)Y_n)$.

By construction, the methods of Crouch-Grossman give rise to approximations
$Y_n$ which lie exactly on the manifold defined by the Lie group. But what can be said
about their order of accuracy?

Theorem 8.2. Let $c_i = \sum_j a_{ij}$. A Crouch-Grossman method has order $p$ ($p \le 3$) if
the following order conditions are satisfied:

  order 1:   $\sum_i b_i = 1$                                              (8.3)

  order 2:   $\sum_i b_i c_i = 1/2$                                        (8.4)

  order 3:   $\sum_i b_i c_i^2 = 1/3$                                      (8.5)

             $\sum_{ij} b_i a_{ij} c_j = 1/6$                              (8.6)

             $\sum_i b_i^2 c_i + 2 \sum_{i<j} b_i c_i b_j = 1/3$.          (8.7)

Proof. As in the case of Runge-Kutta methods, the order conditions can be found
by comparing the Taylor series expansions of the exact and the numerical solution.
In addition to the conditions stated in the theorem, this leads to relations such as

$$ \sum_i b_i^2 c_i + 2 \sum_{i<j} b_i b_j c_j = \frac23 . \tag{8.8} $$

Adding this equation to (8.7) we find $2 \sum_{ij} b_i c_i b_j = 1$, which is satisfied by (8.3)
and (8.4). Hence, the relation (8.8) is already a consequence of the conditions stated
in the theorem. □


Table 8.1. Crouch-Grossman methods of order 3

    0     |                               0     |
    3/4   | 3/4                          -1/24  | -1/24
    17/24 | 119/216   17/108              17/24 | 161/24   -6
    ------+--------------------------    -------+--------------------
          | 13/51    -2/3    24/17              | 1    -2/3    2/3

Crouch & Grossman (1993) present several solutions of the system (8.3)-(8.7),
one of which is given in the left array of Table 8.1. The construction of higher order
Crouch-Grossman methods is very complicated ("... any attempt to analyze
algorithms of order greater than three will be very complex, ...", Crouch & Grossman,
1993).

The theory of order conditions for Runge-Kutta methods (Sect. III.1) has been
extended to Crouch-Grossman methods by Owren & Marthinsen (1999). It turns out
that the order conditions for classical Runge-Kutta methods form a subset of those
for Crouch-Grossman methods. The first new condition is (8.7). For a method of
order 4, thirteen conditions (including those of Theorem 8.2) have to be satisfied.
Solving these equations, Owren & Marthinsen (1999) construct a 4th order method
with $s = 5$ stages.
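
Definition 8.1 can be sketched in code for a problem on the sphere. The following assumes a rigid-body-type equation $\dot y = A(y)y$ with skew-symmetric $A(y)$, written as $A(y)y = y \times (I^{-1}y)$ with illustrative moments of inertia, and applies the exponentials with the Rodrigues formula. The coefficient array `A3`, `B3` is the one with $c_2 = 3/4$ from Table 8.1; the helper names are hypothetical. The function `order3_residuals` evaluates the defects of the conditions (8.3)-(8.7).

```python
import math

I1, I2, I3 = 2.0, 1.0, 2.0 / 3.0        # assumed moments of inertia (illustrative)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(w, y):
    """Apply exp(hat(w)) to y (Rodrigues formula), where hat(w) y = w x y."""
    theta = math.sqrt(w[0] ** 2 + w[1] ** 2 + w[2] ** 2)
    wy = cross(w, y)
    if theta < 1e-12:
        return tuple(y[i] + wy[i] for i in range(3))
    wwy = cross(w, wy)
    s = math.sin(theta) / theta
    c = (1.0 - math.cos(theta)) / theta ** 2
    return tuple(y[i] + s * wy[i] + c * wwy[i] for i in range(3))


def K(y):
    """Vector w with hat(w) = A(y), so that A(y) y = y x (I^{-1} y)."""
    return (-y[0] / I1, -y[1] / I2, -y[2] / I3)


def cg_step(y, h, a, b):
    """One explicit Crouch-Grossman step (Definition 8.1)."""
    stages = []
    for i in range(len(b)):
        Yi = y
        for j in range(i):               # rightmost factor exp(h a_i1 K_1) acts first
            Yi = rotate(tuple(h * a[i][j] * k for k in stages[j]), Yi)
        stages.append(K(Yi))
    for bi, Ki in zip(b, stages):        # exp(h b_1 K_1) first, exp(h b_s K_s) last
        y = rotate(tuple(h * bi * k for k in Ki), y)
    return y


def order3_residuals(a, b):
    """Defects of the order conditions (8.3)-(8.7)."""
    s = len(b)
    c = [sum(a[i]) for i in range(s)]
    return (sum(b) - 1.0,
            sum(b[i] * c[i] for i in range(s)) - 0.5,
            sum(b[i] * c[i] ** 2 for i in range(s)) - 1.0 / 3.0,
            sum(b[i] * a[i][j] * c[j] for i in range(s) for j in range(s)) - 1.0 / 6.0,
            sum(b[i] ** 2 * c[i] for i in range(s))
            + 2.0 * sum(b[i] * c[i] * b[j] for i in range(s) for j in range(i + 1, s))
            - 1.0 / 3.0)


A3 = [[0.0, 0.0, 0.0], [3.0 / 4.0, 0.0, 0.0], [119.0 / 216.0, 17.0 / 108.0, 0.0]]
B3 = [13.0 / 51.0, -2.0 / 3.0, 24.0 / 17.0]
A2, B2 = [[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5]      # method (8.2)

y = (math.cos(1.1), 0.0, math.sin(1.1))
for _ in range(1000):
    y = cg_step(y, 0.1, A3, B3)
norm_defect = math.sqrt(sum(c * c for c in y)) - 1.0
print(order3_residuals(A3, B3), norm_defect)
```

The two-stage array `A2`, `B2` reproduces (8.2); it satisfies (8.3) and (8.4) but not the third-order conditions.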


IV.8.2 Munthe-Kaas Methods 

These methods were developed in a series of papers by Munthe-Kaas (1995, 1998,
1999). The main motivation behind the first of these papers was to develop a theory
of Runge-Kutta methods in a coordinate-free framework. After attempts that
led to new order conditions (as for the Crouch-Grossman methods), Munthe-Kaas
(1999) had the idea to write the solution as $Y(t) = \exp(\Omega(t))Y_0$ and to solve
numerically the differential equation for $\Omega(t)$. It sounds awkward to replace the
differential equation (8.1) by a more complicated one. However, the nonlinear
invariants $g(Y) = 0$ of (8.1) defining the Lie group are replaced with linear invariants
$g'(I)(\Omega) = 0$ defining the Lie algebra, and we know from Sect. IV.1 that essentially
all numerical methods automatically conserve linear invariants.

It follows from the proof of Theorem 7.1 that the solution of (8.1) can be written
as $Y(t) = \exp(\Omega(t))Y_0$, where $\Omega(t)$ is the solution of $\dot\Omega = \mathrm{dexp}_\Omega^{-1}(A(Y(t)))$,
$\Omega(0) = 0$. Since it is not practical to work with the operator $\mathrm{dexp}_\Omega^{-1}$, we truncate
the series (7.3) suitably and consider the differential equation

$$ \dot\Omega = A\bigl(\exp(\Omega)Y_0\bigr) + \sum_{k=1}^{q} \frac{B_k}{k!}\, \mathrm{ad}_\Omega^k\bigl( A(\exp(\Omega)Y_0) \bigr), \qquad \Omega(0) = 0 . \tag{8.9} $$

This leads to the following method.

Algorithm 8.3 (Munthe-Kaas 1999). Consider the problem (8.1) with $A(Y) \in \mathfrak{g}$
for $Y \in G$. Assume that $Y_n$ lies in the Lie group $G$. Then, the step $Y_n \mapsto Y_{n+1}$ is
defined as follows:

• consider the differential equation (8.9) with $Y_n$ instead of $Y_0$, and apply a
Runge-Kutta method (explicit or implicit) to get an approximation $\Omega_1 \approx \Omega(h)$,

• then define the numerical solution by $Y_{n+1} = \exp(\Omega_1)Y_n$.

Before analyzing this algorithm, we emphasize its close relationship with
Algorithm 5.3. In fact, if we identify the Lie algebra $\mathfrak{g}$ with $\mathbb{R}^k$ (where $k$ is the dimension
of the vector space $\mathfrak{g}$), the mapping $\psi(\Omega) = \exp(\Omega)Y_n$ is a local parametrization
of the Lie group $G$ (see Lemma 6.4). Apart from the truncation of the series in (8.9),
Algorithm 8.3 is a special case of Algorithm 5.3.

Important properties of the Munthe-Kaas methods are given in the next two
theorems.

Theorem 8.4. Let $G$ be a matrix Lie group and $\mathfrak{g}$ its Lie algebra. If $A(Y) \in \mathfrak{g}$
for $Y \in G$ and if $Y_0 \in G$, then the numerical solution of the Lie group method of
Algorithm 8.3 lies in $G$, i.e., $Y_n \in G$ for all $n = 0, 1, 2, \ldots$.

Proof. It is sufficient to prove that for $Y_0 \in G$ the numerical solution $\Omega_1$ of the
Runge-Kutta method applied to (8.9) lies in $\mathfrak{g}$. Since the Lie bracket $[\Omega, A]$ is an
operation $\mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$, and since $\exp(\Omega)Y_0 \in G$ for $\Omega \in \mathfrak{g}$, the right-hand
expression of (8.9) is in $\mathfrak{g}$ for $\Omega \in \mathfrak{g}$. Hence, (8.9) is a differential equation on the vector
space $\mathfrak{g}$ with solution $\Omega(t) \in \mathfrak{g}$. All operations in a Runge-Kutta method give
results in $\mathfrak{g}$, so that the numerical approximation $\Omega_1$ also lies in $\mathfrak{g}$. □

Theorem 8.5. If the Runge-Kutta method is of (classical) order $p$ and if the
truncation index in (8.9) satisfies $q \ge p - 2$, then the method of Algorithm 8.3 is of
order $p$.

Proof. For sufficiently smooth $A(Y)$ we have $\Omega(t) = tA(Y_0) + O(t^2)$, $Y(t) =
Y_0 + O(t)$ and $[\Omega(t), A(Y(t))] = O(t^2)$. This implies that $\mathrm{ad}^k_{\Omega(t)}(A(Y(t))) =
O(t^{k+1})$, so that the truncation of the series in (8.9) induces an error of size $O(h^{q+2})$
for $|t| \le h$. Hence, for $q + 2 \ge p$, this truncation does not affect the order of
convergence. □

The simplest Lie group method is obtained if we take the explicit Euler
method as basic discretization and $q = 0$ in (8.9). This leads to the so-called
Lie-Euler method

$$ Y_{n+1} = \exp\bigl( hA(Y_n) \bigr) Y_n . \tag{8.10} $$

This is also a special case of the Crouch-Grossman methods of Definition 8.1.

Taking the implicit midpoint rule as the basic discretization and again $q = 0$ in
(8.9), we obtain the Lie midpoint rule

$$ Y_{n+1} = \exp(\Omega)Y_n , \qquad \Omega = hA\bigl( \exp(\Omega/2)Y_n \bigr). \tag{8.11} $$

This is an implicit equation in $\Omega$ and has to be solved by fixed point iteration or by
Newton-type methods.
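
The Lie-Euler method (8.10) and the Lie midpoint rule (8.11) can be sketched for a problem on the sphere as follows. As an assumption for illustration, the equation is of rigid-body type, $A(y)y = y \times (I^{-1}y)$ with hypothetical moments of inertia; elements of the Lie algebra are stored as vectors $w$ with $\mathrm{hat}(w)y = w \times y$, the exponential is applied with the Rodrigues formula, and the implicit equation in (8.11) is solved by fixed point iteration.

```python
import math

I1, I2, I3 = 2.0, 1.0, 2.0 / 3.0        # assumed moments of inertia (illustrative)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(w, y):
    """Apply exp(hat(w)) to y (Rodrigues formula)."""
    theta = math.sqrt(w[0] ** 2 + w[1] ** 2 + w[2] ** 2)
    wy = cross(w, y)
    if theta < 1e-12:
        return tuple(y[i] + wy[i] for i in range(3))
    wwy = cross(w, wy)
    s = math.sin(theta) / theta
    c = (1.0 - math.cos(theta)) / theta ** 2
    return tuple(y[i] + s * wy[i] + c * wwy[i] for i in range(3))


def A_vec(y):
    """Vector w with hat(w) = A(y), so that A(y) y = y x (I^{-1} y)."""
    return (-y[0] / I1, -y[1] / I2, -y[2] / I3)


def lie_euler_step(y, h):
    """Lie-Euler method (8.10)."""
    return rotate(tuple(h * k for k in A_vec(y)), y)


def lie_midpoint_step(y, h, max_iter=50, tol=1e-15):
    """Lie midpoint rule (8.11); Omega is found by fixed point iteration."""
    w = tuple(h * k for k in A_vec(y))              # starting guess Omega = h A(Y_n)
    for _ in range(max_iter):
        y_mid = rotate(tuple(0.5 * k for k in w), y)
        w_new = tuple(h * k for k in A_vec(y_mid))
        converged = max(abs(w_new[i] - w[i]) for i in range(3)) <= tol
        w = w_new
        if converged:
            break
    return rotate(w, y)


def norm(y):
    return math.sqrt(sum(c * c for c in y))


y_e = y_m = (math.cos(1.1), 0.0, math.sin(1.1))
for _ in range(500):
    y_e = lie_euler_step(y_e, 0.1)
    y_m = lie_midpoint_step(y_m, 0.1)
print(norm(y_e) - 1.0, norm(y_m) - 1.0)
```

Both numerical solutions stay on the sphere up to rounding, independently of how accurately the fixed point iteration is converged, since every update is a rotation.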

Example 8.6. We take the coefficients of the right array of Table 8.1. They give rise
to 3rd order Munthe-Kaas and 3rd order Crouch-Grossman methods. We apply both
methods with the large step size $h = 0.35$ to the system (1.5) which is already of the
form (8.1). Observe that $Y_0$ is a vector in $\mathbb{R}^3$ and not a matrix, but all results of this
section remain valid for this case. For the computation of the matrix exponential we
use the Rodrigues formula (Exercise 17). The numerical results (first 1000 steps) are
shown in Fig. 8.1. We see that the numerical solution stays on the manifold (sphere),
but on the sphere the qualitative behaviour is not correct. A similar behaviour could
be observed for projection methods (the orthogonal projection consists simply in
dividing the approximation $\widetilde Y_{n+1}$ by its norm) and for the methods based on local
coordinates.

Crouch-Grossman methods and Munthe-Kaas methods are very similar. If they
are based on the same set of Runge-Kutta coefficients, both methods use $s$ evaluations
of the matrix $A(Y)$. The Crouch-Grossman methods require in general the
computation of $s(s + 1)/2$ matrix exponentials, whereas the Munthe-Kaas methods
require only $s$ of them. On the other hand, Munthe-Kaas methods also need the
computation of a certain number of commutators, which increases with $q$ in (8.9).
In such a comparison one has to take into account that every classical Runge-Kutta
method defines a Munthe-Kaas method of the same order, but Crouch-Grossman
methods of high order are very difficult to obtain, and need more stages for the
same order (if $p \ge 4$).



Fig. 8.1. Solutions of the Euler equations (1.4) for the rigid body

IV.8.3 Further Coordinate Mappings

The methods of Algorithm 8.3 are based on the local parametrization $\psi(\Omega) = \exp(\Omega)\,Y_n$.
For all Lie groups, this is a local diffeomorphism between a neighbourhood of 0 in
the Lie algebra and a neighbourhood of $Y_n$ in the Lie group. Are there other,
computationally more efficient parametrizations that can be used in special situations?

The Cayley Transform. Lie groups of the form

$G = \{Y \mid Y^T P Y = P\},$ (8.12)

where P is a given constant matrix, are called quadratic Lie groups. The corresponding
Lie algebra is given by $\mathfrak{g} = \{\Omega \mid P\Omega + \Omega^T P = 0\}$. The orthogonal
group O(n) and the symplectic group Sp(n) are prominent special cases (see Table 6.1).
For such groups we have the following analogue of Lemma 6.4.

Lemma 8.7. For a quadratic Lie group G, the Cayley transform

$\mathrm{cay}\,\Omega = (I - \Omega)^{-1}(I + \Omega)$

maps elements of $\mathfrak{g}$ into G. Moreover, it is a local diffeomorphism near $\Omega = 0$.

Proof. For $\Omega \in \mathfrak{g}$ (i.e., $P\Omega + \Omega^T P = 0$) we have $P(I + \Omega) = (I - \Omega)^T P$ and
also $P(I - \Omega)^{-1} = (I + \Omega)^{-T} P$. For $Y = (I - \Omega)^{-1}(I + \Omega)$ this immediately
implies $Y^T P Y = P$. □
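
Lemma 8.7 can be checked numerically for the two prominent cases P = I (orthogonal group) and P = J (symplectic group). The sketch below is only an illustration; the random test matrices, their scaling and all names are assumptions made here.

```python
import numpy as np

def cay(Om):
    """Cayley transform (I - Om)^{-1} (I + Om)."""
    I = np.eye(Om.shape[0])
    return np.linalg.solve(I - Om, I + Om)

rng = np.random.default_rng(0)

# Orthogonal group: P = I, the Lie algebra consists of skew-symmetric matrices.
B = 0.1 * rng.standard_normal((4, 4))
Om_skew = B - B.T
Y_orth = cay(Om_skew)
err_orth = np.linalg.norm(Y_orth.T @ Y_orth - np.eye(4))

# Symplectic group: P = J. With S symmetric, Om = J^T S satisfies J Om + Om^T J = 0.
m = 2
J = np.block([[np.zeros((m, m)), np.eye(m)],
              [-np.eye(m), np.zeros((m, m))]])
C = 0.1 * rng.standard_normal((2 * m, 2 * m))
S = C + C.T
Om_sp = J.T @ S
Y_sp = cay(Om_sp)
err_sp = np.linalg.norm(Y_sp.T @ J @ Y_sp - J)
```

Both defects `err_orth` and `err_sp` should be of the size of round-off.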

The use of the Cayley transform for the numerical integration of differential 
equations on Lie groups has been proposed by Lewis & Simo (1994) and Diele, 
Lopez & Peluso (1998) for the orthogonal group, and by Lopez & Politi (2001) for 
general quadratic groups. It is based on the following result, which is an adaptation 
of Lemma III.4.1 and Lemma III.4.2 to the Cayley transform. 

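Lemma 8.8. The derivative of cay $\Omega$ is given by

$\Bigl(\frac{d}{d\Omega}\,\mathrm{cay}\,\Omega\Bigr) H = \bigl(\mathrm{dcay}_\Omega(H)\bigr)\,\mathrm{cay}\,\Omega,$

where

$\mathrm{dcay}_\Omega(H) = 2(I - \Omega)^{-1} H (I + \Omega)^{-1}.$ (8.13)

For the inverse of $\mathrm{dcay}_\Omega$ we have

$\mathrm{dcay}_\Omega^{-1}(H) = \tfrac12 (I - \Omega) H (I + \Omega).$ (8.14)

Proof. By the usual rules of calculus we obtain

$\Bigl(\frac{d}{d\Omega}\,\mathrm{cay}\,\Omega\Bigr) H = (I - \Omega)^{-1} H (I - \Omega)^{-1}(I + \Omega) + (I - \Omega)^{-1} H,$

and a simple algebraic manipulation proves the statements. □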





The numerical approach for solving (8.1) in the case of quadratic Lie groups
is an adaptation of Algorithm 8.3. We consider the local parametrization
$Y = \psi(\Omega) = \mathrm{cay}(\Omega)\,Y_n$, and we apply one step of a numerical method to the differential
equation $\dot\Omega = \mathrm{dcay}_\Omega^{-1}\bigl(A(\mathrm{cay}(\Omega)Y_n)\bigr)$ which, by (8.14), is equivalent to

$\dot\Omega = \tfrac12 (I - \Omega)\,A\bigl(\mathrm{cay}(\Omega)Y_n\bigr)\,(I + \Omega).$

This equation replaces (8.9) in Algorithm 8.3. Since no truncation of an infinite
series is necessary here, this approach is a special case of Algorithm 5.3.
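
A minimal sketch of this Cayley-based approach for the orthogonal group is given below. The explicit midpoint rule is used here as basic discretization of the $\Omega$-equation (started at $\Omega = 0$); this choice, the test problem `A_example` and the matrix `C` are assumptions made for illustration only.

```python
import numpy as np

def cay(Om):
    """Cayley transform (I - Om)^{-1} (I + Om)."""
    I = np.eye(Om.shape[0])
    return np.linalg.solve(I - Om, I + Om)

def dcayinv(Om, H):
    """Inverse of dcay, formula (8.14)."""
    I = np.eye(Om.shape[0])
    return 0.5 * (I - Om) @ H @ (I + Om)

def cayley_midpoint_step(A, Y, h):
    """Explicit midpoint rule for the Omega-equation, then Y_{n+1} = cay(Omega_1) Y_n."""
    n = Y.shape[0]
    k1 = dcayinv(np.zeros((n, n)), A(Y))        # equals A(Y)/2 at Omega = 0
    Om_half = 0.5 * h * k1
    k2 = dcayinv(Om_half, A(cay(Om_half) @ Y))
    return cay(h * k2) @ Y

# Hypothetical test problem: A(Y) is skew-symmetric for every Y.
C = np.array([[0.0, 1.0, 0.5],
              [0.0, 0.0, -1.0],
              [0.3, 0.0, 0.0]])

def A_example(Y):
    B = Y @ C
    return B - B.T

Y = np.eye(3)
for _ in range(200):
    Y = cayley_midpoint_step(A_example, Y, 0.05)
orth_defect = np.linalg.norm(Y.T @ Y - np.eye(3))
```

Since (8.14) maps skew-symmetric matrices to skew-symmetric matrices, every stage value of $\Omega$ stays in the Lie algebra and the numerical solution remains orthogonal up to round-off.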

Canonical Coordinates of the Second Kind. For a basis $\{C_1, C_2, \ldots, C_d\}$ of
the Lie algebra $\mathfrak{g}$ the coordinates $z_1, \ldots, z_d$ of the local parametrization
$\psi(z) = \exp\bigl(\sum_{i=1}^d z_i C_i\bigr)$ of the Lie group G are called canonical coordinates of the first
kind. Here we are interested in the parametrization

$\psi(z) = \exp(z_1 C_1)\,\exp(z_2 C_2) \cdots \exp(z_d C_d),$ (8.15)

and we call $z = (z_1, \ldots, z_d)^T$ canonical coordinates of the second kind (Varadarajan
1974). The use of these coordinates in connection with the numerical solution
of differential equations on Lie groups has been promoted by Celledoni & Iserles
(2001) and Owren & Marthinsen (2001). The idea behind this choice is that, due to
a sparse structure of the $C_i$, the computation of $\exp(z_1 C_1), \ldots, \exp(z_d C_d)$ may be
much cheaper than the computation of $\exp\bigl(\sum_i z_i C_i\bigr)$.

With the change of coordinates $Y = \psi(z)$, the differential equation (8.1) becomes
$\psi'(z)\dot z = A(\psi(z))\psi(z)$, which is equivalent to

$A(\psi(z)) = \sum_{i=1}^d \dot z_i \exp(z_1 C_1) \cdots \exp(z_{i-1} C_{i-1}) \, C_i \, \exp(-z_{i-1} C_{i-1}) \cdots \exp(-z_1 C_1)$

$\phantom{A(\psi(z))} = \sum_{i=1}^d \dot z_i \,(F_1 \circ \cdots \circ F_{i-1})\, C_i,$ (8.16)

where we use the notation $F_j C = \exp(z_j C_j)\, C \exp(-z_j C_j)$ for the linear operator
$F_j : \mathfrak{g} \to \mathfrak{g}$; see Exercise 12. We need to compute $\dot z_1, \ldots, \dot z_d$ from (8.16), and this
will usually be a computationally expensive task. However, for several Lie algebras
and for well chosen bases this can be done very efficiently. The crucial idea is the
following: we let $\tilde F_j$ be defined by

$\tilde F_j C_i = F_j C_i$ if $i > j$, $\qquad \tilde F_j C_i = C_i$ if $i \le j$, (8.17)

and we assume that

$(F_1 \circ \cdots \circ F_{i-1})\, C_i = (\tilde F_1 \circ \cdots \circ \tilde F_{i-1})\, C_i, \qquad i = 2, \ldots, d.$ (8.18)

Under this assumption, we have $(F_1 \circ \cdots \circ F_{i-1}) C_i = (\tilde F_1 \circ \cdots \circ \tilde F_{i-1}) C_i =
(\tilde F_1 \circ \cdots \circ \tilde F_{d-1}) C_i$, and the relation (8.16) becomes

$(\tilde F_1 \circ \cdots \circ \tilde F_{d-1}) \Bigl(\sum_{i=1}^d \dot z_i C_i\Bigr) = A(\psi(z)).$ (8.19)

In the situations which we have in mind, the operators $\tilde F_j$ can be efficiently inverted,
and Algorithm 5.3 can be applied to the solution of (8.1).
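
The relation (8.16) can be verified numerically on a small example. The sketch below uses the Lie algebra sl(2) with the ordered basis $E_{12}, E_{21}, D_1$, for which all three exponentials are available in closed form; the test values of z and $\dot z$, the finite-difference check and the generic 3 x 3 solve in `solve_zdot` are assumptions made here and do not exploit the structure behind (8.19).

```python
import numpy as np

E12 = np.array([[0.0, 1.0], [0.0, 0.0]])
E21 = np.array([[0.0, 0.0], [1.0, 0.0]])
D1 = np.array([[1.0, 0.0], [0.0, -1.0]])
basis = [E12, E21, D1]                      # ordered basis C_1, C_2, C_3 of sl(2)

def exp_factor(i, zi):
    """exp(z_i C_i) in closed form (E12 and E21 are nilpotent, D1 is diagonal)."""
    if i == 0:
        return np.eye(2) + zi * E12
    if i == 1:
        return np.eye(2) + zi * E21
    return np.diag([np.exp(zi), np.exp(-zi)])

def psi(z):
    """Canonical coordinates of the second kind, formula (8.15)."""
    Y = np.eye(2)
    for i in range(3):
        Y = Y @ exp_factor(i, z[i])
    return Y

def rhs_816(z, zdot):
    """Right-hand side of (8.16): sum_i zdot_i (F_1 o ... o F_{i-1}) C_i."""
    total = np.zeros((2, 2))
    left = np.eye(2)                        # exp(z_1 C_1) ... exp(z_{i-1} C_{i-1})
    for i in range(3):
        total += zdot[i] * left @ basis[i] @ np.linalg.inv(left)
        left = left @ exp_factor(i, z[i])
    return total

def solve_zdot(z, A):
    """Recover zdot from (8.16) by a generic 3x3 linear solve."""
    M = np.zeros((3, 3))
    for i in range(3):
        e = np.zeros(3)
        e[i] = 1.0
        G = rhs_816(z, e)
        M[:, i] = [G[0, 1], G[1, 0], G[0, 0]]   # coordinates w.r.t. E12, E21, D1
    return np.linalg.solve(M, np.array([A[0, 1], A[1, 0], A[0, 0]]))

z = np.array([0.3, -0.2, 0.5])
zdot = np.array([1.0, 0.7, -0.4])
eps = 1e-6
dpsi = (psi(z + eps * zdot) - psi(z - eps * zdot)) / (2 * eps)
A_fd = dpsi @ np.linalg.inv(psi(z))         # psi'(z) zdot psi(z)^{-1}
A_formula = rhs_816(z, zdot)
```

Up to the finite-difference error, `A_fd` and `A_formula` agree, and `solve_zdot(z, A_formula)` returns the original `zdot`.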

The main difficulty of using this coordinate transform is to find a suitable ordering
of a basis such that condition (8.18) is satisfied. The following lemma simplifies
this task. We use the notation $\alpha_k(C)$ for the coefficient in the representation
$C = \sum_{k=1}^d \alpha_k(C)\,C_k$.

Lemma 8.9. Let $\{C_1, \ldots, C_d\}$ be a basis of the Lie algebra $\mathfrak{g}$. If for every pair
j < i and for k < j we have

$\alpha_k(F_j C_i) \ne 0 \;\Longrightarrow\; F_l C_k = C_k$ for l satisfying k < l < j, (8.20)

then the relation (8.18) holds for all i = 2, ..., d.

Proof. We write $F_{i-1} C_i = \tilde F_{i-1} C_i = \sum_k \alpha_k(F_{i-1} C_i)\,C_k$. It follows from the
definition of $\tilde F_j$ and from (8.20) that $(F_{i-2} \circ F_{i-1}) C_i = (\tilde F_{i-2} \circ \tilde F_{i-1}) C_i$. A repeated
application of this argument proves the statement. □

Owren & Marthinsen (2001) have studied Lie algebras that admit a basis satis¬ 
fying (8.18) for all z. We present here one of their examples. 

Example 8.10 (Special Linear Group). Consider the differential equation (8.1)
on the Lie group SL(n) = {Y | det Y = 1}, i.e., the matrix A(Y) lies in
$\mathfrak{sl}(n)$ = {A | trace A = 0}. As a basis of the Lie algebra $\mathfrak{sl}(n)$ we choose $E_{ij} = e_i e_j^T$ for
$i \ne j$, and $D_i = e_i e_i^T - e_{i+1} e_{i+1}^T$ for $1 \le i < n$ (here, $e_i = (0, \ldots, 1, \ldots, 0)^T$
denotes the vector whose only non-zero element is in the ith position). Following
Owren & Marthinsen (2001) we order the elements of this basis as

$E_{12} < \ldots < E_{1n} < E_{23} < \ldots < E_{2n} < \ldots < E_{n-1,n}$
$< E_{21} < \ldots < E_{n1} < E_{32} < \ldots < E_{n2} < \ldots < E_{n,n-1}$
$< D_1 < \ldots < D_{n-1}.$

With the use of Lemma 8.9 one can check in a straightforward way that the relation
(8.18) is satisfied. In nearly all situations $\alpha_k(F_j C_i) = 0$ for k < j < i, so that
(8.18) represents an empty condition. Consequently, the $\dot z_i$ can be computed from
(8.19). Due to the sparsity of the matrices $E_{ij}$ and $D_i$, the computation of $\tilde F_i^{-1}$ can
be done very efficiently.

IV.9 Geometric Numerical Integration Meets
Geometric Numerical Linear Algebra

The persistent use of orthogonal transformations is a hallmark of numerical linear
algebra. Correspondingly, manifolds incorporating orthogonality constraints play
an important role all over this field; see Edelman, Arias & Smith (1998) on the
geometry of algorithms with orthogonality constraints. In addition to the orthogonal
group O(n), the manifolds of primary interest are:

$V_{n,k}$, the Stiefel manifold of n x k matrices with k orthonormal columns,

$G_{n,k}$, the Grassmann manifold of orthogonal projections of $\mathbb{R}^n$ onto k-dimensional
subspaces, and

$\mathcal{M}_k^{m \times n}$, the manifold of m x n matrices of rank k, which is related to orthogonal
transformations via the singular value decomposition and a related decomposition
discussed below.


IV.9.1 Numerical Integration on the Stiefel Manifold 

The original motivation for Stiefel manifolds (in Stiefel 1935) was the topological
problem, whether a manifold $\mathcal{M}$ can possess k everywhere linearly independent
continuous vector fields. The problem, which had been solved for the case k = 1,
was much harder for k > 1. In order to attack this question, Stiefel introduced
'his' manifold

$V_{n,k} = \{Y \in \mathbb{R}^{n \times k} \mid Y^T Y = I\},$ (9.1)

as an auxiliary tool for the definition of what later became known as the
Stiefel-Whitney classes (see footnote 6).

Here, we are interested in computations on these manifolds for their own sake,
with many applications, as for example the computation of Lyapunov exponents
of differential equations; see Exercise 22 as well as Bridges & Reich (2001) and
Dieci, Russell & van Vleck (1997).
There are also many cases where orthogonality constraints concern only some of
the variables in a differential equation. In molecular dynamics, for example, such
orthogonality constraints arise in the Car-Parrinello approach to ab initio molecular
dynamics (Car & Parrinello 1985) and in the multiconfiguration time-dependent
Hartree method of quantum molecular dynamics (Beck, Jäckle, Worth & Meyer
2000).

5 Eduard L. Stiefel, born: 21 April 1909 in Zürich, died: 25 November 1979; photo:
Bildarchiv ETH-Bibliothek, Zürich.

6 We are grateful to our colleague A. Haefliger for this indication. 



Eduard Stiefel 5 





Tangent and Normal Space. We choose a fixed matrix Y in the Stiefel manifold
$V = V_{n,k}$. Then the tangent space (5.4) at $Y \in V$ consists of the matrices Z such
that $(Y + \varepsilon Z)^T (Y + \varepsilon Z)$ remains I to first order for $\varepsilon \to 0$. Differentiating we obtain

$T_Y V = \{Z \in \mathbb{R}^{n \times k} \mid Z^T Y + Y^T Z = 0\},$ (9.2)

i.e., $Y^T Z$ is skew-symmetric. This represents $\tfrac12 k(k+1)$ conditions, thus $T_Y V$ is of
dimension $nk - \tfrac12 k(k+1)$.

For defining the normal space, we use the standard Euclidean inner product on
$\mathbb{R}^{n \times k}$, i.e.,

$\langle A, B \rangle = \mathrm{trace}(A^T B) = \sum_{i,j} a_{ij} b_{ij},$ (9.3)

whose corresponding norm is the Frobenius norm

$\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}.$ (9.4)

Then the normal space at Y is given by

$N_Y V = \{K \in \mathbb{R}^{n \times k} \mid K \perp T_Y V\} = \{YS \mid S \text{ symmetric } k \times k \text{ matrix}\}.$ (9.5)

To show this, we observe that the orthogonality $YS \perp T_Y V$ follows from $\langle YS, Z \rangle =
\mathrm{trace}(S Y^T Z) = \langle S, Y^T Z \rangle$ and the fact that any symmetric matrix A is orthogonal
to any skew-symmetric matrix B (see footnote 7). A dimension count (the matrix S has $\tfrac12 k(k+1)$
free elements) now shows us that the space defined in (9.5) fills the entire orthogonal
complement of $T_Y V$.

Orthogonality-Preserving Runge-Kutta Methods. Suppose now that we have to
solve a differential equation $\dot Y = F(Y)$ on a Stiefel manifold V. The orthogonality
constraints $Y^T Y = I$ are preserved, if the derivative F(Y) lies in the tangent space
$T_Y V$, i.e., if $F(Y)^T Y + Y^T F(Y) = 0$, for every $Y \in V$ (weak invariants, see
Sect. IV.4). In the (exceptional) case where they are in fact true invariants, i.e., if
$F(Y)^T Y + Y^T F(Y) = 0$ for all $Y \in \mathbb{R}^{n \times k}$, then the orthogonality constraints are
quadratic, and are therefore preserved exactly by the implicit Runge-Kutta methods
of Sect. IV.2.1, in particular the Gauss methods. These methods give numerical
solutions on the Stiefel manifold, but use function evaluations outside the manifold.

In the general case of only weak invariants, a standard approach for enforcing
orthogonality is the introduction of Lagrange multipliers, which can be interpreted
as artificial forces in the direction of the normal space keeping the solutions on the
manifold. Due to the structure of $N_Y V$ (see (9.5)), the problem becomes here

$\dot Y = F(Y) + Y\Lambda, \qquad Y^T Y = I$ (9.6)

with a symmetric Lagrange multiplier matrix $\Lambda \in \mathbb{R}^{k \times k}$; see also Exercise 10.
Any numerical method for differential-algebraic equations can now be applied, e.g.,
appropriate Runge-Kutta methods as in Chap. VI and Sect. VII.4 of Hairer & Wanner
(1996). A symmetric adaptation of Gauss methods to such problems is given by
Jay (2005).

7 Indeed, split the sum in (9.3) in two parts i < j and i > j, and interchange i and j in the
second sum. Then both sums are identical with opposite sign.

Below we shall study in great detail mechanical systems with constraints (see
Sect. VII.1). In the case of orthogonality constraints, such problems can be treated
successfully with Lobatto IIIA-IIIB partitioned Runge-Kutta methods, which in
addition to orthogonality preserve other important geometric properties such as
reversibility and symplecticity.




Fig. 9.1. Projection onto the Stiefel manifold using the singular value decomposition 


Projection Methods. If we want to use the projection method of Algorithm 4.2, we
have to perform, after every integration step, the projection (4.4), which requires
finding for any given matrix $\tilde Y$ a matrix $Y \in V$ with

$\|Y - \tilde Y\|_F \to \min.$ (9.7)

This projection can be obtained as follows: if $\tilde Y$ is not in V (but close), then its
column vectors $\tilde y_1, \ldots, \tilde y_k$ will have norms different from 1 and/or their angles
will not be right angles. These quantities determine an ellipsoid, if we require that
these vectors represent conjugate diameters (see footnote 8 and Fig. 9.1 (a)). This ellipsoid is then
transformed to principal axes in $\mathbb{R}^k$ by an orthogonal map $U^T$ (picture (b)). We let
$\sigma_1, \ldots, \sigma_k$ be the lengths of these axes. If the coordinates are now divided by $\sigma_i$,
then the ellipsoid becomes the unit sphere and the vectors $U^T \tilde y_i$ become orthonormal
vectors $U^T y_i$. These vectors, when transformed back with U, lie in V and are
the projection we were searching for (picture (c)). For a proof of the optimality, see
Exercise 21.

Connection with the Singular Value Decomposition. We have by construction that
$U^T y_i = \Sigma^{-1} U^T \tilde y_i$ where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$. If we finally map these vectors by
an orthogonal matrix V to the unit base, we see that $V \Sigma^{-1} U^T \tilde Y = I$, or

$\tilde Y = U \Sigma V^T$ (9.8)

which is the singular value decomposition of $\tilde Y$. This connection allows us to use
standard software for our calculations. The projected matrix is then $Y = U V^T$.
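
With standard software the projection takes only a few lines. The following sketch uses NumPy's reduced singular value decomposition; the function name and the perturbed test matrix are assumptions made for illustration.

```python
import numpy as np

def project_to_stiefel(Y_tilde):
    """Nearest matrix with orthonormal columns in the Frobenius norm: Y = U V^T."""
    U, sigma, Vt = np.linalg.svd(Y_tilde, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((5, 2)))
Y_tilde = Q0 + 0.05 * rng.standard_normal((5, 2))   # close to, but not on, the manifold
Y = project_to_stiefel(Y_tilde)

# For comparison: the orthonormal factor of a QR decomposition also lies on the
# manifold, but it is in general not the closest point.
Q_qr, _ = np.linalg.qr(Y_tilde)
dist_svd = np.linalg.norm(Y - Y_tilde)
dist_qr = np.linalg.norm(Q_qr - Y_tilde)
```

By the optimality property (9.7), `dist_svd` cannot exceed `dist_qr`.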

8 Here we touch another of Stiefel's great ideas, the CG algorithm.

Remark 1. When the differential equation possesses some symmetry (see the next
chapter), then the symmetric projection algorithm V.4.1 is preferable.

Remark 2. The above procedure is equivalent to the one proposed by D. Higham
(1997): the orthogonal projection is the first factor of the polar decomposition
$\tilde Y = YR$ (where Y has orthonormal columns and R is symmetric positive definite). The
equivalence is seen from the polar decomposition $\tilde Y = (UV^T)(V \Sigma V^T)$. A related
procedure, where the first factor of the QR decomposition of $\tilde Y$ is used instead of
that of the polar decomposition, is proposed in Dieci, Russell & van Vleck (1994).

Tangent Space Parametrization. For the application of the methods of Sect. IV.5,
in particular Subsection IV.5.3, to the case of Stiefel manifolds, we have to find
the formulas for the projection (5.8) (see the wrap figure).

For a fixed Y, let Y + Z be an arbitrary matrix in $Y + T_Y V$, for which we search
the projection $\psi_Y(Z)$ to V. Because of the structure of $N_Y V$ (see (9.5)), we have that

$\psi_Y(Z) = Y + Z + YS$ (9.9)

is a local parametrization of V, if S is symmetric and if $\psi_Y(Z)^T \psi_Y(Z) = I$. This
condition, when multiplied out, shows that S has to be a solution of the algebraic
Riccati equation

$S^2 + 2S + S Y^T Z + Z^T Y S + Z^T Z = 0.$ (9.10)

Observe that for k = 1, where the Stiefel manifold reduces to the unit sphere in
$\mathbb{R}^n$, the equation (9.10) is a scalar quadratic equation and can be easily solved. For
k > 1, it can be solved iteratively using the scheme (e.g., starting with $S_0 = 0$)

$(I + Z^T Y) S_n + S_n (I + Y^T Z) = -Z^T Z - S_{n-1}^2.$

Using a Schur decomposition Y T Z = Q T RQ (where Q is orthogonal and R upper 
triangular), the elements of QS n Q T can be computed successively starting from the 
left upper corner. We refer to the monograph of Mehrmann (1991) for a detailed 
discussion of the solution of linear and algebraic Riccati equations. 
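
The iteration for (9.10) can be sketched as follows. Instead of the Schur-based solution described above, this illustration solves each linear (Sylvester) equation through its Kronecker-product form, which is only reasonable for small k; this substitution, the function names and the test data are assumptions made here.

```python
import numpy as np

def solve_sylvester_small(A, B, C):
    """Solve A X + X B = C via vec(X), using column-major vectorization."""
    k = A.shape[0]
    I = np.eye(k)
    K = np.kron(I, A) + np.kron(B.T, I)
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((k, k), order="F")

def retract(Y, Z, iters=50, tol=1e-14):
    """psi_Y(Z) = Y + Z + Y S of (9.9), with S from the fixed point iteration for (9.10)."""
    k = Y.shape[1]
    I = np.eye(k)
    YtZ = Y.T @ Z
    ZtZ = Z.T @ Z
    S = np.zeros((k, k))                    # starting value S_0 = 0
    for _ in range(iters):
        S_new = solve_sylvester_small(I + YtZ.T, I + YtZ, -ZtZ - S @ S)
        converged = np.linalg.norm(S_new - S) < tol
        S = S_new
        if converged:
            break
    return Y + Z + Y @ S, S

rng = np.random.default_rng(0)
Y, _ = np.linalg.qr(rng.standard_normal((5, 2)))
G = rng.standard_normal((5, 2))
Z = G - 0.5 * Y @ (G.T @ Y + Y.T @ G)       # a tangent matrix at Y
Z *= 0.3 / np.linalg.norm(Z)
W, S = retract(Y, Z)
riccati_residual = S @ S + 2 * S + S @ (Y.T @ Z) + (Z.T @ Y) @ S + Z.T @ Z
```

The result `W` has orthonormal columns and `S` is symmetric, in agreement with (9.9) and (9.10).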

Next, we compute for the matrix F its orthogonal projection $P_Y(F)$ to $T_Y V$,
i.e., by (9.5), we have to find a symmetric matrix S such that $P_Y(F) = F - YS$.
The tangent condition $P_Y(F)^T Y + Y^T P_Y(F) = 0$ leads to $S = (F^T Y + Y^T F)/2$,
so that

$P_Y(F) = F - \tfrac12 \bigl(Y F^T Y + Y Y^T F\bigr).$ (9.11)
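
Formula (9.11) translates directly into code. The short sketch below (function name and random test data are assumptions made here) also checks the two defining properties: the result is tangent, and the removed part is of the form YS with S symmetric.

```python
import numpy as np

def tangent_project(Y, F):
    """Orthogonal projection (9.11) of F onto the tangent space of the Stiefel manifold at Y."""
    S = 0.5 * (F.T @ Y + Y.T @ F)
    return F - Y @ S

rng = np.random.default_rng(0)
Y, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # a point with orthonormal columns
F = rng.standard_normal((6, 3))
PF = tangent_project(Y, F)

tangent_defect = np.linalg.norm(PF.T @ Y + Y.T @ PF)
inner_product = np.trace((F - PF).T @ PF)           # normal part is orthogonal to tangent part
```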

With the parametrization $\psi_Y(Z)$ of (9.9) the transformed differential equation,
when projected to the tangent space, yields

$\dot Z = P_Y F\bigl(\psi_Y(Z)\bigr),$ (9.12)

in complete analogy to (5.9). The numerical solution of (9.12) requires, for every
function evaluation, the solution of the Riccati equation (9.10) and the computation
of a projection onto the tangent space, each needing $O(nk^2)$ operations. Compared
with the projection method, the overhead (i.e., the computation apart from the
evaluation of F(Y)) is more expensive, but the approach described here has the advantage
that all evaluations of F are exactly on the manifold V.


IV.9.2 Differential Equations on the Grassmann Manifold 

The Grassmann manifold is obtained from the Stiefel manifold by identifying matrices
in $V_{n,k}$ that span the same subspace (see Fig. 9.2 (a)). Since any two such
matrices result from each other by right multiplication with an orthogonal k x k
matrix, the resulting manifold is the quotient manifold

$G_{n,k} = V_{n,k} / O(k).$ (9.13)

An equivalence class $[Y] \in G_{n,k}$ defines an orthogonal projection $P = YY^T$ of
rank k, and conversely, every orthonormal basis of the range of P yields a representative
$Y \in V_{n,k}$. We can thus view the Grassmann manifold as

$G_{n,k} = \{P \mid P \text{ is an orthogonal projection onto a } k\text{-dimensional subspace of } \mathbb{R}^n\}.$ (9.14)



Fig. 9.2. Integration of a differential equation on the Grassmann manifold 


The Tangent Space. The map $Y \mapsto P = YY^T$ from $V \to \mathcal{G}$ has the tangent map
(derivative; see footnote 9)

$T_Y V \to T_P \mathcal{G} : \; \delta Y \mapsto \delta P = \delta Y\, Y^T + Y\, \delta Y^T,$ (9.15)

and we wish to apply all the methods for $T_Y V$ from the arsenal of the preceding
section to problems in $T_P \mathcal{G}$. However, the dimension of $T_P \mathcal{G}$ is by $\tfrac12 k(k-1)$ lower
than the dimension of $T_Y V$. This difference is the dimension of O(k) and also of
$\mathfrak{so}(k)$, the vector space of skew-symmetric k x k matrices. The key idea is now
the following: if we replace the condition from (9.2), $Y^T \delta Y$ skew-symmetric, by
$Y^T \delta Y = 0$, then we remove precisely the superfluous degrees of freedom. Indeed,
the extended tangent map

$T_Y V \to T_P \mathcal{G} \times \mathfrak{so}(k) : \; \delta Y \mapsto \bigl(\delta Y\, Y^T + Y\, \delta Y^T, \; Y^T \delta Y\bigr)$ (9.16)

is an isomorphism, since it is readily seen to have zero null-space and the dimensions
of the vector spaces agree. The tangent space is thus characterized as

$T_P \mathcal{G} = \{\delta P = \delta Y\, Y^T + Y\, \delta Y^T \mid Y^T \delta Y = 0\},$ (9.17)

and every $\delta P \in T_P \mathcal{G}$ corresponds to a unique $\delta Y$ with $Y^T \delta Y = 0$. Note that this
condition on $\delta Y$ does not depend on the representative Y of [Y].

9 Here we write $\delta Y$ for tangent matrices at Y (what has been Z in (9.2)), and similarly
for other matrices; Lagrange's $\delta$-notation here becomes preferable, since we will have,
especially in the next subsection, more and more matrices moving around.

Differential Equations. Consider now a differential equation on $\mathcal{G}$,

$\dot P = G(P),$ (9.18)

with a vector field G on $\mathcal{G}$. The condition $G(P) \in T_P \mathcal{G}$ means, since the tangent
map (9.15) is onto, that there exists for $P = YY^T$ a vector F(Y) such that

$G(P) = F(Y) Y^T + Y F(Y)^T$ with $F^T Y + Y^T F = 0,$ (9.19)

i.e., $F(Y) \in T_Y V$. However, from a given initial position Y, there are many F
which produce the same movement G of the subspace represented by P (see Fig. 9.2
(b)). By (9.16), the movement of Y becomes unique if we require that this movement
is orthogonal to the subspace (see Fig. 9.2 (c)),

$Y^T \dot Y = 0.$ (9.20)

Multiplying the derivative $\dot P = \dot Y Y^T + Y \dot Y^T$ with $Y^T$ from the left, we obtain,
under condition (9.20), $Y^T \dot P = \dot Y^T$ and, by (9.18) and (9.19), $\dot Y = Y F^T Y + F$ or

$\dot Y = (I - YY^T) F(Y).$ (9.21)

Geometrically, this means that the vector F(Y), which could be chosen arbitrarily
in $T_Y V$, is projected to the orthogonal complement of the subspace spanned by Y or
$P = YY^T$. The derivative $\dot Y$ in (9.21) is independent of the particular choice of F.

Equation (9.21) is a differential equation on the Stiefel manifold V that can be
solved numerically by the methods described in the previous subsection.

Example 9.1 (Oja Flow). A basic example arises in neural networks (Oja 1989):
solutions on $V_{n,k}$ of the differential equation

$\dot Y = (I - YY^T) A Y$ (9.22)

with a constant symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$ tend to an orthogonal
basis of an invariant subspace of A as $t \to \infty$ (Yan, Helmke & Moore 1994).
A naive comparison of this equation with (9.21) would lead to F(Y) = AY, but
this function does not satisfy the tangent condition $F^T Y + Y^T F = 0$ from (9.19).
So we use the fact that $(I - YY^T)^2 = I - YY^T$ and set $F(Y) = (I - YY^T) A Y$.
With this, G(P) from (9.18) and (9.19) becomes

$\dot P = (I - P) A P + P A (I - P).$ (9.23)

We have obtained the result that equation (9.22) can be viewed as a differential
equation on the Grassmann manifold $G_{n,k}$.

However, for the numerical integration it is more practical to work with (9.22).
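
A minimal sketch of such an integration of (9.22) is given below: explicit Euler steps followed by the SVD-based projection onto the Stiefel manifold of Sect. IV.9.1. The choice of basic method, the step size, the test matrix with eigenvalues 5, 3, 1, 0.5 and all names are assumptions made for illustration.

```python
import numpy as np

def stiefel_project(Y):
    """Projection onto the Stiefel manifold via the singular value decomposition."""
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

def oja_rhs(Y, A):
    """Right-hand side (I - Y Y^T) A Y of the Oja flow (9.22)."""
    AY = A @ Y
    return AY - Y @ (Y.T @ AY)

def integrate_oja(A, Y0, h, steps):
    """Explicit Euler steps, each followed by projection back to the manifold."""
    Y = Y0
    for _ in range(steps):
        Y = stiefel_project(Y + h * oja_rhs(Y, A))
    return Y

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
lam = np.array([5.0, 3.0, 1.0, 0.5])
A = Q @ np.diag(lam) @ Q.T                   # symmetric positive definite test matrix
Y0, _ = np.linalg.qr(rng.standard_normal((4, 2)))
Y = integrate_oja(A, Y0, 0.05, 800)

P_num = Y @ Y.T                              # numerical projector
P_top = Q[:, :2] @ Q[:, :2].T                # projector onto the dominant eigenspace
```

For a generic starting value the projector `P_num` approaches `P_top`, which illustrates that only the subspace spanned by Y, and not the particular basis, is determined by the flow.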


IV.9.3 Dynamical Low-Rank Approximation 

Low-rank approximation of large matrices is a basic model reduction technique in
many application areas, such as image compression and latent semantic indexing in
information retrieval; see for example Simon & Zha (2000). Here, we consider the
task of computing low-rank approximations to matrices $A(t) \in \mathbb{R}^{m \times n}$ depending
smoothly on t. At any time t, a best approximation to A(t) of rank k is a matrix
X(t) in the manifold $\mathcal{M}_k = \mathcal{M}_k^{m \times n}$ of rank-k matrices that satisfies

$X(t) \in \mathcal{M}_k$ such that $\|X(t) - A(t)\|_F = \min!$ (9.24)

The problem is solved by a singular value decomposition of A(t), truncating all
singular values after the k largest ones. When the matrix is so large that a complete
singular value decomposition is not feasible, a standard approach to obtain an
approximate solution is based on the Lanczos bidiagonalization process with A(t), as
discussed in Simon & Zha (2000).
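
For matrices of moderate size the truncation can be sketched in a few lines. The function name and the random test matrix below are assumptions made here; the error identity used in the check is the standard property of the truncated singular value decomposition in the Frobenius norm.

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation in the Frobenius norm via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    X = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return X, s

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
k = 2
X, s = best_rank_k(A, k)

error = np.linalg.norm(A - X)                # Frobenius norm of the defect
expected = np.sqrt(np.sum(s[k:]**2))         # norm of the discarded singular values
```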

Following Koch & Lubich (2005), we here consider instead the low-rank approximation
$Y(t) \in \mathcal{M}_k$ determined from the condition that for every t the derivative
$\dot Y(t)$, which is in the tangent space $T_{Y(t)}\mathcal{M}_k$, be chosen as

$\dot Y(t) \in T_{Y(t)}\mathcal{M}_k$ such that $\|\dot Y(t) - \dot A(t)\|_F = \min!$ (9.25)

This is complemented with an initial condition, ideally $Y(t_0) = X(t_0)$. For given
Y(t), the derivative $\dot Y(t)$ is obtained by a linear projection, though onto a
solution-dependent vector space. Problem (9.25) yields a differential equation on $\mathcal{M}_k$. We
will see that with a suitable factorization of rank-k matrices, we obtain a system
of differential equations for the factors that is well-suited for numerical integration.
The differential equations contain only the increments $\dot A(t)$, which may be much
sparser than the full data matrix A(t).

Koch & Lubich (2005) show that Y(t) yields a quasi-optimal approximation
on intervals where a good smooth approximation exists. It must be noted, however,
that the best rank-k approximation X(t) may have discontinuities, which cannot
be captured in Y(t). This is already seen from the example of finding a rank-1
approximation to $\mathrm{diag}(e^{-t}, e^{t})$, where starting from $t_0 < 0$ yields $X(t) = Y(t) =
\mathrm{diag}(e^{-t}, 0)$ for t < 0, but $Y(t) = \mathrm{diag}(e^{-t}, 0)$ and $X(t) = \mathrm{diag}(0, e^{t})$ for t > 0.
The best approximation X(t) has a discontinuity at t = 0, caused by a crossing of
singular values of which one is inside and the other one outside the approximation.
An algorithmic remedy is to restart (9.25) at regular intervals.

In contrast to (9.24), the approach (9.25) extends immediately to the low-rank
approximation of solutions of matrix differential equations $\dot A = F(A)$. Here,
$\dot A(t)$ in (9.25) is simply replaced by the approximation F(Y(t)), which yields the
minimum-defect low-rank approximation Y(t) by choosing

$\dot Y \in T_Y \mathcal{M}_k$ such that $\|\dot Y - F(Y)\|_F = \min!$ (9.26)

An approach of this type is of common use in quantum dynamics, where the physical
model reduction of the multivariate Schrödinger equation by the analogue of
(9.26) is known as the Dirac-Frenkel time-dependent variational principle, after
Dirac (1930) and Frenkel (1934); see also Beck, Jäckle, Worth & Meyer (2000)
and Sect. VII.6.

Decompositions of Rank-k Matrices and of Their Tangent Matrices. Every real
rank-k matrix of dimension m x n can be written in the form

$Y = U S V^T$ (9.27)

where $U \in V_{m,k}$ and $V \in V_{n,k}$ have orthonormal columns, and $S \in \mathbb{R}^{k \times k}$ is
nonsingular. The singular value decomposition yields S diagonal, but here we do
not assume a special form of S. The representation (9.27) is not unique: replacing
U by $\tilde U = UP$ and V by $\tilde V = VQ$ with orthogonal matrices $P, Q \in O(k)$ and
correspondingly S by $\tilde S = P^T S Q$, yields the same matrix $Y = U S V^T = \tilde U \tilde S \tilde V^T$.

As a substitute for the non-uniqueness in (9.27), we use - as in the previous
subsection - a unique decomposition in the tangent space. Every tangent matrix
$\delta Y \in T_Y \mathcal{M}_k$ at $Y = U S V^T$ is of the form (see Exercise 23)

$\delta Y = \delta U\, S V^T + U\, \delta S\, V^T + U S\, \delta V^T,$ (9.28)

where $\delta S \in \mathbb{R}^{k \times k}$ and $\delta U \in T_U V_{m,k}$, $\delta V \in T_V V_{n,k}$. Conversely, $\delta S, \delta U, \delta V$ are
uniquely determined by $\delta Y$ if we impose the orthogonality constraints

$U^T \delta U = 0, \qquad V^T \delta V = 0.$ (9.29)

Equations (9.28) and (9.29) yield

$\delta S = U^T \delta Y\, V,$
$\delta U = (I - UU^T)\, \delta Y\, V S^{-1},$ (9.30)
$\delta V = (I - VV^T)\, \delta Y^T U S^{-T}.$

Formulas (9.28) and (9.30) establish an isomorphism between the subspace

$\{(\delta S, \delta U, \delta V) \in \mathbb{R}^{k \times k} \times \mathbb{R}^{m \times k} \times \mathbb{R}^{n \times k} \mid U^T \delta U = 0, \; V^T \delta V = 0\}$

and the tangent space $T_Y \mathcal{M}_k$.

Differential Equations for the Factors. The minimization condition (9.25) is
equivalent to the orthogonal projection of $\dot A(t)$ onto the tangent space $T_{Y(t)}\mathcal{M}_k$:
find $\dot Y \in T_Y \mathcal{M}_k$ (we omit the argument t) satisfying

$\langle \dot Y - \dot A, \delta Y \rangle = 0$ for all $\delta Y \in T_Y \mathcal{M}_k,$ (9.31)

with the Frobenius inner product $\langle A, B \rangle = \mathrm{trace}(A^T B)$. With this formulation we
derive differential equations for the factors in the representation (9.27).

Theorem 9.2. For $Y = U S V^T \in \mathcal{M}_k$ with nonsingular $S \in \mathbb{R}^{k \times k}$ and with
$U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$ having orthonormal columns, condition (9.25) or (9.31)
is equivalent to $\dot Y = \dot U S V^T + U \dot S V^T + U S \dot V^T$, where

$\dot S = U^T \dot A V,$
$\dot U = (I - UU^T) \dot A V S^{-1},$ (9.32)
$\dot V = (I - VV^T) \dot A^T U S^{-T}.$

Proof. For $u \in \mathbb{R}^m$, $v \in \mathbb{R}^n$ and $B \in \mathbb{R}^{m \times n}$, we use the identity

$\langle u v^T, B \rangle = u^T B v.$

In view of (9.29) we require $U^T \dot U = V^T \dot V = 0$ along the solution trajectory in order
to define a unique representation of $\dot Y$. We first substitute $\delta Y = u_i v_j^T$, for
i, j = 1, ..., k, in (9.31), where $u_i$, $v_j$ denote the columns of U, V, respectively. This is of
the form (9.28) with $\delta U = \delta V = 0$ and one non-zero element in $\delta S$. In this way we
find $\dot S = U^T \dot A V$. Similarly, choosing $\delta Y = \sum_{j=1}^k \delta u\, s_{ij} v_j^T$, i = 1, ..., k, where
$\delta u \in \mathbb{R}^m$ is arbitrary with $U^T \delta u = 0$, we obtain the stated differential equation
for U, and likewise for $\delta Y = \sum_{j=1}^k u_j s_{ji}\, \delta v^T$ with $V^T \delta v = 0$ the differential
equation for V. □
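
The formulas (9.28) and (9.30) can be checked numerically. Applied to an arbitrary matrix B in place of $\delta Y$, they yield the orthogonal projection of B onto the tangent space, which is the content of Theorem 9.2 with $B = \dot A$. The sketch below is an illustration only; the function names, the sizes and the test data are assumptions made here.

```python
import numpy as np

def tangent_factors(U, S, V, dY):
    """Factors (dS, dU, dV) of (9.30) for a given matrix dY."""
    dYV = dY @ V
    dYtU = dY.T @ U
    dS = U.T @ dYV
    dU = (dYV - U @ (U.T @ dYV)) @ np.linalg.inv(S)
    dV = (dYtU - V @ (V.T @ dYtU)) @ np.linalg.inv(S.T)
    return dS, dU, dV

def assemble(U, S, V, dS, dU, dV):
    """Tangent matrix (9.28) built from its factors."""
    return dU @ S @ V.T + U @ dS @ V.T + U @ S @ dV.T

def project_tangent(U, S, V, B):
    """Orthogonal projection of B onto the tangent space at Y = U S V^T."""
    return assemble(U, S, V, *tangent_factors(U, S, V, B))

rng = np.random.default_rng(1)
m, n, k = 6, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((m, k)))
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
S = np.array([[2.0, 0.3], [-0.1, 1.0]])      # any nonsingular k x k matrix
B = rng.standard_normal((m, n))

PB = project_tangent(U, S, V, B)
dS, dU, dV = tangent_factors(U, S, V, B)
```

The factors satisfy the gauge conditions (9.29), and `B - PB` is orthogonal to `PB` in the Frobenius inner product.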

The differential equations (9.32) are closely related to differential equations for 
other smooth matrix decompositions, in particular the smooth singular value decom¬ 
position; see, e.g., Dieci & Eirola (1999) and Wright (1992). Unlike the differential 
equations for singular values given there, the equations (9.32) have no singularities 
at points where singular values of Y (t) coalesce. 

For the minimum-defect low-rank approximation (9.26) of a matrix differential
equation $\dot A = F(A)$, we just need to replace $\dot A$ by F(Y) for $Y = U S V^T$ in the
differential equations (9.32).

The matrices U(t) and V(t) evolve on Stiefel manifolds. The differential equations
(9.32) can thus be solved numerically by the methods discussed in Sect. IV.9.1.
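
As an illustration, the sketch below integrates (9.32) with the classical Runge-Kutta method of order 4 and projects U and V back onto their Stiefel manifolds after every step. The choice of integrator, the test matrix A(t) (constructed to have rank k for all t, so that the exact solution with Y(0) = A(0) is Y(t) = A(t)) and all names are assumptions made here.

```python
import numpy as np

def stiefel_project(Y):
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

def factor_rhs(U, S, V, Adot):
    """Right-hand sides of (9.32) for a given increment Adot."""
    AV = Adot @ V
    AtU = Adot.T @ U
    Sdot = U.T @ AV
    Udot = (AV - U @ (U.T @ AV)) @ np.linalg.inv(S)
    Vdot = (AtU - V @ (V.T @ AtU)) @ np.linalg.inv(S.T)
    return Udot, Sdot, Vdot

def rk4_step(U, S, V, Adot, t, h):
    """Classical RK4 step for (U, S, V), followed by projection of U and V."""
    def f(t, U, S, V):
        return factor_rhs(U, S, V, Adot(t))
    k1 = f(t, U, S, V)
    k2 = f(t + h / 2, U + h / 2 * k1[0], S + h / 2 * k1[1], V + h / 2 * k1[2])
    k3 = f(t + h / 2, U + h / 2 * k2[0], S + h / 2 * k2[1], V + h / 2 * k2[2])
    k4 = f(t + h, U + h * k3[0], S + h * k3[1], V + h * k3[2])
    U = U + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    S = S + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    V = V + h / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
    return stiefel_project(U), S, stiefel_project(V)

# Hypothetical test data: A(t) = B(t) C(t)^T has rank k on [0, 1].
rng = np.random.default_rng(0)
m, n, k = 6, 5, 2
B0, _ = np.linalg.qr(rng.standard_normal((m, k)))
C0, _ = np.linalg.qr(rng.standard_normal((n, k)))
B1 = rng.standard_normal((m, k)); B1 *= 0.5 / np.linalg.norm(B1, 2)
C1 = rng.standard_normal((n, k)); C1 *= 0.5 / np.linalg.norm(C1, 2)

def A_of_t(t):
    return (B0 + t * B1) @ (C0 + t * C1).T

def Adot_of_t(t):
    return B1 @ (C0 + t * C1).T + (B0 + t * B1) @ C1.T

U0, s0, Vt0 = np.linalg.svd(A_of_t(0.0), full_matrices=False)
U, S, V = U0[:, :k], np.diag(s0[:k]), Vt0[:k, :].T

h, steps = 0.01, 100
for j in range(steps):
    U, S, V = rk4_step(U, S, V, Adot_of_t, j * h, h)
err = np.linalg.norm(U @ S @ V.T - A_of_t(1.0))
```

Only the increments `Adot_of_t` enter the integration; the matrix `A_of_t` is used for the starting value and for measuring the error `err` at t = 1.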


IV. 10 Exercises 

1. Prove that the symplectic Euler method (I.1.9) conserves quadratic invariants
of the form (2.5). Explain the "0" entries of Table I.2.1.

2. Prove that under condition (2.3) a Runge-Kutta method preserves all invariants
of the form $I(y) = y^T C y + d^T y + c$.

3. Prove that an s-stage diagonally implicit Runge-Kutta method (i.e., $a_{ij} = 0$ for
i < j) satisfies the condition (2.3) if and only if it is equivalent to a composition
$\Phi_{b_s h} \circ \cdots \circ \Phi_{b_1 h}$ based on the implicit midpoint rule.

4. Prove the following statements: a) If a partitioned Runge-Kutta method conserves
general quadratic invariants $p^T C p + 2 p^T D q + q^T E q$, then each of the
two Runge-Kutta methods has to conserve quadratic invariants separately.

b) If both methods, $\{b_i, a_{ij}\}$ and $\{\hat b_i, \hat a_{ij}\}$, are irreducible, satisfy (2.3) and if
(2.7)-(2.8) hold, then we have $b_i = \hat b_i$ and $a_{ij} = \hat a_{ij}$ for all i, j.

5. Prove that the Gauss methods are the only collocation methods satisfying (2.3). 
Hint. Use the ideas of the proof of Lemma 13.9 in Hairer & Wanner (1996). 

6. Discontinuous collocation methods with either $b_1 \ne 0$ or $b_s \ne 0$
(Definition II.1.7) cannot satisfy the criterion (2.3).

7. (Sanz-Serna & Abia 1991, Saito, Sugiura & Mitsui 1992). The condition (2.3)
acts as simplifying assumption for the order conditions of Runge-Kutta methods.
Assume that the order conditions are satisfied for the trees u and v. Prove
that it is satisfied for $u \circ v$ if and only if it is satisfied for $v \circ u$, and that it is
automatically satisfied for trees of the form $u \circ u$.

Remark. $u \circ v$ denotes the Butcher product introduced in Sect. VI.7.2.

8. If $L_0$ is a symmetric, tridiagonal matrix that is sufficiently close to $\Lambda =
\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_1 > \lambda_2 > \ldots > \lambda_n$ are the eigenvalues of $L_0$, then
the solution of (3.5) with $B(L) = L_+ - L_+^T$ converges exponentially fast to the
diagonal matrix $\Lambda$. Hence, the numerical solution of (3.5) gives an algorithm
for the computation of the eigenvalues of the matrix $L_0$.

Hint. Let $\beta_1, \ldots, \beta_n$ be the entries in the diagonal of L, and $\alpha_1, \ldots, \alpha_{n-1}$
those in the subdiagonal. Assume that $|\beta_k(0) - \lambda_k| < R/3$ and $|\alpha_k(0)| < R$
with some sufficiently small R. Prove that $\beta_k(t) - \beta_{k+1}(t) > \mu - R$ and
$|\alpha_k(t)| < R e^{-(\mu - R)t}$ for all t > 0, where $\mu = \min_k(\lambda_k - \lambda_{k+1}) > 0$.

9. Elaborate Example 4.5 for the special case where Y is a matrix of dimension 
2. In particular, show that (4.6) is the same as (4.5), and check the formulas for 
the simplified Newton iterations. 

10. (Brenan, Campbell & Petzold (1996), Sect. 2.5.3). Consider the differential
equation $\dot y = f(y)$ with known invariants g(y) = Const, and assume that g'(y)
has full rank. Prove by differentiation of the constraints that, for initial values
satisfying $g(y_0) = 0$, the solution of the differential-algebraic equation (DAE)

$\dot y = f(y) + g'(y)^T \mu$
$0 = g(y)$

also solves the differential equation $\dot y = f(y)$.

Remark. Most methods for DAEs (e.g., stiffly accurate Runge-Kutta methods
or BDF methods) lead to numerical integrators that preserve exactly the constraints
g(y) = 0. The difference from the projection method of Sect. IV.4 is
that here the internal stages also satisfy the constraint.

11. Prove that SL(n) is a Lie group of dimension $n^2 - 1$, and that $\mathfrak{sl}(n)$ is its Lie
algebra (see Table 6.1 for the definitions of SL(n) and $\mathfrak{sl}(n)$).

12. Let G be a matrix Lie group and $\mathfrak{g}$ its Lie algebra. Prove that for $Y \in G$ and
$A \in \mathfrak{g}$ we have $Y A Y^{-1} \in \mathfrak{g}$.

Hint. Consider the path $\gamma(t) = Y \alpha(t) Y^{-1}$.

13. Consider a problem $\dot Y = A(Y)Y$, for which $A(Y) \in \mathfrak{so}(n)$ whenever $Y \in
O(n)$, but where A(Y) is an arbitrary matrix for $Y \notin O(n)$.

a) Prove that $Y_0 \in O(n)$ implies $Y(t) \in O(n)$ for all t.

b) Show by a counter-example that the numerical solution of the implicit midpoint
rule does not necessarily stay in O(n).

14. (Feng Kang & Shang Zai-jiu 1995). Let R(z) = (1 + z/2)/(1 - z/2) be the
stability function of the implicit midpoint rule. Prove that for $A \in \mathfrak{sl}(3)$ we
have

$\det R(hA) = 1 \iff \det A = 0.$

15. (Iserles & Nørsett 1999). Introducing $y_1 = y$ and $y_2 = \dot y$, write the problem

$\ddot y + t y = 0, \qquad y(0) = 1, \quad \dot y(0) = 0$

in the form (7.6). Then apply the numerical method of Example 7.4 with dif¬
ferent step sizes on the interval 0 < t < 100. Compare the result with that 
obtained by fourth order classical (explicit or implicit) Runge-Kutta methods. 
Remark. If A(t) in (7.6) (or A(t, y) in (8.1)) are much smoother than the solu¬ 
tion y(t), then Lie group methods are usually superior to standard integrators, 
because Lie group methods approximate A(t), whereas standard methods ap¬ 
proximate the solution y(t) by polynomials. 

16. Deduce the BCH formula from the Magnus expansion (IV.7.5). 

Hint. For constant matrices A and B consider the matrix function A(t) defined 
by A(t) = B for 0 < t < 1 and A(t) = A for 1 < t < 2. 

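17. (Rodrigues formula, see Marsden & Ratiu (1999), page 291). Prove that

\[ \exp(\Omega) = I + \frac{\sin\alpha}{\alpha}\,\Omega + \frac12 \Big( \frac{\sin(\alpha/2)}{\alpha/2} \Big)^2 \Omega^2
\quad\text{for}\quad
\Omega = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}, \]

where $\alpha = \sqrt{\omega_1^2 + \omega_2^2 + \omega_3^2}$. This formula allows for an efficient
implementation of the Lie group methods in O(3).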
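The closed form of Exercise 17 can be checked numerically. The following is a minimal sketch in plain Python, not code from the text: the helper names (`hat`, `exp_rodrigues`, `exp_series`) and the test vector are illustrative assumptions, and a truncated Taylor series serves as the reference value for the matrix exponential.

```python
import math

def hat(w):
    """Skew-symmetric 3x3 matrix built from w = (w1, w2, w3)."""
    w1, w2, w3 = w
    return [[0.0, -w3, w2], [w3, 0.0, -w1], [-w2, w1, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def combine(c1, A, c2, B):
    """Return I + c1*A + c2*B for 3x3 matrices A and B."""
    return [[float(i == j) + c1 * A[i][j] + c2 * B[i][j] for j in range(3)]
            for i in range(3)]

def exp_rodrigues(w):
    """exp(hat(w)) from the Rodrigues formula; no matrix series is needed."""
    Om = hat(w)
    Om2 = matmul(Om, Om)
    a = math.sqrt(sum(x * x for x in w))
    if a < 1e-12:                       # limits of the two coefficient functions
        return combine(1.0, Om, 0.5, Om2)
    c1 = math.sin(a) / a
    c2 = 0.5 * (math.sin(a / 2) / (a / 2)) ** 2
    return combine(c1, Om, c2, Om2)

def exp_series(M, terms=40):
    """Reference value: truncated Taylor series of the matrix exponential."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

w = (0.3, -1.2, 0.7)                    # arbitrary illustrative vector
R = exp_rodrigues(w)
S = exp_series(hat(w))
err_series = max(abs(R[i][j] - S[i][j]) for i in range(3) for j in range(3))
print(err_series)
```

The formula needs only one square root, two sine evaluations and one 3x3 product, which is why it is attractive inside a time-stepping loop.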

18. The solution of $\dot Y = A(Y)Y$, $Y(0) = Y_0$, is given by $Y(t) = \exp(\Omega(t)) Y_0$,
where $\Omega(t)$ solves the differential equation (8.9). Compute the first terms of the
$t$-expansion of $\Omega(t)$.

Result. $\Omega(t) = t A(Y_0) + \frac{t^2}{2} A'(Y_0) A(Y_0) Y_0 + \frac{t^3}{6} \big( A'(Y_0)^2 A(Y_0) Y_0^2 +
A'(Y_0) A(Y_0)^2 Y_0 + A''(Y_0)(A(Y_0) Y_0, A(Y_0) Y_0) - \frac12 [A(Y_0), A'(Y_0) A(Y_0) Y_0] \big)$.

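19. Consider the 2-stage Gauss method of order $p = 4$. In the corresponding Lie
group method, eliminate the presence of $\Omega$ in $[\Omega, A]$ by iteration, and neglect
higher order commutators. Show that this leads to

\[ \Omega_1 = h \Big( \frac14 A_1 + \Big( \frac14 - \frac{\sqrt3}{6} \Big) A_2 \Big) - \frac{h^2}{2} \Big( -\frac{1}{12} + \frac{\sqrt3}{24} \Big) [A_1, A_2], \]
\[ \Omega_2 = h \Big( \Big( \frac14 + \frac{\sqrt3}{6} \Big) A_1 + \frac14 A_2 \Big) - \frac{h^2}{2} \Big( \frac{1}{12} + \frac{\sqrt3}{24} \Big) [A_1, A_2], \]
\[ y_1 = \exp\Big( h \Big( \frac12 A_1 + \frac12 A_2 \Big) - h^2\, \frac{\sqrt3}{12}\, [A_1, A_2] \Big)\, y_0, \]

where $A_i = A(Y_i)$ and $Y_i = \exp(\Omega_i) y_0$. Prove that this is a Lie group method
of order 4. Is it symmetric?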

20. In Zanna (1999) a Lie group method similar to that of Exercise 19 is presented.
The only difference is that the coefficients $(-1/12 + \sqrt3/24)$ and $(1/12 + \sqrt3/24)$
in the formulas for $\Omega_1$ and $\Omega_2$ are replaced with $(-5/72 + \sqrt3/24)$ and
$(5/72 + \sqrt3/24)$, respectively. Is there an error somewhere? Are both methods
of order 4?

21. Show that for given $\widetilde Y$ the solution of problem (9.7) is $Y = U V^T$, where
$\widetilde Y = U \Sigma V^T$ is the singular value decomposition of $\widetilde Y$.

Hint. Since $\|U S V^T\|_F = \|S\|_F$ holds for all orthogonal matrices $U$ and $V$,
it is sufficient to consider the case $\widetilde Y = (\Sigma, 0)^T$ with $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_k)$.
Prove that $\|(\Sigma, 0)^T - Y\|_F^2 \ge \sum_{i=1}^{k} (\sigma_i - 1)^2$ for all matrices $Y$ satisfying
$Y^T Y = I$.

22. Show that the solution of the matrix differential equation $\dot Y = A(t) Y$ on $\mathbb{R}^{n \times k}$,
with initial values $Y_0 \in V_{n,k}$, can be decomposed as

\[ Y(t) = U(t) S(t), \quad\text{where}\quad U(t) \in V_{n,k}, \quad S(t) \in \mathbb{R}^{k \times k} \]

satisfy the differential equations

\[ \dot S = U^T A U S, \qquad \dot U = (I - U U^T) A U \]

with initial values $S_0 = I$, $U_0 = Y_0$.

Remark: These differential equations can be used for the computation of Lya¬ 
punov exponents as an alternative to the differential equations discussed in 
Bridges & Reich (2001) and Dieci, Russell & van Vleck (1997). 

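23. Consider the map $GL(k) \times V_{m,k} \times V_{n,k} \to \mathcal{M}_k$ that associates to $(S, U, V)$ the
rank-$k$ matrix $Y = U S V^T$. Show that the extended tangent map

\[ \mathbb{R}^{k \times k} \times T_U V_{m,k} \times T_V V_{n,k} \to T_Y \mathcal{M}_k \times \mathfrak{so}(k) \times \mathfrak{so}(k) \]
\[ (\delta S, \delta U, \delta V) \mapsto (\delta U S V^T + U \delta S V^T + U S \delta V^T,\; U^T \delta U,\; V^T \delta V) \]

is an isomorphism.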

24. Let $A(t) \in \mathbb{R}^{n \times n}$ be symmetric and depend smoothly on $t$. Show that the
solution $P(t) \in G_{n,k}$ of the dynamical low-rank approximation problem on the
Grassmann manifold,

\[ \dot P \in T_P G_{n,k} \quad\text{with}\quad \|\dot P - \dot A\|_F = \min!, \]

is given as $P = Y Y^T$ where $Y \in V_{n,k}$ solves the differential equation

\[ \dot Y = (I - Y Y^T) \dot A\, Y. \]



Chapter V. 

Symmetric Integration and Reversibility 


Symmetric methods of this chapter and symplectic methods of the next chapter play 
a central role in the geometric integration of differential equations. We discuss re¬ 
versible differential equations and reversible maps, and we explain how symmetric 
integrators are related to them. We study symmetric Runge-Kutta and composition 
methods, and we show how standard approaches for solving differential equations 
on manifolds can be symmetrized. A theoretical explanation of the excellent long¬ 
time behaviour of symmetric methods applied to reversible differential equations 
will be given in Chap. XI. 


V.1 Reversible Differential Equations and Maps


Conservative mechanical systems have the property that inverting the initial direc¬ 
tion of the velocity vector and keeping the initial position does not change the solu¬ 
tion trajectory, it only inverts the direction of motion. Such systems are “reversible”. 
We extend this notion to more general situations. 


Definition 1.1. Let $\rho$ be an invertible linear transformation in the phase space of
$\dot y = f(y)$. This differential equation and the vector field $f(y)$ are called $\rho$-reversible
if

\[ \rho f(y) = -f(\rho y) \quad\text{for all } y. \tag{1.1} \]




Fig. 1.1. Reversible vector field (left picture) and reversible map (right picture)

This property is illustrated in the left picture of Fig. 1.1. For $\rho$-reversible differential
equations the exact flow $\varphi_t(y)$ satisfies

\[ \rho \circ \varphi_t = \varphi_{-t} \circ \rho = \varphi_t^{-1} \circ \rho \tag{1.2} \]

(see the picture to the right in Fig. 1.1). The right identity is a consequence of the
group property $\varphi_t \circ \varphi_s = \varphi_{t+s}$, and the left identity follows from

\[ \frac{d}{dt} (\rho \circ \varphi_t)(y) = \rho f\big(\varphi_t(y)\big) = -f\big((\rho \circ \varphi_t)(y)\big), \]
\[ \frac{d}{dt} (\varphi_{-t} \circ \rho)(y) = -f\big((\varphi_{-t} \circ \rho)(y)\big), \]

because all expressions of (1.2) satisfy the same differential equation with the same
initial value $(\rho \circ \varphi_0)(y) = (\varphi_0 \circ \rho)(y) = \rho y$. Formula (1.2) motivates the following
definition.

Definition 1.2. A map $\Phi(y)$ is called $\rho$-reversible if

\[ \rho \circ \Phi = \Phi^{-1} \circ \rho. \]

Example 1.3. An important example is the partitioned system

\[ \dot u = f(u, v), \qquad \dot v = g(u, v), \tag{1.3} \]

where $f(u, -v) = -f(u, v)$ and $g(u, -v) = g(u, v)$. Here, the transformation $\rho$ is
given by $\rho(u, v) = (u, -v)$. If we call a vector field or a map reversible (without
specifying the transformation $\rho$), we mean that it is $\rho$-reversible with this particular
$\rho$. All second order differential equations $\ddot u = g(u)$ written as $\dot u = v$, $\dot v = g(u)$
are reversible. As a first implication of reversibility on the dynamics we mention
the following fact: if $u$ and $v$ are scalar, and if (1.3) is reversible, then any solution
that crosses the $u$-axis twice is periodic (Exercise 5, see also the solution of the
pendulum problem in Fig. I.1.4).
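Conditions (1.1) and (1.2) are easy to test numerically for the pendulum. The sketch below is an illustration under stated assumptions, not part of the original text: the pendulum is written as $\dot u = v$, $\dot v = -\sin u$, the names `rk4_flow`, `defect_field` and `defect_flow` are hypothetical, and the exact flow is only approximated by many classical Runge-Kutta steps.

```python
import math

def f(y):
    """Pendulum as a partitioned system (1.3): u' = v, v' = -sin(u)."""
    u, v = y
    return (v, -math.sin(u))

def rho(y):
    """The transformation rho(u, v) = (u, -v)."""
    u, v = y
    return (u, -v)

def rk4_flow(y, t, n=2000):
    """Approximate the flow phi_t(y) by n classical Runge-Kutta steps."""
    h = t / n
    for _ in range(n):
        k1 = f(y)
        k2 = f((y[0] + h / 2 * k1[0], y[1] + h / 2 * k1[1]))
        k3 = f((y[0] + h / 2 * k2[0], y[1] + h / 2 * k2[1]))
        k4 = f((y[0] + h * k3[0], y[1] + h * k3[1]))
        y = (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return y

y0 = (0.8, 0.5)                       # arbitrary initial value

# (1.1): rho f(y) = -f(rho y)
lhs = rho(f(y0))
rhs = tuple(-c for c in f(rho(y0)))
defect_field = max(abs(a - b) for a, b in zip(lhs, rhs))

# (1.2): rho o phi_t = phi_{-t} o rho, here with t = 3
forward = rho(rk4_flow(y0, 3.0))
backward = rk4_flow(rho(y0), -3.0)
defect_flow = max(abs(p - q) for p, q in zip(forward, backward))
print(defect_field, defect_flow)
```

Both defects are at the level of round-off: reflecting the velocity and integrating backwards in time retraces the same trajectory.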

It is natural to search for numerical methods that produce a reversible numerical 
flow when they are applied to a reversible differential equation. We then expect the 
numerical solution to have long-time behaviour similar to that of the exact solution; 
see Chap. XI for more precise statements. It turns out that the $\rho$-reversibility of a
numerical one-step method is closely related to the concept of symmetry.

Thus the method is theoretically symmetrical or reversible , a terminology 
we have never seen applied elsewhere. 

(P.C. Hammer & J.W. Hollingsworth 1955)

Definition 1.4. A numerical one-step method $\Phi_h$ is called symmetric or
time-reversible,¹ if it satisfies

\[ \Phi_h \circ \Phi_{-h} = \mathrm{id} \quad\text{or equivalently}\quad \Phi_h = \Phi_{-h}^{-1}. \]

¹ The study of symmetric methods has its origin in the development of extrapolation
methods (Gragg 1965, Stetter 1973), because the global error admits an asymptotic expansion
in even powers of h. The notion of time-reversible methods is more common in the
Computational Physics literature (Buneman 1967).

With the Definition II.3.1 of the adjoint method (i.e., $\Phi_h^* = \Phi_{-h}^{-1}$), the condition
for symmetry reads $\Phi_h = \Phi_h^*$. A method $y_1 = \Phi_h(y_0)$ is symmetric if exchanging
$y_0 \leftrightarrow y_1$ and $h \leftrightarrow -h$ leaves the method unaltered. In Chap. I we have already
encountered the implicit midpoint rule (I.1.7) and the Störmer-Verlet scheme (I.1.17),
both of which are symmetric. Many more symmetric methods will be given in the
following sections.
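The defining relation $\Phi_h \circ \Phi_{-h} = \mathrm{id}$ can be tested directly by stepping forward and then backward. The sketch below is illustrative only: it assumes the pendulum as test problem, solves the implicit midpoint equation by plain fixed-point iteration (adequate for small $h$), and uses the hypothetical helper name `symmetry_defect`.

```python
import math

def f(y):
    u, v = y
    return (v, -math.sin(u))          # pendulum, chosen only for illustration

def explicit_euler(y, h):
    fy = f(y)
    return (y[0] + h * fy[0], y[1] + h * fy[1])

def implicit_midpoint(y, h, iters=100):
    """Solve y1 = y + h f((y + y1)/2) by fixed-point iteration."""
    y1 = y
    for _ in range(iters):
        fm = f(((y[0] + y1[0]) / 2, (y[1] + y1[1]) / 2))
        y1 = (y[0] + h * fm[0], y[1] + h * fm[1])
    return y1

def symmetry_defect(method, y, h):
    """Size of (Phi_{-h} o Phi_h)(y) - y; zero for a symmetric method."""
    back = method(method(y, h), -h)
    return max(abs(back[0] - y[0]), abs(back[1] - y[1]))

y0, h = (0.8, 0.5), 0.1
defect_euler = symmetry_defect(explicit_euler, y0, h)
defect_midpoint = symmetry_defect(implicit_midpoint, y0, h)
print(defect_euler, defect_midpoint)
```

The explicit Euler defect is of size $O(h^2)$, whereas the midpoint rule returns to the starting point up to round-off and the accuracy of the nonlinear solve.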

Theorem 1.5. If a numerical method, applied to a $\rho$-reversible differential equation,
satisfies

\[ \rho \circ \Phi_h = \Phi_{-h} \circ \rho, \tag{1.4} \]

then the numerical flow $\Phi_h$ is a $\rho$-reversible map if and only if $\Phi_h$ is a symmetric
method.

Proof. As a consequence of (1.4) the numerical flow $\Phi_h$ is $\rho$-reversible if and only
if $\Phi_{-h} \circ \rho = \Phi_h^{-1} \circ \rho$. Since $\rho$ is an invertible transformation, this is equivalent to
the symmetry of the method $\Phi_h$. □

Similarly, it is also true that a symmetric method is $\rho$-reversible if and only if
the $\rho$-compatibility condition (1.4) holds.

Compared to the symmetry of the method, condition (1.4) is much less restric¬ 
tive. It is automatically satisfied by most numerical methods. Let us briefly discuss 
the validity of (1.4) for different classes of methods. 

• Runge-Kutta methods (explicit or implicit) satisfy (1.4) without any restriction
other than (1.1) on the vector field (Stoffer 1988). Let us illustrate the proof with
the explicit Euler method $\Phi_h(y_0) = y_0 + h f(y_0)$:

\[ (\rho \circ \Phi_h)(y_0) = \rho y_0 + h \rho f(y_0) = \rho y_0 - h f(\rho y_0) = \Phi_{-h}(\rho y_0). \]

• Partitioned Runge-Kutta methods applied to a partitioned system (1.3) satisfy the
condition (1.4) if $\rho(u, v) = (\rho_1(u), \rho_2(v))$ with invertible $\rho_1$ and $\rho_2$. The proof is
the same as for Runge-Kutta methods. Notice that the mapping $\rho(u, v) = (u, -v)$
of Example 1.3 is of this special form.

• Composition methods. If two methods $\Phi_h$ and $\Psi_h$ satisfy (1.4), then so do the
adjoint $\Phi_h^*$ and the composition $\Phi_h \circ \Psi_h$. Consequently, the composition methods
(3.1) and (3.2) below, which compose a basic method $\Phi_h$ and its adjoint with
different step sizes, have the property (1.4) provided the basic method $\Phi_h$ has it.

• Splitting methods are based on a splitting $\dot y = f^{[1]}(y) + f^{[2]}(y)$ of the differential
equation. If both vector fields, $f^{[1]}(y)$ and $f^{[2]}(y)$, satisfy (1.1), then their exact
flows $\varphi_t^{[1]}$ and $\varphi_t^{[2]}$ satisfy (1.2). In this situation, the splitting method (II.5.6) has
the property (1.4).

• For differential equations on manifolds we have to assume that $\rho$ maps $\mathcal{M}$ to
$\mathcal{M}$. Otherwise, condition (1.1) does not make sense. For the projection method
of Algorithm IV.4.2 with orthogonal projection onto the manifold we have: if
the basic method satisfies (1.4) and if $\rho$ is an orthogonal matrix, then the projection
method satisfies (1.4) as well. This follows from the fact that the tangent and normal
spaces satisfy $T_{\rho y}\mathcal{M} = \rho\, T_y\mathcal{M}$ and $N_{\rho y}\mathcal{M} = \rho^{-T} N_y\mathcal{M}$, respectively. A similar
result holds for methods based on local coordinates, if the local parametrization is
well chosen. For example, this is the case if $\rho\psi(z)$ is the parametrization at $\rho y_0$ whenever
$\psi(z)$ is the parametrization at $y_0$.
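The difference between the compatibility condition (1.4) and $\rho$-reversibility of the numerical flow can be seen on a small example. The following sketch is an illustration, not taken from the text: it assumes the pendulum $\ddot u = -\sin u$ with $\rho(u, v) = (u, -v)$, and the function names `compat_defect` and `reversibility_defect` are hypothetical.

```python
import math

def g(u):
    return -math.sin(u)               # force term of the pendulum u'' = -sin(u)

def rho(y):
    return (y[0], -y[1])

def explicit_euler(y, h):
    u, v = y
    return (u + h * v, v + h * g(u))

def stoermer_verlet(y, h):
    u, v = y
    v_half = v + h / 2 * g(u)
    u_new = u + h * v_half
    return (u_new, v_half + h / 2 * g(u_new))

def dist(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def compat_defect(method, y, h):
    """Defect of (1.4): rho o Phi_h versus Phi_{-h} o rho."""
    return dist(rho(method(y, h)), method(rho(y), -h))

def reversibility_defect(method, y, h):
    """Defect of rho o Phi_h = Phi_h^{-1} o rho, tested as Phi_h(rho(Phi_h(y))) = rho(y)."""
    return dist(method(rho(method(y, h)), h), rho(y))

y0, h = (0.8, 0.5), 0.1
results = {m.__name__: (compat_defect(m, y0, h), reversibility_defect(m, y0, h))
           for m in (explicit_euler, stoermer_verlet)}
print(results)
```

Both methods satisfy (1.4) up to round-off, but only the symmetric Störmer-Verlet scheme yields a reversible numerical flow, in agreement with Theorem 1.5.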


V.2 Symmetric Runge-Kutta Methods 

We give a characterization of symmetric methods of Runge-Kutta type and mention 
some important examples. 


V.2.1 Collocation and Runge-Kutta Methods 

Symmetric collocation methods are characterized by the symmetry of the colloca¬ 
tion points with respect to the midpoint of the integration step. 

Theorem 2.1. The adjoint method of a collocation method (Definition II.1.3) based
on $c_1, \dots, c_s$ is a collocation method based on $c_1^*, \dots, c_s^*$, where

\[ c_i^* = 1 - c_{s+1-i}. \tag{2.1} \]

In the case that $c_i = 1 - c_{s+1-i}$ for all $i$, the collocation method is symmetric.

The adjoint method of a discontinuous collocation method (Definition II.1.7)
based on $b_1, b_s$ and $c_2, \dots, c_{s-1}$ is a discontinuous collocation method based on
$b_1^*, b_s^*$ and $c_2^*, \dots, c_{s-1}^*$, where

\[ b_1^* = b_s, \qquad b_s^* = b_1 \qquad\text{and}\qquad c_i^* = 1 - c_{s+1-i}. \tag{2.2} \]

In the case that $b_1 = b_s$ and $c_i = 1 - c_{s+1-i}$ for all $i$, the discontinuous collocation
method is symmetric.



Fig. 2.1. Symmetry of collocation methods (the picture shows the nodes $c_1, \dots, c_5$ on the interval from 0 to 1)


Proof. Exchanging $(t_0, y_0) \leftrightarrow (t_1, y_1)$ and $h \leftrightarrow -h$ in the definition of a collocation
method we get $u(t_1) = y_1$, $\dot u(t_1 - c_i h) = f\big(t_1 - c_i h, u(t_1 - c_i h)\big)$, and
$y_0 = u(t_1 - h)$. Inserting $t_1 = t_0 + h$ this yields the collocation method based
on $c_i^*$ of (2.1). Observe that the $c_i^*$ can be arbitrarily permuted. For discontinuous
collocation methods the proof is similar. □


The preceding theorem immediately yields the following result.

Corollary 2.2. The Gauss formulas (Table II.1.1), as well as the Lobatto IIIA
(Table II.1.2) and Lobatto IIIB formulas (Table II.1.4) are symmetric integrators. □
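The node criterion of Theorem 2.1 is a one-line computation. The sketch below is only an illustration: the node sets are the well-known closed forms for small $s$, the Radau nodes are included as a non-symmetric counterexample, and the helper names `adjoint_nodes` and `is_symmetric` are assumptions.

```python
import math

def adjoint_nodes(c):
    """Nodes c_i^* = 1 - c_{s+1-i} of the adjoint collocation method, formula (2.1)."""
    return [1.0 - ci for ci in reversed(c)]

def is_symmetric(c, tol=1e-14):
    """Check c_i = 1 - c_{s+1-i} for all i, up to round-off."""
    return all(abs(a - b) < tol for a, b in zip(c, adjoint_nodes(c)))

nodes = {
    "Gauss s=2": [0.5 - math.sqrt(3) / 6, 0.5 + math.sqrt(3) / 6],
    "Gauss s=3": [0.5 - math.sqrt(15) / 10, 0.5, 0.5 + math.sqrt(15) / 10],
    "Lobatto s=3": [0.0, 0.5, 1.0],
    "Radau s=2": [1.0 / 3.0, 1.0],     # not symmetric about the midpoint
}
symmetric = {name: is_symmetric(c) for name, c in nodes.items()}
print(symmetric)
```

For the Radau nodes the adjoint method collocates at $0$ and $2/3$, so the method and its adjoint differ.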

Theorem 2.3 (Stetter 1973, Wanner 1973). The adjoint method of an s-stage
Runge-Kutta method (II.1.4) is again an s-stage Runge-Kutta method. Its coefficients
are given by

\[ a_{ij}^* = b_{s+1-j} - a_{s+1-i,\,s+1-j}, \qquad b_i^* = b_{s+1-i}. \tag{2.3} \]

If

\[ a_{s+1-i,\,s+1-j} + a_{ij} = b_j \quad\text{for all } i, j, \tag{2.4} \]

then the Runge-Kutta method (II.1.4) is symmetric.²

Proof. Exchanging $y_0 \leftrightarrow y_1$ and $h \leftrightarrow -h$ in the Runge-Kutta formulas yields

\[ k_i = f\Big( y_0 + h \sum_{j=1}^{s} (b_j - a_{ij}) k_j \Big), \qquad y_1 = y_0 + h \sum_{i=1}^{s} b_i k_i. \tag{2.5} \]

Since the values $\sum_{j=1}^{s} (b_j - a_{ij}) = 1 - c_i$ appear in reverse order, we replace $k_i$ by
$k_{s+1-i}$ in (2.5), and then we substitute all indices $i$ and $j$ by $s+1-i$ and $s+1-j$,
respectively. This proves (2.3).

The assumption (2.4) implies $a_{ij}^* = a_{ij}$ and $b_i^* = b_i$, so that $\Phi_h^* = \Phi_h$. □
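Formulas (2.3) and (2.4) translate directly into index arithmetic on a Butcher tableau. The sketch below is illustrative: the tableaux (2-stage Gauss, implicit Euler, the explicit trapezoidal rule) are standard, while the function names `adjoint_tableau` and `satisfies_2_4` are assumptions made for this example.

```python
import math

def adjoint_tableau(A, b):
    """Coefficients (2.3): a*_ij = b_{s+1-j} - a_{s+1-i,s+1-j}, b*_i = b_{s+1-i}."""
    s = len(b)
    A_star = [[b[s - 1 - j] - A[s - 1 - i][s - 1 - j] for j in range(s)]
              for i in range(s)]
    b_star = [b[s - 1 - i] for i in range(s)]
    return A_star, b_star

def satisfies_2_4(A, b, tol=1e-14):
    """Sufficient symmetry condition (2.4): a_{s+1-i,s+1-j} + a_ij = b_j."""
    s = len(b)
    return all(abs(A[s - 1 - i][s - 1 - j] + A[i][j] - b[j]) < tol
               for i in range(s) for j in range(s))

r3 = math.sqrt(3)
gauss2 = ([[0.25, 0.25 - r3 / 6], [0.25 + r3 / 6, 0.25]], [0.5, 0.5])
implicit_euler = ([[1.0]], [1.0])
heun = ([[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5])    # explicit trapezoidal rule

adj_of_implicit_euler = adjoint_tableau(*implicit_euler)
checks = {"gauss2": satisfies_2_4(*gauss2),
          "implicit_euler": satisfies_2_4(*implicit_euler),
          "heun": satisfies_2_4(*heun)}
print(adj_of_implicit_euler, checks)
```

The adjoint of the implicit Euler tableau comes out as the explicit Euler tableau, and among the three examples only the Gauss method fulfils (2.4).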

Explicit Runge-Kutta methods cannot fulfill condition (2.4) with $i = j$, and it is
not difficult to see that no explicit Runge-Kutta method can be symmetric (Exercise 2). Let
us therefore turn our attention to diagonally implicit Runge-Kutta methods (DIRK),
for which $a_{ij} = 0$ for $i < j$, but with diagonal elements that can be non-zero. In
this case condition (2.4) becomes

\[ a_{ij} = b_j = b_{s+1-j} \quad\text{for } i > j, \qquad a_{jj} + a_{s+1-j,\,s+1-j} = b_j. \tag{2.6} \]

The Runge-Kutta tableau of such a method is thus of the form (e.g., for $s = 5$)

\[ \begin{array}{c|ccccc}
c_1 & a_{11} \\
c_2 & b_1 & a_{22} \\
c_3 & b_1 & b_2 & a_{33} \\
1 - c_2 & b_1 & b_2 & b_3 & a_{44} \\
1 - c_1 & b_1 & b_2 & b_3 & b_2 & a_{55} \\ \hline
 & b_1 & b_2 & b_3 & b_2 & b_1
\end{array} \tag{2.7} \]

with $a_{33} = b_3/2$, $a_{44} = b_2 - a_{22}$, and $a_{55} = b_1 - a_{11}$. If one of the $b_i$ vanishes,
then the corresponding stage does not influence the numerical result. This stage can
therefore be suppressed, so that the method is equivalent to one with fewer stages.
Our next result shows that methods (2.7) can be interpreted as the composition of
$\theta$-methods, which are defined as

\[ \Phi_h^{\theta}(y_0) = y_1, \quad\text{where}\quad y_1 = y_0 + h f\big((1 - \theta) y_0 + \theta y_1\big). \tag{2.8} \]

Observe that the adjoint of the $\theta$-method is $\Phi_h^{\theta *} = \Phi_h^{1-\theta}$.

² For irreducible Runge-Kutta methods, the condition (2.4) is also necessary for symmetry
(after a suitable permutation of the stages).

Theorem 2.4. A diagonally implicit Runge-Kutta method satisfying the symmetry
condition (2.4) and $b_i \neq 0$ is equivalent to a composition of $\theta$-methods

\[ \Phi_{b_1 h}^{\alpha_1 *} \circ \Phi_{b_2 h}^{\alpha_2 *} \circ \dots \circ \Phi_{b_2 h}^{\alpha_2} \circ \Phi_{b_1 h}^{\alpha_1}, \tag{2.9} \]

where $\alpha_i = a_{ii}/b_i$.

Proof. Since the $\theta$-method is a Runge-Kutta method with tableau

\[ \begin{array}{c|c} \theta & \theta \\ \hline & 1 \end{array} \]

this follows from the discussion in Sect. III.1.3. We have used

\[ \Phi_{b_{s+1-i} h}^{\alpha_{s+1-i}} = \Phi_{b_i h}^{1 - \alpha_i} = \Phi_{b_i h}^{\alpha_i *}, \]

which holds, because $b_{s+1-i} = b_i$ and $\alpha_{s+1-i} = 1 - \alpha_i$ by (2.6). □

A more detailed discussion of such methods is therefore postponed to Sect. V.3 
on symmetric composition methods. 

V.2.2 Partitioned Runge-Kutta Methods 

Applying partitioned Runge-Kutta methods (II.2.2) to general partitioned systems

\[ \dot y = f(y, z), \qquad \dot z = g(y, z), \tag{2.10} \]

it is obvious that for their symmetry both Runge-Kutta methods have to be symmetric
(because $\dot y = f(y)$ and $\dot z = g(z)$ are special cases of (2.10)). The proof of the
following result is identical to that of Theorem 2.3 and therefore omitted.

Theorem 2.5. If the coefficients of both Runge-Kutta methods $b_i, a_{ij}$ and $\hat b_i, \hat a_{ij}$
satisfy the condition (2.4), then the partitioned Runge-Kutta method (II.2.2) is
symmetric. □

As a consequence of this theorem we obtain that the Lobatto IIIA-IIIB pair (see
Sect. II.2.2) and, in particular, the Störmer-Verlet scheme are symmetric integrators.

An interesting feature of partitioned Runge-Kutta methods is the possibility of
having explicit, symmetric methods for problems of the form

\[ \dot y = f(z), \qquad \dot z = g(y). \tag{2.11} \]

Second order differential equations $\ddot y = g(y)$, written in the form $\dot y = z$, $\dot z = g(y)$,
have this structure, and also all Hamiltonian systems with separable Hamiltonian
$H(p, q) = T(p) + V(q)$. It is not possible to get explicit symmetric integrators with
non-partitioned Runge-Kutta methods (Exercise 2).

The Störmer-Verlet method (Table II.2.1) applied to (2.11) reads

\[ \begin{aligned}
z_{1/2} &= z_0 + \frac{h}{2}\, g(y_0) \\
y_1 &= y_0 + h f(z_{1/2}) \\
z_1 &= z_{1/2} + \frac{h}{2}\, g(y_1)
\end{aligned} \]

and is the composition $\Phi_{h/2}^* \circ \Phi_{h/2}$, where

\[ \Phi_h: \qquad y_1 = y_0 + h f(z_1), \qquad z_1 = z_0 + h g(y_0) \tag{2.12} \]

is the symplectic Euler method and

\[ \Phi_h^*: \qquad y_1 = y_0 + h f(z_0), \qquad z_1 = z_0 + h g(y_1) \tag{2.13} \]

its adjoint. All these methods are obviously explicit. How can they be extended to
higher order? The idea is to consider partitioned Runge-Kutta methods based on
diagonally implicit methods such as in (2.7). If $a_{ii} \cdot \hat a_{ii} = 0$, then one component
of the $i$th stage is given explicitly and, due to the special structure of (2.11), the
other component is also obtained in a straightforward manner. In order to achieve
$a_{ii} \cdot \hat a_{ii} = 0$ with a symmetric partitioned method, we have to assume that $s$, the
number of stages, is even.

Theorem 2.6. A partitioned Runge-Kutta method, based on two diagonally implicit
methods satisfying $a_{ii} \cdot \hat a_{ii} = 0$ and (2.4) with $b_i \neq 0$ and $\hat b_i \neq 0$, is equivalent
to a composition of $\Phi_{b_i h}$ and $\Phi_{b_i h}^*$ with $\Phi_h$ and $\Phi_h^*$ given by (2.12) and (2.13),
respectively. □

For example, the partitioned method

\[ \begin{array}{cccc}
0 \\
b_1 & b_2 \\
b_1 & b_2 & 0 \\
b_1 & b_2 & b_2 & b_1 \\ \hline
b_1 & b_2 & b_2 & b_1
\end{array}
\qquad\qquad
\begin{array}{cccc}
\hat b_1 \\
\hat b_1 & 0 \\
\hat b_1 & \hat b_2 & \hat b_2 \\
\hat b_1 & \hat b_2 & \hat b_2 & 0 \\ \hline
\hat b_1 & \hat b_2 & \hat b_2 & \hat b_1
\end{array} \]

satisfies the assumptions of the preceding theorem. Since the methods have identical
stages, the numerical result only depends on $\hat b_1$, $b_1 + b_2$, $\hat b_2 + \hat b_3$, $b_3 + b_4$, and
$\hat b_4$. Therefore, we can assume that $\hat b_i = b_i$ and the method is equivalent to the
composition $\Phi_{b_1 h}^* \circ \Phi_{b_2 h} \circ \Phi_{b_2 h}^* \circ \Phi_{b_1 h}$.
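The statement that the Störmer-Verlet step equals the composition of a symplectic Euler half step with its adjoint can be verified directly. The following sketch is an illustration under assumptions: $f(z) = z$ and $g(y) = -\sin y$ (the pendulum) are arbitrary choices, and the function names are hypothetical.

```python
import math

def f(z):
    return z                          # y' = f(z)

def g(y):
    return -math.sin(y)               # z' = g(y); pendulum force as an example

def sympl_euler(y, z, h):
    """Symplectic Euler: z1 = z0 + h g(y0), then y1 = y0 + h f(z1)."""
    z1 = z + h * g(y)
    return y + h * f(z1), z1

def sympl_euler_adj(y, z, h):
    """Adjoint variant: y1 = y0 + h f(z0), then z1 = z0 + h g(y1)."""
    y1 = y + h * f(z)
    return y1, z + h * g(y1)

def stoermer_verlet(y, z, h):
    z_half = z + h / 2 * g(y)
    y1 = y + h * f(z_half)
    return y1, z_half + h / 2 * g(y1)

y0, z0, h = 0.8, 0.5, 0.1
direct = stoermer_verlet(y0, z0, h)
# first the symplectic Euler half step, then the adjoint half step
composed = sympl_euler_adj(*sympl_euler(y0, z0, h / 2), h / 2)
gap = max(abs(direct[0] - composed[0]), abs(direct[1] - composed[1]))
print(direct, composed, gap)
```

The two results agree up to round-off because both half steps evaluate $f$ at the same intermediate value of $z$, which is exactly the special structure of (2.11).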


V.3 Symmetric Composition Methods 

In Sect. II.4 the idea of composition methods is introduced, and a systematic way
of obtaining high-order methods is outlined. These methods, based on (II.4.4) or on
(II.4.5), turn out to be symmetric, but they require too many stages. A theory of order
conditions for general composition methods is developed in Sect. III.3. Here, we
apply this theory to the construction of high-order symmetric methods. We mainly
follow two lines.

• Symmetric composition of first order methods.

\[ \Psi_h = \Phi_{\alpha_s h} \circ \Phi_{\beta_s h}^* \circ \dots \circ \Phi_{\beta_2 h}^* \circ \Phi_{\alpha_1 h} \circ \Phi_{\beta_1 h}^*, \tag{3.1} \]

where $\Phi_h$ is an arbitrary first order method. In order to make this method symmetric,
we assume $\alpha_s = \beta_1$, $\alpha_{s-1} = \beta_2$, etc.

• Symmetric composition of symmetric methods.

\[ \Psi_h = \Phi_{\gamma_s h} \circ \Phi_{\gamma_{s-1} h} \circ \dots \circ \Phi_{\gamma_2 h} \circ \Phi_{\gamma_1 h}, \tag{3.2} \]

where $\Phi_h$ is a symmetric second order method and $\gamma_s = \gamma_1$, $\gamma_{s-1} = \gamma_2$, etc.


V.3.1 Symmetric Composition of First Order Methods 

Because of Lemma 3.2 below, every method (3.2) is a special case of method (3.1). 
In this subsection we concentrate on methods that are of the form (3.1) but not of 
the form (3.2). 

For constructing methods (3.1) of a certain order, one has to solve the system 
of nonlinear equations given in Theorem III.3.14 (see also Example III.3.15). The 
symmetry assumption on the coefficients considerably simplifies this system. 

Theorem 3.1. If the coefficients of method (3.1) satisfy $\alpha_{s+1-i} = \beta_i$ for all $i$, then
it is sufficient to consider those trees with odd $\|\tau\|$.

Proof. This is a consequence of Theorem II.3.2 (the maximal order of symmetric 
methods is even). In fact, if the condition for order 1 is satisfied, it is automatically 
of order 2. If, in addition, the conditions for order 3 are satisfied, it is automatically 
of order 4, etc. □ 

It may come as a surprise that the popular leapfrog ... can be beaten, but 
only slightly. (R.I. McLachlan 1995) 

Methods of Order 2. The only remaining condition for order two is
$\sum_{k=1}^{s} (\alpha_k + \beta_k) = 1$, and, for $s = 1$, the symmetry requirement leads to
$\Phi_{h/2} \circ \Phi_{h/2}^*$. Depending on the choice of $\Phi_h$, this method is equivalent to the
midpoint rule, the trapezoidal rule, or the Störmer-Verlet scheme, all very famous and
frequently used. However, McLachlan (1995) discovered that the case $s = 2$ can be
slightly more advantageous. We obtain

\[ \Psi_h = \Phi_{\alpha h} \circ \Phi_{(1/2-\alpha)h}^* \circ \Phi_{(1/2-\alpha)h} \circ \Phi_{\alpha h}^*, \tag{3.3} \]

where $\alpha$ is a free parameter, which can serve for clever tuning.
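A composition of the form (3.3) is only a few lines of code once the basic method and its adjoint are available. The sketch below is an illustration under assumptions, not the experiment of the text: it uses the harmonic oscillator $\ddot y = -y$ instead of the Kepler problem, takes the symplectic Euler method as basic method, and estimates the order from two step sizes; all function names are hypothetical.

```python
import math

def g(y):
    return -y                         # harmonic oscillator: y' = z, z' = g(y)

def phi(y, z, h):
    """Basic first order method: symplectic Euler."""
    z1 = z + h * g(y)
    return y + h * z1, z1

def phi_adj(y, z, h):
    """Adjoint of the basic method."""
    y1 = y + h * z
    return y1, z + h * g(y1)

def psi(y, z, h, alpha):
    """One step of (3.3); the rightmost factor of the composition acts first."""
    beta = 0.5 - alpha
    y, z = phi_adj(y, z, alpha * h)
    y, z = phi(y, z, beta * h)
    y, z = phi_adj(y, z, beta * h)
    y, z = phi(y, z, alpha * h)
    return y, z

def global_error(h, alpha, t_end=1.0):
    n = round(t_end / h)
    y, z = 1.0, 0.0                   # exact solution: y = cos t, z = -sin t
    for _ in range(n):
        y, z = psi(y, z, h, alpha)
    return math.hypot(y - math.cos(t_end), z + math.sin(t_end))

alpha = 0.1932                        # McLachlan's value quoted in the text
ratio = global_error(0.02, alpha) / global_error(0.01, alpha)

y1, z1 = psi(0.3, -0.7, 0.1, alpha)   # forward step, then backward step
yb, zb = psi(y1, z1, -0.1, alpha)
symmetry_defect = max(abs(yb - 0.3), abs(zb + 0.7))
print(ratio, symmetry_defect)
```

Halving the step size reduces the error by a factor close to 4, as expected for order 2, and the forward-backward test confirms the symmetry of the composition for any value of `alpha`.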

Fig. 3.1. The error functions $|q_i(\alpha)|$ defined in (3.5) (left picture). Work-precision diagrams
for the Kepler problem (as in Fig. II.4.4) and for method (3.3) with $\alpha = 0.25$ (Störmer-Verlet),
$\alpha = 0.1932$ (McLachlan), and $\alpha = 0.22$. “IE”: method $\Phi_h$ treats position by implicit
Euler, velocity by explicit Euler; “EI”: method $\Phi_h$ treats position by explicit Euler, velocity
by implicit Euler

Minimizing the Local Error of Composition Methods. Subtracting the $B_\infty$-series
of the numerical and the exact solutions (see Sect. III.3.2), we obtain

\[ \Psi_h(y) - \varphi_h(y) = h^{p+1} \sum_{\|\tau\| = p+1} \frac{1}{\sigma(\tau)} \big( a(\tau) - e(\tau) \big) F(\tau)(y) + O(h^{p+2}). \]

Assuming that the basic method has an expansion $\Phi_h(y) = y + h f(y) + h^2 d_2(y) +
h^3 d_3(y) + \dots$, we obtain for method (3.3), similar to (III.3.3), the local error

\[ h^3 \Big( q_1(\alpha)\, d_3(y) + q_2(\alpha)\, (d_2' f)(y) + q_3(\alpha)\, (f' d_2)(y)
+ \frac12 q_4(\alpha)\, \big(f''(f, f)\big)(y) + q_5(\alpha)\, (f' f' f)(y) \Big) + O(h^4), \]

which contains one term for each of the 5 trees $\tau \in T_\infty$ with $\|\tau\| = 3$. The $q_i(\alpha)$
are the polynomials

\[ \begin{aligned}
q_1(\alpha) &= \frac14 (1 - 6\alpha + 12\alpha^2), & q_2(\alpha) &= \frac14 (-1 + 6\alpha - 8\alpha^2), \\
q_3(\alpha) &= -\alpha^2, & q_4(\alpha) &= \frac16 (1 - 6\alpha + 6\alpha^2), & q_5(\alpha) &= \frac13\, q_1(\alpha),
\end{aligned} \tag{3.5} \]

which are plotted in the left picture of Fig. 3.1. If we allow arbitrary basic methods
and arbitrary problems, all elementary differentials in the local error are independent,
and there is no overall optimal value for $\alpha$. We see that the moduli of $q_1(\alpha)$
and $q_2(\alpha)$ are minimal for $\alpha = 1/4$, which is precisely the value corresponding to a
double application of $\Phi_{h/2} \circ \Phi_{h/2}^*$ with halved step size. But the values $|q_3(\alpha)|$ and
$|q_4(\alpha)|$ become smaller with decreasing $\alpha$ (close to $\alpha = 1/4$). McLachlan (1995)
therefore minimizes some norm of the error (see Exercise 4) and arrives at the value
$\alpha = 0.1932$.

In the numerical experiment of Fig. 3.1 we apply method (3.3) with three different
values of $\alpha$ to the Kepler problem (with data as in Fig. II.4.4 and the symplectic
Euler method for $\Phi_h$). Once we treat the position variable by the implicit Euler
method and the velocity variable by the explicit Euler method (central picture), and
once the other way round (right picture). We notice that the method which is best in
one case is worst in the other.

This simple experiment shows that choosing the free parameters of the method
by minimizing some arbitrary measure of the error coefficients is problematic. For
higher order methods there are many more expressions in the dominating term of
the local error (for example: 29 terms for $\|\tau\| = 5$). The corresponding functions $q_i$
give a lot of information on the local error, and they indicate the region of parameters
that produce good methods. But, unless more information is known about the
problem (second order differential equation, nearly integrable systems), one usually
minimizes, for orders of 8 or 10, just the maximal values of the $\alpha_i$, $\beta_i$, or $\gamma_i$ (Kahan
& Li 1997).

Methods of Order 4. Theorem 3.1 and Example III.3.15 give 3 conditions for
order 4. Therefore, we put $s = 3$ in (3.1) and assume symmetry $\beta_1 = \alpha_3$, $\beta_2 = \alpha_2$,
and $\beta_3 = \alpha_1$. This leads to the conditions

\[ \alpha_1 + \alpha_2 + \alpha_3 = \frac12, \qquad \alpha_1^3 + \alpha_2^3 + \alpha_3^3 = 0, \qquad (\alpha_3^2 - \alpha_1^2)(\alpha_1 + \alpha_2) = 0. \]

Since with $\alpha_1 + \alpha_2 = 0$ or with $\alpha_1 + \alpha_3 = 0$ the first two of these equations are not
compatible, the unique solution of this system is

\[ \alpha_1 = \alpha_3 = \frac{1}{2(2 - 2^{1/3})}, \qquad \alpha_2 = -\frac{2^{1/3}}{2(2 - 2^{1/3})}. \]

We observe that $\beta_i = \alpha_i$ for all $i$. Therefore, $\Phi_{\alpha_i h} \circ \Phi_{\beta_i h}^*$ can be grouped together in
(3.1) and we have obtained a method of type (3.2), which is actually method (II.4.4)
with $p = 2$.
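The three conditions and their solution can be confirmed by direct substitution. This is a small illustrative check with hypothetical variable names; the step sizes of the grouped symmetric method are $\gamma_i = \alpha_i + \beta_i = 2\alpha_i$.

```python
cbrt2 = 2.0 ** (1.0 / 3.0)
a1 = 1.0 / (2.0 * (2.0 - cbrt2))
a2 = -cbrt2 / (2.0 * (2.0 - cbrt2))
a3 = a1

residuals = (a1 + a2 + a3 - 0.5,
             a1**3 + a2**3 + a3**3,
             (a3**2 - a1**2) * (a1 + a2))

# grouping Phi_{alpha_i h} o Phi*_{beta_i h} with beta_i = alpha_i gives steps 2*alpha_i
gammas = (2 * a1, 2 * a2, 2 * a3)
print(residuals, gammas)
```

The middle step is negative and the outer steps exceed the full step size, which is typical for compositions of order higher than two with real coefficients.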

Again, the solutions with the minimal number of stages do not give the best
methods (remember the good performance of Suzuki’s fourth order method (II.4.5)
in Fig. II.4.4), so we look for 4th order methods with larger $s$. McLachlan (1995)
has constructed a method for $s = 5$ with particularly small error terms and nice
coefficients

\[ \begin{aligned}
\beta_1 = \alpha_5 &= \frac{14 - \sqrt{19}}{108}, & \alpha_1 = \beta_5 &= \frac{146 + 5\sqrt{19}}{540}, \\
\beta_2 = \alpha_4 &= \frac{-23 - 20\sqrt{19}}{270}, & \alpha_2 = \beta_4 &= \frac{-2 + 10\sqrt{19}}{135}, \\
\beta_3 = \alpha_3 &= \frac15,
\end{aligned} \tag{3.6} \]

which he recommends “for all uses”.
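Two necessary conditions on the coefficients (3.6) are quick to check. The sketch below is an illustration, with hypothetical variable names: it tests the consistency condition $\sum_k (\alpha_k + \beta_k) = 1$ and the vanishing of the sum of cubes of all substep sizes. The latter is used here on the assumption that each factor $\Phi_{ah}$ or $\Phi_{ah}^*$ contributes $a^3 d_3$ to the $h^3$ term of the composition, so that the coefficient of $d_3$ in the local error is this sum.

```python
import math

r = math.sqrt(19.0)
alpha = [(146 + 5 * r) / 540, (-2 + 10 * r) / 135, 1 / 5,
         (-23 - 20 * r) / 270, (14 - r) / 108]
beta = list(reversed(alpha))          # symmetry assumption alpha_{s+1-i} = beta_i

total = sum(alpha) + sum(beta)                       # should equal 1
cubes = sum(a**3 for a in alpha) + sum(b**3 for b in beta)   # should vanish
print(total, cubes)
```

These two checks do not prove order 4; they only guard against transcription errors in the coefficients.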

In Fig. 3.2 we compare the numerical performances of all these methods on our
already well-known example in both variants (implicit-explicit and vice-versa). We
see that the best methods in one picture may be worse in the other. For comparison,
the results are surrounded by “ghosts in grey” representing good formulae from the
next lower (order 2) and the next higher (order 6) class of methods.

Fig. 3.2. Work-precision diagrams for methods of order 4 as in Fig. 3.1; “3j”: the Triple Jump
(II.4.4); “su”: method (II.4.5) of Suzuki; “ml”: McLachlan (3.6); “bm”: method (3.7); in
grey: neighbouring order methods Störmer/Verlet (order 2) and p6 s9 (order 6)

Methods Tuned for Special Problems. In the case where one is applying a special
method to a special problem (e.g., to second order differential equations or to small
perturbations of integrable systems), more spectacular gains of efficiency are possible.
For example, Blanes & Moan (2002) have constructed the following fourth
order method with $s = 6$

\[ \begin{aligned}
\beta_1 = \alpha_6 &= 0.082984406417405, & \alpha_1 = \beta_6 &= 0.16231455076687, \\
\beta_2 = \alpha_5 &= 0.23399525073150, & \alpha_2 = \beta_5 &= 0.37087741497958, \\
\beta_3 = \alpha_4 &= -0.40993371990193, & \alpha_3 = \beta_4 &= 0.059762097006575,
\end{aligned} \tag{3.7} \]

which, when correctly applied to second order differential equations (right picture
of Fig. 3.2) exhibits excellent performance.

Further methods, adapted to the integration of second order differential equations,
have been constructed by Forest (1992), McLachlan & Atela (1992), Calvo
& Sanz-Serna (1993), Okunbor & Skeel (1994), and McLachlan (1995). Another
important situation, which allows a tuning of the parameters, is that of near-integrable
systems such as the perturbed two-body motion (e.g., the outer solar system considered
in Chap. I). If the differential equation can be split into $\dot y = f^{[1]}(y) + f^{[2]}(y)$,
where $\dot y = f^{[1]}(y)$ is exactly integrable and $f^{[2]}(y)$ is small compared to $f^{[1]}(y)$,
special integrators should be used. We refer to Kinoshita, Yoshida & Nakai (1991),
Wisdom & Holman (1991), Saha & Tremaine (1992), and McLachlan (1995b) for
more details and for the parameters of such integrators.

Methods of Order 6. By Theorem 3.1 and Example III.3.12 a method (3.1) has to
satisfy 9 conditions for order 6. It turns out that these order conditions already have
a solution with $s = 7$, but all known solutions with $s < 8$ are equivalent to methods
of type (3.2). With order 6 we are apparently close to the point where the enormous
simplifications of the order conditions due to Theorem 3.3 below start to outperform
the freedom of choosing different values for $\alpha_i$ and $\beta_i$. We therefore continue our
discussion by considering only the special case (3.2).

V.3.2 Symmetric Composition of Symmetric Methods

The introduction of more symmetries into the method simplifies considerably the order
conditions. These simplifications can be best understood with a sort of “Choleski
decomposition” of symmetric methods (Murua & Sanz-Serna 1999).

Lemma 3.2. For every symmetric method $\Phi_h(y)$ that admits an expansion in powers
of $h$, there exists $\hat\Phi_h(y)$ such that

\[ \Phi_h(y) = \big( \hat\Phi_{h/2} \circ \hat\Phi_{h/2}^* \big)(y). \]

Proof. Since $\Phi_h(y) = y + O(h)$ is close to the identity, the existence of a unique
method $\hat\Phi_h(y) = y + h d_1(y) + h^2 d_2(y) + \dots$ satisfying $\Phi_h = \hat\Phi_{h/2} \circ \hat\Phi_{h/2}$ follows
from Taylor expansion and from a comparison of like powers of $h$.

If $\Phi_h(y)$ is symmetric, we have in addition

\[ \Phi_h = \Phi_{-h}^{-1} = \hat\Phi_{h/2}^* \circ \hat\Phi_{h/2}^*, \]

and $\hat\Phi_{h/2} = \hat\Phi_{-h/2}^{-1} = \hat\Phi_{h/2}^*$ follows from the uniqueness of $\hat\Phi_h$. □

We let $\Phi_h$ be a symmetric method, and we consider the composition

\[ \Psi_h = \Phi_{\gamma_s h} \circ \dots \circ \Phi_{\gamma_2 h} \circ \Phi_{\gamma_1 h}. \tag{3.8} \]

Using the method $\hat\Phi_h$ of Lemma 3.2, this composition method is equivalent to (3.1)
($\Phi_h$ replaced with $\hat\Phi_h$) with

\[ \alpha_k = \beta_k = \frac{\gamma_k}{2}. \tag{3.9} \]

Theorem 3.3. For composition methods (3.8) with symmetric $\Phi_h$ it is sufficient to
consider the order conditions of Theorem III.3.14 for $\tau \in H$ where all vertices of $\tau$
have odd indices.

Proof. If $i(\tau)$ is even, it follows from $\alpha_k = \beta_k$ and from (III.3.11) that

\[ a_s(\tau) = a_{s-1}(\tau) = \dots = a_1(\tau) = a_0(\tau) = 0. \]

Since $e(\tau) = 0$ for such trees, the corresponding order condition is automatically
satisfied. Any other vertex with an even index can be brought to the root by applying
the Switching Lemma III.3.8. □

After this reduction, only 7 conditions survive for order 6 from the trees displayed
in Example III.3.12. A further reduction in the number of order conditions is
achieved by assuming symmetric coefficients in method (3.8), i.e.,

\[ \gamma_{s+1-j} = \gamma_j \quad\text{for all } j. \tag{3.10} \]

This implies that the overall method $\Psi_h$ is symmetric, so that the order conditions
for trees with an even $\|\tau\|$ need not be considered. This proves the following result.

Theorem 3.4. For composition methods (3.8) with symmetric $\Phi_h$, satisfying (3.10),
it is sufficient to consider the order conditions for $\tau \in H$ where all vertices of $\tau$
have odd indices and where $\|\tau\|$ is odd. □

Figure 3.3 shows the remaining order conditions for methods up to order 10. We 
see that for order 6 there remain only 4 conditions, much less than the 166 that we 
started with (Theorem III.3.6). 

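Example 3.5. The rule of (III.3.14) leads to the following conditions for symmetric
composition of symmetric methods:

Order 2: \[ \sum_{k=1}^{s} \gamma_k = 1 \]

Order 4: \[ \sum_{k=1}^{s} \gamma_k^3 = 0 \]

Order 6: \[ \sum_{k=1}^{s} \gamma_k^5 = 0, \qquad \sum_{k=1}^{s} \gamma_k^3 \Big( {\sum_{\ell=1}^{k}}' \gamma_\ell \Big)^2 = 0 \]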

Order 8: 

© 

EM* 

II 

O 

t 

s k 9 

=» 

k= 1 t= 1 


¥ 

s k k 

= 0 

k=l 1=1 m= 1 


k=1 t= 1 


Here, similar to Example III.3.15, a prime attached to a summation symbol indicates
that the last term is taken as $\gamma_k/2$.


Methods of Order 4. The methods (II.4.4) and (II.4.5) are both of the form (3.8),
and those with $p = 2$ yield methods of order 4. We have seen in the experiment
of Fig. II.4.4 that the method (II.4.5) yields more precise approximations; see also
Fig. 3.2. We do not know of any 4th order method of type (3.2) that is significantly
better than method (3.1) with coefficients (3.6).

Methods of Order 6. If we search for a minimal stage solution of the four
equations for order 6, we apparently need four free parameters $\gamma_1, \gamma_2, \gamma_3, \gamma_4$; then
$\gamma_5, \gamma_6, \gamma_7$ are determined by symmetry. The condition for order 2 gives
$\gamma_4 = 1 - 2(\gamma_1 + \gamma_2 + \gamma_3)$. So we end up with three equations for the three unknowns
$\gamma_1, \gamma_2, \gamma_3$. A numerical search for this problem produces three solutions, the best of
which has been discovered by many authors, in particular by Yoshida (1990), and is
as follows:

$\gamma_1 = \gamma_7 = 0.78451361047755726381949763$
$\gamma_2 = \gamma_6 = 0.23557321335935813368479318$
$\gamma_3 = \gamma_5 = -1.17767998417887100694641568$
$\gamma_4 = 1.31518632068391121888424973$

p6 s7    (3.11)

Using computer algebra, Koseleff (1996) proves that the nonlinear system for
$\gamma_1, \gamma_2, \gamma_3$ has not more than three real solutions.

Similar to the situation for order 4, where relaxing the minimal number of stages
allowed a significant increase of performance, we also might expect to obtain better
methods of order 6 in this way. McLachlan (1995) increases $s$ by two and constructs
good methods with small error coefficients. By minimizing $\max_i |\gamma_i|$, Kahan & Li
(1997) obtain the following excellent method³

$\gamma_1 = \gamma_9 = 0.39216144400731413927925056$
$\gamma_2 = \gamma_8 = 0.33259913678935943859974864$
$\gamma_3 = \gamma_7 = -0.70624617255763935980996482$
$\gamma_4 = \gamma_6 = 0.08221359629355080023149045$
$\gamma_5 = 0.79854399093482996339895035$

p6 s9    (3.12)


This method produces, with a comparable number of total steps, errors which are
typically smaller than those of method (3.11). Numerical results of these two
methods are given in Fig. 3.4.
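
As an illustration of how such a coefficient set is used, here is a small sketch that composes Störmer-Verlet steps with the coefficients of (3.12). The test problem (the harmonic oscillator $H = (p^2+q^2)/2$), the function names and the step sizes are our own choices and are not the Kepler experiment of Fig. 3.4. Halving the step size should reduce the global error by a factor of roughly $2^6$.

```python
import math

# gamma_1, ..., gamma_5 of method p6 s9 in (3.12); the remaining
# coefficients follow from the symmetry gamma_{10-i} = gamma_i.
_HALF = [
    0.39216144400731413927925056,
    0.33259913678935943859974864,
    -0.70624617255763935980996482,
    0.08221359629355080023149045,
    0.79854399093482996339895035,
]
GAMMA_P6S9 = _HALF + _HALF[-2::-1]


def verlet(q, p, h):
    """One Stormer-Verlet step for the harmonic oscillator (our test problem)."""
    p -= 0.5 * h * q
    q += h * p
    p -= 0.5 * h * q
    return q, p


def composed_step(q, p, h, gamma=GAMMA_P6S9):
    """Composition of basic steps with step sizes gamma_1*h, ..., gamma_s*h."""
    for g in gamma:
        q, p = verlet(q, p, g * h)
    return q, p


def global_error(h, t_end=10.0):
    """Error at t_end against the exact solution q = cos t, p = -sin t."""
    n = round(t_end / h)
    q, p = 1.0, 0.0
    for _ in range(n):
        q, p = composed_step(q, p, h)
    return math.hypot(q - math.cos(t_end), p + math.sin(t_end))


if __name__ == "__main__":
    e1, e2 = global_error(0.2), global_error(0.1)
    print(e1, e2, math.log2(e1 / e2))  # last value: observed order
```

Any other symmetric second-order method could take the place of `verlet`; the composition loop does not depend on the problem.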



Fig. 3.4. Work-precision diagrams for methods of order 6 for the Kepler problem as in
Fig. 3.1; “7”: method p6 s7 of (3.11); “9”: method p6 s9 of (3.12); in grey: neighbouring
order methods (3.6) (order 4) and p8 s17 (order 8)

3 The authors are grateful to S. Blanes for this reference. 
Methods of Order 8. For order 8, Fig. 3.3 represents 8 equations to solve. This
indicates that the minimal value of s is 15. A numerical search for solutions
$\gamma_1, \ldots, \gamma_8$ of these equations produces hundreds of solutions. We choose among
all these the solution with the smallest $\max_i |\gamma_i|$. The coefficients, which were
originally given by Suzuki & Umeno (1993), Suzuki (1994), and later by
McLachlan (1995), are as follows:

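$$
\begin{aligned}
\gamma_1 = \gamma_{15} &= \phantom{-}0.74167036435061295344822780 \\
\gamma_2 = \gamma_{14} &= -0.40910082580003159399730010 \\
\gamma_3 = \gamma_{13} &= \phantom{-}0.19075471029623837995387626 \\
\gamma_4 = \gamma_{12} &= -0.57386247111608226665638773 \\
\gamma_5 = \gamma_{11} &= \phantom{-}0.29906418130365592384446354 \\
\gamma_6 = \gamma_{10} &= \phantom{-}0.33462491824529818378495798 \\
\gamma_7 = \gamma_9 &= \phantom{-}0.31529309239676659663205666 \\
\gamma_8 &= -0.79688793935291635401978884
\end{aligned}
\qquad \text{p8 s15} \tag{3.13}
$$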

By putting s = 17 we obtain one degree of freedom in solving the equations. This 
allows an improvement on the foregoing method. The best known solution, slightly 
better than a method of McLachlan (1995), has been found by Kahan & Li (1997) 
and is given by 

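$$
\begin{aligned}
\gamma_1 = \gamma_{17} &= \phantom{-}0.13020248308889008087881763 \\
\gamma_2 = \gamma_{16} &= \phantom{-}0.56116298177510838456196441 \\
\gamma_3 = \gamma_{15} &= -0.38947496264484728640807860 \\
\gamma_4 = \gamma_{14} &= \phantom{-}0.15884190655515560089621075 \\
\gamma_5 = \gamma_{13} &= -0.39590389413323757733623154 \\
\gamma_6 = \gamma_{12} &= \phantom{-}0.18453964097831570709183254 \\
\gamma_7 = \gamma_{11} &= \phantom{-}0.25837438768632204729397911 \\
\gamma_8 = \gamma_{10} &= \phantom{-}0.29501172360931029887096624 \\
\gamma_9 &= -0.60550853383003451169892108
\end{aligned}
\qquad \text{p8 s17} \tag{3.14}
$$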

Numerical results, in the same style as above, are given in Fig. 3.5. 






Fig. 3.5. Work-precision diagrams for methods of order 8 for the Kepler problem as in
Fig. 3.1; “15”: method p8 s15 of (3.13); “17”: method p8 s17 of (3.14); in grey: neighbouring
order methods p6 s9 (order 6) and p10 s35 (order 10)
Methods of Order 10. The first methods of order 10 were given by Kahan & Li
(1997) with s = 31 and s = 33, which could be improved on after some nights of
computer search (see method (V.3.15) of the first edition). A significantly improved
method for s = 35 (see Fig. 3.5 for a comparison with eighth order methods) has in
the meantime been found by Sofroniou & Spaletta (2004):


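$$
\begin{aligned}
\gamma_1 = \gamma_{35} &= \phantom{-}0.07879572252168641926390768 \\
\gamma_2 = \gamma_{34} &= \phantom{-}0.31309610341510852776481247 \\
\gamma_3 = \gamma_{33} &= \phantom{-}0.02791838323507806610952027 \\
\gamma_4 = \gamma_{32} &= -0.22959284159390709415121340 \\
\gamma_5 = \gamma_{31} &= \phantom{-}0.13096206107716486317465686 \\
\gamma_6 = \gamma_{30} &= -0.26973340565451071434460973 \\
\gamma_7 = \gamma_{29} &= \phantom{-}0.07497334315589143566613711 \\
\gamma_8 = \gamma_{28} &= \phantom{-}0.11199342399981020488957508 \\
\gamma_9 = \gamma_{27} &= \phantom{-}0.36613344954622675119314812 \\
\gamma_{10} = \gamma_{26} &= -0.39910563013603589787862981 \\
\gamma_{11} = \gamma_{25} &= \phantom{-}0.10308739852747107731580277 \\
\gamma_{12} = \gamma_{24} &= \phantom{-}0.41143087395589023782070412 \\
\gamma_{13} = \gamma_{23} &= -0.00486636058313526176219566 \\
\gamma_{14} = \gamma_{22} &= -0.39203335370863990644808194 \\
\gamma_{15} = \gamma_{21} &= \phantom{-}0.05194250296244964703718290 \\
\gamma_{16} = \gamma_{20} &= \phantom{-}0.05066509075992449633587434 \\
\gamma_{17} = \gamma_{19} &= \phantom{-}0.04967437063972987905456880 \\
\gamma_{18} &= \phantom{-}0.04931773575959453791768001
\end{aligned}
\qquad \text{p10 s35} \tag{3.15}
$$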



V.3.3 Effective Order and Processing Methods 

There has recently been a revival of interest in the concept of “effective 
order”. (J.C. Butcher 1998) 

The concept of effective order was introduced by Butcher (1969) with the aim of
constructing 5th order explicit Runge-Kutta methods with 5 stages. The idea is to
search for a computationally efficient method $K_h$ such that with a suitable $\chi_h$,

$$\hat\Psi_h = \chi_h \circ K_h \circ \chi_h^{-1} \tag{3.16}$$

has an order higher than that of $K_h$. The method $K_h$ is called the kernel, and $\chi_h$ can
be interpreted as a transformation in the phase space, close to the identity. Because
of

$$\hat\Psi_h^N = \chi_h \circ K_h^N \circ \chi_h^{-1},$$

an implementation of $\hat\Psi_h$ over $N$ steps with constant step size $h$ has the same
computational efficiency as $K_h$. The computation of $\chi_h^{-1}$ has only to be done once at the
beginning of the integration, and $\chi_h$ has to be evaluated only at output points, which
can be performed on another processor. In the article López-Marcos, Sanz-Serna &
Skeel (1996) the notion of preprocessing for the step $\chi_h^{-1}$ and postprocessing for
$\chi_h$ is introduced.
Example 3.6 (Störmer-Verlet as Processed Symplectic Euler Method). Consider
a split differential equation, let $\Phi_h^{[LT]} = \varphi_h^{[2]} \circ \varphi_h^{[1]}$ be the Lie-Trotter formula
or symplectic Euler method (see Sect. II.5), and $\Phi_h^{[S]} = \varphi_{h/2}^{[1]} \circ \varphi_h^{[2]} \circ \varphi_{h/2}^{[1]}$ the Strang
splitting or Störmer-Verlet scheme. As a consequence of the group property of the
exact flow, we have

$$\Phi_h^{[S]} = \varphi_{h/2}^{[1]} \circ \Phi_h^{[LT]} \circ \varphi_{-h/2}^{[1]} = \chi_h \circ \Phi_h^{[LT]} \circ \chi_h^{-1}$$

with $\chi_h = \varphi_{h/2}^{[1]}$. Hence, applying the Lie-Trotter formula with processing yields a
second order approximation.
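
The identity of this example is easy to check numerically. The sketch below is our own illustration: it assumes the pendulum $H = p^2/2 - \cos q$, takes $\varphi^{[1]}$ to be the flow of the potential part (a "kick") and $\varphi^{[2]}$ the flow of the kinetic part (a "drift"), and applies the first flow first in the Lie-Trotter step. With preprocessing done once and postprocessing done at the output point, $N$ Lie-Trotter steps reproduce $N$ Störmer-Verlet steps up to rounding.

```python
import math


def kick(q, p, t):
    """Flow phi^[1]_t of q' = 0, p' = -sin(q) (assumed potential part)."""
    return q, p - t * math.sin(q)


def drift(q, p, t):
    """Flow phi^[2]_t of q' = p, p' = 0 (assumed kinetic part)."""
    return q + t * p, p


def lie_trotter(q, p, h):
    """Kernel: phi^[2]_h after phi^[1]_h (symplectic Euler)."""
    q, p = kick(q, p, h)
    return drift(q, p, h)


def strang(q, p, h):
    """Stormer-Verlet: phi^[1]_{h/2}, then phi^[2]_h, then phi^[1]_{h/2}."""
    q, p = kick(q, p, 0.5 * h)
    q, p = drift(q, p, h)
    return kick(q, p, 0.5 * h)


def processed_lie_trotter(q, p, h, n):
    """chi_h o (kernel)^n o chi_h^{-1} with chi_h = phi^[1]_{h/2}."""
    q, p = kick(q, p, -0.5 * h)      # preprocessing, done once
    for _ in range(n):
        q, p = lie_trotter(q, p, h)  # cheap first-order kernel
    return kick(q, p, 0.5 * h)       # postprocessing, only at output


def strang_n(q, p, h, n):
    for _ in range(n):
        q, p = strang(q, p, h)
    return q, p


if __name__ == "__main__":
    print(processed_lie_trotter(1.0, 0.0, 0.1, 50))
    print(strang_n(1.0, 0.0, 0.1, 50))
```

The processed variant needs one force evaluation per step plus two extra evaluations in total, which is the efficiency argument made for (3.16).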


Since the use of geometric integrators requires constant step sizes, it is quite
natural that Butcher’s idea of effective order has been revived in this context. A
systematic search for processed composition methods started with the works of
Wisdom, Holman & Touma (1996), McLachlan (1996), and Blanes, Casas & Ros (1999,
2000b).

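Let us explain the technique of processing in the situation where the kernel $K_h$
is a symmetric composition

$$K_h = \Phi_{\gamma_s h} \circ \ldots \circ \Phi_{\gamma_2 h} \circ \Phi_{\gamma_1 h} \qquad (\gamma_{s+1-i} = \gamma_i \text{ for all } i) \tag{3.17}$$

of a symmetric method $\Phi_h$. We suppose that the processor is of the form

$$\chi_h = \Phi_{\delta_r h} \circ \ldots \circ \Phi_{\delta_2 h} \circ \Phi_{\delta_1 h}, \tag{3.18}$$

such that its inverse is given by (use the symmetry $\Phi_h^{-1} = \Phi_{-h}$)

$$\chi_h^{-1} = \Phi_{-\delta_1 h} \circ \Phi_{-\delta_2 h} \circ \ldots \circ \Phi_{-\delta_r h}. \tag{3.19}$$

Order Conditions. The composite method $\hat\Psi_h = \chi_h \circ K_h \circ \chi_h^{-1}$ is of the form

$$\hat\Psi_h = \Phi_{\varepsilon_{2r+s} h} \circ \ldots \circ \Phi_{\varepsilon_2 h} \circ \Phi_{\varepsilon_1 h} \quad \text{with} \quad
(\varepsilon_{2r+s}, \ldots, \varepsilon_2, \varepsilon_1) = (\delta_r, \ldots, \delta_1, \gamma_s, \ldots, \gamma_1, -\delta_1, \ldots, -\delta_r). \tag{3.20}$$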


Theorem 3.3 thus tells us that only the order conditions corresponding to $\tau \in \mathcal{H}$,
whose vertices have odd indices, have to be considered. Unfortunately, the sequence
$\{\varepsilon_i\}$ of (3.20) does not satisfy the symmetry relation (3.10), unless all $\delta_i$ vanish.
However, if we require

$$\chi_{-h}(y) = \chi_h(y) + O(h^{p+1}), \tag{3.21}$$

we see that $\chi_{-h}^{-1}(y) = \chi_h^{-1}(y) + O(h^{p+1})$, and the method $\hat\Psi_h = \chi_h \circ K_h \circ \chi_h^{-1}$
is symmetric up to terms of order $O(h^{p+1})$. Consequently, the reduction of
Theorem 3.4 is valid, so that for order p only the trees of Fig. 3.3 have to be considered.
For the first tree of Example 3.5 the order condition is

$$1 = \sum_{k=1}^{2r+s} \varepsilon_k = \sum_{k=1}^{s} \gamma_k,$$

and we see that this is a condition on the kernel only. Similarly, for odd $i \ge 3$ we
have

$$0 = \sum_{k=1}^{2r+s} \varepsilon_k^i = \sum_{k=1}^{s} \gamma_k^i, \tag{3.22}$$

so that also the trees ③, ⑤, ⑦, ... give conditions on $\gamma_i$ and cannot be influenced
by the processor. We next consider the trees of Example 3.5 with three vertices,
whose order condition is

$$0 = \sum_{k=1}^{2r+s} \varepsilon_k^i \Bigl( {\sum_{\ell=1}^{k}}' \varepsilon_\ell^j \Bigr) \Bigl( {\sum_{m=1}^{k}}' \varepsilon_m^q \Bigr).$$

We split the sums according to the partitioning into $\delta_i$, $\gamma_i$, $-\delta_i$ in (3.20), and we
denote the expressions appearing in Example 3.5 by $a(\tau)$ and those corresponding
to $\chi_h$ and $\chi_h^{-1}$ by $b(\tau)$ and $b^{-1}(\tau)$, respectively. Using the abbreviations $\tau_i$ for the
tree with one vertex labelled $i$, $\tau_{ij}$ for the tree with two vertices labelled $i$ (the root)
and $j$, and $\tau_{ijq}$ for the trees with three vertices labelled $i$ (root), $j$ and $q$ (vertices that
are directly connected to the root), this yields

$$
\begin{aligned}
0 ={}& b^{-1}(\tau_{ijq}) + a(\tau_i)\,b^{-1}(\tau_j)\,b^{-1}(\tau_q) + a(\tau_{ij})\,b^{-1}(\tau_q) \\
&+ a(\tau_{iq})\,b^{-1}(\tau_j) + a(\tau_{ijq}) + b(\tau_i)\,b^{-1}(\tau_j)\,b^{-1}(\tau_q) \\
&+ b(\tau_i)\,b^{-1}(\tau_j)\,a(\tau_q) + b(\tau_i)\,a(\tau_j)\,b^{-1}(\tau_q) + b(\tau_i)\,a(\tau_j)\,a(\tau_q) \\
&+ b(\tau_{ij})\,b^{-1}(\tau_q) + b(\tau_{ij})\,a(\tau_q) + b(\tau_{iq})\,b^{-1}(\tau_j) + b(\tau_{iq})\,a(\tau_j) + b(\tau_{ijq}).
\end{aligned}
\tag{3.23}
$$

How can we simplify this long expression? First of all, we imagine $K_h$ to be the
identity (either $s = 0$ or all $\gamma_i = 0$), so that $\hat\Psi_h = \chi_h \circ \chi_h^{-1}$ becomes the identity. In
this situation, the terms involving $a(\tau)$ are not present in (3.23), and we obtain

$$0 = b^{-1}(\tau_{ijq}) + b(\tau_i)\,b^{-1}(\tau_j)\,b^{-1}(\tau_q) + b(\tau_{ij})\,b^{-1}(\tau_q) + b(\tau_{iq})\,b^{-1}(\tau_j) + b(\tau_{ijq}).$$

We can thus remove all terms in (3.23) that do not contain a factor $a(\tau)$. Now
observe that by (3.21), $\chi_h(y)$ as well as $\chi_h^{-1}(y)$ have an expansion in even powers of
$h$. Therefore, $b(\tau)$ and $b^{-1}(\tau)$ vanish for all $\tau$ with odd $\|\tau\|$. Formula (3.23) thus
simplifies considerably and yields

$$0 = a(\tau_{311}) + 2\,b(\tau_{31})\,a(\tau_1), \tag{3.24}$$

$$0 = a(\tau_{511}) + 2\,b(\tau_{51})\,a(\tau_1), \tag{3.25}$$

$$0 = a(\tau_{313}) + b(\tau_{31})\,a(\tau_3) + b(\tau_{33})\,a(\tau_1). \tag{3.26}$$

A similar computation for the last tree in Example 3.5 gives (in an obvious notation)

$$0 = a(\tau_{31111}) + 4\,b(\tau_{31})\,a(\tau_1)^3 + 4\,b(\tau_{3111})\,a(\tau_1). \tag{3.27}$$


Since $a(\tau_1) = \sum_i \gamma_i = 1$, the conditions (3.24), (3.25) and (3.27) can be
interpreted as conditions on the processor, namely on $b(\tau_{31})$, $b(\tau_{51})$ and $b(\tau_{3111})$. We
already have $a(\tau_3) = 0$ from (3.22), and an application of the Switching Lemma
III.3.8 gives $b(\tau_{33}) = \frac12\bigl(b(\tau_3)^2 - b(\tau_6)\bigr)$. The term $b(\tau_3)$ vanishes by (3.21) and
$b(\tau_6) = 0$ is a consequence of the proof of Theorem 3.3. Therefore (3.26) is
equivalent to $a(\tau_{313}) = 0$. We summarize our computation in the following theorem.

Theorem 3.7. The processing method $\hat\Psi_h = \chi_h \circ K_h \circ \chi_h^{-1}$ is of order p (p ≤ 8), if

• the coefficients $\gamma_i$ of the kernel satisfy the conditions of the left column in
Example 3.5, i.e., 3 conditions for order 6, and 5 conditions for order 8;

• the coefficients $\delta_i$ of the processor are such that (3.21) holds (4 conditions for
order 6, and 8 conditions for order 8), and in addition condition (3.24) for order
6, and (3.24), (3.25), (3.27) for order 8 are satisfied. □

Remark 3.8. Although we have presented the computations only for p ≤ 8, the
result is general. All trees $\tau \in \mathcal{H}$, which are not of the form $\tau = u \circ$ ①, give
rise to conditions on the kernel $K_h$ (for a similar result in the context of
Runge-Kutta methods see Butcher & Sanz-Serna (1996)). The remaining conditions have
to be satisfied by the coefficients of the processor. Due to the reduced number of
order conditions, it is relatively easy to construct high order kernels. However, the
difficulty in constructing a suitable processor increases rapidly with the order.

The application of the processing technique is two-fold. A first possibility is
to take one of the high-order composition methods of the form (3.2), e.g., one of
those presented in Sect. V.3.2, and to exploit the freedom in the coefficients of the
processor to make the error constants smaller.

Another possibility is to start from the beginning and to construct a method $K_h$
with coefficients satisfying only the conditions of Theorem 3.7. Methods of effective
order 6 and 8 have been constructed in this way by Blanes (2001).


V.4 Symmetric Methods on Manifolds 

Numerical methods for differential equations on manifolds have been introduced
in Sections IV.4 and IV.5. The presented algorithms are in general not symmetric.
We discuss here suitable symmetric modifications which often have an improved
long-time behaviour. We consider a differential equation

$$\dot y = f(y), \qquad f(y) \in T_y\mathcal{M} \tag{4.1}$$

on a manifold $\mathcal{M}$, and we assume that the manifold is either given as the zero set of
a function $g(y)$ or by means of a suitable parametrization $y = \psi(z)$.


V.4.1 Symmetric Projection 

Due to the projection at the end of an integration step, the standard projection
method (Algorithm IV.4.2) is not symmetric (see Fig. IV.4.2). In order to make the
overall algorithm symmetric, one has to apply a kind of “inverse projection” at the
beginning of each integration step. This idea was first used by Ascher & Reich
(1999) to enforce conservation of energy, and it has been applied in more general
contexts by Hairer (2000).

Algorithm 4.1 (Symmetric Projection Method). Assume that $y_n \in \mathcal{M}$. One step
$y_n \mapsto y_{n+1}$ is defined as follows (see Fig. 4.1, right picture):

• $\hat y_n = y_n + G(y_n)^T \mu$ where $g(y_n) = 0$ (perturbation step);

• $\hat y_{n+1} = \Phi_h(\hat y_n)$ (symmetric one-step method applied to $\dot y = f(y)$);

• $y_{n+1} = \hat y_{n+1} + G(y_{n+1})^T \mu$ with $\mu$ such that $g(y_{n+1}) = 0$ (projection step).

Here, $G(y) = g'(y)$ denotes the Jacobian of $g(y)$. It is important to take a
symmetric method in the second step, and the same vector $\mu$ in the perturbation and
projection steps.



Existence of the Numerical Solution. The vector $\mu$ and the numerical
approximation $y_{n+1}$ are implicitly defined by

$$F(h, y_{n+1}, \mu) = \begin{pmatrix} y_{n+1} - \Phi_h\bigl(y_n + G(y_n)^T \mu\bigr) - G(y_{n+1})^T \mu \\ g(y_{n+1}) \end{pmatrix} = 0. \tag{4.2}$$

Since $F(0, y_n, 0) = 0$ and since

$$\frac{\partial F}{\partial (y_{n+1}, \mu)}(0, y_n, 0) = \begin{pmatrix} I & -2\,G(y_n)^T \\ G(y_n) & 0 \end{pmatrix} \tag{4.3}$$

is invertible (provided that $G(y_n)$ has full rank), an application of the implicit
function theorem proves the existence of the numerical solution for sufficiently small
step size $h$. The simple structure of the matrix (4.3) can also be exploited for an
efficient solution of the nonlinear system (4.2) using simplified Newton iterations.
If the basic method $\Phi_h$ is itself implicit, the nonlinear system (4.2) should be solved
in tandem with $\hat y_{n+1} = \Phi_h(\hat y_n)$.
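
A minimal sketch of one step of the symmetric projection method, solving the system for $y_{n+1}$ and $\mu$ by simplified Newton iterations with the fixed matrix (4.3). The setting is our own choice for illustration: the rigid body equations on the unit sphere with $I_1 = 2$, $I_2 = 1$, $I_3 = 2/3$, and the trapezoidal rule as symmetric basic method, solved by fixed-point iteration inside the Newton loop rather than in tandem. All function names, tolerances and the step size are assumptions.

```python
import numpy as np

I1, I2, I3 = 2.0, 1.0, 2.0 / 3.0  # moments of inertia (assumed test data)


def f(y):
    """Euler equations of the free rigid body."""
    a1 = (I2 - I3) / (I2 * I3)
    a2 = (I3 - I1) / (I3 * I1)
    a3 = (I1 - I2) / (I1 * I2)
    return np.array([a1 * y[1] * y[2], a2 * y[2] * y[0], a3 * y[0] * y[1]])


def g(y):
    """Constraint defining the manifold: the unit sphere."""
    return np.array([y @ y - 1.0])


def G(y):
    """Jacobian g'(y), a 1 x 3 matrix."""
    return 2.0 * y.reshape(1, -1)


def trapezoidal(y, h, tol=1e-15, maxit=200):
    """Symmetric basic method Phi_h, solved by fixed-point iteration."""
    fy = f(y)
    z = y + h * fy
    for _ in range(maxit):
        z_new = y + 0.5 * h * (fy + f(z))
        if np.linalg.norm(z_new - z) <= tol:
            return z_new
        z = z_new
    return z


def symmetric_projection_step(y, h, tol=1e-14, maxit=100):
    """One step y_n -> y_{n+1}: perturb, integrate, project with the same mu."""
    d = y.size
    Gn = G(y)
    m = Gn.shape[0]
    # Fixed iteration matrix (4.3), evaluated at (0, y_n, 0).
    J = np.block([[np.eye(d), -2.0 * Gn.T], [Gn, np.zeros((m, m))]])
    y_new = trapezoidal(y, h)  # starting guess with mu = 0
    mu = np.zeros(m)
    for _ in range(maxit):
        F = np.concatenate([
            y_new - trapezoidal(y + Gn.T @ mu, h) - G(y_new).T @ mu,
            g(y_new),
        ])
        if np.linalg.norm(F) <= tol:
            break
        delta = np.linalg.solve(J, -F)
        y_new = y_new + delta[:d]
        mu = mu + delta[d:]
    return y_new


if __name__ == "__main__":
    y0 = np.array([np.cos(0.9), 0.0, np.sin(0.9)])
    y1 = symmetric_projection_step(y0, 0.1)
    print(y1, g(y1))
    print(symmetric_projection_step(y1, -0.1) - y0)  # symmetry check
```

The iteration converges only linearly, with a rate proportional to $h$, so this sketch is meant for moderate step sizes.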

Order. For a study of the local error we let $y_n := y(t_n)$ be a value on the exact
solution $y(t)$ of (4.1). If the basic method $\Phi_h$ is of order p, i.e., if
$y(t_n + h) - \Phi_h(y(t_n)) = O(h^{p+1})$, we have $F(h, y(t_{n+1}), 0) = O(h^{p+1})$. Compared to (4.2)
the implicit function theorem yields

$$y_{n+1} - y(t_{n+1}) = O(h^{p+1}) \quad \text{and} \quad \mu = O(h^{p+1}).$$

This proves that the symmetric projection method of Algorithm 4.1 has the same
order as the underlying one-step method $\Phi_h$.

Symmetry of the Algorithm. Exchanging $h \leftrightarrow -h$ and $y_n \leftrightarrow y_{n+1}$ in
Algorithm 4.1 yields

$$
\begin{aligned}
\hat y_n &= y_{n+1} + G(y_{n+1})^T \mu, \qquad g(y_{n+1}) = 0, \\
\hat y_{n+1} &= \Phi_{-h}(\hat y_n), \\
y_n &= \hat y_{n+1} + G(y_n)^T \mu, \qquad g(y_n) = 0.
\end{aligned}
$$

The auxiliary variables $\mu$, $\hat y_n$, and $\hat y_{n+1}$ can be arbitrarily renamed. If we replace
them with $-\mu$, $\hat y_{n+1}$, and $\hat y_n$, respectively, we get the formulas of the original
algorithm provided that the method $\Phi_h$ of the intermediate step is symmetric. This
proves the symmetry of the algorithm.

Various modifications of the perturbation and projection steps are possible
without destroying the symmetry. For example, one can replace the arguments $y_n$ and
$y_{n+1}$ in $G(y)$ with $(y_n + y_{n+1})/2$. It might be advantageous to use a constant
direction, i.e., $\hat y_n = y_n + A^T \mu$, $y_{n+1} = \hat y_{n+1} + A^T \mu$ with a constant matrix $A$. In this
case the matrix $G(y)A^T$ has to be invertible along the solution in order to guarantee
the existence of the numerical solution.

Reversibility. From Theorem 1.5 we know that symmetry alone does not imply
the $\rho$-reversibility of the numerical flow. The method must also satisfy the
compatibility condition (1.4). It is straightforward to check that this condition is satisfied
if the integrator $\Phi_h$ of the intermediate step of Algorithm 4.1 satisfies (1.4) and, in
addition,

$$\rho\,G(y)^T = G(\rho y)^T \sigma \tag{4.4}$$

holds with some constant invertible matrix $\sigma$. In many interesting situations we
have $g(\rho y) = \sigma^{-T} g(y)$ with a suitable $\sigma$, so that (4.4) follows by differentiation
if $\rho\rho^T = I$. Similarly, when a projection with constant direction $\hat y = y + A^T \mu$ is
applied, the matrix $A$ has to satisfy $\rho A^T = A^T \sigma$ for a suitably chosen invertible
matrix $\sigma$ (see the experiment of Example 4.4 below).
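
The differentiation step behind condition (4.4) can be spelled out as follows. This is a short worked derivation added for convenience; it writes $\rho$ for the reversing linear map and $\sigma$ for the constant invertible matrix, and uses only the two assumptions $g(\rho y) = \sigma^{-T} g(y)$ and $\rho\rho^T = I$ stated in the text.

```latex
\begin{align*}
g(\rho y) &= \sigma^{-T} g(y)
  && \text{(assumption)} \\
G(\rho y)\,\rho &= \sigma^{-T} G(y)
  && \text{(differentiate with respect to } y\text{)} \\
\rho^{T} G(\rho y)^{T} &= G(y)^{T} \sigma^{-1}
  && \text{(transpose)} \\
G(\rho y)^{T} \sigma &= \rho\, G(y)^{T}
  && \text{(multiply by } \rho \text{ from the left, by } \sigma \text{ from the right, use } \rho\rho^{T} = I\text{)}
\end{align*}
```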

Example 4.2. Let us consider the equations of motion of a rigid body as described
in Example IV.1.7. They constitute a differential equation on the manifold

$$\mathcal{M} = \{(y_1, y_2, y_3) \mid y_1^2 + y_2^2 + y_3^2 = 1\},$$

and it is $\rho$-reversible with respect to $\rho(y_1, y_2, y_3) = (-y_1, y_2, y_3)$, and also with
respect to $\rho(y_1, y_2, y_3) = (y_1, -y_2, y_3)$ and $\rho(y_1, y_2, y_3) = (y_1, y_2, -y_3)$. For a
numerical simulation we take $I_1 = 2$, $I_2 = 1$, $I_3 = 2/3$, and the initial value
$y_0 = (\cos(0.9), 0, \sin(0.9))$. We apply the trapezoidal rule (II.1.2) with the large
step size $h = 1$ in three different versions.

The upper picture of Fig. 4.2 shows the result of a direct application of the
trapezoidal rule. The numerical solution lies apparently on a closed curve, but it does not
lie exactly on the manifold $\mathcal{M}$. This can be seen as follows: the trapezoidal rule $\Phi_h^T$
is conjugate to the implicit midpoint rule $\Phi_h^M$ via a half-step of the explicit Euler
method $\chi_{h/2}$. In fact the relations

$$\Phi_h^T = \chi_{h/2}^* \circ \chi_{h/2} \quad \text{and} \quad \Phi_h^M = \chi_{h/2} \circ \chi_{h/2}^*$$

hold, so that

$$\Phi_h^T = \chi_{h/2}^{-1} \circ \Phi_h^M \circ \chi_{h/2} \quad \text{and} \quad (\Phi_h^T)^N = \chi_{h/2}^{-1} \circ (\Phi_h^M)^N \circ \chi_{h/2}.$$

Consequently, the trajectory of the trapezoidal rule is obtained from the trajectory
of the midpoint rule by a simple change of coordinates. On the other hand, the
numerical solution of the midpoint rule lies exactly on a solution curve because it
conserves quadratic invariants (Theorem IV.2.1).

Using standard orthogonal projection (Algorithm IV.4.2) we obviously obtain a
numerical solution lying on the manifold $\mathcal{M}$. But as we can see from the lower left
picture of Fig. 4.2, it does not remain near a closed curve and converges to a fixed
point. The lower right picture shows that the use of the symmetric orthogonal
projection (Algorithm 4.1) recovers the property of remaining near the closed solution
curve.

Fig. 4.2. Numerical simulation of the rigid body equations. The three pictures correspond
to a direct application (upper), to the standard projection (lower left), and to the symmetric
projection (lower right) of the trapezoidal rule; 5000 steps with step size h = 1
Example 4.3 (Numerical Experiment with Constant Direction of Projection).
We consider the pendulum equation in Cartesian coordinates (see Example IV.4.1),

$$
\begin{aligned}
\dot q_1 &= p_1, & \dot p_1 &= -q_1 \lambda, \\
\dot q_2 &= p_2, & \dot p_2 &= -1 - q_2 \lambda,
\end{aligned}
\tag{4.5}
$$

with $\lambda = (p_1^2 + p_2^2 - q_2)/(q_1^2 + q_2^2)$. This is a problem on the manifold

$$\mathcal{M} = \{(q_1, q_2, p_1, p_2) \mid q_1^2 + q_2^2 = 1,\ q_1 p_1 + q_2 p_2 = 0\}.$$

It is $\rho$-reversible with respect to $\rho(q_1, q_2, p_1, p_2) = (q_1, q_2, -p_1, -p_2)$ and also with
respect to $\rho(q_1, q_2, p_1, p_2) = (-q_1, q_2, p_1, -p_2)$.

We apply two kinds of symmetric projection methods. First, we consider an
orthogonal projection onto $\mathcal{M}$ as in Algorithm 4.1. Second, we project
parallel to coordinate axes. More precisely, we fix the first components in position
and velocity if the angle of the pendulum is close to 0 or $\pi$ (vertical
projection in the picture to the right), and we fix the second components if the
angle is close to $\pm\pi/2$ (horizontal projection). The regions where the
direction of projection changes are overlapping.

We notice in Fig. 4.3 that for the orthogonal projection method the energy
error remains bounded, and this is also true for integrations over much longer time
intervals. This is in agreement with the observation of Chap. I, where symmetric
methods showed an excellent long-time behaviour when applied to reversible
differential equations.




Fig. 4.3. Global error in the total energy for two different projection methods (orthogonal
and coordinate projection) with the trapezoidal rule as basic integrator. Initial values for
the position are (cos 0.8, −sin 0.8) (left picture) and (cos 0.8, sin 0.8) (right picture); zero
initial values in the velocity; step sizes are h = 0.1 (solid) and h = 0.05 (thin dashed)


For the coordinate projection, however, we observe a bounded energy error only 
for the initial value that is close to equilibrium (no change in the direction of the 
projection is necessary). As soon as the direction has to be changed (right picture 
of Fig. 4.3) a linear drift in the energy error becomes visible. Hence, care has to be 
taken with the choice of the projection. For an explanation of this phenomenon we 
refer to Chap. IX on backward error analysis and to Chap. XI on perturbation theory 
of reversible mappings. 
Fig. 4.4. Global error in the total energy for a symmetric projection method violating (1.4).
Initial values for the position are (cos 0.8, −sin 0.8) and (0, 0) for the velocity; step sizes
are h = 0.1 (solid) and h = 0.05 (thin dashed)



Example 4.4 (A Symmetric but Non-Reversible Projection Method). We
consider the pendulum equation as in Example 4.3. This time, however, we apply a
projection $\hat y_n = y_n + A^T \mu$, $y_{n+1} = \hat y_{n+1} + A^T \mu$, with


A = 



0 0 \ 

0 lj ’ 


e = 0.2. 


For $\varepsilon = 0$ this corresponds to the vertical projection used in Example 4.3. For
$\varepsilon \ne 0$ there is no matrix $\sigma$ such that $\rho A^T = A^T \sigma$ holds for one of the mappings
$\rho$ that make the problem $\rho$-reversible. Hence condition (1.4) is violated, and the
method is thus not $\rho$-reversible. The initial values are chosen such that $g'(y)A^T$ is
invertible and well-conditioned along the solution. Although the projection direction
need not be changed during the integration and the method is symmetric, the
long-time behaviour is disappointing as shown in Fig. 4.4. This experiment illustrates that
condition (1.4) is also important for a qualitatively correct simulation.


V.4.2 Symmetric Methods Based on Local Coordinates 

Numerical methods for differential equations on manifolds that are based on local
coordinates (Algorithm IV.5.3) are in general not symmetric. For example, if we
consider the parametrization (IV.5.8) with respect to the tangent space at $y_0$, the
adjoint method would be parametrized by the tangent space at $y_1$. We can circumvent
this difficulty by the following algorithm (Hairer 2001).

Algorithm 4.5 (Symmetric Local Coordinates Approach). Assume that $y_n \in \mathcal{M}$
and that $\psi_a$ is a local parametrization of $\mathcal{M}$ satisfying $\psi_a(0) = a$ (close to $y_n$).
One step $y_n \mapsto y_{n+1}$ is defined as follows (see Fig. 4.5):

• find $z_n$ (close to 0) such that $\psi_a(z_n) = y_n$;

• $z_{n+1} = \Phi_h(z_n)$ (symmetric one-step method applied to (IV.5.7));

• $y_{n+1} = \psi_a(z_{n+1})$;

• choose $a$ in the parametrization such that $z_n + z_{n+1} = 0$.

It is important to remark that the parametrization $y = \psi_a(z)$ is in general changed
in every step.
Fig. 4.5. Symmetric use of local tangent space parametrization


This algorithm is illustrated in Fig. 4.5 for the tangent space parametrization
(IV.5.8), given by

$$\psi_a(z) = a + Q(a)z + g'(a)^T u_a(z), \tag{4.6}$$

where the columns of $Q(a)$ form an orthogonal basis of $T_a\mathcal{M}$ and the function
$u_a(z)$ is such that $\psi_a(z) \in \mathcal{M}$. It satisfies $u_a(0) = 0$ and $u_a'(0) = 0$.

Existence of the Numerical Solution. In Algorithm 4.5 the values $a \in \mathcal{M}$ and $z_n$
are implicitly determined by

$$F(h, z_n, a) = \begin{pmatrix} z_n + \Phi_h(z_n) \\ \psi_a(z_n) - y_n \end{pmatrix} = 0, \tag{4.7}$$

and the numerical solution is then explicitly given by $y_{n+1} = \psi_a(\Phi_h(z_n))$. For
more clarity we also use here the notation $\psi(z, a) = \psi_a(z)$. If the parametrization
$\psi(z, a)$ is differentiable, we have

$$\frac{\partial F}{\partial (z_n, a)}(0, 0, y_n) = \begin{pmatrix} 2I & 0 \\ \frac{\partial \psi}{\partial z}(0, y_n) & \frac{\partial \psi}{\partial a}(0, y_n) \end{pmatrix}. \tag{4.8}$$

Since $\psi(z, a) \in \mathcal{M}$ for all $z$ and $a \in \mathcal{M}$, the derivative with respect to $a$ lies
in the tangent space. Assume now that the parametrization $\psi(z, a)$ is such that the
restriction of $\frac{\partial \psi}{\partial a}(0, y_n)$ onto the tangent space $T_{y_n}\mathcal{M}$ is bijective. Then, the matrix
(4.8) is invertible on $\mathbb{R}^d \times T_{y_n}\mathcal{M}$ ($d$ denotes the dimension of the manifold). The
implicit function theorem thus proves the existence of a numerical solution $(z_n, a)$
close to $(0, y_n)$. In the case where $\psi_a(z)$ is given by (4.6), the matrix

$$\frac{\partial \psi}{\partial a}(0, a) = I - g'(a)^T \bigl(g'(a)\,g'(a)^T\bigr)^{-1} g'(a)$$

is a projection onto the tangent space $T_a\mathcal{M}$ and satisfies the above assumptions
provided that $g'(a)$ has full rank.

Order. We let $y_n := y(t_n)$ be a value on the exact solution $y(t)$ of (4.1). Then we
fix $a \in \mathcal{M}$ as follows: we replace the upper part of the definition (4.7) of $F(h, z_n, a)$
with $z_n + \varphi_h^z(z_n)$, where $\varphi_t^z$ denotes the exact flow of the differential equation
for $z(t)$ equivalent to (4.1). The above considerations show that such an $a$ exists;
let us call it $a^*$. If $\Phi_h$ is of order p, we then have $F(h, z(t_n), a^*) = O(h^{p+1})$.
An application of the implicit function theorem thus gives $z_n - z(t_n) = O(h^{p+1})$,
implying $z_{n+1} - z(t_{n+1}) = O(h^{p+1})$, and finally also $y_{n+1} - y(t_{n+1}) = O(h^{p+1})$.
This proves order p for the method defined by Algorithm 4.5.
Symmetry. Exchanging $h \leftrightarrow -h$ and $y_n \leftrightarrow y_{n+1}$ in Algorithm 4.5 yields

$$\psi_a(z_n) = y_{n+1}, \qquad z_{n+1} = \Phi_{-h}(z_n), \qquad y_n = \psi_a(z_{n+1}), \qquad z_n + z_{n+1} = 0.$$

If we also exchange the auxiliary variables $z_n$ and $z_{n+1}$ and if we use the symmetry
of the basic method $\Phi_h$, we regain the original formulas. This proves the symmetry
of the algorithm. Again various kinds of modifications are possible. For example,
the condition $z_n + z_{n+1} = 0$ can be replaced with $z_n + z_{n+1} = \chi(h, z_n, z_{n+1})$. If
$\chi(-h, v, u) = \chi(h, u, v)$, the symmetry of Algorithm 4.5 is not destroyed.

Reversibility. In general, we cannot expect the method of Algorithm 4.5 to satisfy
the $\rho$-compatibility condition (1.4), which is needed for $\rho$-reversibility. However, if
the parametrization is such that

$$\rho\,\psi_a(z) = \psi_{\rho a}(\sigma z) \quad \text{for some invertible } \sigma, \tag{4.9}$$

we shall show that the compatibility condition (1.4) holds. We first prove that
for a $\rho$-reversible problem $\dot y = f(y)$ the differential equation (IV.5.7), written as
$\dot z = F_a(z)$, is $\sigma$-reversible in the sense that $\sigma F_a(z) = -F_{\rho a}(\sigma z)$. This follows
from $\rho\,\psi_a'(z) = \psi_{\rho a}'(\sigma z)\,\sigma$ (which is seen by differentiation of (4.9)) and from
$f(\psi_{\rho a}(\sigma z)) = -\rho f(\psi_a(z))$, because

$$\psi_a'(z)\,F_a(z) = f(\psi_a(z)) \quad \Longrightarrow \quad \psi_{\rho a}'(\sigma z)\,\sigma F_a(z) = -f(\psi_{\rho a}(\sigma z)).$$

If the basic method $\Phi_h$ satisfies $\sigma \circ \Phi_h = \Phi_{-h} \circ \sigma$ when applied to $\dot z = F_a(z)$ (e.g.,
for all Runge-Kutta methods), the formulas of Algorithm 4.5 satisfy

$$
\begin{aligned}
\rho y_n &= \rho\,\psi_a(z_n) = \psi_{\rho a}(\sigma z_n), & \sigma z_{n+1} &= \Phi_{-h}(\sigma z_n), \\
\psi_{\rho a}(\sigma z_{n+1}) &= \rho\,\psi_a(z_{n+1}) = \rho y_{n+1}, & \sigma z_n + \sigma z_{n+1} &= 0.
\end{aligned}
$$

This proves that, starting with $\rho y_n$ and a negative step size $-h$, the Algorithm 4.5
produces $\rho y_{n+1}$, where $y_{n+1}$ is just the result obtained with initial value $y_n$ and
step size $h$. But this is nothing other than the $\rho$-compatibility condition (1.4) for
Algorithm 4.5.

In order to verify condition (4.9) for the tangent space parametrization (4.6), we
write it as $\psi_a(Z) = a + Z + N(Z)$, where $Z$ is an arbitrary element of the
tangent space $T_a\mathcal{M}$ and $N(Z)$ is orthogonal to $T_a\mathcal{M}$ such that $\psi_a(Z) \in \mathcal{M}$. Since
$\rho\,T_a\mathcal{M} = T_{\rho a}\mathcal{M}$ and since, for a $\rho$ satisfying $\rho\rho^T = I$, the vector $\rho N(Z)$ is
orthogonal to $T_{\rho a}\mathcal{M}$, we have $\rho\,\psi_a(Z) = \psi_{\rho a}(\rho Z)$. This proves (4.9) for the tangent
space parametrization of a manifold.

Example 4.6. We repeated the experiment of Example 4.2 with Algorithm IV.5.3,
using tangent space parametrization and the trapezoidal rule as basic integrator, and
compared it to the symmetrized version of Algorithm 4.5. We were surprised to see
that both algorithms worked equally well and gave a numerical solution lying near
a closed curve. An explanation is given in Exercise 11. There it is shown that for the
special situation where $\mathcal{M}$ is a sphere, the standard algorithm is also symmetric for
the trapezoidal rule. Let us therefore modify the problem slightly.

Fig. 4.6. Numerical simulation of the rigid body equations; standard use of tangent space
parametrization with the trapezoidal rule as basic method (left picture) and its symmetrized
version (right picture); 5000 steps with step size h = 0.4

We consider the rigid body equations (IV.1.4) as a differential equation on the
manifold

$$\mathcal{M} = \Bigl\{ (y_1, y_2, y_3) \ \Big|\ \frac{y_1^2}{I_1} + \frac{y_2^2}{I_2} + \frac{y_3^2}{I_3} = \mathrm{Const} \Bigr\} \tag{4.10}$$

with parameters and initial data as in Example 4.2, and we apply the standard and the
symmetrized method based on tangent space parametrization. The result is shown
in Fig. 4.6. In both cases the numerical solution lies on the manifold (by definition
of the method), but only the symmetric method has a correct long-time behaviour.


Symmetric Lie Group Methods. We turn our attention to particular problems

$$\dot Y = A(Y)\,Y, \qquad Y(0) = Y_0, \tag{4.11}$$

where $A(Y)$ is in the Lie algebra $\mathfrak{g}$ whenever $Y$ is in the corresponding Lie group
$G$. The exact solution then evolves on the manifold $G$. Munthe-Kaas methods
(Sect. IV.8.2) are in general not symmetric, even if the underlying Runge-Kutta
method is symmetric. This is due to the unsymmetric use of the local coordinates
$Y = \exp(\Omega)Y_0$. However, accidentally, the Lie group method based on the implicit
midpoint rule

$$Y_{n+1} = \exp(\Omega)\,Y_n, \qquad \Omega = h\,A\bigl(\exp(\Omega/2)\,Y_n\bigr) \tag{4.12}$$

is symmetric. This can be seen as usual by exchanging $h \leftrightarrow -h$ and $Y_n \leftrightarrow Y_{n+1}$
(and also $\Omega \leftrightarrow -\Omega$ for the auxiliary variable). Numerical computations with the
rigid body equations (considered as a problem on the sphere) show an excellent
long-time behaviour for this method similar to that of the right picture in Fig. 4.6. In
contrast to the implicit midpoint rule (I.1.7), the numerical solution of (4.12) does
not lie exactly on the ellipsoid (4.10); see Exercise 12.
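
The Lie group midpoint rule is short enough to sketch in code. The example below is our own illustration for the rigid body written as $\dot y = A(y)\,y$ with a skew-symmetric $A(y)$, so that the solution stays on a sphere. The moments of inertia, the fixed-point iteration for $\Omega$, and the use of Rodrigues' formula for the exponential of a $3 \times 3$ skew-symmetric matrix are assumptions of this sketch, not taken from the text.

```python
import numpy as np

I1, I2, I3 = 2.0, 1.0, 2.0 / 3.0  # moments of inertia (assumed test data)


def A(y):
    """Skew-symmetric matrix with A(y) y equal to the rigid body vector field."""
    return np.array([
        [0.0, y[2] / I3, -y[1] / I2],
        [-y[2] / I3, 0.0, y[0] / I1],
        [y[1] / I2, -y[0] / I1, 0.0],
    ])


def exp_so3(Om):
    """Exponential of a 3x3 skew-symmetric matrix via Rodrigues' formula."""
    theta = np.sqrt(0.5 * np.sum(Om * Om))  # rotation angle
    Om2 = Om @ Om
    if theta < 1e-6:  # truncated series near the identity
        return np.eye(3) + Om + 0.5 * Om2
    return (np.eye(3) + (np.sin(theta) / theta) * Om
            + ((1.0 - np.cos(theta)) / theta**2) * Om2)


def lie_midpoint_step(y, h, tol=1e-15, maxit=200):
    """One step: Omega = h A(exp(Omega/2) y), then y_new = exp(Omega) y."""
    Om = h * A(y)
    for _ in range(maxit):
        Om_new = h * A(exp_so3(0.5 * Om) @ y)
        converged = np.linalg.norm(Om_new - Om) <= tol
        Om = Om_new
        if converged:
            break
    return exp_so3(Om) @ y


if __name__ == "__main__":
    y0 = np.array([np.cos(0.9), 0.0, np.sin(0.9)])
    y1 = lie_midpoint_step(y0, 0.4)
    print(y1 @ y1 - 1.0)                        # stays on the sphere
    print(lie_midpoint_step(y1, -0.4) - y0)     # symmetry check
```

Because `exp_so3` returns a rotation, the norm of `y` is preserved up to rounding in every step, whatever accuracy the fixed-point iteration reaches.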
For the construction of further symmetric Lie group methods we can apply the
ideas of Algorithm 4.5. As local parametrization we choose

$$\psi_U(\Omega) = \exp(\Omega)\,U, \tag{4.13}$$

where $U = \exp(\Theta)Y_n$ plays the role of the midpoint on the manifold. We put $Z_n =
-\Theta$ so that $\psi_U(Z_n) = Y_n$. With this starting value $Z_n$ we apply any symmetric
Runge-Kutta method to the differential equation

$$\dot\Omega = A(\psi_U(\Omega)) + \sum_{k=1}^{q} \frac{B_k}{k!}\,\mathrm{ad}_\Omega^k\bigl(A(\psi_U(\Omega))\bigr), \qquad \Omega(0) = -\Theta, \tag{4.14}$$

(cf. (IV.8.9)) and thus obtain $Z_{n+1}$. According to Algorithm 4.5, $\Theta$ is implicitly
determined by the condition $Z_n + Z_{n+1} = 0$, and the numerical approximation is
obtained from

$$Y_{n+1} = \psi_U(Z_{n+1}) = \exp(Z_{n+1})\exp(\Theta)\,Y_n = \exp(2\Theta)\,Y_n.$$

The method obtained in this way is identical to Algorithm 2 of Zanna, Engø &
Munthe-Kaas (2001). With the coefficients of the 2-stage Gauss method (Table
II.1.1) and with $q = 1$ in (4.14) we thus get

$$
\begin{aligned}
\Omega_1 &= -h\,\frac{\sqrt3}{6}\Bigl(A_2 - \frac12[\Omega_2, A_2]\Bigr), \qquad
\Omega_2 = h\,\frac{\sqrt3}{6}\Bigl(A_1 - \frac12[\Omega_1, A_1]\Bigr), \\
Y_{n+1} &= \exp(2\Theta)\,Y_n = \exp\Bigl(\frac h2(A_1 + A_2) - \frac h4\bigl([\Omega_1, A_1] + [\Omega_2, A_2]\bigr)\Bigr)Y_n,
\end{aligned}
$$

where $A_i = A(\exp(\Omega_i)\exp(\Theta)Y_n)$. This is a symmetric Lie group method of order
four. We can reduce the number of commutators by replacing $\Omega_i$ in the right-hand
expression with its dominating term. This yields

$$
\begin{aligned}
\Omega_1 &= -h\,\frac{\sqrt3}{6}A_2 + \frac{h^2}{24}[A_1, A_2], \qquad
\Omega_2 = h\,\frac{\sqrt3}{6}A_1 - \frac{h^2}{24}[A_1, A_2], \\
Y_{n+1} &= \exp\Bigl(\frac h2(A_1 + A_2) - h^2\frac{\sqrt3}{12}[A_1, A_2]\Bigr)Y_n
\end{aligned}
\tag{4.15}
$$

(cf. Exercise IV.19). Although we have neglected terms of size $O(h^4)$, the method
remains of order four, because the order of symmetric methods is always even.

For any linear invertible transformation $\rho$, the parametrization (4.13) satisfies

$$\rho\,\psi_U(\Omega) = \rho\exp(\Omega)\,U = \exp(\rho\,\Omega\,\rho^{-1})\,\rho U = \psi_{\rho U}(\sigma\Omega)$$

with $\sigma\Omega = \rho\,\Omega\,\rho^{-1}$. Hence (4.9) holds true. If the problem (4.11) is $\rho$-reversible, i.e.,
$\rho A(Y) = -A(\rho Y)\rho$, then the truncated differential equation (4.14) is $\sigma$-reversible
for all choices of the truncation index $q$. Moreover, after the simplifications that lead
to method (4.15), the $\rho$-compatibility condition (1.4) is also satisfied.
The following variant is also proposed in Zanna, Engø & Munthe-Kaas (2001).
Instead of computing $\Theta$ from the relation $Z_n + Z_{n+1} = 0$, $\Theta$ is determined by

s 

Zn + ^n+i = h ei ^Ai — - Aj\ + .. . 

i=i 

If the coefficients satisfy $e_{s+1-i} = -e_i$, this modification gives symmetric Lie
group methods.


V.5 Energy - Momentum Methods and Discrete 
Gradients 


Conventional numerical methods, when applied to the ordinary differential
equations of motion of classical mechanics, conserve the total energy
and angular momentum only to the order of the truncation error. Since
these constants of motion play a central role in mechanics, it is a great
advantage to be able to conserve them exactly.

(R.A. LaBudde & D. Greenspan 1976) 

This section is concerned with numerical integrators for the equations of motion
of classical mechanics which conserve both the total energy and angular momentum.
Their construction is related to the concept of discrete gradients. The methods
considered are symmetric, which is incidental but useful: in our view their
good long-time behaviour is a consequence of their symmetry (and reversibility)
more than of their exact conservation properties; see the disappointing behaviour of
the non-symmetric energy- and momentum-conserving projection method in
Example IV.4.4.

A Modified Midpoint Rule. Consider first a single particle of mass $m$ in $\mathbb{R}^3$,
with position coordinates $q(t) \in \mathbb{R}^3$, moving in a central force field with potential
$U(q) = V(\|q\|)$ (e.g., $V(r) = -1/r$ in the Kepler problem). With the momenta
$p(t) = m\,\dot q(t)$, the equations of motion read

$$\dot q = \frac1m\,p, \qquad \dot p = -\nabla U(q) = -V'(\|q\|)\,\frac{q}{\|q\|} .$$

Constants of motion are the total energy $H = T(p) + U(q)$, with $T(p) =
\|p\|^2/(2m)$, and the angular momentum $L = q \times p$:

$$\frac{d}{dt}(q \times p) = \dot q \times p + q \times \dot p = \frac1m\,p \times p - \frac{V'(\|q\|)}{\|q\|}\,q \times q = 0 .$$

We know from Sect. IV.2 that the implicit midpoint rule conserves the quadratic
invariant $L = q \times p$, and Theorem IV.2.4 (or a simple direct calculation) shows that
$L$ remains actually conserved by any modification of the form

$$\begin{aligned}
q_{n+1} &= q_n + \frac hm\,p_{n+1/2}, & p_{n+1/2} &= \tfrac12(p_n + p_{n+1}),\\
p_{n+1} &= p_n - \kappa h\,\nabla U(q_{n+1/2}), & q_{n+1/2} &= \tfrac12(q_n + q_{n+1}),
\end{aligned} \tag{5.1}$$

where $\kappa$ is an arbitrary real number. Simo, Tarnow & Wong (1992) introduce
this additional parameter $\kappa$ and determine it so that the total energy is conserved:
$H(p_{n+1}, q_{n+1}) = H(p_n, q_n)$. With the notation $F_{n+1/2} = -\nabla U(q_{n+1/2}) =
-V'(\|q_{n+1/2}\|)/\|q_{n+1/2}\| \cdot q_{n+1/2}$ we have

$$T(p_{n+1}) = T(p_n + \kappa h F_{n+1/2}) = T(p_n) + \kappa\,\frac hm\,p_{n+1/2}^T F_{n+1/2} ,$$

and hence the condition for conservation of the total energy $H = T + U$ becomes

$$\kappa\,\frac hm\,p_{n+1/2}^T F_{n+1/2} = U(q_n) - U(q_{n+1}) .$$

This gives a reasonable method even if $p_{n+1/2}^T F_{n+1/2}$ is arbitrarily close to zero.
This is seen as follows: let $\sigma = -\kappa V'(\|q_{n+1/2}\|)/\|q_{n+1/2}\|$ so that $\kappa F_{n+1/2} =
\sigma\,q_{n+1/2}$. The above condition for energy conservation then reads

$$\sigma\,\frac hm\,p_{n+1/2}^T q_{n+1/2} = V(\|q_n\|) - V(\|q_{n+1}\|) ,$$

where we note further that

$$\frac hm\,p_{n+1/2}^T q_{n+1/2} = (q_{n+1} - q_n)^T\,\tfrac12(q_{n+1} + q_n)
= \tfrac12\bigl(\|q_{n+1}\|^2 - \|q_n\|^2\bigr) = \bigl(\|q_{n+1}\| - \|q_n\|\bigr)\,\tfrac12\bigl(\|q_{n+1}\| + \|q_n\|\bigr) .$$

These formulas give

$$\sigma = -\frac{V(\|q_{n+1}\|) - V(\|q_n\|)}{\|q_{n+1}\| - \|q_n\|}\cdot\frac{1}{\tfrac12\bigl(\|q_{n+1}\| + \|q_n\|\bigr)} \tag{5.2}$$

with which method (5.1) becomes

$$\begin{aligned}
q_{n+1} &= q_n + \frac hm\,p_{n+1/2}\\
p_{n+1} &= p_n - h\,\frac{V(\|q_{n+1}\|) - V(\|q_n\|)}{\|q_{n+1}\| - \|q_n\|}\cdot\frac{q_{n+1/2}}{\tfrac12\bigl(\|q_{n+1}\| + \|q_n\|\bigr)} .
\end{aligned} \tag{5.3}$$

This is a second-order symmetric method which conserves the total energy and
the angular momentum. It evaluates only the potential $U(q) = V(\|q\|)$. The force
$-\nabla U(q) = -V'(\|q\|)\,q/\|q\|$ is approximated by finite differences.
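The following is a minimal sketch of method (5.3) for the planar Kepler problem, written only to illustrate the conservation properties stated above. The fixed-point iteration, the tolerance, the switch to $V'$ when $\|q_{n+1}\|$ and $\|q_n\|$ are nearly equal, and all function names are assumptions of this sketch, not part of the original method description.

```python
import math

def V(r):
    """Kepler potential V(r) = -1/r (the example named in the text)."""
    return -1.0 / r

def dV(r):
    """V'(r); used only as the limit of the difference quotient."""
    return 1.0 / (r * r)

def norm(v):
    return math.hypot(v[0], v[1])

def energy(p, q, m=1.0):
    return (p[0] ** 2 + p[1] ** 2) / (2.0 * m) + V(norm(q))

def ang_mom(p, q):
    # third component of q x p for planar motion
    return q[0] * p[1] - q[1] * p[0]

def step(p, q, h, m=1.0, tol=1e-14, maxit=100):
    """One step of (5.3); the implicit equations are solved by fixed-point iteration."""
    p1, q1 = p, q                          # initial guess for (p_{n+1}, q_{n+1})
    r0 = norm(q)
    for _ in range(maxit):
        r1 = norm(q1)
        rm = 0.5 * (r0 + r1)
        if abs(r1 - r0) > 1e-6:
            dd = (V(r1) - V(r0)) / (r1 - r0)   # difference quotient of V
        else:
            dd = dV(rm)                        # approximate limit as r1 -> r0
        pm = (0.5 * (p[0] + p1[0]), 0.5 * (p[1] + p1[1]))
        qm = (0.5 * (q[0] + q1[0]), 0.5 * (q[1] + q1[1]))
        q_new = (q[0] + h / m * pm[0], q[1] + h / m * pm[1])
        p_new = (p[0] - h * dd * qm[0] / rm, p[1] - h * dd * qm[1] / rm)
        change = max(abs(q_new[0] - q1[0]), abs(q_new[1] - q1[1]),
                     abs(p_new[0] - p1[0]), abs(p_new[1] - p1[1]))
        p1, q1 = p_new, q_new
        if change < tol:
            break
    return p1, q1

def run(steps=2000, h=0.05, e=0.6):
    """Integrate a Kepler orbit of eccentricity e; return the drifts in H and L."""
    q = (1.0 - e, 0.0)
    p = (0.0, math.sqrt((1.0 + e) / (1.0 - e)))
    H0, L0 = energy(p, q), ang_mom(p, q)
    for _ in range(steps):
        p, q = step(p, q, h)
    return abs(energy(p, q) - H0), abs(ang_mom(p, q) - L0)
```

Calling `run()` returns the absolute changes of the energy and of the angular momentum over the integration interval; both should stay at the level of the iteration tolerance and round-off, independently of the step size, as long as the iteration converges.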

The energy- and momentum-conserving method (5.3) first appeared in LaBudde
& Greenspan (1974). The method (5.1) or (5.3) is the starting point for extensions
in several directions to other problems of mechanics and other methods; see Simo,
Tarnow & Wong (1992), Simo & Tarnow (1992), Lewis & Simo (1994, 1996),
Gonzalez & Simo (1996), Gonzalez (1996), and Reich (1996b). In the following we
consider a direct generalization to systems of particles, also given in LaBudde &
Greenspan (1974).

An Energy-Momentum Method for N-Body Systems. We consider a system of
$N$ particles interacting pairwise with potential forces which depend on the distances
between the particles. As in Example IV.1.3, this is formulated as a Hamiltonian
system with total energy

$$H(p, q) = \frac12\sum_{i=1}^{N}\frac{1}{m_i}\,p_i^T p_i + \sum_{i=2}^{N}\sum_{j=1}^{i-1} V_{ij}\bigl(\|q_i - q_j\|\bigr) . \tag{5.4}$$

As an extension of method (5.3), we consider the following method (where we now
write the time step number as a superscript for notational convenience)

$$\begin{aligned}
q_i^{n+1} &= q_i^n + \frac{h}{m_i}\,p_i^{n+1/2}\\
p_i^{n+1} &= p_i^n + h\sum_{j=1}^{N}\sigma_{ij}\bigl(q_i^{n+1/2} - q_j^{n+1/2}\bigr)
\end{aligned} \tag{5.5}$$

where $p_i^{n+1/2} = \tfrac12(p_i^n + p_i^{n+1})$, $q_i^{n+1/2} = \tfrac12(q_i^n + q_i^{n+1})$, and for $i > j$,

$$\sigma_{ij} = \sigma_{ji} = -\frac{V_{ij}(r_{ij}^{n+1}) - V_{ij}(r_{ij}^{n})}{r_{ij}^{n+1} - r_{ij}^{n}}\cdot\frac{1}{\tfrac12\bigl(r_{ij}^{n} + r_{ij}^{n+1}\bigr)} \tag{5.6}$$

with $r_{ij}^n = \|q_i^n - q_j^n\|$, and $\sigma_{ii} = 0$. This method has the following properties.

Theorem 5.1 (LaBudde & Greenspan 1974). The method (5.5) with (5.6) is a
second-order symmetric implicit method which conserves the total linear momentum
$P = \sum_{i=1}^N p_i$, the total angular momentum $L = \sum_{i=1}^N q_i \times p_i$, and the total
energy $H$.

Proof. A comparison of (5.6) with the equations of motion shows that the method
is of order 2. Similar to the continuous case (Example IV.1.3), the conservation
of linear and angular momentum is obtained as a consequence of the symmetry
$\sigma_{ij} = \sigma_{ji}$ for all $i, j$. For the linear momentum we have

$$\sum_{i=1}^{N} p_i^{n+1} = \sum_{i=1}^{N} p_i^{n} + h\sum_{i=1}^{N}\sum_{j=1}^{N}\sigma_{ij}\bigl(q_i^{n+1/2} - q_j^{n+1/2}\bigr) = \sum_{i=1}^{N} p_i^{n} .$$

For the proof of the conservation of the angular momentum we observe that the
first equation of (5.5) together with $p_i^{n+1/2} = \tfrac12(p_i^{n+1} + p_i^n)$ yields

$$(q_i^{n+1} - q_i^n) \times (p_i^{n+1} + p_i^n) = 0 \tag{5.7}$$

for all $i$. The second equation of (5.5) together with $q_i^{n+1/2} = \tfrac12(q_i^{n+1} + q_i^n)$ gives

$$\sum_{i=1}^{N}(q_i^{n+1} + q_i^n) \times (p_i^{n+1} - p_i^n) = 0 , \tag{5.8}$$

because $\sigma_{ij} = \sigma_{ji}$, and therefore $\sum_{i=1}^N\sum_{j=1}^N \sigma_{ij}\,q_i^{n+1/2} \times q_j^{n+1/2} = 0$. Adding the sum
over $i$ of (5.7) to the equation (5.8) proves the statement $\sum_{i=1}^N q_i^{n+1} \times p_i^{n+1} =
\sum_{i=1}^N q_i^{n} \times p_i^{n}$.

It remains to show the energy conservation. Now, the kinetic energy $T(p) =
\tfrac12\sum_{i=1}^N m_i^{-1}\,p_i^T p_i$ at step $n+1$ is

$$\begin{aligned}
T(p^{n+1}) &= T(p^n) + h\sum_{i=1}^{N}\sum_{j=1}^{N}\sigma_{ij}\,\frac{1}{m_i}\,\bigl(p_i^{n+1/2}\bigr)^T\bigl(q_i^{n+1/2} - q_j^{n+1/2}\bigr)\\
&= T(p^n) + \sum_{i=1}^{N}\sum_{j=1}^{N}\sigma_{ij}\,(q_i^{n+1} - q_i^n)^T\bigl(q_i^{n+1/2} - q_j^{n+1/2}\bigr) .
\end{aligned}$$

Using once more the symmetry $\sigma_{ij} = \sigma_{ji}$, the double sum reduces to

$$\begin{aligned}
&\frac12\sum_{i=1}^{N}\sum_{j=1}^{N}\sigma_{ij}\,\bigl((q_i^{n+1} - q_j^{n+1}) - (q_i^{n} - q_j^{n})\bigr)^T\,\tfrac12\bigl((q_i^{n+1} - q_j^{n+1}) + (q_i^{n} - q_j^{n})\bigr)\\
&\qquad = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\sigma_{ij}\,\tfrac12\bigl((r_{ij}^{n+1})^2 - (r_{ij}^{n})^2\bigr) .
\end{aligned}$$

On the other hand, the change in the potential energy is

$$U(q^{n+1}) - U(q^n) = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\bigl(V_{ij}(r_{ij}^{n+1}) - V_{ij}(r_{ij}^{n})\bigr) ,$$

and hence (5.6) yields the conservation of the total energy $H = T + U$. □

Discrete-Gradient Methods. The methods (5.3) and (5.5) are of the form

$$y_{n+1} = y_n + h\,B(y_{n+1}, y_n)\,\bar\nabla H(y_{n+1}, y_n) \tag{5.9}$$

where $B(\hat y, y)$ is a skew-symmetric matrix for all $\hat y, y$, and $\bar\nabla H(\hat y, y)$ is a discrete
gradient of $H$, that is, a continuous function of $(\hat y, y)$ satisfying

$$\begin{aligned}
\bar\nabla H(\hat y, y)^T(\hat y - y) &= H(\hat y) - H(y)\\
\bar\nabla H(y, y) &= \nabla H(y) .
\end{aligned} \tag{5.10}$$

The symmetry of the methods is seen from the properties $B(\hat y, y) = B(y, \hat y)$ and
$\bar\nabla H(\hat y, y) = \bar\nabla H(y, \hat y)$. For example, for method (5.3) we have, with $y = (p, q)$
and $\hat y = (\hat p, \hat q)$,

$$B(\hat y, y) = \begin{pmatrix} 0 & -I\\ I & 0\end{pmatrix} \quad\text{and}\quad
\bar\nabla H(\hat y, y) = \begin{pmatrix} \frac{1}{2m}(\hat p + p)\\[2pt] -\sigma(\hat q, q)\,\tfrac12(\hat q + q)\end{pmatrix}$$

where $\sigma(\hat q, q)$ is given by the expression (5.2) with $(\hat q, q)$ in place of $(q_{n+1}, q_n)$ or
by the corresponding limit as $\|\hat q\| \to \|q\|$.

The discrete-gradient method (5.9) is consistent with the differential equation

$$\dot y = B(y)\,\nabla H(y) \tag{5.11}$$

with the skew-symmetric matrix $B(y) = B(y, y)$. This system conserves $H$, since

$$\frac{d}{dt}H(y) = \nabla H(y)^T\dot y = \nabla H(y)^T B(y)\,\nabla H(y) = 0 ,$$

and, as was noted by Gonzalez (1996) and McLachlan, Quispel & Robidoux (1999),
$H$ is also conserved by method (5.9).

Theorem 5.2. The discrete-gradient method (5.9) conserves the invariant $H$ of the
system (5.11).

Proof. The definitions (5.10) of a discrete gradient and of the method (5.9) give

$$\begin{aligned}
H(y_{n+1}) - H(y_n) &= \bar\nabla H(y_{n+1}, y_n)^T(y_{n+1} - y_n)\\
&= h\,\bar\nabla H(y_{n+1}, y_n)^T B(y_{n+1}, y_n)\,\bar\nabla H(y_{n+1}, y_n) = 0 ,
\end{aligned}$$

where the last equality follows from the skew-symmetry of $B(y_{n+1}, y_n)$. □

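Example 5.3. The Lotka-Volterra system (I.1.1) can be written as

$$\begin{pmatrix}\dot u\\ \dot v\end{pmatrix} = \begin{pmatrix} 0 & -uv\\ uv & 0\end{pmatrix}\nabla H(u, v)$$

with the invariant $H(u, v) = \ln u - u + 2\ln v - v$ of (I.1.4). Possible choices of a
discrete gradient are the coordinate increment discrete gradient (Itoh & Abe 1988)

$$\bar\nabla H(\hat u, \hat v; u, v) = \begin{pmatrix}\dfrac{H(\hat u, v) - H(u, v)}{\hat u - u}\\[8pt] \dfrac{H(\hat u, \hat v) - H(\hat u, v)}{\hat v - v}\end{pmatrix} \tag{5.12}$$

and the midpoint discrete gradient (Gonzalez 1996)

$$\bar\nabla H(\hat y, y) = \nabla H(\bar y) + \frac{H(\hat y) - H(y) - \nabla H(\bar y)^T\Delta y}{\|\Delta y\|^2}\,\Delta y \tag{5.13}$$

with $\bar y = \tfrac12(\hat y + y)$ and $\Delta y = \hat y - y$. In contrast to (5.12), the discrete gradient
(5.13) yields a symmetric discretization.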
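As an illustration of Theorem 5.2 and Example 5.3, the sketch below applies method (5.9) to the Lotka-Volterra system with the coordinate increment discrete gradient (5.12). Since $H(u,v) = F(u) + G(v)$ is separable, the two components of (5.12) reduce to difference quotients of $F$ and $G$. The choice $B(\hat y, y)$ with $uv$ evaluated at the midpoint, the fixed-point iteration, the switch to the derivative for nearly equal arguments, and all names are assumptions made for this sketch.

```python
import math

def F(u):  return math.log(u) - u          # H(u, v) = F(u) + G(v)
def dF(u): return 1.0 / u - 1.0
def G(v):  return 2.0 * math.log(v) - v
def dG(v): return 2.0 / v - 1.0

def H(u, v):
    return F(u) + G(v)

def quotient(f, df, a, b):
    """Difference quotient (f(b) - f(a)) / (b - a), with f' as limiting value."""
    if abs(b - a) > 1e-6:
        return (f(b) - f(a)) / (b - a)
    return df(0.5 * (a + b))

def discrete_gradient(u1, v1, u, v):
    """Coordinate increment discrete gradient (5.12) for the separable H above."""
    return quotient(F, dF, u, u1), quotient(G, dG, v, v1)

def step(u, v, h, tol=1e-14, maxit=200):
    """One step of (5.9) with B = [[0, -w], [w, 0]], w = uv at the midpoint."""
    u1, v1 = u, v                           # initial guess for the new values
    for _ in range(maxit):
        gu, gv = discrete_gradient(u1, v1, u, v)
        w = 0.25 * (u + u1) * (v + v1)      # skew-symmetric B(y1, y), B(y, y) = B(y)
        u_new = u - h * w * gv
        v_new = v + h * w * gu
        change = max(abs(u_new - u1), abs(v_new - v1))
        u1, v1 = u_new, v_new
        if change < tol:
            break
    return u1, v1

def run(steps=1000, h=0.1, u=1.5, v=2.5):
    """Return the drift of the invariant H after the given number of steps."""
    H0 = H(u, v)
    for _ in range(steps):
        u, v = step(u, v, h)
    return abs(H(u, v) - H0)
```

The value returned by `run()` measures how well the invariant is conserved; it should remain at the level of the iteration tolerance and round-off. The first defining property in (5.10) can also be checked directly by comparing `gu * (u1 - u) + gv * (v1 - v)` with `H(u1, v1) - H(u, v)`.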


A systematic study of discrete-gradient methods is given in Gonzalez (1996)
and McLachlan, Quispel & Robidoux (1999).

V.6 Exercises

1. Prove that (after a suitable permutation of the stages) the condition $c_{s+1-i} =
1 - c_i$ (for all $i$) is also necessary for a collocation method to be symmetric.

2. Prove that explicit Runge-Kutta methods cannot be symmetric.

Hint. If a one-step method applied to $\dot y = \lambda y$ yields $y_1 = R(h\lambda)y_0$, then a
necessary condition for the symmetry of the method is $R(z)R(-z) = 1$ for all
complex $z$.

3. Consider an irreducible diagonally implicit Runge-Kutta method (irreducible 
in the sense of Sect. VI.7.3). Prove that the condition (2.4) is necessary for the 
symmetry of the method. No permutation of the stages has to be performed. 

4. Let <&h = (p^ o where ipfi represents the exact flow of y = /M (y). In the 
situation of Theorem III.3.17 prove that the local error (3.4) of the composition 
method (3.3) has the form 

h3 (^ 6a ~ ^ 2 , A]] + ^(1 - 6a + 6a 2 ) [D l5 [D i ,D 2 ]])ld{y), 

where, as usual, $D_i g(y) = g'(y)f^{[i]}(y)$. The value $\alpha = 0.1932$ is found by
minimizing the expression $(6\alpha - 1)^2 + 4(1 - 6\alpha + 6\alpha^2)^2$ (McLachlan 1995).

5. For the linear transformation $\rho(p, q) = (-p, q)$, consider a $\rho$-reversible problem
(1.3) with scalar $p$ and $q$. Prove that every solution which crosses the $q$-axis
twice is periodic.

6. Prove that if a numerical method conserves quadratic invariants (IV.2.1), then 
so does its adjoint. 

7. For the numerical solution of $\dot y = A(t)y$ consider the method $y_n \mapsto y_{n+1}$
defined by $y_{n+1} = z(t_n + h)$, where $z(t)$ is the solution of

$$\dot z = \hat A(t)z , \qquad z(t_n) = y_n ,$$

and $\hat A(t)$ is the interpolation polynomial based on symmetric nodes $c_1, \ldots, c_s$,
i.e., $c_{s+1-i} + c_i = 1$ for all $i$.

a) Prove that this method is symmetric.

b) Show that $y_{n+1} = \exp(\Omega(h))y_n$ holds, where $\Omega(h)$ has an expansion in odd
powers of $h$. This justifies the omission of the terms involving triple integrals
in Example IV.7.4.

8. If $\Phi_h$ stands for the implicit midpoint rule, what are the Runge-Kutta
coefficients of the composition method (3.8)? The general theory of Sect. III.1 gives
three order conditions for order 4 (those for the trees of order 2 and 4 are
automatically satisfied by the symmetry of the method). Are they compatible with
the two conditions of Example 3.5?

9. Make a numerical comparison of our favourite composition methods p6 s9,
p8 s17, and p10 s35 for the Lorenz problem

$$\begin{aligned}
\dot y_1 &= -\sigma(y_1 - y_2), & y_1(0) &= 10, & \sigma &= 10\\
\dot y_2 &= -y_1y_3 + r\,y_1 - y_2, & y_2(0) &= -20, & r &= 28\\
\dot y_3 &= y_1y_2 - b\,y_3, & y_3(0) &= 20, & b &= 8/3
\end{aligned} \tag{6.1}$$

with exact solution

$$\begin{aligned}
y_1(1) &= 8.635692709892506017930544628639\\
y_2(1) &= 2.798663387927457052023080059065\\
y_3(1) &= 33.36063508973142157789185846267
\end{aligned} \tag{6.2}$$

by composing for $0 \le t \le 1$ the second order symmetric splitting scheme (see
Kahan & Li 1997) which, for the time-stepping $y_i \mapsto Y_i$, is given by

$$\begin{aligned}
Y_1 - y_1 &= \frac h2\bigl(-\sigma(y_1 + Y_1 - y_2 - Y_2)\bigr)\\
Y_2 - y_2 &= \frac h2\bigl(-y_1Y_3 - Y_1y_3 + r\,y_1 + r\,Y_1 - y_2 - Y_2\bigr)\\
Y_3 - y_3 &= \frac h2\bigl(y_1Y_2 + Y_1y_2 - b\,y_3 - b\,Y_3\bigr).
\end{aligned} \tag{6.3}$$

This method requires, for each step, the solution of a linear system only. The
results are shown in Fig. 6.1.

Fig. 6.1. Comparison of various composition methods applied to the Lorenz equations

10. Symmetrized order conditions (Suzuki 1992). Prove that for methods (3.8) of 
order four with $\gamma_i$ satisfying (3.10)


s fe 2 

=0 

h—^ f= 1 


h= 1 f= 1 f= h 


The prime after (before) a sum sign indicates that the term with highest (low¬ 
est) index is divided by 2. Prove also that the order conditions given in Suzuki 
(1992) for order p < 8 are equivalent to those of Example 3.5. Is this also true 
for order p = 10 ? 

Hint. Use relations like = 1 '52e=k It • 

11. Let $\mathcal{M} = \{(y_1, y_2, y_3) \mid y_1^2 + y_2^2 + y_3^2 = 1\}$, and consider for $a \in \mathcal{M}$ the
tangent space parametrization

$$\psi_a(z) = a + z + a\,u_a(z) ,$$

where, for $z \in T_a\mathcal{M}$, the real value $u_a(z)$ is determined by the requirement
$\psi_a(z) \in \mathcal{M}$. Prove that Algorithm IV.5.3, with the trapezoidal rule in the role
of $\Phi_h$, is a symmetric method.

Hint. Since $z$ is a linear combination of $a$ and $\psi_a(z)$, it is uniquely determined
by $a^Tz$ (which is zero) and $\psi_a(z)^Tz$.

12. (Zanna, Engø & Munthe-Kaas 2001). Verify numerically that the Lie group
method (4.12) based on the implicit midpoint rule does not conserve general
quadratic first integrals. One can consider the rigid body equations in the form
(IV.1.5).



Chapter VI.
Symplectic Integration of Hamiltonian Systems



Fig. 0.1. Sir William Rowan Hamilton, born: 4 August 1805 in Dublin, died: 2 September 
1865. Famous for research in optics, mechanics, and for the invention of quaternions 


Hamiltonian systems form the most important class of ordinary differential
equations in the context of ‘Geometric Numerical Integration’. An outstanding property
of these systems is the symplecticity of the flow. As indicated in the following
diagram,

[Diagram of three interconnected domains: Ordinary Differential Equations (equations of motion: Lagrange; canonical equations: Hamilton); Variational Principles (Lagrange, Hamilton); 1st order Partial DE, Generating Functions (Hamilton-Jacobi)]

Hamiltonian theory operates in three different domains (equations of motion, partial
differential equations and variational principles) which are all interconnected. Each
of these viewpoints, which we will study one after the other, leads to the construction
of methods preserving the symplecticity.

VI.1 Hamiltonian Systems

Hamilton’s equations appeared first, among thousands of other formulas, and
inspired by previous research in optics, in Hamilton (1834). Their importance was
immediately recognized by Jacobi, who stressed and extended the fundamental ideas,
so that, a couple of years later, all the long history of research of Galilei, Newton,
Euler and Lagrange, was, in the words of Jacobi (1842), “to be considered as an
introduction”. The next milestones in the exposition of the theory were the
monumental three volumes of Poincaré (1892, 1893, 1899) on celestial mechanics, Siegel’s
“Lectures on Celestial Mechanics” (1956), English enlarged edition by Siegel &
Moser (1971), and the influential book of V.I. Arnold (1989; first Russian edition
1974). Beyond that, Hamiltonian systems became fundamental in many branches of
physics. One such area, the dynamics of particle accelerators, actually motivated the
construction of the first symplectic integrators (Ruth 1983).

VI.1.1 Lagrange’s Equations 

Équations différentielles pour la solution de tous les problèmes de Dynamique.
[Differential equations for the solution of all problems of dynamics.]
(J.-L. Lagrange 1788)

The problem of computing the dynamics of general mechanical systems began with
Galilei (published 1638) and Newton’s Principia (1687). The latter allowed one to
reduce the movement of free mass points (the “mass points” being such planets as
Mars or Jupiter) to the solution of differential equations (see Sect. I.2). But the
movement of more complicated systems such as rigid bodies or bodies attached to
each other by rods or springs was the subject of long and difficult developments,
until Lagrange (1760, 1788) found an elegant way of treating such problems in
general.

We suppose that the position of a mechanical system with $d$ degrees of freedom is
described by $q = (q_1, \ldots, q_d)^T$ as generalized coordinates (this can be for example
Cartesian coordinates, angles, arc lengths along a curve, etc.). The theory is then
built upon two pillars, namely an expression

$$T = T(q, \dot q) \tag{1.1}$$

which represents the kinetic energy (and which is often of the form $\tfrac12\dot q^TM(q)\dot q$
where $M(q)$ is symmetric and positive definite), and a function

$$U = U(q) \tag{1.2}$$

representing the potential energy. Then, after denoting by

$$L = T - U \tag{1.3}$$

the corresponding Lagrangian, the coordinates $q_1(t), \ldots, q_d(t)$ obey the differential
equations

$$\frac{d}{dt}\Bigl(\frac{\partial L}{\partial\dot q}\Bigr) = \frac{\partial L}{\partial q} , \tag{1.4}$$

which constitute the Lagrange equations of the system. A numerical (or analytical)
integration of these equations allows one to predict the motion of any such system
from given initial values (“Ce sont ces équations qui serviront à déterminer la courbe
décrite par le corps M et sa vitesse à chaque instant”, i.e., these are the equations
that serve to determine the curve described by the body M and its velocity at each
instant; Lagrange 1760, p. 369).

1 Joseph-Louis Lagrange, born: 25 January 1736 in Turin, Sardinia-Piedmont (now Italy),
died: 10 April 1813 in Paris.


Example 1.1. For a mass point of mass $m$ in $\mathbb{R}^3$ with Cartesian coordinates $x =
(x_1, x_2, x_3)^T$ we have $T(\dot x) = m(\dot x_1^2 + \dot x_2^2 + \dot x_3^2)/2$. We suppose the point to move
in a conservative force field $F(x) = -\nabla U(x)$. Then, the Lagrange equations (1.4)
become $m\ddot x = F(x)$, which is Newton’s second law. The equations (I.2.2) for the
planetary motion are precisely of this form.


Example 1.2 (Pendulum). For the mathematical pendulum of Sect. I.1 we take the
angle $\alpha$ as coordinate. The kinetic and potential energies are given by $T = m(\dot x^2 +
\dot y^2)/2 = m\ell^2\dot\alpha^2/2$ and $U = mgy = -mg\ell\cos\alpha$, respectively, so that the Lagrange
equations become $-mg\ell\sin\alpha - m\ell^2\ddot\alpha = 0$ or equivalently $\ddot\alpha + \frac{g}{\ell}\sin\alpha = 0$.


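VI.1.2 Hamilton’s Canonical Equations

An diese Hamiltonsche Form der Differentialgleichungen werden die
ferneren Untersuchungen, welche den Kern dieser Vorlesung bilden,
anknüpfen; das Bisherige ist als Einleitung dazu anzusehen.
[The further investigations, which form the core of this lecture, will build on this
Hamiltonian form of the differential equations; what has been said so far is to be
considered as an introduction to it.]

(C.G.J. Jacobi 1842, p. 143)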


Hamilton (1834) simplified the structure of Lagrange’s equations and turned them
into a form that has remarkable symmetry, by

• introducing Poisson’s variables, the conjugate momenta

$$p_k = \frac{\partial L}{\partial\dot q_k}(q, \dot q) \qquad\text{for } k = 1, \ldots, d, \tag{1.5}$$

• considering the Hamiltonian

$$H := p^T\dot q - L(q, \dot q) \tag{1.6}$$

as a function of $p$ and $q$, i.e., taking $H = H(p, q)$ obtained by expressing $\dot q$ as a
function of $p$ and $q$ via (1.5).

Here it is, of course, required that (1.5) defines, for every $q$, a continuously
differentiable bijection $\dot q \leftrightarrow p$. This map is called the Legendre transform.

Theorem 1.3. Lagrange’s equations (1.4) are equivalent to Hamilton’s equations

$$\dot p_k = -\frac{\partial H}{\partial q_k}(p, q), \qquad \dot q_k = \frac{\partial H}{\partial p_k}(p, q), \qquad k = 1, \ldots, d. \tag{1.7}$$

Proof. The definitions (1.5) and (1.6) for the momenta $p$ and for the Hamiltonian $H$
imply that

$$\begin{aligned}
\frac{\partial H}{\partial p} &= \dot q^T + p^T\frac{\partial\dot q}{\partial p} - \frac{\partial L}{\partial\dot q}\frac{\partial\dot q}{\partial p} = \dot q^T ,\\
\frac{\partial H}{\partial q} &= p^T\frac{\partial\dot q}{\partial q} - \frac{\partial L}{\partial q} - \frac{\partial L}{\partial\dot q}\frac{\partial\dot q}{\partial q} = -\frac{\partial L}{\partial q} .
\end{aligned}$$

The Lagrange equations (1.4) are therefore equivalent to (1.7). □

Case of Quadratic T. In the case that $T = \tfrac12\dot q^TM(q)\dot q$ is quadratic, where $M(q)$
is a symmetric and positive definite matrix, we have, for a fixed $q$, $p = M(q)\dot q$, so
that the existence of the Legendre transform is established. Further, by replacing the
variable $\dot q$ by $M(q)^{-1}p$ in the definition (1.6) of $H(p, q)$, we obtain

$$\begin{aligned}
H(p, q) &= p^TM(q)^{-1}p - L\bigl(q, M(q)^{-1}p\bigr)\\
&= p^TM(q)^{-1}p - \tfrac12\,p^TM(q)^{-1}p + U(q) = \tfrac12\,p^TM(q)^{-1}p + U(q)
\end{aligned}$$

and the Hamiltonian is $H = T + U$, which is the total energy of the mechanical
system.

In Chap. I we have seen several examples of Hamiltonian systems, e.g., the
pendulum (I.1.13), the Kepler problem (I.2.2), the outer solar system (I.2.12), etc. In the
following we consider Hamiltonian systems (1.7) where the Hamiltonian $H(p, q)$ is
arbitrary, and so not necessarily related to a mechanical problem.
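As a worked instance of the passage from (1.4) to (1.7), the Legendre transform can be carried out for the pendulum of Example 1.2, where $T = m\ell^2\dot\alpha^2/2$ and $U = -mg\ell\cos\alpha$. The derivation below only combines (1.5), (1.6) and (1.7) with these two expressions:

```latex
\begin{align*}
L(\alpha,\dot\alpha) &= \tfrac12\, m\ell^2\dot\alpha^2 + mg\ell\cos\alpha, \\
p &= \frac{\partial L}{\partial\dot\alpha} = m\ell^2\dot\alpha
   \quad\Longrightarrow\quad \dot\alpha = \frac{p}{m\ell^2}, \\
H(p,\alpha) &= p\,\dot\alpha - L = \frac{p^2}{2m\ell^2} - mg\ell\cos\alpha, \\
\dot p &= -\frac{\partial H}{\partial\alpha} = -mg\ell\sin\alpha, \qquad
\dot\alpha = \frac{\partial H}{\partial p} = \frac{p}{m\ell^2}.
\end{align*}
```

Eliminating $p$ from the last line gives $m\ell^2\ddot\alpha = -mg\ell\sin\alpha$, which is the Lagrange equation of Example 1.2, and $H = T + U$ as in the quadratic case.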


VI.2 Symplectic Transformations 

The name “complex group” formerly advocated by me in allusion to line 
complexes, ... has become more and more embarrassing through colli¬ 
sion with the word “complex” in the connotation of complex number. I 
therefore propose to replace it by the Greek adjective “symplectic.” 

(H. Weyl (1939), p. 165) 

A first property of Hamiltonian systems, already seen in Example 1.2 of Sect. IV.1,
is that the Hamiltonian $H(p, q)$ is a first integral of the system (1.7). In this section
we shall study another important property: the symplecticity of its flow. The basic
objects to be studied are two-dimensional parallelograms lying in $\mathbb{R}^{2d}$. We suppose
the parallelogram to be spanned by two vectors

$$\xi = \begin{pmatrix}\xi^p\\ \xi^q\end{pmatrix}, \qquad \eta = \begin{pmatrix}\eta^p\\ \eta^q\end{pmatrix}$$

in the $(p, q)$ space ($\xi^p, \xi^q, \eta^p, \eta^q$ are in $\mathbb{R}^d$) as

$$P = \{t\xi + s\eta \mid 0 \le t \le 1,\ 0 \le s \le 1\}.$$

In the case $d = 1$ we consider the oriented area

$$\text{or.area}(P) = \det\begin{pmatrix}\xi^p & \eta^p\\ \xi^q & \eta^q\end{pmatrix} = \xi^p\eta^q - \xi^q\eta^p \tag{2.1}$$

(see left picture of Fig. 2.1). In higher dimensions, we replace this by the sum of the
oriented areas of the projections of $P$ onto the coordinate planes $(p_i, q_i)$, i.e., by

$$\omega(\xi, \eta) := \sum_{i=1}^{d}\det\begin{pmatrix}\xi_i^p & \eta_i^p\\ \xi_i^q & \eta_i^q\end{pmatrix} = \sum_{i=1}^{d}\bigl(\xi_i^p\eta_i^q - \xi_i^q\eta_i^p\bigr). \tag{2.2}$$

This defines a bilinear map acting on vectors of $\mathbb{R}^{2d}$, which will play a central role
for Hamiltonian systems. In matrix notation, this map has the form

$$\omega(\xi, \eta) = \xi^TJ\eta \qquad\text{with}\qquad J = \begin{pmatrix}0 & I\\ -I & 0\end{pmatrix} \tag{2.3}$$

where $I$ is the identity matrix of dimension $d$.

Definition 2.1. A linear mapping $A : \mathbb{R}^{2d} \to \mathbb{R}^{2d}$ is called symplectic if

$$A^TJA = J$$

or, equivalently, if $\omega(A\xi, A\eta) = \omega(\xi, \eta)$ for all $\xi, \eta \in \mathbb{R}^{2d}$.

Fig. 2.1. Symplecticity (area preservation) of a linear mapping

In the case $d = 1$, where the expression $\omega(\xi, \eta)$ represents the area of the
parallelogram $P$, symplecticity of a linear mapping $A$ is therefore the area preservation
of $A$ (see Fig. 2.1). In the general case ($d > 1$), symplecticity means that the sum
of the oriented areas of the projections of $P$ onto $(p_i, q_i)$ is the same as that for the
transformed parallelograms $A(P)$.
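Definition 2.1 can be checked mechanically for a given matrix. The sketch below, with hypothetical helper names, evaluates $\omega$ as in (2.2) for vectors ordered as $(p_1,\ldots,p_d,q_1,\ldots,q_d)$ and tests $A^TJA = J$ entry by entry through the identity $(A^TJA)_{kl} = \omega(Ae_k, Ae_l)$. The example matrices are chosen for this illustration only.

```python
def omega(xi, eta):
    """omega(xi, eta) of (2.2); vectors are ordered (p_1..p_d, q_1..q_d)."""
    d = len(xi) // 2
    return sum(xi[i] * eta[d + i] - xi[d + i] * eta[i] for i in range(d))

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def is_symplectic(A, tol=1e-12):
    """Check A^T J A = J using (A^T J A)_{kl} = omega(A e_k, A e_l)."""
    n = len(A)
    basis = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    cols = [matvec(A, e) for e in basis]
    return all(abs(omega(cols[k], cols[l]) - omega(basis[k], basis[l])) <= tol
               for k in range(n) for l in range(n))

# d = 1: symplectic means area preserving (determinant 1)
shear = [[1.0, 0.5], [0.0, 1.0]]
stretch = [[2.0, 0.0], [0.0, 1.0]]

# d = 2: each (p_i, q_i) plane must keep its oriented area
paired = [[2.0, 0.0, 0.0, 0.0],
          [0.0, 4.0, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.0],
          [0.0, 0.0, 0.0, 0.25]]
volume_only = [[2.0, 0.0, 0.0, 0.0],
               [0.0, 0.5, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]]
```

Here `shear` and `paired` pass the test, `stretch` fails it, and `volume_only` fails it although its determinant is 1: for $d > 1$ the condition concerns the sum of the projected areas, not the $2d$-dimensional volume.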

We now turn our attention to nonlinear mappings. Differentiable functions can 
be locally approximated by linear mappings. This justifies the following definition. 

Definition 2.2. A differentiable map $g : U \to \mathbb{R}^{2d}$ (where $U \subset \mathbb{R}^{2d}$ is an open set)
is called symplectic if the Jacobian matrix $g'(p, q)$ is everywhere symplectic, i.e., if

$$g'(p, q)^TJ\,g'(p, q) = J \qquad\text{or}\qquad \omega\bigl(g'(p, q)\xi,\ g'(p, q)\eta\bigr) = \omega(\xi, \eta).$$

Let us give a geometric interpretation of symplecticity for nonlinear mappings.
Consider a 2-dimensional sub-manifold $M$ of the $2d$-dimensional set $U$, and
suppose that it is given as the image $M = \psi(K)$ of a compact set $K \subset \mathbb{R}^2$, where
$\psi(s, t)$ is a continuously differentiable function. The manifold $M$ can then be
considered as the limit of a union of small parallelograms spanned by the vectors

$$\frac{\partial\psi}{\partial s}(s, t)\,ds \qquad\text{and}\qquad \frac{\partial\psi}{\partial t}(s, t)\,dt.$$

For one such parallelogram we consider (as above) the sum over the oriented areas
of its projections onto the $(p_i, q_i)$ plane. We then sum over all parallelograms of the
manifold. In the limit this gives the expression

$$\Omega(M) = \iint_K \omega\Bigl(\frac{\partial\psi}{\partial s}(s, t),\ \frac{\partial\psi}{\partial t}(s, t)\Bigr)\,ds\,dt. \tag{2.4}$$

The transformation formula for double integrals implies that $\Omega(M)$ is independent
of the parametrization of $M$.

Lemma 2.3. If the mapping $g : U \to \mathbb{R}^{2d}$ is symplectic on $U$, then it preserves the
expression $\Omega(M)$, i.e.,

$$\Omega(g(M)) = \Omega(M)$$

holds for all 2-dimensional manifolds $M$ that can be represented as the image of a
continuously differentiable function $\psi$.

Proof. The manifold $g(M)$ can be parametrized by $g \circ \psi$. We have

$$\Omega(g(M)) = \iint_K \omega\Bigl(\frac{\partial(g\circ\psi)}{\partial s}(s, t),\ \frac{\partial(g\circ\psi)}{\partial t}(s, t)\Bigr)\,ds\,dt = \Omega(M),$$

because $(g \circ \psi)'(s, t) = g'(\psi(s, t))\,\psi'(s, t)$ and $g$ is a symplectic transformation. □

For $d = 1$, $M$ is already a subset of $\mathbb{R}^2$ and we choose $K = M$ with $\psi$ the
identity map. In this case, $\Omega(M) = \iint_M ds\,dt$ represents the area of $M$. Hence,
Lemma 2.3 states that all symplectic mappings (also nonlinear ones) are area
preserving.

We are now able to prove the main result of this section. We use the notation
$y = (p, q)$, and we write the Hamiltonian system (1.7) in the form

$$\dot y = J^{-1}\nabla H(y), \tag{2.5}$$

where $J$ is the matrix of (2.3) and $\nabla H(y) = H'(y)^T$.

Recall that the flow $\varphi_t : U \to \mathbb{R}^{2d}$ of a Hamiltonian system is the mapping that
advances the solution by time $t$, i.e., $\varphi_t(p_0, q_0) = (p(t, p_0, q_0), q(t, p_0, q_0))$, where
$p(t, p_0, q_0)$, $q(t, p_0, q_0)$ is the solution of the system corresponding to initial values
$p(0) = p_0$, $q(0) = q_0$.

Theorem 2.4 (Poincaré 1899). Let $H(p, q)$ be a twice continuously differentiable
function on $U \subset \mathbb{R}^{2d}$. Then, for each fixed $t$, the flow $\varphi_t$ is a symplectic
transformation wherever it is defined.

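Proof. The derivative $\partial\varphi_t/\partial y_0$ (with $y_0 = (p_0, q_0)$) is a solution of the
variational equation which, for the Hamiltonian system (2.5), is of the form $\dot\Psi =
J^{-1}\nabla^2H(\varphi_t(y_0))\Psi$, where $\nabla^2H(p, q)$ is the Hessian matrix of $H(p, q)$ ($\nabla^2H(p, q)$
is symmetric). We therefore obtain

$$\begin{aligned}
\frac{d}{dt}\Bigl(\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)^TJ\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)\Bigr)
&= \Bigl(\frac{d}{dt}\frac{\partial\varphi_t}{\partial y_0}\Bigr)^TJ\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr) + \Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)^TJ\Bigl(\frac{d}{dt}\frac{\partial\varphi_t}{\partial y_0}\Bigr)\\
&= \Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)^T\nabla^2H(\varphi_t(y_0))\,J^{-T}J\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr) + \Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)^T\nabla^2H(\varphi_t(y_0))\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr) = 0,
\end{aligned}$$

because $J^T = -J$ and $J^{-T}J = -I$. Since the relation

$$\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)^TJ\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr) = J \tag{2.6}$$

is satisfied for $t = 0$ ($\varphi_0$ is the identity map), it is satisfied for all $t$ and all $(p_0, q_0)$,
as long as the solution remains in the domain of definition of $H$. □

Fig. 2.2. Area preservation of the flow of Hamiltonian systems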

Example 2.5. We illustrate this theorem with the pendulum problem (Example 1.2)
using the normalization $m = \ell = g = 1$. We have $q = \alpha$, $p = \dot\alpha$, and the
Hamiltonian is given by

$$H(p, q) = p^2/2 - \cos q.$$

Fig. 2.2 shows level curves of this function, and it also illustrates the area
preservation of the flow $\varphi_t$. Indeed, by Theorem 2.4 and Lemma 2.3, the areas of $A$ and
$\varphi_t(A)$ as well as those of $B$ and $\varphi_t(B)$ are the same, although their appearance is
completely different.
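The statement of Theorem 2.4 for the pendulum can be observed numerically by integrating the variational equation $\dot\Psi = J^{-1}\nabla^2H(\varphi_t(y_0))\Psi$ alongside the solution. The sketch below does this with the classical Runge-Kutta method of order 4 and a small step size, so the result is only an approximation of the exact flow; the integrator, the step count and all names are assumptions of this illustration. For $d = 1$ one has $\Psi^TJ\Psi = \det(\Psi)\,J$, so symplecticity amounts to $\det\Psi = 1$.

```python
import math

def rhs(state):
    """Pendulum p' = -sin q, q' = p, together with the variational equation
    for Psi = [[a, b], [c, d]] = d(p, q)/d(p0, q0)."""
    p, q, a, b, c, d = state
    k = math.cos(q)                  # H_qq = cos q, H_pp = 1, H_pq = 0
    return (-math.sin(q), p, -k * c, -k * d, a, b)

def rk4_step(state, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    def shifted(s, k, c):
        return tuple(x + c * y for x, y in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * h))
    k3 = rhs(shifted(state, k2, 0.5 * h))
    k4 = rhs(shifted(state, k3, h))
    return tuple(x + h / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
                 for x, s1, s2, s3, s4 in zip(state, k1, k2, k3, k4))

def flow_jacobian(p0, q0, t, n=2000):
    """Approximate d(phi_t)/d(y0) at y0 = (p0, q0); Psi(0) is the identity."""
    state = (p0, q0, 1.0, 0.0, 0.0, 1.0)
    h = t / n
    for _ in range(n):
        state = rk4_step(state, h)
    return state[2:]

def area_factor(p0, q0, t):
    """det(Psi); equals 1 exactly for the true flow."""
    a, b, c, d = flow_jacobian(p0, q0, t)
    return a * d - b * c
```

For instance, `area_factor(1.0, 0.5, 3.0)` should agree with 1 up to the discretization error of the integrator, although the entries of $\Psi$ themselves are far from those of the identity.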

We next show that symplecticity of the flow is a characteristic property for
Hamiltonian systems. We call a differential equation $\dot y = f(y)$ locally Hamiltonian,
if for every $y_0 \in U$ there exists a neighbourhood where $f(y) = J^{-1}\nabla H(y)$ for
some function $H$.

Theorem 2.6. Let $f : U \to \mathbb{R}^{2d}$ be continuously differentiable. Then, $\dot y = f(y)$ is
locally Hamiltonian if and only if its flow $\varphi_t(y)$ is symplectic for all $y \in U$ and for
all sufficiently small $t$.

Proof. The necessity follows from Theorem 2.4. We therefore assume that the flow
$\varphi_t$ is symplectic, and we have to prove the local existence of a function $H(y)$ such
that $f(y) = J^{-1}\nabla H(y)$. Differentiating (2.6) and using the fact that $\partial\varphi_t/\partial y_0$ is a
solution of the variational equation $\dot\Psi = f'(\varphi_t(y_0))\Psi$, we obtain

$$\frac{d}{dt}\Bigl(\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)^TJ\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)\Bigr)
= \Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr)^T\Bigl(f'(\varphi_t(y_0))^TJ + Jf'(\varphi_t(y_0))\Bigr)\Bigl(\frac{\partial\varphi_t}{\partial y_0}\Bigr) = 0.$$

Putting $t = 0$, it follows from $J = -J^T$ that $Jf'(y_0)$ is a symmetric matrix for
all $y_0$. The Integrability Lemma 2.7 below shows that $Jf(y)$ can be written as the
gradient of a function $H(y)$. □


The following integrability condition for the existence of a potential was already 
known to Euler and Lagrange (see e.g., Euler’s Opera Omnia , vol. 19. p.2-3, or 
Lagrange (1760), p. 375). 


Lemma 2.7 (Integrability Lemma). Let $D \subset \mathbb{R}^n$ be open and $f : D \to \mathbb{R}^n$ be
continuously differentiable, and assume that the Jacobian $f'(y)$ is symmetric for all
$y \in D$. Then, for every $y_0 \in D$ there exists a neighbourhood and a function $H(y)$
such that

$$f(y) = \nabla H(y) \tag{2.7}$$

on this neighbourhood. In other words, the differential form $f_1(y)\,dy_1 + \ldots +
f_n(y)\,dy_n = dH$ is a total differential.

Proof. Assume $y_0 = 0$, and consider a ball around $y_0$ which is contained in $D$. On
this ball we define

$$H(y) = \int_0^1 y^Tf(ty)\,dt + \text{Const}.$$

Differentiation with respect to $y_k$, and using the symmetry assumption $\partial f_i/\partial y_k =
\partial f_k/\partial y_i$ yields

$$\frac{\partial H}{\partial y_k}(y) = \int_0^1\Bigl(f_k(ty) + y^T\frac{\partial f}{\partial y_k}(ty)\,t\Bigr)dt = \int_0^1\frac{d}{dt}\bigl(t\,f_k(ty)\bigr)\,dt = f_k(y),$$

which proves the statement. □
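The construction in the proof of Lemma 2.7 is easy to try out. The sketch below uses a hypothetical example field $f(y) = (2y_1 + y_2,\ y_1 + 3y_2^2)$, whose Jacobian is symmetric, evaluates $H(y) = \int_0^1 y^Tf(ty)\,dt$ (with Const $= 0$) by composite Simpson quadrature, and compares a finite-difference gradient of $H$ with $f$. The field, the quadrature rule and the helper names are assumptions of this illustration.

```python
def f(y):
    """Example field with symmetric Jacobian [[2, 1], [1, 6*y2]]."""
    y1, y2 = y
    return (2.0 * y1 + y2, y1 + 3.0 * y2 ** 2)

def potential(y, n=20):
    """H(y) = int_0^1 y^T f(t y) dt by composite Simpson's rule (n even)."""
    def g(t):
        ft = f((t * y[0], t * y[1]))
        return y[0] * ft[0] + y[1] * ft[1]
    h = 1.0 / n
    s = g(0.0) + g(1.0)
    s += 4.0 * sum(g((2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2.0 * sum(g(2 * k * h) for k in range(1, n // 2))
    return s * h / 3.0

def gradient(fun, y, eps=1e-5):
    """Central-difference approximation of the gradient of fun at y."""
    grad = []
    for k in range(len(y)):
        yp, ym = list(y), list(y)
        yp[k] += eps
        ym[k] -= eps
        grad.append((fun(yp) - fun(ym)) / (2.0 * eps))
    return tuple(grad)
```

For this field the line integral can also be evaluated by hand, giving $H(y) = y_1^2 + y_1y_2 + y_2^3$, so `potential` and `gradient(potential, y)` can be compared with this closed form and with `f(y)`.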

For $D = \mathbb{R}^{2d}$ or for star-shaped regions $D$, the above proof shows that the
function $H$ of Lemma 2.7 is globally defined. Hence the Hamiltonian of Theorem 2.6
is also globally defined in this case. This remains valid for simply connected sets
$D$. A counter-example, which shows that the existence of a global Hamiltonian in
Theorem 2.6 is not true for general $D$, is given in Exercise 6.

An important property of symplectic transformations, which goes back to Jacobi
(1836, “Theorem X”), is that they preserve the Hamiltonian character of the
differential equation. Such transformations have been termed canonical since the 19th
century. The next theorem shows that canonical and symplectic transformations are
the same.

Theorem 2.8. Let $\psi : U \to V$ be a change of coordinates such that $\psi$ and $\psi^{-1}$
are continuously differentiable functions. If $\psi$ is symplectic, the Hamiltonian system
$\dot y = J^{-1}\nabla H(y)$ becomes in the new variables $z = \psi(y)$

$$\dot z = J^{-1}\nabla K(z) \qquad\text{with}\qquad K(z) = H(y). \tag{2.8}$$

Conversely, if $\psi$ transforms every Hamiltonian system to another Hamiltonian
system via (2.8), then $\psi$ is symplectic.

Proof. Since $\dot z = \psi'(y)\dot y$ and $\psi'(y)^T\nabla K(z) = \nabla H(y)$, the Hamiltonian system
$\dot y = J^{-1}\nabla H(y)$ becomes

$$\dot z = \psi'(y)J^{-1}\psi'(y)^T\nabla K(z) \tag{2.9}$$

in the new variables. It is equivalent to (2.8) if

$$\psi'(y)J^{-1}\psi'(y)^T = J^{-1}. \tag{2.10}$$

Multiplying this relation from the right by $\psi'(y)^{-T}$ and from the left by $\psi'(y)^{-1}$
and then taking its inverse yields $J = \psi'(y)^TJ\psi'(y)$, which shows that (2.10) is
equivalent to the symplecticity of $\psi$.

For the inverse relation we note that (2.9) is Hamiltonian for all $K(z)$ if and
only if (2.10) holds. □


VI.3 First Examples of Symplectic Integrators 


Since symplecticity is a characteristic property of Hamiltonian systems
(Theorem 2.6), it is natural to search for numerical methods that share this property.
Pioneering work on symplectic integration is due to de Vogelaere (1956), Ruth (1983),
and Feng Kang (1985). Books on the now well-developed subject are Sanz-Serna &
Calvo (1994) and Leimkuhler & Reich (2004).

Definition 3.1. A numerical one-step method is called symplectic if the one-step map

$$y_1 = \Phi_h(y_0)$$

is symplectic whenever the method is applied to a smooth Hamiltonian system.

Feng Kang 2

2 Feng Kang, born: 9 September 1920 in Nanjing (China), died: 17 August 1993 in Beijing;
picture obtained from Yuming Shi with the help of Yifa Tang.

Fig. 3.1. Area preservation of numerical methods for the pendulum; same initial sets as in
Fig. 2.2; first order methods (left column): $h = \pi/4$; second order methods (right column):
$h = \pi/3$; dashed: exact flow


Example 3.2. We consider the pendulum problem of Example 2.5 with the same
initial sets as in Fig. 2.2. We apply six different numerical methods to this problem:
the explicit Euler method (I.1.5), the symplectic Euler method (I.1.9), and the
implicit Euler method (I.1.6), as well as the second order method of Runge (II.1.3)
(the right one), the Störmer-Verlet scheme (I.1.17), and the implicit midpoint rule
(I.1.7). For two sets of initial values $(p_0, q_0)$ we compute several steps with step size
$h = \pi/4$ for the first order methods, and $h = \pi/3$ for the second order methods.
One clearly observes in Fig. 3.1 that the explicit Euler, the implicit Euler and the
second order explicit method of Runge are not symplectic (not area preserving). We
shall prove below that the other methods are symplectic. A different proof of their
symplecticity (using generating functions) will be given in Sect. VI.5.
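The observation of Example 3.2 can be reproduced for two of the methods without plotting. For the pendulum $H(p,q) = p^2/2 - \cos q$, the sketch below computes the Jacobian determinant of one step of the explicit Euler method and of the symplectic Euler method (the variant with $p_{n+1}$ in the argument) by central differences. For $d = 1$ the one-step map is symplectic exactly when this determinant equals 1. The function names and the finite-difference check are assumptions of this illustration.

```python
import math

def explicit_euler(p, q, h):
    """Explicit Euler step for p' = -sin q, q' = p."""
    return p - h * math.sin(q), q + h * p

def symplectic_euler(p, q, h):
    """Symplectic Euler step: the new p is used in the update of q."""
    p1 = p - h * math.sin(q)       # H_q = sin q does not involve p: explicit
    return p1, q + h * p1

def jacobian_det(method, p, q, h, eps=1e-6):
    """Determinant of d(p1, q1)/d(p, q), approximated by central differences."""
    pp, qp = method(p + eps, q, h)
    pm, qm = method(p - eps, q, h)
    a, c = (pp - pm) / (2.0 * eps), (qp - qm) / (2.0 * eps)
    pp, qp = method(p, q + eps, h)
    pm, qm = method(p, q - eps, h)
    b, d = (pp - pm) / (2.0 * eps), (qp - qm) / (2.0 * eps)
    return a * d - b * c
```

A direct calculation gives the determinant $1 + h^2\cos q$ for the explicit Euler step and $1$ for the symplectic Euler step, which is what `jacobian_det` should return up to the finite-difference error.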

In the following we show the symplecticity of various numerical methods from
Chapters I and II when they are applied to the Hamiltonian system in the
variables $y = (p, q)$,

$$\dot p = -H_q(p, q), \qquad \dot q = H_p(p, q), \qquad\text{or equivalently}\qquad \dot y = J^{-1}\nabla H(y),$$

where $H_p$ and $H_q$ denote the column vectors of partial derivatives of the
Hamiltonian $H(p, q)$ with respect to $p$ and $q$, respectively.

Theorem 3.3 (de Vogelaere 1956). The so-called symplectic Euler methods (I.1.9)

$$\begin{aligned} p_{n+1} &= p_n - hH_q(p_{n+1}, q_n) \\ q_{n+1} &= q_n + hH_p(p_{n+1}, q_n) \end{aligned} \qquad\text{or}\qquad \begin{aligned} p_{n+1} &= p_n - hH_q(p_n, q_{n+1}) \\ q_{n+1} &= q_n + hH_p(p_n, q_{n+1}) \end{aligned} \tag{3.1}$$

are symplectic methods of order 1.


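Proof. We consider only the method to the left of (3.1). Differentiation with respect
to $(p_n, q_n)$ yields

$$\begin{pmatrix} I + hH_{qp}^T & 0 \\ -hH_{pp} & I \end{pmatrix} \left(\frac{\partial(p_{n+1}, q_{n+1})}{\partial(p_n, q_n)}\right) = \begin{pmatrix} I & -hH_{qq} \\ 0 & I + hH_{qp} \end{pmatrix},$$

where the matrices $H_{qp}, H_{pp}, \ldots$ of partial derivatives are all evaluated at $(p_{n+1}, q_n)$.
This relation allows us to compute $\frac{\partial(p_{n+1}, q_{n+1})}{\partial(p_n, q_n)}$ and to check in a straightforward
way the symplecticity condition

$$\left(\frac{\partial(p_{n+1}, q_{n+1})}{\partial(p_n, q_n)}\right)^T J \left(\frac{\partial(p_{n+1}, q_{n+1})}{\partial(p_n, q_n)}\right) = J. \qquad \square$$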

The methods (3.1) are implicit for general Hamiltonian systems. For separable
$H(p, q) = T(p) + U(q)$, however, both variants turn out to be explicit. It is
interesting to mention that there are more general situations where the symplectic
Euler methods are explicit. If, for a suitable ordering of the components,

$$\frac{\partial H}{\partial q_i}(p, q) \quad\text{does not depend on } p_j \text{ for } j \ge i, \tag{3.2}$$

then the left method of (3.1) is explicit, and the components of $p_{n+1}$ can be
computed one after the other. If, for a possibly different ordering of the components,

$$\frac{\partial H}{\partial p_i}(p, q) \quad\text{does not depend on } q_j \text{ for } j \ge i, \tag{3.3}$$

then the right method of (3.1) is explicit. As an example consider the Hamiltonian

$$H(p_r, p_\varphi, r, \varphi) = \frac12\bigl(p_r^2 + r^{-2}p_\varphi^2\bigr) - r\cos\varphi + (r - 1)^2,$$

which models a spring pendulum in polar coordinates. For the ordering $\varphi < r$,
condition (3.2) is fulfilled, and for the inverse ordering $r < \varphi$ condition (3.3).
Consequently, both symplectic Euler methods are explicit for this problem. The methods
remain explicit if the conditions (3.2) and (3.3) hold for blocks of components
instead of single components.
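The explicitness claim for the spring pendulum can be made concrete. The sketch below is an illustration under the assumption that the Hamiltonian is $H = \frac12(p_r^2 + r^{-2}p_\varphi^2) - r\cos\varphi + (r-1)^2$; the function names `grad_H` and `symplectic_euler_step` are hypothetical. It carries out one step of the left method of (3.1) by updating $p_\varphi$ first and $p_r$ second, so that no nonlinear equation has to be solved.

```python
import math

def grad_H(pr, pphi, r, phi):
    """Partial derivatives of H = (pr^2 + pphi^2/r^2)/2 - r*cos(phi) + (r-1)^2."""
    H_pr = pr
    H_pphi = pphi / r**2
    H_r = -pphi**2 / r**3 - math.cos(phi) + 2.0 * (r - 1.0)
    H_phi = r * math.sin(phi)
    return H_pr, H_pphi, H_r, H_phi

def symplectic_euler_step(pr, pphi, r, phi, h):
    # Left method of (3.1): p_new = p - h*H_q(p_new, q), q_new = q + h*H_p(p_new, q).
    # H_phi does not depend on p at all, so p_phi can be updated first.
    pphi_new = pphi - h * r * math.sin(phi)
    # H_r depends on p_phi only, and p_phi is already known at the new level.
    pr_new = pr - h * (-pphi_new**2 / r**3 - math.cos(phi) + 2.0 * (r - 1.0))
    # Positions use the new momenta and the old positions.
    r_new = r + h * pr_new
    phi_new = phi + h * pphi_new / r**2
    return pr_new, pphi_new, r_new, phi_new
```

The update order mirrors the ordering $\varphi < r$ of condition (3.2): each momentum component is computed from quantities that are already available.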

We consider next the extension of the Störmer-Verlet scheme (I.1.17), considered
in Table II.2.1.


Theorem 3.4. The Störmer-Verlet schemes (I.1.17)

$$\begin{aligned}
p_{n+1/2} &= p_n - \frac h2 H_q(p_{n+1/2}, q_n) \\
q_{n+1} &= q_n + \frac h2\bigl(H_p(p_{n+1/2}, q_n) + H_p(p_{n+1/2}, q_{n+1})\bigr) \\
p_{n+1} &= p_{n+1/2} - \frac h2 H_q(p_{n+1/2}, q_{n+1})
\end{aligned} \tag{3.4}$$

and

$$\begin{aligned}
q_{n+1/2} &= q_n + \frac h2 H_p(p_n, q_{n+1/2}) \\
p_{n+1} &= p_n - \frac h2\bigl(H_q(p_n, q_{n+1/2}) + H_q(p_{n+1}, q_{n+1/2})\bigr) \\
q_{n+1} &= q_{n+1/2} + \frac h2 H_p(p_{n+1}, q_{n+1/2})
\end{aligned} \tag{3.5}$$

are symplectic methods of order 2.

Proof. This is an immediate consequence of the fact that the Störmer-Verlet scheme
is the composition of the two symplectic Euler methods (3.1). Order 2 follows from
its symmetry. □

We note that the Störmer-Verlet methods (3.4) and (3.5) are explicit for separable
problems and for Hamiltonians that satisfy both conditions (3.2) and (3.3).
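The composition argument in the proof of Theorem 3.4 can be verified directly for a separable Hamiltonian. The following sketch assumes $T(p) = p^2/2$ and $U(q) = -\cos q$ (the pendulum); the function names are illustrative. It compares scheme (3.4) with a half step of the left symplectic Euler method followed by a half step of the right one.

```python
import math

def dU(q):
    # U(q) = -cos(q)
    return math.sin(q)

def dT(p):
    # T(p) = p**2 / 2
    return p

def euler_left(p, q, h):
    # p_new = p - h*U'(q), q_new = q + h*T'(p_new)
    p1 = p - h * dU(q)
    return p1, q + h * dT(p1)

def euler_right(p, q, h):
    # q_new = q + h*T'(p), p_new = p - h*U'(q_new)
    q1 = q + h * dT(p)
    return p - h * dU(q1), q1

def stormer_verlet(p, q, h):
    # Scheme (3.4) specialised to H = T(p) + U(q).
    p_half = p - 0.5 * h * dU(q)
    q1 = q + h * dT(p_half)
    return p_half - 0.5 * h * dU(q1), q1

def composed(p, q, h):
    # Right symplectic Euler (step h/2) applied after left symplectic Euler (step h/2).
    return euler_right(*euler_left(p, q, h / 2), h / 2)
```

Both functions return the same pair up to rounding, which is the statement used in the proof; composing in the opposite order gives scheme (3.5) instead.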

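Theorem 3.5. The implicit midpoint rule

$$y_{n+1} = y_n + hJ^{-1}\nabla H\bigl((y_{n+1} + y_n)/2\bigr) \tag{3.6}$$

is a symplectic method of order 2.

Proof. Differentiation of (3.6) yields

$$\Bigl(I - \frac h2 J^{-1}\nabla^2H\Bigr)\Bigl(\frac{\partial y_{n+1}}{\partial y_n}\Bigr) = \Bigl(I + \frac h2 J^{-1}\nabla^2H\Bigr).$$

Again it is straightforward to verify that $\bigl(\frac{\partial y_{n+1}}{\partial y_n}\bigr)^T J \bigl(\frac{\partial y_{n+1}}{\partial y_n}\bigr) = J$. Due to its
symmetry, the midpoint rule is known to be of order 2 (see Theorem II.3.2). □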

The next two theorems are a consequence of the fact that the composition of 
symplectic transformations is again symplectic. They are also used to prove the 
existence of symplectic methods of arbitrarily high order, and to explain why the 
theory of composition methods of Chapters II and III is so important for geometric 
integration. 

Theorem 3.6. Let $\Phi_h$ denote the symplectic Euler method (3.1). Then, the
composition method (II.4.6) is symplectic for every choice of the parameters $\alpha_i, \beta_i$.

If $\Phi_h$ is symplectic and symmetric (e.g., the implicit midpoint rule or the
Störmer-Verlet scheme), then the composition method (V.3.8) is symplectic too. □

Theorem 3.7. Assume that the Hamiltonian is given by $H(y) = H_1(y) + H_2(y)$,
and consider the splitting

$$\dot y = J^{-1}\nabla H(y) = J^{-1}\nabla H_1(y) + J^{-1}\nabla H_2(y).$$

The splitting method (II.5.6) is then symplectic. □


VI.4 Symplectic Runge-Kutta Methods 



The systematic study of symplectic Runge-Kutta methods started around 1988, and 
a complete characterization has been found independently by Lasagni (1988) (using 
the approach of generating functions), and by Sanz-Serna (1988) and Suris (1988) 
(using the ideas of the classical papers of Burrage & Butcher (1979) and Crouzeix 
(1979) on algebraic stability). 


VI.4.1 Criterion of Symplecticity 


We follow the approach of Bochev & Scovel (1994), which is based on the following 
important lemma. 


Lemma 4.1. For Runge-Kutta methods and for partitioned Runge-Kutta methods
the following diagram commutes:

$$\begin{array}{ccc}
\dot y = f(y),\ y(0) = y_0 & \longrightarrow & \begin{array}{l} \dot y = f(y),\ y(0) = y_0 \\ \dot\Psi = f'(y)\Psi,\ \Psi(0) = I \end{array} \\
\Big\downarrow\ \text{method} & & \Big\downarrow\ \text{method} \\
\{y_n\} & \longrightarrow & \{y_n, \Psi_n\}
\end{array}$$

(horizontal arrows mean a differentiation with respect to $y_0$). Therefore, the
numerical result $y_n, \Psi_n$, obtained from applying the method to the problem augmented by
its variational equation, is equal to the numerical solution for $\dot y = f(y)$ augmented
by its derivative $\Psi_n = \partial y_n/\partial y_0$.


Proof. The result is proved by implicit differentiation. Let us illustrate this for the
explicit Euler method

$$y_{n+1} = y_n + hf(y_n).$$

We consider $y_n$ and $y_{n+1}$ as functions of $y_0$, and we differentiate with respect to $y_0$
the equation defining the numerical method. For the Euler method this gives

$$\frac{\partial y_{n+1}}{\partial y_0} = \frac{\partial y_n}{\partial y_0} + hf'(y_n)\frac{\partial y_n}{\partial y_0},$$

which is exactly the relation that we get from applying the method to the variational
equation. Since $\partial y_0/\partial y_0 = I$, we have $\partial y_n/\partial y_0 = \Psi_n$ for all $n$. □
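The commuting diagram of Lemma 4.1 can be checked numerically for the explicit Euler method. The sketch below is a small illustration, assuming the pendulum vector field $f(p, q) = (-\sin q,\ p)$; the names `euler_augmented`, `f` and `fprime` are hypothetical. It applies Euler's method to the system augmented by its variational equation, and the resulting matrix can then be compared with a finite-difference derivative of $y_n$ with respect to $y_0$.

```python
import math

def f(y):
    p, q = y
    return (-math.sin(q), p)

def fprime(y):
    # Jacobian of f with respect to (p, q).
    p, q = y
    return ((0.0, -math.cos(q)), (1.0, 0.0))

def euler_augmented(y0, h, n):
    """Explicit Euler applied to y' = f(y), Psi' = f'(y) Psi, Psi(0) = I."""
    y = y0
    Psi = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        A = fprime(y)  # evaluated at y_n, before y is advanced
        # Psi_{n+1} = Psi_n + h * f'(y_n) * Psi_n
        Psi = tuple(
            tuple(Psi[i][j] + h * sum(A[i][k] * Psi[k][j] for k in range(2))
                  for j in range(2))
            for i in range(2))
        fy = f(y)
        y = (y[0] + h * fy[0], y[1] + h * fy[1])
    return y, Psi
```

Perturbing one component of `y0` by a small amount and differencing the first return value reproduces the corresponding column of `Psi` up to the finite-difference error.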


The main observation now is that the symplecticity condition (2.6) is a quadratic
first integral of the variational equation: we write the Hamiltonian system together
with its variational equation as

$$\dot y = J^{-1}\nabla H(y), \qquad \dot\Psi = J^{-1}\nabla^2H(y)\Psi. \tag{4.1}$$

It follows from

$$\bigl(J^{-1}\nabla^2H(y)\Psi\bigr)^T J\Psi + \Psi^TJ\bigl(J^{-1}\nabla^2H(y)\Psi\bigr) = 0$$

(see also the proof of Theorem 2.4) that $\Psi^TJ\Psi$ is a quadratic first integral of the
augmented system (4.1).

Therefore, every Runge-Kutta method that preserves quadratic first integrals, is 
a symplectic method. From Theorem IV.2.1 and Theorem IV.2.2 we thus obtain the 
following results. 

Theorem 4.2. The Gauss collocation methods of Sect. II.1.3 are symplectic. □

Theorem 4.3. If the coefficients of a Runge-Kutta method satisfy

$$b_ia_{ij} + b_ja_{ji} = b_ib_j \quad\text{for all } i, j = 1, \ldots, s, \tag{4.2}$$

then it is symplectic. □
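Condition (4.2) is easy to test for a given tableau. The following sketch uses the standard coefficients of the implicit midpoint rule, the two-stage Gauss method, the trapezoidal rule and the explicit Euler method; these tableaux are stated here as assumptions, since they are not listed in this section, and the helper name `satisfies_4_2` is illustrative.

```python
import math

def satisfies_4_2(A, b, tol=1e-12):
    """Check b_i*a_ij + b_j*a_ji = b_i*b_j for all i, j (condition (4.2))."""
    s = len(b)
    return all(abs(b[i] * A[i][j] + b[j] * A[j][i] - b[i] * b[j]) <= tol
               for i in range(s) for j in range(s))

r3 = math.sqrt(3.0)
midpoint = ([[0.5]], [1.0])
gauss2 = ([[0.25, 0.25 - r3 / 6.0],
           [0.25 + r3 / 6.0, 0.25]], [0.5, 0.5])
trapezoidal = ([[0.0, 0.0], [0.5, 0.5]], [0.5, 0.5])
explicit_euler = ([[0.0]], [1.0])

for name, (A, b) in [("midpoint", midpoint), ("Gauss s=2", gauss2),
                     ("trapezoidal", trapezoidal), ("explicit Euler", explicit_euler)]:
    print(name, satisfies_4_2(A, b))
```

Note that (4.2) is stated in Theorem 4.3 as a sufficient condition; a tableau that fails the check is not thereby proved non-symplectic without the necessity discussion referred to later in the chapter.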

Similar to the situation in Theorem V.2.4, diagonally implicit, symplectic Runge- 
Kutta methods are composition methods. 

Theorem 4.4. A diagonally implicit Runge-Kutta method satisfying the
symplecticity condition (4.2) and $b_i \ne 0$ is equivalent to the composition

$$\Phi^M_{b_sh} \circ \cdots \circ \Phi^M_{b_2h} \circ \Phi^M_{b_1h},$$

where $\Phi^M_h$ stands for the implicit midpoint rule.


Proof. For $i = j$ condition (4.2) gives $a_{ii} = b_i/2$ and, together with $a_{ji} = 0$ (for
$i > j$), implies $a_{ij} = b_j$. This proves the statement. □


The assumption “$b_i \ne 0$” is not restrictive in the sense that for diagonally
implicit Runge-Kutta methods satisfying (4.2) the internal stages corresponding to
“$b_i = 0$” do not influence the numerical result and can be removed.

To understand the symplecticity of partitioned Runge-Kutta methods, we write
the solution $\Psi$ of the variational equation as

$$\Psi = \begin{pmatrix} \Psi^p \\ \Psi^q \end{pmatrix}.$$

Then, the Hamiltonian system together with its variational equation (4.1) is a
partitioned system with variables $(p, \Psi^p)$ and $(q, \Psi^q)$. Every component of

$$\Psi^TJ\Psi = (\Psi^p)^T\Psi^q - (\Psi^q)^T\Psi^p$$

is of the form (IV.2.5), so that Theorem IV.2.3 and Theorem IV.2.4 yield the fol¬ 
lowing results. 

Theorem 4.5. The Lobatto IIIA - IIIB pair is a symplectic method. □

Theorem 4.6. If the coefficients of a partitioned Runge-Kutta method (II.2.2)
satisfy

$$b_i\hat a_{ij} + \hat b_ja_{ji} = b_i\hat b_j \quad\text{for } i, j = 1, \ldots, s, \tag{4.3}$$

$$b_i = \hat b_i \quad\text{for } i = 1, \ldots, s, \tag{4.4}$$

then it is symplectic.

If the Hamiltonian is of the form $H(p, q) = T(p) + U(q)$, i.e., it is separable,
then the condition (4.3) alone implies the symplecticity of the numerical flow. □

We have seen in Sect. V.2.2 that within the class of partitioned Runge-Kutta
methods it is possible to get explicit, symmetric methods for separable systems
$\dot y = f(z)$, $\dot z = g(y)$. A similar result holds for symplectic methods. However, as in
Theorem V.2.6, such methods are not more general than composition or splitting
methods as considered in Sect. II.5. This was first observed by Okunbor &
Skeel (1992).


Theorem 4.7. Consider a partitioned Runge-Kutta method based on two
diagonally implicit methods (i.e., $a_{ji} = \hat a_{ji} = 0$ for $i > j$), assume $a_{ii} \cdot \hat a_{ii} = 0$ for all
$i$, and apply it to a separable Hamiltonian system with $H(p, q) = T(p) + U(q)$. If
(4.3) holds, then the numerical result is the same as that obtained from the splitting
method (II.5.6).

By (II.5.8), such a method is equivalent to a composition of symplectic Euler
steps.


Proof. We first notice that the stage values $k_i = f(Z_i)$ (for $i$ with $b_i = 0$) and
$\ell_i = g(Y_i)$ (for $i$ with $\hat b_i = 0$) do not influence the numerical solution and can be
removed. This yields a scheme with non-zero $b_i$ and $\hat b_i$, but with possibly non-square
matrices $(a_{ij})$ and $(\hat a_{ij})$.

Since the method is explicit for separable problems, one of the reduced matrices
$(a_{ij})$ or $(\hat a_{ij})$ has a row consisting only of zeros. Assume that it is the first row of
$(a_{ij})$, so that $a_{1j} = 0$ for all $j$. The symplecticity condition thus implies
$\hat a_{i1} = \hat b_1 \ne 0$ for all $i \ge 1$, and $a_{i1} = b_1 \ne 0$ for $i \ge 2$. This then yields $\hat a_{22} \ne 0$, because
otherwise the first two stages of $(\hat a_{ij})$ would be identical and one could be removed.
By our assumption we get $a_{22} = 0$, $\hat a_{i2} = \hat b_2 \ne 0$ for $i \ge 2$, and $a_{i2} = b_2$ for $i \ge 3$.
Continuing this procedure we see that the method becomes

$$\cdots \circ \varphi^{[2]}_{b_2h} \circ \varphi^{[1]}_{\hat b_2h} \circ \varphi^{[2]}_{b_1h} \circ \varphi^{[1]}_{\hat b_1h},$$

where $\varphi^{[1]}_t$ and $\varphi^{[2]}_t$ are the exact flows corresponding to the Hamiltonians $T(p)$ and
$U(q)$, respectively. □


The necessity of the conditions of Theorem 4.3 and Theorem 4.6 for symplectic
(partitioned) Runge-Kutta methods will be discussed at the end of this chapter in
Sect. VI.7.4.

A second order differential equation $\ddot y = g(y)$, augmented by its variational
equation, is again of this special form. Furthermore, the diagram of Lemma 4.1
commutes for Nyström methods, so that Theorem IV.2.5 yields the following result
originally obtained by Suris (1988, 1989).

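Theorem 4.8. If the coefficients of a Nyström method (IV.2.11) satisfy

$$\begin{aligned}
\beta_i &= b_i(1 - c_i) && \text{for } i = 1, \ldots, s, \\
b_i(\beta_j - a_{ij}) &= b_j(\beta_i - a_{ji}) && \text{for } i, j = 1, \ldots, s,
\end{aligned} \tag{4.5}$$

then it is symplectic. □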


VI.4.2 Connection Between Symplectic and Symmetric Methods 

There exist symmetric methods that are not symplectic, and there exist symplectic
methods that are not symmetric. For example, the trapezoidal rule

$$y_1 = y_0 + \frac h2\bigl(f(y_0) + f(y_1)\bigr) \tag{4.6}$$

is symmetric, but it does not satisfy the condition (4.2) for symplecticity. In fact,
this is true of all Lobatto IIIA methods (see Example II.2.2). On the other hand, any
composition $\Phi_{\gamma_1h} \circ \Phi_{\gamma_2h}$ ($\gamma_1 + \gamma_2 = 1$) of symplectic methods is symplectic but
symmetric only if $\gamma_1 = \gamma_2$.

However, for (non-partitioned) Runge-Kutta methods and for quadratic
Hamiltonians $H(y) = \frac12 y^TCy$ ($C$ is a symmetric real matrix), where the corresponding
system (2.5) is linear,

$$\dot y = J^{-1}Cy, \tag{4.7}$$

we shall see that both concepts are equivalent.

A Runge-Kutta method, applied with step size $h$ to a linear system $\dot y = Ly$, is
equivalent to

$$y_1 = R(hL)y_0, \tag{4.8}$$

where the rational function $R(z)$ is given by

$$R(z) = 1 + zb^T(I - zA)^{-1}\mathbb{1}, \tag{4.9}$$

$A = (a_{ij})$, $b^T = (b_1, \ldots, b_s)$, and $\mathbb{1}^T = (1, \ldots, 1)$. The function $R(z)$ is called
the stability function of the method, and it is familiar to us from the study of stiff
differential equations (see e.g., Hairer & Wanner (1996), Chap. IV.3).

For the explicit Euler method, the implicit Euler method and the implicit
midpoint rule, the stability function $R(z)$ is given by

$$1 + z, \qquad \frac{1}{1 - z}, \qquad \frac{1 + z/2}{1 - z/2}.$$

Theorem 4.9. For Runge-Kutta methods the following statements are equivalent:

• the method is symmetric for linear problems $\dot y = Ly$;

• the method is symplectic for problems (4.7) with symmetric $C$;

• the stability function satisfies $R(-z)R(z) = 1$ for all complex $z$.

Proof. The method $y_1 = R(hL)y_0$ is symmetric, if and only if $y_0 = R(-hL)y_1$
holds for all initial values $y_0$. But this is equivalent to $R(-hL)R(hL) = I$.

Since $\Phi_h'(y_0) = R(hL)$, symplecticity of the method for the problem (4.7) is
defined by $R(hJ^{-1}C)^TJR(hJ^{-1}C) = J$. For $R(z) = P(z)/Q(z)$ this is equivalent
to

$$P(hJ^{-1}C)^TJP(hJ^{-1}C) = Q(hJ^{-1}C)^TJQ(hJ^{-1}C). \tag{4.10}$$

By the symmetry of $C$, the matrix $L := J^{-1}C$ satisfies $L^TJ = -JL$ and hence
also $(L^k)^TJ = J(-L)^k$ for $k = 0, 1, 2, \ldots$. Consequently, (4.10) is equivalent to

$$P(-hJ^{-1}C)P(hJ^{-1}C) = Q(-hJ^{-1}C)Q(hJ^{-1}C),$$

which is nothing other than $R(-hJ^{-1}C)R(hJ^{-1}C) = I$. □
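The third statement of Theorem 4.9 gives a quick scalar test. The sketch below evaluates $R(-z)R(z) - 1$ at a sample complex point for the three stability functions $1 + z$, $1/(1 - z)$ and $(1 + z/2)/(1 - z/2)$; the helper name `symmetry_defect` and the choice of sample point are illustrative. A single sample can only refute the identity, so a vanishing defect here is a plausibility check rather than a proof.

```python
def R_explicit_euler(z):
    return 1 + z

def R_implicit_euler(z):
    return 1 / (1 - z)

def R_midpoint(z):
    return (1 + z / 2) / (1 - z / 2)

def symmetry_defect(R, z):
    # |R(-z) R(z) - 1|; zero for all z iff the method is symmetric
    # (equivalently symplectic) for linear problems, by Theorem 4.9.
    return abs(R(-z) * R(z) - 1)

z = 0.3 + 0.7j
for name, R in [("explicit Euler", R_explicit_euler),
                ("implicit Euler", R_implicit_euler),
                ("implicit midpoint", R_midpoint)]:
    print(name, symmetry_defect(R, z))
```

For the explicit Euler method the product is $1 - z^2$ and for the implicit Euler method it is $1/(1 - z^2)$, so both defects are nonzero whenever $z \ne 0$.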


VI.5 Generating Functions 

... by which the study of the motions of all free systems of attracting or 
repelling points is reduced to the search and differentiation of one central 
relation, or characteristic function. (W.R. Hamilton 1834) 

Professor Hamilton has ... found the remarkable result that ...
the integral equations of motion ... can all be represented by the
partial derivatives of a single function.

(C.G.J. Jacobi 1837)

We enter here the second heaven of Hamiltonian theory, the realm of partial dif¬ 
ferential equations and generating functions. The starting point of this theory was 
the discovery of Hamilton that the motion of the system is completely described 
by a “characteristic” function S, and that S is the solution of a partial differential 
equation, now called the Hamilton-Jacobi differential equation. 

It was noticed later, especially by Siegel (see Siegel & Moser 1971, §3), that 
such a function S is directly connected to any symplectic map. It received the name 
generating function. 

VI.5.1 Existence of Generating Functions 

We now consider a fixed Hamiltonian system and a fixed time interval and denote
by the column vectors $p$ and $q$ the initial values $p_1, \ldots, p_d$ and $q_1, \ldots, q_d$ at $t_0$ of a
trajectory. The final values at $t_1$ are written as $P$ and $Q$. We thus have a mapping
$(p, q) \mapsto (P, Q)$ which, as we know, is symplectic on an open set $U$.

The following results are conveniently formulated in the notation of differential
forms. For a function $F$ we denote by $dF = F'$ its (Fréchet) derivative. We denote
by $dq = (dq_1, \ldots, dq_d)^T$ the derivative of the coordinate projection $(p, q) \mapsto q$.

Theorem 5.1. A mapping $\varphi : (p, q) \mapsto (P, Q)$ is symplectic if and only if there
exists locally a function $S(p, q)$ such that

$$P^TdQ - p^Tdq = dS. \tag{5.1}$$

This means that $P^TdQ - p^Tdq$ is a total differential.

Proof. We split the Jacobian of $\varphi$ into the natural $2 \times 2$ block matrix

$$\frac{\partial(P, Q)}{\partial(p, q)} = \begin{pmatrix} P_p & P_q \\ Q_p & Q_q \end{pmatrix}.$$

Inserting this into (2.6) and multiplying out shows that the three conditions

$$P_p^TQ_p = Q_p^TP_p, \qquad P_p^TQ_q - I = Q_p^TP_q, \qquad Q_q^TP_q = P_q^TQ_q \tag{5.2}$$

are equivalent to symplecticity. We now insert $dQ = Q_p\,dp + Q_q\,dq$ into the
left-hand side of (5.1) and obtain

$$\bigl(P^TQ_p,\ P^TQ_q - p^T\bigr)\begin{pmatrix} dp \\ dq \end{pmatrix} = \begin{pmatrix} Q_p^TP \\ Q_q^TP - p \end{pmatrix}^T \begin{pmatrix} dp \\ dq \end{pmatrix}.$$

To apply the Integrability Lemma 2.7, we just have to verify the symmetry of the
Jacobian of the coefficient vector,

$$\begin{pmatrix} Q_p^TP_p & Q_p^TP_q \\ Q_q^TP_p - I & Q_q^TP_q \end{pmatrix} + \sum_i P_i\,\frac{\partial^2Q_i}{\partial(p, q)^2}. \tag{5.3}$$

Since the Hessians of $Q_i$ are symmetric anyway, it is immediately clear that the
symmetry of the matrix (5.3) is equivalent to the symplecticity conditions (5.2). □


Reconstruction of the Symplectic Map from S. Up to now we have considered
all functions as depending on $p$ and $q$. The essential idea now is to introduce new
coordinates; namely (5.1) suggests using $z = (q, Q)$ instead of $y = (p, q)$. This is a
well-defined local change of coordinates $y = \psi(z)$ if $p$ can be expressed in terms of
the coordinates $(q, Q)$, which is possible by the implicit function theorem if $\frac{\partial Q}{\partial p}$ is
invertible. Abusing our notation we again write $S(q, Q)$ for the transformed function
$S(\psi(z))$. Then, by comparing the coefficients of $dS = \frac{\partial S(q, Q)}{\partial q}\,dq + \frac{\partial S(q, Q)}{\partial Q}\,dQ$
with (5.1), we arrive at³

$$P = \frac{\partial S}{\partial Q}(q, Q), \qquad p = -\frac{\partial S}{\partial q}(q, Q). \tag{5.4}$$

If the transformation $(p, q) \mapsto (P, Q)$ is symplectic, then it can be reconstructed
from the scalar function $S(q, Q)$ by the relations (5.4). By Theorem 5.1 the converse
is also true: any sufficiently smooth and nondegenerate function $S(q, Q)$ “generates”
via (5.4) a symplectic mapping $(p, q) \mapsto (P, Q)$. This gives us a powerful tool
for creating symplectic methods.

3 On the right-hand side we should have put the gradient $\nabla_QS = (\partial S/\partial Q)^T$. We shall
not make this distinction between row and column vectors when there is no danger of
confusion.

Mixed-Variable Generating Functions. Another often useful choice of
coordinates for generating symplectic maps are the mixed variables $(P, q)$. For any
continuously differentiable function $\widehat S(P, q)$ we clearly have $d\widehat S = \frac{\partial\widehat S}{\partial P}\,dP + \frac{\partial\widehat S}{\partial q}\,dq$. On
the other hand, since $d(P^TQ) = P^TdQ + Q^TdP$, the symplecticity condition (5.1)
can be rewritten as $Q^TdP + p^Tdq = d(Q^TP - S)$ for some function $S$. It therefore
follows from Theorem 5.1 that the equations

$$Q = \frac{\partial\widehat S}{\partial P}(P, q), \qquad p = \frac{\partial\widehat S}{\partial q}(P, q) \tag{5.5}$$

define (locally) a symplectic map $(p, q) \mapsto (P, Q)$ if $\partial^2\widehat S/\partial P\partial q$ is invertible.

Example 5.2. Let $Q = \chi(q)$ be a change of position coordinates. With the
generating function $\widehat S(P, q) = P^T\chi(q)$ we obtain via (5.5) an extension to a symplectic
mapping $(p, q) \mapsto (P, Q)$. The conjugate variables are thus related by $p = \chi'(q)^TP$.

Mappings Close to the Identity. We are mainly interested in the situation where
the mapping $(p, q) \mapsto (P, Q)$ is close to the identity. In this case, the choices $(p, Q)$
or $(P, q)$ or $((P + p)/2, (Q + q)/2)$ of independent variables are convenient and
lead to the following characterizations.

Lemma 5.3. Let $(p, q) \mapsto (P, Q)$ be a smooth transformation, close to the identity.
It is symplectic if and only if one of the following conditions holds locally:

• $Q^TdP + p^Tdq = d(P^Tq + S^1)$ for some function $S^1(P, q)$;

• $P^TdQ + q^Tdp = d(p^TQ - S^2)$ for some function $S^2(p, Q)$;

• $(Q - q)^Td(P + p) - (P - p)^Td(Q + q) = 2\,dS^3$
for some function $S^3((P + p)/2, (Q + q)/2)$.

Proof. The first characterization follows from the discussion before formula (5.5) if
we put $S^1$ such that $P^Tq + S^1 = \widehat S = Q^TP - S$. For the second characterization we
use $d(p^Tq) = p^Tdq + q^Tdp$ and the same arguments as before. The last one follows
from the fact that (5.1) is equivalent to $(Q - q)^Td(P + p) - (P - p)^Td(Q + q) =
d\bigl((P + p)^T(Q - q) - 2S\bigr)$. □

The generating functions $S^1$, $S^2$, and $S^3$ have been chosen such that we obtain
the identity mapping when they are replaced with zero. Comparing the coefficient
functions of $dq$ and $dP$ in the first characterization of Lemma 5.3, we obtain

$$p = P + \frac{\partial S^1}{\partial q}(P, q), \qquad Q = q + \frac{\partial S^1}{\partial P}(P, q). \tag{5.6}$$

Whatever the scalar function $S^1(P, q)$ is, the relation (5.6) defines a symplectic
transformation $(p, q) \mapsto (P, Q)$. For $S^1(P, q) := hH(P, q)$ we recognize the
symplectic Euler method (I.1.9). This is an elegant proof of the symplecticity of this
method. The second characterization leads to the adjoint of the symplectic Euler
method.
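As a short worked illustration of the two statements above (a sketch that only substitutes the generating functions into the relations of Lemma 5.3), take $S^1(P, q) = hH(P, q)$ in the first characterization and $S^2(p, Q) = hH(p, Q)$ in the second. Comparing coefficients of the differentials gives

$$\begin{aligned}
S^1 = hH(P, q):&\qquad P = p - hH_q(P, q), \qquad Q = q + hH_p(P, q), \\
S^2 = hH(p, Q):&\qquad P = p - hH_q(p, Q), \qquad Q = q + hH_p(p, Q).
\end{aligned}$$

With $(p, q) = (p_n, q_n)$ and $(P, Q) = (p_{n+1}, q_{n+1})$, the first line is the symplectic Euler method with implicit $p$ and the second line is the variant with implicit $q$.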

The third characterization of Lemma 5.3 can be written as

$$\begin{aligned}
P &= p - \partial_2S^3\bigl((P + p)/2, (Q + q)/2\bigr), \\
Q &= q + \partial_1S^3\bigl((P + p)/2, (Q + q)/2\bigr),
\end{aligned} \tag{5.7}$$

which, for $S^3 = hH$, is nothing other than the implicit midpoint rule (I.1.7) applied
to a Hamiltonian system. We have used the notation $\partial_1$ and $\partial_2$ for the derivative with
respect to the first and second argument, respectively. The system (5.7) can also be
written in compact form as

$$Y = y + J^{-1}\nabla S^3\bigl((Y + y)/2\bigr), \tag{5.8}$$

where $Y = (P, Q)$, $y = (p, q)$, $S^3(w) = S^3(u, v)$ with $w = (u, v)$, and $J$ is the
matrix of (2.3).

VI.5.2 Generating Function for Symplectic Runge-Kutta 
Methods 

We have just seen that all symplectic transformations can be written in terms of gen¬ 
erating functions. What are these generating functions for symplectic Runge-Kutta 
methods? The following result, proved by Lasagni in an unpublished manuscript 
(with the same title as the note Lasagni (1988)), gives an alternative proof for The¬ 
orem 4.3. 

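Theorem 5.4. Suppose that

$$b_ia_{ij} + b_ja_{ji} = b_ib_j \quad\text{for all } i, j \tag{5.9}$$

(see Theorem 4.3). Then, the Runge-Kutta method

$$\begin{aligned}
P &= p - h\sum_{i=1}^s b_iH_q(P_i, Q_i), & P_i &= p - h\sum_{j=1}^s a_{ij}H_q(P_j, Q_j), \\
Q &= q + h\sum_{i=1}^s b_iH_p(P_i, Q_i), & Q_i &= q + h\sum_{j=1}^s a_{ij}H_p(P_j, Q_j)
\end{aligned} \tag{5.10}$$

can be written as (5.6) with

$$S^1(P, q, h) = h\sum_{i=1}^s b_iH(P_i, Q_i) - h^2\sum_{i,j=1}^s b_ia_{ij}H_q(P_i, Q_i)^TH_p(P_j, Q_j). \tag{5.11}$$

Proof. We first differentiate $S^1(P, q, h)$ with respect to $q$. Using the abbreviations
$H[i] = H(P_i, Q_i)$, $H_p[i] = H_p(P_i, Q_i)$, ..., we obtain

$$\frac{\partial}{\partial q}\Bigl(\sum_i b_iH[i]\Bigr) = \sum_i b_iH_p[i]^T\Bigl(\frac{\partial p}{\partial q} - h\sum_j a_{ij}\frac{\partial}{\partial q}H_q[j]\Bigr) + \sum_i b_iH_q[i]^T\Bigl(I + h\sum_j a_{ij}\frac{\partial}{\partial q}H_p[j]\Bigr).$$

With

$$0 = \frac{\partial P}{\partial q} = \frac{\partial p}{\partial q} - h\sum_j b_j\frac{\partial}{\partial q}H_q[j]$$

(this is obtained by differentiating the first relation of (5.10)), Leibniz’ rule

$$\frac{\partial}{\partial q}\bigl(H_q[i]^TH_p[j]\bigr) = H_q[i]^T\frac{\partial}{\partial q}H_p[j] + H_p[j]^T\frac{\partial}{\partial q}H_q[i]$$

and the condition (5.9) therefore yield the first relation of

$$\frac{\partial S^1(P, q, h)}{\partial q} = h\sum_i b_iH_q[i], \qquad \frac{\partial S^1(P, q, h)}{\partial P} = h\sum_i b_iH_p[i].$$

The second relation is proved in the same way. This shows that the Runge-Kutta
formulas (5.10) are equivalent to (5.6). □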


It is interesting to note that, whereas Lemma 5.3 guarantees the local existence 
of a generating function S 1 , the explicit formula (5.11) shows that for Runge-Kutta 
methods this generating function is globally defined. This means that it is well- 
defined in the same region where the Hamiltonian H (p, q) is defined. 


Theorem 5.5. A partitioned Runge-Kutta method (II.2.2), satisfying the
symplecticity conditions (4.3) and (4.4), is equivalent to (5.6) with

$$S^1(P, q, h) = h\sum_{i=1}^s b_iH(P_i, Q_i) - h^2\sum_{i,j=1}^s b_i\hat a_{ij}H_q(P_i, Q_i)^TH_p(P_j, Q_j).$$

If the Hamiltonian is of the form $H(p, q) = T(p) + U(q)$, i.e., it is separable,
then the condition (4.3) alone implies that the method is of the form (5.6) with

$$S^1(P, q, h) = h\sum_{i=1}^s\bigl(b_iU(Q_i) + \hat b_iT(P_i)\bigr) - h^2\sum_{i,j=1}^s b_i\hat a_{ij}U_q(Q_i)^TT_p(P_j).$$


Proof. This is a straightforward extension of the proof of the previous theorem. □

VI.5.3 The Hamilton-Jacobi Partial Differential Equation



C.G.J. Jacobi


We now return to the above construction of
$S$ for a symplectic transformation $(p, q) \mapsto (P, Q)$
(see Theorem 5.1). This time, however,
we imagine the point $P(t), Q(t)$ to
move in the flow of the Hamiltonian system
(1.7). We wish to determine a smooth generating
function $S(q, Q, t)$, now also depending
on $t$, which generates via (5.4) the symplectic
map $(p, q) \mapsto (P(t), Q(t))$ of the exact flow
of the Hamiltonian system.

In accordance with equation (5.4) we
have to satisfy

$$P_i(t) = \frac{\partial S}{\partial Q_i}\bigl(q, Q(t), t\bigr), \qquad p_i = -\frac{\partial S}{\partial q_i}\bigl(q, Q(t), t\bigr). \tag{5.12}$$

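Differentiating the second relation with respect to $t$ yields

$$0 = \frac{\partial^2S}{\partial q_i\partial t}\bigl(q, Q(t), t\bigr) + \sum_{j=1}^d \frac{\partial^2S}{\partial q_i\partial Q_j}\bigl(q, Q(t), t\bigr)\cdot\dot Q_j(t) \tag{5.13}$$

$$\phantom{0} = \frac{\partial^2S}{\partial q_i\partial t}\bigl(q, Q(t), t\bigr) + \sum_{j=1}^d \frac{\partial^2S}{\partial q_i\partial Q_j}\bigl(q, Q(t), t\bigr)\cdot\frac{\partial H}{\partial P_j}\bigl(P(t), Q(t)\bigr) \tag{5.14}$$

where we have inserted the second equation of (1.7) for $\dot Q_j$. Then, using the chain
rule, this equation simplifies to

$$\frac{\partial}{\partial q_i}\Bigl(\frac{\partial S}{\partial t} + H\Bigl(\frac{\partial S}{\partial Q_1}, \ldots, \frac{\partial S}{\partial Q_d}, Q_1, \ldots, Q_d\Bigr)\Bigr) = 0. \tag{5.15}$$

This motivates the following surprisingly simple relation.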
This motivates the following surprisingly simple relation. 


Theorem 5.6. If $S(q, Q, t)$ is a smooth solution of the partial differential equation

$$\frac{\partial S}{\partial t} + H\Bigl(\frac{\partial S}{\partial Q_1}, \ldots, \frac{\partial S}{\partial Q_d}, Q_1, \ldots, Q_d\Bigr) = 0 \tag{5.16}$$

with initial values satisfying $\frac{\partial S}{\partial q_i}(q, q, 0) + \frac{\partial S}{\partial Q_i}(q, q, 0) = 0$, and if the matrix
$\bigl(\frac{\partial^2S}{\partial q_i\partial Q_j}\bigr)$ is invertible, then the map $(p, q) \mapsto (P(t), Q(t))$ defined by (5.12) is
the flow $\varphi_t(p, q)$ of the Hamiltonian system (1.7).

Equation (5.16) is called the “Hamilton-Jacobi partial differential equation”.


4 Carl Gustav Jacob Jacobi, born: 10 December 1804 in Potsdam (near Berlin), died: 18
February 1851 in Berlin.

Proof. The invertibility of the matrix $\bigl(\frac{\partial^2S}{\partial q_i\partial Q_j}\bigr)$ and the implicit function theorem
imply that the mapping $(p, q) \mapsto (P(t), Q(t))$ is well-defined by (5.12), and, by
differentiation, that (5.13) is true as well.

Since, by hypothesis, $S(q, Q, t)$ is a solution of (5.16), the equations (5.15)
and hence also (5.14) are satisfied. Subtracting (5.13) and (5.14), and once again
using the invertibility of the matrix $\bigl(\frac{\partial^2S}{\partial q_i\partial Q_j}\bigr)$, we see that necessarily
$\dot Q(t) = H_p\bigl(P(t), Q(t)\bigr)$. This proves the validity of the second equation of the Hamiltonian
system (1.7).

The first equation of (1.7) is obtained as follows: differentiate the first relation
of (5.12) with respect to $t$ and the Hamilton-Jacobi equation (5.16) with respect
to $Q_i$, then eliminate the term $\frac{\partial^2S}{\partial Q_i\partial t}$. Using $\dot Q(t) = H_p\bigl(P(t), Q(t)\bigr)$, this leads in
a straightforward way to $\dot P(t) = -H_q\bigl(P(t), Q(t)\bigr)$. The condition on the initial
values of $S$ ensures that $(P(0), Q(0)) = (p, q)$. □

In the hands of Jacobi (1842), this equation turned into a powerful tool for the 
analytic integration of many difficult problems. One has, in fact, to find a solution 
of (5.16) which contains sufficiently many parameters. This is often possible with 
the method of separation of variables. An example is presented in Exercise 11. 

Hamilton-Jacobi Equation for S 1 , S 2 , and S 3 . We now express the Hamilton- 
Jacobi differential equation in the coordinates used in Lemma 5.3. In these coordi¬ 
nates it is also possible to prescribe initial values for S at t = 0. 

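From the proof of Lemma 5.3 we know that the generating functions in the
variables $(q, Q)$ and $(P, q)$ are related by

$$S^1(P, q, t) = P^T(Q - q) - S(q, Q, t). \tag{5.17}$$

We consider $P, q, t$ as independent variables, and we differentiate this relation with
respect to $t$. Using the first relation of (5.12) this gives

$$\frac{\partial S^1}{\partial t}(P, q, t) = P^T\frac{\partial Q}{\partial t} - \frac{\partial S}{\partial Q}(q, Q, t)\frac{\partial Q}{\partial t} - \frac{\partial S}{\partial t}(q, Q, t) = -\frac{\partial S}{\partial t}(q, Q, t).$$

Differentiating (5.17) with respect to $P$ yields

$$\frac{\partial S^1}{\partial P}(P, q, t) = Q - q + P^T\frac{\partial Q}{\partial P} - \frac{\partial S}{\partial Q}(q, Q, t)\frac{\partial Q}{\partial P} = Q - q.$$

Inserting $\frac{\partial S}{\partial Q} = P$ and $Q = q + \frac{\partial S^1}{\partial P}$ into the Hamilton-Jacobi equation (5.16) we
are led to the equation of the following theorem.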

Theorem 5.7. If $S^1(P, q, t)$ is a solution of the partial differential equation

$$\frac{\partial S^1}{\partial t}(P, q, t) = H\Bigl(P,\ q + \frac{\partial S^1}{\partial P}(P, q, t)\Bigr), \qquad S^1(P, q, 0) = 0, \tag{5.18}$$

then the mapping $(p, q) \mapsto (P(t), Q(t))$, defined by (5.6), is the exact flow of the
Hamiltonian system (1.7).

Proof. Whenever the mapping $(p, q) \mapsto (P(t), Q(t))$ can be written as (5.12) with
a function $S(q, Q, t)$, and when the invertibility assumption of Theorem 5.6 holds,
the proof is done by the above calculations. Since our mapping, for $t = 0$, reduces
to the identity and cannot be written as (5.12), we give a direct proof.

Let $S^1(P, q, t)$ be given by the Hamilton-Jacobi equation (5.18), and assume
that $(p, q) \mapsto (P, Q) = (P(t), Q(t))$ is the transformation given by (5.6).
Differentiation of the first relation of (5.6) with respect to time $t$ and using (5.18) yields⁵

$$\Bigl(I + \frac{\partial^2S^1}{\partial P\partial q}(P, q, t)\Bigr)\dot P = -\frac{\partial^2S^1}{\partial t\partial q}(P, q, t) = -\Bigl(I + \frac{\partial^2S^1}{\partial P\partial q}(P, q, t)\Bigr)\frac{\partial H}{\partial Q}(P, Q).$$

Differentiation of the second relation of (5.6) gives

$$\begin{aligned}
\dot Q &= \frac{\partial^2S^1}{\partial t\partial P}(P, q, t) + \frac{\partial^2S^1}{\partial P^2}(P, q, t)\dot P \\
&= \frac{\partial H}{\partial P}(P, Q) + \frac{\partial^2S^1}{\partial P^2}(P, q, t)\Bigl(\frac{\partial H}{\partial Q}(P, Q) + \dot P\Bigr).
\end{aligned}$$

Consequently, $\dot P = -\frac{\partial H}{\partial Q}(P, Q)$ and $\dot Q = \frac{\partial H}{\partial P}(P, Q)$, so that $(P(t), Q(t)) =
\varphi_t(p, q)$ is the exact flow of the Hamiltonian system. □

Writing the Hamilton-Jacobi differential equation in the variables $(P + p)/2$,
$(Q + q)/2$ gives the following formula.

Theorem 5.8. Assume that $S^3(u, v, t)$ is a solution of

$$\frac{\partial S^3}{\partial t}(u, v, t) = H\Bigl(u - \frac12\frac{\partial S^3}{\partial v}(u, v, t),\ v + \frac12\frac{\partial S^3}{\partial u}(u, v, t)\Bigr) \tag{5.19}$$

with initial condition $S^3(u, v, 0) = 0$. Then, the exact flow $\varphi_t$ of the Hamiltonian
system (1.7) satisfies the system (5.7).


Proof. As in the proof of Theorem 5.7, one considers the transformation $(p, q) \mapsto
(P(t), Q(t))$ defined by (5.7), and then checks by differentiation that $(P(t), Q(t))$
is a solution of the Hamiltonian system (1.7). □


Writing $w = (u, v)$ and using the matrix $J$ of (2.3), the Hamilton-Jacobi
equation (5.19) can also be written as

$$\frac{\partial S^3}{\partial t}(w, t) = H\Bigl(w + \frac12J^{-1}\nabla S^3(w, t)\Bigr), \qquad S^3(w, 0) = 0. \tag{5.20}$$

The solution of (5.20) is anti-symmetric in $t$, i.e.,

$$S^3(w, -t) = -S^3(w, t). \tag{5.21}$$

This can be seen as follows: let $\varphi_t(w)$ be the exact flow of the Hamiltonian system
$\dot y = J^{-1}\nabla H(y)$. Because of (5.8), $S^3(w, t)$ is defined by

$$\varphi_t(w) - w = J^{-1}\nabla S^3\bigl((\varphi_t(w) + w)/2,\ t\bigr).$$

Replacing $t$ with $-t$ and then $w$ with $\varphi_t(w)$ we get from $\varphi_{-t}(\varphi_t(w)) = w$ that

$$w - \varphi_t(w) = J^{-1}\nabla S^3\bigl((w + \varphi_t(w))/2,\ -t\bigr).$$

Hence $S^3(w, t)$ and $-S^3(w, -t)$ are generating functions of the same symplectic
transformation. Since generating functions are unique up to an additive constant
(because $dS = 0$ implies $S = \text{Const}$), the anti-symmetry (5.21) follows from the
initial condition $S^3(w, 0) = 0$.

5 Due to an inconsistent notation of the partial derivatives as column or row vectors,
this formula may be difficult to read. Use indices instead of matrices in order to check
its correctness.


VI.5.4 Methods Based on Generating Functions 

To construct symplectic numerical methods of high order, Feng Kang (1986), Feng
Kang, Wu, Qin & Wang (1989) and Channell & Scovel (1990) proposed computing
an approximate solution of the Hamilton-Jacobi equation. For this one inserts the
ansatz

$$S^1(P, q, t) = tG_1(P, q) + t^2G_2(P, q) + t^3G_3(P, q) + \ldots$$

into (5.18), and compares like powers of $t$. This yields

$$\begin{aligned}
G_1(P, q) &= H(P, q), \\
G_2(P, q) &= \frac12\Bigl(\frac{\partial H}{\partial P}\frac{\partial H}{\partial q}\Bigr)(P, q), \\
G_3(P, q) &= \frac16\Bigl(\frac{\partial^2H}{\partial P^2}\Bigl(\frac{\partial H}{\partial q}\Bigr)^2 + \frac{\partial^2H}{\partial P\partial q}\frac{\partial H}{\partial P}\frac{\partial H}{\partial q} + \frac{\partial^2H}{\partial q^2}\Bigl(\frac{\partial H}{\partial P}\Bigr)^2\Bigr)(P, q).
\end{aligned}$$

If we use the truncated series

$$S^1(P, q) = hG_1(P, q) + h^2G_2(P, q) + \ldots + h^rG_r(P, q) \tag{5.22}$$

and insert it into (5.6), the transformation $(p, q) \mapsto (P, Q)$ defines a symplectic
one-step method of order $r$. Symplecticity follows at once from Lemma 5.3 and order $r$
is a consequence of the fact that the truncation of $S^1(P, q)$ introduces a perturbation
of size $O(h^{r+1})$ in (5.18). We remark that for $r \ge 2$ the methods obtained require
the computation of higher derivatives of $H(p, q)$, and for separable Hamiltonians
$H(p, q) = T(p) + U(q)$ they are no longer explicit (compared to the symplectic
Euler method (3.1)).

The same approach applied to the third characterization of Lemma 5.3 yields

$$S^3(w, h) = hG_1(w) + h^3G_3(w) + \ldots + h^{2r-1}G_{2r-1}(w),$$

where $G_1(w) = H(w)$,

$$G_3(w) = \frac1{24}\nabla^2H(w)\bigl(J^{-1}\nabla H(w),\ J^{-1}\nabla H(w)\bigr),$$

and further $G_j(w)$ can be obtained by comparing like powers of $h$ in (5.20). In this
way we get symplectic methods of order $2r$. Since $S^3(w, h)$ has an expansion in
odd powers of $h$, the resulting method is symmetric.
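Returning to the truncated series (5.22) for $S^1$, a worked special case may make the remark about lost explicitness concrete. This is a sketch under the assumption $H(p, q) = \frac12p^2 + U(q)$ with one degree of freedom, which is not an example taken from the text. Inserting $H_P = P$, $H_q = U'(q)$, $H_{PP} = 1$, $H_{Pq} = 0$ and $H_{qq} = U''(q)$ into the formulas for $G_1, G_2, G_3$ gives

$$G_1 = \tfrac12P^2 + U(q), \qquad G_2 = \tfrac12P\,U'(q), \qquad G_3 = \tfrac16\bigl(U'(q)^2 + P^2U''(q)\bigr).$$

For $r = 2$ the relations (5.6) with $S^1 = hG_1 + h^2G_2$ then read

$$p = P + hU'(q) + \tfrac{h^2}{2}P\,U''(q), \qquad Q = q + hP + \tfrac{h^2}{2}U'(q).$$

The first relation involves $P$ on both sides and requires the second derivative $U''$, in contrast to the symplectic Euler method obtained for $r = 1$.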

The Approach of Miesbach & Pesch. With the aim of avoiding higher derivatives 
of the Hamiltonian in the numerical method, Miesbach & Pesch (1992) propose 
considering generating functions of the form 

    S^3(w, h) = h \sum_{i=1}^{s} b_i H\bigl( w + h c_i J^{-1} \nabla H(w) \bigr),    (5.23)

and to determine the free parameters b_i, c_i in such a way that the function of (5.23)
agrees with the solution of the Hamilton-Jacobi equation (5.20) up to a certain order.
For b_{s+1-i} = b_i and c_{s+1-i} = -c_i this function satisfies S^3(w, -h) = -S^3(w, h),
so that the resulting method is symmetric. A straightforward computation shows that
it yields a method of order 4 if

    \sum_{i=1}^{s} b_i = 1,    \sum_{i=1}^{s} b_i c_i^2 = \frac{1}{12}.

For s = 3, these equations are fulfilled for b_1 = b_3 = 5/18, b_2 = 4/9, c_1 = -c_3 =
\sqrt{15}/10, and c_2 = 0. Since the function S^3 of (5.23) has to be inserted into (5.20),
these methods still need second derivatives of the Hamiltonian.
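The quoted coefficients for s = 3 can be checked numerically. The following is a minimal sketch, taking the order-4 conditions to be \sum b_i = 1 and \sum b_i c_i^2 = 1/12 (the second one comes from matching the h^3 term G_3 in the expansion of (5.23); treat this form as an assumption of the sketch). The function names are ours and purely illustrative.

```python
from fractions import Fraction
import math

# Coefficients quoted in the text for s = 3.
b = [Fraction(5, 18), Fraction(4, 9), Fraction(5, 18)]
c = [math.sqrt(15) / 10, 0.0, -math.sqrt(15) / 10]


def order4_residuals(b, c):
    """Return (sum b_i - 1, sum b_i c_i^2 - 1/12); both should vanish."""
    r1 = float(sum(b)) - 1.0
    r2 = sum(float(bi) * ci ** 2 for bi, ci in zip(b, c)) - 1.0 / 12.0
    return r1, r2


def is_symmetric(b, c, tol=1e-14):
    """Check the symmetry relations b_{s+1-i} = b_i and c_{s+1-i} = -c_i."""
    s = len(b)
    return all(
        b[s - 1 - i] == b[i] and abs(c[s - 1 - i] + c[i]) < tol
        for i in range(s)
    )


r1, r2 = order4_residuals(b, c)
print(r1, r2, is_symmetric(b, c))
```

Both residuals are at rounding level, and the symmetry relations hold exactly for these values.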


VI.6 Variational Integrators 

A third approach to symplectic integrators comes from using discretized versions
of Hamilton’s principle, which determines the equations of motion from a variational
problem. This route has been taken by Suris (1990), MacKay (1992) and
in a series of papers by Marsden and coauthors, see the review by Marsden &
West (2001) and references therein. Basic theoretical properties were formulated
by Maeda (1980, 1982) and Veselov (1988, 1991) in a non-numerical context.

VI.6.1 Hamilton’s Principle 

Ours, according to Leibniz, is the best of all possible worlds, and the laws 
of nature can therefore be described in terms of extremal principles. 

(C.L. Siegel & J.K. Moser 1971, p. 1) 

It seems that this principle was formerly ... left unnoticed.
Hamilton is the first to have started from this principle.

(C.G.J. Jacobi 1842, p. 58) 
Hamilton gave an improved mathematical formulation of a principle
which was well established by the fundamental investigations of Euler 
and Lagrange; the integration process employed by him was likewise 
known to Lagrange. The name “Hamilton’s principle”, coined by Jacobi, 
was not adopted by the scientists of the last century. It came into use, 
however, through the textbooks of more recent date. 

(C. Lanczos 1949, p. 114) 


Lagrange’s equations of motion (1.4) can be viewed as the Euler-Lagrange equations
for the variational problem of extremizing the action integral

    S(q) = \int_{t_0}^{t_1} L(q(t), \dot q(t)) \, dt    (6.1)

among all curves q(t) that connect two given points q_0 and q_1:

    q(t_0) = q_0,    q(t_1) = q_1.    (6.2)


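In fact, assuming q(t) to be extremal and considering a variation q(t) + \varepsilon \delta q(t)
with the same end-points, i.e., with \delta q(t_0) = \delta q(t_1) = 0, gives, using a partial
integration,

    0 = \frac{d}{d\varepsilon} \Big|_{\varepsilon=0} S(q + \varepsilon \delta q)
      = \int_{t_0}^{t_1} \Bigl( \frac{\partial L}{\partial q} \delta q + \frac{\partial L}{\partial \dot q} \delta \dot q \Bigr) dt
      = \int_{t_0}^{t_1} \Bigl( \frac{\partial L}{\partial q} - \frac{d}{dt} \frac{\partial L}{\partial \dot q} \Bigr) \delta q \, dt,

which leads to (1.4). The principle that the motion extremizes the action integral is
known as Hamilton’s principle.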

We now consider the action integral as a function of (q_0, q_1), for the solution
q(t) of the Euler-Lagrange equations (1.4) with these boundary values (this exists
uniquely locally at least if q_0, q_1 are sufficiently close),

    S(q_0, q_1) = \int_{t_0}^{t_1} L(q(t), \dot q(t)) \, dt.    (6.3)


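The partial derivative of S with respect to q_0 is, again using partial integration,

    \frac{\partial S}{\partial q_0}
      = \int_{t_0}^{t_1} \Bigl( \frac{\partial L}{\partial q} \frac{\partial q}{\partial q_0}
          + \frac{\partial L}{\partial \dot q} \frac{\partial \dot q}{\partial q_0} \Bigr) dt
      = \frac{\partial L}{\partial \dot q} \frac{\partial q}{\partial q_0} \Big|_{t_0}^{t_1}
          + \int_{t_0}^{t_1} \Bigl( \frac{\partial L}{\partial q} - \frac{d}{dt} \frac{\partial L}{\partial \dot q} \Bigr) \frac{\partial q}{\partial q_0} \, dt
      = -\frac{\partial L}{\partial \dot q}(q_0, \dot q_0)

with \dot q_0 = \dot q(t_0), where the last equality follows from (1.4) and (6.2). In view of the
definition (1.5) of the conjugate momenta, p = \partial L/\partial \dot q, the last term is simply -p_0.
Computing \partial S/\partial q_1 = p_1 in the same way, we thus obtain for the differential of S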


    dS = \frac{\partial S}{\partial q_1} \, dq_1 + \frac{\partial S}{\partial q_0} \, dq_0 = p_1 \, dq_1 - p_0 \, dq_0    (6.4)


which is the basic formula for symplecticity generating functions (see (5.1) above), 
obtained here by working with the Lagrangian formalism. 
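Formula (6.4) can be tested on a case where the action is known in closed form. The sketch below assumes the harmonic oscillator L(q, \dot q) = (\dot q^2 - q^2)/2 with unit mass and frequency, for which the action along the exact solution over a time span T is the standard expression used in `action`; this example, the value of T and all names are our own choices and do not come from the text. It compares finite-difference derivatives of S with the momenta p_0 and p_1 of the exact solution.

```python
import math

T = 0.7  # time span t_1 - t_0, chosen so that sin(T) != 0


def action(q0, q1):
    """Action S(q0, q1) of L = (qdot^2 - q^2)/2 along the exact solution."""
    return ((q0 ** 2 + q1 ** 2) * math.cos(T) - 2.0 * q0 * q1) / (2.0 * math.sin(T))


def momenta(q0, q1):
    """Momenta (p0, p1) of the solution q(t) = q0 cos t + p0 sin t joining q0 and q1."""
    p0 = (q1 - q0 * math.cos(T)) / math.sin(T)
    p1 = -q0 * math.sin(T) + p0 * math.cos(T)
    return p0, p1


def grad_action(q0, q1, eps=1e-6):
    """Central finite differences for (dS/dq0, dS/dq1)."""
    d0 = (action(q0 + eps, q1) - action(q0 - eps, q1)) / (2 * eps)
    d1 = (action(q0, q1 + eps) - action(q0, q1 - eps)) / (2 * eps)
    return d0, d1


q0, q1 = 0.3, -0.4
p0, p1 = momenta(q0, q1)
dS0, dS1 = grad_action(q0, q1)
print(dS0 + p0, dS1 - p1)  # both differences should be close to zero
```

Since S is quadratic in (q_0, q_1) here, the central differences are exact up to rounding, so the check confirms \partial S/\partial q_0 = -p_0 and \partial S/\partial q_1 = p_1 for this example.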
VI.6.2 Discretization of Hamilton’s Principle

Discrete-time versions of Hamilton’s principle are of mathematical interest in their 
own right, see Maeda (1980,1982), Veselov (1991) and references therein. Here they 
are considered with the aim of deriving or understanding numerical approximation 
schemes. The discretized Hamilton principle consists of extremizing, for given q_0
and q_N, the sum

    S_h(\{q_n\}_0^N) = \sum_{n=0}^{N-1} L_h(q_n, q_{n+1}).    (6.5)

We think of the discrete Lagrangian L_h as an approximation

    L_h(q_n, q_{n+1}) \approx \int_{t_n}^{t_{n+1}} L(q(t), \dot q(t)) \, dt,    (6.6)

where q(t) is the solution of the Euler-Lagrange equations (1.4) with boundary
values q(t_n) = q_n, q(t_{n+1}) = q_{n+1}. If equality holds in (6.6), then it is clear
from the continuous Hamilton principle that the exact solution values \{q(t_n)\} of
the Euler-Lagrange equations (1.4) extremize the action sum S_h. Before we turn
to concrete examples of approximations L_h, we continue with the general theory
which is analogous to the continuous case.

The requirement \partial S_h/\partial q_n = 0 for an extremum yields the discrete
Euler-Lagrange equations

    \frac{\partial L_h}{\partial y}(q_{n-1}, q_n) + \frac{\partial L_h}{\partial x}(q_n, q_{n+1}) = 0    (6.7)

for n = 1, ..., N-1, where the partial derivatives refer to L_h = L_h(x, y). This
gives a three-term difference scheme for determining q_1, ..., q_{N-1}.
We now set

    S_h(q_0, q_N) = \sum_{n=0}^{N-1} L_h(q_n, q_{n+1})

where \{q_n\} is a solution of the discrete Euler-Lagrange equations (6.7) with the
boundary values q_0 and q_N. With (6.7) the partial derivatives reduce to

    \frac{\partial S_h}{\partial q_0} = \frac{\partial L_h}{\partial x}(q_0, q_1),    \frac{\partial S_h}{\partial q_N} = \frac{\partial L_h}{\partial y}(q_{N-1}, q_N).

We introduce the discrete momenta via a discrete Legendre transformation,

    p_n = -\frac{\partial L_h}{\partial x}(q_n, q_{n+1}).    (6.8)

The above formula and (6.7) for n = N then yield

    dS_h = p_N \, dq_N - p_0 \, dq_0.    (6.9)
If (6.8) defines a bijection between p_n and q_{n+1} for given q_n, then we obtain a
one-step method \Phi_h : (p_n, q_n) \mapsto (p_{n+1}, q_{n+1}) by composing the inverse
discrete Legendre transform, a step with the discrete Euler-Lagrange equations, and
the discrete Legendre transformation as shown in the diagram:

                        (6.7)
    (q_n, q_{n+1})   --------->   (q_{n+1}, q_{n+2})
          ^                              |
    (6.8) |                              | (6.8)
          |                              v
      (p_n, q_n)                  (p_{n+1}, q_{n+1})

The method is symplectic by (6.9) and Theorem 5.1. A short-cut in the computation
is obtained by noting that (6.7) and (6.8) (for n + 1 instead of n) imply

    p_{n+1} = \frac{\partial L_h}{\partial y}(q_n, q_{n+1}),    (6.10)

which yields the scheme

                  (6.8)                       (6.10)
    (p_n, q_n)  -------->  (q_n, q_{n+1})  -------->  (p_{n+1}, q_{n+1}).


Let us summarize these considerations, which can be found in Maeda (1980), Suris 
(1990), Veselov (1991) and MacKay (1992). 


Theorem 6.1. The discrete Hamilton principle for (6.5) gives the discrete
Euler-Lagrange equations (6.7) and the symplectic method

    p_n = -\frac{\partial L_h}{\partial x}(q_n, q_{n+1}),    p_{n+1} = \frac{\partial L_h}{\partial y}(q_n, q_{n+1}).    (6.11)

These formulas also show that L_h is a generating function (5.4) for the symplectic
map (p_n, q_n) \mapsto (p_{n+1}, q_{n+1}). Conversely, since every symplectic method
has a generating function (5.4), it can be interpreted as resulting from Hamilton’s
principle with the generating function (5.4) as the discrete Lagrangian. The classes
of symplectic integrators and variational integrators are therefore identical.

We now turn to simple examples of variational integrators obtained by choosing 
a discrete Lagrangian Lh with (6.6). 


Example 6.2 (MacKay 1992). Choose L_h(q_n, q_{n+1}) by approximating q(t) of
(6.6) as the linear interpolant of q_n and q_{n+1} and approximating the integral by
the trapezoidal rule. This gives

    L_h(q_n, q_{n+1}) = \frac{h}{2} L\Bigl( q_n, \frac{q_{n+1} - q_n}{h} \Bigr) + \frac{h}{2} L\Bigl( q_{n+1}, \frac{q_{n+1} - q_n}{h} \Bigr)    (6.12)

and hence the symplectic scheme, with v_{n+1/2} = (q_{n+1} - q_n)/h for brevity,

    p_n     = \frac{1}{2} \frac{\partial L}{\partial \dot q}(q_n, v_{n+1/2}) + \frac{1}{2} \frac{\partial L}{\partial \dot q}(q_{n+1}, v_{n+1/2}) - \frac{h}{2} \frac{\partial L}{\partial q}(q_n, v_{n+1/2}),
    p_{n+1} = \frac{1}{2} \frac{\partial L}{\partial \dot q}(q_n, v_{n+1/2}) + \frac{1}{2} \frac{\partial L}{\partial \dot q}(q_{n+1}, v_{n+1/2}) + \frac{h}{2} \frac{\partial L}{\partial q}(q_{n+1}, v_{n+1/2}).

For a mechanical Lagrangian L(q, \dot q) = \frac{1}{2} \dot q^T M \dot q - U(q) this reduces to the
Störmer-Verlet method

    M v_{n+1/2} = p_n + \frac{1}{2} h F_n,
    q_{n+1}     = q_n + h v_{n+1/2},
    p_{n+1}     = M v_{n+1/2} + \frac{1}{2} h F_{n+1},

where F_n = -\nabla U(q_n). In this case, the discrete Euler-Lagrange equations (6.7)
become the familiar second-difference formula M(q_{n+1} - 2 q_n + q_{n-1}) = h^2 F_n.
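The equivalence between the one-step form of the Störmer-Verlet method and the second-difference formula can be observed numerically. The following is a minimal sketch for a scalar problem; the pendulum potential U(q) = -cos(q), the mass m and all function names are illustrative assumptions of ours, not part of the text.

```python
import math


def force(q):
    """F(q) = -U'(q) for the pendulum potential U(q) = -cos(q) (illustrative choice)."""
    return -math.sin(q)


def stormer_verlet_step(p, q, h, m=1.0):
    """One step (p_n, q_n) -> (p_{n+1}, q_{n+1}) of the Stormer-Verlet scheme."""
    v_half = (p + 0.5 * h * force(q)) / m          # m v_{n+1/2} = p_n + (h/2) F_n
    q_new = q + h * v_half                         # q_{n+1} = q_n + h v_{n+1/2}
    p_new = m * v_half + 0.5 * h * force(q_new)    # p_{n+1} = m v_{n+1/2} + (h/2) F_{n+1}
    return p_new, q_new


def trajectory(p0, q0, h, n_steps, m=1.0):
    """Return the positions q_0, ..., q_{n_steps} produced by the one-step form."""
    p, q = p0, q0
    qs = [q]
    for _ in range(n_steps):
        p, q = stormer_verlet_step(p, q, h, m)
        qs.append(q)
    return qs


def second_difference_residuals(qs, h, m=1.0):
    """Residuals m (q_{n+1} - 2 q_n + q_{n-1}) - h^2 F_n of the three-term recursion."""
    return [
        m * (qs[n + 1] - 2.0 * qs[n] + qs[n - 1]) - h * h * force(qs[n])
        for n in range(1, len(qs) - 1)
    ]


qs = trajectory(p0=0.0, q0=1.0, h=0.1, n_steps=200)
print(max(abs(r) for r in second_difference_residuals(qs, 0.1)))
```

The residuals stay at rounding level, in agreement with the statement that (6.7) becomes the second-difference formula for this discrete Lagrangian.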

Example 6.3 (Wendlandt & Marsden 1997). Approximating the integral in (6.6)
instead by the midpoint rule gives

    L_h(q_n, q_{n+1}) = h L\Bigl( \frac{q_{n+1} + q_n}{2}, \frac{q_{n+1} - q_n}{h} \Bigr).    (6.13)

This yields the symplectic scheme, with the abbreviations q_{n+1/2} = (q_{n+1} + q_n)/2
and v_{n+1/2} = (q_{n+1} - q_n)/h,

    p_n     = \frac{\partial L}{\partial \dot q}(q_{n+1/2}, v_{n+1/2}) - \frac{h}{2} \frac{\partial L}{\partial q}(q_{n+1/2}, v_{n+1/2}),
    p_{n+1} = \frac{\partial L}{\partial \dot q}(q_{n+1/2}, v_{n+1/2}) + \frac{h}{2} \frac{\partial L}{\partial q}(q_{n+1/2}, v_{n+1/2}).

For L(q, \dot q) = \frac{1}{2} \dot q^T M \dot q - U(q) this becomes the implicit midpoint rule

    M v_{n+1/2} = p_n + \frac{1}{2} h F_{n+1/2},
    q_{n+1}     = q_n + h v_{n+1/2},
    p_{n+1}     = M v_{n+1/2} + \frac{1}{2} h F_{n+1/2},

with F_{n+1/2} = -\nabla U\bigl( \frac{1}{2}(q_{n+1} + q_n) \bigr).
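For a quadratic potential the implicit relation of the midpoint scheme is linear in v_{n+1/2} and can be solved by hand, which gives a compact way to try the method. The sketch below assumes the scalar harmonic oscillator U(q) = q^2/2 (our choice, as are the names); it is only meant to illustrate the three formulas of the scheme, not a general implementation.

```python
def midpoint_step(p, q, h, m=1.0):
    """One step of the midpoint scheme of Example 6.3 for U(q) = q^2/2.

    Here F_{n+1/2} = -(q_n + h v/2), so m v = p_n + (h/2) F_{n+1/2} is a
    linear equation for v = v_{n+1/2} and is solved in closed form.
    """
    v = (p - 0.5 * h * q) / (m + 0.25 * h * h)
    q_new = q + h * v                 # q_{n+1} = q_n + h v_{n+1/2}
    f_mid = -0.5 * (q + q_new)        # F_{n+1/2} = -U'((q_n + q_{n+1})/2)
    p_new = m * v + 0.5 * h * f_mid   # p_{n+1} = m v_{n+1/2} + (h/2) F_{n+1/2}
    return p_new, q_new


def energy(p, q, m=1.0):
    """Hamiltonian H = p^2/(2m) + q^2/2 of the harmonic oscillator."""
    return 0.5 * p * p / m + 0.5 * q * q


def max_energy_drift(p, q, h, n_steps, m=1.0):
    """Largest deviation of H from its initial value along the numerical orbit."""
    h0 = energy(p, q, m)
    drift = 0.0
    for _ in range(n_steps):
        p, q = midpoint_step(p, q, h, m)
        drift = max(drift, abs(energy(p, q, m) - h0))
    return drift


print(max_energy_drift(p=0.0, q=1.0, h=0.1, n_steps=1000))
```

The energy is quadratic here, and the printed drift is at rounding level; for non-quadratic potentials one would instead expect a small bounded oscillation, and the implicit equation would have to be solved iteratively.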


VI.6.3 Symplectic Partitioned Runge-Kutta Methods Revisited 

To obtain higher-order variational integrators, Marsden & West (2001) consider the
discrete Lagrangian

    L_h(q_0, q_1) = h \sum_{i=1}^{s} b_i L\bigl( u(c_i h), \dot u(c_i h) \bigr)    (6.14)

where u(t) is the polynomial of degree s with u(0) = q_0, u(h) = q_1 which
extremizes the right-hand side. They then show that the corresponding variational
integrator can be realized as a partitioned Runge-Kutta method. We here consider the
slightly more general case

    L_h(q_0, q_1) = h \sum_{i=1}^{s} b_i L(Q_i, \dot Q_i)    (6.15)

where

    Q_i = q_0 + h \sum_{j=1}^{s} a_{ij} \dot Q_j

and the \dot Q_i are chosen to extremize the above sum under the constraint

    q_1 = q_0 + h \sum_{i=1}^{s} b_i \dot Q_i.

We assume that all the b_i are non-zero and that their sum equals 1. Note that (6.14)
is the special case of (6.15) where the a_{ij} and b_i are integrals (II.1.10) of Lagrange
polynomials as for collocation methods.

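With a Lagrange multiplier \lambda = (\lambda_1, ..., \lambda_d) for the constraint, the extremality
conditions obtained by differentiating (6.15) with respect to \dot Q_j for j = 1, ..., s,
read

    \sum_{i=1}^{s} b_i \frac{\partial L}{\partial q}(Q_i, \dot Q_i) \, h a_{ij} + b_j \frac{\partial L}{\partial \dot q}(Q_j, \dot Q_j) = b_j \lambda.

With the notation

    \dot P_i = \frac{\partial L}{\partial q}(Q_i, \dot Q_i),    P_i = \frac{\partial L}{\partial \dot q}(Q_i, \dot Q_i)    (6.16)

this simplifies to

    b_j P_j = b_j \lambda - h \sum_{i=1}^{s} b_i a_{ij} \dot P_i.    (6.17)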

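The symplectic method of Theorem 6.1 now becomes

    p_0 = -\frac{\partial L_h}{\partial x}(q_0, q_1)
        = -h \sum_{i=1}^{s} b_i \dot P_i \Bigl( I + h \sum_{j=1}^{s} a_{ij} \frac{\partial \dot Q_j}{\partial q_0} \Bigr)
          - h \sum_{j=1}^{s} b_j P_j \frac{\partial \dot Q_j}{\partial q_0}
        = -h \sum_{i=1}^{s} b_i \dot P_i + \lambda.

In the last equality we use (6.17) and h \sum_{j=1}^{s} b_j \, \partial \dot Q_j/\partial q_0 = -I, which follows from
differentiating the constraint. In the same way we obtain

    p_1 = \frac{\partial L_h}{\partial y}(q_0, q_1) = \lambda.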

Putting these formulas together, we see that (p_1, q_1) result from applying a
partitioned Runge-Kutta method to the Lagrange equations (1.4) written as a
differential-algebraic system

    \dot p = \frac{\partial L}{\partial q}(q, \dot q),    p = \frac{\partial L}{\partial \dot q}(q, \dot q).    (6.18)

That is

    p_1 = p_0 + h \sum_{i=1}^{s} b_i \dot P_i,            q_1 = q_0 + h \sum_{i=1}^{s} b_i \dot Q_i,
    P_i = p_0 + h \sum_{j=1}^{s} \hat a_{ij} \dot P_j,    Q_i = q_0 + h \sum_{j=1}^{s} a_{ij} \dot Q_j,    (6.19)

with \hat a_{ij} = b_j - b_j a_{ji}/b_i so that the symplecticity condition (4.3) is fulfilled, and
with P_i, \dot P_i, Q_i, \dot Q_i related by (6.16). Since equations (6.16) are of the same form as
(6.18), the proof of Theorem 1.3 shows that they are equivalent to

    \dot P_i = -\frac{\partial H}{\partial q}(P_i, Q_i),    \dot Q_i = \frac{\partial H}{\partial p}(P_i, Q_i)    (6.20)

with the Hamiltonian H = p^T \dot q - L(q, \dot q) of (1.6). We have thus proved the following,
which is similar in spirit to a result of Suris (1990).


Theorem 6.4. The variational integrator with the discrete Lagrangian (6.15) is 
equivalent to the symplectic partitioned Runge-Kutta method (6.19), (6.20) applied 
to the Hamiltonian system with the Hamiltonian (1.6). □ 


In particular, as noted by Marsden & West (2001), choosing Gaussian quadrature 
in (6.14) gives the Gauss collocation method applied to the Hamiltonian system, 
while Lobatto quadrature gives the Lobatto IIIA - IIIB pair. 
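The relation \hat a_{ij} = b_j - b_j a_{ji}/b_i between the two coefficient matrices of (6.19) is easy to evaluate for small tableaux. The sketch below is an illustration under our own assumptions: the 2-stage Lobatto IIIA (trapezoidal) and 2-stage Gauss coefficients are entered by hand from standard tables, and the function name is hypothetical. It shows that the formula maps Lobatto IIIA to the Lobatto IIIB matrix and leaves the Gauss matrix unchanged.

```python
import math
from fractions import Fraction as Fr


def conjugate_coefficients(a, b):
    """Return the matrix (hat a_ij) with hat a_ij = b_j - b_j a_ji / b_i."""
    s = len(b)
    return [[b[j] - b[j] * a[j][i] / b[i] for j in range(s)] for i in range(s)]


# 2-stage Lobatto IIIA (trapezoidal rule), exact rational arithmetic.
LOBATTO_IIIA = [[Fr(0), Fr(0)], [Fr(1, 2), Fr(1, 2)]]
B2 = [Fr(1, 2), Fr(1, 2)]
lobatto_hat = conjugate_coefficients(LOBATTO_IIIA, B2)

# 2-stage Gauss method, floating point because of sqrt(3).
R = math.sqrt(3) / 6
GAUSS = [[0.25, 0.25 - R], [0.25 + R, 0.25]]
B_GAUSS = [0.5, 0.5]
gauss_hat = conjugate_coefficients(GAUSS, B_GAUSS)

print(lobatto_hat)
print(gauss_hat)
```

The first result is the matrix with rows (1/2, 0) and (1/2, 0), i.e. the 2-stage Lobatto IIIB coefficients, while `gauss_hat` agrees with `GAUSS` up to rounding, consistent with the two cases named above.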


VI.6.4 Noether’s Theorem 

... Theorem I (“Satz I”) contains all the theorems on first integrals known
in mechanics etc. (E. Noether 1918)

We now return to the subject of Chap. IV, i.e., the existence of first integrals, but 
here in the context of Hamiltonian systems. E. Noether found the surprising result 
that continuous symmetries in the Lagrangian lead to such first integrals. We give in 
the following a version of her “Satz I”, specialized to our needs, with a particularly 
short proof. 

Theorem 6.5 (Noether 1918). Consider a system with Hamiltonian H(p, q) and
Lagrangian L(q, \dot q). Suppose \{g_s : s \in \mathbb{R}\} is a one-parameter group of
transformations (g_s \circ g_r = g_{s+r}) which leaves the Lagrangian invariant:

    L\bigl( g_s(q), g_s'(q) \dot q \bigr) = L(q, \dot q)    for all s and all (q, \dot q).    (6.21)

Let a(q) = (d/ds)|_{s=0} g_s(q) be defined as the vector field with flow g_s(q). Then

    I(p, q) = p^T a(q)    (6.22)

is a first integral of the Hamiltonian system.


Example 6.6. Let G be a matrix Lie group with Lie algebra \mathfrak{g} (see Sect. IV.6).
Suppose L(Qq, Q\dot q) = L(q, \dot q) for all Q \in G. Then p^T A q is a first integral for every
A \in \mathfrak{g}. (Take g_s(q) = \exp(sA) q.) For example, G = SO(n) yields conservation of
angular momentum.

We prove Theorem 6.5 by using the discrete analogue, which reads as follows. 

Theorem 6.7. Suppose the one-parameter group of transformations \{g_s : s \in \mathbb{R}\}
leaves the discrete Lagrangian L_h(q_0, q_1) invariant:

    L_h\bigl( g_s(q_0), g_s(q_1) \bigr) = L_h(q_0, q_1)    for all s and all (q_0, q_1).    (6.23)

Then (6.22) is a first integral of the method (6.11), i.e., p_{n+1}^T a(q_{n+1}) = p_n^T a(q_n).

Proof. Differentiating (6.23) with respect to s gives

    0 = \frac{d}{ds} \Big|_{s=0} L_h\bigl( g_s(q_0), g_s(q_1) \bigr)
      = \frac{\partial L_h}{\partial x}(q_0, q_1) \, a(q_0) + \frac{\partial L_h}{\partial y}(q_0, q_1) \, a(q_1).

By (6.11) this becomes 0 = -p_0^T a(q_0) + p_1^T a(q_1).    □


Theorem 6.5 now follows by choosing L_h = S of (6.3) and noting (6.4) and

    S\bigl( q(t_0), q(t_1) \bigr) = \int_{t_0}^{t_1} L\bigl( q(t), \dot q(t) \bigr) \, dt
        = \int_{t_0}^{t_1} L\Bigl( g_s(q(t)), \frac{d}{dt} g_s(q(t)) \Bigr) dt = S\bigl( g_s(q(t_0)), g_s(q(t_1)) \bigr).

Theorem 6.7 has the appearance of giving a rich source of first integrals for
symplectic methods. However, it must be noted that, unlike the case of the exact flow
map in the above formula, the invariance (6.21) of the Lagrangian L does not in
general imply the invariance (6.23) of the discrete Lagrangian L_h of the numerical
method. A noteworthy exception arises for linear transformations g_s as in
Example 6.6, for which Theorem 6.7 yields the conservation of quadratic first integrals
p^T A q, such as angular momentum, by symplectic partitioned Runge-Kutta methods,
a property we already know from Theorem IV.2.4. For Hamiltonian systems with
an associated Lagrangian L(q, \dot q) = \frac{1}{2} \dot q^T M \dot q - U(q), all first integrals originating
from Noether’s Theorem are quadratic (see Exercise 13).
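The conservation of angular momentum by a variational integrator can be observed directly. The following is a small sketch using the Störmer-Verlet scheme of Example 6.2 on the planar Kepler problem with unit mass; the potential U(q) = -1/|q|, the initial values (an orbit of eccentricity 0.6) and all names are our own illustrative assumptions. The invariant is I = q_1 p_2 - q_2 p_1, which is of the form p^T A q with a skew-symmetric A.

```python
import math


def kepler_force(q):
    """F(q) = -grad U(q) for U(q) = -1/|q| in the plane (illustrative choice)."""
    r3 = (q[0] ** 2 + q[1] ** 2) ** 1.5
    return (-q[0] / r3, -q[1] / r3)


def verlet_step(p, q, h):
    """One Stormer-Verlet step for unit mass: half kick, drift, half kick."""
    f = kepler_force(q)
    v = (p[0] + 0.5 * h * f[0], p[1] + 0.5 * h * f[1])
    q_new = (q[0] + h * v[0], q[1] + h * v[1])
    f_new = kepler_force(q_new)
    p_new = (v[0] + 0.5 * h * f_new[0], v[1] + 0.5 * h * f_new[1])
    return p_new, q_new


def angular_momentum(p, q):
    """Quadratic invariant q_1 p_2 - q_2 p_1 = p^T A q with A = [[0, -1], [1, 0]]."""
    return q[0] * p[1] - q[1] * p[0]


def max_angular_momentum_drift(p, q, h, n_steps):
    """Largest deviation of the angular momentum from its initial value."""
    l0 = angular_momentum(p, q)
    drift = 0.0
    for _ in range(n_steps):
        p, q = verlet_step(p, q, h)
        drift = max(drift, abs(angular_momentum(p, q) - l0))
    return drift


ecc = 0.6
q_init = (1.0 - ecc, 0.0)
p_init = (0.0, math.sqrt((1.0 + ecc) / (1.0 - ecc)))
print(max_angular_momentum_drift(p_init, q_init, h=0.01, n_steps=1000))
```

The drift stays at rounding level: each kick changes p by a multiple of q, and the drift changes q by a multiple of p, so q_1 p_2 - q_2 p_1 is unchanged by every substep.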
VI.7 Characterization of Symplectic Methods

Up to now in this chapter, we have presented sufficient conditions for the
symplecticity of numerical integrators (usually in terms of certain coefficients). Here, we
will prove necessary conditions for symplecticity, i.e., answer the question as to
which methods are not symplectic. It will turn out that the sufficient conditions of
Sect. VI.4, under an irreducibility condition on the method, are also necessary. The
main tool is the Taylor series expansion of the numerical flow y_0 \mapsto \Phi_h(y_0), which
we assume to be a B-series (or a P-series).

VI.7.1 B-Series Methods Conserving Quadratic First Integrals 

The numerical solution of a Runge-Kutta method (II.1.4) can be written as a
B-series

    y_1 = B(a, y_0) = y_0 + \sum_{\tau \in T} \frac{h^{|\tau|}}{\sigma(\tau)} a(\tau) F(\tau)(y_0)    (7.1)

with coefficients a(\tau) given by

    a(\tau) = \sum_{i=1}^{s} b_i g_i(\tau)    for \tau \in T    (7.2)

(see (III.1.16) and Sect. III.1.2). Our aim is to express the sufficient condition for
the exact conservation of quadratic first integrals (which is the same as for
symplecticity) in terms of the coefficients a(\tau). For this we multiply (4.2) by g_i(u) \cdot g_j(v)
(where u = [u_1, ..., u_m] and v = [v_1, ..., v_l] are trees in T) and we sum over all i
and j. Using (III.1.13) and the recursion (III.1.15) this yields

    \sum_{i=1}^{s} b_i g_i(u \circ v) + \sum_{j=1}^{s} b_j g_j(v \circ u) = \Bigl( \sum_{i=1}^{s} b_i g_i(u) \Bigr) \Bigl( \sum_{j=1}^{s} b_j g_j(v) \Bigr),

where we have used the Butcher product (see, e.g., Butcher (1987), Sect. 143)

    u \circ v = [u_1, ..., u_m, v],    v \circ u = [v_1, ..., v_l, u]    (7.3)

(compare also Definition III.3.7 and Fig. 7.1 below). Because of (7.2), this implies

    a(u \circ v) + a(v \circ u) = a(u) \cdot a(v)    for u, v \in T.    (7.4)

We now forget that the B-series (7.1) has been obtained from a Runge-Kutta 
method, and we ask the following question: is the condition (7.4) sufficient for a 
B-series method defined by (7.1) to conserve exactly quadratic first integrals (and 
to be symplectic)? The next theorem shows that this is indeed true, and we shall see 
later that condition (7.4) is also necessary (cf. Chartier, Faou & Murua 2005). 
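Condition (7.4) can be tested mechanically for a given Runge-Kutta tableau and a few small trees. The sketch below represents a rooted tree as the tuple of its root's subtrees and assumes the usual recursion for the stage coefficients, g_i(leaf) = 1 and g_i([t_1, ..., t_m]) = \prod_k \sum_j a_{ij} g_j(t_k), as our reading of (III.1.15); the data structure and all names are our own illustrative choices.

```python
from fractions import Fraction as Fr

LEAF = ()  # the one-vertex tree; a tree is the tuple of the subtrees of its root


def stage_coefficients(tree, A):
    """Vector (g_i(tree))_i from the assumed recursion over the root's subtrees."""
    s = len(A)
    out = [Fr(1)] * s
    for sub in tree:
        g_sub = stage_coefficients(sub, A)
        for i in range(s):
            out[i] *= sum(A[i][j] * g_sub[j] for j in range(s))
    return out


def a_coeff(tree, A, b):
    """B-series coefficient a(tree) = sum_i b_i g_i(tree), cf. (7.2)."""
    return sum(bi * gi for bi, gi in zip(b, stage_coefficients(tree, A)))


def butcher_product(u, v):
    """u o v: attach the root of v as an additional child of the root of u."""
    return u + (v,)


def defect(u, v, A, b):
    """a(u o v) + a(v o u) - a(u) a(v); zero for all u, v means (7.4) holds."""
    return (
        a_coeff(butcher_product(u, v), A, b)
        + a_coeff(butcher_product(v, u), A, b)
        - a_coeff(u, A, b) * a_coeff(v, A, b)
    )


MIDPOINT_A, MIDPOINT_B = [[Fr(1, 2)]], [Fr(1)]  # implicit midpoint rule
EULER_A, EULER_B = [[Fr(0)]], [Fr(1)]           # explicit Euler method

trees = [LEAF, (LEAF,), (LEAF, LEAF), ((LEAF,),)]
print([defect(u, v, MIDPOINT_A, MIDPOINT_B) for u in trees for v in trees])
print(defect(LEAF, LEAF, EULER_A, EULER_B))
```

For the implicit midpoint rule all defects vanish on these trees, whereas the explicit Euler method already violates (7.4) for u = v equal to the one-vertex tree. Only finitely many trees are tested, so this is a sanity check rather than a proof.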
Theorem 7.1. Consider a B-series method \Phi_h(y) = B(a, y) and problems
\dot y = f(y) having Q(y) = y^T C y (with symmetric matrix C) as first integral.
If the coefficients a(\tau) satisfy (7.4), then the method exactly conserves Q(y) and
it is symplectic.

Proof. a) Under the assumptions of the theorem we shall prove in part (c) that

    B(a, y)^T C B(a, y) = y^T C y + \sum_{u,v \in T} \frac{h^{|u|+|v|}}{\sigma(u) \sigma(v)} m(u, v) F(u)(y)^T C F(v)(y)    (7.5)

with m(u, v) = a(u) \cdot a(v) - a(u \circ v) - a(v \circ u). Condition (7.4) is equivalent to
m(u, v) = 0 and thus implies the exact conservation of Q(y) = y^T C y.

To prove symplecticity of the method it is sufficient to show that the diagram of
Lemma 4.1 commutes for general B-series methods. This is seen by differentiating
the elementary differentials and by comparing them with those for the augmented
system (Exercise 8). Symplecticity of the method thus follows as in Sect. VI.4.1
from the fact that the symplecticity relation is a quadratic first integral of the
augmented system.

b) Since Q(y) = y^T C y is a first integral of \dot y = f(y), we have y^T C f(y) = 0
for all y. Differentiating this relation m times with respect to y yields

    \sum_{j=1}^{m} k_j^T C f^{(m-1)}(y)(k_1, ..., k_{j-1}, k_{j+1}, ..., k_m) + y^T C f^{(m)}(y)(k_1, ..., k_m) = 0.

Putting k_j = F(\tau_j)(y) we obtain the formula

    y^T C F([\tau_1, ..., \tau_m])(y) = -\sum_{j=1}^{m} F(\tau_j)(y)^T C F([\tau_1, ..., \tau_{j-1}, \tau_{j+1}, ..., \tau_m])(y),

which can also be written in the form

    y^T C \frac{F(\tau)(y)}{\sigma(\tau)} = -\sum_{u,v \in T, \; v \circ u = \tau} \frac{F(u)(y)^T}{\sigma(u)} C \frac{F(v)(y)}{\sigma(v)}.    (7.6)

c) With (7.1) the expression y_1^T C y_1 becomes

    B(a, y)^T C B(a, y) = y^T C y + 2 y^T C \sum_{\tau \in T} \frac{h^{|\tau|}}{\sigma(\tau)} a(\tau) F(\tau)(y)
        + \sum_{u,v \in T} \frac{h^{|u|+|v|}}{\sigma(u) \sigma(v)} a(u) a(v) F(u)(y)^T C F(v)(y).

Since C is symmetric, formula (7.6) remains true if we sum over trees u, v such that
u \circ v = \tau. Inserting both formulas into the sum over \tau leads directly to (7.5).    □
Extension to P-Series. All the previous results can be extended to partitioned
methods. To find the correct conditions on the coefficients of the P-series, we use the fact
that the numerical solution of a partitioned Runge-Kutta method (II.2.2) is a P-series

    \begin{pmatrix} p_1 \\ q_1 \end{pmatrix}
      = \begin{pmatrix} P_p(a, (p_0, q_0)) \\ P_q(a, (p_0, q_0)) \end{pmatrix}
      = \begin{pmatrix} p_0 \\ q_0 \end{pmatrix}
      + \begin{pmatrix} \sum_{u \in TP_p} \frac{h^{|u|}}{\sigma(u)} a(u) F(u)(p_0, q_0) \\ \sum_{v \in TP_q} \frac{h^{|v|}}{\sigma(v)} a(v) F(v)(p_0, q_0) \end{pmatrix}    (7.7)

with coefficients a(\tau) given by

    a(\tau) = \sum_{i=1}^{s} b_i \phi_i(\tau)         for \tau \in TP_p,
    a(\tau) = \sum_{i=1}^{s} \hat b_i \phi_i(\tau)    for \tau \in TP_q    (7.8)

(see Theorem III.2.4). We assume here that the elementary differentials F(\tau)(p, q)
originate from a partitioned system

    \dot p = f_1(p, q),    \dot q = f_2(p, q),    (7.9)

such as the Hamiltonian system (1.7). This time we multiply (4.3) by \phi_i(u) \cdot \phi_j(v)
(where u = [u_1, ..., u_m]_p \in TP_p and v = [v_1, ..., v_l]_q \in TP_q) and we sum over
all i and j. Using the recursion (III.2.7) this yields

    \sum_{i=1}^{s} b_i \phi_i(u \circ v) + \sum_{j=1}^{s} \hat b_j \phi_j(v \circ u) = \Bigl( \sum_{i=1}^{s} b_i \phi_i(u) \Bigr) \Bigl( \sum_{j=1}^{s} \hat b_j \phi_j(v) \Bigr),    (7.10)

where u \circ v = [u_1, ..., u_m, v]_p and v \circ u = [v_1, ..., v_l, u]_q. Because of (7.8), this
implies the relation

    a(u \circ v) + a(v \circ u) = a(u) \cdot a(v)    for u \in TP_p, v \in TP_q.    (7.11)

Since \phi_i(\tau) is independent of the colour of the root of \tau, condition (4.4) implies

    a(\tau) is independent of the colour of the root of \tau.    (7.12)

Theorem 7.2. Consider a P-series method (p_1, q_1) = \Phi_h(p_0, q_0) given by (7.7),
and a problem (7.9) having Q(p, q) = p^T E q as first integral.

i) If the coefficients a(\tau) satisfy (7.11) and (7.12), the method exactly conserves
Q(p, q) and it is symplectic for general Hamiltonian systems (1.7).

ii) If the coefficients a(\tau) satisfy only (7.11), the method exactly conserves
Q(p, q) for problems of the form \dot p = f_1(q), \dot q = f_2(p), and it is symplectic for
separable Hamiltonian systems where H(p, q) = T(p) + U(q).

Proof. This is very similar to that of Theorem 7.1. If Q(p, q) = p^T E q is a first
integral of (7.9), we have f_1(p, q)^T E q + p^T E f_2(p, q) = 0 for all p and q.
Differentiating m times with respect to p and n times with respect to q yields
    0 = D_p^m D_q^n f_1(p, q)(k_1, ..., k_m, \ell_1, ..., \ell_n)^T E q
        + p^T E D_p^m D_q^n f_2(p, q)(k_1, ..., k_m, \ell_1, ..., \ell_n)    (7.13)
        + \sum_{j=1}^{n} D_p^m D_q^{n-1} f_1(p, q)(k_1, ..., k_m, \ell_1, ..., \ell_{j-1}, \ell_{j+1}, ..., \ell_n)^T E \ell_j
        + \sum_{i=1}^{m} k_i^T E D_p^{m-1} D_q^n f_2(p, q)(k_1, ..., k_{i-1}, k_{i+1}, ..., k_m, \ell_1, ..., \ell_n).

Putting k_i = F(u_i)(p, q) with u_i \in TP_p, \ell_j = F(v_j)(p, q) with v_j \in TP_q,
\tau_p = [u_1, ..., u_m, v_1, ..., v_n]_p and \tau_q = [u_1, ..., u_m, v_1, ..., v_n]_q, we obtain as in part
(b) of the proof of Theorem 7.1 that

    \frac{F(\tau_p)(p, q)^T}{\sigma(\tau_p)} E q + p^T E \frac{F(\tau_q)(p, q)}{\sigma(\tau_q)}
        = -\sum_{u \circ v = \tau_p} \frac{F(u)(p, q)^T}{\sigma(u)} E \frac{F(v)(p, q)}{\sigma(v)}
          - \sum_{v \circ u = \tau_q} \frac{F(u)(p, q)^T}{\sigma(u)} E \frac{F(v)(p, q)}{\sigma(v)},    (7.14)

where the sums are over u \in TP_p and v \in TP_q.
With (7.7) the expression p_1^T E q_1 becomes

    P_p(a, (p, q))^T E P_q(a, (p, q)) = p^T E q    (7.15)
        + \sum_{u \in TP_p} \frac{h^{|u|}}{\sigma(u)} a(u) F(u)(p, q)^T E q + p^T E \sum_{v \in TP_q} \frac{h^{|v|}}{\sigma(v)} a(v) F(v)(p, q)
        + \sum_{u \in TP_p, \, v \in TP_q} \frac{h^{|u|+|v|}}{\sigma(u) \sigma(v)} a(u) a(v) F(u)(p, q)^T E F(v)(p, q).

Condition (7.12) implies that a(\tau_p) = a(\tau_q) for the trees in (7.14). Since also |\tau_p| =
|\tau_q| and \sigma(\tau_p) = \sigma(\tau_q), two corresponding terms in the sums of the second line
in (7.15) can be jointly replaced by the use of (7.14). As in part (c) of the proof of
Theorem 7.1 this together with (7.11) then yields

    P_p(a, (p, q))^T E P_q(a, (p, q)) = p^T E q,

which proves the conservation of quadratic first integrals p^T E q. Symplecticity
follows as before, because the diagram of Lemma 4.1 also commutes for general
P-series methods.

For the proof of statement (ii) we notice that f_1(q)^T E q + p^T E f_2(p) = 0
implies that f_1(q)^T E q and p^T E f_2(p) vanish separately. Instead of (7.14)
we thus have two identities: the term F(\tau_p)(p, q)^T E q/\sigma(\tau_p) becomes equal to the
first sum in (7.14), and p^T E F(\tau_q)(p, q)/\sigma(\tau_q) to the second sum. Consequently,
the previous argumentation can be applied without the condition a(\tau_p) = a(\tau_q).    □
Second Order Differential Equations. We next consider partitioned systems of
the particular form

    \dot p = f_1(q),    \dot q = C p + c,    (7.16)

where C is a matrix and c a vector. Since problems of this type are second order
differential equations \ddot q = C f_1(q), partitioned Runge-Kutta methods become
equivalent to Nyström methods (see Sects. II.2.3 and IV.2.3).

An important special case are Hamiltonian systems

    \dot p = -\nabla U(q),    \dot q = C p + c    (7.17)

(or, equivalently, \ddot q = -C \nabla U(q)). They correspond to Hamiltonian functions

    H(p, q) = \frac{1}{2} p^T C p + c^T p + U(q),    (7.18)

where the kinetic energy is at most quadratic in p (here, C is usually symmetric).

In a P-series representation of the numerical solution, many elementary
differentials vanish identically. Only those trees have to be considered, whose neighbouring
vertices have different colour (the problem is separable) and whose white vertices
have at most one son⁶ (second component is linear). We denote this set of trees by

    TN_p = \{ \tau \in TP_p \mid neighbouring vertices of \tau have different colour,
                                 white vertices of \tau have at most one son \},    (7.19)

and we let TN_q be the corresponding subset of TP_q.

The same procedure as for partitioned methods permits us to write the
symplecticity condition of Theorem 4.8 in terms of the coefficients a(\tau) of the P-series.
Here ● and ○ denote the trees consisting of a single black and a single white vertex,
respectively, while \circ is the Butcher product. Assuming a(●) = a(○) = 1, the two
conditions of (4.5) lead to

    a(○ \circ u) + a(u \circ ○) = a(u) \cdot a(○)    for u \in TN_p,    (7.20)

    a(u) a(○ \circ v) - a(u \circ ○ \circ v) = a(○ \circ u) a(v) - a(v \circ ○ \circ u)    for u, v \in TN_p,    (7.21)

where we use the abbreviating notation

    u \circ ○ \circ v = u \circ (○ \circ v) = [u_1, ..., u_m, [v]_q]_p    (7.22)

if u = [u_1, ..., u_m]_p. Notice that for u, v \in TN_p, the trees u \circ ○, u \circ ○ \circ v and v \circ ○ \circ u
are in TN_p, and ○ \circ u is in TN_q.


Theorem 7.3. Consider a P-series method (7.7) for differential equations (7.16) 
having Q(p, q) = p T Eq as first integral. 

If the coefficients a(r) satisfy (7.20) and (7.21), the method exactly conserves 
Q(p , q) and it is symplectic for Hamiltonian systems with H (p, q) of the form (7.18). 

6 Attention: with respect to (III.2.10) the vertices have opposite colour, because the linear 
dependence is in the second component in (7.17) whereas it is in the first component in 
(III.2.9). 
Proof. Since the elementary differentials F(\tau)(p, q) vanish identically for \tau \notin
TN_p \cup TN_q, we can arbitrarily define a(\tau) for trees outside TN_p \cup TN_q
without changing the method (throughout this proof we implicitly assume that for the
considered trees neighbouring vertices have different colour). We shall do this in
such a way that (7.11) holds.

Consider first the tree u \circ ○ \circ v, where ○ denotes the tree with a single white vertex.
There is exactly one vertex between the roots of u and v. Making this vertex the root
gives the tree [u, v]_q which is not in TN_q. We define for u, v \in TN_p

    a([u, v]_q) := a(u) a(○ \circ v) - a(u \circ ○ \circ v).

Condition (7.21) shows that a([u, v]_q) is independent of permuting u and v and is
thus well-defined. For trees that are neither in TN_p \cup TN_q nor of the form [u, v]_q
with u, v \in TN_p we let a(\tau) = 0. This extension of a(\tau) implies that condition
(7.11) holds for all trees, and part (ii) of Theorem 7.2 yields the statement. Notice
that for problems \dot p = f_1(q), \dot q = f_2(p) only trees for which neighbouring vertices
have different colour are relevant.    □


VI.7.2 Characterization of Symplectic P-Series (and B-Series) 


A characterization of symplectic B-series was first obtained by Calvo & Sanz-Serna
(1994). We also consider P-series with various important special situations.

Theorem 7.4. Consider a P-series method (7.7) applied to a general partitioned
differential equation (7.9). Equivalent are:

1) the coefficients a(\tau) satisfy (7.11) and (7.12),

2) quadratic first integrals of the form Q(p, q) = p^T E q are exactly conserved,

3) the method is symplectic for general Hamiltonian systems (1.7).


Proof. The implication (1)⇒(2) follows from part (i) of Theorem 7.2, (2)⇒(3) is a
consequence of the fact that the symplecticity condition is a quadratic first integral of
the variational equation (see the proof of Theorem 7.2). The remaining implication
(3)⇒(1) will be proved in the following two steps.

a) We fix two trees u \in TP_p and v \in TP_q, and we construct a (polynomial)
Hamiltonian such that the transformation (7.7) satisfies

    \Bigl( \frac{\partial (p_1, q_1)}{\partial p_0^1} \Bigr)^T J \Bigl( \frac{\partial (p_1, q_1)}{\partial q_0^2} \Bigr)
        = C \bigl( a(u \circ v) + a(v \circ u) - a(u) \cdot a(v) \bigr)    (7.23)

with C \ne 0 (here, p_0^1 denotes the first component of p_0, and q_0^2 the second
component of q_0). The symplecticity of (7.7) implies that the expression in (7.23) vanishes,
so that condition (7.11) has to be satisfied.

For given u \in TP_p and v \in TP_q we define the Hamiltonian as follows: to the
branches of u \circ v we attach the numbers 3, ..., |u| + |v| + 1 such that the branch
between the roots of u and v is labelled by 3. Then, the Hamiltonian is a sum of
as many terms as vertices in the tree. The summand corresponding to a vertex is a
product containing the factor p^j (resp. q^j) if an upward leaving branch “j” is directly
connected with a black (resp. white) vertex, and the factor q^i (resp. p^i) if the vertex
itself is black (resp. white) and the downward leaving branch has label “i”. Finally,
the factors q^2 and p^1 are included in the terms corresponding to the roots of u and
v, respectively. For the example of Fig. 7.1 we have

    H(p, q) = q^2 q^3 q^4 p^5 + p^1 p^3 p^7 p^8 + p^4 p^6 + q^5 + q^6 + q^7 + q^8.    (7.24)

[Fig. 7.1. Illustration of the Hamiltonian (7.24); the figure shows the trees u \circ v and v \circ u.]

The components F^i(\tau)(p, q) of the elementary differentials corresponding to
the Hamiltonian system (with the Hamiltonian constructed above) satisfy

    F^2(u \circ v)(p, q) = (-1)^{\delta(u \circ v)} \sigma(u \circ v) \cdot p^1,
    F^1(v \circ u)(p, q) = (-1)^{\delta(v \circ u)} \sigma(v \circ u) \cdot q^2,
    F^3(u)(p, q)         = (-1)^{\delta(u)} \sigma(u) \cdot q^2,
    F^3(v)(p, q)         = (-1)^{\delta(v)} \sigma(v) \cdot p^1,

and for all other trees \tau \in TP and components i we have

    \frac{\partial F^i(\tau)}{\partial p^1}(0, 0) = \frac{\partial F^i(\tau)}{\partial q^2}(0, 0) = 0.    (7.25)

In (7.25), \delta(\tau) counts the number of black vertices of \tau, and the symmetry coefficient
\sigma(\tau) is that of (III.2.3). For example, \sigma(u) = 1 and \sigma(v) = 2 for the trees of
Fig. 7.1. The verification of (7.25) is straightforward. The coefficient (-1)^{\delta(\tau)} is due
to the minus sign in the first part of the Hamiltonian system (1.7), and the symmetry
coefficient \sigma(\tau) appears in exactly the same way as in the multidimensional Taylor
formula. Due to the zero initial values, no elementary differential other than those
of (7.25) gives rise to non-vanishing expressions in (7.23). Consider for example
the second component of F(\tau)(p, q) for a tree \tau \in TP_p. Since we are concerned
with the Hamiltonian system (1.7), this expression starts with a derivative of H_{q^2}.
Therefore, it contributes to (7.23) at p_0 = q_0 = 0 only if it contains the factor
H_{q^2 q^3 q^4 p^5} (for the example of Fig. 7.1). This in turn implies the presence of factors
H_{p^3}, H_{p^4} and H_{q^5}. Continuing this line of reasoning, we find that F^2(\tau)(p, q)
contributes to (7.23) at p_0 = q_0 = 0 only if \tau = u \circ v. With similar arguments we
see that only the elementary differentials of (7.25) have to be considered. We now
insert (7.25) into (7.7), and we compute its derivatives with respect to p^1 and q^2.
This then yields (7.23) with C = (-1)^{\delta(u)+\delta(v)} h^{|u|+|v|}, and completes the proof
concerning condition (7.11).
b) The necessity of condition (7.12) is seen similarly. We fix a tree \tau \in TP_p
and we let \bar\tau \in TP_q be the tree obtained from \tau by changing the colour of the root.
We then attach the numbers 3, ..., |\tau| + 1 to the branches of \tau, and we define a
Hamiltonian as above but, instead of adding the factors q^2 and p^1, we include
the factor p^1 q^2 in the term corresponding to the root. For the tree \tau = u of Fig. 7.1
this yields

    H(p, q) = p^1 q^2 q^3 p^4 + p^3 p^5 + q^4 + q^5.

With this Hamiltonian we get

    F^2(\tau)(p, q)     = (-1)^{\delta(\tau)} \sigma(\tau) \cdot p^1,
    F^1(\bar\tau)(p, q) = (-1)^{\delta(\bar\tau)} \sigma(\bar\tau) \cdot q^2,

and these are the only elementary differentials contributing to the left-hand
expression of (7.23). We thus get

    \Bigl( \frac{\partial (p_1, q_1)}{\partial p_0^1} \Bigr)^T J \Bigl( \frac{\partial (p_1, q_1)}{\partial q_0^2} \Bigr)
        = (-1)^{\delta(\tau)} h^{|\tau|} \bigl( a(\tau) - a(\bar\tau) \bigr),

which completes the proof of Theorem 7.4.


□ 


Theorem 7.5. Consider a P-series method (7.7) applied to a separable partitioned
differential equation \dot p = f_1(q), \dot q = f_2(p). Equivalent are:

1) the coefficients a(\tau) satisfy (7.11),

2) quadratic first integrals of the form Q(p, q) = p^T E q are exactly conserved,

3) the method is symplectic for separable Hamiltonians H(p, q) = T(p) + U(q).

Proof. The implications (1)⇒(2)⇒(3) follow as before from part (ii) of
Theorem 7.2. The remaining implication (3)⇒(1) is a consequence of the fact that the
Hamiltonian constructed in part (a) of the proof of Theorem 7.4 is separable, when
u and v have no neighbouring vertices of the same colour.    □


Theorem 7.6. Consider a B-series method (7.1) for \dot y = f(y). Equivalent are:

1) the coefficients a(\tau) satisfy (7.4),

2) quadratic first integrals of the form Q(y) = y^T C y are exactly conserved,

3) the method is symplectic for general Hamiltonian systems \dot y = J^{-1} \nabla H(y).


Proof. The implications (1)⇒(2)⇒(3) follow from Theorem 7.1. The remaining
implication (3)⇒(1) follows from Theorem 7.4, because a B-series with coefficients
a(\tau), \tau \in T, applied to a partitioned differential equation, can always be interpreted
as a P-series (Definition III.2.1), where a(\tau) := a(\varphi(\tau)) for \tau \in TP and \varphi : TP \to
T is the mapping that forgets the colouring of the vertices. This follows from the
fact that

    \alpha(\tau) F(\tau)(y) = \begin{pmatrix} \sum_{u \in TP_p, \, \varphi(u) = \tau} \alpha(u) F(u)(p, q) \\ \sum_{v \in TP_q, \, \varphi(v) = \tau} \alpha(v) F(v)(p, q) \end{pmatrix}

for \tau \in T, because \alpha(u) \cdot \sigma(u) = \alpha(v) \cdot \sigma(v) = e(\tau) \cdot |\tau|!. Here, y = (p, q), the
elementary differentials F(\tau)(y) are those of Definition III.1.2, whereas F(u)(p, q)
and F(v)(p, q) are those of Table III.2.1.    □
Theorem 7.7. Consider a P-series method (7.7) applied to the special partitioned
system (7.16). Equivalent are:

1) the coefficients $a(\tau)$ satisfy (7.20) and (7.21),

2) quadratic first integrals of the form $Q(p,q) = p^T E q$ are exactly conserved,

3) the method is symplectic for Hamiltonian systems of the form (7.17).

Proof. The implications (1) ⇒ (2) ⇒ (3) follow from Theorem 7.3. The remaining
implication (3) ⇒ (1) can be seen as follows.

Condition (7.20) is a consequence of the proof of Theorem 7.4, because for
$u \in TN_p$ and $v = \circ$ the Hamiltonian constructed there is of the form (7.18).

To prove condition (7.21) we have to modify slightly the definition of $H(p,q)$.
We take $u, v \in TN_p$ and define the polynomial Hamiltonian as follows: to the
branches of $u \circ \circ\, v$ we attach the numbers $3, \ldots, |u| + |v| + 2$. The Hamiltonian is
then a sum of as many terms as vertices in the tree. The summands are defined as in
the proof of Theorem 7.4 with the only exception that to the terms corresponding to
the roots of $u$ and $v$ we include the factors $q^2$ and $q^1$, respectively, instead of $q^2$ and
$p^1$. This gives a Hamiltonian of the form (7.18), for which the expression


fd(pi,qi)\ T fd(p 1 ,q 1 )\ 

' ) \ dql ) 


(7.26) 


becomes equal to

$$a(u)\,a(\circ \circ\, v) - a(u \circ\circ\, v) - a(\circ \circ\, u)\,a(v) + a(v \circ\circ\, u) \qquad (7.27)$$

up to a nonzero constant. By symplecticity, (7.26) is zero so that also (7.27) has to
vanish. This proves the validity of condition (7.21). □

VI.7.3 Irreducible Runge-Kutta Methods 

We are now able to study to what extent the conditions of Theorem 4.3 and Theorem 4.6
are also necessary for symplecticity. Consider first the 2-stage method

$$\begin{array}{c|cc} 1/2 & \alpha & 1/2-\alpha \\ 1/2 & \beta & 1/2-\beta \\ \hline & 1/2 & 1/2 \end{array}$$

The solution of the corresponding Runge-Kutta system (II.1.4) is given by $k_1 = k_2 = k$,
where $k = f(y_0 + hk/2)$, and hence $y_1 = y_0 + hk$. Whatever the values of $\alpha$
and $\beta$ are, the numerical solution of the Runge-Kutta method is identical to that of
the implicit midpoint rule, so that it defines a symplectic transformation. However,
the condition (4.2) is only satisfied for $\alpha = \beta = 1/4$.
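
The claim that the two stages collapse to the midpoint rule can be checked numerically. The following is a minimal sketch, not part of the original text: the pendulum vector field, the step size and the fixed-point solver (`f`, `rk2_step`, `midpoint_step`) are illustrative assumptions.

```python
import math

def f(y):
    # Illustrative test problem (assumption): pendulum, y = (p, q), p' = -sin q, q' = p.
    p, q = y
    return (-math.sin(q), p)

def rk2_step(y0, h, alpha, beta, iters=200):
    """One step of the 2-stage method with A = [[alpha, 1/2-alpha], [beta, 1/2-beta]], b = (1/2, 1/2)."""
    a = [[alpha, 0.5 - alpha], [beta, 0.5 - beta]]
    k = [f(y0), f(y0)]
    for _ in range(iters):  # fixed-point iteration on the stage equations (small h assumed)
        k = [f(tuple(y0[c] + h * (a[i][0] * k[0][c] + a[i][1] * k[1][c]) for c in range(2)))
             for i in range(2)]
    return tuple(y0[c] + 0.5 * h * (k[0][c] + k[1][c]) for c in range(2))

def midpoint_step(y0, h, iters=200):
    """One step of the implicit midpoint rule, k = f(y0 + h k / 2), y1 = y0 + h k."""
    k = f(y0)
    for _ in range(iters):
        k = f(tuple(y0[c] + 0.5 * h * k[c] for c in range(2)))
    return tuple(y0[c] + h * k[c] for c in range(2))

if __name__ == "__main__":
    y0, h = (0.3, 1.2), 0.1
    ref = midpoint_step(y0, h)
    for alpha, beta in [(0.25, 0.25), (0.9, -0.3), (0.0, 0.5)]:
        y1 = rk2_step(y0, h, alpha, beta)
        print(alpha, beta, max(abs(y1[c] - ref[c]) for c in range(2)))
```

For each pair $(\alpha, \beta)$ the printed difference should be at rounding level, although only the first pair satisfies (4.2).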

Definition 7.8. Two stages $i$ and $j$ of a Runge-Kutta method (II.1.4) are said to be
equivalent for a class $(\mathcal{P})$ of initial value problems, if for every problem in $(\mathcal{P})$ and
for every sufficiently small step size we have $k_i = k_j$ ($k_i = k_j$ and $\ell_i = \ell_j$ for
partitioned Runge-Kutta methods (II.2.2)).

The method is called irreducible for $(\mathcal{P})$ if it does not have equivalent stages.
It is called irreducible if it is irreducible for all sufficiently smooth initial value
problems.

For a more amenable characterization of irreducible Runge-Kutta methods, we
introduce an ordering on $T$ (and on $TP$), and we consider the following $s \times \infty$
matrices

$\Phi_{RK} = (\phi(\tau);\ \tau \in T)$ with entries $\phi_i(\tau) = g_i(\tau)$ given by (III.1.13),^7

$\Phi_{PRK} = (\phi(\tau);\ \tau \in TP_p) = (\phi(\tau);\ \tau \in TP_q)$ with entries $\phi_i(\tau)$ given by (III.2.7);
observe that $\phi_i(\tau)$ does not depend on the colour of the root,

$\Phi^*_{PRK} = (\phi(\tau);\ \tau \in TP^*_p) = (\phi(\tau);\ \tau \in TP^*_q)$ where $TP^*_p$ (resp. $TP^*_q$) is the set
of trees in $TP_p$ (resp. $TP_q$) whose neighbouring vertices have different colours.

Lemma 7.9 (Hairer 1994). A Runge-Kutta method is irreducible if and only if the
matrix $\Phi_{RK}$ has full rank $s$.

A partitioned Runge-Kutta method is irreducible if and only if the matrix $\Phi_{PRK}$
has full rank $s$.

A partitioned Runge-Kutta method is irreducible for separable problems $\dot p = f_1(q)$,
$\dot q = f_2(p)$ if and only if the matrix $\Phi^*_{PRK}$ has full rank $s$.

Proof. If the stages $i$ and $j$ are equivalent, it follows from the expansion

$$k_i = \sum_{\tau \in T} \frac{h^{|\tau|-1}}{\sigma(\tau)}\, \phi_i(\tau)\, F(\tau)(y_0)$$

(see the proof of Theorem III.1.4) and from the independence of the elementary
differentials (Exercise III.3) that $\phi_i(\tau) = \phi_j(\tau)$ for all $\tau \in T$. Hence, the rows
$i$ and $j$ of the matrix $\Phi_{RK}$ are identical. The analogous statement for partitioned
Runge-Kutta methods follows from Theorem III.2.4 and Exercise III.6. This proves
the sufficiency of the “full rank” condition.

We prove its necessity only for partitioned Runge-Kutta methods applied to separable
problems (the other situations can be treated similarly). For separable problems,
only trees in $TP^*_p \cup TP^*_q$ give rise to non-vanishing elementary differentials.
Irreducibility therefore implies that for every pair $(i,j)$ with $i \ne j$ there exists a tree
$\tau \in TP^*_p$ such that $\phi_i(\tau) \ne \phi_j(\tau)$. Consequently, a certain finite linear combination
of the columns of $\Phi^*_{PRK}$ has distinct elements, i.e., there exist vectors $\xi \in \mathbb{R}^\infty$
(only finitely many nonzero elements) and $\eta \in \mathbb{R}^s$ with $\Phi^*_{PRK}\,\xi = \eta$ and $\eta_i \ne \eta_j$
for $i \ne j$. Due to the fact that $\phi_i([\tau_1,\ldots,\tau_m]) = \phi_i([\tau_1]) \cdot \ldots \cdot \phi_i([\tau_m])$, the
componentwise product of two columns of $\Phi^*_{PRK}$ is again a column of $\Phi^*_{PRK}$. Continuing
this argument and observing that $(1,\ldots,1)^T$ is a column of $\Phi^*_{PRK}$, we obtain
a matrix $X$ such that $\Phi^*_{PRK} X = (\eta_i^{\,j-1})_{i,j=1}^{s}$ is a Vandermonde matrix. Since the $\eta_i$
are distinct, the matrix $\Phi^*_{PRK}$ has to be of full rank $s$. □

^7 In this section we let $\phi(\tau) \in \mathbb{R}^s$ denote the vector whose elements are $\phi_i(\tau)$, $i = 1,\ldots,s$.
This should not be mixed up with the value $\phi(\tau)$ of (III.1.16).
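
As an illustration of the rank criterion of Lemma 7.9, the sketch below builds the columns of $\Phi_{RK}$ for the trees with at most three vertices and computes the rank in exact arithmetic. It assumes the usual recursion $\phi_i(\bullet) = 1$, $\phi_i([\tau_1,\ldots,\tau_m]) = \prod_k \sum_j a_{ij}\phi_j(\tau_k)$, which is consistent with the product formula used in the proof; the tree encoding and the function names are hypothetical. A truncated matrix can only certify full rank, so rank deficiency here is merely an indication of equivalent stages.

```python
from fractions import Fraction as Fr

def phi(A, tree):
    """Vector (phi_i(tree))_i; a tree is the tuple of its subtrees, () is the single vertex."""
    s = len(A)
    out = [Fr(1)] * s
    for sub in tree:
        ps = phi(A, sub)
        out = [out[i] * sum(A[i][j] * ps[j] for j in range(s)) for i in range(s)]
    return out

# All rooted trees with at most three vertices: •, [•], [•,•], [[•]].
TREES = [(), ((),), ((), ()), (((),),)]

def truncated_phi_matrix(A):
    cols = [phi(A, t) for t in TREES]
    return [[col[i] for col in cols] for i in range(len(A))]

def rank(rows):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [list(r) for r in rows]
    rk = 0
    for col in range(len(M[0])):
        piv = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for r in range(rk + 1, len(M)):
            fac = M[r][col] / M[rk][col]
            M[r] = [x - fac * y for x, y in zip(M[r], M[rk])]
        rk += 1
        if rk == len(M):
            break
    return rk

# Reducible 2-stage method from the text with alpha = 9/10, beta = -3/10.
A_reducible = [[Fr(9, 10), Fr(-2, 5)], [Fr(-3, 10), Fr(4, 5)]]
# Trapezoidal rule (Lobatto IIIA, s = 2) for comparison.
A_trapezoidal = [[Fr(0), Fr(0)], [Fr(1, 2), Fr(1, 2)]]

if __name__ == "__main__":
    print(rank(truncated_phi_matrix(A_reducible)))
    print(rank(truncated_phi_matrix(A_trapezoidal)))
```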
VI.7.4 Characterization of Irreducible Symplectic Methods

The necessity of the condition (4.2) for symplectic Runge-Kutta methods was first
stated by Lasagni (1988). Abia & Sanz-Serna (1993) extended his proof to partitioned
methods. We follow here the ideas of Hairer (1994).

Theorem 7.10. An irreducible Runge-Kutta method (II. 1.4) is symplectic if and 
only if the condition (4.2) holds. 

An irreducible partitioned Runge-Kutta method (II.2.2) is symplectic if and only 
if the conditions (4.3) and (4.4) hold. 

A partitioned Runge-Kutta method, irreducible for separable problems, is symplectic
for separable Hamiltonians $H(p,q) = T(p) + U(q)$ if and only if the condition
(4.3) holds.

Proof. The “if” part of all three statements has been proved in Theorem 4.3 and 
Theorem 4.6. We prove the “only if” part for partitioned Runge-Kutta methods 
applied to general Hamiltonian systems (the other two statements can be obtained 
in the same way). 

We consider the $s \times s$ matrix $M$ with entries $m_{ij} = b_i \hat a_{ij} + \hat b_j a_{ji} - b_i \hat b_j$. The
computation leading to formula (7.11) shows that for $u \in TP_p$ and $v \in TP_q$

$$\phi(u)^T M\, \phi(v) = a(u \circ v) + a(v \circ u) - a(u) \cdot a(v)$$

holds. Due to the symplecticity of the method, this expression vanishes and we
obtain

$$\Phi_{PRK}^T\, M\, \Phi_{PRK} = 0,$$

where $\Phi_{PRK}$ is the matrix of Lemma 7.9. An application of this lemma then yields
$M = 0$, which proves the necessity of (4.3).

For the vector $d$ with components $d_i = b_i - \hat b_i$ we get $d^T \Phi_{PRK} = 0$, and we
deduce from Lemma 7.9 that $d = 0$, so that (4.4) is also seen to be necessary. □
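
For a non-partitioned method ($\hat a_{ij} = a_{ij}$, $\hat b_i = b_i$) the matrix $M$ can be evaluated directly. The sketch below assumes that condition (4.2) is the familiar relation $b_i a_{ij} + b_j a_{ji} = b_i b_j$ and uses the standard coefficients of the 2-stage Gauss method and of the trapezoidal rule; the function name is hypothetical.

```python
import math

def symplecticity_defect(A, b):
    """Largest |b_i a_ij + b_j a_ji - b_i b_j|; zero means M = 0."""
    s = len(b)
    return max(abs(b[i] * A[i][j] + b[j] * A[j][i] - b[i] * b[j])
               for i in range(s) for j in range(s))

r = math.sqrt(3) / 6
gauss2 = ([[0.25, 0.25 - r], [0.25 + r, 0.25]], [0.5, 0.5])   # Gauss, s = 2
trapezoidal = ([[0.0, 0.0], [0.5, 0.5]], [0.5, 0.5])          # Lobatto IIIA, s = 2

if __name__ == "__main__":
    print(symplecticity_defect(*gauss2))       # rounding level
    print(symplecticity_defect(*trapezoidal))  # clearly nonzero
```

Both methods are irreducible, so by Theorem 7.10 the first is symplectic and the second is not.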


VI.8 Conjugate Symplecticity 

The symplecticity requirement may be too strong if we are interested in a correct 
long-time behaviour of a numerical integrator. Stoffer (1988) suggests considering 
methods that are not necessarily symplectic but conjugate to a symplectic method. 

Definition 8.1. Two numerical methods $\Phi_h$ and $\Psi_h$ are mutually conjugate, if there
exists a global change of coordinates $\chi_h$, such that

$$\Phi_h = \chi_h^{-1} \circ \Psi_h \circ \chi_h . \qquad (8.1)$$

We assume that $\chi_h(y) = y + O(h)$ uniformly for $y$ varying in a compact set.

For a numerical solution $y_{n+1} = \Phi_h(y_n)$, lying in a compact subset of the
phase space, the transformed values $z_n = \chi_h(y_n)$ constitute a numerical solution
$z_{n+1} = \Psi_h(z_n)$ of the second method. Since $y_n - z_n = O(h)$, both numerical
solutions have the same long-time behaviour, independently of whether one method
shares certain properties (e.g., symplecticity) with the other.
VI.8.1 Examples and Order Conditions

The most prominent pair of conjugate methods are the trapezoidal and midpoint
rules. Their conjugacy was originally exploited by Dahlquist (1975) in an investigation
on nonlinear stability.

If we denote by $\Phi_h^E$ and $\Phi_h^I$ the explicit and implicit Euler methods, respectively,
then the trapezoidal rule $\Phi_h^T$ and the implicit midpoint rule $\Phi_h^M$ can be written as

$$\Phi_h^T = \Phi_{h/2}^I \circ \Phi_{h/2}^E , \qquad \Phi_h^M = \Phi_{h/2}^E \circ \Phi_{h/2}^I$$

(see Fig. 8.1). This shows $\Phi_h^T = \chi_h^{-1} \circ \Phi_h^M \circ \chi_h$ with $\chi_h = \Phi_{h/2}^E$, implying that the
trapezoidal and midpoint rules are mutually conjugate. The change of coordinates,
which transforms the numerical solution of one method to that of the other, is $O(h)$-close
to the identity.

Fig. 8.1. Conjugacy of the trapezoidal rule and the implicit midpoint rule

In fact, we can do even better. If we let $\Phi_{h/2}^M$ be the square root of $\Phi_h^M$ (i.e.,
$\Phi_{h/2}^M \circ \Phi_{h/2}^M = \Phi_h^M$, see Lemma V.3.2), then we have (Fig. 8.1)

$$\Phi_h^M = (\Phi_{h/2}^M)^{-1} \circ \Phi_h^M \circ \Phi_{h/2}^M = (\Phi_{h/2}^M)^{-1} \circ \Phi_{h/2}^E \circ \Phi_{h/2}^I \circ \Phi_{h/2}^E \circ (\Phi_{h/2}^E)^{-1} \circ \Phi_{h/2}^M$$

so that the trapezoidal and the midpoint rules are conjugate via $\chi_h = (\Phi_{h/2}^E)^{-1} \circ \Phi_{h/2}^M$.
Since $\Phi_{h/2}^E$ and $\Phi_{h/2}^M$ are consistent with the same differential equation, the
transformation $\chi_h$ is $O(h^2)$-close to the identity. This shows that for every numerical
solution of the trapezoidal rule there exists a numerical solution of the midpoint
rule which remains $O(h^2)$-close as long as it stays in a compact set. A single trajectory
of the non-symplectic trapezoidal rule therefore behaves very much the same
as a trajectory of the symplectic implicit midpoint rule.
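
The conjugacy of the trapezoidal and midpoint rules via a half step of the explicit Euler method can be observed numerically. The sketch below is an illustration under assumptions: the pendulum vector field, the step size and the fixed-point solvers are not from the text, and the check is written as $\chi_h \circ \Phi^{\rm trap}_h = \Phi^{\rm mid}_h \circ \chi_h$ so that no inverse map has to be computed.

```python
import math

def f(y):
    # Illustrative test problem (assumption): pendulum, y = (p, q).
    p, q = y
    return (-math.sin(q), p)

def axpy(y, a, k):
    return tuple(yi + a * ki for yi, ki in zip(y, k))

def explicit_euler(y, h):
    return axpy(y, h, f(y))

def implicit_euler(y, h, iters=200):
    z = y
    for _ in range(iters):          # fixed-point iteration for z = y + h f(z)
        z = axpy(y, h, f(z))
    return z

def trapezoidal(y, h, iters=200):
    base = axpy(y, h / 2, f(y))
    z = y
    for _ in range(iters):          # z = y + h/2 f(y) + h/2 f(z)
        z = axpy(base, h / 2, f(z))
    return z

def midpoint(y, h, iters=200):
    k = f(y)
    for _ in range(iters):          # k = f(y + h/2 k)
        k = f(axpy(y, h / 2, k))
    return axpy(y, h, k)

def dist(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

if __name__ == "__main__":
    y, h = (0.7, -0.4), 0.1
    # Trapezoidal rule as implicit Euler after explicit Euler (half steps).
    print(dist(trapezoidal(y, h), implicit_euler(explicit_euler(y, h / 2), h / 2)))
    # Conjugacy with chi_h = explicit Euler half step.
    print(dist(explicit_euler(trapezoidal(y, h), h / 2),
               midpoint(explicit_euler(y, h / 2), h)))
```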

A Study via B-Series. An investigation of Runge-Kutta methods, conjugate to
a symplectic method, leads us to the following weaker requirement: we say that a
numerical method $\Phi_h$ is conjugate to a symplectic method $\Psi_h$ up to order $r$, if there
exists a transformation $\chi_h(y) = y + O(h)$ such that

$$\Phi_h(y) = (\chi_h^{-1} \circ \Psi_h \circ \chi_h)(y) + O(h^{r+1}). \qquad (8.2)$$

This implies that the error of such a method behaves as the superposition of the error
of a symplectic method of order $p$ with that of a non-symplectic method of order $r$.
In the following we assume that all methods considered as well as the conjugacy
mapping $\chi_h$ can be represented as B-series

$$\Phi_h(y) = B(a,y), \qquad \Psi_h(y) = B(b,y), \qquad \chi_h(y) = B(c,y). \qquad (8.3)$$

Using the composition formula (III.1.38) of B-series, condition (8.2) becomes

$$(ac)(\tau) = (cb)(\tau) \quad \text{for } |\tau| \le r. \qquad (8.4)$$

The following results are taken from the thesis of P. Leone (2000).

Theorem 8.2. Let $\Phi_h(y) = B(a,y)$ represent a numerical method of order 2.

a) It is always conjugate to a symplectic method up to order 3.

b) It is conjugate to a symplectic method up to order 4, if and only if

$$a(\bullet, [\bullet,\bullet]) - 2a(\bullet, [[\bullet]]) = 0, \qquad a([\bullet],[\bullet]) - 2a(\bullet, [[\bullet]]) = 0. \qquad (8.5)$$

Here, we use the abbreviation $a(u,v) = a(u) \cdot a(v) - a(u \circ v) - a(v \circ u)$.

Proof. The condition (8.4) allows us to express $b(\tau)$ as a function of $a(u)$ for $|u| \le |\tau|$
and of $c(v)$ for $|v| \le |\tau| - 1$ (use the formulas of Example III.1.11). All we have
to do is to check the symplecticity conditions $b(u,v) = 0$ for $|u| + |v| \le r$ (see
Theorem 7.6).

Since the method $\Phi_h$ is of order 2, we obtain $b(\bullet) = 1$ and $b([\bullet]) = 1/2$. We
arbitrarily fix $c(\bullet) = 0$, so that the symplecticity condition $b(\bullet,[\bullet]) = 0$ becomes
$2c([\bullet]) = a(\bullet,[\bullet])$. Defining $c([\bullet])$ by this relation proves statement (a).

For order 4, the three symplecticity conditions $b(\bullet,[\bullet,\bullet]) = b(\bullet,[[\bullet]]) =
b([\bullet],[\bullet]) = 0$ have to be fulfilled. One of them can be satisfied by suitably defining
$c([\bullet,\bullet]) + c([[\bullet]])$; the other two conditions are then equivalent to (8.5). □

Theorem 8.3. Let $\Phi_h(y) = B(a,y)$ represent a numerical method of order 4. It is
conjugate to a symplectic method up to order 5, if and only if



Proof. The idea of the proof is the same as in the preceding theorem. The verification
is left as an exercise for the reader. □

Example 8.4. A direct computation shows that for the Lobatto IIIB method with
$s = 3$ we have $a([\bullet],[\bullet,\bullet]) = 1/144$, and $a(u,v) = 0$ for all other pairs with
$|u| + |v| = 5$. Theorem 8.3 therefore proves that this method is not conjugate to
a symplectic method up to order 5.

For the Lobatto IIIA method with $s = 3$ we obtain $a([\bullet],[\bullet,\bullet]) = -1/144$,
$a([\bullet],[[\bullet]]) = -1/288$, and $a(u,v) = 0$ for the remaining pairs with $|u| + |v| = 5$.
This time the conditions of Theorem 8.3 are fulfilled, so that the Lobatto IIIA
method with $s = 3$ is conjugate to a symplectic method up to order 5 at least.
VI.8.2 Near Conservation of Quadratic First Integrals


We have already met in Sect. VI.4.1 a close relationship between symplecticity and 
the conservation of quadratic first integrals. The aim of this section is to show a 
similar connection between conjugate symplecticity and the near conservation of 
quadratic first integrals. This has first been observed and proved by Chartier, Faou 
& Murua (2005) using the algebra of rooted trees. 

Let $Q(y) = y^T C y$ (with symmetric matrix $C$) be a quadratic first integral of
$\dot y = f(y)$, and assume that $\Phi_h(y)$ is conjugate to a method $\Psi_h(y)$ that exactly conserves
quadratic first integrals (e.g., symplectic Runge-Kutta methods). This means
that $y_{n+1} = \Phi_h(y_n)$ satisfies

$$\chi_h(y_{n+1})^T C\, \chi_h(y_{n+1}) = \chi_h(y_n)^T C\, \chi_h(y_n),$$

and the expression $\widetilde Q(y) = \chi_h(y)^T C\, \chi_h(y)$ is exactly conserved by the numerical
solution of $\Phi_h(y)$. If $\chi_h(y) = B(c,y)$ is a B-series, this is of the form

$$\widetilde Q(y) = \sum_{\tau,\vartheta \in T \cup \{\emptyset\}} h^{|\tau|+|\vartheta|}\, \beta(\tau,\vartheta)\, F(\tau)(y)^T C\, F(\vartheta)(y), \qquad (8.6)$$

where $F(\emptyset)(y) = y$ and $|\emptyset| = 0$ for the empty tree, and $\beta(\emptyset,\emptyset) = 1$. We have
the following criterion for conjugate symplecticity, where all formulas have to be
interpreted in the sense of formal series.

Theorem 8.5. Assume that a one-step method $\Phi_h(y) = B(a,y)$ leaves (8.6) invariant
for all problems $\dot y = f(y)$ having $Q(y) = y^T C y$ as first integral.

Then, it is conjugate to a symplectic integrator $\Psi_h(z)$, i.e., there exists a transformation
$z = \chi_h(y) = B(c,y)$ such that $\Psi_h(z) = \chi_h \circ \Phi_h \circ \chi_h^{-1}(z)$, or equivalently,
$\Psi_h(z) = B(c^{-1}ac, z)$ is symplectic.

Proof. The idea is to search for a B-series $B(c,y)$ such that the expression (8.6)
becomes

$$\widetilde Q(y) = B(c,y)^T C\, B(c,y).$$

The mapping $z = \chi_h(y) = B(c,y)$ then provides a change of variables such that
the original first integral $Q(z) = z^T C z$ is invariant in the new variables. By Theorem 7.6
this then implies that $\Psi_h$ is symplectic.

By Lemma 8.6 below, the expression (8.6) can be written as

$$\widetilde Q(y) = y^T C \Bigl( y + \sum_{\theta \in T} h^{|\theta|}\, \eta(\theta)\, F(\theta)(y) \Bigr), \qquad (8.7)$$

where $\eta(\theta) = 0$ for $|\theta| < r$, if the perturbation in (8.6) is of size $O(h^r)$. Using the
same lemma once more, we obtain
A comparison of the coefficients in (8.7) and (8.8) uniquely defines $c(\theta)$ in a recursive
manner. We have $c(\theta) = 0$ for $|\theta| < r$, so that the transformation $z = B(c,y)$
is $O(h^r)$ close to the identity. □

The previous proof is based on the following result. 

Lemma 8.6. Let $Q(y) = y^T C y$ (with symmetric matrix $C$) be a first integral of
$\dot y = f(y)$. Then, for every pair of trees $\tau, \vartheta \in T$, we have

$$F(\tau)(y)^T C\, F(\vartheta)(y) = y^T C \Bigl( \sum_{\theta \in T} \kappa_{\tau,\vartheta}(\theta)\, F(\theta)(y) \Bigr).$$

This sum is finite and only over trees satisfying $|\theta| = |\tau| + |\vartheta|$.

Proof. By definition of a first integral we have $y^T C f(y) = 0$ for all $y$. Differentiation
with respect to $y$ gives

$$f(y)^T C\, k + y^T C f'(y) k = 0 \quad \text{for all } k. \qquad (8.9)$$

Putting $k = F(\vartheta)(y)$, this proves the statement for $\tau = \bullet$.

Differentiating once more yields

$$\bigl(f'(y)\ell\bigr)^T C\, k + \ell^T C f'(y) k + y^T C f''(y)(k,\ell) = 0.$$

Putting $\ell = f(y)$ and using (8.9), we get the statement for $\tau = [\bullet]$. With $\ell =
F(\tau_1)(y)$ we obtain the statement for $\tau = [\tau_1]$ provided that it is already proved for
$\tau_1$. We need a further differentiation to get a similar statement for $\tau = [\tau_1,\tau_2]$, etc.
The proof concludes by induction on the order of $\tau$. □
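
The first step of this proof, relation (8.9), is easy to test on a concrete system. The sketch below uses Euler's rigid-body-type equations $\dot y_1 = a_1 y_2 y_3$, $\dot y_2 = a_2 y_3 y_1$, $\dot y_3 = a_3 y_1 y_2$ with $a_1 + a_2 + a_3 = 0$, for which $Q(y) = y^T y$ (that is, $C = I$) is a first integral; the system, the coefficients and the function names are illustrative assumptions, not taken from the text.

```python
A_COEF = (1.0, -3.0, 2.0)  # assumed coefficients with a1 + a2 + a3 = 0

def f(y):
    y1, y2, y3 = y
    a1, a2, a3 = A_COEF
    return (a1 * y2 * y3, a2 * y3 * y1, a3 * y1 * y2)

def jac(y):
    """Jacobian f'(y) of the vector field above."""
    y1, y2, y3 = y
    a1, a2, a3 = A_COEF
    return [[0.0, a1 * y3, a1 * y2],
            [a2 * y3, 0.0, a2 * y1],
            [a3 * y2, a3 * y1, 0.0]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return tuple(dot(row, v) for row in M)

def residual_8_9(y, k):
    """f(y)^T C k + y^T C f'(y) k with C = I; vanishes by (8.9)."""
    return dot(f(y), k) + dot(y, matvec(jac(y), k))

if __name__ == "__main__":
    y = (0.3, -1.1, 0.8)
    print(residual_8_9(y, (0.5, 2.0, -0.7)))  # arbitrary direction k
    print(residual_8_9(y, f(y)))              # k = F(•)(y) = f(y)
```

With $k = f(y)$ the relation reads $f^T C f = -\,y^T C f' f$, i.e., the case $\tau = \vartheta = \bullet$ of the lemma.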

Partitioned Methods. This criterion for conjugate symplecticity can be extended
to partitioned P-series methods. For partitioned problems

$$\dot p = f_1(p,q), \qquad \dot q = f_2(p,q) \qquad (8.10)$$

we consider first integrals of the form $L(p,q) = p^T E q$, where $E$ is an arbitrary
constant matrix. If $\Phi_h(p,q)$ is conjugate to a method that exactly conserves $L(p,q)$,
then it will conserve a modified first integral of the form

$$\widetilde L(p,q) = \sum_{\tau \in TP_p \cup \{\emptyset_p\},\ \vartheta \in TP_q \cup \{\emptyset_q\}} h^{|\tau|+|\vartheta|}\, \beta(\tau,\vartheta)\, F(\tau)(p,q)^T E\, F(\vartheta)(p,q), \qquad (8.11)$$

where $\beta(\emptyset_p,\emptyset_q) = 1$, $F(\emptyset_p)(p,q) = p$, $F(\emptyset_q)(p,q) = q$. We first extend Lemma 8.6
to the new situation.

Lemma 8.7. Let $L(p,q) = p^T E q$ be a first integral of (8.10). Then, for every pair
of trees $\tau \in TP_p$, $\vartheta \in TP_q$, we have

$$F(\tau)(p,q)^T E\, F(\vartheta)(p,q) = p^T E \Bigl( \sum_{\theta \in TP_q} \kappa_{\tau,\vartheta}(\theta)\, F(\theta)(p,q) \Bigr) + \Bigl( \sum_{\theta \in TP_p} \kappa_{\tau,\vartheta}(\theta)\, F(\theta)(p,q) \Bigr)^T E\, q. \qquad (8.12)$$

These sums are finite and only over trees satisfying $|\theta| = |\tau| + |\vartheta|$.
Proof. Since $L(p,q) = p^T E q$ is a first integral of the differential equation, we
have $f_1(p,q)^T E\, q + p^T E f_2(p,q) = 0$ for all $p$ and $q$. As in the proof of Lemma 8.6
the statement follows from differentiation of this relation. □

Theorem 8.8. Assume that a partitioned one-step method $\Phi_h(p,q) = P(a,(p,q))$
leaves (8.11) invariant for all problems (8.10) having $L(p,q) = p^T E q$ as first
integral.

Then it is conjugate to a symplectic integrator $\Psi_h(u,v)$, i.e., there is a transformation
$(u,v) = \chi_h(p,q) = P(c,(p,q))$ such that $\Psi_h(u,v) = \chi_h \circ \Phi_h \circ \chi_h^{-1}(u,v)$,
or equivalently, $\Psi_h(u,v) = P(c^{-1}ac,(u,v))$ is symplectic.

Proof. We search for a P-series $P(c,(p,q)) = \bigl(P_p(c,(p,q)), P_q(c,(p,q))\bigr)^T$ such
that the expression (8.11) can be written as

$$\widetilde L(p,q) = P_p(c,(p,q))^T E\, P_q(c,(p,q)).$$

As in the proof of Theorem 8.5 the mapping $(u,v) = \chi_h(p,q) = P(c,(p,q))$ then
provides the desired change of variables.

Using Lemma 8.7 the expression (8.11) becomes

$$\widetilde L(p,q) = p^T E \Bigl( q + \sum_{\theta \in TP_q} h^{|\theta|}\, \eta(\theta)\, F(\theta)(p,q) \Bigr) + \Bigl( \sum_{\theta \in TP_p} h^{|\theta|}\, \eta(\theta)\, F(\theta)(p,q) \Bigr)^T E\, q.$$

Also $P_p(c,(p,q))^T E\, P_q(c,(p,q))$ can be written in such a form, and a comparison
of the coefficients yields the coefficients $c(\tau)$ of the P-series $P(c,(p,q))$ in a recursive
manner. We again have that $P(c,(p,q))$ is $O(h^r)$ close to the identity, if the
perturbation in (8.11) is of size $O(h^r)$. □

The statement of Theorem 8.8 remains true in the class of second order differential
equations $\ddot q = f_1(q)$, i.e., $\dot p = f_1(q)$, $\dot q = p$.


VI.9 Volume Preservation 

The flow $\varphi_t$ of a Hamiltonian system preserves volume in phase space: for every
bounded open set $\Omega \subset \mathbb{R}^{2d}$ and for every $t$ for which $\varphi_t(y)$ exists for all $y \in \Omega$,

$$\mathrm{vol}(\varphi_t(\Omega)) = \mathrm{vol}(\Omega),$$

where $\mathrm{vol}(\Omega) = \int_\Omega dy$. This identity is often referred to as Liouville’s theorem. It
is a consequence of the transformation formula for integrals and the fact that

$$\det \frac{\partial \varphi_t(y)}{\partial y} = 1 \quad \text{for all } t \text{ and } y, \qquad (9.1)$$

which follows directly from the symplecticity and $\varphi_0 = \mathrm{id}$. The same argument
shows that every symplectic transformation, and in particular every symplectic integrator
applied to a Hamiltonian system, preserves volume in phase space.

More generally than for Hamiltonian systems, volume is preserved by the flow
of differential equations with a divergence-free vector field:
Lemma 9.1. The flow of a differential equation $\dot y = f(y)$ in $\mathbb{R}^n$ is volume-preserving
if and only if $\mathrm{div} f(y) = 0$ for all $y$.

Proof. The derivative $Y(t) = \frac{\partial \varphi_t}{\partial y}(y_0)$ is the solution of the variational equation

$$\dot Y = A(t)Y, \qquad Y(0) = I,$$

with the Jacobian matrix $A(t) = f'(y(t))$ at $y(t) = \varphi_t(y_0)$. From the proof of
Lemma IV.3.1 we obtain the Abel-Liouville-Jacobi-Ostrogradskii identity

$$\frac{d}{dt} \det Y = \mathrm{trace}\, A(t) \cdot \det Y. \qquad (9.2)$$

Note that here $\mathrm{trace}\, A(t) = \mathrm{div} f(y(t))$. Hence, $\det Y(t) = 1$ for all $t$ if and only if
$\mathrm{div} f(y(t)) = 0$ for all $t$. Since this is valid for all choices of initial values $y_0$, the
result follows. □

Example 9.2 (ABC Flow). This flow, named after the three independent authors
Arnold, Beltrami and Childress, is given by the equations

$$\begin{aligned} \dot x &= A \sin z + C \cos y \\ \dot y &= B \sin x + A \cos z \\ \dot z &= C \sin y + B \cos x \end{aligned} \qquad (9.3)$$

and has all diagonal elements of $f'$ identically zero. It is therefore volume preserving.
In Arnold (1966, p. 347) it appeared in a footnote as an example of a flow with
$\mathrm{rot} f$ parallel to $f$, thus violating Arnold’s condition for the existence of invariant
tori (Arnold 1966, p. 346). It was therefore expected to possess interesting chaotic
properties and has since then been the object of many investigations showing its
non-integrability (see e.g., Ziglin (1996)). We illustrate in Fig. 9.1 the action of this
flow by transforming, in a volume preserving manner, a ball in $\mathbb{R}^3$. We see that,
very soon, the set is strongly squeezed in one direction and dilated in two others.
The solutions thus depend in a very sensitive way on the initial values.

Volume-Preserving Numerical Integrators. The question arises as to whether
volume-preserving integrators can be constructed for every differential equation
with volume-preserving flow. Already for linear problems, Lemma IV.3.2 shows
that no standard method can be volume-preserving for dimension $n \ge 3$. Nevertheless,
positive answers were found by Qin & Zhu (1993), Shang (1994a, 1994b),
Feng & Shang (1995) and Quispel (1995). In the following we present the approach
of Feng & Shang (1995). The key is the following result which generalizes and
reinterprets a construction of H. Weyl (1940) for $n = 3$.

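Theorem 9.3 (Feng & Shang 1995). Every divergence-free vector field $f : \mathbb{R}^n \to \mathbb{R}^n$
can be written as the sum of $n - 1$ vector fields

$$f = f_{1,2} + f_{2,3} + \ldots + f_{n-1,n}$$

where each $f_{k,k+1}$ is Hamiltonian in the variables $(y_k, y_{k+1})$: there exist functions
$H_{k,k+1} : \mathbb{R}^n \to \mathbb{R}$ such that

$$f_{k,k+1} = \Bigl(0, \ldots, 0, -\frac{\partial H_{k,k+1}}{\partial y_{k+1}}, \frac{\partial H_{k,k+1}}{\partial y_k}, 0, \ldots, 0\Bigr)^T .$$

Fig. 9.1. Volume preserving deformation of the ball of radius 1, centred at the origin, by the
ABC flow; A = 1/2, B = C = 1

Proof. In terms of the components of $f = (f_1, \ldots, f_n)^T$, the functions $H_{k,k+1}$
must satisfy the equations

$$f_1 = -\frac{\partial H_{1,2}}{\partial y_2}, \qquad f_2 = \frac{\partial H_{1,2}}{\partial y_1} - \frac{\partial H_{2,3}}{\partial y_3}, \qquad \ldots,$$

$$f_{n-1} = \frac{\partial H_{n-2,n-1}}{\partial y_{n-2}} - \frac{\partial H_{n-1,n}}{\partial y_n}, \qquad f_n = \frac{\partial H_{n-1,n}}{\partial y_{n-1}} .$$

We thus set

$$H_{1,2} = -\int_0^{y_2} f_1\, dy_2$$

and for $k = 2, \ldots, n-2$

$$H_{k,k+1} = \int_0^{y_{k+1}} \Bigl( \frac{\partial H_{k-1,k}}{\partial y_{k-1}} - f_k \Bigr) dy_{k+1} .$$

It remains to construct $H_{n-1,n}$ from the last two equations. We see by induction
that for $k \le n-2$,

$$\frac{\partial^2 H_{k,k+1}}{\partial y_k\, \partial y_{k+1}} = -\Bigl( \frac{\partial f_1}{\partial y_1} + \ldots + \frac{\partial f_k}{\partial y_k} \Bigr),$$

and hence the integrability condition for $H_{n-1,n}$,

$$\frac{\partial}{\partial y_{n-1}} \Bigl( \frac{\partial H_{n-2,n-1}}{\partial y_{n-2}} - f_{n-1} \Bigr) = \frac{\partial f_n}{\partial y_n},$$

reduces to the condition $\mathrm{div} f = 0$, which is satisfied by assumption. $H_{n-1,n}$ can
thus be constructed as

$$H_{n-1,n} = \int_0^{y_n} \Bigl( \frac{\partial H_{n-2,n-1}}{\partial y_{n-2}} - f_{n-1} \Bigr) dy_n + \int_0^{y_{n-1}} f_n\big|_{y_n=0}\, dy_{n-1},$$

which completes the proof. □

The above construction also shows that

$$f_{k,k+1} = (0, \ldots, 0, f_k + g_k, -g_{k+1}, 0, \ldots, 0)^T$$

with

$$g_{k+1} = -\frac{\partial H_{k,k+1}}{\partial y_k} = \int_0^{y_{k+1}} \Bigl( \frac{\partial f_1}{\partial y_1} + \ldots + \frac{\partial f_k}{\partial y_k} \Bigr) dy_{k+1}$$

for $1 \le k \le n-2$, and $g_1 = 0$ and $g_n = -f_n$.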

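With the decomposition of Theorem 9.3 at hand, a volume-preserving algorithm
is obtained by applying a splitting method with symplectic substeps. For example,
as proposed by Feng & Shang (1995), a second-order volume-preserving method is
obtained by Strang splitting with symplectic Euler substeps:

$$\Phi_h = \Phi^{[1,2]*}_{h/2} \circ \ldots \circ \Phi^{[n-1,n]*}_{h/2} \circ \Phi^{[n-1,n]}_{h/2} \circ \ldots \circ \Phi^{[1,2]}_{h/2}$$

where $\Phi^{[k,k+1]}_{h/2}$ is a symplectic Euler step of length $h/2$ applied to the system with
right-hand side $f_{k,k+1}$, and $*$ denotes the adjoint method. In this method, one step
$\tilde y = \Phi_h(y)$ is computed component-wise, in a Gauss-Seidel-like manner, as

$$\begin{aligned}
\bar y_1 &= y_1 + \tfrac{h}{2} f_1(\bar y_1, y_2, \ldots, y_n) \\
\bar y_k &= y_k + \tfrac{h}{2} f_k(\bar y_1, \ldots, \bar y_k, y_{k+1}, \ldots, y_n) + \tfrac{h}{2}\, g_k\big|_{y_k}^{\bar y_k} \quad \text{for } k = 2, \ldots, n-1 \\
\bar y_n &= y_n + \tfrac{h}{2} f_n(\bar y_1, \ldots, \bar y_{n-1}, y_n)
\end{aligned} \qquad (9.4)$$

with $g_k\big|_{y_k}^{\bar y_k} = g_k(\bar y_1, \ldots, \bar y_k, y_{k+1}, \ldots, y_n) - g_k(\bar y_1, \ldots, \bar y_{k-1}, y_k, \ldots, y_n)$, and

$$\begin{aligned}
\tilde y_n &= \bar y_n + \tfrac{h}{2} f_n(\bar y_1, \ldots, \bar y_{n-1}, \tilde y_n) \\
\tilde y_k &= \bar y_k + \tfrac{h}{2} f_k(\bar y_1, \ldots, \bar y_k, \tilde y_{k+1}, \ldots, \tilde y_n) - \tfrac{h}{2}\, g_k\big|_{\bar y_k}^{\tilde y_k} \quad \text{for } k = n-1, \ldots, 2 \\
\tilde y_1 &= \bar y_1 + \tfrac{h}{2} f_1(\bar y_1, \tilde y_2, \ldots, \tilde y_n)
\end{aligned} \qquad (9.5)$$

with $g_k\big|_{\bar y_k}^{\tilde y_k} = g_k(\bar y_1, \ldots, \bar y_{k-1}, \tilde y_k, \ldots, \tilde y_n) - g_k(\bar y_1, \ldots, \bar y_k, \tilde y_{k+1}, \ldots, \tilde y_n)$. The
method is one-dimensionally implicit in general, but becomes explicit in the particular
case where $\partial f_k/\partial y_k = 0$ for all $k$.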
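
For the ABC flow (9.3) each $f_k$ is independent of $y_k$, so the scheme is explicit. The sketch below is a hedged illustration of that special case only. It assumes that the correction term $g_2$ vanishes here (since $f_1$ does not depend on $y_1$), so that both sweeps reduce to successive updates of one component at a time; the parameter values are those quoted for Fig. 9.1, and the function names and the finite-difference check are not from the text.

```python
import math

A, B, C = 0.5, 1.0, 1.0  # parameter values quoted in the caption of Fig. 9.1

def f1(y, z): return A * math.sin(z) + C * math.cos(y)
def f2(x, z): return B * math.sin(x) + A * math.cos(z)
def f3(x, y): return C * math.sin(y) + B * math.cos(x)

def vp_step(v, h):
    """One step of the scheme (9.4)-(9.5) specialised to the ABC flow (g_2 = 0 assumed)."""
    x, y, z = v
    # forward sweep (9.4): update components 1, 2, 3 using the newest values
    x += h / 2 * f1(y, z)
    y += h / 2 * f2(x, z)
    z += h / 2 * f3(x, y)
    # backward sweep (9.5): update components 3, 2, 1
    z += h / 2 * f3(x, y)
    y += h / 2 * f2(x, z)
    x += h / 2 * f1(y, z)
    return (x, y, z)

def n_steps(v, h, n):
    for _ in range(n):
        v = vp_step(v, h)
    return v

def jacobian_det(mapping, v, eps=1e-6):
    """Determinant of the derivative of a map R^3 -> R^3 by central differences."""
    rows = []
    for j in range(3):
        vp, vm = list(v), list(v)
        vp[j] += eps
        vm[j] -= eps
        sp, sm = mapping(tuple(vp)), mapping(tuple(vm))
        rows.append([(sp[i] - sm[i]) / (2 * eps) for i in range(3)])
    a, b, c = rows  # transposed Jacobian; the determinant is unaffected
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

if __name__ == "__main__":
    v0, h = (0.2, -0.5, 0.9), 0.1
    print(jacobian_det(lambda v: vp_step(v, h), v0))
    print(jacobian_det(lambda v: n_steps(v, h, 10), v0))
```

Each component update is a shear, which is why the determinant equals 1 up to the finite-difference error.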

Separable Partitioned Systems. For problems of the form

$$\dot y = f(z), \qquad \dot z = g(y) \qquad (9.6)$$

with $y \in \mathbb{R}^m$, $z \in \mathbb{R}^n$, the scheme (9.4) becomes the symplectic Euler method, (9.5)
its adjoint, and its composition the Lobatto IIIA - IIIB extension of the Störmer-Verlet
method. Since symplectic explicit partitioned Runge-Kutta methods are compositions
of symplectic Euler steps (Theorem VI.4.7), this observation proves that
such methods are volume-preserving for systems (9.6). This fact was obtained by
Suris (1996) by a direct calculation, without interpreting the methods as composition
methods. The question arises as to whether more symplectic partitioned Runge-Kutta
methods are volume-preserving for systems (9.6).

Theorem 9.4. Every symplectic Runge-Kutta method with at most two stages is
volume-preserving for systems (9.6) of arbitrary dimension.

Proof. (a) The idea is to consider the Hamiltonian system with

$$H(u,v,y,z) = u^T f(z) + v^T g(y),$$

where $(u,v)$ are the conjugate variables to $(y,z)$. This system is of the form

$$\begin{aligned} \dot y &= f(z), & \dot u &= -g'(y)^T v, \\ \dot z &= g(y), & \dot v &= -f'(z)^T u. \end{aligned} \qquad (9.7)$$

Applying the Runge-Kutta method to this augmented system does not change the
numerical solution for $(y,z)$. For symplectic methods the matrix

$$\frac{\partial (y_1, z_1, u_1, v_1)}{\partial (y_0, z_0, u_0, v_0)} = M = \begin{pmatrix} R & 0 \\ S & T \end{pmatrix} \qquad (9.8)$$

satisfies $M^T J M = J$ which implies $R\,T^T = I$. Below we shall show that $\det T =
\det R$. This yields $\det R = 1$ which implies that the method is volume preserving.

(b) One-stage methods. The only symplectic one-stage method is the implicit
midpoint rule for which $R$ and $T$ are computed as

$$\bigl(I - \tfrac{h}{2} E_1\bigr) R = I + \tfrac{h}{2} E_1 \qquad (9.9)$$

$$\bigl(I + \tfrac{h}{2} E_1^T\bigr) T = I - \tfrac{h}{2} E_1^T, \qquad (9.10)$$

where $E_1$ is the Jacobian of the system (9.6) evaluated at the internal stage value.
Since

$$E_1 = \begin{pmatrix} 0 & f'(z_{1/2}) \\ g'(y_{1/2}) & 0 \end{pmatrix},$$

a similarity transformation with the matrix $D = \mathrm{diag}(I, -I)$ takes $E_1$ to $-E_1$.
Hence, the transformed matrix satisfies

$$\bigl(I - \tfrac{h}{2} E_1^T\bigr)\bigl(D^{-1} T D\bigr) = I + \tfrac{h}{2} E_1^T .$$

A comparison with (9.9) and the use of $\det X^T = \det X$ proves $\det R = \det T$ for
the midpoint rule.

(c) Two-stage methods. Applying a two-stage implicit Runge-Kutta method to
(9.7) yields

$$\begin{pmatrix} I - h a_{11} E_1 & -h a_{12} E_2 \\ -h a_{21} E_1 & I - h a_{22} E_2 \end{pmatrix} \begin{pmatrix} R_1 \\ R_2 \end{pmatrix} = \begin{pmatrix} I \\ I \end{pmatrix},$$

where $R_i$ is the derivative of the $(y,z)$ components of the $i$th stage with respect to
$(y_0, z_0)$, and $E_i$ is the Jacobian of the system (9.6) evaluated at the $i$th internal stage
value. From the solution of this system the derivative $R$ of (9.8) is obtained as

$$R = I + h\,(b_1 E_1,\ b_2 E_2) \begin{pmatrix} I - h a_{11} E_1 & -h a_{12} E_2 \\ -h a_{21} E_1 & I - h a_{22} E_2 \end{pmatrix}^{-1} \begin{pmatrix} I \\ I \end{pmatrix}.$$

With the determinant identity

$$\det(U)\, \det(X - W U^{-1} V) = \det \begin{pmatrix} U & V \\ W & X \end{pmatrix} = \det(X)\, \det(U - V X^{-1} W),$$

which is seen by Gaussian elimination, this yields

$$\det R = \frac{\det\bigl(I \otimes I - h\,((A - \mathbb{1} b^T) \otimes I)\, E\bigr)}{\det\bigl(I \otimes I - h\,(A \otimes I)\, E\bigr)},$$

where $A$ and $b$ collect the Runge-Kutta coefficients, and $E = \mathrm{blockdiag}(E_1, E_2)$.
For $D^{-1} T D$ we get the same formula with $E$ replaced by $E^T$. If $A$ is an arbitrary
$2 \times 2$ matrix, it follows from block Gaussian elimination that

$$\det\bigl(I \otimes I - h\,(A \otimes I)\, E\bigr) = \det\bigl(I \otimes I - h\,(A \otimes I)\, E^T\bigr), \qquad (9.11)$$

which then proves $\det R = \det T$. Notice that the identity (9.11) is no longer true
in general if $A$ is of dimension larger than two. □
We are curious to see whether Theorem 9.4 remains valid for symplectic Runge-Kutta
methods with more than two stages. For this we apply the Gauss methods with
$s = 2$ and $s = 3$ to the problem

$$\dot x = \sin z, \qquad \dot y = \cos z, \qquad \dot z = \sin y + \cos x \qquad (9.12)$$

with initial value $(0,0,0)$. We show in Fig. 9.2 the determinant of the derivative of
the numerical flow as a function of time. Only the two-stage method is volume-preserving
for this problem which is in agreement with Theorem 9.4.
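
The simplest instance of Theorem 9.4, the one-stage implicit midpoint rule, can be checked on problem (9.12) without forming the numerical flow map: by (9.9), $\det R = \det(I + \tfrac h2 E_1)/\det(I - \tfrac h2 E_1)$ with $E_1$ the Jacobian at the midpoint. The sketch below covers only this one-stage case (not the Gauss methods of Fig. 9.2); the step size, the number of steps and the fixed-point solver are assumptions.

```python
import math

def f(v):
    x, y, z = v
    return (math.sin(z), math.cos(z), math.sin(y) + math.cos(x))

def jac(v):
    """Jacobian of the vector field (9.12)."""
    x, y, z = v
    return [[0.0, 0.0, math.cos(z)],
            [0.0, 0.0, -math.sin(z)],
            [-math.sin(x), math.cos(y), 0.0]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def midpoint_step_with_det(v, h, iters=200):
    """Implicit midpoint step and det R from (9.9)."""
    k = f(v)
    for _ in range(iters):  # fixed-point iteration for k = f(v + h/2 k)
        k = f(tuple(v[i] + h / 2 * k[i] for i in range(3)))
    mid = tuple(v[i] + h / 2 * k[i] for i in range(3))
    E = jac(mid)
    plus = [[(1.0 if i == j else 0.0) + h / 2 * E[i][j] for j in range(3)] for i in range(3)]
    minus = [[(1.0 if i == j else 0.0) - h / 2 * E[i][j] for j in range(3)] for i in range(3)]
    v1 = tuple(v[i] + h * k[i] for i in range(3))
    return v1, det3(plus) / det3(minus)

def accumulated_det(v, h, n):
    """Determinant of the derivative of n composed midpoint steps."""
    total = 1.0
    for _ in range(n):
        v, d = midpoint_step_with_det(v, h)
        total *= d
    return total

if __name__ == "__main__":
    print(accumulated_det((0.0, 0.0, 0.0), 0.1, 50))
```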


VI.10 Exercises

1. Let $\alpha$ and $\beta$ be the generalized coordinates of the double
pendulum, whose kinetic and potential energies are

$$T = \frac{m_1}{2}(\dot x_1^2 + \dot y_1^2) + \frac{m_2}{2}(\dot x_2^2 + \dot y_2^2)$$

$$U = m_1 g y_1 + m_2 g y_2 .$$

Determine the generalized momenta of the corresponding
Hamiltonian system.

2. A non-autonomous Hamiltonian system is given by a time-dependent Hamiltonian
function $H(p,q,t)$ and the differential equations

$$\dot p = -H_q(p,q,t), \qquad \dot q = H_p(p,q,t).$$

Verify that these equations together with $\dot e = -H_t(p,q,t)$ and $\dot t = 1$ are the
canonical equations for the extended Hamiltonian $\widetilde H(\tilde p, \tilde q) = H(p,q,t) + e$
with $\tilde p = (p,e)$ and $\tilde q = (q,t)$.

3. Prove that a linear transformation $A : \mathbb{R}^2 \to \mathbb{R}^2$ is symplectic, if and only if
$\det A = 1$.

4. Consider the transformation $(r,\varphi) \mapsto (p,q)$, defined by

$$p = \psi(r) \cos\varphi, \qquad q = \psi(r) \sin\varphi .$$

For which function $\psi(r)$ is it a symplectic transformation?
5. Prove that the definition (2.4) of $\Omega(M)$ does not depend on the parametrization
$\psi$, i.e., the parametrization $\tilde\psi = \psi \circ \alpha$, where $\alpha$ is a diffeomorphism between
suitable domains of $\mathbb{R}^2$, leads to the same result.

6. On the set $U = \{(p,q)\,;\ p^2 + q^2 > 0\}$ consider the differential equation

$$\begin{pmatrix} \dot p \\ \dot q \end{pmatrix} = \frac{1}{p^2 + q^2} \begin{pmatrix} p \\ q \end{pmatrix}. \qquad (10.1)$$

Prove that

a) its flow is symplectic everywhere on $U$;

b) on every simply-connected subset of $U$ the vector field (10.1) is Hamiltonian
(with $H(p,q) = -\mathrm{Im} \log(p + iq) + \mathit{Const}$);

c) it is not possible to find a differentiable function $H : U \to \mathbb{R}$ such that (10.1)
is equal to $J^{-1}\nabla H(p,q)$ for all $(p,q) \in U$.

Remark. The vector field (10.1) is locally (but not globally) Hamiltonian.

7. (Burnton & Scherer 1998). Prove that all members of the one-parameter family
of Nyström methods of order $2s$, constructed in Exercise III.9, are symplectic
and symmetric.

8. Prove that the statement of Lemma 4.1 remains true for methods that are formally
defined by a B-series, $\Phi_h(y) = B(a,y)$.

9. Compute the generating function S 1 (P , q, h ) of a symplectic Nystrom method 
applied to q = U(q). 

10. Find the Hamilton-Jacobi equation (cf. Theorem 5.7) for the generating function
$S^2(p, Q)$ of Lemma 5.3.

11. (Jacobi’s method for exact integration). Suppose we have a solution $S(q,Q,t,\alpha)$
of the Hamilton-Jacobi equation (5.16), depending on $d$ parameters $\alpha_1, \ldots, \alpha_d$
such that the matrix $\bigl( \frac{\partial^2 S}{\partial \alpha_i\, \partial Q_j} \bigr)$ is invertible. Since this matrix is the Jacobian
of the system

$$\frac{\partial S}{\partial \alpha_i} = 0, \qquad i = 1, \ldots, d, \qquad (10.2)$$

this system determines a solution path $Q_1, \ldots, Q_d$ which is locally unique. In
possession of an additional parameter (and, including the partial derivatives
with respect to $t$, an additional row and column in the Hessian matrix condition),
we can also determine $Q_j(t)$ as function of $t$. Apply this method to the
Kepler problem (I.2.2) in polar coordinates, where, with the generalized momenta
$p_r = \dot r$, $p_\varphi = r^2 \dot\varphi$, the Hamiltonian becomes

$$H = \frac12 \Bigl( p_r^2 + \frac{p_\varphi^2}{r^2} \Bigr) - \frac{M}{r}$$

and the Hamilton-Jacobi differential equation (5.16) is

$$S_t + \frac12 (S_r)^2 + \frac{1}{2r^2} (S_\varphi)^2 - \frac{M}{r} = 0 .$$

Solve this equation by the ansatz $S(t,r,\varphi) = \theta_1(t) + \theta_2(r) + \theta_3(\varphi)$ (separation
of variables).
Result. One obtains

$$S = \int \sqrt{2\alpha_1 r^2 + 2Mr - \alpha_2^2}\ \frac{dr}{r} + \alpha_2 \varphi - \alpha_1 t .$$

Putting, e.g., $\partial S/\partial \alpha_2 = 0$, we obtain $\varphi = \arcsin \frac{Mr - \alpha_2^2}{r \sqrt{M^2 + 2\alpha_1 \alpha_2^2}}$ by evaluating
an elementary integral. This, when resolved for $r$, leads to the elliptic movement
of Kepler (Sect. I.2.2). This method turned out to be most effective for the exact
integration of difficult problems. With the same ideas, just more complicated
in the computations, Jacobi solves in “lectures” 24 through 30 of (Jacobi 1842)
the Kepler motion in $\mathbb{R}^3$, the geodesics of ellipsoids (his greatest triumph), the
motion with two centres of gravity, and proves a theorem of Abel.

12. (Chan's Lobatto IIIS methods.) Show that there exists a one-parameter family
of symplectic, symmetric (and A-stable) Runge-Kutta methods of order $2s - 2$
based on Lobatto quadrature (Chan 1990). A special case of these methods can
be obtained by taking the arithmetic mean of the Lobatto IIIA and Lobatto IIIB
method coefficients (Sun 2000).

Hint. Use the W-transformation (see Hairer & Wanner (1996), p. 77) by putting
$X_{s,s-1} = -X_{s-1,s}$ an arbitrary constant.

13. For a Hamiltonian system with associated Lagrangian
$L(q, \dot q) = \frac12 \dot q^T M \dot q - U(q)$, show that every first integral
$I(p, q) = p^T a(q)$ resulting from Noether's Theorem has a linear
$a(q) = Aq + c$ with skew-symmetric $MA$.

Hint. (a) It is sufficient to consider the case $M = I$.

(b) Show that $a'(q)$ is skew-symmetric.

(c) Let $a_{ij}(q) = \frac{\partial a_i}{\partial q_j}(q)$. Using the symmetry of
the Hessian of each component $a_i(q)$, show that $a_{ij}(q)$ does not depend on
$q_i$, $q_j$, and is at most linear in the remaining components $q_k$. With the
skew-symmetry of $a'(q)$, conclude that $a'(q) = \mathrm{Const}$.

14. Consider the unconstrained optimal control problem

$$C(q(T)) \to \min, \qquad \dot q(t) = f(q(t), u(t)), \quad q(0) = q_0 \tag{10.3}$$

on the interval $[0, T]$, where the control function is assumed to be continuous.
Prove that first-order necessary optimality conditions can be written as

$$\begin{aligned}
\dot q(t) &= \nabla_p H(p(t), q(t), u(t)), & q(0) &= q_0 \\
\dot p(t) &= -\nabla_q H(p(t), q(t), u(t)), & p(T) &= \nabla_q C(q(T)) \\
0 &= \nabla_u H(p(t), q(t), u(t)),
\end{aligned} \tag{10.4}$$

where the Hamiltonian is given by

$$H(p, q, u) = p^T f(q, u)$$

(we assume that the Hessian $\nabla_u^2 H(p, q, u)$ is invertible, so that the
third relation of (10.4) defines $u$ as a function of $(p, q)$).

Hint. Consider a slightly perturbed control function
$u(t) + \varepsilon\,\delta u(t)$, and let
$q(t) + \varepsilon\,\delta q(t) + O(\varepsilon^2)$ be the corresponding solution
of the differential equation in (10.3). With the function $p(t)$ of (10.4) we
then have

$$C'(q(T))\,\delta q(T) = \int_0^T \frac{d}{dt}\bigl(p(t)^T \delta q(t)\bigr)\,dt
  = \int_0^T p(t)^T f_u(\ldots)\,\delta u(t)\,dt.$$

The algebraic relation of (10.4) then follows from the fundamental lemma of
variational calculus.

15. A Runge-Kutta discretization of the problem (10.3) is

$$\begin{aligned}
C(q_N) &\to \min \\
q_{n+1} &= q_n + h \sum_{i=1}^{s} b_i f(Q_{ni}, U_{ni}) \\
Q_{ni} &= q_n + h \sum_{j=1}^{s} a_{ij} f(Q_{nj}, U_{nj})
\end{aligned} \tag{10.5}$$

with $n = 0, \ldots, N-1$ and $h = T/N$. We assume $b_i \ne 0$ for all $i$.
Introducing suitable Lagrange multipliers for the constrained minimization
problem (10.5), prove that there exist $p_n$, $P_{ni}$ such that the optimal
solution of (10.5) satisfies (Hager 2000)

$$\begin{aligned}
q_{n+1} &= q_n + h \sum_{i=1}^{s} b_i \nabla_p H(P_{ni}, Q_{ni}, U_{ni}) \\
Q_{ni} &= q_n + h \sum_{j=1}^{s} a_{ij} \nabla_p H(P_{nj}, Q_{nj}, U_{nj}) \\
p_{n+1} &= p_n - h \sum_{i=1}^{s} \hat b_i \nabla_q H(P_{ni}, Q_{ni}, U_{ni}) \\
P_{ni} &= p_n - h \sum_{j=1}^{s} \hat a_{ij} \nabla_q H(P_{nj}, Q_{nj}, U_{nj}) \\
0 &= \nabla_u H(P_{ni}, Q_{ni}, U_{ni})
\end{aligned} \tag{10.6}$$

with $p_N = \nabla_q C(q_N)$ and given initial value $q_0$, where the coefficients
$\hat b_i$ and $\hat a_{ij}$ are determined by

$$\hat b_i = b_i, \qquad b_i \hat a_{ij} + \hat b_j a_{ji} = b_i \hat b_j. \tag{10.7}$$

Consequently, (10.6) can be considered as a symplectic discretization of (10.4);
see Bonnans & Laurent-Varin (2006).
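
The relation (10.7) determines the additional coefficients explicitly once all
$b_i$ are nonzero. The following minimal sketch computes them in exact rational
arithmetic; the function name `symplectic_partner` and the choice of the classical
fourth-order Runge-Kutta tableau as input are assumptions made for illustration
only.

```python
from fractions import Fraction as F

def symplectic_partner(a, b):
    """Coefficients (a_hat, b_hat) solving (10.7) for given a_ij, b_i (all b_i != 0)."""
    if any(bi == 0 for bi in b):
        raise ValueError("(10.7) requires b_i != 0 for all i")
    s = len(b)
    b_hat = list(b)
    # b_i*a_hat_ij + b_hat_j*a_ji = b_i*b_hat_j  =>  a_hat_ij = b_hat_j*(1 - a_ji/b_i)
    a_hat = [[b_hat[j] * (1 - a[j][i] / b[i]) for j in range(s)] for i in range(s)]
    return a_hat, b_hat

# Classical explicit Runge-Kutta method of order 4 (illustrative input).
a = [[F(0), F(0), F(0), F(0)],
     [F(1, 2), F(0), F(0), F(0)],
     [F(0), F(1, 2), F(0), F(0)],
     [F(0), F(0), F(1), F(0)]]
b = [F(1, 6), F(1, 3), F(1, 3), F(1, 6)]

a_hat, b_hat = symplectic_partner(a, b)
c = [sum(row) for row in a]          # row sums of a
c_hat = [sum(row) for row in a_hat]  # row sums of a_hat
print("a_hat =", a_hat)
print("c =", c, "c_hat =", c_hat)
```

For this tableau the row sums of the computed $\hat a_{ij}$ coincide with the
nodes $c_i$ of the original method, although $\hat a_{ij}$ itself is no longer
strictly lower triangular.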

16. (Hager 2000). For an explicit s-stage Runge-Kutta method of order $p = s$ and
$b_i \ne 0$, consider the partitioned Runge-Kutta method with additional
coefficients $\hat b_i$ and $\hat a_{ij}$ defined by (10.7). Prove the following:

a) For $p = s = 3$, the partitioned method is of order 3 if and only if $c_3 = 1$.

b) For $p = s = 4$, the partitioned method is of order 4 without any restriction.



Chapter VII.

Non-Canonical Hamiltonian Systems


We discuss theoretical properties and the structure-preserving numerical treatment
of Hamiltonian systems on manifolds and of the closely related class of Poisson
systems. We present numerical integrators for problems from classical and quantum
mechanics.


VII.1 Constrained Mechanical Systems

Constrained mechanical systems form an important class of differential equations
on manifolds. Their numerical treatment has been extensively investigated in the
context of differential-algebraic equations and is documented in monographs like
that of Brenan, Campbell & Petzold (1996), Eich-Soellner & Führer (1998), Hairer,
Lubich & Roche (1989), and Chap. VII of Hairer & Wanner (1996). We concentrate
here on the symmetry and/or symplecticity of such numerical integrators.


VII.1.1 Introduction and Examples


Consider a mechanical system described by position coordinates
$q_1, \ldots, q_d$, and suppose that the motion is constrained to satisfy
$g(q) = 0$ where $g : \mathbb{R}^d \to \mathbb{R}^m$ with $m < d$. Let
$T(q, \dot q) = \frac12 \dot q^T M(q) \dot q$ be the kinetic energy of the system
and $U(q)$ its potential energy, and put

$$L(q, \dot q) = T(q, \dot q) - U(q) - g(q)^T \lambda, \tag{1.1}$$

where $\lambda = (\lambda_1, \ldots, \lambda_m)^T$ consists of Lagrange
multipliers. The Euler-Lagrange equation of the variational problem for
$\int L(q, \dot q)\,dt$ is then given by

$$\frac{d}{dt}\Bigl(\frac{\partial L}{\partial \dot q}\Bigr) - \frac{\partial L}{\partial q} = 0.$$

Written as a first order differential equation we get

$$\begin{aligned}
\dot q &= v \\
M(q)\,\dot v &= f(q, v) - G(q)^T \lambda \\
0 &= g(q),
\end{aligned} \tag{1.2}$$

where $f(q, v) = -\frac{\partial}{\partial q}\bigl(M(q)v\bigr)v + \nabla_q T(q, v) - \nabla_q U(q)$
and $G(q) = \frac{\partial g}{\partial q}(q)$.

Example 1.1 (Spherical Pendulum). We denote by $q_1, q_2, q_3$ the Cartesian
coordinates of a point with mass $m$ that is connected with a massless rod of
length $\ell$ to the origin. The kinetic and potential energies are
$T = \frac{m}{2}(\dot q_1^2 + \dot q_2^2 + \dot q_3^2)$ and $U = m g q_3$,
respectively, and the constraint is the fixed length of the rod. We thus
get the system

$$\begin{aligned}
\dot q_1 &= v_1, & m \dot v_1 &= -2 q_1 \lambda \\
\dot q_2 &= v_2, & m \dot v_2 &= -2 q_2 \lambda \\
\dot q_3 &= v_3, & m \dot v_3 &= -m g - 2 q_3 \lambda \\
0 &= q_1^2 + q_2^2 + q_3^2 - \ell^2.
\end{aligned} \tag{1.3}$$

The physical meaning of $\lambda$ is the tension in the rod which maintains the
constant distance of the mass point from the origin.


Existence and Uniqueness of the Solution. A standard approach for studying
the existence of solutions of differential-algebraic equations is to differentiate
the constraints until an ordinary differential equation is obtained.
Differentiating the constraint in (1.2) twice with respect to time yields

$$0 = G(q)v \qquad \text{and} \qquad 0 = g''(q)(v, v) + G(q)\dot v. \tag{1.4}$$

The equation for $\dot v$ in (1.2) together with the second relation of (1.4)
constitute a linear system for $\dot v$ and $\lambda$,

$$\begin{pmatrix} M(q) & G(q)^T \\ G(q) & 0 \end{pmatrix}
\begin{pmatrix} \dot v \\ \lambda \end{pmatrix}
= \begin{pmatrix} f(q, v) \\ -g''(q)(v, v) \end{pmatrix}. \tag{1.5}$$

Throughout this chapter we require the matrix appearing in (1.5) to be invertible
for $q$ close to the solution we are looking for. This then allows us to express
$\dot v$ and $\lambda$ as functions of $(q, v)$. Notice that the matrix in (1.5)
is invertible when $G(q)$ has full rank and $M(q)$ is invertible on
$\ker G(q) = \{h \mid G(q)h = 0\}$.
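
For the spherical pendulum of Example 1.1 the linear system (1.5) can be solved
by hand, which gives a concrete picture of how $\dot v$ and $\lambda$ become
functions of $(q, v)$. The sketch below assumes $m = 1$, $\ell = 1$ and a
gravitational constant `grav = 1`; the function names are hypothetical and chosen
for this illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def accel_and_multiplier(q, v, grav=1.0):
    """Solve the linear system (1.5) for the spherical pendulum with m = 1:
    M = I, G(q) = 2 q^T, f = (0, 0, -grav), g''(q)(v, v) = 2 |v|^2."""
    f = (0.0, 0.0, -grav)
    # Eliminating vdot = f - 2 q lam from 2 q^T vdot = -2 |v|^2 gives lam.
    lam = (dot(q, f) + dot(v, v)) / (2.0 * dot(q, q))
    vdot = tuple(fi - 2.0 * qi * lam for fi, qi in zip(f, q))
    return vdot, lam

def is_consistent(q, v, length=1.0, tol=1e-12):
    """Check g(q) = 0 and the first relation G(q) v = 0 of (1.4)."""
    return abs(dot(q, q) - length**2) <= tol and abs(2.0 * dot(q, v)) <= tol

q0 = (0.0, math.sin(0.1), -math.cos(0.1))
v0 = (0.06, 0.0, 0.0)
vdot0, lam0 = accel_and_multiplier(q0, v0)
print(is_consistent(q0, v0), lam0, vdot0)
```

The multiplier returned here is positive for a pendulum hanging below the
origin, in line with its interpretation as a tension in the rod.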

We are now able to discuss the existence of a solution of (1.2). First of all,
observe that the initial values $q_0, v_0, \lambda_0$ cannot be arbitrarily
chosen. They have to satisfy the first relation of (1.4) and
$\lambda_0 = \lambda(q_0, v_0)$, where $\lambda(q, v)$ is obtained from (1.5). In
the case that $q_0, v_0, \lambda_0$ satisfy these conditions, we call them
consistent initial values. Furthermore, every solution of (1.2) has to satisfy

$$\dot q = v, \qquad \dot v = \dot v(q, v), \tag{1.6}$$

where $\dot v(q, v)$ is the function obtained from (1.5). It is known from
standard theory of ordinary differential equations that (1.6) has locally a unique
solution. This solution $(q(t), v(t))$ together with
$\lambda(t) := \lambda(q(t), v(t))$ satisfies (1.5) by construction, and hence
also the two differential equations of (1.2). Integrating the second relation
of (1.4) twice and using the fact that the integration constants vanish for
consistent initial values, proves also the remaining relation $0 = g(q)$ for this
solution.

Formulation as a Differential Equation on a Manifold. We denote by

$$Q = \{q \,;\, g(q) = 0\} \tag{1.7}$$

the configuration manifold, on which the positions $q$ are constrained to lie. The
tangent space at $q \in Q$ is $T_qQ = \{v \,;\, G(q)v = 0\}$. The equations (1.6)
define thus a differential equation on the manifold

$$TQ = \{(q, v) \,;\, q \in Q,\ v \in T_qQ\} = \{(q, v) \,;\, g(q) = 0,\ G(q)v = 0\}, \tag{1.8}$$

the tangent bundle of $Q$. Indeed, we have just shown that for initial values
$(q_0, v_0) \in TQ$ (i.e., consistent initial values) the problems (1.6) and (1.2)
are equivalent, so that the solutions of (1.6) stay on $TQ$.

Reversibility. The system (1.2) and the corresponding differential equation (1.6)
are reversible with respect to the involution $\rho(q, v) = (q, -v)$, if
$f(q, -v) = f(q, v)$. This follows at once from Example V.1.3, because the
solution $\dot v(q, v)$ of (1.5) satisfies $\dot v(q, -v) = \dot v(q, v)$.

For the numerical solution of differential-algebraic equations "index reduction"
is a very popular technique. This means that instead of directly treating the
problem (1.2) one numerically solves the differential equation (1.6) on the
manifold $\mathcal{M}$. Projection methods (Sect. IV.4) as well as methods based
on local coordinates (Sect. IV.5) are much in use. If one is interested in a
correct simulation of the reversible structure of the problem, the symmetric
methods of Sect. V.4 can be applied. Here we do not repeat these approaches for
this particular situation, instead we concentrate on the symplectic integration of
constrained systems.

VII.1.2 Hamiltonian Formulation

In Sect. VI.1 we have seen that, for unconstrained mechanical systems, the
equations of motion become more structured if we use the momentum coordinates
$p = \frac{\partial L}{\partial \dot q} = M(q)\dot q$ in place of the velocity
coordinates $v = \dot q$. Let us do the same for the constrained system (1.2). As
in the proof of Theorem VI.1.3 we obtain the equivalent system

$$\begin{aligned}
\dot q &= H_p(p, q) \\
\dot p &= -H_q(p, q) - G(q)^T \lambda \\
0 &= g(q),
\end{aligned} \tag{1.9}$$

where

$$H(p, q) = \tfrac12\, p^T M(q)^{-1} p + U(q) \tag{1.10}$$

is the total energy of the system; $H_p$ and $H_q$ denote the column vectors of
partial derivatives. Differentiating the constraint in (1.9) twice with respect
to time, we get

$$0 = G(q) H_p(p, q), \tag{1.11}$$

$$0 = \frac{\partial}{\partial q}\bigl(G(q) H_p(p, q)\bigr) H_p(p, q)
    - G(q) H_{pp}(p, q)\bigl(H_q(p, q) + G(q)^T \lambda\bigr), \tag{1.12}$$

and assuming the matrix

$$G(q) H_{pp}(p, q) G(q)^T \quad \text{is invertible}, \tag{1.13}$$

equation (1.12) permits us to express $\lambda$ in terms of $(p, q)$.

Formulation as a Differential Equation on a Manifold. Inserting the so-obtained
function $\lambda(p, q)$ into (1.9) gives a differential equation for $(p, q)$ on
the manifold

$$\mathcal{M} = \{(p, q) \,;\, g(q) = 0,\ G(q) H_p(p, q) = 0\}. \tag{1.14}$$

As we will now see, this manifold has a differential-geometric interpretation as
the cotangent bundle of the configuration manifold $Q = \{q \,;\, g(q) = 0\}$. The
Lagrangian for a fixed $q \in Q$ is a function on the tangent space $T_qQ$, i.e.,
$L(q, \cdot) : T_qQ \to \mathbb{R}$. Its (Fréchet) derivative evaluated at
$\dot q \in T_qQ$ is therefore a linear mapping
$d_{\dot q}L(q, \dot q) : T_qQ \to \mathbb{R}$, or in other terms,
$d_{\dot q}L(q, \dot q)$ is in the cotangent space $T_q^*Q$. Since the duality is
such that
$\langle d_{\dot q}L(q, \dot q), v\rangle = \frac{\partial L}{\partial \dot q}(q, \dot q)\,v$
for $v \in T_qQ$, condition (1.13) ensures that the Legendre transform
$\dot q \mapsto p = d_{\dot q}L(q, \dot q)$ is an invertible transformation
between $T_qQ$ and $T_q^*Q$. We can therefore consider $T_q^*Q$ as a subspace of
$\mathbb{R}^d$ if every $p \in T_q^*Q$ is identified with
$\frac{\partial L}{\partial \dot q}(q, \dot q)^T = M(q)\dot q \in \mathbb{R}^d$
for the unique $\dot q \in T_qQ$ for which $p = d_{\dot q}L(q, \dot q)$ holds.
With this identification,

$$T_q^*Q = \{M(q)\dot q \,;\, \dot q \in T_qQ\},$$

and the duality is given by $\langle p, v\rangle = p^T v$ for $p \in T_q^*Q$ and
$v \in T_qQ$. We thus have $p = M(q)\dot q \in T_q^*Q$ if and only if
$\dot q = M(q)^{-1}p = H_p(p, q) \in T_qQ$. Since the tangent space at $q \in Q$
is $T_qQ = \{\dot q \,;\, G(q)\dot q = 0\}$, we obtain that

$$p \in T_q^*Q \quad \text{if and only if} \quad G(q) H_p(p, q) = 0.$$

Denoting by $T^*Q = \{(p, q) \,;\, q \in Q,\ p \in T_q^*Q\}$ the cotangent bundle
of $Q$, we thus see that the constraint manifold $\mathcal{M}$ of (1.14) equals

$$\mathcal{M} = T^*Q. \tag{1.15}$$

The constrained Hamiltonian system (1.9) with Hamiltonian (1.10) can thus be
viewed as a differential equation on the cotangent bundle $T^*Q$ of the
configuration manifold $Q$.

In the following we consider the system (1.9)-(1.12) with (1.13) where $H(p, q)$
is an arbitrary smooth function. The constraint manifold is then still given by
(1.14). The existence and uniqueness of the solution of (1.9) can be discussed as
before.

Reversibility. It is readily checked that the system (1.9) is reversible if
$H(-p, q) = H(p, q)$. This is always satisfied for a Hamiltonian (1.10).

Preservation of the Hamiltonian. Differentiation of $H(p(t), q(t))$ with respect
to time yields

$$-H_p^T H_q - H_p^T G^T \lambda + H_q^T H_p$$

with all expressions evaluated at $(p(t), q(t))$. The first and the last terms
cancel, and the central term vanishes because $G H_p = 0$ on the solution
manifold. Consequently, the Hamiltonian $H(p, q)$ is constant along solutions of
(1.9).

Symplecticity of the Flow. Since the flow of the system (1.9) is a transformation
on $\mathcal{M}$, its derivative is a mapping between the corresponding tangent
spaces. In agreement with Definition VI.2.2 we call a map
$\varphi : \mathcal{M} \to \mathcal{M}$ symplectic if, for every
$x = (p, q) \in \mathcal{M}$,

$$\xi_1^T \varphi'(x)^T J \varphi'(x)\,\xi_2 = \xi_1^T J \xi_2
  \qquad \text{for all } \xi_1, \xi_2 \in T_x\mathcal{M}. \tag{1.16}$$

If $\varphi$ is actually defined and continuously differentiable in an open subset
of $\mathbb{R}^{2d}$ that contains $\mathcal{M}$, then $\varphi'(x)$ in the above
formula is just the usual Jacobian matrix. Otherwise, some care is necessary in
the interpretation of (1.16): $\varphi'(x)$ is the tangent map given by the
directional derivative
$\varphi'(x)\xi := (d/d\tau)|_{\tau=0}\,\varphi(\gamma(\tau))$ for
$\xi \in T_x\mathcal{M}$, where $\gamma$ is a path on $\mathcal{M}$ with
$\gamma(0) = x$, $\dot\gamma(0) = \xi$. The expression
$\xi_1^T \varphi'(x)^T J \varphi'(x)\,\xi_2$ in (1.16) should then be interpreted
as $(\varphi'(x)\xi_1)^T J\,(\varphi'(x)\xi_2)$.

Theorem 1.2. Let $H(p, q)$ and $g(q)$ be twice continuously differentiable. The
flow $\varphi_t : \mathcal{M} \to \mathcal{M}$ of the system (1.9) is then a
symplectic transformation on $\mathcal{M}$, i.e., it satisfies (1.16).

Proof. We let $x = (p, q)$, so that the system (1.9) becomes
$\dot x = J^{-1}\bigl(\nabla H(x) + \sum_i \lambda_i(x) \nabla g_i(x)\bigr)$,
where $\lambda_i(x)$ and $g_i(x)$ are the components of $\lambda(x)$ and $g(x)$,
and $\lambda(x)$ is the function obtained from (1.12). The variational equation of
this system, satisfied by the directional derivative
$\Psi = \varphi_t'(x_0)\xi$, with $x_0 = (p_0, q_0)$, reads

$$\dot\Psi = J^{-1}\Bigl(\nabla^2 H(x) + \sum_{i=1}^{m} \lambda_i(x) \nabla^2 g_i(x)
  + \sum_{i=1}^{m} \nabla g_i(x) \nabla\lambda_i(x)^T\Bigr)\Psi.$$

A direct computation, analogous to that in the proof of Theorem VI.2.4, yields for
$\xi_1, \xi_2 \in T_{x_0}\mathcal{M}$

$$\begin{aligned}
\frac{d}{dt}\Bigl(\xi_1^T \varphi_t'(x_0)^T J \varphi_t'(x_0)\,\xi_2\Bigr) = \ldots
 &= \sum_{i=1}^{m} \xi_1^T \varphi_t'(x_0)^T \nabla g_i(x) \nabla\lambda_i(x)^T \varphi_t'(x_0)\,\xi_2 \\
 &\quad - \sum_{i=1}^{m} \xi_1^T \varphi_t'(x_0)^T \nabla\lambda_i(x) \nabla g_i(x)^T \varphi_t'(x_0)\,\xi_2.
\end{aligned} \tag{1.17}$$

Since $g_i(\varphi_t(x_0)) = 0$ for $x_0 \in \mathcal{M}$, we have
$\nabla g_i(x)^T \varphi_t'(x_0)\,\xi_2 = 0$ and the same for $\xi_1$, so that the
expression in (1.17) vanishes. This proves the symplecticity of the flow on
$\mathcal{M}$. □


Differentiating the constraint in (1.9) twice and solving for the Lagrange
multiplier from (1.12) (this procedure is known as "index reduction" of the
differential-algebraic system) yields the differential equation

$$\dot q = H_p(p, q), \qquad \dot p = -H_q(p, q) - G(q)^T \lambda(p, q), \tag{1.18}$$

where $\lambda(p, q)$ is obtained from (1.12). If we solve this system with the
symplectic Euler method (implicit in $p$, explicit in $q$), the qualitative
behaviour of the numerical solution is not correct. As was observed by Leimkuhler
& Reich (1994), there is a linear error growth in the Hamiltonian and also a drift
from the manifold $\mathcal{M}$ (method "sE" in Fig. 1.1). The explanation for
this behaviour is the fact that (1.18) is no longer a Hamiltonian system. If we
combine the symplectic Euler applied to (1.18) with an orthogonal projection onto
$\mathcal{M}$ (method "sEproj"), the result improves considerably but the linear
error growth in the Hamiltonian is not eliminated. This numerical experiment
illustrates that "index reduction" is not compatible with symplectic integration.

Fig. 1.1. Numerical solution of the symplectic Euler method applied to (1.18) with
$H(p, q) = \frac12(p_1^2 + p_2^2 + p_3^2) + q_3$,
$g(q) = q_1^2 + q_2^2 + q_3^2 - 1$ (spherical pendulum); initial value
$q_0 = (0, \sin(0.1), -\cos(0.1))$, $p_0 = (0.06, 0, 0)$, step size $h = 0.003$
for method "sE" (without projection) and $h = 0.03$ for method "sEproj" (with
projection)
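
The drift from the manifold can be seen already in a single step. The following
sketch applies the symplectic Euler method (implicit in $p$, explicit in $q$) to
the index-reduced system (1.18) for the spherical pendulum with
$H = \frac12 |p|^2 + q_3$ and $g(q) = |q|^2 - 1$, for which (1.12) gives
$\lambda(p, q) = (|p|^2 - q_3)/(2|q|^2)$. The function names and the use of a
plain fixed-point iteration for the implicit equation are assumptions of this
illustration, not the implementation behind Fig. 1.1.

```python
import math

def multiplier(p, q):
    """lambda(p, q) from (1.12) for H = |p|^2/2 + q_3, g(q) = |q|^2 - 1."""
    pp = sum(x * x for x in p)
    qq = sum(x * x for x in q)
    return (pp - q[2]) / (2.0 * qq)

def symplectic_euler_step(p, q, h, iterations=50):
    """One symplectic Euler step for (1.18): implicit in p, explicit in q.
    The implicit equation for p is solved by fixed-point iteration."""
    p_new = p
    for _ in range(iterations):
        lam = multiplier(p_new, q)
        p_new = (p[0] - h * 2.0 * q[0] * lam,
                 p[1] - h * 2.0 * q[1] * lam,
                 p[2] - h * (1.0 + 2.0 * q[2] * lam))
    q_new = tuple(qi + h * pi for qi, pi in zip(q, p_new))
    return p_new, q_new

def g(q):
    return sum(x * x for x in q) - 1.0

h = 0.003
p0 = (0.06, 0.0, 0.0)
q0 = (0.0, math.sin(0.1), -math.cos(0.1))

p1, q1 = symplectic_euler_step(p0, q0, h)
print("constraint residual after one step:", g(q1))

p, q = p0, q0
for _ in range(1000):
    p, q = symplectic_euler_step(p, q, h)
print("constraint residual after 1000 steps:", g(q))
```

For consistent initial values a short calculation shows that one step changes the
constraint residual by exactly $-h^2 |p_1|^2$, so the numerical solution leaves
the sphere immediately, although only by a small amount per step.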


VII.1.3 A Symplectic First Order Method

We extend the symplectic Euler method to Hamiltonian systems with constraints.
We integrate the $p$-variable by the implicit and the $q$-variable by the explicit
Euler method. This gives

$$\begin{aligned}
\hat p_{n+1} &= p_n - h\bigl(H_q(\hat p_{n+1}, q_n) + G(q_n)^T \lambda_{n+1}\bigr) \\
q_{n+1} &= q_n + h\,H_p(\hat p_{n+1}, q_n) \\
0 &= g(q_{n+1}).
\end{aligned} \tag{1.19}$$

The numerical approximation $(\hat p_{n+1}, q_{n+1})$ satisfies the constraint
$g(q) = 0$, but not $G(q)H_p(p, q) = 0$. To get an approximation
$(p_{n+1}, q_{n+1}) \in \mathcal{M}$, we append the projection

$$\begin{aligned}
p_{n+1} &= \hat p_{n+1} - h\,G(q_{n+1})^T \mu_{n+1} \\
0 &= G(q_{n+1}) H_p(p_{n+1}, q_{n+1}).
\end{aligned} \tag{1.20}$$
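
For the spherical pendulum with $H = \frac12 |p|^2 + q_3$ and
$g(q) = |q|^2 - 1$, both nonlinear systems in (1.19)-(1.20) can be solved in
closed form, which makes the method easy to try out. The sketch below is a
minimal illustration under these assumptions; the function names, the number of
steps and the selection of the root that tends to zero with $h$ are choices made
here, not taken from the text.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def constrained_symplectic_euler_step(p, q, h):
    """One step of (1.19)-(1.20) for H = |p|^2/2 + q_3, g(q) = |q|^2 - 1."""
    # (1.19): p_hat = p - h (e_3 + 2 q lam), q_new = q + h p_hat, |q_new| = 1.
    # With c = 2 h^2 lam this reads q_new = a - c q, a quadratic equation for c.
    a = (q[0] + h * p[0], q[1] + h * p[1], q[2] + h * (p[2] - h))
    qq, aq, d = dot(q, q), dot(a, q), dot(a, a) - 1.0
    c = (aq - math.sqrt(aq * aq - qq * d)) / qq   # root that tends to 0 with h
    q_new = tuple(ai - c * qi for ai, qi in zip(a, q))
    p_hat = (p[0] - c / h * q[0], p[1] - c / h * q[1], p[2] - h - c / h * q[2])
    # (1.20): p_new = p_hat - 2 h mu q_new, with mu such that q_new . p_new = 0.
    s = dot(q_new, p_hat) / dot(q_new, q_new)
    p_new = tuple(pi - s * qi for pi, qi in zip(p_hat, q_new))
    return p_new, q_new

def energy(p, q):
    return 0.5 * dot(p, p) + q[2]

h = 0.01
q0 = (math.sin(1.3), 0.0, math.cos(1.3))
p0 = (3.0 * math.cos(1.3), 6.5, -3.0 * math.sin(1.3))

p, q = p0, q0
max_energy_dev = 0.0
for _ in range(2000):
    p, q = constrained_symplectic_euler_step(p, q, h)
    max_energy_dev = max(max_energy_dev, abs(energy(p, q) - energy(p0, q0)))

print("g(q)      =", dot(q, q) - 1.0)
print("G(q) H_p  =", 2.0 * dot(q, p))
print("max |H - H0| =", max_energy_dev)
```

Both constraints are satisfied up to round-off after every step, since the
position constraint is enforced by the choice of $\lambda_{n+1}$ and the momentum
constraint by the projection.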

Let us discuss some basic properties of this method. 

Existence and Uniqueness of the Numerical Solution. Inserting the definition
of $q_{n+1}$ from the second line of (1.19) into $0 = g(q_{n+1})$ gives a
nonlinear system for $\hat p_{n+1}$ and $h\lambda_{n+1}$. Due to the factor $h$ in
front of $H_p(\hat p_{n+1}, q_n)$, the implicit function theorem cannot be
directly applied to prove existence and uniqueness of the numerical solution. We
therefore write this equation as

$$0 = g(q_{n+1}) = g(q_n) + \int_0^1 G\bigl(q_n + \tau(q_{n+1} - q_n)\bigr)(q_{n+1} - q_n)\,d\tau.$$

We now use $g(q_n) = 0$, insert the definition of $q_{n+1}$ from the second line
of (1.19) and divide by $h$. Together with the first line of (1.19) this yields
the system $F(\hat p_{n+1}, h\lambda_{n+1}, h) = 0$ with

$$F(p, \nu, h) = \begin{pmatrix}
p - p_n + h\,H_q(p, q_n) + G(q_n)^T \nu \\[2pt]
\displaystyle\int_0^1 G\bigl(q_n + \tau h H_p(p, q_n)\bigr) H_p(p, q_n)\,d\tau
\end{pmatrix}.$$

Since $(p_n, q_n) \in \mathcal{M}$ with $\mathcal{M}$ from (1.14), we have
$F(p_n, 0, 0) = 0$. Furthermore,

$$\frac{\partial F}{\partial(p, \nu)}(p_n, 0, 0) = \begin{pmatrix}
I & G(q_n)^T \\ G(q_n) H_{pp}(p_n, q_n) & 0 \end{pmatrix},$$

and this matrix is invertible by (1.13). Consequently, an application of the
implicit function theorem proves that the numerical solution
$(\hat p_{n+1}, h\lambda_{n+1})$ (and hence also $q_{n+1}$) exists and is locally
unique for sufficiently small $h$.

The projection step (1.20) constitutes a nonlinear system for $p_{n+1}$ and
$h\mu_{n+1}$, to which the implicit function theorem can be directly applied.

Convergence of Order 1. The above use of the implicit function theorem yields
the rough estimates

$$\hat p_{n+1} = p_n + O(h), \qquad h\lambda_{n+1} = O(h), \qquad h\mu_{n+1} = O(h),$$

which, together with the equations (1.19) and (1.20), give

$$q_{n+1} = q(t_{n+1}) + O(h^2), \qquad
  p_{n+1} = p(t_{n+1}) - G(q(t_{n+1}))^T \nu + O(h^2),$$

where $(p(t), q(t))$ is the solution of (1.9) passing through
$(p_n, q_n) \in \mathcal{M}$ at $t = t_n$. Inserting these relations into the
second equation of (1.20) we get

$$0 = G(q(t)) H_p(p(t), q(t)) - G(q(t)) H_{pp}(p(t), q(t)) G(q(t))^T \nu + O(h^2)$$

at $t = t_{n+1}$. Since $G(q(t)) H_p(p(t), q(t)) = 0$, it follows from (1.13) that
$\nu = O(h^2)$. The local error is therefore of size $O(h^2)$.

The convergence proof now follows standard arguments, because the method is
a mapping $\Phi_h : \mathcal{M} \to \mathcal{M}$ on the solution manifold. We
consider the solutions $(p_n(t), q_n(t))$ of (1.9) passing through the numerical
values $(p_n, q_n) \in \mathcal{M}$ at $t = t_n$, we estimate the difference of
two successive solutions in terms of the local error at $t_n$, and we sum up the
propagated errors (see Fig. 3.2 of Sect. II.3 in Hairer, Nørsett & Wanner (1993)).
This proves that the global error satisfies $p_n - p(t_n) = O(h)$ and
$q_n - q(t_n) = O(h)$ as long as $t_n = nh \le \mathrm{Const}$.

Symplecticity. We first study the mapping
$(p_n, q_n) \mapsto (\hat p_{n+1}, q_{n+1})$ defined by (1.19), and we consider
$\lambda_{n+1}$ as a function $\lambda(p_n, q_n)$. Differentiation with respect
to $(p_n, q_n)$ yields

$$\begin{pmatrix} I + hH_{qp}^T & 0 \\ -hH_{pp} & I \end{pmatrix}
\Bigl(\frac{\partial(\hat p_{n+1}, q_{n+1})}{\partial(p_n, q_n)}\Bigr)
= \begin{pmatrix} I - hG^T\lambda_p & S - hG^T\lambda_q \\ 0 & I + hH_{qp} \end{pmatrix}, \tag{1.21}$$

where $S = -hH_{qq} - h\lambda^T g_{qq}$ is a symmetric matrix, the expressions
$H_{qp}$, $H_{pp}$, $H_{qq}$, $G$ are evaluated at $(\hat p_{n+1}, q_n)$, and
$\lambda$, $\lambda_p$, $\lambda_q$ at $(p_n, q_n)$. A computation, identical to
that of the proof of Theorem VI.3.3, yields

$$\Bigl(\frac{\partial(\hat p_{n+1}, q_{n+1})}{\partial(p_n, q_n)}\Bigr)^T J
  \Bigl(\frac{\partial(\hat p_{n+1}, q_{n+1})}{\partial(p_n, q_n)}\Bigr)
= \begin{pmatrix} 0 & I - h\lambda_p^T G \\ -I + hG^T\lambda_p & h(G^T\lambda_q - \lambda_q^T G) \end{pmatrix}.$$

We multiply this relation from the left by $\xi_1^T$ and from the right by
$\xi_2$, where $\xi_1, \xi_2 \in T_{(p_n, q_n)}\mathcal{M}$. With the partitioning
$\xi = (\xi_p, \xi_q)$ we have $G(q_n)\xi_{q,j} = 0$ for $j = 1, 2$ so that the
expression reduces to $\xi_1^T J \xi_2$. This proves the symplecticity condition
(1.16) for the mapping $(p_n, q_n) \mapsto (\hat p_{n+1}, q_{n+1})$.

Similarly, the projection step
$(\hat p_{n+1}, q_{n+1}) \mapsto (p_{n+1}, q_{n+1})$ of (1.20) gives

$$\frac{\partial(p_{n+1}, q_{n+1})}{\partial(\hat p_{n+1}, q_{n+1})}
= \begin{pmatrix} I - hG^T\mu_p & S - hG^T\mu_q \\ 0 & I \end{pmatrix},$$

where $\mu_{n+1}$ of (1.20) is considered as a function of
$(\hat p_{n+1}, q_{n+1})$, and $S = -h\mu^T g_{qq}$. This is formally the same as
(1.21) with $H \equiv 0$. Consequently, the symplecticity condition is also
satisfied for this mapping. As a composition of two symplectic transformations,
the numerical flow of our first order method is therefore also symplectic.



Fig. 1.2. Spherical pendulum problem solved with the symplectic Euler method
(1.19)-(1.20) and with the implicit Euler method; initial value
$q_0 = (\sin(1.3), 0, \cos(1.3))$,
$p_0 = (3\cos(1.3), 6.5, -3\sin(1.3))$, step size $h = 0.01$

Numerical Experiment. Consider the equations (1.3) for the spherical pendulum.
For a mass $m = 1$ they coincide with the Hamiltonian formulation. Figure 1.2
(upper picture) shows the numerical solution (vertical coordinate $q_3$) over many
periods obtained by method (1.19)-(1.20). We observe a regular, qualitatively
correct behaviour. For the implicit Euler method (i.e., the argument $q_n$ is
replaced with $q_{n+1}$ in (1.19)) the numerical solution, obtained with the same
step size and the same initial values, is less satisfactory. Already after one
period the solution deteriorates and the pendulum loses energy.

VII.1.4 SHAKE and RATTLE

The numerical method (1.19)-(1.20) is only of order 1 and it is not symmetric. An
algorithm that is of order 2, symmetric and symplectic was originally considered
for separable Hamiltonians

$$H(p, q) = \tfrac12\, p^T M^{-1} p + U(q) \tag{1.22}$$

with constant mass matrix $M$. Notice that in this case we are concerned with a
second order differential equation $M\ddot q = -U_q(q) - G(q)^T \lambda$ with
$g(q) = 0$.

SHAKE. Ryckaert, Ciccotti & Berendsen (1977) propose the method

$$\begin{aligned}
q_{n+1} - 2q_n + q_{n-1} &= -h^2 M^{-1}\bigl(U_q(q_n) + G(q_n)^T \lambda_n\bigr) \\
0 &= g(q_{n+1})
\end{aligned} \tag{1.23}$$

for computations in molecular dynamics. It is a straightforward extension of the
Störmer-Verlet scheme (I.1.15). The $p$-components, not used in the recursion, are
approximated by $p_n = M(q_{n+1} - q_{n-1})/2h$.

RATTLE. The three-term recursion (1.23) may lead to an accumulation of round-off
errors, and a reformulation as a one-step method is desirable. Using the same
procedure as in (I.1.17) we formally get

$$\begin{aligned}
p_{n+1/2} &= p_n - \frac{h}{2}\bigl(U_q(q_n) + G(q_n)^T \lambda_n\bigr) \\
q_{n+1} &= q_n + hM^{-1}p_{n+1/2}, \qquad 0 = g(q_{n+1}) \\
p_{n+1} &= p_{n+1/2} - \frac{h}{2}\bigl(U_q(q_{n+1}) + G(q_{n+1})^T \lambda_{n+1}\bigr).
\end{aligned} \tag{1.24}$$

The difficulty with this formulation is that $\lambda_{n+1}$ is not yet available
at this step (it is computed together with $q_{n+2}$). As a remedy, Andersen
(1983) suggests replacing the last line in (1.24) with a projection step similar
to (1.20)

$$\begin{aligned}
p_{n+1} &= p_{n+1/2} - \frac{h}{2}\bigl(U_q(q_{n+1}) + G(q_{n+1})^T \mu_n\bigr) \\
0 &= G(q_{n+1}) M^{-1} p_{n+1}.
\end{aligned} \tag{1.25}$$

This modification, called RATTLE, has the further advantage that the numerical
approximation $(p_{n+1}, q_{n+1})$ lies on the solution manifold $\mathcal{M}$.
The symplecticity of this algorithm has been established by Leimkuhler & Skeel
(1994).
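
A RATTLE step can be written out explicitly for the spherical pendulum, where
$M = I$, $U(q) = q_3$ and $g(q) = |q|^2 - 1$, because the equation for
$\lambda_n$ is then a scalar quadratic and the one for $\mu_n$ is linear. The
sketch below is an illustration under these assumptions; the function names and
the root selection are choices made here. It also checks the symmetry of the
method numerically by stepping forward with $h$ and back with $-h$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rattle_step(p, q, h):
    """One RATTLE step, first two lines of (1.24) followed by (1.25),
    for M = I, U(q) = q_3, g(q) = |q|^2 - 1."""
    # Half kick and drift: q_new = a - c q with c = h^2 lam_n and |q_new| = 1.
    a = (q[0] + h * p[0], q[1] + h * p[1], q[2] + h * (p[2] - 0.5 * h))
    qq, aq, d = dot(q, q), dot(a, q), dot(a, a) - 1.0
    c = (aq - math.sqrt(aq * aq - qq * d)) / qq   # root that tends to 0 with h
    q_new = tuple(ai - c * qi for ai, qi in zip(a, q))
    p_half = (p[0] - c / h * q[0], p[1] - c / h * q[1],
              p[2] - 0.5 * h - c / h * q[2])
    # Second half kick, with mu_n chosen such that G(q_new) p_new = 0.
    b = (p_half[0], p_half[1], p_half[2] - 0.5 * h)
    s = dot(q_new, b) / dot(q_new, q_new)
    p_new = tuple(bi - s * qi for bi, qi in zip(b, q_new))
    return p_new, q_new

def energy(p, q):
    return 0.5 * dot(p, p) + q[2]

h = 0.01
q0 = (math.sin(1.3), 0.0, math.cos(1.3))
p0 = (3.0 * math.cos(1.3), 6.5, -3.0 * math.sin(1.3))

# Symmetry: a step with h followed by a step with -h returns to the start.
p1, q1 = rattle_step(p0, q0, h)
pb, qb = rattle_step(p1, q1, -h)
symmetry_defect = max(abs(x - y) for x, y in zip(pb + qb, p0 + q0))

p, q = p0, q0
max_energy_dev = 0.0
for _ in range(1000):
    p, q = rattle_step(p, q, h)
    max_energy_dev = max(max_energy_dev, abs(energy(p, q) - energy(p0, q0)))

print("symmetry defect:", symmetry_defect)
print("g(q) =", dot(q, q) - 1.0, " G(q) p =", 2.0 * dot(q, p))
print("max |H - H0| =", max_energy_dev)
```

The same closed-form solves would not be available for a general constraint
$g(q)$; there a Newton-type iteration for $\lambda_n$ would take the place of the
quadratic formula.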


Extension to General Hamiltonians. As observed independently by Jay (1994)
and Reich (1993), the RATTLE algorithm can be extended to general Hamiltonians
as follows: for consistent values $(p_n, q_n) \in \mathcal{M}$ define

$$\begin{aligned}
p_{n+1/2} &= p_n - \frac{h}{2}\bigl(H_q(p_{n+1/2}, q_n) + G(q_n)^T \lambda_n\bigr) \\
q_{n+1} &= q_n + \frac{h}{2}\bigl(H_p(p_{n+1/2}, q_n) + H_p(p_{n+1/2}, q_{n+1})\bigr) \\
0 &= g(q_{n+1}) \\
p_{n+1} &= p_{n+1/2} - \frac{h}{2}\bigl(H_q(p_{n+1/2}, q_{n+1}) + G(q_{n+1})^T \mu_n\bigr) \\
0 &= G(q_{n+1}) H_p(p_{n+1}, q_{n+1}).
\end{aligned} \tag{1.26}$$

The first three equations of (1.26) are very similar to (1.19) and the last two
equations to (1.20). The existence of (locally) unique solutions
$(p_{n+1/2}, q_{n+1}, \lambda_n)$ and $(p_{n+1}, \mu_n)$ can therefore be proved
in the same way. Notice also that this method gives a numerical solution that
stays exactly on the solution manifold $\mathcal{M}$.


Theorem 1.3. The numerical method (1.26) is symmetric, symplectic, and convergent
of order two.


Proof. Although this theorem is the special case $s = 2$ of Theorem 1.4, we
outline its proof. We will see that the convergence result is easier to obtain for
$s = 2$ than for the general case.

If we add to (1.26) the consistency conditions $g(q_n) = 0$,
$G(q_n)H_p(p_n, q_n) = 0$ of the initial values, the symmetry of the method
follows at once by exchanging $h \leftrightarrow -h$,
$p_{n+1} \leftrightarrow p_n$, $q_{n+1} \leftrightarrow q_n$, and
$\lambda_n \leftrightarrow \mu_n$. The symplecticity can be proved as for
(1.19)-(1.20) by computing the derivative of $(p_{n+1}, q_{n+1})$ with respect to
$(p_n, q_n)$, and by verifying the condition (1.16). This does not seem to be
simpler than the symplecticity proof of Theorem 1.4.

The implicit function theorem applied to the two subsystems of (1.26) shows

$$p_{n+1/2} = p_n + O(h), \quad h\lambda_n = O(h), \quad
  p_{n+1} = p_{n+1/2} + O(h), \quad h\mu_n = O(h),$$

and, inserted into (1.26), yields

$$q_{n+1} = q(t_{n+1}) + O(h^2), \qquad
  p_{n+1} = p(t_{n+1}) - G(q(t_{n+1}))^T \nu + O(h^2).$$

Convergence of order one follows therefore in the same way as for method
(1.19)-(1.20). Since the order of a symmetric method is always even, this implies
convergence of order two. □

An easy way of obtaining high order methods for constrained Hamiltonian systems
is by composition (Reich 1996a). Method (1.26) is an ideal candidate as basic
integrator for compositions of the form (V.3.2). The resulting integrators are
symmetric, symplectic, of high order, and yield a numerical solution that stays on
the manifold $\mathcal{M}$.

VII.1.5 The Lobatto IIIA - IIIB Pair

Another possibility for obtaining high order symplectic integrators for
constrained Hamiltonian systems is by the use of partitioned Runge-Kutta or
discontinuous collocation methods. We consider the system (1.9) and we search for
polynomials $u(t)$ of degree $s$, $w(t)$ of degree $s - 1$, and $v(t)$ of degree
$s - 2$ such that

$$u(t_n) = q_n, \qquad v(t_n) = p_n - h b_1 \delta(t_n) \tag{1.27}$$

with the defect

$$\delta(t) = \dot v(t) + H_q(v(t), u(t)) + G(u(t))^T w(t) \tag{1.28}$$

and, using the abbreviation $t_{n,i} = t_n + c_i h$,

$$\begin{aligned}
\dot u(t_{n,i}) &= H_p\bigl(v(t_{n,i}), u(t_{n,i})\bigr), & i &= 1, \ldots, s \\
\dot v(t_{n,i}) &= -H_q\bigl(v(t_{n,i}), u(t_{n,i})\bigr) - G(u(t_{n,i}))^T w(t_{n,i}), & i &= 2, \ldots, s - 1 \\
0 &= g(u(t_{n,i})), & i &= 1, \ldots, s.
\end{aligned} \tag{1.29}$$

If these polynomials exist, the numerical solution is defined by

$$\begin{aligned}
q_{n+1} &= u(t_n + h), \qquad p_{n+1} = v(t_n + h) - h b_s \delta(t_n + h) \\
0 &= G(q_{n+1}) H_p(p_{n+1}, q_{n+1}).
\end{aligned} \tag{1.30}$$

Why Discontinuous Collocation Based on Lobatto Quadrature? At a first glance
(Theorem VI.4.2) it seems natural to consider collocation methods based on
Gaussian quadrature for the entire system. This, however, has the disadvantage
that the numerical solution does not satisfy $g(q_{n+1}) = 0$. To achieve this
requirement, $t_n + h$ has to be one of the collocation points, i.e., we must have
$c_s = 1$. Unfortunately, none of the collocation or discontinuous collocation
methods with $c_s = 1$ is symplectic (see Exercise IV.6). We therefore turn our
attention to partitioned methods, and we treat only the $q$-component by a
collocation method satisfying $c_s = 1$. To satisfy the $s$ conditions
$g(u(t_{n,i})) = 0$ of (1.29) there are only $s - 1$ free parameters
$w(t_n), w(t_n + c_2 h), \ldots, w(t_n + c_{s-1} h)$ available. A remedy is to
choose $c_1 = 0$ so that the first condition $g(u(t_n)) = 0$ is automatically
verified. Encouraged by Theorem VI.4.5 we are thus led to consider the Lobatto
nodes in the role of the $c_i$. The use of the partitioned Lobatto IIIA - IIIB
pair for the treatment of constrained Hamiltonian systems has been suggested by
Jay (1994, 1996).

Existence and Uniqueness of the Numerical Solution. The polynomial $u(t)$ of
degree $s$ is uniquely determined by $u(t_n) = q_n$ and
$\dot u(t_{n,i}) =: \dot Q_i$ ($i = 1, \ldots, s$), the polynomial $v(t)$ of
degree $s - 2$ is uniquely determined by $v(t_{n,i}) =: P_i$
($i = 1, \ldots, s - 1$), and the polynomial $w(t)$ of degree $s - 1$ is uniquely
determined by $h\,w(t_{n,i}) =: \Lambda_i$ ($i = 1, \ldots, s$). Notice that the
value $\Lambda_s$ is only involved in (1.30) and not in (1.27)-(1.29). For the
nonlinear system (1.27)-(1.29) we therefore consider

$$X = (\dot Q_1, \ldots, \dot Q_s, P_1, \ldots, P_{s-1}, \Lambda_1, \ldots, \Lambda_{s-1})$$

as independent variables, and we write the system as $F(X, h) = 0$. The function
$F$ is composed of the $s$ conditions for $\dot u(t_{n,i})$, of the definition of
$v(t_n)$ (divided by $b_1$) and the $s - 2$ conditions for $\dot v(t_{n,i})$
(multiplied by $h$), and finally of the $s - 1$ equations $0 = g(u(t_{n,i}))$ for
$i = 2, \ldots, s$ (divided by $h$). Observe that $0 = g(u(t_n))$ is automatically
satisfied by the consistency of $(p_n, q_n)$. We note that $P_s = v(t_n + h)$ and
$\dot P_i = h\,\dot v(t_{n,i})$ are linear combinations of
$P_1, \ldots, P_{s-1}$ with coefficients independent of the step size $h$.

The function $F(X, h)$ is well-defined for $h$ in a neighbourhood of 0. For the
first two blocks this is evident, for the last one it follows from the identity

$$\frac{1}{h}\,g(u(t_{n,i})) = \int_0^{c_i} G\bigl(u(t_n + \theta h)\bigr)\,\dot u(t_n + \theta h)\,d\theta$$

using the fact that $\dot u(t_n + \theta h)$ is a linear combination of
$\dot Q_i$ for $i = 1, \ldots, s$. With the values

$$X_0 = \bigl(H_p(p_n, q_n), \ldots, H_p(p_n, q_n), p_n, \ldots, p_n, 0, \ldots, 0\bigr)$$

we have that $F(X_0, 0) = 0$, because the values $(p_n, q_n)$ are assumed to be
consistent. In view of an application of the implicit function theorem we compute

$$\frac{\partial F}{\partial X}(X_0, 0) = \begin{pmatrix}
I \otimes I & -D \otimes H_{pp} & 0 \\
0 & B \otimes I & I \otimes G^T \\
A \otimes G & 0 & 0
\end{pmatrix}, \tag{1.31}$$

where $H_{pp}$, $G$ are evaluated at $(p_n, q_n)$, and $A$, $B$, $D$ are matrices
of dimension $(s - 1) \times s$, $(s - 1) \times (s - 1)$ and $s \times (s - 1)$
respectively that depend only on the Lobatto quadrature and not on the
differential equation. For example, the matrix $B$ represents the linear mapping

$$(P_1, \ldots, P_{s-1}) \mapsto (\dot P_1 + b_1^{-1} P_1, \dot P_2, \ldots, \dot P_{s-1}).$$

This mapping is invertible, because the values on the right-hand side uniquely
determine the polynomial $v(t)$ of degree $s - 2$.

Block Gaussian elimination then shows that (1.31) is invertible if and only if the
matrix

$$ADB^{-1} \otimes G H_{pp} G^T \quad \text{is invertible}.$$

Because of (1.13) it remains to show that $ADB^{-1}$ is invertible.

To achieve this without explicitly computing the matrices $A$, $B$, $D$, we apply
the method to the problem where $p$ and $q$ are of dimension one,
$H(p, q) = p^2/2$, and $g(q) = q$. Assuming $h = 1$ we get

$$\begin{aligned}
u(0) &= 0, \qquad v(0) = -b_1\bigl(\dot v(0) + w(0)\bigr) \\
\dot u(c_i) &= v(c_i) & &\text{for } i = 1, \ldots, s \\
\dot v(c_i) &= -w(c_i) & &\text{for } i = 2, \ldots, s - 1 \\
0 &= u(c_i) & &\text{for } i = 1, \ldots, s,
\end{aligned} \tag{1.32}$$

which is equivalent to

$$\begin{pmatrix} I & -D & 0 \\ 0 & B & I \\ A & 0 & 0 \end{pmatrix}
\begin{pmatrix} (\dot u(c_i))_{i=1}^{s} \\ (v(c_i))_{i=1}^{s-1} \\ (w(c_i))_{i=1}^{s-1} \end{pmatrix} = 0, \tag{1.33}$$

because $H_{pp}(p, q) = 1$ and $G(q) = 1$. Since $u(t)$ is a polynomial of degree
$s$, the last equation of (1.32) implies that
$u(t) = C\prod_{j=1}^{s}(t - c_j)$. By the second relation the polynomial
$\dot u(t) - v(t)$, which is of degree $s - 1$, vanishes at $s$ points. Hence,
$v(t) = \dot u(t)$, which is possible only if $C = 0$, because the degree of
$v(t)$ is $s - 2$. Consequently, the linear system (1.33) has only the trivial
solution, so that the matrix in (1.33) and hence also $ADB^{-1}$ is invertible.

The implicit function theorem applied to $F(X, h) = 0$ shows that the nonlinear
system (1.27)-(1.29) possesses a locally unique solution for sufficiently small
step sizes $h$. Using the free parameter $\Lambda_s = h\,w(t_n + h)$, a further
application of the implicit function theorem, this time to the small system
(1.30), proves the existence and local uniqueness of $p_{n+1}$.


Theorem 1.4. Let ( bi , q)| =1 be the weights and nodes of the Lobatto quadrature 
(c.fi (II.1.17)). The method (1.27)-(1.29)-(1.30) is symmetric, symplectic, and super- 
convergent of order 2s — 2. 


Proof. Symmetry. To the formulas (1.27)-(1.29)-(1.30) we add the consistency
relations $g(q_n) = 0$, $G(q_n)H_p(p_n, q_n) = 0$. Then we exchange
$(t_n, p_n, q_n) \leftrightarrow (t_{n+1}, p_{n+1}, q_{n+1})$ and $h \leftrightarrow -h$. Since $b_1 = b_s$ and
$c_{s+1-i} = 1 - c_i$ for the Lobatto quadrature, the resulting formulas are equivalent
to the original method (see also the proof of Theorem V.2.1).
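The two properties of the Lobatto quadrature used in this proof (symmetric weights and nodes, exactness for polynomials of degree up to $2s - 3$) can be checked numerically. The following is a minimal sketch, assuming the standard tabulated Lobatto nodes and weights on $[0,1]$ for $s = 3$ and $s = 4$; the function names `lobatto` and `quadrature_error` are illustrative and do not come from the text.

```python
import math

def lobatto(s):
    """Nodes c_i and weights b_i of the Lobatto quadrature on [0, 1] (s = 3 or 4 only)."""
    if s == 3:
        return [0.0, 0.5, 1.0], [1 / 6, 2 / 3, 1 / 6]
    if s == 4:
        r = math.sqrt(5.0) / 10.0
        return [0.0, 0.5 - r, 0.5 + r, 1.0], [1 / 12, 5 / 12, 5 / 12, 1 / 12]
    raise ValueError("only s = 3 and s = 4 are tabulated in this sketch")

def quadrature_error(s, degree):
    """Quadrature error of the s-stage Lobatto rule for the monomial t**degree on [0, 1]."""
    c, b = lobatto(s)
    approx = sum(bi * ci ** degree for bi, ci in zip(b, c))
    return approx - 1.0 / (degree + 1)

if __name__ == "__main__":
    for s in (3, 4):
        c, b = lobatto(s)
        print(s, "symmetric weights:", abs(b[0] - b[-1]) < 1e-15)
        print(s, "errors up to degree 2s-2:",
              [quadrature_error(s, k) for k in range(2 * s - 1)])
```

Under these assumptions the errors are at rounding level for degrees $0,\dots,2s-3$ and visibly nonzero for degree $2s-2$, which is the exactness used for the integrand in the symplecticity part of the proof.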

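Symplecticity. We fix $\xi_1, \xi_2 \in T_{(p_n,q_n)}\mathcal{M}$, we put $x_n = (p_n, q_n)^T$, and we consider
the bilinear mapping

$$
Q\Bigl(\frac{\partial p_{n+1}}{\partial x_n}, \frac{\partial q_{n+1}}{\partial x_n}\Bigr)
= \xi_1^T \Bigl( \Bigl(\frac{\partial q_{n+1}}{\partial x_n}\Bigr)^T \Bigl(\frac{\partial p_{n+1}}{\partial x_n}\Bigr)
- \Bigl(\frac{\partial p_{n+1}}{\partial x_n}\Bigr)^T \Bigl(\frac{\partial q_{n+1}}{\partial x_n}\Bigr) \Bigr)\, \xi_2 .
$$

The symplecticity of the transformation $(p_n, q_n) \mapsto (p_{n+1}, q_{n+1})$ on the manifold
$\mathcal{M}$ is then expressed by the relation

$$
Q\Bigl(\frac{\partial p_{n+1}}{\partial x_n}, \frac{\partial q_{n+1}}{\partial x_n}\Bigr)
= Q\Bigl(\frac{\partial p_n}{\partial x_n}, \frac{\partial q_n}{\partial x_n}\Bigr).
\tag{1.34}
$$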


We now follow closely the proof of Theorem IV.2.3. We consider the polynomials
$u(t), v(t), w(t)$ of the method (1.27)-(1.29)-(1.30) as functions of $t$ and
$x_n = (p_n, q_n)$, and we compute

$$
Q\Bigl(\frac{\partial v(t_{n+1})}{\partial x_n}, \frac{\partial u(t_{n+1})}{\partial x_n}\Bigr)
- Q\Bigl(\frac{\partial v(t_n)}{\partial x_n}, \frac{\partial u(t_n)}{\partial x_n}\Bigr)
= \int_{t_n}^{t_{n+1}} \frac{d}{dt}\, Q\Bigl(\frac{\partial v(t)}{\partial x_n}, \frac{\partial u(t)}{\partial x_n}\Bigr)\, dt .
\tag{1.35}
$$

Since $u(t)$ is a polynomial of degree $s$ and $v(t)$ of degree $s - 2$, the integrand in
(1.35) is a polynomial in $t$ of degree $2s - 3$. It is thus integrated without error by the
Lobatto quadrature. By definition these polynomials satisfy the differential equation
at the interior collocation points. Therefore, it follows from (1.17) that


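$$
\frac{d}{dt}\, Q\Bigl(\frac{\partial v(t_{n,i})}{\partial x_n}, \frac{\partial u(t_{n,i})}{\partial x_n}\Bigr) = 0
\qquad \text{for } i = 2,\dots,s-1,
$$

and that

$$
\frac{d}{dt}\, Q\Bigl(\frac{\partial v(t_{n,i})}{\partial x_n}, \frac{\partial u(t_{n,i})}{\partial x_n}\Bigr)
= Q\Bigl(\frac{\partial \delta(t_{n,i})}{\partial x_n}, \frac{\partial u(t_{n,i})}{\partial x_n}\Bigr)
\qquad \text{for } i = 1 \text{ and } i = s,
$$

where $t_{n,i} = t_n + c_i h$. Applying the Lobatto quadrature to the integral in (1.35) thus yields

$$
h b_1\, Q\Bigl(\frac{\partial \delta(t_n)}{\partial x_n}, \frac{\partial u(t_n)}{\partial x_n}\Bigr)
+ h b_s\, Q\Bigl(\frac{\partial \delta(t_{n+1})}{\partial x_n}, \frac{\partial u(t_{n+1})}{\partial x_n}\Bigr),
$$

and the symplecticity relation (1.34) follows in the same way as in the proof of
Theorem IV.2.3.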


Superconvergence. This is the most difficult part of the proof. We remark that
superconvergence of Runge-Kutta methods for differential-algebraic systems of index 3
has been conjectured by Hairer, Lubich & Roche (1989), and a first proof has been
obtained by Jay (1993) for collocation methods. In his thesis Jay (1994) proves
superconvergence for a more general class of methods, including the Lobatto IIIA -
IIIB pair, using a “rooted-tree-type” theory. A sketch of that very elaborate proof
is published in Jay (1996). Using the idea of discontinuous collocation, the elegant
proof for collocation methods can now be extended to cover the Lobatto IIIA - IIIB
pair. In the following we explain how the local error can be estimated.

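We consider the polynomials $u(t), v(t), w(t)$ defined in (1.27)-(1.29)-(1.30),
and we define defects $\mu(t), \delta(t), \theta(t)$ as follows:

$$
\begin{aligned}
\dot u(t) &= H_p\bigl(v(t), u(t)\bigr) + \mu(t) \\
\dot v(t) &= -H_q\bigl(v(t), u(t)\bigr) - G\bigl(u(t)\bigr)^T w(t) + \delta(t) \\
0 &= g\bigl(u(t)\bigr) + \theta(t).
\end{aligned}
\tag{1.36}
$$

By definition of the method we have

$$
\begin{aligned}
\mu(t_n + c_i h) &= 0, \qquad i = 1,\dots,s \\
\delta(t_n + c_i h) &= 0, \qquad i = 2,\dots,s-1 \\
\theta(t_n + c_i h) &= 0, \qquad i = 1,\dots,s.
\end{aligned}
\tag{1.37}
$$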


We let $q(t), p(t), \lambda(t)$ be the exact solution of (1.9) satisfying $q(t_n) = q_n$,
$p(t_n) = p_n$, and we consider the differences

$$
\Delta u(t) = u(t) - q(t), \qquad \Delta v(t) = v(t) - p(t), \qquad \Delta w(t) = w(t) - \lambda(t).
$$

Subtracting (1.9) from (1.36) we get by linearization that

$$
\begin{aligned}
\Delta\dot u &= a_{11}(t)\Delta u + a_{12}(t)\Delta v + \mu(t) \\
\Delta\dot v &= a_{21}(t)\Delta u + a_{22}(t)\Delta v + a_{23}(t)\Delta w + \delta(t),
\end{aligned}
\tag{1.38}
$$

where $a_{12}(t) = H_{pp}(p(t), q(t))$, and where the other $a_{ij}(t)$ are given by similar
expressions. We have suppressed quadratic and higher order terms to keep the
presentation as simple as possible. They do not influence the convergence result. To
eliminate $\Delta w$ in (1.38), we differentiate the algebraic relations in (1.9) and (1.36)
twice, and we subtract them. This yields

$$
\begin{aligned}
0 = {} & F(t, \mu(t)) + b_1(t)\Delta u + b_2(t)\Delta v + B(t)\Delta w \\
& + G(u(t))H_{pp}(v(t), u(t))\delta(t) + G(u(t))\dot\mu(t) + \ddot\theta(t),
\end{aligned}
$$

where $F(t, \mu)$, $B(t)$, $b_1(t)$, $b_2(t)$ are functions depending on $p(t)$, $q(t)$, $\lambda(t)$, $v(t)$,
$u(t)$, $w(t)$, and where $F(t, 0) = 0$ and $B(t) \approx G(q_n)H_{pp}(p_n, q_n)G(q_n)^T$. Because
of our assumption (1.13) we can extract $\Delta w$ from this relation, and we insert it into
(1.38). In this way we get a linear differential equation for $\Delta u$, $\Delta v$, which can be
solved by the “variation of constants” formula. Using $\Delta u(t_n) = 0$ (by (1.27)), the
solution $\Delta v(t_n + h)$ is seen to be of the form


$$
\begin{aligned}
\Delta v(t_n + h) = {} & R_{22}(t_n + h, t_n)\Delta v(t_n) + \int_{t_n}^{t_n+h} \Bigl( R_{21}(t_n + h, t)\mu(t) \\
& + R_{22}(t_n + h, t)\bigl(\delta(t) + F(t, \mu(t)) + c_1(t)\dot\mu(t) \\
& \qquad + C(t)\bigl(G(u(t))H_{pp}(v(t), u(t))\delta(t) + \ddot\theta(t)\bigr)\bigr)\Bigr)\, dt,
\end{aligned}
\tag{1.39}
$$

where $R_{21}$ and $R_{22}$ are the lower blocks of the resolvent, and $F$, $c_1$, $C$ are functions
as before. To prove that the local error of the $p$-component

$$
p_{n+1} - p(t_n + h) = \Delta v(t_n + h) - h b_s \delta(t_n + h)
\tag{1.40}
$$

is of size $O(h^{2s-1})$, we first integrate by parts those expressions in (1.39) which
contain a derivative. For example,

$$
\int_{t_n}^{t_{n+1}} a(t)\dot\mu(t)\, dt = a(t)\mu(t)\Big|_{t_n}^{t_{n+1}} - \int_{t_n}^{t_{n+1}} \dot a(t)\mu(t)\, dt = O(h^{2s-1}),
$$

because $\mu(t_n) = \mu(t_n + h) = 0$ by (1.37) and an application of the Lobatto quadrature
to the integral at the right-hand side gives zero as result with a quadrature error
of size $O(h^{2s-1})$. Similarly, integrating by parts twice yields


$$
\begin{aligned}
\int_{t_n}^{t_{n+1}} a(t)\ddot\theta(t)\, dt
&= a(t)\dot\theta(t)\Big|_{t_n}^{t_{n+1}} - \dot a(t)\theta(t)\Big|_{t_n}^{t_{n+1}} + \int_{t_n}^{t_{n+1}} \ddot a(t)\theta(t)\, dt \\
&= a(t_{n+1})\dot\theta(t_{n+1}) - a(t_n)\dot\theta(t_n) + O(h^{2s-1}).
\end{aligned}
$$

To the other integrals in (1.39) we apply the Lobatto quadrature directly. Since
$R_{22}(t_{n+1}, t_{n+1})$ is the identity, this gives

$$
\begin{aligned}
p_{n+1} - p(t_{n+1}) = {} & R_{22}(t_{n+1}, t_n)\bigl(\Delta v(t_n) + h b_1 \delta(t_n)\bigr) \\
& + \hat C(t_{n+1})\bigl(h b_s G(u(t_{n+1}))H_{pp}(v(t_{n+1}), u(t_{n+1}))\delta(t_{n+1}) + \dot\theta(t_{n+1})\bigr) \\
& + \hat C(t_n)\bigl(h b_1 G(u(t_n))H_{pp}(v(t_n), u(t_n))\delta(t_n) - \dot\theta(t_n)\bigr) + O(h^{2s-1}),
\end{aligned}
\tag{1.41}
$$

where $\hat C(t) = R_{22}(t_{n+1}, t)C(t)$. The term $\Delta v(t_n) + h b_1 \delta(t_n)$ vanishes by (1.27),
and differentiation of the algebraic relation in (1.36) yields

$$
0 = G(u(t))\bigl(H_p(v(t), u(t)) + \mu(t)\bigr) + \dot\theta(t).
$$

As a consequence of (1.27), (1.37) and the consistency of the initial values $(p_n, q_n)$,
this gives

$$
\begin{aligned}
\dot\theta(t_n) &= -G(q_n)H_p\bigl(p_n - h b_1 \delta(t_n), q_n\bigr) \\
&= h b_1 G(q_n)H_{pp}(p_n, q_n)\delta(t_n) + O(h^2 \delta(t_n)^2) \\
&= h b_1 G(u(t_n))H_{pp}(v(t_n), u(t_n))\delta(t_n) + O(h^2 \delta(t_n)^2).
\end{aligned}
$$

Using (1.30) we get in the same way

$$
\dot\theta(t_{n+1}) = -h b_s G(u(t_{n+1}))H_{pp}(v(t_{n+1}), u(t_{n+1}))\delta(t_{n+1}) + O(h^2 \delta(t_{n+1})^2).
$$

These estimates together show that the local error (1.41) is of size $O(h^{2s-1}) +
O(h^2 \delta(t)^2)$. The defect $\delta(t)$ vanishes at $s - 2$ points in the interval $[t_n, t_{n+1}]$, so
that $\delta(t) = O(h^{s-2})$ for $t \in [t_n, t_{n+1}]$ (for a rigorous proof of this statement one
has to apply the techniques of the proof of Theorem II.1.5). Therefore we obtain
$p_{n+1} - p(t_{n+1}) = O(h^{2s-2})$, and by the symmetry of the method also $O(h^{2s-1})$.

In analogy to (1.39), the variation of constants formula yields also an
expression for the local error $q_{n+1} - q(t_{n+1}) = \Delta u(t_{n+1})$. One only has to
replace $R_{21}$ and $R_{22}$ with the upper blocks $R_{11}$ and $R_{12}$ of the resolvent. Using
$R_{12}(t_{n+1}, t_{n+1}) = 0$, we prove in the same way that the local error of the
$q$-component is of size $O(h^{2s-1})$.

The estimation of the global error is obtained in the same way as for the first
order method (1.19)-(1.20). Since the algorithm is a mapping $\Phi_h : \mathcal{M} \to \mathcal{M}$ on the
solution manifold, it is not necessary to follow the technically difficult proofs in the
context of differential-algebraic equations. Summing up the propagated local errors
proves that the global error satisfies $p_n - p(t_n) = O(h^{2s-2})$ and $q_n - q(t_n) =
O(h^{2s-2})$ as long as $t_n = nh \le \mathrm{Const}$. □

VII.1.6 Splitting Methods 

When considering splitting methods for constrained mechanical systems, it should
be borne in mind that such systems are differential equations on manifolds (see
Sect. VII.1.2). Splitting methods should therefore be based on a decomposition
$f(y) = f^{[1]}(y) + f^{[2]}(y)$, where both $f^{[i]}(y)$ are vector fields on the same
manifold as $f(y)$. Let us consider here the Hamiltonian system (1.9) with Hamiltonian

$$ H(p, q) = H^{[1]}(p, q) + H^{[2]}(p, q). \tag{1.42} $$

The manifold for this differential equation is

$$ \mathcal{M} = \{ (p, q) \mid g(q) = 0,\ G(q)H_p(p, q) = 0 \}. \tag{1.43} $$

Notice that (1.9), when $H$ is simply replaced with $H^{[i]}$, is not a good candidate for
splitting methods: the existence of a solution is not guaranteed, and if the solution
exists it need not stay on the manifold $\mathcal{M}$. The following lemma indicates how
splitting methods should be applied.

Lemma 1.5. Consider a Hamiltonian (1.42), a function $g(q)$ with $G(q) = g'(q)$,
and let the manifold $\mathcal{M}$ be given by (1.43). If (1.13) holds and if

$$ G(q)H_p^{[i]}(p, q) = 0 \qquad \text{for all } (p, q) \in \mathcal{M}, \tag{1.44} $$

then the system

$$
\begin{aligned}
\dot q &= H_p^{[i]}(p, q) \\
\dot p &= -H_q^{[i]}(p, q) - G(q)^T \lambda \\
0 &= G(q)H_p(p, q)
\end{aligned}
\tag{1.45}
$$

defines a differential equation on the manifold $\mathcal{M}$, and its flow is a symplectic
transformation on $\mathcal{M}$.

Proof. Differentiation of the algebraic relation in (1.45) with respect to time, and
replacing $\dot q$ and $\dot p$ with their differential equations, yields an explicit relation for
$\lambda = \lambda(p, q)$ (as a consequence of (1.13)). Hence, a unique solution of (1.45) exists
locally if $G(q_0)H_p(p_0, q_0) = 0$. The assumption (1.44) implies $\frac{d}{dt} g(q(t)) = 0$. This
together with the algebraic relation of (1.45) guarantees that for $(p_0, q_0) \in \mathcal{M}$ the
solution stays on the manifold $\mathcal{M}$. The symplecticity of the flow is proved as for
Theorem 1.2. □

Suppose now that the Hamiltonian $H(p, q)$ of (1.9) can be split as in (1.42),
where both $H^{[i]}(p, q)$ satisfy (1.44). We denote by $\varphi_t^{[i]}$ the flow of the system (1.45).
If these flows can be computed analytically, the Lie-Trotter splitting $\varphi_h^{[2]} \circ \varphi_h^{[1]}$ and
the Strang splitting $\varphi_{h/2}^{[1]} \circ \varphi_h^{[2]} \circ \varphi_{h/2}^{[1]}$ yield first and second order numerical
integrators, respectively. Considering more general compositions as in (II.5.6) and using
the coefficients proposed in Sect. V.3, methods of high order are obtained. They give
numerical approximations lying on the manifold $\mathcal{M}$, and they are symplectic (also
symmetric if the splitting is well chosen).

For the important special case where

$$ H(p, q) = T(p, q) + U(q) $$

is the sum of the kinetic and potential energies, both summands satisfy assumption
(1.44). This gives a natural splitting that is often used in practice.

Example 1.6 (Spherical Pendulum). We normalize all constants to 1 (cf.
Example 1.1) and we consider the problem (1.9) with

$$ H(p, q) = \tfrac12\bigl(p_1^2 + p_2^2 + p_3^2\bigr) + q_3, \qquad g(q) = \tfrac12\bigl(q_1^2 + q_2^2 + q_3^2 - 1\bigr). $$

We split the Hamiltonian as $H^{[1]}(p, q) = \tfrac12(p_1^2 + p_2^2 + p_3^2)$ and $H^{[2]}(p, q) = q_3$,
and we solve (1.45) with initial values on the manifold

$$ \mathcal{M} = \{ (p, q) \mid q_1^2 + q_2^2 + q_3^2 - 1 = 0,\ p_1 q_1 + p_2 q_2 + p_3 q_3 = 0 \}. $$

The kinetic energy $H^{[1]}(p, q)$ leads to the system

$$ \dot q = p, \qquad \dot p = -q\lambda, \qquad q^T p = 0, $$

which gives $\lambda = p_0^T p_0$, so that the flow $\varphi_t^{[1]}$ is just a planar rotation around the
origin. The potential energy $H^{[2]}(p, q)$ leads to

$$ \dot q = 0, \qquad \dot p = -(0, 0, 1)^T - q\lambda, \qquad q^T p = 0. $$

The flow $\varphi_t^{[2]}$ keeps $q(t)$ constant and changes $p(t)$ linearly with time. Splitting
methods give simple, explicit and symplectic time integrators for this problem.
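The two flows of this example can be written down in closed form and composed into a Strang splitting. The following is a minimal sketch under stated assumptions: the kinetic flow is taken as the rotation with angular frequency $|p_0|$ (from $\lambda = p_0^T p_0$), and for the potential flow the multiplier $\lambda = -q_3$ is obtained by differentiating $q^T p = 0$; the function names are illustrative only.

```python
import math

def kinetic_flow(p, q, t):
    """Exact flow of q' = p, p' = -q*lam, q.p = 0 with lam = |p|^2 (rotation in the plane of q and p)."""
    omega = math.sqrt(sum(pi * pi for pi in p))
    if omega == 0.0:
        return list(p), list(q)
    c, s = math.cos(omega * t), math.sin(omega * t)
    q_new = [c * qi + (s / omega) * pi for pi, qi in zip(p, q)]
    p_new = [c * pi - omega * s * qi for pi, qi in zip(p, q)]
    return p_new, q_new

def potential_flow(p, q, t):
    """Exact flow of q' = 0, p' = -(0,0,1) - q*lam, q.p = 0; here lam = -q_3 (assumes |q| = 1)."""
    lam = -q[2]
    force = [-q[0] * lam, -q[1] * lam, -1.0 - q[2] * lam]
    return [pi + t * fi for pi, fi in zip(p, force)], list(q)

def strang_step(p, q, h):
    """One Strang step: half kinetic flow, full potential flow, half kinetic flow."""
    p, q = kinetic_flow(p, q, h / 2)
    p, q = potential_flow(p, q, h)
    return kinetic_flow(p, q, h / 2)

def energy(p, q):
    """Hamiltonian H(p, q) = |p|^2 / 2 + q_3 of the spherical pendulum."""
    return 0.5 * sum(pi * pi for pi in p) + q[2]

if __name__ == "__main__":
    p, q = [0.0, 0.5, 0.0], [0.6, 0.0, -0.8]   # consistent initial values on the manifold
    e0 = energy(p, q)
    for _ in range(1000):
        p, q = strang_step(p, q, 0.01)
    print("constraint |q|^2 - 1:", sum(x * x for x in q) - 1.0)
    print("constraint q.p      :", sum(a * b for a, b in zip(p, q)))
    print("energy drift        :", energy(p, q) - e0)
```

Both sub-flows map the manifold to itself, so the constraints are expected to hold up to rounding errors, whereas the energy is only approximately conserved by this second order method.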


VII.2 Poisson Systems 

This section is devoted to an interesting generalization of Hamiltonian systems,
where $J^{-1}$ in (VI.2.5) is replaced with a nonconstant matrix $B(y)$. Such
structures were introduced by Sophus Lie (1888) and are today called Poisson systems.
They result, in particular, from Hamiltonian systems on manifolds written in
non-canonical coordinates. In a first subsection, however, we discuss the Poisson
structure of Hamiltonian systems in canonical form.


VII.2.1 Canonical Poisson Structure 

... a few remarks on the deepest discovery of Mr. Poisson,
which, I believe, has not been well understood either by Lagrange, or by the
numerous geometers who have cited it, or by its author himself.

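(C.G.J. Jacobi 1840, p. 350)

The derivative of a function $F(p, q)$ along the flow of a Hamiltonian system

$$
\dot p = -\frac{\partial H}{\partial q}(p, q), \qquad \dot q = \frac{\partial H}{\partial p}(p, q)
\tag{2.1}
$$

is given by (Lie derivative, see (III.5.3))

$$
\frac{d}{dt} F\bigl(p(t), q(t)\bigr)
= \sum_{i=1}^{d} \Bigl( \frac{\partial F}{\partial p_i}\,\dot p_i + \frac{\partial F}{\partial q_i}\,\dot q_i \Bigr)
= \sum_{i=1}^{d} \Bigl( \frac{\partial F}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial F}{\partial p_i}\frac{\partial H}{\partial q_i} \Bigr).
\tag{2.2}
$$

This remarkably symmetric structure motivates the following definition.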


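Definition 2.1. The (canonical) Poisson bracket of two smooth functions $F(p, q)$
and $G(p, q)$ is the function

$$
\{F, G\} = \sum_{i=1}^{d} \Bigl( \frac{\partial F}{\partial q_i}\frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i}\frac{\partial G}{\partial q_i} \Bigr),
\tag{2.3}
$$

or in vector notation $\{F, G\}(y) = \nabla F(y)^T J^{-1} \nabla G(y)$, where $y = (p, q)$ and $J$ is
the matrix of (VI.2.3).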


This Poisson bracket is bilinear, skew-symmetric ($\{F, G\} = -\{G, F\}$), it
satisfies the Jacobi identity (Jacobi 1862, Werke 5, p. 46)

$$ \{\{F, G\}, H\} + \{\{G, H\}, F\} + \{\{H, F\}, G\} = 0 \tag{2.4} $$

(notice the cyclic permutations among $F, G, H$), and Leibniz’ rule

$$ \{F \cdot G, H\} = F \cdot \{G, H\} + G \cdot \{F, H\}. \tag{2.5} $$

These formulas are obtained in a straightforward manner from standard rules of
calculus (see also Exercise 1).

With this notation, the Lie derivative (2.2) becomes

$$ \frac{d}{dt} F(y(t)) = \{F, H\}(y(t)). \tag{2.6} $$

It follows that a function $I(p, q)$ is a first integral of (2.1) if and only if

$$ \{I, H\} = 0. $$

If we take $F(y) = y_i$, the mapping that selects the $i$th component of $y$, we see that
the Hamiltonian system (2.1) or (VI.2.5), $\dot y = J^{-1} \nabla H(y)$, can be written as

$$ \dot y_i = \{y_i, H\}, \qquad i = 1, \dots, 2d. \tag{2.7} $$

Poisson’s Discovery. At the beginning of the 19th century, the hope of being able to
integrate a given system of differential equations by analytic formulas faded more and
more, and the energy of researchers went to the construction of, at least, first
integrals. In this enthusiasm, Jacobi declared the subsequent result to be “Poisson’s
deepest discovery” (see citation) and his own identity, developed for its proof, a
“gravissimum Theorema”.

Theorem 2.2 (Poisson 1809). If $I_1$ and $I_2$ are first integrals, then their Poisson
bracket $\{I_1, I_2\}$ is again a first integral.

Proof. This follows at once from the Jacobi identity with $F = I_1$ and $G = I_2$. □
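As a small illustration of the bracket (2.3) and of Poisson's theorem, the following sketch evaluates the canonical bracket by central differences for the Kepler Hamiltonian and the angular momentum components. The ordering $y = (p_1, p_2, p_3, q_1, q_2, q_3)$, the choice of test Hamiltonian and all function names are assumptions made for this example; the finite-difference gradient is only an approximation.

```python
def grad(f, y, eps=1e-5):
    """Central-difference gradient of f at the point y (a list of floats)."""
    g = []
    for i in range(len(y)):
        yp, ym = list(y), list(y)
        yp[i] += eps
        ym[i] -= eps
        g.append((f(yp) - f(ym)) / (2 * eps))
    return g

def bracket(F, G, y):
    """Canonical bracket (2.3) at y = (p, q): sum over i of F_q G_p - F_p G_q."""
    d = len(y) // 2
    gF, gG = grad(F, y), grad(G, y)
    return sum(gF[d + i] * gG[i] - gF[i] * gG[d + i] for i in range(d))

def H(y):
    """Kepler Hamiltonian |p|^2 / 2 - 1 / |q| (illustrative choice)."""
    p, q = y[:3], y[3:]
    return 0.5 * sum(x * x for x in p) - 1.0 / sum(x * x for x in q) ** 0.5

def L1(y): return y[4] * y[2] - y[5] * y[1]   # q2 p3 - q3 p2
def L2(y): return y[5] * y[0] - y[3] * y[2]   # q3 p1 - q1 p3
def L3(y): return y[3] * y[1] - y[4] * y[0]   # q1 p2 - q2 p1

if __name__ == "__main__":
    y0 = [0.3, -0.2, 0.5, 1.0, 0.4, -0.7]
    print("{L1, H} =", bracket(L1, H, y0))                 # close to 0: L1 is a first integral
    print("{L2, H} =", bracket(L2, H, y0))                 # close to 0: L2 is a first integral
    print("{L1, L2} - L3 =", bracket(L1, L2, y0) - L3(y0)) # close to 0
```

Here $L_1$ and $L_2$ are first integrals, and their bracket reproduces the third component $L_3$, which is then a first integral as well, in agreement with Theorem 2.2.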



VII.2.2 General Poisson Structures 


... the general concept of a Poisson manifold should be credited to Sophus
Lie in his treatise on transformation groups ...

(J.E. Marsden & T.S. Ratiu 1999) 


We now come to the announced generalization of Definition 2.1 of the canonical
Poisson bracket, invented by Lie (1888). Indeed, many proofs of properties
of Hamiltonian systems rely only on the bilinearity, the skew-symmetry and
the Jacobi identity of the Poisson bracket, but not on the special structure of
(2.3). So the idea is, more generally, to start with a smooth matrix-valued function
$B(y) = (b_{ij}(y))$ and to set

$$
\{F, G\}(y) = \sum_{i,j=1}^{n} \frac{\partial F(y)}{\partial y_i}\, b_{ij}(y)\, \frac{\partial G(y)}{\partial y_j}
\tag{2.8}
$$

(or more compactly $\{F, G\}(y) = \nabla F(y)^T B(y) \nabla G(y)$).

Lemma 2.3. The bracket defined in (2.8) is bilinear, skew-symmetric and satisfies
Leibniz' rule (2.5) as well as the Jacobi identity (2.4) if and only if

$$ b_{ij}(y) = -b_{ji}(y) \qquad \text{for all } i, j \tag{2.9} $$

and for all $i, j, k$ (notice the cyclic permutations among $i, j, k$)

$$
\sum_{l=1}^{n} \Bigl( \frac{\partial b_{ij}(y)}{\partial y_l}\, b_{lk}(y) + \frac{\partial b_{jk}(y)}{\partial y_l}\, b_{li}(y) + \frac{\partial b_{ki}(y)}{\partial y_l}\, b_{lj}(y) \Bigr) = 0.
\tag{2.10}
$$

1 Siméon Denis Poisson, born: 21 June 1781 in Pithiviers (France), died: 25 April 1840 in
Sceaux (near Paris).

Proof. The main observation is that condition (2.10) is the Jacobi identity for the
special choice of functions $F = y_i$, $G = y_j$, $H = y_k$ because of

$$ \{y_i, y_j\} = b_{ij}(y). \tag{2.11} $$

If equation (2.4) is developed for the bracket (2.8), one obtains terms containing
second order partial derivatives (these cancel due to the symmetry of the Jacobi
identity) and terms containing first order partial derivatives; for the latter we may
assume $F, G, H$ to be linear combinations of $y_i, y_j, y_k$, so we are back to (2.10).
The details of this proof are left as an exercise (see Exercise 1). □

Definition 2.4. If the matrix $B(y)$ satisfies the properties of Lemma 2.3, formula
(2.8) is said to represent a (general) Poisson bracket. The corresponding differential
system

$$ \dot y = B(y) \nabla H(y), \tag{2.12} $$

is a Poisson system. We continue to call $H$ a Hamiltonian.

The system (2.12) can again be written in the bracket formulation (2.7). The
formula (2.6) for the Lie derivative remains also valid, as is seen immediately from
the chain rule and the definition of the Poisson bracket. Choosing $F = H$, this
shows in particular that the Hamiltonian $H$ is a first integral for general Poisson
systems.

Definition 2.5. A function $C(y)$ is called a Casimir function of the Poisson system
(2.12), if

$$ \nabla C(y)^T B(y) = 0 \qquad \text{for all } y. $$

A Casimir function is a first integral of every Poisson system with structure
matrix $B(y)$, whatever the Hamiltonian $H(y)$ is.

Example 2.6. The Lotka-Volterra equations of Sect. I.1.1 can be written as

$$
\begin{pmatrix} \dot u \\ \dot v \end{pmatrix}
= \begin{pmatrix} 0 & uv \\ -uv & 0 \end{pmatrix} \nabla H(u, v),
\tag{2.13}
$$

where $H(u, v) = u - \ln u + v - 2 \ln v$ is the invariant (I.1.4). This is of the form
(2.12) with a matrix that is skew-symmetric and satisfies the identity (2.10).

Higher dimensional Lotka-Volterra systems can also have a Poisson structure
(see, e.g., Perelomov (1995) and Suris (1999)). For example, the system

$$ \dot y_1 = y_1(y_2 + y_3), \qquad \dot y_2 = y_2(y_1 - y_3 + 1), \qquad \dot y_3 = y_3(y_1 + y_2 + 1) $$

can be written as

$$
\begin{pmatrix} \dot y_1 \\ \dot y_2 \\ \dot y_3 \end{pmatrix}
= \begin{pmatrix} 0 & y_1 y_2 & y_1 y_3 \\ -y_1 y_2 & 0 & -y_2 y_3 \\ -y_1 y_3 & y_2 y_3 & 0 \end{pmatrix} \nabla H(y)
\tag{2.14}
$$

with $H(y) = -y_1 + y_2 + y_3 + \ln y_2 - \ln y_3$. Again one can check by direct
computation that (2.10) is satisfied.

In contrast to the structure matrix $J^{-1}$ of Hamiltonian systems in canonical
form, the matrix $B(y)$ of (2.12) need not be invertible. All odd-dimensional
skew-symmetric matrices are singular, and so is the matrix $B(y)$ of (2.14). In this case,
the vector $v(y) = (-1/y_1, -1/y_2, 1/y_3)^T$ satisfies $v(y)^T B(y) = 0$. Since $v(y) =
\nabla C(y)$ with $C(y) = -\ln y_1 - \ln y_2 + \ln y_3$, the function $C(y)$ is a Casimir function.
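The claims of this example (the form (2.14), the identity (2.10) and the Casimir property) lend themselves to a direct numerical check. The sketch below is one possible way to do this; the derivatives of $b_{ij}$ are approximated by central differences, and all function names are illustrative.

```python
def B(y):
    """Structure matrix of (2.14)."""
    y1, y2, y3 = y
    return [[0.0, y1 * y2, y1 * y3],
            [-y1 * y2, 0.0, -y2 * y3],
            [-y1 * y3, y2 * y3, 0.0]]

def grad_H(y):
    """Gradient of H(y) = -y1 + y2 + y3 + ln y2 - ln y3."""
    return [-1.0, 1.0 + 1.0 / y[1], 1.0 - 1.0 / y[2]]

def grad_C(y):
    """Gradient of C(y) = -ln y1 - ln y2 + ln y3."""
    return [-1.0 / y[0], -1.0 / y[1], 1.0 / y[2]]

def vector_field(y):
    """Right-hand side of the three-dimensional Lotka-Volterra system."""
    y1, y2, y3 = y
    return [y1 * (y2 + y3), y2 * (y1 - y3 + 1.0), y3 * (y1 + y2 + 1.0)]

def dB(y, l, eps=1e-6):
    """Central-difference approximation of the partial derivative of B with respect to y_l."""
    yp, ym = list(y), list(y)
    yp[l] += eps
    ym[l] -= eps
    Bp, Bm = B(yp), B(ym)
    return [[(Bp[i][j] - Bm[i][j]) / (2 * eps) for j in range(3)] for i in range(3)]

def jacobi_defect(y):
    """Largest absolute value of the left-hand side of (2.10) over all i, j, k."""
    Bm = B(y)
    D = [dB(y, l) for l in range(3)]
    worst = 0.0
    for i in range(3):
        for j in range(3):
            for k in range(3):
                s = sum(D[l][i][j] * Bm[l][k] + D[l][j][k] * Bm[l][i]
                        + D[l][k][i] * Bm[l][j] for l in range(3))
                worst = max(worst, abs(s))
    return worst

if __name__ == "__main__":
    y = [0.7, 1.3, 2.1]
    Bm, gH, gC = B(y), grad_H(y), grad_C(y)
    print("B grad H :", [sum(Bm[i][j] * gH[j] for j in range(3)) for i in range(3)])
    print("f(y)     :", vector_field(y))
    print("grad C^T B:", [sum(gC[i] * Bm[i][j] for i in range(3)) for j in range(3)])
    print("Jacobi defect:", jacobi_defect(y))
```

Since the entries of $B(y)$ are quadratic, the central differences are exact up to rounding errors, so the Jacobi defect is expected to be at rounding level.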

VII.2.3 Hamiltonian Systems on Symplectic Submanifolds 

An important motivation for studying Poisson systems is given by Hamiltonian 
problems expressed in non-canonical coordinates. 

Example 2.7 (Constrained Mechanical Systems). Consider the system (1.9)
written as the differential equation

$$
\dot x = J^{-1} \Bigl( \nabla H(x) + \sum_{i=1}^{m} \lambda_i(x) \nabla g_i(x) \Bigr)
\tag{2.15}
$$

on the manifold $\mathcal{M} = \{x \,;\, c(x) = 0\}$ with $c(x) = (g(q), G(q)H_p(p, q))^T$ and
$x = (p, q)^T$ (see (1.14)). As in the proof of Theorem 1.2, $\lambda_i(x)$ and $g_i(x)$ are the
components of $\lambda(x)$ and $g(x)$, and $\lambda(x)$ is the function obtained from (1.12). We
use $y \in \mathbb{R}^{2(d-m)}$ as local coordinates of the manifold $\mathcal{M}$ via the transformation

$$ x = \chi(y). $$

In these coordinates, the differential equation (2.15) becomes, with $X(y) = \chi'(y)$,

$$
X(y)\dot y = J^{-1} \Bigl( \nabla H(\chi(y)) + \sum_{i=1}^{m} \lambda_i(\chi(y)) \nabla g_i(\chi(y)) \Bigr).
$$

We multiply this equation from the left with $X(y)^T J$ and note that the columns of
$X(y)$, which are tangent vectors, are orthogonal to the gradients $\nabla g_i$ of the
constraints. This yields

$$ X(y)^T J X(y)\, \dot y = X(y)^T \nabla H(\chi(y)). $$

By assumption (1.13) the matrix $X(y)^T J X(y)$ is invertible. This is seen as follows:
$X(y)^T J X(y) v = 0$ implies $J X(y) v = c'(x)^T w$ for some $w$ ($x = \chi(y)$). By
$c(\chi(y)) = 0$ and $c'(x) X(y) = 0$ we get $c'(x) J^{-1} c'(x)^T w = 0$. It then follows
from the structure of $c'(x)$ and from (1.13) that $w = 0$ and hence also $v = 0$.

With $B(y) = (X(y)^T J X(y))^{-1}$ and $K(y) = H(\chi(y))$, the above equation
for $\dot y$ thus becomes the Poisson system $\dot y = B(y) \nabla K(y)$. The matrix $B(y)$ is
skew-symmetric and satisfies (2.10), see Theorem 2.8 below or Exercise 11.

More generally, consider a symplectic submanifold $\mathcal{M}$ of $\mathbb{R}^{2d}$, that is, a manifold
for which the symplectic two-form²

$$ \omega_x(\xi_1, \xi_2) = (J\xi_1, \xi_2) \qquad \text{for } \xi_1, \xi_2 \in T_x\mathcal{M} \tag{2.16} $$

(with $(\cdot, \cdot)$ denoting the Euclidean inner product on $\mathbb{R}^{2d}$) is non-degenerate for every
$x \in \mathcal{M}$: for $\xi_1$ in the tangent space $T_x\mathcal{M}$,

$$ \omega_x(\xi_1, \xi_2) = 0 \ \text{ for all } \xi_2 \in T_x\mathcal{M} \quad \text{implies} \quad \xi_1 = 0. $$

In local coordinates $x = \chi(y)$, this condition is equivalent to the invertibility of
the matrix $X(y)^T J X(y)$ with $X(y) = \chi'(y)$, since every tangent vector at $x =
\chi(y)$ is of the form $\xi = X(y)\eta$ and $X(y)$ has linearly independent columns. A
manifold defined by constraints, $\mathcal{M} = \{x \in \mathbb{R}^{2d} \mid c(x) = 0\}$, is symplectic if the
matrix $c'(x) J^{-1} c'(x)^T$ is invertible for every $x \in \mathcal{M}$ (see the argument at the end
of the previous example). This condition can be restated as saying that the matrix
$(\{c_i, c_j\}(x))$ of canonical Poisson brackets of the constraint functions is invertible.

We consider the reduction of the Hamiltonian system to the symplectic
submanifold $\mathcal{M}$, which determines solution curves $t \mapsto x(t) \in \mathcal{M}$ by the equations

$$ (J\dot x - \nabla H(x), \xi) = 0 \qquad \text{for all } \xi \in T_x\mathcal{M}. \tag{2.17} $$

With the interpretation $(\nabla H(x), \xi) = H'(x)\xi = \frac{d}{d\tau}\big|_{\tau=0} H(\gamma(\tau))$ as a directional
derivative along a path $\gamma(\tau) \in \mathcal{M}$ with $\gamma(0) = x$ and $\dot\gamma(0) = \xi$, it is sufficient
that the Hamiltonian $H$ is defined and differentiable on the manifold $\mathcal{M}$. Equation
(2.17) can also be expressed as

$$ \omega_x(\dot x, \xi) = H'(x)\xi \qquad \text{for all } \xi \in T_x\mathcal{M}, \tag{2.18} $$

a formulation that is susceptible to further generalization; cf. Marsden & Ratiu
(1999), Chap. 5.4, and Exercise 2. Choosing $\xi = \dot x$ we obtain $0 = H'(x)\dot x =
\frac{d}{dt} H(x(t))$, and hence the Hamiltonian is conserved along solutions.

Note that for $\mathcal{M}$ of Example 2.7, the formulation (2.17) is equivalent to the
equations of motion (2.15) of the constrained mechanical system. It corresponds to
d’Alembert's principle of virtual variations in constrained mechanics; see Arnold
(1989), p. 92. In quantum mechanics the Hamiltonian reduction (2.17) to a
manifold (in that case, a submanifold of the Hilbert space $L^2(\mathbb{R}^N, \mathbb{R}^2)$ instead of $\mathbb{R}^{2d}$)
is known as the Dirac-Frenkel time-dependent variational principle and is the
basic tool for deriving reduced models of the many-body Schrödinger equation; see
Sect. VII.6 for an example. From a numerical analysis viewpoint, (2.17) can also be
viewed as a Galerkin method on the solution-dependent tangent space $T_x\mathcal{M}$.

2 Notice that this two-form is the negative of that introduced in Sect. VI.2. This slight
inconsistency makes the subsequent formulas nicer.

In terms of the symplectic projection $P(x) : \mathbb{R}^{2d} \to T_x\mathcal{M}$ for $x \in \mathcal{M}$, defined
by determining $P(x)v \in T_x\mathcal{M}$ for $v \in \mathbb{R}^{2d}$ from the condition

$$ (J P(x) v, \xi) = (J v, \xi) \qquad \text{for all } \xi \in T_x\mathcal{M}, \tag{2.19} $$

formula (2.17) can be reformulated as the differential equation on $\mathcal{M}$,

$$ \dot x = P(x) J^{-1} \nabla H(x). \tag{2.20} $$

In coordinates $x = \chi(y)$, and again with $X(y) = \chi'(y)$, formula (2.17) becomes

$$ X(y)^T \bigl( J X(y)\dot y - \nabla H(\chi(y)) \bigr) = 0, $$

and with

$$ B(y) = \bigl( X(y)^T J X(y) \bigr)^{-1} \qquad \text{and} \qquad K(y) = H(\chi(y)), \tag{2.21} $$

we obtain the differential equation

$$ \dot y = B(y) \nabla K(y). \tag{2.22} $$
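To make (2.21) concrete, the following sketch computes $B(y)$ for the planar pendulum, viewed as a constrained system with $H(p, q) = \tfrac12 |p|^2 + q_2$ and $g(q) = \tfrac12(|q|^2 - 1)$. The parametrization $\chi(\phi, \omega)$ and the function names are hypothetical choices for this illustration, $J$ is assumed to be the matrix with blocks $(0, I; -I, 0)$ as in (VI.2.3), and the Jacobian $X(y)$ is approximated by central differences.

```python
import math

def chi(y):
    """Hypothetical parametrization x = chi(y), x = (p1, p2, q1, q2), of the pendulum manifold."""
    phi, omega = y
    return [omega * math.cos(phi), omega * math.sin(phi), math.sin(phi), -math.cos(phi)]

def jacobian_columns(y, eps=1e-6):
    """Columns of X(y) = chi'(y), approximated by central differences."""
    cols = []
    for k in range(2):
        yp, ym = list(y), list(y)
        yp[k] += eps
        ym[k] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(chi(yp), chi(ym))])
    return cols

def apply_J(v):
    """J v for J = [[0, I], [-I, 0]] acting on v = (v_p, v_q) with d = 2."""
    return [v[2], v[3], -v[0], -v[1]]

def structure_matrix(y):
    """B(y) = (X^T J X)^{-1} of (2.21) for a two-dimensional manifold."""
    X = jacobian_columns(y)
    M = [[sum(a * b for a, b in zip(X[i], apply_J(X[j]))) for j in range(2)]
         for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

if __name__ == "__main__":
    print(structure_matrix([0.8, -1.3]))
```

With these choices one finds $B(y)$ close to the constant matrix with rows $(0, 1)$ and $(-1, 0)$, and with $K(\phi, \omega) = \tfrac12 \omega^2 - \cos\phi$ the system (2.22) reduces to the familiar pendulum equations $\dot\phi = \omega$, $\dot\omega = -\sin\phi$.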

Theorem 2.8. For a Hamiltonian system (2.17) on a symplectic submanifold $\mathcal{M}$,
the equivalent differential equation in local coordinates, (2.22) with (2.21), is a
Poisson system.

Proof. In coordinates, the symplectic projection is given by

$$ P(x) = X(y) B(y) X(y)^T J \qquad \text{for } x = \chi(y) \in \mathcal{M}, $$

since for every tangent vector $\xi = X(y)\eta$ we have by (2.21),

$$ (J X B X^T J v, X\eta) = (X^T J X B X^T J v, \eta) = (X^T J v, \eta) = (J v, X\eta). $$

From the decomposition $\mathbb{R}^{2d} = P(x)\mathbb{R}^{2d} \oplus (I - P(x))\mathbb{R}^{2d}$ we obtain, by the implicit
function theorem, a corresponding splitting in a neighbourhood of the manifold $\mathcal{M}$
in $\mathbb{R}^{2d}$,

$$ v = x + w \qquad \text{with } x \in \mathcal{M},\ P(x)w = 0. $$

This permits us to extend smooth functions $F(y)$ to a neighbourhood of $\mathcal{M}$ by
setting

$$ \hat F(v) = F(y) \qquad \text{for } v = x + w \text{ with } x = \chi(y),\ P(x)w = 0. $$

We then have for the derivative $\hat F'(x) = \hat F'(x)P(x)$ for $x \in \mathcal{M}$ and hence for its
transpose, the gradient, $\nabla \hat F(x) = P(x)^T \nabla \hat F(x)$. Moreover, by the chain rule we
have $\nabla F(y) = X(y)^T \nabla \hat F(x)$ for $x = \chi(y)$. For the canonical bracket this gives,
at $x = \chi(y)$,

$$
\begin{aligned}
\{\hat F, \hat G\}_{\mathrm{can}}(x) &= \nabla \hat F(x)^T P(x) J^{-1} P(x)^T \nabla \hat G(x) \\
&= \nabla F(y)^T B(y) \nabla G(y) = \{F, G\}(y),
\end{aligned}
$$

and hence the required properties of the bracket defined by $B(y)$ follow from the
corresponding properties of the canonical bracket. □

VII.3 The Darboux-Lie Theorem

Theorem 2.8 also shows that a Hamiltonian system without constraints becomes a
Poisson system in non-canonical coordinates. Interestingly, a converse also holds:
every Poisson system can locally be written in canonical Hamiltonian form after
a suitable change of coordinates. This result is a special case of the Darboux-Lie
Theorem. Its proof was the result of several important papers: Jacobi’s theory of
simultaneous linear partial differential equations (Jacobi 1862), the works by
Clebsch (1866) and Darboux (1882) on Pfaffian systems, and, finally, the paper of Lie
(1888). We shall now retrace this development. Our first tool is a result on the
commutativity of Poisson flows.

VII.3.1 Commutativity of Poisson Flows and Lie Brackets 

The elegant formula (2.6) for the Lie derivative is valid for general Poisson systems
with the vector field $f(y) = B(y)\nabla H(y)$ of (2.12). Acting on a function
$F : \mathbb{R}^n \to \mathbb{R}$, the Lie operator (III.5.2) becomes

$$ DF = \nabla F^T f = \nabla F^T B(y) \nabla H = \{F, H\} \tag{3.1} $$

and is again the Poisson bracket. This observation is the key for the following
lemma, which shows an interesting connection between the Lie bracket and the
Poisson bracket.

Lemma 3.1. Let two smooth Hamiltonians $H^{[1]}(y)$ and $H^{[2]}(y)$ be given.

$$
\begin{aligned}
&\text{If } D_1 \text{ is the Lie operator of } B(y)\nabla H^{[1]} \\
&\text{and } D_2 \text{ is the Lie operator of } B(y)\nabla H^{[2]}, \\
&\text{then } [D_1, D_2] \text{ is the Lie operator of } B(y)\nabla \{H^{[2]}, H^{[1]}\}
\end{aligned}
\tag{3.2}
$$

(notice, once again, that the indices 1 and 2 have been reversed).

Proof. After some clever permutations, the Jacobi identity (2.4) can be written as

$$ \{\{F, H^{[2]}\}, H^{[1]}\} - \{\{F, H^{[1]}\}, H^{[2]}\} = \{F, \{H^{[2]}, H^{[1]}\}\}. \tag{3.3} $$

By (3.1) this is nothing other than $D_1 D_2 F - D_2 D_1 F = [D_1, D_2]F$. □

Lemma 3.2. Consider two smooth Hamiltonians $H^{[1]}(y)$ and $H^{[2]}(y)$ on an open
connected set $U$, with $D_1$ and $D_2$ the corresponding Lie operators and $\varphi_s^{[1]}(y)$ and
$\varphi_t^{[2]}(y)$ the corresponding flows. Then, if the matrix $B(y)$ is invertible, the following
are equivalent in $U$:

(i) $\{H^{[1]}, H^{[2]}\} = \mathrm{Const}$;

(ii) $[D_1, D_2] = 0$;

(iii) $\varphi_t^{[2]} \circ \varphi_s^{[1]} = \varphi_s^{[1]} \circ \varphi_t^{[2]}$.

The conclusions “(i) $\Rightarrow$ (ii) $\Leftrightarrow$ (iii)” also hold for a non-invertible $B(y)$.





Proof. This is obtained by combining Lemma III.5.4 and Lemma 3.1. We need
the invertibility of $B(y)$ to conclude that $\{H^{[1]}, H^{[2]}\} = \mathrm{Const}$ follows from
$B(y)\nabla\{H^{[1]}, H^{[2]}\} = 0$. □
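The implication (i) $\Rightarrow$ (iii) can be illustrated in the canonical case $d = 1$, where all flows below are available in closed form. This is a sketch with illustrative Hamiltonians and function names that are not taken from the text: $H = p$ and $H = q$ have the constant bracket $-1$ under the convention (2.3), whereas $H = p^2/2$ and $H = q^2/2$ have the non-constant bracket $-pq$.

```python
def flow_p(t, y):
    """Exact flow of H = p: p' = -H_q = 0, q' = H_p = 1."""
    p, q = y
    return (p, q + t)

def flow_q(t, y):
    """Exact flow of H = q: p' = -1, q' = 0."""
    p, q = y
    return (p - t, q)

def flow_kin(t, y):
    """Exact flow of H = p^2 / 2: q moves with constant velocity p."""
    p, q = y
    return (p, q + t * p)

def flow_pot(t, y):
    """Exact flow of H = q^2 / 2: p' = -q, q' = 0."""
    p, q = y
    return (p - t * q, q)

if __name__ == "__main__":
    y0, s, t = (1.0, 2.0), 0.5, 0.3
    # constant bracket: the two compositions agree
    print(flow_q(s, flow_p(t, y0)), flow_p(t, flow_q(s, y0)))
    # non-constant bracket: the two compositions differ
    print(flow_pot(s, flow_kin(t, y0)), flow_kin(t, flow_pot(s, y0)))
```

The first pair of compositions coincides, the second does not, as Lemma 3.2 predicts for an invertible structure matrix.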


VII.3.2 Simultaneous Linear Partial Differential Equations 


If two functions $F(y)$ and $G(y)$ are given, formula (2.8) determines a function
$h(y) = \{F, G\}(y)$ by differentiation. We now ask the inverse question: Given
functions $G(y)$ and $h(y)$, can we find a function $F(y)$ such that $\{F, G\}(y) = h(y)$?
This problem represents a first order linear partial differential equation for $F$. So we
are led to the following problem, which we first discuss in two dimensions.

One Equation. Given functions $a(y_1, y_2)$, $b(y_1, y_2)$, $h(y_1, y_2)$, find all solutions
$F(y_1, y_2)$ satisfying

$$
a(y_1, y_2)\, \frac{\partial F}{\partial y_1} + b(y_1, y_2)\, \frac{\partial F}{\partial y_2} = h(y_1, y_2).
\tag{3.4}
$$

This equation is, for any point $(y_1, y_2)$, a linear relation between the partial
derivatives of $F$, but does not determine them individually. There is one direction,
however, where the derivative is uniquely determined, namely that of the vector
$n = (a(y_1, y_2), b(y_1, y_2))$, since the left-hand side of equation (3.4) is the
directional derivative $\partial F / \partial n$. The lines, which everywhere respect this direction, are called
characteristic lines (see left picture of Fig. 3.1). If we parametrize them with a
parameter $t$, we can compute $y_1(t)$, $y_2(t)$ as well as $F(t) = F(y_1(t), y_2(t))$ as solutions
of the following ordinary differential equations

$$ \dot y_1 = a(y_1, y_2), \qquad \dot y_2 = b(y_1, y_2), \qquad \dot F = h(y_1, y_2). \tag{3.5} $$

The initial values $(y_1(0), y_2(0))$ can be chosen on an arbitrary curve $\gamma$ (which must
be transversal to the characteristic lines) and the values $F|_\gamma$ can be arbitrarily
prescribed. The solution $F(y_1, y_2)$ of (3.4) is then created by the curves (3.5) wherever
the characteristic lines go (right picture of Fig. 3.1).
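The construction via (3.5) can be tried out on a simple instance. The sketch below assumes the illustrative choice $a = 1$, $b = y_1$, $h = y_2$, the initial curve $\gamma = \{(0, s)\}$ and prescribed values $F|_\gamma = f_0(s)$; it integrates (3.5) with the classical Runge-Kutta method and compares with the closed-form solution $F(y_1, y_2) = f_0(s) + s\,y_1 + y_1^3/6$, $s = y_2 - y_1^2/2$, which one verifies by inserting it into (3.4). The function names are hypothetical.

```python
import math

def rhs(state):
    """Right-hand side of (3.5) for a = 1, b = y1, h = y2."""
    y1, y2, F = state
    return (1.0, y1, y2)

def integrate_characteristic(s, t_end, f0, n=200):
    """Follow the characteristic line starting at (0, s) with F = f0(s), using classical RK4."""
    state = (0.0, s, f0(s))
    dt = t_end / n
    for _ in range(n):
        k1 = rhs(state)
        k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)))
        k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)))
        k4 = rhs(tuple(x + dt * k for x, k in zip(state, k3)))
        state = tuple(x + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def exact_F(y1, y2, f0):
    """Closed-form solution of (3.4) for this particular choice of a, b, h."""
    s = y2 - 0.5 * y1 ** 2
    return f0(s) + s * y1 + y1 ** 3 / 6.0

if __name__ == "__main__":
    y1, y2, F = integrate_characteristic(0.4, 1.5, math.sin)
    print(F, exact_F(y1, y2, math.sin))
```

Varying the starting point $s$ along $\gamma$ fills a region of the plane with values of $F$, which is exactly how the solution surface in the right picture of Fig. 3.1 is generated.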



Fig. 3.1. Characteristic lines and solution of a first order linear partial differential equation

Fig. 3.2. Characteristic surfaces of two first order linear partial differential equations


For one equation in $n$ dimensions, the initial values $(y_1(0), \dots, y_n(0))$ can be
freely chosen on a manifold of dimension $n - 1$ (e.g., the subspace orthogonal to the
characteristic line passing through a given point), and $F$ can be arbitrarily prescribed
on this manifold. This guarantees the existence of $n - 1$ independent solutions in
the neighbourhood of a given point. Here, independent means that the gradients of
these functions are linearly independent.

Two Simultaneous Equations. Two simultaneous equations of dimension two are
trivial. We therefore suppose $y = (y_1, y_2, y_3)$ and two equations of the form

$$
\begin{aligned}
a_1^{[1]}(y)\, \frac{\partial F}{\partial y_1} + a_2^{[1]}(y)\, \frac{\partial F}{\partial y_2} + a_3^{[1]}(y)\, \frac{\partial F}{\partial y_3} &= h_1(y), \\
a_1^{[2]}(y)\, \frac{\partial F}{\partial y_1} + a_2^{[2]}(y)\, \frac{\partial F}{\partial y_2} + a_3^{[2]}(y)\, \frac{\partial F}{\partial y_3} &= h_2(y)
\end{aligned}
\tag{3.6}
$$

for an unknown function $F(y_1, y_2, y_3)$. This system can also be written as $D_1 F =
h_1$, $D_2 F = h_2$, where $D_i$ denotes the Lie operator corresponding to the vector field
$a^{[i]}(y)$. Here, we have two directional derivatives prescribed, namely $\partial F / \partial n_1$ and
$\partial F / \partial n_2$, where $n_i = a^{[i]}(y)$ (see Fig. 3.2). Therefore, we will have to follow both directions
and, instead of (3.5), we will have two sets of ordinary differential equations

$$
\begin{aligned}
\dot y_1 &= a_1^{[1]}(y), & \dot y_2 &= a_2^{[1]}(y), & \dot y_3 &= a_3^{[1]}(y), & \dot F &= h_1(y), \\
\dot y_1 &= a_1^{[2]}(y), & \dot y_2 &= a_2^{[2]}(y), & \dot y_3 &= a_3^{[2]}(y), & \dot F &= h_2(y).
\end{aligned}
\tag{3.7}
$$

If we prescribe $F$ on a curve that is orthogonal to $n_1$ and $n_2$, and if we follow the
solutions of (3.7), we obtain the function $F$ on two 2-dimensional surfaces $S_1$ and
$S_2$ containing the prescribed curve. Continuing from $S_1$ along the second flow and
from $S_2$ along the first flow, we may be led to the same point, but nothing guarantees
that the obtained values for $F$ are identical. To get a well-defined $F$, additional
assumptions on the differential operators and on the inhomogeneities have to be
made.

The following theorem, which is due to Jacobi (1862), has been extended
by Clebsch (1866), who created the theory of complete systems (“vollständige
Systeme”). These papers contained long analytic calculations with myriads of
formulas. The wonderful geometric insight is mainly due to Sophus Lie.

Theorem 3.3. Let $D_1, \dots, D_m$ be $m$ ($m < n$) linear differential operators in $\mathbb{R}^n$
corresponding to vector fields $a^{[1]}(y), \dots, a^{[m]}(y)$ and suppose that these vectors
are linearly independent for $y = y_0$. If

$$ [D_i, D_j] = 0 \qquad \text{for all } i, j, \tag{3.8} $$

then the homogeneous system

$$ D_i F = 0 \qquad \text{for } i = 1, \dots, m $$

possesses (in a neighbourhood of $y_0$) $n - m$ solutions for which the gradients
$\nabla F(y_0)$ are linearly independent.

Furthermore, the inhomogeneous system of partial differential equations

$$ D_i F = h_i \qquad \text{for } i = 1, \dots, m $$

possesses a particular solution in a neighbourhood of $y_0$, if and only if in addition
to (3.8) the functions $h_1(y), \dots, h_m(y)$ satisfy the integrability conditions

$$ D_i h_j = D_j h_i \qquad \text{for all } i, j. \tag{3.9} $$

Proof. (a) Let $V$ denote the space of vectors in $\mathbb{R}^n$ that are orthogonal to
$a^{[1]}(y_0),\ldots,a^{[m]}(y_0)$, and consider the $(n - m)$-dimensional manifold
$M = y_0 + V$. We then extend an arbitrary smooth function $F : M \to \mathbb{R}$ to a
neighbourhood of $y_0$ by

\[ F\bigl(\varphi^{[m]}_{t_m} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr) = F(y_0 + v). \tag{3.10} \]

Notice that $(t_1,\ldots,t_m,v) \mapsto y = \varphi^{[m]}_{t_m} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)$
defines a local diffeomorphism between neighbourhoods of $0$ and $y_0$. Since the
application of the operator $D_m$ to (3.10) corresponds to a differentiation with respect
to $t_m$ and the expression $F\bigl(\varphi^{[m]}_{t_m} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr)$
is independent of $t_m$ by (3.10), we get $D_m F(y) = 0$. To prove $D_i F(y) = 0$ for
$i < m$, we first have to change the order of the flows $\varphi^{[j]}_{t_j}$ in (3.10), which
is permitted by Lemma III.5.4 and assumption (3.8), so that $\varphi^{[i]}_{t_i}$ is in the
left-most position.

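(b) The necessity of (3.9) follows immediately from $D_i h_j = D_i D_j F = D_j D_i F = D_j h_i$.
For given $h_i$ satisfying (3.9) we define $F(y)$ in a neighbourhood of $y_0$ (i.e., for
small $t_1,\ldots,t_m$ and small $v$) by

\[ F\bigl(\varphi^{[m]}_{t_m} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr)
   = \sum_{j=1}^{m} \int_0^{t_j} h_j\bigl(\varphi^{[j]}_{t} \circ \varphi^{[j-1]}_{t_{j-1}} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr)\, dt, \]

and we prove that it is a solution of the system $D_i F = h_i$ for $i = 1,\ldots,m$. Since
only the last integral depends on $t_m$, we immediately get by differentiation with
respect to $t_m$ that $D_m F = h_m$. For the computation of $D_i F$ we differentiate with
respect to $t_i$. The first $i - 1$ integrals are independent of $t_i$. The derivative of the
$i$th integral gives $h_i\bigl(\varphi^{[i]}_{t_i} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr)$, and the
derivative of the remaining integrals gives

\[ \int_0^{t_j} D_i h_j\bigl(\varphi^{[j]}_{t} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr)\, dt
   = \int_0^{t_j} D_j h_i\bigl(\varphi^{[j]}_{t} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr)\, dt \]
\[ \qquad = h_i\bigl(\varphi^{[j]}_{t_j} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr)
   - h_i\bigl(\varphi^{[j-1]}_{t_{j-1}} \circ \cdots \circ \varphi^{[1]}_{t_1}(y_0 + v)\bigr) \]

for $j = i + 1,\ldots,m$. Summing up, the terms telescope and this proves $D_i F = h_i$. □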


VII.3.3 Coordinate Changes and the Darboux-Lie Theorem 


Fig. 3.3. New coordinates in a Poisson system

The emphasis here is to simplify a given Poisson structure as much as possible by a
coordinate transformation. We change from coordinates $y_1,\ldots,y_n$ to
$\tilde y_1(y),\ldots,\tilde y_n(y)$ with continuously differentiable functions and an
invertible Jacobian $A(y) = \partial\tilde y/\partial y$, and we denote
$\tilde F(\tilde y) := F(y)$ and $\tilde G(\tilde y) := G(y)$ (see Fig. 3.3). The Poisson
structure as well as the Poisson flow on one space will become another Poisson structure
and flow on the other space by simply applying the chain rule:


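\[ \sum_{i,j} \frac{\partial F(y)}{\partial y_i}\, b_{ij}(y)\, \frac{\partial G(y)}{\partial y_j}
   = \sum_{i,j,k,l} \frac{\partial \tilde F(\tilde y)}{\partial \tilde y_k}\,
     \frac{\partial \tilde y_k}{\partial y_i}\, b_{ij}\bigl(y(\tilde y)\bigr)\,
     \frac{\partial \tilde y_l}{\partial y_j}\,
     \frac{\partial \tilde G(\tilde y)}{\partial \tilde y_l}. \tag{3.11} \]

This is another Poisson structure with

\[ \tilde b_{kl} = \{\tilde y_k, \tilde y_l\} \quad\text{or}\quad \tilde B(\tilde y) = A(y)B(y)A(y)^T. \tag{3.12} \]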

3 Jean Gaston Darboux, born: 14 August 1842 in Nîmes (France), died: 23 February 1917
in Paris.

The same structure matrix is obtained if the Poisson system (2.12) is written in these
new coordinates (Exercise 5).

Since $A$ is invertible, the structure matrices $B$ and $\tilde B$ have the same rank. We
now want to obtain the simplest possible form for $\tilde B$.

Theorem 3.4 (Darboux 1882, Lie 1888). Suppose that the matrix $B(y)$ defines
a Poisson bracket and is of constant rank $n - q = 2m$ in a neighbourhood of
$y_0 \in \mathbb{R}^n$. Then, there exist functions $P_1(y),\ldots,P_m(y)$,
$Q_1(y),\ldots,Q_m(y)$, and $C_1(y),\ldots,C_q(y)$ satisfying

\[
\begin{array}{lll}
\{P_i,P_j\} = 0 & \{P_i,Q_j\} = -\delta_{ij} & \{P_i,C_l\} = 0 \\
\{Q_i,P_j\} = \delta_{ij} & \{Q_i,Q_j\} = 0 & \{Q_i,C_l\} = 0 \\
\{C_k,P_j\} = 0 & \{C_k,Q_j\} = 0 & \{C_k,C_l\} = 0
\end{array}
\tag{3.13}
\]

on a neighbourhood of $y_0$. The gradients of $P_i, Q_i, C_k$ are linearly independent,
so that $y \mapsto \bigl(P_i(y), Q_i(y), C_k(y)\bigr)$ constitutes a local change of
coordinates to canonical form.

The functions $C_1(y),\ldots,C_q(y)$ are called distinguished functions (ausgezeichnete
Funktionen) by Lie.

Proof. We follow Lie’s original proof. Similar ideas, and the same notation, are
also present in Darboux’s paper. The proof proceeds in several steps, satisfying the
conditions of (3.13), from one line to the next, by solving systems of linear partial
differential equations.

(a) If all $b_{ij}(y_0) = 0$, the constant rank assumption implies $b_{ij}(y) = 0$ in a
neighbourhood of $y_0$. We thus have $m = 0$ and all coordinates $C_i(y) = y_i$ are
Casimirs.

(b) If there exist $i, j$ with $b_{ij}(y_0) \ne 0$, we set $Q_1(y) = y_i$ and we determine
$P_1(y)$ as the solution of the linear partial differential equation

\[ \{Q_1, P_1\} = 1. \tag{3.14} \]

Because of $b_{ij}(y_0) \ne 0$ the assumption of Theorem 3.3 is satisfied and this yields
the existence of $P_1$. We next consider the homogeneous system

\[ \{Q_1, F\} = 0 \quad\text{and}\quad \{P_1, F\} = 0 \tag{3.15} \]

of partial differential equations. By Lemma 3.2 and (3.14) the Lie operators
corresponding to $Q_1$ and $P_1$ commute, so that by Theorem 3.3 the system (3.15) has
$n - 2$ independent solutions $F_3,\ldots,F_n$. Their gradients together with those of
$Q_1$ and $P_1$ form a basis of $\mathbb{R}^n$. We therefore can change coordinates from
$y_1,\ldots,y_n$ to $Q_1, P_1, F_3,\ldots,F_n$ (mapping $y_0$ to $\tilde y_0$). In these
coordinates the first two rows and the first two columns of the structure matrix
$\tilde B(\tilde y)$ have the required form.

(c) If $\tilde b_{ij}(\tilde y_0) = 0$ for all $i, j \ge 3$, we have $m = 1$ (similar to
step (a)) and the coordinates $F_3,\ldots,F_n$ are Casimirs.

(d) If there exist $i \ge 3$ and $j \ge 3$ with $\tilde b_{ij}(\tilde y_0) \ne 0$, we set
$Q_2 = F_i$ and we determine $P_2$ from the inhomogeneous system

\[ \{Q_1, P_2\} = 0, \quad \{P_1, P_2\} = 0, \quad \{Q_2, P_2\} = 1. \]

The inhomogeneities satisfy (3.9), and the Lie operators corresponding to $Q_1$, $P_1$,
$Q_2$ commute (by Lemma 3.2). Theorem 3.3 proves the existence of such a $P_2$. We
then consider the homogeneous system

\[ \{Q_1, F\} = 0, \quad \{P_1, F\} = 0, \quad \{Q_2, F\} = 0, \quad \{P_2, F\} = 0 \]

and apply once more Theorem 3.3. We get $n - 4$ independent solutions, which
we denote again $F_5,\ldots,F_n$. As in part (b) of the proof we get new coordinates
$Q_1, P_1, Q_2, P_2, F_5,\ldots,F_n$, for which the first four rows and columns of the
structure matrix are canonical.

(e) The proof now continues by repeating steps (c) and (d) until the structure
matrix has the desired form. □

Corollary 3.5 (Casimir Functions). In the situation of Theorem 3.4 the functions
$C_1(y),\ldots,C_q(y)$ satisfy

\[ \{C_i, H\} = 0 \quad\text{for all smooth } H. \tag{3.16} \]

Proof. Theorem 3.4 states that $\nabla C_i(y)^T B(y) \nabla H(y) = 0$, when $H(y)$ is one of
the functions $P_j(y)$, $Q_j(y)$ or $C_j(y)$. However, the gradients of these functions form
a basis of $\mathbb{R}^n$. Consequently, $\nabla C_i(y)^T B(y) = 0$ and (3.16) is satisfied for
all differentiable functions $H(y)$. □

This property implies that all Casimir functions are first integrals of (2.12) whatever
$H(y)$ is. Consequently, (2.12) is (close to $y_0$) a differential equation on the
manifold

\[ M = \{ y \in U \mid C_i(y) = \mathrm{Const}_i,\ i = 1,\ldots,q \}. \tag{3.17} \]

Corollary 3.6 (Transformation to Canonical Form). Denote the transformation
of Theorem 3.4 by $z = \vartheta(y) = \bigl(P_i(y), Q_i(y), C_k(y)\bigr)$. With this change of
coordinates, the Poisson system $\dot y = B(y)\nabla H(y)$ becomes

\[ \dot z = B_0 \nabla K(z) \quad\text{with}\quad
   B_0 = \begin{pmatrix} J^{-1} & 0 \\ 0 & 0 \end{pmatrix}, \tag{3.18} \]

where $K(z) = H(y)$. Writing $z = (p, q, c)$, this system becomes

\[ \dot p = -K_q(p,q,c), \quad \dot q = K_p(p,q,c), \quad \dot c = 0. \]

Proof. The transformed differential equation is

\[ \dot z = \vartheta'(y) B(y) \vartheta'(y)^T \nabla K(z) \quad\text{with}\quad y = \vartheta^{-1}(z), \]

and Theorem 3.4 states that $\vartheta'(y) B(y) \vartheta'(y)^T = B_0$. □


VII.4 Poisson Integrators

Before discussing geometric numerical integrators, we show that many important 
properties of Hamiltonian systems in canonical form remain valid for systems 

\[ \dot y = B(y)\nabla H(y), \tag{4.1} \]

where B(y) represents a Poisson bracket. 


VII.4.1 Poisson Maps and Symplectic Maps 

We have already seen that the Hamiltonian H(y) is a first integral of (4.1). We shall 
show here that the flow of (4.1) satisfies a property closely related to symplecticity. 

Definition 4.1. A transformation $\varphi : U \to \mathbb{R}^n$ (where $U$ is an open set in
$\mathbb{R}^n$) is called a Poisson map with respect to the bracket (2.8), if its Jacobian
matrix satisfies

\[ \varphi'(y) B(y) \varphi'(y)^T = B(\varphi(y)). \tag{4.2} \]

An equivalent condition is that for all smooth real-valued functions $F, G$ defined
on $\varphi(U)$,

\[ \{F \circ \varphi, G \circ \varphi\}(y) = \{F, G\}(\varphi(y)), \tag{4.3} \]

as is seen by the chain rule and choosing F, G as the coordinate functions. It is 
clear from this condition that the composition of Poisson maps is again a Poisson 
map. A comparison with (3.12) shows that Poisson maps leave the structure matrix 
invariant. 

For the canonical symplectic structure, where $B(y) = J^{-1}$, condition (4.2) is
equivalent to the symplecticity of the transformation $\varphi(y)$. This can be seen by
taking the inverse of both sides of (4.2), and by multiplying the resulting equation with
$\varphi'(y)$ from the right and with $\varphi'(y)^T$ from the left. Also in the situation of
a Hamiltonian system (2.17) on a symplectic submanifold $M$, where $B(y)$ is the
structure matrix of the differential equation in coordinates $y$ as in Theorem 2.8,
condition (4.2) is equivalent to symplecticity in the sense of preserving the symplectic
two-form (2.16) on the tangent space, as in (1.16):

Definition 4.2. A map $\psi : M \to M$ on a symplectic manifold $M$ is called
symplectic if for every $x \in M$,

\[ \bigl(\psi'(x)\xi_1\bigr)^T J\, \psi'(x)\xi_2 = \xi_1^T J\, \xi_2
   \quad\text{for all } \xi_1, \xi_2 \in T_x M. \tag{4.4} \]

A near-identity map $\psi : M \to M$ is symplectic if and only if the conjugate map
$\varphi$ in local coordinates $x = \chi(y)$, with $\varphi(y)$ given by
$\psi(x) = \chi(\varphi(y))$ for $x = \chi(y)$, is a Poisson map for the structure matrix of
(2.21), $B(y) = \bigl(X(y)^T J X(y)\bigr)^{-1}$ with $X(y) = \chi'(y)$. This holds because
$\psi'(x)\xi = X(\varphi(y))\varphi'(y)\eta$ for $x = \chi(y)$ and $\xi = X(y)\eta$, and
because (4.2) is equivalent to
$\varphi'(y)^T X(\varphi(y))^T J X(\varphi(y)) \varphi'(y) = X(y)^T J X(y)$.

Theorem 4.3. If $B(y)$ is the structure matrix of a Poisson bracket, then the flow
$\varphi_t(y)$ of the differential equation (4.1) is a Poisson map.

Proof. (a) For $B(y) = J^{-1}$ this is exactly the statement of Theorem VI.2.4 on the
symplecticity of the flow of Hamiltonian systems. This result can be extended in a
straightforward way to the matrix $B_0$ of (3.18).

(b) For the general case consider the change of coordinates $z = \vartheta(y)$ which
transforms (4.1) to canonical form (Theorem 3.4), i.e.,
$\vartheta'(y) B(y) \vartheta'(y)^T = B_0$ and $\dot z = B_0 \nabla K(z)$ with $K(z) = H(y)$
(Corollary 3.6). Denoting the flows of (4.1) and $\dot z = B_0 \nabla K(z)$ by
$\varphi_t(y)$ and $\psi_t(z)$, respectively, we have
$\psi_t(\vartheta(y)) = \vartheta(\varphi_t(y))$ and by the chain rule
$\psi_t'(\vartheta(y))\vartheta'(y) = \vartheta'(\varphi_t(y))\varphi_t'(y)$. Inserting this
relation into $\psi_t'(z) B_0 \psi_t'(z)^T = B_0$, which follows from (a), proves the
statement.

A direct proof, avoiding the use of Theorem 3.4, is indicated in Exercise 6. □ 

From Theorems 2.8 and 4.3 and the remark after Definition 4.2 we note the 
following. 

Corollary 4.4. The flow of a Hamiltonian system (2.17) on a symplectic submanifold
is symplectic.

The converse of Theorem 4.3 is also true. It extends Theorem VI.2.6 from canonically
symplectic transformations to Poisson maps.

Theorem 4.5. Let $f(y)$ and $B(y)$ be continuously differentiable on an open set
$U \subset \mathbb{R}^n$, and assume that $B(y)$ represents a Poisson bracket (Definition 2.4).
Then, $\dot y = f(y)$ is locally of the form (4.1), if and only if

• its flow $\varphi_t(y)$ respects the Casimirs of $B(y)$, i.e., $C_i(\varphi_t(y)) = \mathrm{Const}$, and

• its flow is a Poisson map for all $y \in U$ and for all sufficiently small $t$.

Proof. The necessity follows from Corollary 3.5 and from Theorem 4.3. For the
proof of sufficiency we apply the change of coordinates $(u, c) = \vartheta(y)$ of
Theorem 3.4, which transforms $B(y)$ into canonical form (3.18). We write the differential
equation $\dot y = f(y)$ in the new variables as

\[ \dot u = g(u, c), \quad \dot c = h(u, c). \tag{4.5} \]

Our first assumption expresses the fact that the Casimirs, which are the components
of $c$, are first integrals of this system. Consequently, we have $h(u, c) = 0$. The
second assumption implies that the flow of (4.5) is a Poisson map for $B_0$ of (3.18).
Writing down explicitly the blocks of condition (4.2), we see that this is equivalent
to the symplecticity of the mapping $u_0 \mapsto u(t, u_0, c_0)$, with $c_0$ as a parameter.
From Theorem VI.2.6 we thus obtain the existence of a function $K(u, c)$ such that
$g(u, c) = J^{-1} \nabla_u K(u, c)$. Notice that for flows depending smoothly on a parameter,
the Hamiltonian also depends smoothly on it. Consequently, the vector field (4.5)
is of the form $B_0 \nabla K(u, c)$. Transforming back to the original variables we obtain
$f(y) = B(y)\nabla H(y)$ with $H(y) = K(\vartheta(y))$ (see Corollary 3.6). □


VII.4.2 Poisson Integrators

The preceding theorem shows that “being a Poisson map and respecting the Casimirs”
is characteristic for the flow of a Poisson system. This motivates the following
definition.

Definition 4.6. A numerical method $y_1 = \Phi_h(y_0)$ is a Poisson integrator for the
structure matrix $B(y)$, if the transformation $y_0 \mapsto y_1$ respects the Casimirs and if
it is a Poisson map whenever the method is applied to (4.1).

Observe that for a Poisson integrator one has to specify the class of structure 
matrices B(y). A method will never be a Poisson integrator for all possible B(y). 

Example 4.7. The symplectic Euler method reads

\[ u_{n+1} = u_n + h\, u_{n+1} v_n H_v(u_{n+1}, v_n), \qquad
   v_{n+1} = v_n - h\, u_{n+1} v_n H_u(u_{n+1}, v_n) \]

for the Lotka-Volterra problem (2.13). It produces an excellent long-time behaviour
(Fig. 4.1, left picture). We shall show that this is a Poisson integrator for all separable
Hamiltonians $H(u, v) = K(u) + L(v)$. For this we compute the Jacobian of the map
$(u_n, v_n) \mapsto (u_{n+1}, v_{n+1})$,

\[ \begin{pmatrix} 1 - h v_n H_v & 0 \\ h v_n (H_u + u_{n+1} H_{uu}) & 1 \end{pmatrix}
   \frac{\partial(u_{n+1}, v_{n+1})}{\partial(u_n, v_n)}
   = \begin{pmatrix} 1 & h u_{n+1} (H_v + v_n H_{vv}) \\ 0 & 1 - h u_{n+1} H_u \end{pmatrix} \]

(the argument of the partial derivatives of $H$ is $(u_{n+1}, v_n)$ everywhere), and we
check in a straightforward fashion the validity of (4.2). A different proof, using
differential forms, is given in Sanz-Serna (1994) for a special choice of $H(u, v)$.
Similarly, the adjoint of the symplectic Euler method is a Poisson integrator, and
so is their composition, the Störmer-Verlet scheme. Composition methods based
on this scheme yield high order Poisson integrators, because the composition of
Poisson maps is again a Poisson map.
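The check of (4.2) mentioned in Example 4.7 can also be carried out numerically. The
following Python sketch is only an illustration: the particular separable Hamiltonian
$K(u) = u - \ln u$, $L(v) = v - 2\ln v$, the function names and the finite-difference
Jacobian are assumptions made here, not part of the text.

```python
import numpy as np

# Assumed separable Hamiltonian H(u, v) = K(u) + L(v) with
# K(u) = u - ln u and L(v) = v - 2 ln v (an illustrative choice).
def dK(u):
    return 1.0 - 1.0 / u

def dL(v):
    return 1.0 - 2.0 / v

def symplectic_euler_step(u, v, h):
    """One step of the symplectic Euler method of Example 4.7.

    For separable H the implicit relation u1 = u + h*u1*v*L'(v)
    can be solved for u1 in closed form.
    """
    u1 = u / (1.0 - h * v * dL(v))
    v1 = v - h * u1 * v * dK(u1)
    return u1, v1

def structure_matrix(u, v):
    """Structure matrix B(u, v) of the Lotka-Volterra system (2.13)."""
    return np.array([[0.0, u * v], [-u * v, 0.0]])

def jacobian(step, u, v, h, eps=1e-6):
    """Central finite-difference Jacobian of (u, v) -> step(u, v, h)."""
    jac = np.empty((2, 2))
    for j, (du, dv) in enumerate([(eps, 0.0), (0.0, eps)]):
        plus = np.array(step(u + du, v + dv, h))
        minus = np.array(step(u - du, v - dv, h))
        jac[:, j] = (plus - minus) / (2.0 * eps)
    return jac

def poisson_defect(step, u, v, h):
    """Largest entry (in modulus) of phi' B phi'^T - B(phi), cf. (4.2)."""
    a = jacobian(step, u, v, h)
    u1, v1 = step(u, v, h)
    residual = a @ structure_matrix(u, v) @ a.T - structure_matrix(u1, v1)
    return np.max(np.abs(residual))

defect = poisson_defect(symplectic_euler_step, 4.0, 3.0, 0.1)
print(defect)  # small: only finite-difference and rounding errors remain
```

Passing a different one-step map to `poisson_defect` gives a quick way to see whether that
map violates (4.2) at a given point.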

The implicit midpoint rule, though symplectic when applied to canonical Hamiltonian
systems, turns out not to be a Poisson map for the structure matrix $B(u, v)$ of
(2.13). Figure 4.1 (right picture) shows that the numerical solution does not remain
near a closed curve.

It is a difficult task to construct Poisson integrators for general Poisson systems;
cf. the overview by Karasözen (2004). First of all, for non-constant $B(y)$ condition
(4.2) is no longer a quadratic first integral of the problem augmented by its
variational equation (see Sect. VI.4.1). Secondly, the Casimir functions can be
arbitrary and we know that only linear and quadratic first integrals can be conserved
automatically (Chap. IV). Therefore, Poisson integrators will have to exploit special
structures of the particular problem.

Splitting Methods. Consider a (general) Poisson system $\dot y = B(y)\nabla H(y)$ and
suppose that the Hamiltonian permits a decomposition as
$H(y) = H^{[1]}(y) + \ldots + H^{[m]}(y)$, such that the individual systems
$\dot y = B(y)\nabla H^{[i]}(y)$ can be solved exactly. The flow of these subsystems is a
Poisson map and automatically respects the Casimirs, and so does their composition.
McLachlan (1993), Reich (1993), and McLachlan & Quispel (2002) present several
interesting examples.

Fig. 4.1. Numerical solutions of the Lotka-Volterra equations (2.13) (step size $h = 0.25$,
which is very large compared to the period of the solution; 1000 steps; initial values (4, 2)
and (6, 2) for all methods)

Example 4.8. In the previous example of a Lotka-Volterra equation with separable
Hamiltonian $H(u, v) = K(u) + L(v)$, the systems with Hamiltonian $K(u)$ and
$L(v)$ can be solved explicitly. Since the flow of each of the subsystems is a Poisson
map, so is their composition. Combining a half-step with $L$, a full step with $K$,
and again a half-step with $L$, we thus obtain the following Verlet-like second-order
Poisson integrator:

\[
\begin{aligned}
u_{n+1/2} &= \exp\bigl(\tfrac{h}{2}\, v_n \nabla L(v_n)\bigr)\, u_n \\
v_{n+1} &= \exp\bigl(-h\, u_{n+1/2} \nabla K(u_{n+1/2})\bigr)\, v_n \\
u_{n+1} &= \exp\bigl(\tfrac{h}{2}\, v_{n+1} \nabla L(v_{n+1})\bigr)\, u_{n+1/2}.
\end{aligned}
\tag{4.6}
\]
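A minimal Python sketch of the scheme (4.6) follows. It assumes, purely for illustration,
the separable Hamiltonian $K(u) = u - \ln u$, $L(v) = v - 2\ln v$; the function names are
hypothetical. Each substep is the exact flow of one subsystem, in which the other variable
stays frozen.

```python
import math

# Illustrative (assumed) separable Hamiltonian H(u, v) = K(u) + L(v).
def hamiltonian(u, v):
    return u - math.log(u) + v - 2.0 * math.log(v)

def dK(u):
    return 1.0 - 1.0 / u

def dL(v):
    return 1.0 - 2.0 / v

def splitting_step(u, v, h):
    """One step of (4.6): half step with L, full step with K, half step with L."""
    u_half = math.exp(0.5 * h * v * dL(v)) * u            # flow of L: v is constant
    v_new = math.exp(-h * u_half * dK(u_half)) * v        # flow of K: u is constant
    u_new = math.exp(0.5 * h * v_new * dL(v_new)) * u_half
    return u_new, v_new

def integrate(u, v, h, steps):
    """Apply `steps` steps of the splitting scheme with step size h."""
    for _ in range(steps):
        u, v = splitting_step(u, v, h)
    return u, v

u0, v0 = 4.0, 2.0
u1, v1 = integrate(u0, v0, 0.01, 100)
energy_error = abs(hamiltonian(u1, v1) - hamiltonian(u0, v0))
print(energy_error)
```

The scheme is symmetric: a step with step size $h$ followed by a step with $-h$ returns to
the starting point up to rounding, which gives a simple sanity test for an implementation.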



In the setting of Hamiltonian systems on a manifold, the splitting approach can 
be formulated in the following way. 

Variational Splitting. Consider a Hamiltonian system (2.17) on a symplectic manifold
$M$, and use a splitting $H = H^{[1]} + H^{[2]}$ of the Hamiltonian in the following
algorithm:

1. Let $x_n^+ \in M$ be the solution at time $h/2$ of the equation for $x$,

\[ \bigl(J\dot x - \nabla H^{[1]}(x), \xi\bigr) = 0 \quad\text{for all } \xi \in T_x M \tag{4.7} \]

with initial value $x(0) = x_n$.

2. Let $x_{n+1}^-$ be the solution at time $h$ of

\[ \bigl(J\dot x - \nabla H^{[2]}(x), \xi\bigr) = 0 \quad\text{for all } \xi \in T_x M \tag{4.8} \]

with initial value $x(0) = x_n^+$.

3. Take $x_{n+1}$ as the solution at time $h/2$ of (4.7) with initial value $x(0) = x_{n+1}^-$.

Splitting algorithms for Hamiltonian systems on manifolds have been studied by
Dullweber, Leimkuhler & McLachlan (1997) and Benettin, Cherubini & Fassò
(2001) in the context of rigid body dynamics; see Sect. VII.5. Lubich (2004) and
Faou & Lubich (2004) have studied the above splitting method for applications in
quantum molecular dynamics; see Sect. VII.6 for an example.

By Theorem 2.8, the substeps 1.-3. written in coordinates $x = \chi(y)$ are Poisson
systems $\dot y = B(y)\nabla K^{[i]}(y)$ with $K^{[i]}(y) = H^{[i]}(\chi(y))$, but the algorithm
itself is independent of the choice of coordinates. Since the substeps are exact flows of
Hamiltonian systems on the manifold $M$, their composition yields a symplectic
map. In the coordinates $y$ the substeps are the exact flows of Poisson systems, and
hence their composition yields a Poisson map.

Poisson Integrators and Symplectic Integrators. Generally we note the following
correspondence, which rephrases the remark on symplectic maps and Poisson
maps after Definition 4.2. It applies in particular to the symplectic integrators for
constrained mechanics of Sect. VII.1.

Lemma 4.9. An integrator $x_1 = \Psi_h(x_0)$ for a Hamiltonian system (2.17) on a
manifold $M$ is symplectic if and only if the integrator written in local coordinates,
$y_1 = \Phi_h(y_0)$ corresponding to a coordinate map $x = \chi(y)$, is a Poisson integrator
for the structure matrix $B(y)$ of (2.21).


VII.4.3 Integrators Based on the Darboux-Lie Theorem 


If we explicitly know a transformation $z = \vartheta(y)$ that brings the system
$\dot y = B(y)\nabla H(y)$ to canonical form (as in Corollary 3.6), we can proceed as
follows: compute $z_n = \vartheta(y_n)$; apply a symplectic integrator to the transformed
system $\dot z = B_0 \nabla K(z)$ ($B_0$ is the matrix (3.18) and $K(z) = H(y)$), which yields
$z_{n+1} = \Psi_h(z_n)$; compute finally $y_{n+1}$ from $z_{n+1} = \vartheta(y_{n+1})$. This
yields a Poisson integrator by the following lemma.

Lemma 4.10. Let $z = (u, c) = \vartheta(y)$ be the transformation of Theorem 3.4. Suppose
that the integrator $\Phi_h(y)$ takes the form

\[ \Psi_h(z) = \begin{pmatrix} \Psi_h^1(z) \\ c \end{pmatrix} \]

in the new variables $z = (u, c)$. Then, $\Phi_h(y)$ is a Poisson integrator if and only if
$u \mapsto \Psi_h^1(u, c)$ is a symplectic integrator for every $c$.

Proof. The integrator $\Phi_h(y)$ is Poisson for the structure matrix $B(y)$ if and only if
$\Psi_h(z)$ is Poisson for the matrix $B_0$ of (3.18); see Exercise 7. By assumption,
$\Psi_h(z)$ preserves the Casimirs of $B_0$. The identity

\[ \Psi_h'(z) B_0 \Psi_h'(z)^T = \begin{pmatrix} A J^{-1} A^T & 0 \\ 0 & 0 \end{pmatrix} \]

with $A = \partial\Psi_h^1/\partial u$ proves the statement. □

Notice that the transformation $\vartheta$ has to be global in the sense that it has to be
the same for all integration steps. Otherwise a degradation in performance, similar
to that of the experiment in Example V.4.3, has to be expected.

Example 4.11. As a first illustration consider the Lotka-Volterra system (2.13).
Applying the transformation $\vartheta(u, v) = (\ln u, \ln v) = (p, q)$, this system becomes
canonically Hamiltonian with

\[ K(p, q) = -H(u, v) = -H(e^p, e^q). \]

If we apply the symplectic Euler method to this Hamiltonian system, and if we
transform back to the original variables, we obtain the method

\[
\begin{aligned}
u_{n+1} &= u_n \exp\bigl(h\, v_n H_v(u_{n+1}, v_n)\bigr), \\
v_{n+1} &= v_n \exp\bigl(-h\, u_{n+1} H_u(u_{n+1}, v_n)\bigr).
\end{aligned}
\tag{4.9}
\]

In contrast to the method of Example 4.7, (4.9) is also a Poisson integrator for (2.13)
if $H(u, v)$ is not separable. If we compose a step with step size $h/2$ of the symplectic
Euler method with its adjoint method, then we obtain again, in the case of a
separable Hamiltonian, the method (4.6).
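The claim that (4.9) remains a Poisson integrator for non-separable $H$ can be tested
numerically. For the $2\times 2$ structure matrix of (2.13), condition (4.2) reduces to
$\det\varphi'(y)\, u v = u_1 v_1$. The Python sketch below checks this ratio; the
non-separable Hamiltonian, the fixed-point iteration for the implicit first equation and
all function names are assumptions chosen for illustration.

```python
import math

# Hypothetical non-separable Hamiltonian, used only as a test case:
# H(u, v) = u - ln u + v - 2 ln v + 0.1 u v.
def H_u(u, v):
    return 1.0 - 1.0 / u + 0.1 * v

def H_v(u, v):
    return 1.0 - 2.0 / v + 0.1 * u

def log_euler_step(u, v, h, iterations=50):
    """One step of method (4.9).

    The first equation is implicit in u1; it is solved here by a plain
    fixed-point iteration, which converges for sufficiently small h.
    """
    u1 = u
    for _ in range(iterations):
        u1 = u * math.exp(h * v * H_v(u1, v))
    v1 = v * math.exp(-h * u1 * H_u(u1, v))
    return u1, v1

def poisson_ratio(step, u, v, h, eps=1e-6):
    """Return det(phi') * u * v / (u1 * v1); it equals 1 when (4.2) holds."""
    up, um = step(u + eps, v, h), step(u - eps, v, h)
    vp, vm = step(u, v + eps, h), step(u, v - eps, h)
    j11 = (up[0] - um[0]) / (2.0 * eps)
    j21 = (up[1] - um[1]) / (2.0 * eps)
    j12 = (vp[0] - vm[0]) / (2.0 * eps)
    j22 = (vp[1] - vm[1]) / (2.0 * eps)
    u1, v1 = step(u, v, h)
    return (j11 * j22 - j12 * j21) * u * v / (u1 * v1)

ratio = poisson_ratio(log_euler_step, 4.0, 3.0, 0.1)
print(ratio)  # close to 1, up to finite-difference error
```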

Example 4.12 (Ablowitz-Ladik Discrete Nonlinear Schrödinger Equation).
An interesting space discretization of the nonlinear Schrödinger equation is the
Ablowitz-Ladik model

\[ i\dot y_k + \frac{1}{\Delta x^2}\bigl(y_{k+1} - 2y_k + y_{k-1}\bigr) + |y_k|^2 \bigl(y_{k+1} + y_{k-1}\bigr) = 0, \]

which we consider under periodic boundary conditions $y_{k+N} = y_k$ ($\Delta x = 1/N$).
It is completely integrable (Ablowitz & Ladik 1976) and, as we shall see below, it is a
Poisson system with noncanonical Poisson bracket. Splitting the variables into real
and imaginary parts, $y_k = u_k + i v_k$, we obtain

\[
\begin{aligned}
\dot u_k &= -\frac{1}{\Delta x^2}\bigl(v_{k+1} - 2v_k + v_{k-1}\bigr) - (u_k^2 + v_k^2)\bigl(v_{k+1} + v_{k-1}\bigr) \\
\dot v_k &= \frac{1}{\Delta x^2}\bigl(u_{k+1} - 2u_k + u_{k-1}\bigr) + (u_k^2 + v_k^2)\bigl(u_{k+1} + u_{k-1}\bigr).
\end{aligned}
\]

With $u = (u_1,\ldots,u_N)$, $v = (v_1,\ldots,v_N)$ this system can be written as

\[ \begin{pmatrix} \dot u \\ \dot v \end{pmatrix}
   = \begin{pmatrix} 0 & -D(u,v) \\ D(u,v) & 0 \end{pmatrix}
     \begin{pmatrix} \nabla_u H(u,v) \\ \nabla_v H(u,v) \end{pmatrix}, \tag{4.10} \]

where $D = \mathrm{diag}(d_1,\ldots,d_N)$ is the diagonal matrix with entries

\[ d_k(u, v) = 1 + \Delta x^2 (u_k^2 + v_k^2), \]

and the Hamiltonian is

\[ H(u, v) = \frac{1}{\Delta x^2} \sum_{l=1}^{N} \bigl(u_l u_{l-1} + v_l v_{l-1}\bigr)
   - \frac{1}{\Delta x^4} \sum_{l=1}^{N} \ln\bigl(1 + \Delta x^2 (u_l^2 + v_l^2)\bigr). \]

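We thus get a Poisson system (the conditions of Lemma 2.3 are directly verified).
There are many possibilities to transform this system to canonical form. Tang,
Pérez-García & Vázquez (1997) propose the transformation

\[ p_k = \frac{1}{\Delta x \sqrt{1 + \Delta x^2 v_k^2}}
        \arctan\Bigl(\frac{\Delta x}{\sqrt{1 + \Delta x^2 v_k^2}}\, u_k\Bigr), \qquad q_k = v_k, \]

for which the inverse can be computed in a straightforward way. Here, we suggest
the transformation

\[
\begin{aligned}
p_k &= u_k\, \sigma\bigl(\Delta x^2 (u_k^2 + v_k^2)\bigr) \\
q_k &= v_k\, \sigma\bigl(\Delta x^2 (u_k^2 + v_k^2)\bigr)
\end{aligned}
\qquad\text{with}\qquad \sigma(x) = \sqrt{\frac{\ln(1 + x)}{x}}, \tag{4.11}
\]

which treats the variables more symmetrically. Its inverse is

\[
\begin{aligned}
u_k &= p_k\, \tau\bigl(\Delta x^2 (p_k^2 + q_k^2)\bigr) \\
v_k &= q_k\, \tau\bigl(\Delta x^2 (p_k^2 + q_k^2)\bigr)
\end{aligned}
\qquad\text{with}\qquad \tau(x) = \sqrt{\frac{\exp x - 1}{x}}.
\]

Both transformations take the system (4.10) to canonical form. For the transformation
(4.11) the Hamiltonian in the new variables is

\[ H(p, q) = \frac{1}{\Delta x^2} \sum_{l=1}^{N}
      \tau\bigl(\Delta x^2 (p_l^2 + q_l^2)\bigr)\, \tau\bigl(\Delta x^2 (p_{l-1}^2 + q_{l-1}^2)\bigr)
      \bigl(p_l p_{l-1} + q_l q_{l-1}\bigr)
   - \frac{1}{\Delta x^2} \sum_{l=1}^{N} \bigl(p_l^2 + q_l^2\bigr). \]

Applying standard symplectic schemes to this Hamiltonian yields Poisson integrators
for (4.10).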
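The transformation (4.11) and its inverse are easy to implement componentwise. The
following Python sketch (function names are hypothetical, and the values
$\sigma(0) = \tau(0) = 1$ are the continuous extensions, added here as an assumption for
robustness) shows both directions. Since $x\,\sigma(x)^2 = \ln(1 + x)$, the two maps are
inverse to each other.

```python
import numpy as np

def sigma(x):
    """sigma(x) = sqrt(ln(1 + x) / x), continuously extended by sigma(0) = 1."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0.0, x, 1.0)          # avoid 0/0 at x = 0
    return np.where(x > 0.0, np.sqrt(np.log1p(safe) / safe), 1.0)

def tau(x):
    """tau(x) = sqrt((exp(x) - 1) / x), continuously extended by tau(0) = 1."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0.0, x, 1.0)
    return np.where(x > 0.0, np.sqrt(np.expm1(safe) / safe), 1.0)

def to_canonical(u, v, dx):
    """Transformation (4.11): (u_k, v_k) -> (p_k, q_k), applied componentwise."""
    s = sigma(dx**2 * (u**2 + v**2))
    return u * s, v * s

def from_canonical(p, q, dx):
    """Inverse of (4.11): (p_k, q_k) -> (u_k, v_k)."""
    t = tau(dx**2 * (p**2 + q**2))
    return p * t, q * t

dx = 1.0 / 8.0
u = np.array([0.0, 0.5, -1.3, 2.0])
v = np.array([0.0, -0.2, 0.7, 1.1])
p, q = to_canonical(u, v, dx)
u_back, v_back = from_canonical(p, q, dx)
print(np.max(np.abs(u_back - u)), np.max(np.abs(v_back - v)))
```

A symplectic scheme applied to $H(p, q)$ would be wrapped between `to_canonical` and
`from_canonical`; as remarked after Lemma 4.10, the same transformation has to be used in
every step.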


VII.5 Rigid Body Dynamics and Lie-Poisson Systems 

... these topics, which, after all, have occupied workers in geometric mechanics
for many years. (R. McLachlan 2003)

An important Poisson system is given by Euler’s famous equations for the motion
of a rigid body (see left picture of Fig. 5.1), for which we recall the history and
derivation and present various structure-preserving integrators. Euler’s equations are
a particular case of Lie-Poisson systems, which result from a reduction process of
Hamiltonian systems on a Lie group.


VII.5.1 History of the Euler Equations

“The subject that I propose to treat here is of the utmost importance
in Mechanics; and I have already made several efforts to bring it fully
to light. But although the calculation succeeded well enough, and I
discovered analytic formulas ..., their application was nevertheless
subject to difficulties which seemed to me almost entirely insurmountable.
Now, since I have developed the principles of the mechanical knowledge
of bodies, the beautiful property of the three principal axes with which
every body is endowed has at last enabled me to overcome all these
difficulties, ...”

(Euler 1758b, p. 154) 


Establishing a mathematical analysis of the motion of a rigid body was a great
challenge for Euler. Because such a body can have an arbitrary shape and mass
distribution (see left picture of Fig. 5.2), and because the rotation axis can move
arbitrarily with time, the problem is difficult and Euler struggled with it for many
years (all these articles are collected in Opera Omnia, Ser. II, Vols. 7 and 8). The
breakthrough was enabled by the discovery that any body, however complicated its
configuration, reduces to an inertia ellipsoid with three principal axes and three
numbers, the principal moments of inertia (Euler 1758a; see the middle picture of
Fig. 5.2 and the citation).

Fig. 5.1. Left picture: first publication of the Euler equations in Euler (1758b). Right picture:
principal axes as eigenvectors in Lagrange (1788)


The Inertia Ellipsoid. We choose a moving coordinate system connected to the
body $B$ and we consider motions of the body where the origin is fixed. By another
of Euler’s famous theorems, any such motion is infinitesimally a rotation around an
axis. We represent the rotation axis of the body by the direction of a vector $\omega$ and
the speed of rotation by the length of $\omega$. Then the velocity of a mass point $x$ of $B$
is given by the exterior product

\[ v = \omega \times x
   = \begin{pmatrix} \omega_2 x_3 - \omega_3 x_2 \\ \omega_3 x_1 - \omega_1 x_3 \\ \omega_1 x_2 - \omega_2 x_1 \end{pmatrix}
   = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}
     \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \tag{5.1} \]

(orthogonal to $\omega$, orthogonal to $x$, and of length $\|\omega\| \cdot \|x\| \cdot \sin\gamma$;
see the left picture of Fig. 5.2). The kinetic energy is obtained by integrating the
energy of the mass points $dm$ over the body

\[ T = \frac12 \int_B \|\omega \times x\|^2\, dm
     = \frac12 \int_B \Bigl( (\omega_2 x_3 - \omega_3 x_2)^2 + (\omega_3 x_1 - \omega_1 x_3)^2
       + (\omega_1 x_2 - \omega_2 x_1)^2 \Bigr)\, dm. \tag{5.2} \]

Fig. 5.2. A rigid body rotating around a variable axis (left); the corresponding inertia ellipsoid
(middle); the corresponding angular momentum (right)

If this is multiplied out, one obtains

\[ T = \frac12\, \omega^T \Theta\, \omega, \quad\text{where}\quad
   \Theta_{ii} = \int_B (x_k^2 + x_\ell^2)\, dm, \qquad
   \Theta_{ik} = -\int_B x_i x_k\, dm \quad (i \ne k), \tag{5.3} \]

with $k, \ell$ in $\Theta_{ii}$ denoting the two indices different from $i$.

Euler (1758a) showed, by endless trigonometric transformations, that there exist
principal axes of the body in which this expression takes the form

\[ T = \frac12 \bigl( I_1 \omega_1^2 + I_2 \omega_2^2 + I_3 \omega_3^2 \bigr). \tag{5.4} \]

This was historically the first transformation of such a 3x3 quadratic form to diagonal
form. Later, Lagrange (1788) discovered that these axes were the eigenvectors of
the matrix $\Theta$ and the moments of inertia $I_k$ the corresponding eigenvalues (without
calling them so, see the right picture of Fig. 5.1).

The Angular Momentum. The first law of Newton’s Principia states that the momentum
$v \cdot m$ of a mass point remains constant in the absence of exterior forces.
The corresponding quantity for rotational motion is the angular momentum, i.e.,
the exterior product $x \times v$ times the mass. Integrating over the body we obtain, with
(5.1),

\[ y = \int_B x \times v\, dm = \int_B x \times (\omega \times x)\, dm. \tag{5.5} \]

If this is multiplied out, the matrix $\Theta$ appears again and one obtains the surprising
result (due to Poinsot 1834)

\[ y = \Theta\,\omega, \quad\text{or, in the principal axes coordinates,}\quad y_k = I_k \omega_k. \tag{5.6} \]

Such a relation is familiar from the theory of conjugate diameters (Apollonius, Book
II, Prop. VI): the angular momentum is a vector orthogonal to the plane of vectors
conjugate to $\omega$ (see the right picture of Fig. 5.2).

The Euler Equations. Euler’s paper (1758a), on his discovery of the principal axes,
is immediately followed by Euler (1758b), where he derives his equations for the
motion of a rigid body by long, doubtful and often criticized calculations, repeated
in a little less doubtful manner in Euler’s monumental treatise (1765). Beauty and
elegance, not only of the result, but also of the proof, are due to Poinsot (1834) and
Hayward (1856). The derivation is masterfully described by Klein & Sommerfeld (1897),
and in Chapter 6 of Arnold (1989).

From now on we choose the coordinate system, moving with the body, such that
the inertia tensor remains diagonal. We also watch the motion of the body from a
coordinate system stationary in the space. The transformation of a vector
$x \in \mathbb{R}^3$ in the body frame⁴, to the corresponding $\tilde x \in \mathbb{R}^3$ in the
stationary frame, is denoted by

\[ \tilde x = Q(t)\, x. \tag{5.7} \]

The matrix $Q(t)$ is orthogonal and describes the motion of the body: for $x = e_i$ we
see that the columns of $Q(t)$ are the coordinates of the body’s principal axes in the
stationary frame.

The analogous statement to Newton’s first law for rotational motion is: in the
absence of exterior angular forces, the angular momentum $\tilde y$, seen from the fixed
coordinate system, is a constant vector⁵. This same vector $y$, seen from the moving
frame, which at any instant rotates with the body around the vector $\omega$, rotates in
the opposite direction. Therefore we have from (5.1), by changing the signs of $\omega$,
the derivatives

\[ \begin{pmatrix} \dot y_1 \\ \dot y_2 \\ \dot y_3 \end{pmatrix}
   = \begin{pmatrix} 0 & \omega_3 & -\omega_2 \\ -\omega_3 & 0 & \omega_1 \\ \omega_2 & -\omega_1 & 0 \end{pmatrix}
     \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}. \tag{5.8} \]

If we insert $\omega_k = y_k/I_k$ from (5.6), we obtain

\[ \begin{pmatrix} \dot y_1 \\ \dot y_2 \\ \dot y_3 \end{pmatrix}
   = \begin{pmatrix} 0 & y_3/I_3 & -y_2/I_2 \\ -y_3/I_3 & 0 & y_1/I_1 \\ y_2/I_2 & -y_1/I_1 & 0 \end{pmatrix}
     \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
   = \begin{pmatrix} (I_3^{-1} - I_2^{-1})\, y_3 y_2 \\ (I_1^{-1} - I_3^{-1})\, y_1 y_3 \\ (I_2^{-1} - I_1^{-1})\, y_2 y_1 \end{pmatrix} \tag{5.9} \]

or, by rearranging the products the other way round,

\[ \begin{pmatrix} \dot y_1 \\ \dot y_2 \\ \dot y_3 \end{pmatrix}
   = \begin{pmatrix} 0 & -y_3 & y_2 \\ y_3 & 0 & -y_1 \\ -y_2 & y_1 & 0 \end{pmatrix}
     \begin{pmatrix} y_1/I_1 \\ y_2/I_2 \\ y_3/I_3 \end{pmatrix}. \tag{5.10} \]

The equations are thus written in two different ways as a Poisson system, whose
right-hand vectors are the gradients of $C(y) = \frac12 \sum_{k=1}^{3} y_k^2$ and
$H(y) = \frac12 \sum_{k=1}^{3} I_k^{-1} y_k^2$, respectively.
These are the two quadratic invariants of Chap. IV. The first represents the length
of the constant angular momentum $\tilde y$ in the orthogonal body frame, and the second
represents the energy (5.4).

4 Long-standing tradition, from Klein to Arnold, uses capitals for denoting the coordinates
in this moving frame; but this would lead to confusion with our subsequent matrix notation.

5 For a proof of this statement by d’Alembert’s Principle, see Sommerfeld (1942), §11.13.
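The two forms (5.9) and (5.10) and the two quadratic invariants can be checked with a few
lines of Python. This is a sketch for illustration only; the numerical values of the
moments of inertia and of $y$, as well as the helper names, are assumptions.

```python
import numpy as np

def hat(y):
    """Skew-symmetric matrix with hat(y) @ x equal to the cross product y x x."""
    return np.array([[0.0, -y[2], y[1]],
                     [y[2], 0.0, -y[0]],
                     [-y[1], y[0], 0.0]])

def euler_rhs(y, inertia):
    """Right-hand side in the form (5.10): structure matrix hat(y) times grad H."""
    return hat(y) @ (y / inertia)

def euler_rhs_alt(y, inertia):
    """Right-hand side in the form (5.9): matrix built from omega = y / I, times grad C = y."""
    return -hat(y / inertia) @ y

inertia = np.array([2.0, 1.0, 2.0 / 3.0])   # illustrative principal moments (assumed)
y = np.array([0.3, -1.2, 0.8])              # illustrative angular momentum (assumed)

f = euler_rhs(y, inertia)
dC_dt = y @ f                # derivative of C(y) = |y|^2 / 2 along the flow
dH_dt = (y / inertia) @ f    # derivative of H(y) = sum y_k^2 / (2 I_k) along the flow
print(f, dC_dt, dH_dt)
```

Both derivatives vanish up to rounding because `hat(y)` is skew-symmetric and annihilates
$y$, which is the algebraic reason why $C$ and $H$ are first integrals.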

Computation of the Position Matrix Q(t). Once we have solved the Euler equations
for $y(t)$, we obtain the rotation vector $\omega(t)$ by (5.6). It remains to find the
matrix $Q(t)$ which gives the position of our rotating body. We know that the columns
of the matrix $Q$, seen in the stationary frame, correspond to the unit vectors in the
body frame. These rotate, by (5.1), with the velocity

\[ \bigl(\omega \times e_1,\ \omega \times e_2,\ \omega \times e_3\bigr)
   = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix} =: W. \tag{5.11} \]

We thus obtain $\dot Q$, the rotational velocity expressed in the stationary frame, by the
back transformation (5.7):

\[ \dot Q = Q W \quad\text{or}\quad Q^T \dot Q = W. \tag{5.12} \]

This is a differential system for $Q$ which, because $W$ is skew-symmetric, preserves
the orthogonality of $Q$. The problem is solved - in theory.
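The orthogonality claim can be made explicit in one line. The following short derivation is
a sketch; the auxiliary matrix $M(t) = Q(t)^T Q(t)$ is introduced here only for the
argument.

```latex
\[
  \dot M = \dot Q^T Q + Q^T \dot Q = W^T Q^T Q + Q^T Q\, W = W^T M + M\, W .
\]
% Since W^T = -W, the constant matrix M = I satisfies this linear equation:
\[
  W^T I + I\, W = W^T + W = 0 .
\]
% By uniqueness of solutions, M(0) = I implies M(t) = I for all t,
% i.e. Q(t) stays orthogonal along solutions of (5.12).
```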


VII.5.2 Hamiltonian Formulation of Rigid Body Motion 

In order to open the door for efficient numerical algorithms, we treat the rigid body 
as a constrained Hamiltonian system. 

Position Variables. The position of the rigid body at time $t$ is determined, in view
of (5.7), by a three-dimensional orthogonal matrix $Q(t)$. The constraints to be
respected are thus $Q^T Q - I = 0$.

Kinetic Energy. As in (5.12), we associate with $Q$ and $\dot Q$ the skew-symmetric
matrix $W = Q^T \dot Q$ whose entries $\omega_k$, arranged as in (5.11), determine the kinetic
energy by (5.4):

\[ T = \frac12 \bigl( I_1 \omega_1^2 + I_2 \omega_2^2 + I_3 \omega_3^2 \bigr). \]

For any diagonal matrix $D = \mathrm{diag}(d_1, d_2, d_3)$ we observe

\[ \mathrm{trace}\,(W D W^T) = (d_2 + d_3)\,\omega_1^2 + (d_3 + d_1)\,\omega_2^2 + (d_1 + d_2)\,\omega_3^2. \]

Therefore, with

\[ I_1 = d_2 + d_3, \quad I_2 = d_3 + d_1, \quad I_3 = d_1 + d_2
   \qquad\text{or}\qquad d_k = \int_B x_k^2\, dm \tag{5.13} \]

(note that $d_k > 0$ for all bodies that have interior points), we obtain the kinetic
energy as

\[ T = \frac12\, \mathrm{trace}\,(W D W^T). \tag{5.14} \]

Inserting $W = Q^T \dot Q$, we have

\[ T = \frac12\, \mathrm{trace}\,(Q^T \dot Q D \dot Q^T Q) = \frac12\, \mathrm{trace}\,(\dot Q D \dot Q^T), \tag{5.15} \]

since $Q$ is an orthogonal matrix.
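The agreement of (5.14) with (5.4) under the relation (5.13) is easily confirmed
numerically. The short Python sketch below uses arbitrary illustrative values for $d_k$
and $\omega$ (assumptions, as are the function names).

```python
import numpy as np

def hat(w):
    """Matrix W of (5.11) built from the rotation vector w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def kinetic_energy_principal(omega, inertia):
    """Kinetic energy in the form (5.4)."""
    return 0.5 * np.dot(inertia, omega**2)

def kinetic_energy_trace(omega, d):
    """Kinetic energy in the form (5.14)."""
    w = hat(omega)
    return 0.5 * np.trace(w @ np.diag(d) @ w.T)

d = np.array([0.2, 0.5, 0.9])                                   # assumed d_k > 0
inertia = np.array([d[1] + d[2], d[2] + d[0], d[0] + d[1]])     # relation (5.13)
omega = np.array([0.7, -0.4, 1.1])                              # assumed rotation vector

t_principal = kinetic_energy_principal(omega, inertia)
t_trace = kinetic_energy_trace(omega, d)
print(t_principal, t_trace)
```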

Conjugate Variables. We now have an expression for the kinetic energy in terms of
derivatives of position coordinates and are able to introduce the conjugate momenta

\[ P = \partial T/\partial \dot Q = \dot Q D. \tag{5.16} \]

If we suppose to have, in addition to $T$, a potential $U(Q)$, we get the Hamiltonian

\[ H(P, Q) = \frac12\, \mathrm{trace}\,(P D^{-1} P^T) + U(Q). \tag{5.17} \]

Lagrange Multipliers. The constraints are given by the orthogonality of $Q$, i.e., the
equation $g(Q) = Q^T Q - I = 0$. Since this matrix is always symmetric, this consists
of $\frac12 n(n + 1) = 6$ independent algebraic conditions, calling for six Lagrange
multipliers. If the expression $G(Q)^T \lambda$ in (1.9) is actually computed, it turns out that
this term becomes the product $Q\Lambda$, where the six Lagrange multipliers are arranged in
a symmetric matrix $\Lambda$; see also formula (IV.9.6). Thus, the constrained Hamiltonian
system (1.9) reads in our case, with $\nabla U = (\partial U/\partial Q_{ij})$,

\[
\begin{aligned}
\dot Q &= P D^{-1} \\
\dot P &= -\nabla U(Q) - Q\Lambda \qquad (\Lambda \text{ symmetric}) \\
0 &= Q^T Q - I.
\end{aligned}
\tag{5.18}
\]

Reduction to the Euler Equations. The key idea is to introduce the matrix

$Y = Q^T P = Q^T \dot Q D = W D = \begin{pmatrix} 0 & -d_2\omega_3 & d_3\omega_2 \\ d_1\omega_3 & 0 & -d_3\omega_1 \\ -d_1\omega_2 & d_2\omega_1 & 0 \end{pmatrix},$    (5.19)

where the $\omega_k$ can be further expressed in terms of the angular momenta $y_k = I_k \omega_k$.
Using the notation $\mathrm{skew}\,(A) = \frac12 (A - A^T)$, we see, with (5.13), that

$Y - Y^T = 2\, \mathrm{skew}\,(Y) = \begin{pmatrix} 0 & -y_3 & y_2 \\ y_3 & 0 & -y_1 \\ -y_2 & y_1 & 0 \end{pmatrix}$    (5.20)

contains just the angular momenta. Moreover, $DY$ is skew-symmetric. By (5.18)
the derivative of $Y$ is seen to be

$\dot Y = \dot Q^T P + Q^T \dot P = D^{-1} P^T P - Q^T \nabla U(Q) - \Lambda = D^{-1} Y^T Y - Q^T \nabla U(Q) - \Lambda.$

Taking the skew-symmetric part of this equation, the symmetric matrix $\Lambda$ drops out
and we obtain

$\mathrm{skew}\,(\dot Y) = \mathrm{skew}\,(D^{-1} Y^T Y) - \mathrm{skew}\,(Q^T \nabla U(Q)).$    (5.21)

These are, for $U = 0$, precisely the above Euler equations, obtained a second time.
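As a plausibility check, the force-free case of (5.21) can be compared numerically with the Euler equations in the form $\dot y = y \times (y_1/I_1, y_2/I_2, y_3/I_3)^T$. The sketch below assumes the cross-product arrangement of skew-symmetric matrices; the helper names `hat` and `vec` and the sample values are illustrative only.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix with hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vec(M):
    """Entries (M32, M13, M21) of M; inverts hat on skew-symmetric matrices."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

d = np.array([0.4, 0.25, 0.6])
D = np.diag(d)
inertia = np.array([d[1] + d[2], d[2] + d[0], d[0] + d[1]])   # (5.13)

omega = np.array([0.3, -1.2, 0.7])
Y = hat(omega) @ D                       # (5.19): Y = W D
y = vec(Y - Y.T)                         # (5.20): the angular momenta

rhs = np.linalg.inv(D) @ Y.T @ Y         # D^{-1} Y^T Y
ydot_from_521 = vec(rhs - rhs.T)         # 2 skew(D^{-1} Y^T Y) in vector form
ydot_euler = np.cross(y, y / inertia)    # Euler equations for U = 0
```

Both right-hand sides coincide, and `y` equals `inertia * omega`, in agreement with $y_k = I_k\omega_k$.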


VII.5.3 Rigid Body Integrators 

For a numerical simulation of rigid body motions, one can either solve the
constrained Hamiltonian system (5.18), or one can solve the differential equation (5.21)
for the angular momentum $Y(t)$ in tandem with the equation (5.12) for $Q(t)$. We
consider the following approaches: (I) an efficient application of the RATTLE
algorithm (1.26), and (II) various splitting methods.

(I) RATTLE 

We apply the symplectic RATTLE algorithm (1.26) to the system (5.18), and rewrite 
the formulas in terms of the variables Y and Q. This approach has been proposed 
and developed independently by McLachlan & Scovel (1995) and Reich (1994). 

An application of the RATTLE algorithm (1.26) to the system (5.18) yields 

$P_{1/2} = P_0 - \frac h2 \nabla U(Q_0) - \frac h2 Q_0 \Lambda_0$
$Q_1 = Q_0 + h P_{1/2} D^{-1}, \qquad Q_1^T Q_1 = I$    (5.22)
$P_1 = P_{1/2} - \frac h2 \nabla U(Q_1) - \frac h2 Q_1 \Lambda_1, \qquad Q_1^T P_1 D^{-1} + D^{-1} P_1^T Q_1 = 0,$

where both $\Lambda_0$ and $\Lambda_1$ are symmetric matrices. We let $Y_0 = Q_0^T P_0$, $Y_1 = Q_1^T P_1$,
and $Z = Q_0^T P_{1/2} D^{-1}$. We multiply the first relation of (5.22) by $Q_0^T$, the last
relation by $Q_1^T$, and we eliminate the symmetric matrices $\Lambda_0$ and $\Lambda_1$ by taking the
skew-symmetric parts of the resulting equations. The orthogonality of $Q_0^T Q_1 =
I + hZ$ implies $h Z^T Z = -(Z + Z^T)$, which can then be used to simplify the last
relation. Altogether this results in the following algorithm.

Algorithm 5.1. Let $Q_0$ be orthogonal and $DY_0$ be skew-symmetric. One step
$(Q_0, Y_0) \mapsto (Q_1, Y_1)$ of the method then reads as follows:

- find $Z$ such that $I + hZ$ is orthogonal and

  $\mathrm{skew}\,(ZD) = \mathrm{skew}\,(Y_0) - \frac h2\, \mathrm{skew}\,(Q_0^T \nabla U(Q_0)),$    (5.23)

- compute $Q_1 = Q_0 (I + hZ)$,

- compute $Y_1$ such that $DY_1$ is skew-symmetric and

  $\mathrm{skew}\,(Y_1) = \mathrm{skew}\,(ZD) - \mathrm{skew}\,((Z + Z^T)D) - \frac h2\, \mathrm{skew}\,(Q_1^T \nabla U(Q_1)).$

The second step is explicit, and the third step represents a linear equation for the
elements of $Y_1$.

Computation of the First Step. We write for the known part of equation (5.23)

$\mathrm{skew}\,(Y_0) - \frac h2\, \mathrm{skew}\,(Q_0^T \nabla U(Q_0)) = \begin{pmatrix} 0 & -\alpha_3 & \alpha_2 \\ \alpha_3 & 0 & -\alpha_1 \\ -\alpha_2 & \alpha_1 & 0 \end{pmatrix} = A$    (5.24)

and have to solve

$\frac12 (ZD - DZ^T) = A, \qquad (I + hZ^T)(I + hZ) = I, \qquad \frac12 (ZD + DZ^T) = S$

(the trick was to add the last equation with $S$ an unknown symmetric matrix).
Elimination gives $Z = (A + S)D^{-1}$ and $Z^T = D^{-1}(S - A)$. Both inserted into the
second equation lead to a Riccati equation for $S$. There exist efficient algorithms
for such problems; see the reference in Sect. IV.5.3 and a detailed explanation in
McLachlan & Zanna (2005).

Remark 5.2 (Moser-Veselov Algorithm). An independent approach to the above
formulas is given in a remarkable paper by Moser & Veselov (1991), which treats
the rigid body through a discretized variational principle, similar to the ideas of
Sect. VI.6.2. The equivalence is explained by McLachlan & Zanna (2005), following
a suggestion of B. Leimkuhler and S. Reich.


Quaternions (Euler Parameters). An efficient implementation of the above
algorithm requires suitable representations of orthogonal matrices, and the use of
quaternions is a standard approach.

After having revolutionized Lagrangian mechanics (see Chapt. VI), Hamilton 
struggled for years to generalize complex analysis to three dimensions. He finally 
achieved his dream, however not in three dimensions, but in four, and founded in
1843 the theory of quaternions. 

For an introduction to quaternions (whose coefficients are sometimes called
Euler parameters) we refer to Sects. IV.2 and IV.3 of Klein (1908), and for their
use in numerical simulations to Sects. 9.3 and 11.3 of Haug (1989). Quaternions
can be written as $e = e_0 + i e_1 + j e_2 + k e_3$, where multiplication is defined via the
relations $i^2 = j^2 = k^2 = -1$, $ij = k$, $jk = i$, $ki = j$, and $ji = -k$, $kj = -i$,
$ik = -j$. The product of two quaternions $e \cdot f$ (written in matrix notation) is

$(e_0 + i e_1 + j e_2 + k e_3) \cdot (f_0 + i f_1 + j f_2 + k f_3) \;\simeq\; \begin{pmatrix} e_0 & -e_1 & -e_2 & -e_3 \\ e_1 & e_0 & -e_3 & e_2 \\ e_2 & e_3 & e_0 & -e_1 \\ e_3 & -e_2 & e_1 & e_0 \end{pmatrix} \begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \end{pmatrix}.$    (5.25)

We see (in the lower right $3 \times 3$ block, printed in grey in the original) that in
dimensions 1, 2, 3 appears a skew-symmetric matrix $E$
whose structure is familiar to us. This part of the matrix changes sign if the two
factors are permuted.

An important discovery, for three-dimensional applications of the quaternions,
is the following: if a quaternion $p$ is a 3-vector (i.e., has $p_0 = 0$), then $p' = e \cdot p \cdot \bar e$ is
a 3-vector, too, and the map $p \mapsto p'$ is described by the matrix

$Q(e) = \|e\|^2 I + 2 e_0 E + 2 E^2, \qquad E = \begin{pmatrix} 0 & -e_3 & e_2 \\ e_3 & 0 & -e_1 \\ -e_2 & e_1 & 0 \end{pmatrix},$    (5.26)

where $\bar e = e_0 - i e_1 - j e_2 - k e_3$ and $\|e\|^2 = e \cdot \bar e = e_0^2 + e_1^2 + e_2^2 + e_3^2$.

Lemma 5.3. If $\|e\| = 1$, the matrix $Q(e)$ is orthogonal. Every orthogonal
matrix with $\det Q = 1$ can be written in this form. We have $Q(e)Q(f) = Q(ef)$, so
that the multiplication of orthogonal matrices corresponds to the multiplication of
quaternions.

Geometrically, the matrix $Q$ effects a rotation around the axis $\vec e = (e_1, e_2, e_3)^T$
with rotation angle $\varphi$ which satisfies $\tan(\varphi/2) = \|\vec e\| / e_0$.

Proof. The condition $Q^T Q = I$ can be verified directly using $E^T = -E$ and
$E^3 = -(e_1^2 + e_2^2 + e_3^2) E$. The reciprocal statement is a famous theorem of Euler; it
is based on the fact that $\vec e$ is an eigenvector of $Q$, which in dimension $3 \times 3$ always
exists. The formula for $Q(e)Q(f)$ follows from $e \cdot f \cdot p \cdot \bar f \cdot \bar e = (e \cdot f) \cdot p \cdot \overline{(e \cdot f)}$.

The geometric property follows from the virtues of the exterior product, because by
(5.1) the matrix $Q$ maps a vector $x$ to

$x + 2 e_0\, \vec e \times x + 2\, \vec e \times (\vec e \times x).$

This consists in a rectangular movement in a plane orthogonal to $\vec e$; first vertical to $x$ by
an amount $2 e_0 \|\vec e\|$ (times the distance of $x$), then parallel to $x$ by an amount $2 \|\vec e\|^2$.

Applying Pythagoras' Theorem as $(2 e_0 \|\vec e\|)^2 + (2\|\vec e\|^2 - 1)^2 = 1$, it turns out
that the map is norm preserving if $e_0^2 + \|\vec e\|^2 = 1$. The angle $\varphi/2$, whose tangent can
be seen to be $\|\vec e\| / e_0$, is an angle at the circumference of the circle for the rotation
angle $\varphi$ at the center. □
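The statements of Lemma 5.3 can be tried out in a few lines. This is only a sketch: quaternions are stored as NumPy arrays `(e0, e1, e2, e3)`, and the names `quat_mul` and `rotation_matrix` are illustrative choices, not routines from the text.

```python
import numpy as np

def quat_mul(e, f):
    """Quaternion product e·f, following the multiplication rules for i, j, k."""
    e0, ev = e[0], e[1:]
    f0, fv = f[0], f[1:]
    return np.concatenate(([e0 * f0 - ev @ fv],
                           e0 * fv + f0 * ev + np.cross(ev, fv)))

def rotation_matrix(e):
    """Q(e) = ||e||^2 I + 2 e0 E + 2 E^2, formula (5.26)."""
    e0, e1, e2, e3 = e
    E = np.array([[0.0, -e3, e2],
                  [e3, 0.0, -e1],
                  [-e2, e1, 0.0]])
    return (e @ e) * np.eye(3) + 2.0 * e0 * E + 2.0 * E @ E

e = np.array([0.4, 0.2, 0.4, 0.8])       # a unit quaternion
f = np.array([1.0, -2.0, 0.5, 3.0])
f = f / np.linalg.norm(f)                # normalized to unit length

Qe, Qf = rotation_matrix(e), rotation_matrix(f)

# Rotation angle from tan(phi/2) = ||e_vec|| / e0
phi = 2.0 * np.arctan2(np.linalg.norm(e[1:]), e[0])
```

With these definitions, `Qe` is orthogonal with determinant 1, leaves the axis `e[1:]` fixed, satisfies `trace(Qe) = 1 + 2 cos(phi)`, and `Qe @ Qf` equals `rotation_matrix(quat_mul(e, f))` up to rounding.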





For an efficient implementation of Algorithm 5.1 we represent the orthogonal
matrices $Q_0$, $Q_1$, and $I + hZ$ by quaternions. This reduces the dimension of the
systems, and step 2 becomes a simple multiplication of quaternions. For solving
the nonlinear system of step 1, we let $I + hZ = Q(e)$. With the values of $\alpha_i$
from (5.24) and with $\mathrm{skew}\,(hZD) = 2 e_0\, \mathrm{skew}\,(ED) + 2\, \mathrm{skew}\,(E^2 D)$, the equation
(5.23) becomes

$\begin{pmatrix} h\alpha_1 \\ h\alpha_2 \\ h\alpha_3 \end{pmatrix} = 2 e_0 \begin{pmatrix} I_1 e_1 \\ I_2 e_2 \\ I_3 e_3 \end{pmatrix} + 2 \begin{pmatrix} (I_3 - I_2)\, e_2 e_3 \\ (I_1 - I_3)\, e_3 e_1 \\ (I_2 - I_1)\, e_1 e_2 \end{pmatrix},$    (5.27)

which, together with $e_0^2 + e_1^2 + e_2^2 + e_3^2 = 1$, represent four quadratic equations for
four unknowns. We solve them very quickly by a few fixed-point iterations: update
successively $e_i$ from the $i$th equation of (5.27) and then $e_0$ from the
normalization condition. A Fortran subroutine RATORI for this algorithm is available on the
homepage <http://www.unige.ch/~hairer>.

Fig. 5.3. Numerical solutions of the rigid body equations; without/with gravitation,
with/without symmetry. Initial values $y_{10} = 0.2$, $y_{20} = 1.0$, $y_{30} = 0.4$; initial position
of $Q_0$ determined by the quaternion $e_0 = 0.4$, $e_1 = 0.2$, $e_2 = 0.4$, $e_3 = 0.8$; moments
of inertia $I_1 = 0.5$, $I_2 = 0.85$ (0.5 in the symmetric case), $I_3 = 1$; step size $h = 0.1$,
integration interval $0 \le t \le 30$

Conservation of Casimir and Hamiltonian. It is interesting to note that, in the
absence of a potential, Algorithm 5.1 preserves exactly the Casimir $y_1^2 + y_2^2 + y_3^2$
and, more surprisingly, also the Hamiltonian $\frac12 (y_1^2/I_1 + y_2^2/I_2 + y_3^2/I_3)$. This can
be seen as follows: without any potential we have $\mathrm{skew}\,(Y_0) = \mathrm{skew}\,(ZD)$ and
$\mathrm{skew}\,(Y_1) = -\mathrm{skew}\,(Z^T D)$, so that the vectors $(y_{10}, y_{20}, y_{30})^T$ and $(y_{11}, y_{21}, y_{31})^T$
are equal to $u + v$ and $u - v$, respectively, where $u$ and $v$ are the vectors of the
right-hand side of (5.27). Since $u$ and $v$ are orthogonal, we have $\|u + v\| = \|u - v\|$,
which proves the conservation of the Casimir.

To prove the conservation of the Hamiltonian, we first multiply the relation
(5.27) with $G = \mathrm{diag}(1/\sqrt{I_1}, 1/\sqrt{I_2}, 1/\sqrt{I_3})$, and then apply the same arguments.
The vectors $Gu$ and $Gv$ are still orthogonal.
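The fixed-point iteration and the two conservation properties can be illustrated for the force-free case. The sketch below is not the RATORI subroutine; it is a small Python illustration under the assumption that, for $U = 0$, the quantities $h\alpha_i$ in (5.27) are read as $h\,y_{i0}$ (the entries of $Y_0 - Y_0^T$), so that $h\,y_0 = u + v$ and $h\,y_1 = u - v$. The function names and the fixed iteration count are hypothetical choices.

```python
import numpy as np

def quat_mul(e, f):
    """Quaternion product e·f for arrays (e0, e1, e2, e3)."""
    e0, ev = e[0], e[1:]
    f0, fv = f[0], f[1:]
    return np.concatenate(([e0 * f0 - ev @ fv],
                           e0 * fv + f0 * ev + np.cross(ev, fv)))

def rattle_free_step(y0, q0, inertia, h, iters=50):
    """One step (y0, q0) -> (y1, q1) of Algorithm 5.1 with U = 0 (sketch)."""
    I1, I2, I3 = inertia

    def coupling(e):
        # Quadratic terms of (5.27), without the factor 2.
        return np.array([(I3 - I2) * e[1] * e[2],
                         (I1 - I3) * e[2] * e[0],
                         (I2 - I1) * e[0] * e[1]])

    e0, e = 1.0, np.zeros(3)          # start from the identity rotation
    for _ in range(iters):
        for i in range(3):            # update e_i from the i-th equation
            e[i] = (0.5 * h * y0[i] - coupling(e)[i]) / (e0 * inertia[i])
        e0 = np.sqrt(1.0 - e @ e)     # normalization condition

    y1 = (2.0 * e0 * inertia * e - 2.0 * coupling(e)) / h   # h*y1 = u - v
    q1 = quat_mul(q0, np.concatenate(([e0], e)))            # Q1 = Q0 Q(e)
    return y1, q1

# Data of Fig. 5.3 (asymmetric, force-free case)
inertia = np.array([0.5, 0.85, 1.0])
y = np.array([0.2, 1.0, 0.4])
q = np.array([0.4, 0.2, 0.4, 0.8])
h = 0.1

casimir0 = y @ y
energy0 = 0.5 * np.sum(y**2 / inertia)
for _ in range(300):                  # integrate over 0 <= t <= 30
    y, q = rattle_free_step(y, q, inertia, h)
casimir_drift = abs(y @ y - casimir0)
energy_drift = abs(0.5 * np.sum(y**2 / inertia) - energy0)
```

In exact arithmetic both drifts vanish, since $u \perp v$ and $Gu \perp Gv$; in floating point they stay at the level of rounding errors. A production code would replace the fixed iteration count by a convergence test.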

Example 5.4 (Force-Free and Heavy Top). We present in Fig. 5.3 the numerical
solutions $y_i$ obtained by the above algorithm. In the case of the heavy top, we assume
the centre of gravity to be $(0, 0, 1)$ in the body frame, and assume that the third
coordinate of the stationary frame is vertical. The potential energy due to gravity is
then given by $U(Q) = q_{33}$ and, expressed by quaternions (5.26), it is $U = e_0^2 -
e_1^2 - e_2^2 + e_3^2$.


(II) Splitting Methods 

As before we consider the differential equation (5.21) for the angular momenta in
the body $y_1, y_2, y_3$ together with the differential equation (5.12) for the rotation
matrix $Q$. An obvious splitting in the presence of a potential is

$\Phi_h = \varphi^U_{h/2} \circ \Phi^T_h \circ \varphi^U_{h/2},$    (5.28)

where $\varphi^U_t$ represents the exact flow of

$\dot Q = 0, \qquad \mathrm{skew}\,(\dot Y) = -\mathrm{skew}\,(Q^T \nabla U(Q)),$

and $\Phi^T_h$ is a suitable numerical approximation of the system corresponding to
kinetic energy only, i.e., without any potential $U(Q)$. The use of splitting techniques
for rigid body dynamics was proposed by Touma & Wisdom (1994),
McLachlan (1993), Reich (1994), and Dullweber, Leimkuhler & McLachlan (1997). Fassò
(2003) presents a careful study and comparison of various ways of splitting the
kinetic energy.

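Computation of $\Phi^T_h$. We do this by splitting once again, by letting several moments
of inertia tend to infinity (and the corresponding $\omega_i$ tend to zero). In order
to avoid formal difficulties with infinite denominators, we write the system (5.10)
together with (5.12) in the form

$\begin{pmatrix} \dot y_1 \\ \dot y_2 \\ \dot y_3 \end{pmatrix} = \begin{pmatrix} 0 & -y_3 & y_2 \\ y_3 & 0 & -y_1 \\ -y_2 & y_1 & 0 \end{pmatrix} \begin{pmatrix} T_{y_1}(y) \\ T_{y_2}(y) \\ T_{y_3}(y) \end{pmatrix}$    (5.29)

$\dot Q = Q \begin{pmatrix} 0 & -T_{y_3}(y) & T_{y_2}(y) \\ T_{y_3}(y) & 0 & -T_{y_1}(y) \\ -T_{y_2}(y) & T_{y_1}(y) & 0 \end{pmatrix},$    (5.30)

where $T(y) = \frac12 (y_1^2/I_1 + y_2^2/I_2 + y_3^2/I_3)$ is the kinetic energy, and $T_{y_i}(y)$ denote
partial derivatives.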

Three Rotations Splitting. An obvious splitting of the kinetic energy is

$T(y) = R_1(y) + R_2(y) + R_3(y), \qquad R_i(y) = y_i^2 / (2 I_i),$    (5.31)

which results in the numerical method

$\Phi^T_h = \varphi^{R_3}_{h/2} \circ \varphi^{R_2}_{h/2} \circ \varphi^{R_1}_h \circ \varphi^{R_2}_{h/2} \circ \varphi^{R_3}_{h/2},$

where $\varphi^{R_1}_t$ is the exact flow of (5.29)-(5.30) with $T(y)$ replaced by $R_1(y)$. The flow
$\varphi^{R_1}_t$ is easily obtained: $y_1$ remains constant and the second and third equation in
(5.29) boil down to the harmonic oscillator. We obtain

$y(t) = S(\alpha t)\, y(0), \qquad Q(t) = Q(0)\, S(\alpha t)^T$    (5.32)

with $\alpha = y_1(0)/I_1$ and the rotation matrix

$S(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}.$

Similar simple formulas are obtained for the exact flows corresponding to $R_2$
and $R_3$.
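A compact sketch of the three rotations splitting follows. It works with rotation matrices rather than quaternions for readability, and it assumes the symmetric ordering $R_3, R_2, R_1, R_2, R_3$ of the five sub-flows; the function names and the cyclic index trick for $R_2$ and $R_3$ are illustrative choices.

```python
import numpy as np

def rotation_flow(y, Q, axis, inertia, t):
    """Exact flow of (5.29)-(5.30) with T replaced by y_axis^2 / (2 I_axis)."""
    angle = t * y[axis] / inertia[axis]          # alpha * t, with alpha constant
    j, k = (axis + 1) % 3, (axis + 2) % 3        # the two rotating components
    S = np.eye(3)
    S[j, j] = S[k, k] = np.cos(angle)
    S[j, k] = np.sin(angle)
    S[k, j] = -np.sin(angle)
    return S @ y, Q @ S.T                        # formula (5.32) for axis = 0

def three_rotations_step(y, Q, inertia, h):
    """Symmetric composition of the exact flows of R_3, R_2, R_1, R_2, R_3."""
    for axis, t in [(2, h / 2), (1, h / 2), (0, h), (1, h / 2), (2, h / 2)]:
        y, Q = rotation_flow(y, Q, axis, inertia, t)
    return y, Q

inertia = np.array([0.5, 0.85, 1.0])
y = np.array([0.2, 1.0, 0.4])
Q = np.eye(3)

casimir0 = y @ y
spatial_momentum0 = Q @ y                        # angular momentum in the fixed frame
for _ in range(300):
    y, Q = three_rotations_step(y, Q, inertia, 0.1)
```

Each sub-flow rotates $y$ and multiplies $Q$ by the transposed rotation, so the Casimir $\|y\|^2$, the orthogonality of $Q$, and the product $Qy$ are preserved up to rounding; the kinetic energy is only approximately conserved.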


Symmetric + Rotation Splitting. It is often advantageous, in particular for a nearly
symmetric body ($I_1 \approx I_2$), to consider the splitting

$T(y) = R(y) + S(y), \qquad R(y) = \Bigl( \frac{1}{I_1} - \frac{1}{I_2} \Bigr) \frac{y_1^2}{2}, \qquad S(y) = \frac12 \Bigl( \frac{y_1^2 + y_2^2}{I_2} + \frac{y_3^2}{I_3} \Bigr)$

and the corresponding numerical integrator

$\Phi^T_h = \varphi^R_{h/2} \circ \varphi^S_h \circ \varphi^R_{h/2}.$

The exact flow $\varphi^R_t$ is the same as (5.32) with $I_1^{-1}$ replaced by $I_1^{-1} - I_2^{-1}$. The flow $\varphi^S_t$
of the symmetric force-free top possesses simple analytic formulas, too (see the
first picture of Fig. 5.3): we observe a precession of $y$ with constant speed around a
cone and a rotation of the body around $\omega$ with constant speed. Therefore

$y(t) = B(\beta t)\, y(0), \qquad Q(t) = Q(0)\, A(t)\, B(\beta t)^T,$    (5.33)

where $\beta = (I_3^{-1} - I_2^{-1})\, y_3(0)$, and

$A(t) = \exp \Bigl( \frac{t}{I_2} \begin{pmatrix} 0 & -y_3(0) & y_2(0) \\ y_3(0) & 0 & -y_1(0) \\ -y_2(0) & y_1(0) & 0 \end{pmatrix} \Bigr), \qquad B(\theta) = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$

This can also be checked directly by differentiation.
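The check by differentiation can also be done numerically with central differences. The sketch below assumes the cross-product arrangement of skew-symmetric matrices and evaluates the matrix exponential by the Rodrigues formula; all function names and sample values are illustrative.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix with hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_hat(v):
    """exp(hat(v)) via the Rodrigues formula."""
    theta = np.linalg.norm(v)
    K = hat(v)
    if theta < 1e-14:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def B(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def symmetric_top_flow(y0, Q0, I2, I3, t):
    """Formula (5.33): exact flow for the Hamiltonian S(y)."""
    beta = (1.0 / I3 - 1.0 / I2) * y0[2]
    A = expm_hat(t / I2 * y0)
    return B(beta * t) @ y0, Q0 @ A @ B(beta * t).T

I2, I3 = 0.85, 1.0
y0 = np.array([0.2, 1.0, 0.4])
Q0 = np.eye(3)
t, dt = 0.7, 1e-5

y, Q = symmetric_top_flow(y0, Q0, I2, I3, t)
y_plus, Q_plus = symmetric_top_flow(y0, Q0, I2, I3, t + dt)
y_minus, Q_minus = symmetric_top_flow(y0, Q0, I2, I3, t - dt)

grad_S = np.array([y[0] / I2, y[1] / I2, y[2] / I3])   # gradient of S at y(t)
ydot_fd = (y_plus - y_minus) / (2 * dt)                # approximates dy/dt
Qdot_fd = (Q_plus - Q_minus) / (2 * dt)                # approximates dQ/dt
```

Up to the finite-difference error, `ydot_fd` matches `np.cross(y, grad_S)` and `Qdot_fd` matches `Q @ hat(grad_S)`, i.e., (5.29)-(5.30) with $T$ replaced by $S$.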

Similar to the previous algorithm it is advantageous to use a representation of
the appearing orthogonal matrices by quaternions. The correspondence between the
orthogonal rotation matrices appearing in (5.32) and (5.33) and their quaternions is,
in accordance with Lemma 5.3, the following:

$S(\theta)^T \;\leftrightarrow\; \cos(\theta/2) + i \sin(\theta/2)$
$B(\theta)^T \;\leftrightarrow\; \cos(\theta/2) + k \sin(\theta/2)$
$A(t) \;\leftrightarrow\; \cos(\vartheta/2) + \alpha^{-1} \sin(\vartheta/2)\, \bigl( i y_1(0) + j y_2(0) + k y_3(0) \bigr),$

where $\alpha = \sqrt{y_1(0)^2 + y_2(0)^2 + y_3(0)^2}$ and $\vartheta = \alpha t / I_2$. The matrix
multiplications in the algorithm can therefore be done very efficiently. A Fortran subroutine
QUATER for the “symmetric + rotation splitting” algorithm is available on the
homepage <http://www.unige.ch/~hairer>.


VII.5.4 Lie-Poisson Systems

In Sect. VII.5.1 we have seen that the reduction of the equations of motion of the
rigid body leads to the Poisson system (5.10) with a structure matrix whose entries
are linear functions. Here we consider more general Poisson systems

$\dot y = B(y) \nabla H(y),$    (5.34)

where the structure matrix $B(y)$ depends linearly on $y$, i.e.,

$b_{ij}(y) = \sum_{k=1}^n C_{ji}^k\, y_k \qquad\text{for}\quad i, j = 1, \ldots, n.$    (5.35)

Such systems, called Lie-Poisson systems, are closely related to differential
equations on the dual of Lie algebras; see Marsden & Ratiu (1999), Chaps. 13 and 14,
for an in-depth discussion of this theory.

Recall that a Lie algebra is a vector space with a bracket which is anti-symmetric
and satisfies the Jacobi identity (Sect. IV.6). Let $E_1, \ldots, E_n$ be a basis of a vector
space, and define a bracket by

$[E_i, E_j] = \sum_{k=1}^n C_{ij}^k\, E_k$    (5.36)

with $C_{ij}^k$ from (5.35). If the structure matrix $B(y)$ of (5.35) is skew-symmetric and
satisfies (2.10), then this bracket makes the vector space a Lie algebra (the
verification is left as an exercise). The coefficients $C_{ij}^k$ are called structure constants of the
Lie algebra. Conversely, if we start from a Lie algebra with bracket given by (5.36),
then the matrix $B(y)$ defined by (5.35) is the structure matrix of a Poisson bracket.
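The passage from structure constants to the structure matrix can be made concrete for the Lie algebra of skew-symmetric $3 \times 3$ matrices. The sketch below assumes the standard cross-product basis and the matrix commutator as bracket; the helper names are illustrative.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix with hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vec(M):
    """Coordinates of a skew-symmetric matrix in the basis E_1, E_2, E_3."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

basis = [hat(np.eye(3)[k]) for k in range(3)]     # E_1, E_2, E_3

# Structure constants from [E_i, E_j] = sum_k C_ij^k E_k, formula (5.36)
C = np.zeros((3, 3, 3))
for i in range(3):
    for j in range(3):
        bracket = basis[i] @ basis[j] - basis[j] @ basis[i]
        C[i, j, :] = vec(bracket)

def structure_matrix(y):
    """b_ij(y) = sum_k C_ji^k y_k, formula (5.35)."""
    return np.einsum('jik,k->ij', C, y)

y = np.array([0.2, 1.0, 0.4])
B = structure_matrix(y)
```

For this basis `B` coincides with `hat(y)`, the structure matrix of the rigid body equations (5.29), and it is skew-symmetric as required.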

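Let $\mathfrak g$ be a Lie algebra with a basis $E_1, \ldots, E_n$, and let $\mathfrak g^*$ be the dual of the Lie
algebra, i.e., the vector space of all linear forms $Y : \mathfrak g \to \mathbb R$. The duality is written
as $\langle Y, X \rangle$ for $Y \in \mathfrak g^*$ and $X \in \mathfrak g$. We denote by $F_1, \ldots, F_n$ the dual basis defined
by $\langle F_i, E_j \rangle = \delta_{ij}$, the Kronecker $\delta$.

Theorem 5.5. Let $\mathfrak g$ be a Lie algebra with basis $E_1, \ldots, E_n$ satisfying (5.36). To
$y = (y_1, \ldots, y_n)^T \in \mathbb R^n$ we associate $Y = \sum_{j=1}^n y_j F_j \in \mathfrak g^*$, and we consider a
Hamiltonian$^6$ $H(y) = H(Y)$.

Then, the Poisson system $\dot y = B(y) \nabla H(y)$ with $B(y)$ given by (5.35) is
equivalent to the following differential equation on the dual $\mathfrak g^*$:

$\langle \dot Y, X \rangle = \langle Y, [H'(Y), X] \rangle \qquad\text{for all}\quad X \in \mathfrak g,$    (5.37)

where $H'(Y) = \sum_{j=1}^n \frac{\partial H(y)}{\partial y_j}\, E_j$.

$^6$ We use the same symbol $H$ for the functions $H : \mathbb R^n \to \mathbb R$ and $H : \mathfrak g^* \to \mathbb R$.

Proof. Differentiating $H(y) = H(Y)$ with respect to $y_i$ gives

$\frac{\partial H(y)}{\partial y_i} = H'(Y) F_i = \langle F_i, H'(Y) \rangle \qquad\text{and}\qquad H'(Y) = \sum_{j=1}^n \frac{\partial H(y)}{\partial y_j}\, E_j.$

Here we have used the identification $(\mathfrak g^*)^* = \mathfrak g$, because $H'(Y)$ is actually an
element of $(\mathfrak g^*)^*$. With this formula for $H'(Y)$ we are able to compute

$\langle Y, [H'(Y), E_i] \rangle = \Bigl\langle Y, \sum_{j=1}^n \frac{\partial H(y)}{\partial y_j}\, [E_j, E_i] \Bigr\rangle = \sum_{j=1}^n \sum_{k=1}^n \frac{\partial H(y)}{\partial y_j}\, C_{ji}^k\, \langle Y, E_k \rangle,$

where we have used (5.36). Since $\langle \dot Y, E_i \rangle = \dot y_i$ and $\langle Y, E_k \rangle = y_k$, this shows that
the differential equation (5.37) is equivalent to

$\dot y_i = \sum_{j=1}^n \Bigl( \sum_{k=1}^n C_{ji}^k\, y_k \Bigr) \frac{\partial H(y)}{\partial y_j},$

which is nothing more than $\dot y = B(y) \nabla H(y)$ with $B(y)$ from (5.35). □

We remark that (5.37) can be reformulated as

$\dot Y = \mathrm{ad}^*_{H'(Y)}\, Y,$

where $\mathrm{ad}^*_A$ is the adjoint of the operator $\mathrm{ad}_A(X) = [A, X]$.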

Equation (5.37) is similar in appearance to the Lie bracket equation $\dot L =
[A(L), L] = \mathrm{ad}_{A(L)} L$ of Sect. IV.3.2. When $\mathfrak g$ is the Lie algebra of a matrix Lie
group $G$, then solutions of that equation are of the form $L(t) = \mathrm{Ad}_{U(t)} L_0$ where

$\mathrm{Ad}_U X = U X U^{-1};$    (5.38)

see the proof of Lemma IV.3.4. Similarly, for the solution of (5.37) we have the
following.

Theorem 5.6. Consider a matrix Lie group $G$ with Lie algebra $\mathfrak g$. Then, the solution
$Y(t) \in \mathfrak g^*$ of (5.37) with initial value $Y_0 \in \mathfrak g^*$ is given by

$\langle Y(t), X \rangle = \langle Y_0, U(t)^{-1} X U(t) \rangle \qquad\text{for all}\quad X \in \mathfrak g,$    (5.39)

where $U(t) \in G$ satisfies

$\dot U = -H'(Y(t))\, U, \qquad U(0) = I.$    (5.40)

Equation (5.39) can be written as $Y(t) = \mathrm{Ad}^*_{U(t)^{-1}} Y_0$,
where $\mathrm{Ad}^*_{U^{-1}}$ is the adjoint of $\mathrm{Ad}_{U^{-1}}$. The solution $Y(t)$ of (5.37) thus lies on the
coadjoint orbit

$Y(t) \in \{ \mathrm{Ad}^*_{U^{-1}} Y_0 \,;\; U \in G \}.$    (5.41)

In coordinates $Y = \sum_{j=1}^n y_j F_j$, we note $y_j = \langle Y_0, U(t)^{-1} E_j U(t) \rangle$.

Proof. Differentiating the ansatz (5.39) for the solution we obtain

$\langle \dot Y, X \rangle = \langle Y_0, -U^{-1} \dot U U^{-1} X U + U^{-1} X \dot U \rangle$
$\qquad = \langle Y_0, U^{-1} [X, \dot U U^{-1}]\, U \rangle = \langle Y, [X, \dot U U^{-1}] \rangle,$

where we have used (5.39) in the first and the last equation. This shows that (5.37)
is satisfied with the choice $\dot U U^{-1} = -H'(Y)$. □

Example 5.7 (Rigid Body). The Lie group corresponding to the rigid body is
SO(3) with the Lie algebra $\mathfrak{so}(3)$ of skew-symmetric $3 \times 3$ matrices, with the basis

$E_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad E_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$

If we let $x = (x_1, x_2, x_3)^T$ be the coordinates of $X \in \mathfrak{so}(3)$, then we have $Xv =
x \times v$ for all $v \in \mathbb R^3$. Since for $U \in SO(3)$,

$U^{-1} X U v = U^{-1} (x \times U v) = U^{-1} x \times v,$

the vector $U^{-1} x$ consists of the coordinates of $U^{-1} X U \in \mathfrak{so}(3)$.

Let $y = (y_1, y_2, y_3)^T$ be the coordinates of $Y \in \mathfrak{so}(3)^*$ with respect to the dual
basis of $E_1, E_2, E_3$. Since

$\langle Y, U^{-1} X U \rangle = \Bigl\langle \sum_{j=1}^3 y_j F_j, \sum_{i=1}^3 (U^{-1} x)_i E_i \Bigr\rangle = y^T U^{-1} x = (U y)^T x,$

the coordinates of $\mathrm{Ad}^*_{U^{-1}} Y$ are given by the vector $Uy$. Therefore, the coordinates
of the coadjoint orbit of $Y$ lie on a sphere of radius $\|y\|$. The conservation of the
coadjoint orbit thus reduces here to the preservation of the Casimir $C(y) = y_1^2 +
y_2^2 + y_3^2$.
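The coordinate computations of this example can be reproduced numerically. In the sketch below the rotation $U$ is generated from a QR factorization of a random matrix, which is merely a convenient way to obtain some element of SO(3); the helper names and sample vectors are illustrative.

```python
import numpy as np

def hat(w):
    """Matrix X with X @ v == np.cross(w, v); w holds the coordinates of X."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vec(M):
    """Coordinates of a skew-symmetric matrix in the basis E_1, E_2, E_3."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(U) < 0:
    U[:, 0] = -U[:, 0]               # flip one column so that det U = +1

x = np.array([0.5, -1.0, 2.0])       # coordinates of X in so(3)
y = np.array([0.2, 1.0, 0.4])        # coordinates of Y in so(3)^*
X = hat(x)

conjugated = U.T @ X @ U             # U^{-1} X U, again skew-symmetric
coords = vec(conjugated)             # its coordinates
pairing = y @ coords                 # <Y, U^{-1} X U> in coordinates
```

Here `coords` equals `U.T @ x`, the pairing equals `(U @ y) @ x`, and `U @ y` has the same Euclidean norm as `y`, so the coadjoint orbit is indeed the sphere of radius $\|y\|$.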

Lie-Poisson integrators seem to have first been considered by Ge & Marsden
(1988), who extend the construction of symplectic methods by generating functions
to Lie-Poisson systems. Channell & Scovel (1991) propose an implementation of
these methods based on a coordinatization of the group by the exponential map.

McLachlan (1993) proposes integrators based on splitting the Hamiltonian and
illustrates this approach for various examples of Lie-Poisson systems. When
applicable, such splitting integrators yield Poisson integrators that preserve the
coadjoint orbits, since they are composed of exact flows of Lie-Poisson systems.

Engø & Faltinsen (2001) propose to solve numerically the Lie-Poisson system
(5.34) by applying Lie group integrators such as those of Sect. IV.8 to the differential
equation (5.40) with (5.39). This approach keeps the solution on the coadjoint orbit
by construction, but it does not, in general, give a Poisson integrator.


VII.5.5 Lie-Poisson Reduction

The reduction of the Hamiltonian equations of motion of the free rigid body to
the Euler equations is an instance of a general reduction process from Hamiltonian
systems with symmetry on a Lie group to Lie-Poisson systems, which we now
describe; cf. Marsden & Ratiu (1999), Chap. 13, for a presentation in a more abstract
framework and for an historical account.

Let us assume that the Lie group $G$ is a subgroup of $GL(n)$ given by

$G = \{ Q \,;\; g_i(Q) = 0, \ i = 1, \ldots, m \},$    (5.42)

and consider a Hamiltonian system on $G$,

$\dot P = -\nabla_Q H(P, Q) - \sum_{i=1}^m \lambda_i \nabla_Q g_i(Q), \qquad \dot Q = \nabla_P H(P, Q)$
$0 = g_i(Q), \qquad i = 1, \ldots, m,$    (5.43)

where $P, Q$ are square matrices, and $\nabla_Q H = (\partial H / \partial Q_{ij})$. This is of the form
discussed in Sect. VII.1.2. In regions where the matrix

$\bigl( \langle \nabla_Q g_i(Q), \nabla_Q g_j(Q) \rangle \bigr)_{i,j=1}^m$ is invertible,    (5.44)

the Lagrange parameters $\lambda_i$ can be expressed in terms of $P$ and $Q$ (cf. formula
(1.13)). Hence, a unique solution exists locally provided the initial values $(P_0, Q_0)$
are consistent, i.e., $g_i(Q_0) = 0$ and

$g_i'(Q_0) (\nabla_P H(P_0, Q_0)) = \mathrm{trace}\, \bigl( \nabla_Q g_i(Q_0)^T \nabla_P H(P_0, Q_0) \bigr) = 0,$

or equivalently, $Q_0 \in G$ and $\nabla_P H(P_0, Q_0) \in T_{Q_0} G$.

We now assume that the Hamiltonian $H$ is quadratic in $P$. As we have seen in
Sect. VII.1.2, the equations (5.43) can be viewed as a differential equation on the
cotangent bundle $T^*G = \{ (P, Q) \,;\; Q \in G,\ P \in T_Q^* G \}$, where the cotangent space
$T_Q^* G$ is identified with a subspace of matrices such that

$P \in T_Q^* G$ if and only if $\nabla_P H(P, Q) \in T_Q G.$    (5.45)

With this identification, the duality between $T_Q^* G$ and $T_Q G$ is given by the matrix
inner product

$\langle P, V \rangle = \mathrm{trace}\,(P^T V) \qquad\text{for}\quad P \in T_Q^* G,\ V \in T_Q G.$

We call the Hamiltonian left-invariant, if

$H(U^T P, U^{-1} Q) = H(P, Q) \qquad\text{for all}\quad U \in G.$    (5.46)

In this case we have $H(P, Q) = H(Q^T P, I)$ and by differentiating we obtain
$\nabla_P H(P, Q) = Q \nabla_P H(Q^T P, I)$. By (5.45) and since $T_Q G = \{ QX \,;\; X \in \mathfrak g \}$
with the Lie algebra $\mathfrak g = T_I G$ (cf. Sect. IV.6), this relation implies

$P \in T_Q^* G$ if and only if $Q^T P \in T_I^* G = \mathfrak g^*.$    (5.47)

Now $H(P, Q)$ depends, for $(P, Q) \in T^*G$, only on the product $Y = Q^T P \in \mathfrak g^*$,
and we write$^7$ $H(P, Q) = H(Y)$ with a function $H : \mathfrak g^* \to \mathbb R$.

Left-invariant Hamiltonian systems can be reduced to a Lie-Poisson system of 
half the dimension by a process that generalizes the derivation of the Euler equations 
for the rigid body. 

Theorem 5.8. Consider a Hamiltonian system (5.43) on a matrix Lie group $G$
with a left-invariant quadratic Hamiltonian $H(P, Q) = H(Y)$ for $Y = Q^T P$. If
$(P(t), Q(t)) \in T^*G$ is a solution of the system (5.43), then $Y(t) = Q(t)^T P(t) \in \mathfrak g^*$
solves the differential equation (5.37).

Proof. It is convenient for the proof (though not necessary, see the lines
following (2.17)) to extend the Hamiltonian $H : \mathfrak g^* \to \mathbb R$ to a function of
arbitrary matrices $Y$ by setting $H(Y) = H(\Pi Y)$, where $\Pi$ is the projection onto
$\mathfrak g^*$ given by $\langle \Pi Y, X \rangle = \langle Y, X \rangle$ for all $X \in \mathfrak g$, with the matrix inner product
$\langle Y, X \rangle = \mathrm{trace}\,(Y^T X)$.

We first compute the derivatives of $H(P, Q) = H(Y) = H(\Pi Y) = H(y)$
where $Q^T P = Y$ and, using the notation of Theorem 5.5, $\Pi Y = \sum_{j=1}^d y_j F_j$.
Since $y_j = \langle \Pi Q^T P, E_j \rangle = \langle Q^T P, E_j \rangle$, it follows from $\nabla_A\, \mathrm{trace}\,(A^T B) = B$ that

$\nabla_P H(P, Q) = \sum_{j=1}^d \frac{\partial H(y)}{\partial y_j}\, \nabla_P\, \mathrm{trace}\,(P^T Q E_j) = \sum_{j=1}^d \frac{\partial H(y)}{\partial y_j}\, Q E_j = Q H'(Y),$    (5.48)

where $H'(Y) = \sum_{j=1}^d \frac{\partial H(y)}{\partial y_j}\, E_j \in \mathfrak g$ as in Theorem 5.5. Using the identity $y_j =
\mathrm{trace}\,(P^T Q E_j) = \mathrm{trace}\,(Q^T P E_j^T)$ we get in a similar way

$\nabla_Q H(P, Q) = P H'(Y)^T.$    (5.49)

Consequently, the differential equations (5.43) become

$\dot P = -P H'(Q^T P)^T - \sum_{i=1}^m \lambda_i \nabla_Q g_i(Q), \qquad \dot Q = Q H'(Q^T P).$    (5.50)

The product rule $\dot Y = \dot Q^T P + Q^T \dot P$ for $Y = Q^T P$ thus yields

$\dot Y = H'(Y)^T Y - Y H'(Y)^T - \sum_{i=1}^m \lambda_i Q^T \nabla_Q g_i(Q).$    (5.51)

$^7$ We use again the same letter for different functions. Since they have either one or two
arguments, no confusion should arise.

For $X \in \mathfrak g$, we now exploit the properties

$\langle Q^T \nabla_Q g_i(Q), X \rangle = \langle \nabla_Q g_i(Q), QX \rangle = 0$    (because $QX \in T_Q G$)
$\langle [H'(Y)^T, Y], X \rangle = \mathrm{trace}\, \bigl( (Y^T H'(Y) - H'(Y) Y^T) X \bigr)$
$\qquad = \mathrm{trace}\, \bigl( Y^T (H'(Y) X - X H'(Y)) \bigr) = \langle Y, [H'(Y), X] \rangle.$

Since $Y(t) \in \mathfrak g^*$ for all $t$, this gives the equation (5.37). □

Reduced System and Reconstruction. Combining Theorems 5.8 and 5.5, we have
reduced the Hamiltonian system (5.43) to the Lie-Poisson system for $y(t) \in \mathbb R^d$,

$\dot y = B(y) \nabla H(y),$    (5.52)

of half the dimension. To recover the full solution $(P(t), Q(t)) \in T^*G$, we must
solve this system along with

$\dot Q = Q H'(Y), \qquad P = Q^{-T} Y,$    (5.53)

where $Y = \sum_{j=1}^d y_j F_j \in \mathfrak g^*$.

Poisson Structures. The Poisson bracket on $\mathbb R^d$ defined by $B(y)$ is still closely
related to the canonical Poisson bracket on $\mathbb R^{2n^2}$. Consider left-invariant real-valued
functions $K, L$ on $T^*G$. These can be viewed as functions on $T^*G / G = \mathfrak g^* \subset \mathbb R^{n \times n}$:

$K(P, Q) = K(Y) \qquad\text{for}\quad Y = Q^T P.$

(As we did previously in this section, we use the same symbol for these functions.)
Via the projection $\Pi : \mathbb R^{n \times n} \to \mathfrak g^*$ used in the above proof, we can extend
$K(Y) = K(\Pi Y)$ to arbitrary $n \times n$ matrices $Y$, and via the above relation to a
left-invariant function $K(P, Q)$ on $\mathbb R^{n \times n} \times \mathbb R^{n \times n}$, on which we have the canonical
Poisson bracket

$\{K, L\}_{\rm can} = \sum_{k,l=1}^n \Bigl( \frac{\partial K}{\partial Q_{kl}} \frac{\partial L}{\partial P_{kl}} - \frac{\partial K}{\partial P_{kl}} \frac{\partial L}{\partial Q_{kl}} \Bigr).$

On the other hand, we can view $K$ as a function on $\mathbb R^d$ by choosing coordinates
on $\mathfrak g^*$,

$K(y) = K(Y) \qquad\text{for}\quad Y = \sum_{j=1}^d y_j F_j.$

On $\mathbb R^d$ we have the Poisson bracket defined by the structure matrix $B(y)$,

$\{K, L\} = \sum_{i,j=1}^d \frac{\partial K}{\partial y_i}\, b_{ij}(y)\, \frac{\partial L}{\partial y_j}.$

Lemma 5.9. For left-invariant functions $K, L$ as described above, we have for
$Q^T P = Y = \sum_{j=1}^d y_j F_j \in \mathfrak g^*$

$\{K, L\}(y) = \langle Y, [L'(Y), K'(Y)] \rangle = \{K, L\}_{\rm can}(P, Q),$

where $K'(Y) = \sum_{j=1}^d \frac{\partial K}{\partial y_j}(y)\, E_j \in \mathfrak g$.

Proof. The first equality follows from the identity

$b_{ij}(y) = \langle Y, [E_j, E_i] \rangle,$

which is a direct consequence of the definition (5.35) with (5.36). For the second
equality, the relations (5.48) and (5.49) for $K$ and $L$ yield

$\{K, L\}_{\rm can}(P, Q) = \mathrm{trace}\, \bigl( K'(Y) Y^T L'(Y) - K'(Y)^T Y L'(Y)^T \bigr)$
$\qquad = \mathrm{trace}\, \bigl( K'(Y) Y^T L'(Y) - L'(Y) Y^T K'(Y) \bigr)$
$\qquad = \mathrm{trace}\, \bigl( Y^T [L'(Y), K'(Y)] \bigr) = \langle Y, [L'(Y), K'(Y)] \rangle,$

which is the result. □


Discrete Lie-Poisson Reduction. Consider a symplectic integrator

$(P_1, Q_1) = \Phi_h(P_0, Q_0)$ on $T^*G$

for the left-invariant Hamiltonian system (5.43), and suppose that the method
preserves the left-invariance: if $\Phi_h(P_0, Q_0) = (P_1, Q_1)$, then

$\Phi_h(U^T P_0, U^{-1} Q_0) = (U^T P_1, U^{-1} Q_1) \qquad\text{for all}\quad U \in G.$    (5.54)

For example, this is satisfied by the RATTLE algorithm. The method then induces a
one-step map

$Y_1 = \Psi_h(Y_0)$ on $\mathfrak g^*$

by setting $Y_1 = Q_1^T P_1$ for $(P_1, Q_1) = \Phi_h(P_0, Q_0)$ with $Q_0^T P_0 = Y_0$. This is a
numerical integrator for (5.37), and in the coordinates $y = (y_j)$ with respect to the
basis $(F_j)$ of $\mathfrak g^*$ this gives a map

$y_1 = \psi_h(y_0)$ on $\mathbb R^d$,

which is a numerical integrator for the Poisson system (5.52).

Example 5.10. For the rigid body, applying the RATTLE algorithm to the
constrained Hamiltonian system (5.18) yields the integrator for the Euler equations
discussed in Sect. VII.5.3. By the following result this is a Poisson integrator.

Theorem 5.11. If $\Phi_h(P, Q)$ is a symplectic and left-invariant integrator for (5.43),
then its reduction $\psi_h(y)$ is a Poisson map.

Proof. We write $\psi_h$ as the composition

$\psi_h : \ \mathbb R^d \xrightarrow{\ \xi\ } T^*G \xrightarrow{\ \Phi_h\ } T^*G \xrightarrow{\ \eta\ } \mathbb R^d,$

where $\eta = (\eta_j)$ is the function with $\eta_j(P, Q) = y_j$ for $Q^T P = \sum_{j=1}^d y_j F_j$, and $\xi$
is any right inverse of $\eta$, i.e., $\eta \circ \xi = \mathrm{id}$. For arbitrary smooth real-valued functions
$K, L$ on $\mathbb R^d$ we then have for $(P, Q) = \xi(y)$, using Lemma 5.9 in the outer equalities
and the symplecticity of $\Phi_h$ in the middle equality,

$\{K \circ \psi_h, L \circ \psi_h\}(y) = \{K \circ \eta \circ \Phi_h, L \circ \eta \circ \Phi_h\}_{\rm can}(P, Q)$
$\qquad = \{K \circ \eta, L \circ \eta\}_{\rm can}(\Phi_h(P, Q)) = \{K, L\}(\psi_h(y)).$

This equation states that $\psi_h$ is a Poisson map. □

A similar reduction in a discrete Lagrangian framework is studied by Marsden,
Pekarsky & Shkoller (1999).

The reduced numerical maps $\psi_h$ and $\Psi_h$ have further structure-preserving
properties: they preserve the Casimirs and the co-adjoint orbits. This will be shown in
Sect. IX.5.3 with the help of backward error analysis.


VII.6 Reduced Models of Quantum Dynamics 

To incorporate quantum effects in molecular dynamics simulations, computations
are done with models that are intermediate between classical molecular
dynamics based on Newton's equations of motion and full quantum dynamics described
by the $N$-particle Schrödinger equation. The direct computational treatment of the
latter is not feasible because of its high dimensionality. These intermediate
models are obtained by the Hamiltonian reduction (2.17) from an infinite-dimensional
Hilbert space to an appropriately chosen manifold. In chemical physics, this
reduction is known as the Dirac-Frenkel time-dependent variational principle. We
illustrate this procedure for the case where the quantum-mechanical wave function
is approximated by a complex Gaussian as proposed by Heller (1975). It turns out
that the resulting ordinary differential equations have a Poisson structure, which
was recently described by Faou & Lubich (2004). Following that paper, we derive
a structure-preserving explicit integrator for Gaussian wavepackets, which tends to
the Störmer-Verlet method in the classical limit.

VII.6.1 Hamiltonian Structure of the Schrödinger Equation

The introduction of wave mechanics stands ... as Schrödinger's
monument and a worthy one. (From Schrödinger's obituary
in The Times 1961; quoted from
http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Schrodinger.html)

The time-dependent $N$-body Schrödinger equation reads (see, e.g., Messiah (1999)
and Thaller (2000))

$i \varepsilon\, \frac{\partial \psi}{\partial t} = H \psi$    (6.1)

for the wave function $\psi = \psi(x, t)$ depending on the spatial variables $x =
(x_1, \ldots, x_N)$ with $x_k \in \mathbb R^d$ (e.g., with $d = 1$ or 3 in the partition) and the time
$t \in \mathbb R$. The squared absolute value $|\psi(x, t)|^2$ represents the joint probability density
for $N$ particles to be at the positions $x_1, \ldots, x_N$ at time $t$. In (6.1), $\varepsilon$ is a (small)
positive number representing the scaled Planck constant and $i$ is the complex imaginary
unit. The Hamiltonian operator $H$ is written

$H = T + V$

with the kinetic and potential energy operators

$T = -\sum_{k=1}^N \frac{\varepsilon^2}{2 m_k}\, \Delta_{x_k} \qquad\text{and}\qquad V = V(x),$

where $m_k > 0$ is a particle mass and $\Delta_{x_k}$ the Laplacian in the variable $x_k \in \mathbb R^d$,
and where the real-valued potential $V$ acts as a multiplication operator $(V\phi)(x) =
V(x)\phi(x)$. Under appropriate conditions on $V$ (boundedness of $V$ is sufficient,
but by no means necessary), the operator $H$ is then a self-adjoint operator on
the complex Hilbert space $L^2(\mathbb R^{dN}, \mathbb C)$ with domain $D(H) = D(T) = \{ \phi \in
L^2(\mathbb R^{dN}, \mathbb C) \,;\; T\phi \in L^2(\mathbb R^{dN}, \mathbb C) \}$; see Sect. V.5.3 of Kato (1980).

We separate the real and imaginary parts of $\psi = v + iw \in L^2(\mathbb R^{dN}, \mathbb C)$, the
complex Hilbert space of Lebesgue square-integrable functions. The functions $v$ and
$w$ are thus functions in the real Hilbert space $L^2(\mathbb R^{dN}, \mathbb R)$. We denote the complex
inner product by $\langle \cdot, \cdot \rangle$ and the real inner product by $(\cdot, \cdot)$. The $L^2$ norms will be
simply denoted by $\| \cdot \|$.

As $H$ is a real operator, formula (6.1) can be written

$\varepsilon \dot v = H w,$
$\varepsilon \dot w = -H v,$    (6.2)

or equivalently, with the canonical structure matrix

$J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$

and the Hamiltonian function (we use the same symbol $H$ as for the operator)

$H(v, w) = \frac12 \langle \psi, H\psi \rangle = \frac12 (v, Hv) + \frac12 (w, Hw)$

for $\psi = v + iw$ in the domain of the operator $H$. This becomes the canonical
Hamiltonian system

$\begin{pmatrix} \dot v \\ \dot w \end{pmatrix} = \varepsilon^{-1} J^{-1}\, \nabla H(v, w).$

Note that the real multiplication with $J$ corresponds to the complex multiplication
with the imaginary unit $i$. The flow of this system preserves the canonical symplectic
two-form

$\omega(\xi_1, \xi_2) = (J\xi_1, \xi_2), \qquad \xi_1, \xi_2 \in L^2(\mathbb R^{N}, \mathbb R)^2.$    (6.3)
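The equivalence between the complex equation (6.1) and its real Hamiltonian form can be illustrated in finite dimensions. The sketch below replaces the operator $H$ by a random real symmetric matrix and assumes the convention $J = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$, chosen so that multiplication by $J$ corresponds to multiplication by $i$; the dimension, the value of $\varepsilon$, and the variable names are arbitrary illustrative choices.

```python
import numpy as np

n, eps = 4, 0.1
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
H = M + M.T                                  # real symmetric stand-in for H

J = np.block([[np.zeros((n, n)), -np.eye(n)],
              [np.eye(n), np.zeros((n, n))]])

v, w = rng.standard_normal(n), rng.standard_normal(n)
psi = v + 1j * w                             # complex wave function
u = np.concatenate([v, w])                   # real form (v, w)

# Gradient of H(v, w) = (v, Hv)/2 + (w, Hw)/2
grad_H = np.concatenate([H @ v, H @ w])

udot = np.linalg.solve(J, grad_H) / eps      # eps * J * udot = grad H(u)
psidot = H @ psi / (1j * eps)                # i * eps * psidot = H psi

Ju = J @ u                                   # real counterpart of i * psi
```

The first and second halves of `udot` are the real and imaginary parts of `psidot`, they satisfy (6.2), and `Ju` represents `1j * psi`.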

VII.6.2 The Dirac-Frenkel Variational Principle 

For dealing with atoms involving many electrons the accurate quantum 
theory, involving a solution of the wave equation in many-dimensional 
space, is far too complicated to be practicable. One must therefore resort 
to approximate methods. (P.A.M. Dirac 1930)


Reduced models of the Schrödinger equation are obtained by restricting the equation
to an approximation manifold $\mathcal M$ via (2.17), viz.,

$(\varepsilon J \dot u - \nabla_u H(u), \xi) = 0 \qquad\text{for all}\quad \xi \in T_u \mathcal M,$    (6.4)

or equivalently in complex notation for $u = (v, w)^T = v + iw$,

$\mathrm{Re}\, \langle \varepsilon i \dot u - H u, \xi \rangle = 0 \qquad\text{for all}\quad \xi \in T_u \mathcal M.$    (6.5)

Taking the real part can be omitted if the tangent space $T_u \mathcal M$ is complex linear.
Equation (6.5) (usually without the real part) is known as the Dirac-Frenkel
time-dependent variational principle, after Dirac (1930) and Frenkel (1934); see also
McLachlan (1964), Heller (1976), Beck, Jäckle, Worth & Meyer (2000), and
references therein.

We choose a (local) coordinate map $u = \chi(y)$ of $\mathcal{M}$ and denote its derivative
$X_C(y) = V(y) + iW(y) = \chi'(y)$ or in the real setting as $X = \begin{pmatrix} V \\ W \end{pmatrix}$. Denoting
by $X^T$ the adjoint of $X$ with respect to the real inner product $(\cdot,\cdot)$, we thus obtain

$$\varepsilon X(y)^T J X(y)\,\dot y = X(y)^T \nabla_u H(\chi(y)).$$

With $X_C^*$ denoting the adjoint of $X_C$ with respect to the complex inner product $\langle\cdot,\cdot\rangle$,
we note $X_C^*X_C = (V^TV + W^TW) + i(V^TW - W^TV) = X^TX - iX^TJX$ and
hence

$$X^TJX = -\mathrm{Im}\, X_C^*X_C. \tag{6.6}$$

Lemma 6.1. If $T_u\mathcal{M}$ is a complex linear space for every $u \in \mathcal{M}$, then $\mathcal{M}$ is a
symplectic submanifold of $L^2(\mathbb{R}^N,\mathbb{R})^2$, that is, the symplectic two-form (6.3) is
non-degenerate on $T_u\mathcal{M}$ for all $u \in \mathcal{M}$. Expressed in coordinates,

$$X(y)^TJX(y) \ \text{ is invertible for all } y.$$

Proof. We fix $u = \chi(y) \in \mathcal{M}$ and omit the argument $y$ in the following. Since
$T_u\mathcal{M} = \mathrm{Range}(X_C)$ is complex linear by assumption, there exists a real linear
mapping $L : \mathbb{R}^m \to \mathbb{R}^m$ such that $iX_C\eta = X_CL\eta$ for all $\eta \in \mathbb{R}^m$. This implies

$$JX = XL \quad\text{and}\quad L^2 = -\mathrm{Id}$$

and hence $X^TJX = X^TXL$, which is invertible. □

Approximation properties of the Dirac-Frenkel variational principle can be obtained
from the interpretation as the orthogonal projection $\dot u = P(u)\,\frac{1}{i\varepsilon}Hu$, which
corresponds to taking the imaginary part in (6.5), as opposed to the symplectic projection
in (6.4) which corresponds to the real part. See Lubich (2005) for a near-optimality
result for approximation on the manifold.


VII.6.3 Gaussian Wavepacket Dynamics 

We develop a new approach to semiclassical dynamics which exploits the 
fact that extended wavefunctions for heavy particles (or particles in har¬ 
monic potentials) may be decomposed into time-dependent wave pack¬ 
ets, which spread minimally and which execute classical or nearly classi¬ 
cal trajectories. A Gaussian form for the wave packets is assumed and 
equations of motion are derived for the parameters characterizing the 
Gaussian. (E.J. Heller 1975) 

The variational Gaussian wavepacket dynamics of Heller (1976) is obtained by 
choosing the manifold M in (6.5) as consisting of complex Gaussians. For ease 
of presentation we restrict our attention in the following to the one-particle case 
N = 1 (the extension to N > 1 is straightforward; cf. Heller (1976) and Faou & 
Lubich (2004)). Here we have 


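$$\mathcal{M} = \{u = \chi(y) \in L^2(\mathbb{R}^d,\mathbb{C}) : y = (p,q,\alpha,\beta,\gamma,\delta) \in \mathbb{R}^{2d+4} \text{ with } \beta > 0\} \tag{6.7}$$

with

$$(\chi(y))(x) = \exp\Big(\frac{i}{\varepsilon}\big((\alpha + i\beta)\,|x-q|^2 + p\cdot(x-q) + \gamma + i\delta\big)\Big), \tag{6.8}$$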

where $|\cdot|$ and $\cdot$ stand for the Euclidean norm and inner product on $\mathbb{R}^d$. The
parameters $q$ and $p$ represent the average position and momentum, respectively: for
$u = \chi(y)$ with $y = (p,q,\alpha,\beta,\gamma,\delta)$ and $\|u\| = 1$, a direct calculation shows that

$$q = \langle u, xu\rangle = \int_{\mathbb{R}^d} x\,|u(x)|^2\,dx, \qquad p = \langle u, -i\varepsilon\nabla u\rangle.$$

The parameter $\beta > 0$ determines the width of the wavepacket. The tangent space
$T_u\mathcal{M} \subset L^2(\mathbb{R}^d,\mathbb{C})$ at a given point $u = \chi(y) \in \mathcal{M}$ is $(2d+4)$-dimensional and is
made of the elements of $L^2(\mathbb{R}^d,\mathbb{C})$ written as

$$\frac{i}{\varepsilon}\Big((A + iB)\,|x-q|^2 + \big(P - 2(\alpha + i\beta)Q\big)\cdot(x-q) - p\cdot Q + C + iD\Big)u \tag{6.9}$$

with arbitrary $(P,Q,A,B,C,D)^T \in \mathbb{R}^{2d+4}$. We note that $T_u\mathcal{M}$ is complex linear,
and $u \in T_u\mathcal{M}$. By choosing $\xi = iu$ in (6.5), this yields $(d/dt)\|u\|^2 = 2\,\mathrm{Re}\,\langle\dot u, u\rangle = 0$
and hence the preservation of the squared $L^2$ norm of $u = \chi(y)$, which is given
by

$$I(y) = \|u\|^2 = \int_{\mathbb{R}^d} |u(x)|^2\,dx = \exp\Big(-\frac{2\delta}{\varepsilon}\Big)\Big(\frac{\pi\varepsilon}{2\beta}\Big)^{d/2}. \tag{6.10}$$

The physically reasonable situation is $\|u\|^2 = 1$, which corresponds to the interpretation
of $|u(x)|^2$ as a probability density.

With these preparations we have the following result of Faou & Lubich (2004). 

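Theorem 6.2. The Hamiltonian reduction of the Schrödinger equation to the Gaussian
wavepacket manifold $\mathcal{M}$ of (6.7)-(6.8) yields the Poisson system

$$\dot y = B(y)\nabla K(y) \tag{6.11}$$

where, for $y = (p,q,\alpha,\beta,\gamma,\delta) \in \mathbb{R}^{2d+4}$ with $\beta > 0$, and with $1_d$ denoting the
$d$-dimensional identity,

$$B(y) = \frac{1}{I(y)}\begin{pmatrix}
0 & -1_d & 0 & 0 & -p & 0\\
1_d & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & \dfrac{4\beta^2}{\varepsilon d} & 0 & -\beta\\
0 & 0 & -\dfrac{4\beta^2}{\varepsilon d} & 0 & \beta & 0\\
p^T & 0 & 0 & -\beta & 0 & \dfrac{d+2}{4}\,\varepsilon\\
0 & 0 & \beta & 0 & -\dfrac{d+2}{4}\,\varepsilon & 0
\end{pmatrix} \tag{6.12}$$

defines a Poisson structure, and for $u = \chi(y)$,

$$K(y) = \langle u, Hu\rangle = K_T(y) + K_V(y)$$

is the total energy, with kinetic and potential parts

$$K_T(y) = I(y)\Big(\frac{|p|^2}{2m} + \frac{\varepsilon d}{2m}\,\frac{\alpha^2 + \beta^2}{\beta}\Big) = \langle u, Tu\rangle \tag{6.13}$$

and

$$K_V(y) = \int_{\mathbb{R}^d} V(x)\exp\Big(-\frac{2}{\varepsilon}\big(\beta|x-q|^2 + \delta\big)\Big)dx = \langle u, Vu\rangle.$$

Both $K(y)$ and $I(y)$ are first integrals of the system.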

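Proof. As in (2.22), the differential equation for $y$ is $\varepsilon X(y)^TJX(y)\,\dot y = \tfrac12\nabla K(y)$.
We note (6.6) and

$$X_C(y) = \frac{i}{\varepsilon}\Big(x-q,\ -2a(x-q)-p,\ |x-q|^2,\ i|x-q|^2,\ 1,\ i\Big)u$$

where $a = \alpha + i\beta$ and $u = \chi(y)$ in the complex setting. Using the calculus of
Gaussian integrals, we compute

$$2\varepsilon X^T(y)JX(y) = I(y)\begin{pmatrix}
0 & 1_d & 0 & 0 & 0 & 0\\
-1_d & 0 & 0 & \dfrac{dp}{2\beta} & 0 & \dfrac{2p}{\varepsilon}\\
0 & 0 & 0 & -\dfrac{\varepsilon d(d+2)}{8\beta^2} & 0 & -\dfrac{d}{2\beta}\\
0 & -\dfrac{dp^T}{2\beta} & \dfrac{\varepsilon d(d+2)}{8\beta^2} & 0 & \dfrac{d}{2\beta} & 0\\
0 & 0 & 0 & -\dfrac{d}{2\beta} & 0 & -\dfrac{2}{\varepsilon}\\
0 & -\dfrac{2p^T}{\varepsilon} & \dfrac{d}{2\beta} & 0 & \dfrac{2}{\varepsilon} & 0
\end{pmatrix}$$

and inversion yields the differential equation with $B(y) = (2\varepsilon X^T(y)JX(y))^{-1}$ as
stated. The system is a Poisson system by Theorem 2.8. □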
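
The inversion in the proof can be checked numerically. The following is a minimal sketch for the special case $d = 1$ and $I(y) = 1$; the function names `wavepacket_matrices` and `matmul` are illustrative and not part of the text. It builds the matrix $2\varepsilon X^TJX$ and the structure matrix $B(y)$ entry by entry, with rows and columns ordered as $(p,q,\alpha,\beta,\gamma,\delta)$, and multiplies them.

```python
def wavepacket_matrices(p, beta, eps):
    """Return (A, B) for d = 1 and I(y) = 1.

    A is the matrix 2*eps*X^T J X from the proof of Theorem 6.2 and
    B is the structure matrix (6.12); rows and columns are ordered
    as (p, q, alpha, beta, gamma, delta).
    """
    d = 1
    k = eps * d * (d + 2) / (8.0 * beta ** 2)
    b = d / (2.0 * beta)
    A = [
        [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [-1.0, 0.0, 0.0, b * p, 0.0, 2.0 * p / eps],
        [0.0, 0.0, 0.0, -k, 0.0, -b],
        [0.0, -b * p, k, 0.0, b, 0.0],
        [0.0, 0.0, 0.0, -b, 0.0, -2.0 / eps],
        [0.0, -2.0 * p / eps, b, 0.0, 2.0 / eps, 0.0],
    ]
    r = 4.0 * beta ** 2 / (eps * d)
    e = (d + 2) * eps / 4.0
    B = [
        [0.0, -1.0, 0.0, 0.0, -p, 0.0],
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, r, 0.0, -beta],
        [0.0, 0.0, -r, 0.0, beta, 0.0],
        [p, 0.0, 0.0, -beta, 0.0, e],
        [0.0, 0.0, beta, 0.0, -e, 0.0],
    ]
    return A, B


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A, B = wavepacket_matrices(p=0.7, beta=1.3, eps=0.05)
product = matmul(B, A)  # should be the 6 x 6 identity up to round-off
```

For general $d$ the scalar entries involving $p$ become vectors and the unit entries become $1_d$; the sketch does not cover that case.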


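Assuming $I(y) = \|u\|^2 = 1$, we observe that the differential equations for the
average position and momentum, $q$ and $p$, read

$$\dot q = p/m, \qquad \dot p = -\langle u, \nabla V u\rangle \tag{6.14}$$

for $u = \chi(y)$ and $y = (p,q,\alpha,\beta,\gamma,\delta)$. We then note $\langle u, Wu\rangle \to W(q)$ as
$\varepsilon \to 0$. The differential equations for $q$ and $p$ thus tend to Newtonian equations of
motion in the classical limit $\varepsilon \to 0$:

$$\dot q = p/m, \qquad \dot p = -\nabla V(q). \tag{6.15}$$

It will be useful to consider also scaled variables

$$\hat y = (p,q,\alpha,\hat\beta,\gamma,\hat\delta) \in \mathbb{R}^{2d+4} \quad\text{with}\quad \hat\beta = \frac{\beta}{\varepsilon}, \quad \hat\delta = \frac{\delta}{\varepsilon}. \tag{6.16}$$

Here we have

$$\dot{\hat y} = \hat B(\hat y)\nabla\hat K(\hat y) \tag{6.17}$$

where the structure matrix $\hat B(\hat y)$ is independent of $\varepsilon$, and where $\hat K(\hat y)$ depends
regularly on $\varepsilon \ge 0$.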


VII.6.4 A Splitting Integrator for Gaussian Wavepackets 

With the natural splitting $H = T + V$ into kinetic and potential energy, we now
consider the variational splitting integrator (4.7)-(4.8), which here becomes the
following.

1. We define $u_n^+$ in $\mathcal{M}$ as the solution at time $h/2$ of the equation for $u$,

$$\langle i\varepsilon\dot u - Vu,\, \xi\rangle = 0 \quad\text{for all}\quad \xi \in T_u\mathcal{M} \tag{6.18}$$

with initial value $u(0) = u_n \in \mathcal{M}$.

2. We define $u_{n+1}^-$ as the solution at time $h$ of

$$\langle i\varepsilon\dot u - Tu,\, \xi\rangle = 0 \quad\text{for all}\quad \xi \in T_u\mathcal{M} \tag{6.19}$$

with initial value $u(0) = u_n^+$.

3. Then $u_{n+1}$ is the solution at time $h/2$ of (6.18) with initial value $u(0) = u_{n+1}^-$.

By Theorem 6.2, the substeps in the definition of this splitting method written in the
coordinates $y = (p,q,\alpha,\beta,\gamma,\delta)$ are the exact flows $\varphi_{h/2}^V$ and $\varphi_h^T$ of the Poisson
systems

$$\dot y = B(y)\nabla K_V(y) \quad\text{and}\quad \dot y = B(y)\nabla K_T(y).$$

Note that both equations preserve the $L^2$ norm of $u = \chi(y)$, which we assume to
be 1 in the following.

Most remarkably, these equations can be solved explicitly. Let us consider first
the equations (6.19). They are written, for $a = \alpha + i\beta$ and $c = \gamma + i\delta$, as

$$\dot q = p/m, \qquad \dot p = 0, \qquad \dot a = -2a^2/m, \qquad \dot c = \big(\tfrac12|p|^2 + i\varepsilon d\,a\big)/m, \tag{6.20}$$

with initial values $y_0 = (p_0, q_0, a_0, c_0)$ corresponding to $u_0 = \chi(y_0)$. They have the
solution

$$q(t) = q_0 + \frac{t}{m}\,p_0, \qquad p(t) = p_0, \qquad a(t) = \frac{a_0}{1 + 2a_0t/m}$$

and

$$c(t) = c_0 + \frac{t}{2m}\,|p_0|^2 + \frac{i\varepsilon d}{2}\log\Big(1 + \frac{2a_0t}{m}\Big).$$

Let us now consider the equations (6.18). Taking into account the fact that the
potential $V$ is real, these equations are written

$$\begin{aligned}
\dot p &= -\langle u, \nabla Vu\rangle, & \dot q &= 0,\\
\dot\alpha &= -\frac{1}{2d}\langle u, \Delta Vu\rangle, & \dot\beta &= 0,\\
\dot\gamma &= -\langle u, Vu\rangle + \frac{\varepsilon}{8\beta}\langle u, \Delta Vu\rangle, & \dot\delta &= 0,
\end{aligned} \tag{6.21}$$

with the $L^2$ inner products

$$\langle u, Wu\rangle = \int_{\mathbb{R}^d} W(x)\exp\Big(-\frac{2}{\varepsilon}\big(\beta|x-q|^2 + \delta\big)\Big)dx \tag{6.22}$$

for $W = V, \nabla V, \Delta V$. As the $L^2$ inner products in the equations for $p, \alpha, \gamma$ depend
only on $q, \beta, \delta$ which are constant along this trajectory, these equations can be solved
trivially, requiring only the computation of the inner products at the initial value. We
thus see that the splitting scheme $\Phi_h = \varphi_{h/2}^V \circ \varphi_h^T \circ \varphi_{h/2}^V$ can be computed explicitly.
This gives the following algorithm (Faou & Lubich 2004).


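Algorithm 6.3 (Gaussian Wavepacket Integrator). A step from time $t_n$ to $t_{n+1}$,
starting from the Gaussian wavepacket $u_n = \chi(p_n, q_n, \alpha_n, \beta_n, \gamma_n, \delta_n)$, proceeds as
follows:

1. With $\langle W\rangle_n = \langle u_n, Wu_n\rangle$ given by (6.22) for $W = V, \nabla V, \Delta V$, compute

$$\begin{aligned}
p_{n+1/2} &= p_n - \frac{h}{2}\langle\nabla V\rangle_n\\
\alpha_n^+ &= \alpha_n - \frac{h}{4d}\langle\Delta V\rangle_n\\
\gamma_n^+ &= \gamma_n - \frac{h}{2}\langle V\rangle_n + \frac{h\varepsilon}{16\beta_n}\langle\Delta V\rangle_n.
\end{aligned} \tag{6.23}$$

2. From the values $p_{n+1/2}$, $a_n^+ = \alpha_n^+ + i\beta_n$ and $c_n^+ = \gamma_n^+ + i\delta_n$ compute $q_{n+1}$,
$a_{n+1}^- = \alpha_{n+1}^- + i\beta_{n+1}$, and $c_{n+1}^- = \gamma_{n+1}^- + i\delta_{n+1}$ via

$$\begin{aligned}
q_{n+1} &= q_n + \frac{h}{m}\,p_{n+1/2}\\
a_{n+1}^- &= a_n^+ \Big/ \Big(1 + \frac{2h}{m}\,a_n^+\Big)\\
c_{n+1}^- &= c_n^+ + \frac{h}{2m}\,|p_{n+1/2}|^2 + \frac{i\varepsilon d}{2}\log\Big(1 + \frac{2h}{m}\,a_n^+\Big).
\end{aligned} \tag{6.24}$$

3. Compute $p_{n+1}, \alpha_{n+1}, \gamma_{n+1}$ from

$$\begin{aligned}
p_{n+1} &= p_{n+1/2} - \frac{h}{2}\langle\nabla V\rangle_{n+1}\\
\alpha_{n+1} &= \alpha_{n+1}^- - \frac{h}{4d}\langle\Delta V\rangle_{n+1}\\
\gamma_{n+1} &= \gamma_{n+1}^- - \frac{h}{2}\langle V\rangle_{n+1} + \frac{h\varepsilon}{16\beta_{n+1}}\langle\Delta V\rangle_{n+1}.
\end{aligned} \tag{6.25}$$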
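
The three substeps can be sketched in code for a case in which the inner products (6.22) are available in closed form. The example below is a minimal illustration, not the authors' implementation: it assumes $d = 1$, $m = 1$ and the harmonic potential $V(x) = x^2/2$ (an assumption made only for this sketch), for which a normalised wavepacket gives $\langle V\rangle = \tfrac12(q^2 + \varepsilon/(4\beta))$, $\langle\nabla V\rangle = q$ and $\langle\Delta V\rangle = 1$. All function names are hypothetical.

```python
import cmath
import math


def averages(q, beta, eps):
    """<V>, <V'>, <V''> for V(x) = x**2 / 2 and a normalised Gaussian, d = 1.

    For ||u|| = 1 the density |u|^2 is normal with mean q and variance
    eps / (4 * beta), so the integrals (6.22) are elementary.
    """
    var = eps / (4.0 * beta)
    return 0.5 * (q * q + var), q, 1.0


def potential_half_step(y, h, eps, d=1):
    """Flow of the potential part over time h/2, cf. (6.23) and (6.25)."""
    p, q, alpha, beta, gamma, delta = y
    V, dV, lapV = averages(q, beta, eps)
    p = p - 0.5 * h * dV
    alpha = alpha - h / (4.0 * d) * lapV
    gamma = gamma - 0.5 * h * V + h * eps / (16.0 * beta) * lapV
    return (p, q, alpha, beta, gamma, delta)


def kinetic_step(y, h, eps, m=1.0, d=1):
    """Exact flow of the kinetic part over time h, cf. (6.24)."""
    p, q, alpha, beta, gamma, delta = y
    a = complex(alpha, beta)
    c = complex(gamma, delta)
    w = 1.0 + 2.0 * h * a / m
    q = q + h * p / m
    c = c + h * p * p / (2.0 * m) + 0.5j * eps * d * cmath.log(w)
    a = a / w
    return (p, q, a.real, a.imag, c.real, c.imag)


def wavepacket_step(y, h, eps, m=1.0):
    """One step: half potential, full kinetic, half potential."""
    y = potential_half_step(y, h, eps)
    y = kinetic_step(y, h, eps, m)
    return potential_half_step(y, h, eps)


def norm_squared(y, eps, d=1):
    """Squared L^2 norm I(y) of the wavepacket."""
    beta, delta = y[3], y[5]
    return math.exp(-2.0 * delta / eps) * (math.pi * eps / (2.0 * beta)) ** (d / 2.0)


eps = 0.1
beta0 = 1.0
# delta0 is chosen such that I(y) = 1 for d = 1
delta0 = 0.25 * eps * math.log(math.pi * eps / (2.0 * beta0))
y0 = (0.0, 1.0, 0.0, beta0, 0.0, delta0)  # (p, q, alpha, beta, gamma, delta)
```

Because $\langle\nabla V\rangle = q$ holds exactly for a quadratic potential, the $(q_n, p_n)$ produced by this sketch coincide with the Störmer-Verlet approximations of the harmonic oscillator for every $\varepsilon$. The principal branch of the complex logarithm is used; only its imaginary part, which enters the phase $\gamma$, depends on that choice.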

Let us collect properties of this algorithm. 

Theorem 6.4. The splitting scheme of Algorithm 6.3 is an explicit, symmetric,
second-order numerical method for Gaussian wavepacket dynamics (6.11)-(6.13).
It is a Poisson integrator for the structure matrix (6.12), and it preserves the unit
$L^2$ norm of the wavepackets: $\|u_n\| = 1$ for all $n$.

In the limit $\varepsilon \to 0$, the position and momentum approximations $q_n$, $p_n$ of this
method tend to those obtained by applying the Störmer-Verlet method to the associated
classical mechanical system (6.15).

The statement for $\varepsilon \to 0$ follows directly from the equations for $p_{n+1/2}$, $q_{n+1}$,
$p_{n+1}$ and from noting $\langle\nabla V\rangle_n \to \nabla V(q_n)$.

In view of the small parameter $\varepsilon$, the discussion of the order of the method
requires more care. Here it is useful to consider the integrator in the scaled variables
$\hat y = (p, q, \alpha, \beta/\varepsilon, \gamma, \delta/\varepsilon)$ of (6.16). Since the differential equation (6.17) contains $\varepsilon$
only as a regular perturbation parameter, after $n$ steps of the splitting integrator we
have the $\varepsilon$-uniform error bound

$$\hat y_n - \hat y(t_n) = O(h^2),$$

where the constants symbolized by the $O$-notation are independent of $\varepsilon$ and of $n$ and
$h$ with $nh \le Const$. For the approximation of the absolute values of the Gaussian
wavepackets this yields

$$\big\|\,|u_n|^2 - |u(t_n)|^2\,\big\| = O(h^2), \tag{6.26}$$

but the approximation of the phases is only such that

$$\|u_n - u(t_n)\| = O(h^2/\varepsilon). \tag{6.27}$$

We refer to Faou & Lubich (2004) for the formulation of the corresponding algorithm
for $N > 1$ particles, for further properties such as the exact conservation
of linear and angular momentum and the long-time near-conservation of the total
energy $\langle u_n, Hu_n\rangle$, and for numerical experiments.


VII.7 Exercises 


1. Prove that the Poisson bracket (2.8) satisfies the Jacobi identity (2.4) for all
functions $F, G, H$, if and only if it satisfies (2.4) for the coordinate functions
$y_i, y_j, y_k$.

Hint (F. Engel, in Lie's Gesammelte Abh. vol. 5, p. 753). If the Jacobi identity is
written as in (3.3), we see that there are no second partial derivatives of $F$ (the
left-hand side is a Lie bracket, the right-hand side has no second derivatives of
$F$ anyway). Other permutations show the same result for $G$ and $H$.

2. For $x$ in an open subset of $\mathbb{R}^m$, let $A(x) = (a_{ij}(x))$ be an invertible skew-symmetric
$m \times m$-matrix, with

$$\frac{\partial a_{ij}}{\partial x_k} + \frac{\partial a_{ki}}{\partial x_j} + \frac{\partial a_{jk}}{\partial x_i} = 0 \quad\text{for all } i,j,k. \tag{7.1}$$

(a) Show that $B(x) = A(x)^{-1}$ satisfies (2.10) and hence defines a Poisson
bracket.

(b) Generalize Theorem 2.8 to Hamiltonian equations (2.18) with the two-form
$\omega_x(\xi_1,\xi_2) = \xi_1^TA(x)\xi_2$.

Remark. Condition (7.1) says that $\omega$ is a closed differential form.

3. Solve the following first order partial differential equation:

$$3\,\frac{\partial F}{\partial y_1} + 2\,\frac{\partial F}{\partial y_2} - 5\,\frac{\partial F}{\partial y_3} = 0.$$

Result. $f(2y_1 - 3y_2,\ 5y_2 + 2y_3)$.

4. Find two solutions of the homogeneous system 

dF dF dF dF dF dF dF 

3 dyi + dy 2 2 dy 3 5 dy 4 2 <h)\ dy 2 3 dy A 

such that their gradients are linearly independent. 

5. Consider a Poisson system $\dot y = B(y)\nabla H(y)$ and a change of coordinates
$z = \vartheta(y)$. Prove that in the new coordinates the system is of the form
$\dot z = \tilde B(z)\nabla K(z)$, where $\tilde B(z) = \vartheta'(y)B(y)\vartheta'(y)^T$ (cf. formula (3.12)) and
$K(z) = H(y)$.

6. Give an elementary proof of Theorem 4.3.

Hint. Define $\delta(t) := \varphi_t'(y)B(y)\varphi_t'(y)^T - B(\varphi_t(y))$. Using the variational
equation for (4.1) prove that $\delta(t)$ is the solution of a homogeneous linear differential
equation. Therefore, $\delta(0) = 0$ implies $\delta(t) = 0$ for all $t$.

7. Let $z = \vartheta(y)$ be a transformation taking the Poisson system $\dot y = B(y)\nabla H(y)$
to $\dot z = \tilde B(z)\nabla K(z)$. Prove that $\Phi_h(y)$ is a Poisson integrator for $B(y)$ if and
only if $\Psi_h(z) = \vartheta \circ \Phi_h \circ \vartheta^{-1}(z)$ is a Poisson integrator for $\tilde B(z)$.

8. Let $B$ be a skew-symmetric but otherwise arbitrary constant matrix, and consider
the Poisson system $\dot y = B\nabla H(y)$. Prove that every symplectic Runge-Kutta
method is a Poisson integrator for such a system.

Hint. Transform B to block-diagonal form. 

9. (M.J. Gander 1994). Consider the Lotka-Volterra equation (2.13) with separable
Hamiltonian $H(u,v) = K(u) + L(v)$. Prove that

$$u_{n+1} = u_n + h\,u_nv_n\,H_v(u_n, v_n), \qquad v_{n+1} = v_n - h\,u_{n+1}v_n\,H_u(u_{n+1}, v_n)$$

is a Poisson integrator for this system.

10. Find a change of coordinates that transforms the Lotka-Volterra system (2.14) 
into a Hamiltonian system (in canonical form). Following the approach of Ex¬ 
ample 4.11 construct Poisson integrators for this system. 

11. Prove that the matrix $B(y)$ of Example 2.7 defines a Poisson bracket, by showing
that the bracket is given as Dirac's bracket (Dirac 1950)

$$\{F, G\} = \{\hat F, \hat G\} - \{\hat F, c_i\}\,\gamma_{ij}\,\{c_j, \hat G\}. \tag{7.2}$$

Here $F$ and $G$ are functions of $y$, $\hat F$ and $\hat G$ are smooth functions of $x$ satisfying
$\hat F(\chi(y)) = F(y)$ and $\hat G(\chi(y)) = G(y)$, $c_i(x)$ are the constraint functions
defining the manifold $\mathcal{M}$, and $\gamma_{ij}$ are the entries of the inverse of the matrix
$(\{c_i, c_j\})$. The Poisson bracket to the left in (7.2) corresponds to $B(y)$ and
those to the right are the canonical brackets evaluated at $x = \chi(y)$. Replacing
$\hat F(x)$ by $\hat F(x) + \sum_k \mu_k(x)c_k(x)$ with $\mu_k(x)$ such that $\{\hat F, c_k\} = 0$ on $\mathcal{M}$
eliminates the sum in (7.2) and proves the Jacobi identity for $B(y)$.



Chapter VIII. 

Structure-Preserving Implementation 


This chapter is devoted to practical aspects of an implementation of geometric inte¬ 
grators. We explain strategies for changing the step size which do not deteriorate the 
correct qualitative behaviour of the solution. We study multiple time stepping strate¬ 
gies, the effect of round-off in long-time integrations, and the efficient solution of 
nonlinear systems arising in implicit integration schemes. 


VIII.1 Dangers of Using Standard Step Size Control

Another possible shortcoming of the method concerns its behavior when 
used with a variable step size ... The integrator completely loses its desir¬ 
able qualities ... This can be understood at least qualitatively by realizing 
that by changing the time step one is in essence continually changing the 
nearby Hamiltonian ... (B. Gladman, M. Duncan & J. Candy 1991) 

In the previous chapters we have studied symmetric and symplectic integrators, and 
we have seen an enormous progress in long-time integrations of various problems. 
Decades ago, a similar enormous progress was the introduction of algorithms with 
automatic step size control. Naively, one would expect that the blind combination 
of both techniques leads to even better performances. We shall see by a numerical 
experiment that this is not the case, a phenomenon observed by Gladman, Duncan 
& Candy (1991) and Calvo & Sanz-Serna (1992).

We study the long-time behaviour of symplectic methods combined with the
following standard step size selection strategy (see e.g., Hairer, Nørsett & Wanner
(1993), Sect. II.4). We assume that an expression $err_n$ related to the local error is
available for the current step computed with step size $h_n$ (usually obtained with an
embedded method). Based on an asymptotic formula $err_n \approx Ch_n^r$ (for $h_n \to 0$) and
on the requirement to get an error close to a user supplied tolerance $Tol$, we predict
a new step size by

$$h_{new} = 0.85\cdot h_n\Big(\frac{Tol}{err_n}\Big)^{1/r}, \tag{1.1}$$

where a safety factor 0.85 is included. We then apply the method with step size
$h_{n+1} = h_{new}$. If for the new step $err_{n+1} \le Tol$, the step is accepted and the
integration is continued. If $err_{n+1} > Tol$, it is rejected and recomputed with the step
size $h_{new}$ obtained from (1.1) with $n+1$ instead of $n$. Similar step size strategies
are implemented in most codes for solving ordinary differential equations.
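
The selection rule (1.1) together with the accept/reject logic can be sketched as follows. This is a minimal illustration; the names `new_step_size`, `controlled_step` and the callback `step_with_error`, which is assumed to return the new state together with the error expression $err$, are hypothetical.

```python
def new_step_size(h, err, tol, r, safety=0.85):
    """Step size prediction (1.1): safety * h * (tol / err)**(1/r)."""
    if err <= 0.0:
        raise ValueError("the error estimate must be positive")
    return safety * h * (tol / err) ** (1.0 / r)


def controlled_step(step_with_error, y, h, tol, r):
    """Repeat a step from y until its error estimate is at most tol.

    Returns (y_new, h_used, h_next), where h_next is the prediction (1.1)
    for the following step.
    """
    while True:
        y_new, err = step_with_error(y, h)
        h_next = new_step_size(h, err, tol, r)
        if err <= tol:
            return y_new, h, h_next
        h = h_next  # rejected: recompute with the smaller step size
```

Note that the step size accepted for one step depends on the error observed in the preceding step, which is the feature that destroys reversibility of the overall scheme.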
[Figure 1.1: orbit plots in three rows (exact solution; fixed step size, $h = 0.065$; variable step size), each shown for $0 \le t \le 120$, $2000 \le t \le 2120$, and $4000 \le t \le 4120$. Fixed step size row: steps 1 to 1848, steps 30 769 to 32 617, steps 61 538 to 63 386.]

Fig. 1.1. Störmer-Verlet scheme applied with fixed step size (middle) or with the standard
step size strategy (below) compared to the exact solution (above); solutions are for the interval
$0 \le t \le 120$ (left), for $2000 \le t \le 2120$ (middle), and for $4000 \le t \le 4120$ (right)



(eccentricity $e = 0.6$). As a numerical method we take the Störmer-Verlet scheme
(I.1.17) which is symmetric, symplectic, and of order 2. The fixed step size
implementation is straightforward. For the variable step size strategy we take for $err_n$ the
Euclidean norm of the difference between the Störmer-Verlet solution and the symplectic
Euler solution (which is available without any further function evaluation).
Since $err_n = O(h_n^2)$, we take $r = 2$ in (1.1).

Fig. 1.2. Study of the error in the Hamiltonian and of the global error for the Störmer-Verlet
scheme. Fixed step size implementation with $h = 10^{-3}$, variable step size with $Tol = 10^{-4}$

The numerical solution in the $(q_1, q_2)$-plane is presented in Fig. 1.1. To make
the long-time behaviour of the two implementations visible, we show the numerical
solution on three different parts of the integration interval. We have included
the numbers of steps needed for the integration to reach $t = 120$, $2120$, and $4120$,
respectively. We see that the qualitative behaviour of the variable step size
implementation is not correct, although it is more precise on short intervals. Moreover,
the near-preservation of the Hamiltonian is lost (see Fig. 1.2) as is the linear error
growth. Apparently, the error in the Hamiltonian behaves like $|a - bt|$ for the variable
step size implementation, and that for the solution like $|ct - dt^2|$ (with constants
$a, b, c, d$ depending on $Tol$). Due to the relatively large eccentricity of the problem,
the variable step size implementation needs fewer function evaluations for a given
accuracy on a short time interval, but the opposite is true for long-time integrations.

The aim of the next two sections is to present approaches which permit the
use of variable step sizes for symmetric or symplectic methods without losing the
qualitatively correct long-time behaviour.

VIII.2 Time Transformations

A variable step size implementation produces approximations $y_n$ on a (non-equidistant)
grid $\{t_n\}$. The same effect can be achieved by performing in advance a
time transformation $t \leftrightarrow \tau$ and by applying a constant step size implementation
to the transformed system. If the time transformation is given as the solution of a
differential equation, it follows from the chain rule that the transformed
system is

$$y' = \sigma(y)f(y), \qquad t' = \sigma(y). \tag{2.1}$$

Here, prime indicates a derivative with respect to $\tau$, and we use the same letter $y$ for
the solutions $y(t)$ of $\dot y = f(y)$ and $y(\tau)$ of (2.1). If $\sigma(y) > 0$, the correspondence
$t \leftrightarrow \tau$ is bijective.

Applying a numerical method with constant step size $\varepsilon$ to (2.1) yields approximations
$y_n \approx y(\tau_n) = y(t_n)$, where $\tau_n = n\varepsilon$ and

$$t_{n+1} - t_n = \int_{n\varepsilon}^{(n+1)\varepsilon}\sigma(y(\tau))\,d\tau \approx \varepsilon\sigma(y_n). \tag{2.2}$$

Approximations to $t_n$ are obtained by integrating numerically the differential equation
$t' = \sigma(y)$ together with $y' = \sigma(y)f(y)$.

In the context of geometric numerical integration, we are interested in time
transformations such that the vector field $\sigma(y)f(y)$ retains geometric features of $f(y)$.

VIII.2.1 Symplectic Integration 

For a Hamiltonian system $\dot y = f(y) = J^{-1}\nabla H(y)$ it is natural to search for step
size functions $\sigma(y)$ such that (2.1) is again Hamiltonian. For this we have to check
whether the Jacobian of $\sigma(y)\nabla H(y)$ is symmetric (cf. Integrability Lemma VI.2.7).
But this is the case only if $\nabla H(y)\nabla\sigma(y)^T$ is symmetric, i.e., $\nabla H(y)$ and $\nabla\sigma(y)$
are collinear, so that $\frac{d}{dt}\sigma(y(t)) = \nabla\sigma(y(t))^TJ^{-1}\nabla H(y(t)) = 0$. Consequently,
$\sigma(y) = Const$ along solutions of the Hamiltonian system which is what makes this
approach unattractive for a variable step size integration. This disappointing fact has
been observed by Stoffer (1988, 1995) and Skeel & Gear (1992).

The main idea for circumventing this difficulty is the following: suppose we
want to integrate the Hamiltonian system with steps of size $h \approx \varepsilon\sigma(y)$, where
$\sigma(y) > 0$ is a state-dependent given function and $\varepsilon > 0$ is a small parameter.
Instead of multiplying the vector field $f(y) = J^{-1}\nabla H(y)$ by $\sigma(y)$, we consider the
new Hamiltonian

$$K(y) = \sigma(y)\big(H(y) - H_0\big), \tag{2.3}$$

where $H_0 = H(y_0)$ for a fixed initial value $y_0$. The corresponding Hamiltonian
system is

$$y' = \sigma(y)J^{-1}\nabla H(y) + \big(H(y) - H_0\big)J^{-1}\nabla\sigma(y). \tag{2.4}$$

Compared to (2.1) we have introduced a perturbation, which vanishes along the
solution of the Hamiltonian system passing through $y_0$, but which makes the system
Hamiltonian.
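
The step from (2.3) to (2.4) is the product rule applied to the gradient of $K$. Written out (a short derivation added for illustration):

```latex
\begin{align*}
\nabla K(y) &= \sigma(y)\,\nabla H(y) + \big(H(y) - H_0\big)\,\nabla\sigma(y),\\
y' = J^{-1}\nabla K(y)
   &= \sigma(y)\,J^{-1}\nabla H(y) + \big(H(y) - H_0\big)\,J^{-1}\nabla\sigma(y).
\end{align*}
```

On the level set $H(y) = H_0$ the second term vanishes, so that there the vector field coincides with $\sigma(y)f(y)$ of (2.1).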

Time transformations such as in (2.3) are used in classical mechanics for an analytic
treatment of Hamiltonian systems (Levi-Civita (1906, 1920), where (2.3) is
called the "Darboux-Sundman transformation", see Sundman (1912)). Zare & Szebehely
(1975) consider such time transformations for numerical purposes (without
taking care of symplecticity). Waldvogel & Spirig (1995) apply the transformations
proposed by Levi-Civita to Hill's lunar problem and solve the transformed equations
by composition methods in order to preserve the symplectic structure. The following
general procedure was proposed independently by Hairer (1997) and Reich (1999).

Algorithm 2.1. Apply an arbitrary symplectic one-step method with constant step
size $\varepsilon$ to the Hamiltonian system (2.4), augmented by $t' = \sigma(y)$. This yields numerical
approximations $(y_n, t_n)$ with $y_n \approx y(t_n)$.

Although this algorithm yields numerical approximations on a non-equidistant 
grid, it can be considered as a fixed step size, symplectic method applied to a differ¬ 
ent Hamiltonian system. This interpretation allows one to apply the standard tech¬ 
niques for the study of its long-time behaviour. 

A disadvantage of this algorithm is that for separable Hamiltonians $H(p,q) = T(p) + U(q)$
the transformed Hamiltonian (2.3) is no longer separable. Hence, methods
that are explicit for separable Hamiltonians are not explicit in the implementation
of Algorithm 2.1. The following examples illustrate that this disadvantage can
be partially overcome for the important case of Hamiltonian functions

$$H(p,q) = \tfrac12 p^TM^{-1}p + U(q), \tag{2.5}$$

where $M$ is a constant symmetric matrix.

Example 2.2 (Symplectic Euler with $p$-Independent Step Size Function). For
step size functions $\sigma(q)$ the symplectic Euler method, applied with constant step
size $\varepsilon$ to (2.4), reads

$$\begin{aligned}
p_{n+1} &= p_n - \varepsilon\sigma(q_n)\nabla U(q_n) - \varepsilon\Big(\tfrac12 p_{n+1}^TM^{-1}p_{n+1} + U(q_n) - H_0\Big)\nabla\sigma(q_n)\\
q_{n+1} &= q_n + \varepsilon\sigma(q_n)M^{-1}p_{n+1}
\end{aligned}$$

and yields an approximation at $t_{n+1} = t_n + \varepsilon\sigma(q_n)$. The first equation is nonlinear
(quadratic) in $p_{n+1}$. Introducing the scalar quantity $\beta := \|p_{n+1}\|_M^2 := p_{n+1}^TM^{-1}p_{n+1}$,
it reduces to the scalar quadratic equation

$$\beta = \Big\|\,p_n - \varepsilon\sigma(q_n)\nabla U(q_n) - \varepsilon\Big(\frac{\beta}{2} + U(q_n) - H_0\Big)\nabla\sigma(q_n)\Big\|_M^2$$

which can be solved directly. The numerical solution $(p_{n+1}, q_{n+1})$ is then given
explicitly.
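
One way to carry out the reduction to a scalar quadratic is sketched below. It is an illustration under simplifying assumptions: $M$ is taken to be the identity, vectors are plain Python lists, and the Kepler potential with the step size function $\sigma(q) = \|q\|^{3/2}$ serves only as test data; all function names are hypothetical. Writing $p_{n+1} = v + \beta w$ with $v = p_n - \varepsilon\sigma\nabla U - \varepsilon(U - H_0)\nabla\sigma$ and $w = -\tfrac{\varepsilon}{2}\nabla\sigma$ gives $\|w\|^2\beta^2 - (1 - 2v^Tw)\beta + \|v\|^2 = 0$, and the root that tends to $\|v\|^2$ for $\varepsilon \to 0$ is selected.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sym_euler_step(p, q, eps, H0, U, grad_U, sigma, grad_sigma):
    """One symplectic Euler step for (2.4) with M = identity.

    Returns (p_new, q_new, dt) with dt = eps * sigma(q).
    """
    s, gs, gU, Uq = sigma(q), grad_sigma(q), grad_U(q), U(q)
    # p_new = v + beta * w, where beta = |p_new|^2 is still unknown
    v = [pi - eps * s * gi - eps * (Uq - H0) * gsi
         for pi, gi, gsi in zip(p, gU, gs)]
    w = [-0.5 * eps * gsi for gsi in gs]
    vv, vw, ww = dot(v, v), dot(v, w), dot(w, w)
    b = 1.0 - 2.0 * vw
    disc = b * b - 4.0 * ww * vv
    if disc < 0.0:
        raise ValueError("no real solution; eps is too large")
    beta = 2.0 * vv / (b + math.sqrt(disc))  # root with beta -> |v|^2 as eps -> 0
    p_new = [vi + beta * wi for vi, wi in zip(v, w)]
    q_new = [qi + eps * s * pi for qi, pi in zip(q, p_new)]
    return p_new, q_new, eps * s


# Test data: Kepler problem, sigma(q) = |q|^(3/2)
def kepler_U(q):
    return -1.0 / math.sqrt(dot(q, q))


def kepler_grad_U(q):
    r = math.sqrt(dot(q, q))
    return [qi / r ** 3 for qi in q]


def sigma_kepler(q):
    return dot(q, q) ** 0.75


def grad_sigma_kepler(q):
    r = math.sqrt(dot(q, q))
    return [1.5 * qi / math.sqrt(r) for qi in q]
```

The formula for `beta` avoids the cancellation that the textbook quadratic formula would suffer when $\|w\|$ is small.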
Choices of Step Size Functions. Sometimes suitable functions $\sigma(p,q)$ are known
a priori. For example, for the two-body problem one can take $\sigma(p,q) = \|q\|^\alpha$, e.g.,
$\alpha = 2$, or $\alpha = 3/2$ to preserve the scaling invariance (Budd & Piggott 2003), so
that smaller step sizes are taken when the two bodies are close.

An interesting choice, which does not require any a priori knowledge of the
solution, is $\sigma(y) = \|f(y)\|^{-1}$. The solution of (2.1) then satisfies $\|y'(\tau)\| = 1$ (arc-length
parameterization) and we get approximations $y_n$ that are nearly equidistant
in the phase space. Such time transformations have been proposed by McLeod &
Sanz-Serna (1982) for graphical reasons and by Huang & Leimkuhler (1997). For a
Hamiltonian system with $H(p,q)$ given by (2.5), it is thus natural to consider

$$\sigma(p,q) = \Big(\tfrac12 p^TM^{-1}p + \nabla U(q)^TM^{-1}\nabla U(q)\Big)^{-1/2}. \tag{2.6}$$

We have chosen this particular norm, because it leaves the expression (2.6) invariant
with respect to linear coordinate changes $q \mapsto Aq$ (implying $p \mapsto A^{-T}p$). Exploiting
the fact that the Hamiltonian (2.5) is constant along solutions, the step size
function (2.6) can be replaced by the $p$-independent function

$$\sigma(q) = \Big(\big(H_0 - U(q)\big) + \nabla U(q)^TM^{-1}\nabla U(q)\Big)^{-1/2}. \tag{2.7}$$

The use of (2.6) and (2.7) gives nearly identical results, but (2.7) is easier to implement.
If we are interested in an output that is approximately equidistant in the
$q$-space, we can take

$$\sigma(q) = \big(H_0 - U(q)\big)^{-1/2}. \tag{2.8}$$

Example 2.3 (Störmer-Verlet Scheme with $p$-Independent Step Size Function).
For a step size function $\sigma(q)$ the Störmer-Verlet scheme gives

$$\begin{aligned}
p_{n+1/2} &= p_n - \frac{\varepsilon}{2}\sigma(q_n)\nabla U(q_n) - \frac{\varepsilon}{2}\big(H(p_{n+1/2}, q_n) - H_0\big)\nabla\sigma(q_n)\\
q_{n+1} &= q_n + \frac{\varepsilon}{2}\big(\sigma(q_n) + \sigma(q_{n+1})\big)M^{-1}p_{n+1/2}\\
p_{n+1} &= p_{n+1/2} - \frac{\varepsilon}{2}\sigma(q_{n+1})\nabla U(q_{n+1}) - \frac{\varepsilon}{2}\big(H(p_{n+1/2}, q_{n+1}) - H_0\big)\nabla\sigma(q_{n+1}).
\end{aligned} \tag{2.9}$$

The first equation is essentially the same as that for the symplectic Euler method,
and it can be solved for $p_{n+1/2}$ as explained in Example 2.2. The second equation
is implicit in $q_{n+1}$, but it is sufficient to solve the scalar equation

$$\gamma = \sigma\Big(q_n + \frac{\varepsilon}{2}\big(\sigma(q_n) + \gamma\big)M^{-1}p_{n+1/2}\Big) \tag{2.10}$$

for $\gamma = \sigma(q_{n+1})$. Newton iterations can be efficiently applied, because $\nabla\sigma(q)$ is
available already. The last equation (for $p_{n+1}$) is explicit. This variable step size
Störmer-Verlet scheme gives approximations at $t_n$, where

$$t_{n+1} = t_n + \frac{\varepsilon}{2}\big(\sigma(q_n) + \sigma(q_{n+1})\big).$$

[Figure 2.1: output points in the $q$-plane for three step size strategies, one of them labelled "constant step size"; the panels are labelled 54 steps, 73 steps, and 197 steps.]

Fig. 2.1. Various step size strategies for the Störmer-Verlet scheme (Example 2.3) applied to
the perturbed Kepler problem (1.2) on the interval [0,10] (approximately two periods)


In Fig. 2.1 we illustrate how the different step size functions influence the position
of the output points. We apply the Störmer-Verlet method of Example 2.3 to the
perturbed Kepler problem (1.2) with initial values, perturbation parameter, and
eccentricity as in Sect. VIII.1. As step size functions we use (2.7), (2.8), and constant
step size $\sigma(q) = 1$. For all three choices of $\sigma(q)$ we have adjusted the parameter $\varepsilon$ in
such a way that the maximal error in the Hamiltonian is close to 0.01. The step size
strategy (2.7) is apparently the most efficient one. For this strategy, we observe that
the output points in the $q$-plane concentrate in regions where the velocity is large,
while the constant step size implementation shows the opposite behaviour.

VIII.2.2 Reversible Integration 

For $\rho$-reversible differential equations $\dot y = f(y)$, i.e., $f(\rho y) = -\rho f(y)$ for all $y$, the
time transformed problem (2.1) remains $\rho$-reversible if

$$\sigma(\rho y) = \sigma(y). \tag{2.11}$$

This condition is not very restrictive and is satisfied by many important time
transformations. In particular, (2.11) holds for the arc length parameterization
$\sigma(y) = \|f(y)\|^{-1}$ if $\rho$ is orthogonal. Consequently, it makes sense to apply symmetric,
reversible numerical methods with constant step size $\varepsilon$ directly to the system (2.1).

However, similar to the symplectic integration of Sect. VIII.2.1, there is a serious
disadvantage. For separable differential equations (i.e., problems that can be split as
$\dot p = f_1(q)$, $\dot q = f_2(p)$) and for non-constant $\sigma(p,q)$ the transformed system (2.1)
is no longer separable. Hence, methods that are explicit for separable problems are
not necessarily explicit for (2.1).

Example 2.4 (Adaptive Störmer-Verlet Method). We consider a Hamiltonian
system with separable Hamiltonian (2.5), and we apply the Störmer-Verlet scheme
to (2.1). This yields (Huang & Leimkuhler 1997)

$$\begin{aligned}
p_{n+1/2} &= p_n - \frac{\varepsilon}{2}\,s_n\nabla U(q_n)\\
q_{n+1} &= q_n + \frac{\varepsilon}{2}\,(s_n + s_{n+1})M^{-1}p_{n+1/2}\\
p_{n+1} &= p_{n+1/2} - \frac{\varepsilon}{2}\,s_{n+1}\nabla U(q_{n+1}),
\end{aligned} \tag{2.12}$$

where $s_n = \sigma(p_{n+1/2}, q_n)$ and $s_{n+1} = \sigma(p_{n+1/2}, q_{n+1})$ (notice that the $s_{n+1}$ of
the current step is not the same as the $s_n$ of the subsequent step, if $\sigma(p,q)$ depends
on $p$). The values $(p_{n+1}, q_{n+1})$ are approximations to the solution at $t_{n+1}$, where

$$t_{n+1} = t_n + \frac{\varepsilon}{2}\,(s_n + s_{n+1}).$$

For a $p$-independent step size function $s$, method (2.12) corresponds to that of
Example 2.3, where the terms involving $\nabla\sigma(q)$ are removed. The implicitness of (2.12)
is comparable to that of the method of Example 2.3. Completely explicit variants of
this method will be discussed in the next section.
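
A minimal sketch of method (2.12) for a $p$-independent step size function is given below. It assumes $M$ equal to the identity and solves the scalar relation $s_{n+1} = \sigma(q_{n+1})$ by fixed-point iteration rather than by the Newton iteration mentioned in Example 2.3 (a simplification made here); the Kepler data and all function names are illustrative only.

```python
import math


def adaptive_verlet_step(p, q, eps, grad_U, sigma, tol=1e-14, maxit=100):
    """One step of (2.12) with M = identity and sigma depending on q only.

    Returns (p_new, q_new, dt) with dt = eps/2 * (s_n + s_{n+1}).
    """
    s0 = sigma(q)
    p_half = [pi - 0.5 * eps * s0 * gi for pi, gi in zip(p, grad_U(q))]
    s1 = s0  # starting guess for s_{n+1} = sigma(q_{n+1})
    for _ in range(maxit):
        q_new = [qi + 0.5 * eps * (s0 + s1) * pi for qi, pi in zip(q, p_half)]
        s_next = sigma(q_new)
        converged = abs(s_next - s1) <= tol * abs(s_next)
        s1 = s_next
        if converged:
            break
    q_new = [qi + 0.5 * eps * (s0 + s1) * pi for qi, pi in zip(q, p_half)]
    p_new = [pi - 0.5 * eps * s1 * gi for pi, gi in zip(p_half, grad_U(q_new))]
    return p_new, q_new, 0.5 * eps * (s0 + s1)


# Test data: Kepler problem with sigma(q) = |q|^(3/2)
def kepler_grad_U(q):
    r = math.sqrt(sum(x * x for x in q))
    return [x / r ** 3 for x in q]


def sigma_kepler(q):
    return sum(x * x for x in q) ** 0.75
```

Reversibility can be observed directly: after one step, changing the sign of the momentum and taking another step leads back to the initial position with the momentum reversed, up to the tolerance of the fixed-point iteration.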

We conclude this section with a brief comparison of the variable step size
Störmer-Verlet methods of Examples 2.3 and 2.4. Method (2.12) is easier to implement
and more efficient when the step size function $\sigma(p,q)$ is expensive to evaluate.
In a few numerical comparisons we observed, however, that the error in the
Hamiltonian and in the solution is in general larger for method (2.12), and that
the method (2.9) becomes competitive when $\sigma(p,q)$ is $p$-independent and easy to
evaluate. A similar observation in favour of method (2.9) has been made by Calvo,
López-Marcos & Sanz-Serna (1998).


VIII.3 Structure-Preserving Step Size Control 

The disappointing long-time behaviour in Fig. 1.1 of the variable step size
implementation of the Störmer-Verlet scheme is due to lack of reversibility. Indeed, for a
$\rho$-reversible differential equation the step size $h_{n+1/2}$ taken for stepping from $y_n$ to
$y_{n+1}$ should be the same as that when stepping from $\rho y_{n+1}$ to $\rho y_n$ (cf. Fig. V.1.1).
The strategy of Sect. VIII.1, for which the step size depends on information of the
preceding step, cannot guarantee such a property.

VIII.3.1 Proportional, Reversible Controllers 

Following a suggestion of Stoffer (1988) we consider step sizes depending only 
on information of the present step, i.e., being proportional to some function of the 
actual state. This leads to the algorithm 

2/n+l ~ ^/i n _pi / 2 (yn)i hn+l/2 = £ s (iJn 5 5 (3-1) 

where $h(y) is a one-step method for y = f(y ), and 6 is a small parameter. For 
theoretical investigations it is useful to consider the mapping 

MV) ~ $es(y,e){y)- ( 3 ' 2 > 

This is a one-step discretization, consistent with y' = s(y , 0 )f(y), and applied with 
constant step size 5. Consequently, all results concerning the long-time integration 
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with constant steps (e.g., backward error analysis of Chap. IX), and the definitions 
of symmetry and reversibility can be extended in a straightforward way. 

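Symmetry. We call the algorithm (3.1) symmetric, if $\Psi_\varepsilon(y)$ is symmetric, i.e.,
$\Psi_\varepsilon = \Psi_{-\varepsilon}^{-1}$. In the case of a symmetric $\Phi_h$ this is equivalent to

    $s(\hat y, -\varepsilon) = s(y, \varepsilon)$  with  $\hat y = \Phi_{\varepsilon s(y,\varepsilon)}(y).$    (3.3)

Reversibility. The algorithm (3.1) is called $\rho$-reversible if, when applied to a
$\rho$-reversible differential equation, $\Psi_\varepsilon(y)$ is $\rho$-reversible, i.e.,
$\rho \circ \Psi_\varepsilon = \Psi_\varepsilon^{-1} \circ \rho$ (cf. Definition V.1.2). If the method $\Phi_h$ is
$\rho$-reversible then this is equivalent to

    $s(\rho^{-1} \hat y, \varepsilon) = s(y, \varepsilon)$  with  $\hat y = \Phi_{\varepsilon s(y,\varepsilon)}(y).$    (3.4)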

Example 3.1. Aiming at step sizes $h \approx \varepsilon\sigma(y)$ (cf. (2.2)), Hut, Makino & McMillan
(1995) propose the use of $s(y, \varepsilon) = \frac12 \big(\sigma(y) + \sigma(\hat y)\big)$ where, as in Sect. VIII.2, $\sigma(y)$
is some function that uses an a priori knowledge of the solution of the differential
equation. Notice that, because of $\hat y = \Phi_{\varepsilon s(y,\varepsilon)}(y)$, the value of $s(y, \varepsilon)$ is defined
by an implicit relation. Condition (3.3) is satisfied whenever $\Phi_h(y)$ is symmetric,
and (3.4) is satisfied whenever $\Phi_h(y)$ is $\rho$-reversible and $\sigma(\rho y) = \sigma(y)$ holds. For a
proof of these statements one shows that $s(\hat y, -\varepsilon)$ and $s(y, \varepsilon)$ (resp. $s(\rho^{-1}\hat y, \varepsilon)$ and
$s(y, \varepsilon)$) are solutions of the same nonlinear equation.
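
The implicit relation of Example 3.1 can be illustrated by a small sketch. It is not taken from the cited papers: the fixed-point iteration, the Kepler potential, the choice $\sigma(q) = \|q\|^{3/2}$ and all function names are assumptions made only for illustration, with the Störmer-Verlet scheme playing the role of $\Phi_h$.

```python
import numpy as np

def verlet(p, q, h, grad_U):
    """One Stoermer-Verlet step for H(p, q) = p.p/2 + U(q)."""
    p_half = p - 0.5 * h * grad_U(q)
    q_new = q + h * p_half
    p_new = p_half - 0.5 * h * grad_U(q_new)
    return p_new, q_new

def symmetric_step(p, q, eps, sigma, grad_U, tol=1e-14, max_iter=100):
    """Solve s = (sigma(y) + sigma(y_hat)) / 2 with y_hat = Phi_{eps*s}(y)
    by fixed-point iteration (an assumed solver); returns (p_hat, q_hat, h)."""
    s = sigma(q)                                  # explicit starting guess
    for _ in range(max_iter):
        _, q_hat = verlet(p, q, eps * s, grad_U)
        s_new = 0.5 * (sigma(q) + sigma(q_hat))
        converged = abs(s_new - s) <= tol * abs(s_new)
        s = s_new
        if converged:
            break
    p_hat, q_hat = verlet(p, q, eps * s, grad_U)
    return p_hat, q_hat, eps * s

# Illustrative problem (assumption): Kepler potential U(q) = -1/|q|,
# with a step size function sigma that depends on q only.
grad_U = lambda q: q / np.dot(q, q) ** 1.5
sigma = lambda q: np.dot(q, q) ** 0.75
```

Since this $\sigma$ satisfies $\sigma(\rho y) = \sigma(y)$ for $\rho(p, q) = (-p, q)$, a step started from $(-p_{n+1}, q_{n+1})$ should reproduce the same step size and return to $(-p_n, q_n)$, up to the tolerance of the iteration.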

How can we find suitable step size functions $s(y, \varepsilon)$ which satisfy all these
properties, and which do not require any a priori knowledge of the solution? In a
remarkable publication, Stoffer (1995) gives the key to the answer of this question.
He simply proposes to choose the step size $h$ in such a way that the local error
estimate satisfies err = Tol (in contrast to err $\le$ Tol for the standard strategy). Let us
explain this idea in some more detail for Runge-Kutta methods.

Example 3.2 (Symmetric, Variable Step Size Runge-Kutta Methods). For the
numerical solution of $\dot y = f(y)$ we consider Runge-Kutta methods

    $Y_i = y_n + h \sum_{j=1}^{s} a_{ij} f(Y_j), \qquad y_{n+1} = y_n + h \sum_{i=1}^{s} b_i f(Y_i)$    (3.5)

with coefficients satisfying $a_{s+1-i,\,s+1-j} + a_{ij} = b_j$ for all $i, j$. Such methods are
symmetric and reversible (cf. Theorem V.2.3). A common approach for step size
control is to consider an embedded method $\hat y_{n+1} = y_n + h \sum_{i=1}^{s} \hat b_i f(Y_i)$ (which
has the same internal stages $Y_i$) and to take the difference $y_{n+1} - \hat y_{n+1}$, i.e.,

    $D(y_n, h) = h \sum_{i=1}^{s} e_i f(Y_i)$    (3.6)

with $e_i = b_i - \hat b_i$, as indicator of the local error. For methods where $Y_i \approx y(t_n + c_i h)$
(e.g., collocation or discontinuous collocation) one usually computes the
coefficients $e_i$ from a nontrivial solution of the homogeneous linear system

    $\sum_{i=1}^{s} e_i c_i^{k-1} = 0$  for  $k = 1, \ldots, s-1.$    (3.7)

This yields $D(y_n, h) = O(h^r)$ with $r$ close to $s$. According to the suggestion of
Stoffer (1995) we determine the step size $h_{n+1/2}$ such that

    $\|D(y_n, h_{n+1/2})\| = Tol.$    (3.8)

A Taylor expansion around $h = 0$ shows that $D(y, h) = d_r(y) h^r + O(h^{r+1})$ with
some $r \ge 1$. We assume $\|d_r(y)\| \ne 0$ and we put $\varepsilon = Tol^{1/r}$, so that $h_{n+1/2}$ from
(3.8) can be expressed by a smooth function $s(y, \varepsilon)$ as in (3.1).

To satisfy the symmetry relation (3.3) we determine the $e_i$ such that

    $e_{s+1-i} = e_i$ for all $i$   or   $e_{s+1-i} = -e_i$ for all $i$    (3.9)

(Hairer & Stoffer 1997). If the Runge-Kutta method is symmetric, this then implies

    $\|D(y_n, h)\| = \|D(y_{n+1}, -h)\|$  with  $y_{n+1} = \Phi_h(y_n).$    (3.10)

This follows from the fact that the internal stage vectors $Y_i$ of the step from $y_n$ to
$y_{n+1}$ and the stage vectors $\bar Y_i$ of the step from $y_{n+1}$ to $y_n$ (negative step size $-h$)
are related by $\bar Y_i = Y_{s+1-i}$. The step size determined by (3.8) is thus the same for
both steps and, consequently, condition (3.3) holds.

The reversibility requirement (3.4) is a consequence of

    $\|D(y_n, h)\| = \|D(\rho^{-1} y_{n+1}, h)\|$  with  $y_{n+1} = \Phi_h(y_n),$    (3.11)

which is satisfied for orthogonal mappings $\rho$ (i.e., $\rho^T \rho = I$). This is seen as follows:
applying $\Phi_h$ to $\rho^{-1} y_{n+1}$ gives $\rho^{-1} y_n$, and the internal stages are $\bar Y_i = \rho^{-1} Y_{s+1-i}$.
Hence, we have from (3.9) that $D(\rho^{-1} y_{n+1}, h) = \pm\rho^{-1} D(y_n, h)$, and (3.11)
follows from the orthogonality of $\rho$.

A simple special case is the trapezoidal rule

    $y_{n+1} = y_n + \frac{h}{2} \big(f(y_n) + f(y_{n+1})\big)$    (3.12)

combined with

    $D(y_n, h) = \frac{h}{2} \big(f(y_{n+1}) - f(y_n)\big).$

The scalar nonlinear equation (3.8) for $h_{n+1/2}$ can be solved in tandem with the
nonlinear system (3.12).
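
A rough sketch of the strategy err = Tol for the trapezoidal rule follows. It does not solve (3.8) and (3.12) in tandem; instead it uses a plain nested approach (fixed-point iteration for (3.12) inside a bisection for (3.8)), which is an assumption chosen for brevity. The bisection further assumes that $\|D(y, h)\|$ is increasing on $(0, h_{max})$ and exceeds Tol at $h_{max}$. All names are hypothetical.

```python
import numpy as np

def trapezoidal(y, h, f, iters=50):
    """Solve y1 = y + h/2 (f(y) + f(y1)) by fixed-point iteration (small h assumed)."""
    y1 = y + h * f(y)
    for _ in range(iters):
        y1 = y + 0.5 * h * (f(y) + f(y1))
    return y1

def indicator(y, h, f):
    """Return (||D(y, h)||, y1) with D(y, h) = h/2 (f(y1) - f(y))."""
    y1 = trapezoidal(y, h, f)
    return np.linalg.norm(0.5 * h * (f(y1) - f(y))), y1

def step_err_equals_tol(y, f, tol, h_max=0.5):
    """Bisection for the step size h in (0, h_max) with ||D(y, h)|| = tol."""
    lo, hi = 0.0, h_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if indicator(y, mid, f)[0] < tol:
            lo = mid
        else:
            hi = mid
    h = 0.5 * (lo + hi)
    return indicator(y, h, f)[1], h
```

For the harmonic oscillator $\dot q = p$, $\dot p = -q$ with $\rho(q, p) = (q, -p)$, a step started from $\rho y_{n+1}$ should select the same $h$ and arrive at $\rho y_n$, which is the reversibility property discussed above.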

Example 3.3 (Symmetric, Variable Step Size Störmer-Verlet Scheme). The
strategy of Example 3.2 can be extended in a straightforward way to partitioned
Runge-Kutta methods. For example, for the second order symmetric Störmer-Verlet
scheme (I.1.17), applied to the problem $\dot q = p$, $\dot p = -\nabla U(q)$, we can take

    $D(p_n, q_n, h) = \frac{h}{2} \begin{pmatrix} \nabla U(q_{n+1}) - \nabla U(q_n) \\ h\,\big(\nabla U(q_{n+1}) + \nabla U(q_n)\big) \end{pmatrix}$

as error indicator. The first component is just the difference of the Störmer-Verlet
solution and the numerical approximation obtained by the symplectic Euler method.
The second component is a symmetrized version of it.

Fig. 3.1. Störmer-Verlet scheme applied with the symmetric adaptive step size strategy of
Example 3.3 (Tol = 0.01); the three pictures have the same meaning as in Fig. 1.1

We apply this method with $h_{n+1/2}$ determined by (3.8) and Tol = 0.01 to the
perturbed Kepler problem (1.2) with initial values as in Fig. 1.1. The result is given
in Fig. 3.1. We identify a correct qualitative behaviour (compared to the wrong
behaviour for the standard step size strategy in Fig. 1.1). It should be mentioned that
the work for solving the scalar equation (3.8) for $h_{n+1/2}$ is not negligible, because
the Störmer-Verlet scheme is explicit. Solving this equation iteratively, every
iteration requires one force evaluation $\nabla U(q)$. An efficient solver for this scalar
nonlinear equation should be used.


A Two-Step Proportional Controller. With the aim of obtaining a completely
explicit integrator, Huang & Leimkuhler (1997) propose the use of two-term
recurrence relations for the step size sequence, see also Holder, Leimkuhler & Reich
(2001). Instead of using a relation between $h_{n+1/2}$, $y_n$ and $y_{n+1}$ (cf. Example 3.1)
which is necessarily implicit, it is suggested to use a symmetric relation between
$h_{n-1/2}$, $h_{n+1/2}$, and $y_n$, which then is explicit. In particular, with the notation
$h_{n+1/2} = \varepsilon s_{n+1/2}$, it is proposed to use the two-term recurrence relation

    $\frac{1}{s_{n+1/2}} + \frac{1}{s_{n-1/2}} = \frac{2}{\sigma(y_n)},$    (3.13)

starting with $s_{1/2} = \sigma(y_0)$. In combination with the Störmer-Verlet method for
separable Hamiltonians, this algorithm is completely explicit, and the authors report
an excellent performance for realistic problems.
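
The recurrence (3.13) can be sketched as follows. This is a minimal illustration and not the cited authors' code: the Kepler potential, the function $\sigma(q) = \|q\|^{3/2}$ and all names are assumptions, and no safeguard is included for the case that the recurrence produces a non-positive $s_{n+1/2}$.

```python
import numpy as np

def next_step_density(s_prev, sigma_n):
    """Solve 1/s_next + 1/s_prev = 2/sigma_n for s_next, cf. (3.13)."""
    return 1.0 / (2.0 / sigma_n - 1.0 / s_prev)

def verlet(p, q, h, grad_U):
    """One Stoermer-Verlet step for H(p, q) = p.p/2 + U(q)."""
    p_half = p - 0.5 * h * grad_U(q)
    q_new = q + h * p_half
    return p_half - 0.5 * h * grad_U(q_new), q_new

def integrate(p, q, eps, n_steps, sigma, grad_U):
    """Explicit variable step size integration; returns final state and step sizes."""
    s = sigma(q)                                   # s_{1/2} = sigma(y_0)
    steps = []
    for n in range(n_steps):
        if n > 0:
            s = next_step_density(s, sigma(q))     # uses s_{n-1/2} and y_n only
        p, q = verlet(p, q, eps * s, grad_U)
        steps.append(eps * s)
    return p, q, np.array(steps)

# Illustrative problem (assumption): Kepler potential U(q) = -1/|q|.
grad_U = lambda q: q / np.dot(q, q) ** 1.5
sigma = lambda q: np.dot(q, q) ** 0.75
```

The relation is symmetric in $s_{n-1/2}$ and $s_{n+1/2}$: applying `next_step_density` to its own output with the same $\sigma(y_n)$ returns the previous value.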

A rigorous analysis of the long-time behaviour of this variable step size
Störmer-Verlet method is much more difficult. The results of Chapters IX and XI cannot be
applied, because it is not a one-step mapping $y_n \mapsto y_{n+1}$. The analysis of Cirilli,
Hairer & Leimkuhler (1999) shows that, similar to weakly stable multistep methods
(Chap. XV), the numerical solution and the step size sequence contain oscillatory
terms. Although these oscillations are usually very small (and hardly visible), it
seems difficult to get rigorous estimates for them.






VIII.3.2 Integrating, Reversible Controllers 

All variable step size approaches of this chapter are based on some time
transformation $t \leftrightarrow \tau$ given by $dt/d\tau = \sigma(y)$ so that the differential equation, expressed in the
new time variable $\tau$, becomes

    $y' = \frac{1}{z} f(y), \qquad z\,\sigma(y) = 1.$    (3.14)

In Sect. VIII.2 we insert $z^{-1} = \sigma(y)$ into the differential equation and apply a
numerical method to $y' = \sigma(y) f(y)$. In Sect. VIII.3.1 we first discretize the algebraic
relation $z\sigma(y) = 1$ expressing $z_{n+1/2}$ in terms of $y_n$ and $y_{n+1}$, and then apply a
one-step method to the differential equation in (3.14) assuming $z = z_{n+1/2}$ being
constant.

In the present section we first differentiate the algebraic relation of (3.14) with
respect to $\tau$. This yields by Leibniz' rule $z'\sigma(y) + z\,\nabla\sigma(y)^T y' = 0$ so that

    $z' = G(y)$  with  $G(y) = -\frac{1}{\sigma(y)} \nabla\sigma(y)^T f(y).$    (3.15)

The idea of differentiating the constraint in (3.14) has been raised in Huang &
Leimkuhler (1997), but soon abandoned in favour of the controller (3.13). The
subsequent algorithm together with its theoretical justification is elaborated in Hairer
& Söderlind (2004). The idea is to discretize first the differential equation in (3.15)
and then to apply a one-step method to the problem (3.14) with constant $z$. The
proposed algorithm is thus

    $z_{n+1/2} = z_{n-1/2} + \varepsilon\, G(y_n)$
    $y_{n+1} = \Phi_{\varepsilon/z_{n+1/2}}(y_n)$    (3.16)

with $z_{1/2} = z_0 + \varepsilon\, G(y_0)/2$ and $z_0 = 1/\sigma(y_0)$. This algorithm is explicit whenever
the underlying one-step method $\Phi_h(y)$ is explicit. It is called integrating controller,
because the step size density is obtained by summing up small quantities.

For a theoretical analysis it is convenient to introduce $z_n = (z_{n+1/2} + z_{n-1/2})/2$
and to write (3.16) as a one-step method for the augmented system

    $y' = \frac{1}{z} f(y), \qquad z' = G(y).$    (3.17)

Notice that $I(y, z) = z\,\sigma(y)$ is a first integral of this system.

Algorithm 3.4. Let $\Phi_h(y)$ be a one-step method for $\dot y = f(y)$, $y(0) = y_0$. With
$G(y)$ given by (3.15), $z_0 = 1/\sigma(y_0)$, and constant $\varepsilon$, we let

    $z_{n+1/2} = z_n + \varepsilon\, G(y_n)/2$
    $y_{n+1} = \Phi_{\varepsilon/z_{n+1/2}}(y_n)$    (3.18)
    $z_{n+1} = z_{n+1/2} + \varepsilon\, G(y_{n+1})/2.$

The values $y_n$ approximate $y(t_n)$, where $t_{n+1} = t_n + \varepsilon/z_{n+1/2}$.

This algorithm has an interesting interpretation as Strang splitting for the
solution of (3.17): it approximates the flow of $z' = G(y)$ with fixed $y$ over a half-step
$\varepsilon/2$; then applies the method $\Phi_\varepsilon$ to $y' = f(y)/z$ with fixed $z$; finally, it computes a
second half-step of $z' = G(y)$ with fixed $y$.

With the notation

    $\widehat\Phi_\varepsilon : (y_n, z_n) \mapsto (y_{n+1}, z_{n+1}), \qquad \hat\rho : (y, z) \mapsto (\rho y, z),$    (3.19)

the Algorithm 3.4 has the following properties:

• $\widehat\Phi_\varepsilon$ is symmetric whenever $\Phi_h$ is symmetric;

• $\widehat\Phi_\varepsilon$ is reversible with respect to $\hat\rho$ whenever $\Phi_h$ is reversible with respect to $\rho$ and
$G(\rho y) = -G(y)$ (this is a consequence of $\sigma(\rho y) = \sigma(y)$).

These properties imply that standard techniques for constant step size
implementations can be applied to $\widehat\Phi_\varepsilon$, and thus yield insight into the variable step size
algorithm of this section. It will be shown in Chap. XI that when applied to integrable
reversible systems there is no drift in the action variables and the global error grows
only linearly with time. Moreover, the first integral $I(y, z) = z\,\sigma(y)$ of the system
(3.17) is also well preserved (without drift) for such problems.

Example 3.5 (Variable Step Size Störmer-Verlet Method). Consider a Hamiltonian
system with separable Hamiltonian $H(p, q) = T(p) + U(q)$. Using the
Störmer-Verlet method as basic method the above algorithm becomes (starting with
$z_0 = 1/\sigma(y_0)$ and $z_{1/2} = z_0 + \varepsilon\, G(p_0, q_0)/2$)

    $z_{n+1/2} = z_{n-1/2} + \varepsilon\, G(p_n, q_n)$
    $p_{n+1/2} = p_n - \varepsilon\, \nabla U(q_n)/(2 z_{n+1/2})$
    $q_{n+1} = q_n + \varepsilon\, \nabla T(p_{n+1/2})/z_{n+1/2}$    (3.20)
    $p_{n+1} = p_{n+1/2} - \varepsilon\, \nabla U(q_{n+1})/(2 z_{n+1/2}).$

This method is explicit, symmetric and reversible as long as $G \circ \rho = -G$, and
computes approximations on a non-equidistant grid $\{t_n\}$ given by
$t_{n+1} = t_n + \varepsilon/z_{n+1/2}$.

Let us apply this method to the perturbed Kepler problem with data and initial
values as in the beginning of this chapter. Further, we select $\sigma(q) = (q^T q)^{\alpha/2}$ with
$\alpha = 3/2$, so that the control function (3.15) becomes

    $G(p, q) = -\alpha\, p^T q / q^T q.$    (3.21)

Figure 3.2 shows the error in the Hamiltonian along the numerical solution as well
as the global error in the solution (fictive step size $\varepsilon = 0.02$). The error in the
Hamiltonian is proportional to $\varepsilon^2$ without drift, and the global error grows linearly
with time (in double logarithmic scale a linear growth corresponds to a line with
slope one; such lines are drawn in grey). This is qualitatively the same behaviour as
observed in constant step size implementations of symplectic methods.

Fig. 3.2. Numerical Hamiltonian and global error as a function of time

Fig. 3.3. Step sizes of the variable step size Störmer-Verlet method as a function of time, and
the control error $z_n \sigma(q_n) - z_0 \sigma(q_0)$ (grey curve)


Figure 3.3 shows the selected step sizes $h_{n+1/2} = \varepsilon/z_{n+1/2}$ as a function of
time, and the control error $z_n \sigma(q_n) - z_0 \sigma(q_0)$ in grey. Since the deviation of
$z_n \sigma(q_n)$ from the constant value $z_0 \sigma(q_0) = 1$ is small without any drift, the step
density remains close to $1/\sigma(q)$. For an explanation of this excellent long-time
behaviour we refer to Sect. XI.3.


VIII.4 Multiple Time Stepping 

A completely different approach to variable step sizes will be described in this
section. We are interested in situations where:

• many solution components of the differential equation vary slowly and only a few 
components have fast dynamics; or 

• computationally expensive parts of the right-hand side do not contribute much to 
the dynamics of the solution. 

In the first case it is tempting to use large step sizes for the slow components and
small step sizes for the fast ones. Such integrators, called multirate methods, were
first formulated by Rice (1960) and Gear & Wells (1984). They were further
developed by Günther & Rentrop (1993) in view of applications in electric circuit
simulation, and by Engstler & Lubich (1997) with applications in astrophysics. Symmetric
multirate methods are obtained from the approaches described below and are
specially constructed by Leimkuhler & Reich (2001).

The second case suggests the use of methods that evaluate the expensive part of
the vector field less often than the rest. This approach is called multiple time
stepping. It was originally proposed for astronomy by Hayli (1967) and has become
very popular in molecular dynamics simulations (Streett, Tildesley & Saville 1978,
Grubmüller, Heller, Windemuth & Schulten 1991, Tuckerman, Berne & Martyna
1992). As noticed by Biesiadecki & Skeel (1993), one approach to such methods is
within the framework of splitting and composition methods, which yields
symmetric and symplectic methods. A second family of symmetric multiple time stepping
methods results from the concept of using averaged force evaluations.

VIII.4.1 Fast-Slow Splitting: the Impulse Method 

Consider a differential equation

    $\dot y = f(y), \qquad f(y) = f^{[slow]}(y) + f^{[fast]}(y),$    (4.1)

where the vector field is split into summands contributing to slow and fast
dynamics, respectively, and where $f^{[slow]}(y)$ is more expensive to evaluate than $f^{[fast]}(y)$.
Multirate methods can often be cast into this framework by collecting in $f^{[slow]}(y)$
those components of $f(y)$ which produce slow dynamics and in $f^{[fast]}(y)$ the
remaining components.

Algorithm 4.1. For a given $N \ge 1$ and for the differential equation (4.1) a multiple
time stepping method is obtained from

    $\big(\Phi^{[slow]}_{h/2}\big)^{*} \circ \big(\Phi^{[fast]}_{h/N}\big)^{N} \circ \Phi^{[slow]}_{h/2},$    (4.2)

where $\Phi^{[slow]}_h$ and $\Phi^{[fast]}_h$ are numerical integrators consistent with $\dot y = f^{[slow]}(y)$
and $\dot y = f^{[fast]}(y)$, respectively.

The method of Algorithm 4.1 is already stated in symmetrized form ($\Phi_h^{*}$ denotes
the adjoint of $\Phi_h$). It is often called the impulse method, because the slow part $f^{[slow]}$
of the vector field is used, impulse-like, only at the beginning and at the end of
the step, whereas the many small substeps in between are concerned solely with
integrating the fast system $\dot y = f^{[fast]}(y)$.

Lemma 4.2. Let $\Phi^{[slow]}_h$ be an arbitrary method of order 1, and $\Phi^{[fast]}_h$ a symmetric
method of order 2. Then, the multiple time stepping algorithm (4.2) is symmetric
and of order 2.

If $f^{[slow]}(y)$ and $f^{[fast]}(y)$ are Hamiltonian and if $\Phi^{[slow]}_h$ and $\Phi^{[fast]}_h$ are both
symplectic, then the multiple time stepping method is also symplectic.

Proof. Due to the interpretation of multiple time stepping as composition methods
the proof of these statements is obvious. □


The order statement of Lemma 4.2 is valid for $h \to 0$, but should be taken with
caution if the product of the step size $h$ with a Lipschitz constant of the problem
is not small (see Chap. XIII for a detailed analysis): it is not stated, and is not true
in general for large $N$, that if $h$ and $h/N$ are the step sizes needed to integrate the
slow and fast system, respectively, with an error bounded by $\varepsilon$, then the error of the
combined scheme is $O(\varepsilon)$.

The most important application of multiple time stepping is in Hamiltonian
systems with a separable Hamiltonian

    $H(p, q) = T(p) + U(q), \qquad U(q) = U^{[slow]}(q) + U^{[fast]}(q).$    (4.3)

If we let the fast vector field correspond to $T(p) + U^{[fast]}(q)$ and the slow vector
field to $U^{[slow]}(q)$, and if we apply the Störmer-Verlet method and exact integration,
respectively, Algorithm 4.1 reads

    $\varphi^{[slow]}_{h/2} \circ \big(\varphi^{[fast]}_{h/2N} \circ \varphi^{T}_{h/N} \circ \varphi^{[fast]}_{h/2N}\big)^{N} \circ \varphi^{[slow]}_{h/2},$    (4.4)

where $\varphi^{T}_t$, $\varphi^{[slow]}_t$, $\varphi^{[fast]}_t$ are the exact flows corresponding to the Hamiltonian
systems for $T(p)$, $U^{[slow]}(q)$, $U^{[fast]}(q)$, respectively. Notice that for $N = 1$ the method
(4.4) reduces to the Störmer-Verlet scheme applied to the Hamiltonian system with
$H(p, q)$. This is a consequence of the fact that $\varphi^{[fast]}_t \circ \varphi^{[slow]}_t$ is the exact
flow of the Hamiltonian system corresponding to $U(q)$ of (4.3). In the molecular
dynamics literature, the method (4.4) is known as the Verlet-I method (Grubmüller
et al. 1991, who consider the method with little enthusiasm) or r-RESPA method
(Tuckerman et al. 1992, with much more enthusiasm).
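
One step of the impulse method (4.4) can be written down directly. The following is a minimal sketch assuming $T(p) = p^T p/2$, so that the flows of the potential parts are simple kicks of the momentum; the function and argument names are hypothetical.

```python
def impulse_step(p, q, h, N, grad_slow, grad_fast):
    """One step of (4.4) for H = p.p/2 + U_slow(q) + U_fast(q).

    grad_slow and grad_fast return the gradients of the slow and fast
    potentials; they are user-supplied callables (assumed names).
    """
    p = p - 0.5 * h * grad_slow(q)       # slow kick over h/2
    dt = h / N
    for _ in range(N):                   # N Stoermer-Verlet substeps, fast force only
        p = p - 0.5 * dt * grad_fast(q)
        q = q + dt * p
        p = p - 0.5 * dt * grad_fast(q)
    p = p - 0.5 * h * grad_slow(q)       # slow kick over h/2
    return p, q
```

The slow gradient is evaluated twice in this sketch; in a loop over many steps the value at the end of one step can be reused at the start of the next, so that one slow evaluation per step suffices. For $N = 1$ the step agrees with the Störmer-Verlet scheme for the full potential, and a step with $-h$ undoes a step with $h$.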


Example 4.3. In order to illustrate the effect of multiple time stepping we choose a
'solar system' with two planets, i.e., with a Hamiltonian

    $H(p, q) = \frac{1}{2} \Big( \frac{p_0^T p_0}{m_0} + \frac{p_1^T p_1}{m_1} + \frac{p_2^T p_2}{m_2} \Big) - \frac{m_0 m_1}{\|q_0 - q_1\|} - \frac{m_0 m_2}{\|q_0 - q_2\|} - \frac{m_1 m_2}{\|q_1 - q_2\|},$

where $m_0 = 1$, $m_1 = m_2 = 10^{-2}$ and initial values $q_0 = (0, 0)$, $\dot q_0 = (0, 0)$,
$q_1 = (1, 0)$, $\dot q_1 = (0, 1)$, $q_2 = (4, 0)$, $\dot q_2 = (0, 0.5)$. With these data, the motion of
the two planets is nearly circular with periods close to $2\pi$ and $14\pi$, respectively.

We split the potential as

    $U^{[fast]}(q) = -\frac{m_0 m_1}{\|q_0 - q_1\|}, \qquad U^{[slow]}(q) = -\frac{m_0 m_2}{\|q_0 - q_2\|} - \frac{m_1 m_2}{\|q_1 - q_2\|},$

and we apply the algorithm of (4.4) with $N = 1$ (Störmer-Verlet), $N = 4$, and
$N = 8$. Since the evaluation of $\varphi^{[slow]}_t$ is about twice as expensive as that of $\varphi^{[fast]}_t$, and
that of $\varphi^{T}_t$ is of negligible cost, the computational work of applying (4.4) on a fixed
interval is proportional to

    $\frac{2\pi}{h} \cdot \frac{2 + N}{3}.$    (4.5)

Fig. 4.1. Maximal error in the Hamiltonian as a function of computational work


Our computations have shown that this measure of work corresponds very well to
the actual cpu time.

We have solved this problem with many different step sizes $h$. Figure 4.1 shows
the maximal error in the Hamiltonian (over the interval $[0, 200\pi]$) as a function of
the computational work (4.5). We notice that the value $N = 4$ yields excellent
results for relatively large as well as small step sizes. It noticeably improves the
performance of the Störmer-Verlet method. If $N$ becomes too large, an irregular
behaviour for large step sizes is observed. Such “artificial resonances” are notorious
for this method and have been discussed by Biesiadecki & Skeel (1993) for a similar
experiment; also see Chap. XIII. For large $N$ we also note a loss of accuracy for
small step sizes. The optimal choice of $N$ (which here is close to 4) depends on the
problem and on the splitting into fast and slow parts, and has to be determined by
experiment.


The multiple time stepping technique can be iteratively extended to problems
with more than two different time scales. The idea is to split the 'fast' vector field
of (4.1) into $f^{[fast]}(y) = f^{[ff]}(y) + f^{[fs]}(y)$, and to replace the method $\Phi^{[fast]}_{h/N}$ in
Algorithm 4.1 with a multiple time stepping method. Depending on the problem, a
significant gain in computer time may be achieved in this way.

Many more multiple time stepping methods that extend the above
Verlet-I/r-RESPA/impulse method have been proposed in the literature, most notably the
mollified impulse method of García-Archilla, Sanz-Serna & Skeel (1999); see
Sect. XIII.1.


VIII.4.2 Averaged Forces 

A different approach to multiple time stepping arises from the idea of advancing the
step with averaged force evaluations. We describe such a method for the
second-order equation

    $\ddot y = f(y), \qquad f(y) = f^{[slow]}(y) + f^{[fast]}(y).$    (4.6)

The exact solution satisfies

    $y(t + h) - 2 y(t) + y(t - h) = h^2 \int_{-1}^{1} (1 - |\theta|)\, f\big(y(t + \theta h)\big)\, d\theta,$

where the integral on the right-hand side represents a weighted average of the force
along the solution, which is now going to be approximated. At $t = t_n$, we replace

    $f\big(y(t_n + \theta h)\big) \approx f^{[slow]}(y_n) + f^{[fast]}\big(u(\theta h)\big),$

where $u(\tau)$ is a solution of the differential equation

    $\ddot u = f^{[slow]}(y_n) + f^{[fast]}(u).$    (4.7)

We then have

    $h^2 \int_{-1}^{1} (1 - |\theta|) \Big( f^{[slow]}(y_n) + f^{[fast]}\big(u(\theta h)\big) \Big)\, d\theta = u(h) - 2 u(0) + u(-h).$

The velocities are treated similarly, starting from the identity

    $\dot y(t + h) - \dot y(t - h) = h \int_{-1}^{1} f\big(y(t + \theta h)\big)\, d\theta.$
A Symmetric Two-Step Method. For the differential equation (4.7) we assume the
initial values

    $u(0) = y_n, \qquad \dot u(0) = \dot y_n.$    (4.8)

This initial value problem is solved numerically, e.g., by the Störmer-Verlet method
with a smaller step size $\pm h/N$ on the interval $[-h, h]$, yielding numerical
approximations $u_N(\pm h)$ and $v_N(\pm h)$ to $u(\pm h)$ and $\dot u(\pm h)$, respectively. Note that no
further evaluations of $f^{[slow]}$ are needed for the computation of $u_N(\pm h)$ and $v_N(\pm h)$.
This finally gives the symmetric two-step method (Hochbruck & Lubich 1999a)

    $y_{n+1} - 2 y_n + y_{n-1} = u_N(h) - 2 u_N(0) + u_N(-h)$
    $\dot y_{n+1} - \dot y_{n-1} = v_N(h) - v_N(-h).$    (4.9)

The starting values $y_1$ and $\dot y_1$ are chosen as $u_N(h)$ and $v_N(h)$ which correspond to
(4.7) and (4.8) for $n = 0$.

A Symmetric One-Step Method. An explicit one-step method with similar
averaged forces is obtained when the initial values for (4.7) are chosen as

    $u(0) = y_n, \qquad \dot u(0) = 0.$    (4.10)

It may appear crude to take zero initial values for the velocity, but we remark that
for linear $f^{[fast]}$ the averaged force $(u(h) - 2u(0) + u(-h))/h^2$ does not depend on
the choice of $\dot u(0)$. Moreover the solution then satisfies $u(-t) = u(t)$, so that the
computational cost is halved. We again denote by $u_N(h) = u_N(-h)$ the numerical
approximation to $u(h)$ obtained with step size $\pm h/N$ from a one-step method (e.g.,
from the Störmer-Verlet scheme). Because of (4.10) the averaged forces

    $F_n = \frac{1}{h^2} \big( u_N(h) - 2 u_N(0) + u_N(-h) \big) = \frac{2}{h^2} \big( u_N(h) - u_N(0) \big)$

now depend only on $y_n$ and not on the velocity $\dot y_n$. In trustworthy Verlet manner,
the scheme $y_{n+1} - 2 y_n + y_{n-1} = h^2 F_n$ can be written as the one-step method

    $v_{n+1/2} = v_n + \frac{h}{2} F_n$
    $y_{n+1} = y_n + h\, v_{n+1/2}$    (4.11)
    $v_{n+1} = v_{n+1/2} + \frac{h}{2} F_{n+1}.$

The auxiliary variables $v_n$ can be interpreted as averaged velocities: we have

    $v_n = \frac{y_{n+1} - y_{n-1}}{2h} \approx \frac{y(t_{n+1}) - y(t_{n-1})}{2h} = \frac{1}{2} \int_{-1}^{1} \dot y(t_n + \theta h)\, d\theta.$

This average may differ substantially from $\dot y(t_n)$ if the solution is highly oscillatory
in $[-h, h]$. In the experiments of this section it turned out that the choice $v_0 = \dot y_0$
and $\dot y_n = v_n$ as velocity approximations gives excellent results.
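
The one-step method (4.11) with averaged forces can be sketched as follows. The inner integration uses the Störmer-Verlet scheme in its velocity form for (4.7) with the initial values (4.10); the function names and the way $F_n$ is passed from step to step are assumptions made for this illustration.

```python
import numpy as np

def averaged_force(y, h, N, f_slow, f_fast):
    """F_n = 2 (u_N(h) - u_N(0)) / h^2, where u_N(h) approximates the solution of
    u'' = f_slow(y) + f_fast(u), u(0) = y, u'(0) = 0, using N Verlet substeps."""
    y = np.asarray(y, dtype=float)
    c = f_slow(y)                        # slow force, evaluated once and frozen
    dt = h / N
    u, v = y, np.zeros_like(y)
    a = c + f_fast(u)
    for _ in range(N):
        u = u + dt * v + 0.5 * dt ** 2 * a
        a_new = c + f_fast(u)
        v = v + 0.5 * dt * (a + a_new)
        a = a_new
    return 2.0 * (u - y) / h ** 2

def one_step(y, v, F, h, N, f_slow, f_fast):
    """One step of (4.11); F is the averaged force at y, F_new the one at y_new."""
    v_half = v + 0.5 * h * F
    y_new = y + h * v_half
    F_new = averaged_force(y_new, h, N, f_slow, f_fast)
    v_new = v_half + 0.5 * h * F_new
    return y_new, v_new, F_new
```

Two simple checks of the sketch: without a fast force the averaged force reduces to $f^{[slow]}(y_n)$, so that (4.11) becomes the Störmer-Verlet scheme, and for $f^{[fast]}(u) = -\omega^2 u$ with $f^{[slow]} = 0$ the value of $F_n$ tends to $2 y_n (\cos(\omega h) - 1)/h^2$ as $N$ grows.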

In a multirate context, symmetric one-step schemes using averaged forces 
were studied by Hochbruck & Lubich (1999b), Nettesheim & Reich (1999), and 
Leimkuhler & Reich (2001). A closely related approach for problems with multiple 
time scales is the heterogeneous multiscale method by E (2003) and Engquist & 
Tsai (2005). 

Example 4.4. We add a satellite of mass $m_3 = 10^{-4}$ to the three-body problem of
Example 4.3. It moves rapidly around the planet number one. The initial positions
and velocities are $q_3 = (1.01, 0)$ and $p_3 = (0, 0)$. We split the potential as

    $U^{[fast]}(q) = -\frac{m_1 m_3}{\|q_1 - q_3\|}, \qquad U^{[slow]}(q) = -\sum_{i<j,\ (i,j) \ne (1,3)} \frac{m_i m_j}{\|q_i - q_j\|},$

and we apply the methods (4.9), (4.11), and the impulse method (4.4). Since the
sum in $U^{[slow]}$ contains 5 terms, the computational work is proportional to

    $\frac{5 + N}{6h}$    for methods (4.11) and (4.4),
    $\frac{5 + 2N}{6h}$   for method (4.9).

For each of the methods we have optimized the number $N$ of small steps. We
obtained a flat minimum near $N = 40$ for (4.9) and (4.4), and a more pronounced
minimum at $N = 12$ for (4.11). Figure 4.2 shows the errors at $t = 10$ in the
positions and in the Hamiltonian as a function of the computational work.

Fig. 4.2. Errors in position and in the Hamiltonian as a function of the computational work;
the classical Störmer-Verlet method, the impulse method (4.4), and the averaged force
methods (4.11) and (4.9). The errors in the Hamiltonian are indicated by grey lines (same linestyle)


The error in the position is largest for the Störmer-Verlet method and
significantly smallest for the one-step averaged-force method (4.11). The errors in the
velocities are about a factor 100 larger for all methods. They are not included in
the figure. The error in the Hamiltonian is very similar for all methods with the
exception of the two-step averaged-force method (4.9), for which it is much larger.


VIII.5 Reducing Rounding Errors 

... the idea is to capture the rounding errors and feed them back into the 
summation. (N.J. Higham 1993) 

All numerical methods for solving ordinary differential equations require the
computation of a recursion of the form

    $y_{n+1} = y_n + \delta_n,$    (5.1)

where $\delta_n$, the increment, is usually smaller in magnitude than the approximation $y_n$
to the solution. In this situation the rounding errors caused by the computation of $\delta_n$
are in general smaller than those due to the addition in (5.1).

A first attempt at reducing the accumulation of rounding errors (in fixed-point
arithmetic for his Runge-Kutta code) was due to Gill (1951). Kahan (1965) and
Møller (1965) both extended this idea to floating point arithmetic. The resulting
algorithm is nowadays called 'compensated summation', and a particularly nice
presentation and analysis is given by N. Higham (1993). In the following algorithm we
assume that $y_n$ is a scalar; vector valued recursions are treated componentwise.

Algorithm 5.1 (Compensated Summation). Let $y_0$ and $\{\delta_n\}_{n \ge 0}$ be given and
put $e = 0$. Compute $y_1, y_2, \ldots$ from (5.1) as follows:

    for n = 0, 1, 2, ... do
        a = y_n
        e = e + δ_n
        y_{n+1} = a + e
        e = e + (a − y_{n+1})
    end do

This algorithm can best be understood with the help of Fig. 5.1 (following the
presentation of N. Higham (1993)). We present the mantissas of floating point
numbers by boxes, for which the horizontal position indicates the exponent (for a large
exponent the box is more to the left). The mantissas of $y_n$ and $e$ together represent
the accurate value of $y_n$ (notice that in the beginning $e = 0$). The operations of
Algorithm 5.1 yield $y_{n+1}$ and a new $e$, which together represent $y_{n+1} = y_n + \delta_n$.
No digit of $\delta_n$ is lost in this way. With a standard summation the last digits of $\delta_n$
(those indicated by $\delta''$ in Fig. 5.1) would have been missed.
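
Algorithm 5.1 translates almost line by line into code. The sketch below is for scalar recursions in double precision; the function names and the comparison with a naive summation are additions made for illustration.

```python
def compensated_sum(y0, increments):
    """Accumulate y_{n+1} = y_n + delta_n as in Algorithm 5.1."""
    y, e = y0, 0.0
    for delta in increments:
        a = y
        e = e + delta          # increment plus the error carried over
        y = a + e              # rounded update
        e = e + (a - y)        # the part of e that was lost in the addition
    return y

def naive_sum(y0, increments):
    """Standard update without error feedback, for comparison."""
    y = y0
    for delta in increments:
        y = y + delta
    return y
```

As a usage example, adding the increment 0.1 one million times to $y_0 = 0$ gives a visibly smaller deviation from 100000 with `compensated_sum` than with `naive_sum`.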


[Figure 5.1: mantissas of $a = y_n$, $e$, $\delta_n$, the sum $e + \delta_n$, $y_{n+1} = a + e$, and the
new $e = e + (a - y_{n+1})$, drawn as boxes whose horizontal position indicates the exponent]

Fig. 5.1. Illustration of the technique of “compensated summation”


Numerical Experiment. We study the effect of compensated summation on the
Kepler problem (I.2.2) (written as a first order system) with eccentricity $e = 0.6$
and initial values as in (I.2.11), so that the period of the elliptic orbit is exactly
$2\pi$. As the numerical integrator we take the composition method (V.3.13) of order
8 with the Störmer-Verlet scheme as basic integrator. We compute the numerical
solution with step size $h = 2\pi/500$ once with standard update of the increment,
once with compensated summation (both in double precision) and, in order to get a
reference solution, we also perform the whole computation in quadruple precision.
The difference between the double and quadruple precision computations gives us
the rounding errors. Their Euclidean norms as a function of time are displayed in
Fig. 5.2.

We see that throughout the whole integration interval the rounding errors of the
standard implementation are nearly a factor of 100 larger than those of the
implementation with compensated summation. This corresponds to the inverse of the step
size or, more precisely, to the mean quotient between $y_n$ and $\delta_n$ in (5.1). In Fig. 5.2
we have also included the pure global error of the method (without rounding errors)
at integral multiples of the period $2\pi$ (hence no oscillations are visible). This is
obtained as the difference of the numerical solution computed with quadruple
precision and the exact solution. We observe a linear growth of the pure global error
(this will be explained in Sect. X.3) and a growth like $O(t^{3/2})$ due to the rounding
errors. Thus, eventually the rounding errors will surpass the truncation errors, but
this happens for the compensated summation only after some 1000 periods.

Fig. 5.2. Rounding errors and pure global error as a function of time; the parallel grey lines
indicate a growth of $O(t^{3/2})$

Probabilistic Explanation of the Error Growth. Our aim is to explain the growth
rate of rounding errors observed in Fig. 5.2. Denote by $\varepsilon_k$ the vector of rounding
errors produced during the computations in the $k$th step. Since the derivative of the
flow describes the propagation of these errors, the accumulated rounding error
at time $t = t_N$ ($t_k = kh$) is

    $\eta_t = \sum_{k=1}^{N} \varphi'_{t - t_k}(y_k)\, \varepsilon_k.$    (5.2)

For the Kepler problem and, in fact, for all completely integrable differential
equations (cf. Sect. X.1) the flow and its derivative grow at most linearly with time, i.e.,

    $\|\varphi'_{t - t_k}(y)\| \le a + b\,(t - t_k)$  for  $t \ge t_k.$    (5.3)

Using $\varepsilon_k = O(eps)$, where $eps$ denotes the roundoff unit of the computer, an
application of the triangle inequality to (5.2) yields $\eta_t = O(t^2 eps)$. From our experiment
of Fig. 5.2 we see that such an estimate is too pessimistic.

For a better understanding of accumulated rounding errors over long time
intervals we make use of probability theory. Such an approach has been developed in
the classical book of Henrici (1962). We assume that the components $\varepsilon_{ki}$ of $\varepsilon_k$ are
random variables with mean and variance

    $E(\varepsilon_{ki}) = 0, \qquad Var(\varepsilon_{ki}) = C_{ki} \cdot eps^2,$

and uniformly bounded $C_{ki} \le C$. For simplicity we assume that all $\varepsilon_{ki}$ are
independent random variables. Replacing the matrix $\varphi'_{t - t_k}(y_k)$ in (5.2) with $\varphi'_{t - t_k}(y(t_k))$
and denoting its entries by $w_{ijk}$, the $i$th component of the accumulated rounding
error (5.2) becomes

    $\eta_{ti} = \sum_{k=1}^{N} \sum_{j=1}^{n} w_{ijk}\, \varepsilon_{kj},$

a linear combination of the random variables $\varepsilon_{kj}$. Elementary probability theory
thus implies that

    $E(\eta_{ti}) = 0$  and  $Var(\eta_{ti}) = \sum_{k=1}^{N} \sum_{j=1}^{n} w_{ijk}^2\, Var(\varepsilon_{kj}).$

Inserting the estimate (5.3) for $w_{ijk}$ we get

    $Var(\eta_{ti}) \le \sum_{k=1}^{N} \big(a + b\,(t - t_k)\big)^2 \max_{j=1,\ldots,n} Var(\varepsilon_{kj}) = O\Big( \frac{t^3}{h}\, eps^2 \Big).$

Consequently, the Euclidean norm of the expected rounding error $\eta_t$ is

    $\Big( \sum_{i=1}^{n} Var(\eta_{ti}) \Big)^{1/2} = O\Big( \frac{t^{3/2}}{\sqrt{h}}\, eps \Big).$

This is in excellent agreement with the results displayed in Fig. 5.2.
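
The power $t^3/h$ in the variance estimate can be checked by elementary summation. The following worked step is an added sketch, under the assumption that $t = Nh$ and $t - t_k = (N - k)h$; it only makes the bound for the sum of the squared factors from (5.3) explicit.

```latex
\begin{aligned}
\sum_{k=1}^{N} \bigl(a + b\,(t - t_k)\bigr)^2
  &= \sum_{m=0}^{N-1} (a + b\,m\,h)^2
   = N a^2 + a b\,h\,N(N-1) + b^2 h^2\,\frac{(N-1)N(2N-1)}{6} \\
  &\le \frac{1}{h}\Bigl(a^2 t + a b\,t^2 + \frac{b^2 t^3}{3}\Bigr),
  \qquad t = N h .
\end{aligned}
```

Multiplying by the bound $C \cdot eps^2$ for the variances gives a contribution dominated by $b^2 C\, t^3 eps^2/(3h)$ for large $t$, and taking the square root yields the growth $t^{3/2}$ of the rounding errors.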


VIII.6 Implementation of Implicit Methods 

Symplectic methods for general Hamiltonian equations are implicit, and so are
symmetric methods for general reversible systems. Also, when we consider variable step
size extensions as described in Sections VIII.3 and VIII.2, we are led to nonlinear
equations. The efficient numerical solution of such nonlinear equations is the main
difficulty in an implementation of implicit methods. Notice that in the context of
geometric integration there is no need of ad-hoc strategies for step size and order
selection, so that the remaining parts of a computer code are more or less
straightforward.

In the following we discuss the numerical solution of the nonlinear system
defined by an implicit Runge-Kutta method. We have the Gauss methods of Sect. II.1.3
in mind which are symplectic and symmetric. An extension of the ideas to
partitioned Runge-Kutta methods and to Nyström methods is obvious. For simplicity of
notation we consider autonomous differential equations $\dot y = f(y)$, and we write
the nonlinear system of Definition II.1.1 in the form

    $Z_{in} - h \sum_{j=1}^{s} a_{ij} f(y_n + Z_{jn}) = 0, \qquad i = 1, \ldots, s.$    (6.1)

The unknown variables are $Z_{1n}, \ldots, Z_{sn}$, and the equivalence of the two
formulations is via the relation $k_i = f(y_n + Z_{in})$. The numerical solution after one step
can be expressed as

    $y_{n+1} = y_n + h \sum_{i=1}^{s} b_i f(y_n + Z_{in}).$    (6.2)

For implicit Runge-Kutta methods the equations (6.1) represent a nonlinear system
that has to be solved iteratively. We discuss the choice of good starting
approximations for $Z_{in}$ as well as different nonlinear equation solvers (fixed-point iteration,
modified Newton methods).

VIII.6.1 Starting Approximations 

The simplest approximations to the solution $Z_{in}$ of (6.1) are $Z^0_{in} = 0$ or
$Z^0_{in} = h c_i f(y_n)$ where $c_i = \sum_{j=1}^{s} a_{ij}$. They are, however, not very accurate
and we will try to exploit the information of previous steps for improving them.
There are essentially two possibilities: either use only the information of the last
step $y_{n-1} \mapsto y_n$ (methods (A) and (B) below), or consider a fixed $i$ and use the
interpolation polynomial that passes through $Z_{i,n-l}$ for $l = 1, 2, \dots$ (method (C)).
Let us separately discuss these two approaches.

(A) Use of Continuous Output. Consider the polynomial $w_{n-1}(t)$ of degree $s$ that
interpolates the values $(t_{n-1}, y_{n-1})$ and $(t_{n-1} + c_i h, Y_{i,n-1})$ for $i = 1,\dots,s$, where
$Y_{i,n-1} = y_{n-1} + Z_{i,n-1}$ is the argument in (6.1) of the previous step. For collocation
methods (such as Gauss methods) $w_{n-1}(t)$ is the collocation polynomial, and we
know from Lemma II.1.6 that on compact intervals

$$w_{n-1}(t) - y(t) = O(h^{q+1}) \tag{6.3}$$

with $q = s$, where $y(t)$ denotes the solution of $\dot y = f(y)$ satisfying $y(t_{n-1}) = y_{n-1}$.
For Runge-Kutta methods that are not collocation methods, (6.3) holds with $q$ defined
by the condition $C(q)$ of (II.1.11). Since the solution of $\dot y = f(y)$ passing
through $y(t_n) = y_n$ is $O(h^{p+1})$ close to $y(t)$ with $p \ge q$, we have $w_n(t) =
w_{n-1}(t) + O(h^{q+1})$ and the computable value

$$Z^0_{in} = Y^0_{in} - y_n, \qquad Y^0_{in} = w_{n-1}(t_n + c_i h) \tag{6.4}$$

serves as starting approximation for (6.1) with an error of size $O(h^{q+1})$. This approach
is standard in variable step size implementations of implicit Runge-Kutta
methods (cf. Sect. IV.8 of Hairer & Wanner (1996)). Since $w_{n-1}(t) - y_{n-1}$ is a linear
combination of the $Z_{i,n-1} = Y_{i,n-1} - y_{n-1}$, it follows from (6.1) that it is also
a linear combination of $h f(Y_{i,n-1})$, so that

$$Y^0_{in} = y_{n-1} + h\sum_{j=1}^{s}\beta_{ij}\,f(Y_{j,n-1}). \tag{6.5}$$

For a constant step size implementation, the $\beta_{ij}$ depend only on the method coefficients
and can be computed in advance as the solution of the linear Vandermonde
type system

$$\sum_{j=1}^{s}\beta_{ij}\,c_j^{k-1} = \frac{(1 + c_i)^k}{k}, \qquad k = 1,\dots,s \tag{6.6}$$

(see Exercise 2). For collocation methods and for methods with $q \ge s - 1$ the
coefficients $\beta_{ij}$ from (6.6) are optimal in the sense that they are the only ones making
(6.5) an $s$th order approximation to the solution of (6.1). For $q < s - 1$, more
complicated order conditions have to be considered (Sand 1992).
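As an illustration of the Vandermonde type system (6.6), the following Python sketch computes the coefficients $\beta_{ij}$ for the 2-stage Gauss method. It is a minimal sketch under stated assumptions: the helper names `solve_linear` and `continuous_output_coeffs` are hypothetical, and a small hand-written Gaussian elimination is used only to keep the example free of external libraries.

```python
import math

def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def continuous_output_coeffs(c):
    """Row i solves sum_j beta_ij c_j^(k-1) = (1 + c_i)^k / k for k = 1..s."""
    s = len(c)
    V = [[c[j] ** (k - 1) for j in range(s)] for k in range(1, s + 1)]
    return [solve_linear(V, [(1.0 + ci) ** k / k for k in range(1, s + 1)])
            for ci in c]

# nodes of the 2-stage Gauss method (order 4)
c_gauss2 = [0.5 - math.sqrt(3.0) / 6.0, 0.5 + math.sqrt(3.0) / 6.0]
beta = continuous_output_coeffs(c_gauss2)
```

Since the matrix depends only on the nodes $c_j$, the coefficients can be computed once and reused in every step of a constant step size implementation.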

(B) Starting Algorithms Using Additional Function Evaluations. In particular
for high order methods where $s$ is relatively large, a much more accurate starting
approximation can be constructed with the aid of a few additional function evaluations.
Such starting algorithms have been investigated by Laburta (1997), who
presents coefficients for the Gauss methods up to order 8 in Laburta (1998).

The idea is to use starting approximations of the form

$$Y^0_{in} = y_{n-1} + h\sum_{j=1}^{s}\beta_{ij}\,f(Y_{j,n-1}) + h\sum_{j=1}^{m}\nu_{ij}\,f(Y_{s+j,n-1}), \tag{6.7}$$

where $Y_{1,n-1},\dots,Y_{s,n-1}$ are the internal stages of the basic implicit Runge-Kutta
method (with coefficients $c_i$, $a_{ij}$, $b_j$), and the additional internal stages are computed
from

$$Y_{s+i,n-1} = y_{n-1} + h\sum_{j=1}^{s+i-1}\mu_{ij}\,f(Y_{j,n-1}).$$

For a fixed $i$, we interpret $Y^0_{in}$ as the result of the explicit Runge-Kutta method with
coefficients of the right tableau of

$$\text{exact $i$th stage:}\quad
\begin{array}{c|cc}
c & A & \\
\mathbf{1} + c & B & A \\
\hline
 & b^T & a_i^T
\end{array}
\qquad\qquad
\text{approximate:}\quad
\begin{array}{c|cc}
c & A & \\
\mu & M_1 & M_2 \\
\hline
 & \beta_i^T & \nu_i^T
\end{array}
\tag{6.8}$$

Here, $(M_1, M_2) = M = (\mu_{ij})$, $\mu_i = \sum_{j=1}^{s+i-1}\mu_{ij}$, and $c$, $\mu$, $\beta_i$, $\nu_i$ are the vectors
composed of $c_j$, $\mu_j$, $\beta_{ij}$, $\nu_{ij}$, respectively. The exact stage values $Y_{in}$ are interpreted
as the result of the Runge-Kutta method with coefficients given in the left tableau
of (6.8). The entries of the vectors $\mathbf{1}$, $b$ and $a_i$ are $1$, $b_j$ and $a_{ij}$, respectively, and $B$
is the matrix whose rows are all equal to $b^T$.

If the order conditions (see Sect. III.1) for the two Runge-Kutta methods of (6.8)
give the same result for all trees with $\le r$ vertices, we get an approximation of order
$r$, i.e., $Y^0_{in} - Y_{in} = O(h^{r+1})$. For the bushy tree $\tau_k = [\bullet,\dots,\bullet]$ with $k$ vertices
we have

$$\sum_{j=1}^{s}\beta_{ij}\,c_j^{k-1} + \sum_{j=1}^{m}\nu_{ij}\,\mu_j^{k-1}
= \sum_{j=1}^{s} b_j\,c_j^{k-1} + \sum_{j=1}^{s} a_{ij}\,(1 + c_j)^{k-1}. \tag{6.9}$$

Notice that for collocation methods (such as the Gauss methods) the condition $C(s)$
reduces the right-hand expression of this equation to $(1 + c_i)^k/k$ for $k \le s$. For
$m = 0$, these conditions are thus equivalent to (6.6).

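For the tree $[\tau_k] = [[\bullet,\dots,\bullet]]$ with $k + 1$ vertices we get the condition

$$\sum_{j,l=1}^{s}\beta_{ij}\,a_{jl}\,c_l^{k-1}
+ \sum_{j=1}^{m}\nu_{ij}\Bigl(\sum_{l=1}^{s}\mu_{jl}\,c_l^{k-1} + \sum_{l=1}^{j-1}\mu_{j,s+l}\,\mu_l^{k-1}\Bigr)
= \sum_{j,l=1}^{s} b_j\,a_{jl}\,c_l^{k-1}
+ \sum_{j=1}^{s} a_{ij}\Bigl(\sum_{l=1}^{s} b_l\,c_l^{k-1} + \sum_{l=1}^{s} a_{jl}\,(1 + c_l)^{k-1}\Bigr). \tag{6.10}$$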


We now assume that the Runge-Kutta method corresponding to the right tableau of
(6.8) satisfies condition $C(s)$. This means that the method $(c, A, b)$ is a collocation
method, and that the coefficients $\mu_{ij}$ have to be computed from the linear system

$$\sum_{j=1}^{s}\mu_{ij}\,c_j^{k-1} + \sum_{j=1}^{i-1}\mu_{i,s+j}\,\mu_j^{k-1} = \frac{\mu_i^k}{k}, \qquad k = 1,\dots,s. \tag{6.11}$$

The method corresponding to the left tableau of (6.8) then also satisfies $C(s)$. Consequently,
the order conditions are simplified considerably, and it follows from
Sect. III.1 that $Y^0_{in}$ is an approximation to the exact stage value $Y_{in}$ of order $s + 1$ or
$s + 2$ if the following conditions hold:

    order s + 1   if (6.9) for k = 1,...,s + 1;
    order s + 2   if (6.9) for k = 1,...,s + 2, and (6.10) for k = s + 1.        (6.12)


For an approximation of order $s + 1$ we put $m = 1$, we arbitrarily choose
$\mu_1$, we compute $\mu_{1j}$ from (6.11), and the coefficients $\beta_{ij}$ and $\nu_{i1}$ from (6.9) with
$k = 1,\dots,s + 1$. A reasonable choice for the free parameter is $\mu_1 \in [1, 2]$ (in our
computations we take $\mu_1 = 1.75$ for $s = 2, 4$, and $\mu_1 = 1.8$ for $s = 6$).¹

For an approximation of order $s + 2$ we put $m = 3$. One of the three additional
function evaluations can be saved if we put $\mu_1 = 0$ and $\mu_2 = 1$. This implies
$Y_{s+1,n-1} = y_{n-1}$ and $Y_{s+2,n-1} = y_n$, so that the evaluation of $f(Y_{s+1,n-1})$ is
already available from computations for the preceding step (FSAL technique, “first
same as last”). In our experiments we take $\mu_3 = 1.6$ for $s = 2$, $\mu_3 = 1.65$ for
$s = 4$, and $\mu_3 = 1.75$ for $s = 6$. The coefficients $\beta_{ij}$, $\nu_{ij}$ are then obtained as
the solution of Vandermonde like linear systems.

For an implementation it is more convenient to work with the quantities $Z^0_{in} =
Y^0_{in} - y_n$ and to write (6.7) in the form

$$Z^0_{in} = h\sum_{j=1}^{s}\alpha_{ij}\,f(Y_{j,n-1}) + h\sum_{j=1}^{m}\nu_{ij}\,f(Y_{s+j,n-1}) \tag{6.13}$$

with $\alpha_{ij} = \beta_{ij} - b_j$.

¹ Laburta (1997) proposes to consider $m = 2$, $\mu_1 = 0$, $\mu_2 = 1$ (apart from the first step
this also needs only one additional function evaluation per step), and to optimize free
parameters by satisfying the order conditions for some trees with one order higher.

(C) Equistage Approximation. From the implicit function theorem, applied to the
nonlinear system (6.1), we know that $Z_{in} = z(y_n, h)$, where the function $z(y, h)$
is as smooth as $f(y)$. Furthermore, since on compact intervals the global error of a
one-step method permits an asymptotic expansion in powers of $h$, we have $y_{n-l} =
y_N(t_{n-l}, h) + O(h^{N+1})$ with $y_N(t, h) = y(t) + h^p e_p(t) + \dots + h^N e_N(t)$ (the value
of $N$ can be chosen arbitrarily large if $f(y)$ is sufficiently smooth). Consequently,
$Z_{i,n-l}$ is $O(h^{N+1})$ close to the smooth function $z(y_N(t, h), h)$ at $t = t_n - lh$.
Let $\zeta_i(t)$ be the polynomial of degree $k - 1$ defined by $\zeta_i(t_{n-l}) = Z_{i,n-l}$ for
$l = 1,\dots,k$. Then, the value

$$Z^0_{in} = \zeta_i(t_n) \tag{6.14}$$

yields an $O(h^{k+1})$ approximation to the solution of (6.1). This interpolation procedure
was first proposed by In’t Hout (1992) for the numerical solution of delay
differential equations. For the iterative solution of the nonlinear Runge-Kutta equations
(6.1), the starting approximation (6.14) is proposed and analyzed by Calvo
(2002).

The implementation of this approach is very simple. Using Newton’s interpolation
formula we have

$$Z^0_{in} = Z_{i,n-1} + \nabla Z_{i,n-1} + \dots + \nabla^{k-1} Z_{i,n-1} \tag{6.15}$$

with backward differences given by $\nabla Z_{i,n} = Z_{i,n} - Z_{i,n-1}$, $\nabla^2 Z_{i,n} = \nabla Z_{i,n} -
\nabla Z_{i,n-1}$, etc.
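The extrapolation formula (6.15) can be sketched in a few lines of Python. This is only an illustration for one scalar component of one stage; the function name `equistage_start` and the convention that the stored values are ordered from oldest to newest are assumptions.

```python
def equistage_start(history, k):
    """Return sum_{j=0}^{k-1} nabla^j Z_{i,n-1}.

    history holds the stored values [..., Z_{i,n-2}, Z_{i,n-1}], oldest first;
    the k most recent ones are used.
    """
    if k < 1 or len(history) < k:
        raise ValueError("need k >= 1 and at least k stored values")
    diffs = list(history[-k:])
    start = diffs[-1]                      # j = 0 term: Z_{i,n-1}
    for _ in range(1, k):
        # next order of backward differences; its last entry belongs to index n-1
        diffs = [diffs[m + 1] - diffs[m] for m in range(len(diffs) - 1)]
        start += diffs[-1]
    return start

# Values of t^2 at t = 1, 2, 3: with k = 3 the extrapolation to t = 4 is exact.
prediction = equistage_start([1.0, 4.0, 9.0], 3)   # 9 + 5 + 2 = 16
```

For vector-valued stages the same recursion would be applied componentwise.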

Numerical Study of Starting Approximations. We consider the Kepler problem
with eccentricity $e = 0.6$ and initial values such that the period is $2\pi$. With many
different step sizes $h = 2\pi/N$ we compute $N + 1$ steps with the Gauss method
of order $p = 2s$ ($p = 4, 8, 12$). In the last step we compute the different starting
approximations and their error $\bigl(\sum_{i=1}^{s}\|Z_{in} - Z^0_{in}\|^2\bigr)^{1/2}$ as a function of the step
size $h$. The result is plotted in Fig. 6.1. There, the pictures also contain the global
errors after one period. They allow us to localize the values of $h$, which are of
practical interest.

We observe that the equistage approximation (6.15) also behaves like $O(h^{k+1})$
when $k + 1$ is larger than the order of the integrator. However, due to the increasing
error constants, the accuracy is improved only for small step sizes. An optimal
$k$ could be estimated by checking the decrease of the backward differences
$\|\nabla^j Z_{i,n-1}\|$. The error of the starting approximation obtained from the continuous
output behaves like $O(h^{s+1})$ (for the Gauss methods) and, in contrast to the equistage
approximation, improves with increasing order. The approximations (6.7) of
order $s + 1$ and $s + 2$ are a clear improvement. As a conclusion we find that for this
example the equistage approximation (which is free from additional function evaluations)
is preferable only for $s = 2$ (order 4). For higher order, the approximation
obtained from (6.7) is significantly more accurate and so it is worthwhile to spend
these two additional function evaluations per step.

Fig. 6.1. Errors of starting approximations for Gauss methods as functions of the step size
h: thick dashed lines for the extrapolated continuous output (6.4) and for the approximations
(6.7) of order s + 1 and s + 2; thin solid lines for the equistage approximation (6.15) with
k = 0,1,...,7; the thick solid line represents the global error of the method after one period

VIII.6.2 Fixed-Point Versus Newton Iteration 

Finally we investigate the iterative solution of the nonlinear Runge-Kutta system
(6.1). We discuss fixed-point and Newton-like iterations, and we compare their efficiency
to the use of composition methods.

Fixed-Point Iteration. This is the simplest and most natural iteration for the
solution of (6.1). With any starting approximation $Z^0_{in}$ from Sect. VIII.6.1 it reads

$$Z^{k+1}_{in} = h\sum_{j=1}^{s} a_{ij}\,f(y_n + Z^k_{jn}), \qquad i = 1,\dots,s. \tag{6.16}$$

In the case where the entries of the Jacobian matrix $f'(y)$ are not excessively large
(nonstiff problems) and the step size is sufficiently small, this iteration converges
for $k \to \infty$ to the solution of (6.1). Usually, the iteration is stopped if a
certain norm of the differences $Z^{k+1}_{in} - Z^k_{in}$ is sufficiently small. We then use $Z^k_{in}$
in the update formula (6.2) so that no additional function evaluation is required.
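The iteration (6.16) together with the update (6.2) can be sketched as follows for the 2-stage Gauss method (order 4). This is a minimal illustration, not the authors' code: the function name `gauss4_fixed_point_step`, the trivial start $Z^0_{in} = 0$, the maximum norm in the stopping test, and the tolerance are assumptions.

```python
import math

SQ3 = math.sqrt(3.0)
A = [[0.25, 0.25 - SQ3 / 6.0],
     [0.25 + SQ3 / 6.0, 0.25]]          # 2-stage Gauss coefficients a_ij
B = [0.5, 0.5]                          # weights b_i

def gauss4_fixed_point_step(f, y, h, tol=1e-15, max_iter=200):
    """One step y_n -> y_{n+1}; f maps a list of floats to a list of floats."""
    s, n = 2, len(y)
    Z = [[0.0] * n for _ in range(s)]   # simplest start Z^0 = 0
    for iteration in range(1, max_iter + 1):
        F = [f([y[d] + Z[j][d] for d in range(n)]) for j in range(s)]
        Z_new = [[h * sum(A[i][j] * F[j][d] for j in range(s)) for d in range(n)]
                 for i in range(s)]
        change = max(abs(Z_new[i][d] - Z[i][d]) for i in range(s) for d in range(n))
        if change <= tol:
            break
        Z = Z_new
    # update (6.2) with the values F = f(y_n + Z^k) that are already available
    y_new = [y[d] + h * sum(B[i] * F[i][d] for i in range(s)) for d in range(n)]
    return y_new, iteration

def oscillator(y):
    return [y[1], -y[0]]                # harmonic oscillator as a nonstiff test

y, h = [1.0, 0.0], 0.1
for _ in range(10):
    y, iterations = gauss4_fixed_point_step(oscillator, y, h)
```

Starting from $Z^0_{in} = 0$ keeps the sketch short; any of the starting approximations of Sect. VIII.6.1 could be used in its place to reduce the number of iterations.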

For a numerical study of the convergence of this iteration, we consider the Kepler
problem with eccentricity $e = 0.6$ and initial values as in the preceding experiments
(period of the solution is $2\pi$). We apply the Gauss methods of order 4, 8, and
12 with various step sizes. For the integration over one period we show in Table 6.1
the total number of function evaluations, the mean number of required iterations per
step, and the global error at the endpoint of integration. As a stopping criterion for
the fixed-point iteration we check whether the norm of the difference of two successive
approximations is smaller than $10^{-16}$ (roundoff unit in double precision). As
a starting approximation $Z^0_{in}$ we use (6.15) with $k = 8$ for the method of order 4,
and the approximation (6.7) of order $s + 2$ for the methods of orders 8 and 12. The
coefficients are those presented after equation (6.12).

Table 6.1. Statistics of Gauss methods (total number of function evaluations, number of
fixed-point iterations per step, and the global error at the endpoint) for computations of the
Kepler problem over one period with e = 0.6

Fixed-point iteration (general problems)

Gauss       h = 2π/25    h = 2π/50    h = 2π/100   h = 2π/200   h = 2π/400

order 4     803          1043         1393         1825         2319
            16.1         10.4         7.0          4.6          2.9
            9.2·10^-2    1.7·10^-2    1.3·10^-3    8.4·10^-5    5.3·10^-6

order 8     1021         1455         2091         3007         4183
            9.7          6.8          4.7          3.3          2.1
            1.1·10^-3    6.9·10^-7    3.6·10^-9    1.8·10^-11   6.9·10^-14

order 12    1297         1731         2311         3441         5917
            8.3          5.4          3.5          2.5          2.1
            2.7·10^-6    8.0·10^-11   2.7·10^-14   < roundoff   < roundoff

Since the starting approximations are more accurate for small $h$, the number
of necessary iterations decreases drastically. In particular, for the 4th order method
we need about 16 iterations per step for $h = 2\pi/25$, but at most 2 iterations when
$h \le 2\pi/800$. If one is interested in high accuracy computations (e.g., long-time
simulations in astronomy), for which the error over one period is not larger than
$10^{-10}$, Table 6.1 illustrates that high order methods ($p \ge 12$) are most efficient.

Newton-Type Iterations. A standard technique for solving nonlinear equations is
Newton’s method or some modification of it. Writing the nonlinear system (6.1) of
an implicit Runge-Kutta method as $F(Z) = 0$ with $Z = (Z_{1n},\dots,Z_{sn})^T$, the
Newton iteration is

$$Z^{k+1} = Z^k - M^{-1} F(Z^k), \tag{6.17}$$

where $M$ is some approximation to the Jacobian matrix $F'(Z^k)$. Since the solution
$Z$ of the nonlinear system is $O(h)$ close to zero, it is common to use $M = F'(0)$
so that the matrix $M$ is independent of the iteration index $k$. In our special situation
we get

$$M = I \otimes I - hA \otimes J \tag{6.18}$$

with $J = f'(y_n)$. Here, $I$ denotes the identity matrix of suitable dimension, and $A$
is the Runge-Kutta matrix.
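For a scalar differential equation the matrix (6.18) reduces to the $2\times 2$ matrix $I - hJA$ for the 2-stage Gauss method, and the iteration (6.17) can be sketched as below. The function name `gauss4_simplified_newton_step`, the start $Z^0 = 0$, the stopping tolerance, and the use of Cramer's rule in place of an LR-decomposition are assumptions made for this small illustration.

```python
import math

SQ3 = math.sqrt(3.0)
A = [[0.25, 0.25 - SQ3 / 6.0],
     [0.25 + SQ3 / 6.0, 0.25]]          # 2-stage Gauss coefficients a_ij
B = [0.5, 0.5]                          # weights b_i

def gauss4_simplified_newton_step(f, dfdy, y, h, tol=1e-14, max_iter=50):
    """One step for a scalar ODE; M = I - h*J*A with J = f'(y_n) is formed once."""
    J = dfdy(y)
    m = [[(1.0 if i == j else 0.0) - h * J * A[i][j] for j in range(2)]
         for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    Z = [0.0, 0.0]
    for iteration in range(1, max_iter + 1):
        F = [f(y + Z[0]), f(y + Z[1])]
        # residual of (6.1)
        res = [Z[i] - h * (A[i][0] * F[0] + A[i][1] * F[1]) for i in range(2)]
        # solve M * delta = res by Cramer's rule
        delta = [(m[1][1] * res[0] - m[0][1] * res[1]) / det,
                 (m[0][0] * res[1] - m[1][0] * res[0]) / det]
        Z = [Z[0] - delta[0], Z[1] - delta[1]]
        if max(abs(delta[0]), abs(delta[1])) <= tol:
            break
    y_new = y + h * (B[0] * f(y + Z[0]) + B[1] * f(y + Z[1]))
    return y_new, iteration

lam = -2.0
y1, newton_iterations = gauss4_simplified_newton_step(
    lambda y: lam * y, lambda y: lam, 1.0, 0.1)
```

For a linear problem the matrix $M$ is the exact Jacobian, so the first iteration already gives the solution up to roundoff and the second one only confirms convergence.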

We repeat the experiment of Table 6.1 with modified Newton iterations instead
of fixed-point iterations. The result is shown in Table 6.2. We have suppressed the
error at the end of the period, because it is the same as in Table 6.1. As expected, the
convergence is faster (i.e., the number of iterations per step is smaller) so that the
total number of function evaluations is reduced. However, we do not see in this table
that we computed at every step the Jacobian $f'(y_n)$ and an LR-decomposition of
the matrix $M$. Even if we exploit the tensor product structure in (6.18) as explained
in Hairer & Wanner (1996, Sect. IV.8), the cpu time is now considerably larger.
Further improvements are possible, if the Jacobian of $f$ and hence also the LR-decomposition
of $M$ is frozen over a couple of steps. But all these efforts can hardly
beat (in cpu time) the straightforward fixed-point iterations. In accordance with the
experience of Sanz-Serna & Calvo (1994, Sect. 5.5) we recommend in general the
use of fixed-point iterations.

Table 6.2. Statistics of Gauss methods (total number of function evaluations, number of
iterations per step) for computations of the Kepler problem over one period with e = 0.6

Modified Newton iteration (general problems)

Gauss       h = 2π/25    h = 2π/50    h = 2π/100   h = 2π/200   h = 2π/400

order 4     383          511          765          1125         1677
            7.7          5.1          3.8          2.8          2.1

order 8     597          883          1387         2307         3667
            5.5          3.9          3.0          2.4          1.8

order 12    763          1095         1717         3003         5689
            4.7          3.3          2.5          2.2          2.0

Separable Systems and Second Order Differential Equations. Many interesting
differential equations are of the form

$$\dot\eta = f(y), \qquad \dot y = g(\eta). \tag{6.19}$$

For example, the second order differential equation $\ddot y = f(y)$ is obtained by putting
$g(\eta) = \eta$. Also Hamiltonian systems with separable Hamiltonian $H(p, q) = T(p) +
U(q)$ are of the form (6.19).

For this particular system the Runge-Kutta equations (6.1) become

$$\zeta_{in} - h\sum_{j=1}^{s} a_{ij}\,f(y_n + Z_{jn}) = 0, \qquad Z_{in} - h\sum_{j=1}^{s} a_{ij}\,g(\eta_n + \zeta_{jn}) = 0.$$

In this case we can still do better: instead of the standard fixed-point iteration (6.16)
we apply a Gauss-Seidel like iteration

$$\zeta^{k+1}_{in} = h\sum_{j=1}^{s} a_{ij}\,f(y_n + Z^k_{jn}), \qquad Z^{k+1}_{in} = h\sum_{j=1}^{s} a_{ij}\,g(\eta_n + \zeta^{k+1}_{jn}), \tag{6.20}$$

which is explicit for separable systems (6.19). Notice that the starting approximations
have to be computed only for $\zeta_{in}$. Those for $Z_{in}$ are then obtained by (6.20)
with $k + 1 = 0$.

Table 6.3. Statistics of iterations (6.20) for Gauss methods (total number of function evaluations,
number of iterations per step) for computations of the Kepler problem over one period
with e = 0.6

Fixed-point iteration (separable problems)

Gauss       h = 2π/25    h = 2π/50    h = 2π/100   h = 2π/200   h = 2π/400

order 4     437          603          857          1201         1717
            8.7          6.0          4.3          3.0          2.1

order 8     613          923          1427         2339         3647
            5.6          4.1          3.1          2.4          1.8

order 12    781          1131         1741         3027         5677
            4.9          3.4          2.6          2.2          2.0

For second order differential equations $\ddot y = f(y)$, where $g(\eta) = \eta$, this iteration
becomes

$$Z^{k+1}_{in} = h c_i\,\dot y_n + h^2\sum_{j=1}^{s}\widehat a_{ij}\,f(y_n + Z^k_{jn}), \tag{6.21}$$

where $c_i = \sum_{j=1}^{s} a_{ij}$ and $\widehat a_{ij}$ are the entries of the square $A^2$ of the Runge-Kutta
matrix (any Nyström method could be applied as well). Due to the factor $h^2$ in (6.21)
we expect this iteration to converge about twice as fast as the standard fixed-point
iteration.
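The iteration (6.21) can be sketched for a scalar second order equation and the 2-stage Gauss method as follows. This is a minimal sketch under stated assumptions: the function name `gauss4_second_order_step`, the start $Z^0_{in} = h c_i \dot y_n$, and the tolerance are hypothetical choices, and the update of $y_n$, $\dot y_n$ uses the formulas stated in Exercise 5 of Sect. VIII.7.

```python
import math

SQ3 = math.sqrt(3.0)
A = [[0.25, 0.25 - SQ3 / 6.0],
     [0.25 + SQ3 / 6.0, 0.25]]                     # 2-stage Gauss coefficients
B = [0.5, 0.5]
C = [sum(row) for row in A]                        # c_i = sum_j a_ij
A2 = [[sum(A[i][l] * A[l][j] for l in range(2)) for j in range(2)]
      for i in range(2)]                           # entries of A^2

def gauss4_second_order_step(f, y, v, h, tol=1e-15, max_iter=200):
    """One step for y'' = f(y) with scalar y and velocity v = y'."""
    Z = [h * C[i] * v for i in range(2)]           # assumed simple start
    for iteration in range(1, max_iter + 1):
        F = [f(y + Z[j]) for j in range(2)]
        Z_new = [h * C[i] * v + h * h * sum(A2[i][j] * F[j] for j in range(2))
                 for i in range(2)]
        change = max(abs(Z_new[i] - Z[i]) for i in range(2))
        if change <= tol:
            break
        Z = Z_new
    # update formulas of Exercise 5, using the values F already computed
    y_new = y + h * v + h * h * sum(B[i] * (1.0 - C[i]) * F[i] for i in range(2))
    v_new = v + h * sum(B[i] * F[i] for i in range(2))
    return y_new, v_new, iteration

y, v, h = 1.0, 0.0, 0.1
for _ in range(10):
    y, v, iterations = gauss4_second_order_step(lambda q: -q, y, v, h)
```

Only the $s$ values $Z_{in}$ are iterated; the quantities $\zeta_{in}$ of (6.20) never have to be stored.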

The Kepler problem is a second order differential equation, so that the iteration
(6.21) can be applied. In analogy to the previous tables we present in Table 6.3 the
statistics of such an implementation of the Gauss methods. We observe that for relatively
large step sizes the number of iterations required per step is nearly halved
(compared to Table 6.1). For high accuracy requirements the number of necessary
iterations is surprisingly small, and the question arises whether such an implementation
can compete with high order explicit composition methods.

Comparison Between Implicit Runge-Kutta and Composition Methods. We
consider second order differential equations $\ddot y = f(y)$, so that composition methods
based on the explicit Störmer-Verlet scheme can be applied. We use the coefficients
of method (V.3.14) which has turned out to be excellent in the experiments of
Sect. V.3.2. It is a method of order 8 and uses 17 function evaluations per integration
step.

We compare it with the Gauss methods of order 8 and 12 (i.e., $s = 4$ and $s = 6$).
As a starting approximation for the solution of the nonlinear system (6.1) we use
(6.7) with $m = 3$, $\mu_1 = 0$, $\mu_2 = 1$, $\mu_3 = 1.75$, $\mu_{ij}$ chosen such that (6.11) holds
for $k = 1,\dots,s + i - 1$, and $\beta_{ij}$, $\nu_{ij}$ such that order $s + 2$ is obtained. Since we are
concerned with second order differential equations, we apply the iterations (6.20)
until the norm of the difference of two successive approximations is below $10^{-17}$.

For both classes of methods we use compensated summation (Algorithm 5.1),
which permits us to reduce rounding errors. For composition methods we apply this
technique for all updates of the basic integrator. For Runge-Kutta methods, we use
it for adding the increment to $y_n$ and also for computing the sum $\sum_{i=1}^{s} b_i k_i$.
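The effect of compensated summation can be seen in the following Python sketch. It implements a Kahan-type compensated update, which is assumed here to be in the spirit of Algorithm 5.1; the function name `compensated_add` and the test values are illustrative only.

```python
def compensated_add(y, err, increment):
    """Add a small increment to y while carrying the rounding error in err."""
    a = y
    err = err + increment        # accumulate what has not yet been added to y
    y = a + err                  # rounded sum
    err = err + (a - y)          # part of err that was lost in the rounding
    return y, err

y_naive = 1.0
y_comp, err = 1.0, 0.0
for _ in range(100000):
    y_naive += 1e-16             # each increment is lost: y_naive stays 1.0
    y_comp, err = compensated_add(y_comp, err, 1e-16)
```

In double precision every single increment of size $10^{-16}$ is below half a unit in the last place of 1.0, so the naive sum does not move at all, whereas the compensated sum follows the accumulated increments.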

Fig. 6.2. Work-precision diagrams for two problems (Kepler and outer solar system) and
three numerical integrators (composition method with coefficients of method (V.3.14) based
on the explicit Störmer-Verlet scheme and the Gauss methods of orders 8 and 12)

The work-precision diagrams of the comparison are given in Fig. 6.2. The upper
pictures correspond to the Kepler problem with $e = 0.6$ and an integration over 100
periods; the lower pictures correspond to the outer solar system with data given in
Sect. I.2.4 and an integration over 500 000 earth days. The left pictures show the
Euclidean norm of the error at the end of the integration interval as a function of
the total number of function evaluations required for the integration; the pictures to the
right present the same error as a function of the cpu times (with optimizing compiler
on a SunBlade 100 workstation). We can draw the following conclusions from this
experiment:

• the implementation of composition methods based on the Störmer-Verlet scheme
is extremely easy; that of implicit Runge-Kutta methods is slightly more involved
because it requires a stopping criterion for the fixed-point iterations;

• the overhead (total cpu time minus that used for the function evaluations) is much
higher for the implicit Runge-Kutta methods; this is seen from the fact that implicit
Runge-Kutta methods require fewer function evaluations for a given accuracy,
but often more cpu time;

• among the two Gauss methods, the higher order method is more efficient for all
precisions of practical interest;

• for very accurate computations (say, in quadruple precision), high order Runge-Kutta
methods are more efficient than composition methods;

• much of the computation in the Runge-Kutta code can be done in parallel (e.g.,
the s function evaluations of a fixed-point iteration); composition methods do not
have this potential;

• implicit Runge-Kutta methods can be applied to general (non-separable) differential
equations, and the cost of the implementation is at most twice as large; if
one is obliged to use an implicit method as the basic method for composition,
many advantages of composition methods are lost.

Both classes of methods (composition and implicit Runge-Kutta) are of interest 
in the geometric integration of differential equations. Each one has its advantages 
and disadvantages. 

Fortran codes of these computations are available on the Internet under the 
homepage <http://www.unige.ch/math/folks/hairer>. A Matlab version of these 
codes is described in E. & M. Hairer (2003). 


VIII.7 Exercises 

1. Consider a one-step method applied to a Hamiltonian system. Give a probabilistic
proof of the property that the error of the numerical Hamiltonian due to
roundoff grows like $O(\sqrt{t}\,\mathrm{eps})$.

2. Prove that the collocation polynomial can be written as

$$w_n(t) = y_n + h\sum_{i=1}^{s}\beta_i(t)\,f(Y_{in}),$$

where the polynomials $\beta_i(t)$ are a solution of

$$\sum_{j=1}^{s}\beta_j(t)\,c_j^{k-1} = \frac{\tau^k}{k}, \qquad k = 1,\dots,s, \qquad t = t_n + \tau h.$$

3. Apply your favourite code to the Kepler problem and to the outer solar system 
with data as in Fig. 6.2. Plot a work-precision diagram. 

Remark. Figure 7.1 shows our results obtained with the 8th order Runge-Kutta
code Dop853 (Hairer, Nørsett & Wanner 1993) compared to an 8th order composition
method. Rounding errors are more pronounced for Dop853, because
compensated summation is not applied. Computations on shorter time intervals
and comparisons of required function evaluations would be more in favour
of Dop853. It is also of interest to consider high order Runge-Kutta Nyström
methods.

Fig. 7.1. Work-precision diagrams for the explicit, variable step size Runge-Kutta code
Dop853 applied to two problems (Kepler and outer solar system). For a comparison, the
results of Fig. 6.2 for the composition method are included

4. Consider starting approximations

$$Y^0_{in} = y_{n-2} + h\sum_{j=1}^{s}\beta^{(2)}_{ij}\,f(Y_{j,n-2}) + h\sum_{j=1}^{s}\beta^{(1)}_{ij}\,f(Y_{j,n-1}) \tag{7.1}$$

which use the internal stages of two consecutive steps without any additional
function evaluation. What are the conditions such that (7.1) is of order $s + 1$, of
order $s + 2$?

Compare the efficiency of these formulas with the algorithms (A) and (B) of
Sect. VIII.6.1.

5. Prove that for a second order differential equation $\ddot y = f(y)$ (more precisely,
for $\dot y = z$, $\dot z = f(y)$) the application of the $s$-stage Gauss method gives

$$y_{n+1} = y_n + h\,\dot y_n + h^2\sum_{i=1}^{s} b_i(1 - c_i)\,f(y_n + Z_{in}),$$

$$\dot y_{n+1} = \dot y_n + h\sum_{i=1}^{s} b_i\,f(y_n + Z_{in}),$$

where $Z_{in}$ is obtained from the iteration (6.21).

Hint. The coefficients of the Gauss methods satisfy $\sum_{j=1}^{s} b_j a_{ji} = b_i(1 - c_i)$ for
all $i$.




Chapter IX. 

Backward Error Analysis and Structure 
Preservation 


One of the greatest virtues of backward analysis ... is that when it is 
the appropriate form of analysis it tends to be very markedly superior 
to forward analysis. Invariably in such cases it has remarkable formal 
simplicity and gives deep insight into the stability (or lack of it) of the 
algorithm. (J.H. Wilkinson, IMA Bulletin 1986) 

The origin of backward error analysis dates back to the work of Wilkinson (1960) in
numerical linear algebra. For the study of integration methods for ordinary differential
equations, its importance was seen much later. The present chapter is devoted to
this theory. It is very useful, when the qualitative behaviour of numerical methods
is of interest, and when statements over very long time intervals are needed. The
formal analysis (construction of the modified equation, study of its properties) gives
already a lot of insight into numerical methods. For a rigorous treatment, the modified
equation, which is a formal series in powers of the step size, has to be truncated.
The error, induced by such a truncation, can be made exponentially small, and the
results remain valid on exponentially long time intervals.


IX.1 Modified Differential Equation - Examples

Consider an ordinary differential equation

$$\dot y = f(y),$$

and a numerical method $\Phi_h(y)$ which produces the approximations

$$y_0, y_1, y_2, \dots\,.$$

[Figure: diagram relating the exact flow $\varphi_t$ of $\dot y = f(y)$, the numerical method $\Phi_h$, and the exact solution of the modified equation $\dot{\widetilde y} = f_h(\widetilde y)$]

A forward error analysis consists of the study of the errors $y_1 - \varphi_h(y_0)$ (local error)
and $y_n - \varphi_{nh}(y_0)$ (global error) in the solution space. The idea of backward error
analysis is to search for a modified differential equation $\dot{\widetilde y} = f_h(\widetilde y)$ of the form

$$\dot{\widetilde y} = f(\widetilde y) + h f_2(\widetilde y) + h^2 f_3(\widetilde y) + \dots, \tag{1.1}$$

such that $y_n = \widetilde y(nh)$, and in studying the difference of the vector fields $f(y)$ and
$f_h(y)$. This then gives much insight into the qualitative behaviour of the numerical
solution and into the global error $y_n - y(nh) = \widetilde y(nh) - y(nh)$. We remark that
the series in (1.1) usually diverges and that one has to truncate it suitably. The effect
of such a truncation will be studied in Sect. IX.7. For the moment we content ourselves
with a formal analysis without taking care of convergence issues. The idea
of interpreting the numerical solution as the exact solution of a modified equation
is common to many numerical analysts (“... This is possible since the map is the
solution of some physical Hamiltonian problem which, in some sense, is close to the
original problem”, Ruth (1983), or “... the symplectic integrator creates a numerical
Hamiltonian system that is close to the original ...”, Gladman, Duncan & Candy
1991). A systematic study started with the work of Griffiths & Sanz-Serna (1986),
Feng (1991), Sanz-Serna (1992), Yoshida (1993), Eirola (1993), Fiedler & Scheurle
(1996), and many others.

For the computation of the modified equation (1.1) we put $y := \widetilde y(t)$ for a fixed $t$,
and we expand the solution of (1.1) into a Taylor series

$$\widetilde y(t + h) = y + h\bigl(f(y) + h f_2(y) + h^2 f_3(y) + \dots\bigr)
+ \frac{h^2}{2!}\bigl(f'(y) + h f_2'(y) + \dots\bigr)\bigl(f(y) + h f_2(y) + \dots\bigr) + \dots\,. \tag{1.2}$$

We assume that the numerical method $\Phi_h(y)$ can be expanded as

$$\Phi_h(y) = y + h f(y) + h^2 d_2(y) + h^3 d_3(y) + \dots \tag{1.3}$$

(the coefficient of $h$ is $f(y)$ for consistent methods). The functions $d_j(y)$ are known
and are typically composed of $f(y)$ and its derivatives. For the explicit Euler method
we simply have $d_j(y) = 0$ for all $j \ge 2$. In order to get $\widetilde y(nh) = y_n$ for all $n$, we
must have $\widetilde y(t + h) = \Phi_h(y)$. Comparing like powers of $h$ in the expressions (1.2)
and (1.3) yields recurrence relations for the functions $f_j(y)$, namely,

$$f_2(y) = d_2(y) - \frac{1}{2!}\,f'f(y), \tag{1.4}$$

$$f_3(y) = d_3(y) - \frac{1}{3!}\bigl(f''(f, f)(y) + f'f'f(y)\bigr) - \frac{1}{2!}\bigl(f'f_2(y) + f_2'f(y)\bigr).$$

Example 1.1. Consider the scalar differential equation

$$\dot y = y^2, \qquad y(0) = 1 \tag{1.5}$$

with exact solution $y(t) = 1/(1 - t)$. It has a singularity at $t = 1$. We apply the
explicit Euler method $y_{n+1} = y_n + h f(y_n)$ with step size $h = 0.02$. The picture in
Fig. 1.1 presents the exact solution (dashed curve) together with the numerical solution
(bullets). The above procedure for the computation of the modified equation,
implemented as a Maple program (see Hairer & Lubich 2000) gives

> fcn := y -> y^2:
> nn := 6:
> fcoe[1] := fcn(y):
> for n from 2 by 1 to nn do
>   modeq := sum(h^j*fcoe[j+1], j=0..n-2):
>   diffy[0] := y:
>   for i from 1 by 1 to n do
>     diffy[i] := diff(diffy[i-1],y)*modeq:
>   od:
>   ytilde := sum(h^k*diffy[k]/k!, k=0..n):
>   res := ytilde-y-h*fcn(y):
>   tay := convert(series(res,h=0,n+1),polynom):
>   fcoe[n] := -coeff(tay,h,n):
> od:
> simplify(sum(h^j*fcoe[j+1], j=0..nn-1));

Its output is

$$\dot{\widetilde y} = \widetilde y^2 - h\,\widetilde y^3 + h^2\,\frac{3}{2}\,\widetilde y^4 - h^3\,\frac{8}{3}\,\widetilde y^5 + h^4\,\frac{31}{6}\,\widetilde y^6 - h^5\,\frac{157}{15}\,\widetilde y^7 \pm \dots\,. \tag{1.6}$$

Fig. 1.1. Solutions of the modified equation for the problem (1.5)

The above picture also presents the solution of the modified equation, when truncated
after 1, 2, 3, and 4 terms. We observe an excellent agreement of the numerical
solution with the exact solution of the modified equation.
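The same recursion can also be carried out with exact rational arithmetic in Python, which makes it easy to check the leading coefficients in (1.6). The sketch below is restricted to the explicit Euler method (where $d_j = 0$ for $j \ge 2$) and to polynomial right-hand sides; the function names and the dictionary representation of polynomials in $h$ and $y$ are assumptions of this illustration.

```python
from fractions import Fraction
from math import factorial

def d_dy(poly):
    """Differentiate {(i, j): c}, meaning sum c * h^i * y^j, with respect to y."""
    return {(i, j - 1): c * j for (i, j), c in poly.items() if j > 0}

def multiply(p, q, max_h):
    """Product of two such polynomials, dropping powers of h above max_h."""
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            if i1 + i2 <= max_h:
                key = (i1 + i2, j1 + j2)
                out[key] = out.get(key, Fraction(0)) + c1 * c2
    return out

def euler_modified_coeffs(f_poly, nn):
    """Return [f_1, ..., f_nn]; each f_j is a dict {power of y: coefficient}."""
    fcoe = {1: {j: Fraction(c) for j, c in f_poly.items()}}
    for n in range(2, nn + 1):
        # truncated modified vector field: sum_{j=0}^{n-2} h^j f_{j+1}(y)
        modeq = {(j, power): c
                 for j in range(n - 1) for power, c in fcoe[j + 1].items()}
        diffy = {(0, 1): Fraction(1)}                 # diffy[0] = y
        h_n_coeff = {}
        for k in range(1, n + 1):
            diffy = multiply(d_dy(diffy), modeq, n)   # k-th Lie derivative of y
            for (i, j), c in diffy.items():
                if i + k == n:                        # term of order h^n in the Taylor series
                    h_n_coeff[j] = h_n_coeff.get(j, Fraction(0)) + c / factorial(k)
        fcoe[n] = {j: -c for j, c in h_n_coeff.items() if c != 0}
    return [fcoe[n] for n in range(1, nn + 1)]

coeffs = euler_modified_coeffs({2: 1}, 6)             # f(y) = y^2
```

For $f(y) = y^2$ each $f_j$ is a single monomial in $y$, so `coeffs[1]`, `coeffs[2]` and `coeffs[3]` hold the coefficients $-1$, $3/2$ and $-8/3$ of $\widetilde y^3$, $\widetilde y^4$ and $\widetilde y^5$.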

A similar program for the implicit midpoint rule (I.1.7) computes the modified
equation

$$\dot{\widetilde y} = \widetilde y^2 + h^2\,\frac{1}{4}\,\widetilde y^4 + h^4\,\frac{1}{8}\,\widetilde y^6 + h^6\,\frac{11}{192}\,\widetilde y^8 + h^8\,\frac{3}{128}\,\widetilde y^{10} \pm \dots, \tag{1.7}$$

and for the classical Runge-Kutta method of order 4 (left tableau of (II.1.8))

$$\dot{\widetilde y} = \widetilde y^2 - h^4\,\frac{1}{24}\,\widetilde y^6 + h^6\,\frac{65}{576}\,\widetilde y^8 - h^7\,\frac{17}{96}\,\widetilde y^9 + h^8\,\frac{19}{144}\,\widetilde y^{10} \pm \dots\,. \tag{1.8}$$

We observe that the perturbation terms in the modified equation are of size
$O(h^p)$, where $p$ is the order of the method. This is true in general.

Theorem 1.2. Suppose that the method $y_{n+1} = \Phi_h(y_n)$ is of order $p$, i.e.,

$$\Phi_h(y) = \varphi_h(y) + h^{p+1}\delta_{p+1}(y) + O(h^{p+2}),$$

where $\varphi_t(y)$ denotes the exact flow of $\dot y = f(y)$, and $h^{p+1}\delta_{p+1}(y)$ the leading term
of the local truncation error. The modified equation then satisfies

$$\dot{\widetilde y} = f(\widetilde y) + h^p f_{p+1}(\widetilde y) + h^{p+1} f_{p+2}(\widetilde y) + \dots, \qquad \widetilde y(0) = y_0 \tag{1.9}$$

with $f_{p+1}(y) = \delta_{p+1}(y)$.

Proof. The construction of the functions $f_j(y)$ (see the beginning of this section)
shows that $f_j(y) = 0$ for $2 \le j \le p$ if and only if $\Phi_h(y) - \varphi_h(y) = O(h^{p+1})$. □

A first application of the modified equation (1.1) is the existence of an asymptotic
expansion of the global error. Indeed, by the nonlinear variation of constants
formula, the difference between its solution $\widetilde y(t)$ and the solution $y(t)$ of $\dot y = f(y)$
satisfies

$$\widetilde y(t) - y(t) = h^p e_p(t) + h^{p+1} e_{p+1}(t) + \dots\,. \tag{1.10}$$

Since $y_n = \widetilde y(nh) + O(h^N)$ for the solution of a truncated modified equation, this
proves the existence of an asymptotic expansion in powers of $h$ for the global error
$y_n - y(nh)$.

A large part of this chapter studies properties of the modified differential equation,
and the question of the extent to which structures (such as conservation of
invariants, Hamiltonian structure) in the problem $\dot y = f(y)$ can carry over to the
modified equation.

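Example 1.3. We next consider the Lotka-Volterra equations

$$\dot q = q(p-1), \qquad \dot p = p(2-q),$$

and we apply (a) the explicit Euler method, and (b) the symplectic Euler method, both with constant step size $h = 0.1$. The first terms of their modified equations are

$$\text{(a)}\qquad \begin{aligned} \dot q &= q(p-1) - \tfrac{h}{2}\,q\,(p^2 - pq + 1) + \mathcal{O}(h^2),\\ \dot p &= -p(q-2) - \tfrac{h}{2}\,p\,(q^2 - pq - 3q + 4) + \mathcal{O}(h^2), \end{aligned}$$

$$\text{(b)}\qquad \begin{aligned} \dot q &= q(p-1) - \tfrac{h}{2}\,q\,(p^2 + pq - 4p + 1) + \mathcal{O}(h^2),\\ \dot p &= -p(q-2) + \tfrac{h}{2}\,p\,(q^2 + pq - 5q + 4) + \mathcal{O}(h^2). \end{aligned}$$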
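The $h$ terms in (a) and (b) can be tested numerically: one step of the method should agree with the time-$h$ flow of the truncated modified equation up to $\mathcal{O}(h^3)$, but with the exact flow only up to $\mathcal{O}(h^2)$. The following sketch makes several assumptions that are not in the text: the flows are approximated by many small classical Runge-Kutta steps, the test point $(q_0, p_0) = (2, 1.5)$ is arbitrary, and the symplectic Euler method is taken in the variant that treats $q$ explicitly and $p$ implicitly (this is the variant whose expansion reproduces (b)).

```python
def f(q, p):
    """Lotka-Volterra vector field of Example 1.3."""
    return q * (p - 1.0), p * (2.0 - q)

def f_mod_euler(q, p, h):
    """Modified field (a), truncated after the h term."""
    fq, fp = f(q, p)
    return (fq - 0.5 * h * q * (p * p - p * q + 1.0),
            fp - 0.5 * h * p * (q * q - p * q - 3.0 * q + 4.0))

def f_mod_sympl(q, p, h):
    """Modified field (b), truncated after the h term."""
    fq, fp = f(q, p)
    return (fq - 0.5 * h * q * (p * p + p * q - 4.0 * p + 1.0),
            fp + 0.5 * h * p * (q * q + p * q - 5.0 * q + 4.0))

def flow(field, q, p, t, substeps=50):
    """Approximate the time-t flow of `field` by classical RK4 substeps."""
    dt = t / substeps
    for _ in range(substeps):
        a1, b1 = field(q, p)
        a2, b2 = field(q + 0.5 * dt * a1, p + 0.5 * dt * b1)
        a3, b3 = field(q + 0.5 * dt * a2, p + 0.5 * dt * b2)
        a4, b4 = field(q + dt * a3, p + dt * b3)
        q += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        p += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return q, p

def euler_step(q, p, h):
    fq, fp = f(q, p)
    return q + h * fq, p + h * fp

def sympl_euler_step(q, p, h):
    p1 = p / (1.0 - h * (2.0 - q))   # p1 = p + h*p1*(2 - q), solved for p1
    q1 = q + h * q * (p1 - 1.0)      # q1 = q + h*q*(p1 - 1)
    return q1, p1

def gap(step, mod_field, q0, p0, h):
    """Max-norm distance of one step to the exact and to the modified time-h flow."""
    qn, pn = step(q0, p0, h)
    qe, pe = flow(f, q0, p0, h)
    qm, pm = flow(lambda q, p: mod_field(q, p, h), q0, p0, h)
    return (max(abs(qn - qe), abs(pn - pe)),
            max(abs(qn - qm), abs(pn - pm)))

for name, step, mod in (("explicit Euler", euler_step, f_mod_euler),
                        ("symplectic Euler", sympl_euler_step, f_mod_sympl)):
    for h in (0.01, 0.005):
        print(name, h, gap(step, mod, 2.0, 1.5, h))
```

Halving $h$ should reduce the first number by a factor of about 4 and the second by a factor of about 8.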

Figure 1.2 shows the numerical solutions for initial values indicated by a thick dot. In the pictures to the left they are embedded in the exact flow of the differential equation, whereas in those to the right they are embedded in the flow of the modified differential equation, truncated after the $h^2$ terms. As in the first example, we observe an excellent agreement of the numerical solution with the exact solution of the modified equation. For the symplectic Euler method, the solutions of the truncated modified equation are periodic, as is the case for the unperturbed problem (Exercise 5).

Fig. 1.2. Numerical solution compared to the exact and modified flows; (a) explicit Euler, $h = 0.1$; (b) symplectic Euler, $h = 0.1$

Fig. 1.3. Study of the truncation in the modified equation

In Fig. 1.3 we present the numerical solution and the exact solution of the modified equation, once truncated after the $h$ terms (dashed-dotted), and once truncated after the $h^2$ terms (dotted). The exact solution of the problem is included as a solid curve. This shows that taking more terms in the modified equation usually improves the agreement of its solution with the numerical approximation of the method.

Example 1.4. For a linear differential equation with constant coefficients

$$\dot y = Ay, \qquad y(0) = y_0$$

we consider numerical methods which yield $y_{n+1} = R(hA)y_n$, where $R(z)$ is the stability function (VI.4.9) of the method. In this case we get $y_n = R(hA)^n y_0$, so that $y_n = \tilde y(nh)$, where $\tilde y(t) = R(hA)^{t/h} y_0 = \exp\bigl(\tfrac{t}{h}\ln R(hA)\bigr)\,y_0$ is the solution of the modified differential equation

$$\dot{\tilde y} = \frac{1}{h}\ln R(hA)\,\tilde y = \bigl(A + h b_2 A^2 + h^2 b_3 A^3 + \ldots\bigr)\,\tilde y \tag{1.11}$$

with suitable constants $b_2, b_3, \ldots$. Since $R(z) = 1 + z + \mathcal{O}(z^2)$ and $\ln(1+x) = x - x^2/2 + \mathcal{O}(x^3)$ both have a positive radius of convergence, the series (1.11) converges for $|h| < h_0$ with some $h_0 > 0$. We shall see later that this is an exceptional situation. In general, the modified equation is a formal divergent series.
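For a scalar problem $\dot y = \lambda y$ the statement of Example 1.4 is easy to verify. The following sketch assumes the explicit Euler method, $R(z) = 1 + z$, for which $b_j = (-1)^{j+1}/j$; the values of `lam`, `h` and `n` are arbitrary choices made for the illustration.

```python
import math

lam = -2.0      # test eigenvalue (arbitrary)
h = 0.1         # step size
n = 25          # number of steps

R = 1.0 + h * lam                    # stability function of explicit Euler at z = h*lam
lam_mod = math.log1p(h * lam) / h    # (1/h) ln R(h*lam), the modified eigenvalue

y_num = R ** n                       # n Euler steps starting from y0 = 1
y_mod = math.exp(lam_mod * n * h)    # exact solution of the modified equation at t = n*h

# partial sum of lam + h*b2*lam^2 + h^2*b3*lam^3 + ... with b_j = (-1)^(j+1)/j
partial = sum((-1) ** (j + 1) / j * h ** (j - 1) * lam ** j for j in range(1, 40))

print(y_num, y_mod)
print(lam_mod, partial)
```

Both pairs of numbers agree to rounding error, since $|h\lambda| = 0.2$ lies well inside the radius of convergence of the logarithm series.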


IX.2 Modified Equations of Symmetric Methods 

In this and the following sections we investigate how the structure of the differential equation and geometric properties of the method are reflected in the modified differential equation. Here we begin by studying this question for symmetric/reversible methods.

Consider a numerical method $\Phi_h$. Recall that its adjoint $y_{n+1} = \Phi_h^*(y_n)$ is defined by the relation $y_n = \Phi_{-h}(y_{n+1})$ (see Definition II.1.4).

Theorem 2.1 (Adjoint Methods). Let $f_j(y)$ be the coefficient functions of the modified equation for the method $\Phi_h$. Then, the coefficient functions $f_j^*(y)$ of the modified equation for the adjoint method $\Phi_h^*$ satisfy

$$f_j^*(y) = (-1)^{j+1} f_j(y). \tag{2.1}$$

Proof. The solution $\tilde y(t)$ of the modified equation for $\Phi_h^*$ has to satisfy $\tilde y(t) = \Phi_{-h}\bigl(\tilde y(t+h)\bigr)$ or, equivalently, $\tilde y(t-h) = \Phi_{-h}(y)$ with $y := \tilde y(t)$. We get (2.1) if we replace $h$ with $-h$ in the formulas (1.1), (1.2) and (1.3). □

For symmetric methods we have $\Phi_h^* = \Phi_h$, implying $f_j^*(y) = f_j(y)$. We therefore get the following corollary to Theorem 2.1.

Theorem 2.2 (Symmetric Methods). The coefficient functions of the modified equation of a symmetric method satisfy $f_j(y) = 0$ whenever $j$ is even, so that (1.1) has an expansion in even powers of $h$. □

This theorem explains the $h^2$-expansion in the modified equation (1.7) of the midpoint rule.
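For the linear problem of Example 1.4 both theorems can be checked by hand. As an illustration (a worked special case, not part of the general argument), take the explicit Euler method with $R(z) = 1+z$, its adjoint, the implicit Euler method with $R^*(z) = 1/R(-z) = 1/(1-z)$, and the symmetric implicit midpoint rule with $R(z) = (1+z/2)/(1-z/2)$:

```latex
\begin{align*}
\tfrac1h \ln(1+hA) &= A - \tfrac{h}{2}A^2 + \tfrac{h^2}{3}A^3 - \tfrac{h^3}{4}A^4 + \ldots
  && \text{(explicit Euler)}\\
-\tfrac1h \ln(1-hA) &= A + \tfrac{h}{2}A^2 + \tfrac{h^2}{3}A^3 + \tfrac{h^3}{4}A^4 + \ldots
  && \text{(implicit Euler: } f_j^* = (-1)^{j+1} f_j \text{)}\\
\tfrac1h \ln\frac{1+hA/2}{1-hA/2} &= A + \tfrac{h^2}{12}A^3 + \tfrac{h^4}{80}A^5 + \ldots
  && \text{(implicit midpoint: even powers of } h \text{ only)}
\end{align*}
```

The last line uses $\ln\frac{1+x}{1-x} = 2\bigl(x + \tfrac{x^3}{3} + \tfrac{x^5}{5} + \ldots\bigr)$ with $x = hA/2$.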

As a consequence of Theorem 2.2, the asymptotic expansion (1.10) of the global error is also in even powers of $h$. This property is responsible for the success of $h^2$-extrapolation methods.

Consider now a numerical method applied to a $\rho$-reversible differential equation as studied in Sect. V.1. Recall from Theorem V.1.5 that a symmetric method is $\rho$-reversible under the $\rho$-compatibility condition (V.1.4), which is satisfied for most numerical methods.

Theorem 2.3 (Reversible Methods). Consider a $\rho$-reversible differential equation $\dot y = f(y)$ and a $\rho$-reversible numerical method $\Phi_h(y)$. Then, every truncation of the modified differential equation is again $\rho$-reversible.

Proof. Let $f_j(y)$ be the $j$th coefficient of the modified equation (1.1) for $\Phi_h$. The proof is by induction on $j$. So assume that for $j = 1, \ldots, r$, the vector field $f_j(y)$ is $\rho$-reversible, i.e.,

$$\rho \circ f_j = -f_j \circ \rho .$$

We show that the same relation holds also for $j = r+1$. By assumption, the truncated modified equation

$$\dot{\tilde y} = f(\tilde y) + h f_2(\tilde y) + \ldots + h^{r-1} f_r(\tilde y)$$

is $\rho$-reversible, so that by (V.1.2), it has a $\rho$-reversible flow $\varphi_{r,t}(y)$, that is, $\rho \circ \varphi_{r,t} = \varphi_{r,t}^{-1} \circ \rho$. By construction of the modified equation, we have

$$\Phi_h(y) = \varphi_{r,h}(y) + h^{r+1} f_{r+1}(y) + \mathcal{O}(h^{r+2}).$$

Since $\varphi_{r,h}(y) = y + \mathcal{O}(h)$, this implies

$$\Phi_h^{-1}(y) = \varphi_{r,h}^{-1}(y) - h^{r+1} f_{r+1}(y) + \mathcal{O}(h^{r+2}).$$

Since both $\Phi_h$ and $\varphi_{r,h}$ are $\rho$-reversible maps, these two relations yield $\rho \circ f_{r+1} = -f_{r+1} \circ \rho$ as desired. □


IX.3 Modified Equations of Symplectic Methods 

We now present one of the most important results of this chapter. We consider a Hamiltonian system $\dot y = J^{-1}\nabla H(y)$ with an infinitely differentiable Hamiltonian $H(y)$, and we show that the modified equation of symplectic methods is also Hamiltonian.


IX.3.1 Existence of a Local Modified Hamiltonian 

... if we neglect convergence questions then one can always find a formal 
integral... (J. Moser 1968) 

Theorem 3.1. If a symplectic method $\Phi_h(y)$ is applied to a Hamiltonian system with a smooth Hamiltonian $H : \mathbb{R}^{2d} \to \mathbb{R}$, then the modified equation (1.1) is also Hamiltonian. More precisely, there exist smooth functions $H_j : \mathbb{R}^{2d} \to \mathbb{R}$ for $j = 2, 3, \ldots$, such that $f_j(y) = J^{-1}\nabla H_j(y)$.

The following proof by induction, whose ideas can be traced back to Moser (1968), was given by Benettin & Giorgilli (1994) and Tang (1994). It can be extended to many other situations. We have already encountered its reversible version in the proof of Theorem 2.3.

Proof. Assume that $f_j(y) = J^{-1}\nabla H_j(y)$ for $j = 1, 2, \ldots, r$ (this is satisfied for $r = 1$, because $f_1(y) = f(y) = J^{-1}\nabla H(y)$). We have to prove the existence of a Hamiltonian $H_{r+1}(y)$. The idea is to consider the truncated modified equation

$$\dot{\tilde y} = f(\tilde y) + h f_2(\tilde y) + \ldots + h^{r-1} f_r(\tilde y), \tag{3.1}$$

which is a Hamiltonian system with Hamiltonian $H(y) + hH_2(y) + \ldots + h^{r-1}H_r(y)$. Its flow $\varphi_{r,t}(y_0)$, compared to that of (1.1), satisfies

$$\Phi_h(y_0) = \varphi_{r,h}(y_0) + h^{r+1} f_{r+1}(y_0) + \mathcal{O}(h^{r+2}),$$

and also

$$\Phi_h'(y_0) = \varphi_{r,h}'(y_0) + h^{r+1} f_{r+1}'(y_0) + \mathcal{O}(h^{r+2}).$$

By our assumption on the method and by the induction hypothesis, $\Phi_h$ and $\varphi_{r,h}$ are symplectic transformations. This, together with $\varphi_{r,h}'(y_0) = I + \mathcal{O}(h)$, therefore implies

$$J = \Phi_h'(y_0)^T J\,\Phi_h'(y_0) = J + h^{r+1}\bigl(f_{r+1}'(y_0)^T J + J f_{r+1}'(y_0)\bigr) + \mathcal{O}(h^{r+2}).$$

Consequently, the matrix $J f_{r+1}'(y)$ is symmetric and the existence of $H_{r+1}(y)$ satisfying $f_{r+1}(y) = J^{-1}\nabla H_{r+1}(y)$ follows from the Integrability Lemma VI.2.7. This part of the proof is similar to that of Theorem VI.2.6. □
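The step from the vanishing $h^{r+1}$ term to the symmetry of $J f_{r+1}'(y)$ uses only the skew-symmetry of $J$. Written out, as a short check under the usual convention $J^T = -J$:

```latex
\begin{align*}
\bigl(J f_{r+1}'(y)\bigr)^T &= f_{r+1}'(y)^T J^T = -\,f_{r+1}'(y)^T J ,\\
0 = f_{r+1}'(y)^T J + J f_{r+1}'(y)
  &\;\Longleftrightarrow\;
  J f_{r+1}'(y) = \bigl(J f_{r+1}'(y)\bigr)^T .
\end{align*}
```

A symmetric Jacobian of $J f_{r+1}$ is precisely the hypothesis under which the Integrability Lemma yields a potential $H_{r+1}$ with $J f_{r+1} = \nabla H_{r+1}$.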

For Hamiltonians $H : D \to \mathbb{R}$ the statement of the above theorem remains valid with $H_j : D \to \mathbb{R}$ on domains $D \subset \mathbb{R}^{2d}$ on which the Integrability Lemma VI.2.7 is applicable. This is the case for simply connected domains $D$, but not in general (see the discussion after the proof of Lemma VI.2.7).

IX.3.2 Existence of a Global Modified Hamiltonian 

By Lemma VI.5.3 every symplectic one-step method $\Phi_h : (p,q) \mapsto (P,Q)$ can be locally expressed in terms of a generating function $S(P,q,h)$ as

$$p = P + \frac{\partial S}{\partial q}(P,q,h), \qquad Q = q + \frac{\partial S}{\partial P}(P,q,h). \tag{3.2}$$

This property allows us to give an independent proof of Theorem 3.1 and in addition to show that the modified equation is Hamiltonian with $\widetilde H(p,q)$ defined on the same domain as the generating function. The following result is mentioned in Benettin & Giorgilli (1994) and in the thesis of Murua (1994), p. 100.

Theorem 3.2. Assume that the symplectic method $\Phi_h$ has a generating function

$$S(P,q,h) = h\,S_1(P,q) + h^2 S_2(P,q) + h^3 S_3(P,q) + \ldots \tag{3.3}$$

with smooth $S_j(P,q)$ defined on an open set $D$. Then, the modified differential equation is a Hamiltonian system with

$$\widetilde H(p,q) = H(p,q) + h\,H_2(p,q) + h^2 H_3(p,q) + \ldots , \tag{3.4}$$

where the functions $H_j(p,q)$ are defined and smooth on the whole of $D$.

Proof. By Theorem VI.5.7, the exact solution $(P,Q) = (\tilde p(t), \tilde q(t))$ of the Hamiltonian system corresponding to $\widetilde H(p,q)$ is given by

$$p = P + \frac{\partial \widetilde S}{\partial q}(P,q,t), \qquad Q = q + \frac{\partial \widetilde S}{\partial P}(P,q,t),$$

where $\widetilde S$ is the solution of the Hamilton-Jacobi differential equation

$$\frac{\partial \widetilde S}{\partial t}(P,q,t) = \widetilde H\Bigl(P,\, q + \frac{\partial \widetilde S}{\partial P}(P,q,t)\Bigr), \qquad \widetilde S(P,q,0) = 0. \tag{3.5}$$

Since $\widetilde H$ depends on the parameter $h$, this is also the case for $\widetilde S$. Our aim is to determine the functions $H_j(p,q)$ such that the solution $\widetilde S(P,q,t)$ of (3.5) coincides for $t = h$ with (3.3).

We first express $\widetilde S(P,q,t)$ as a series

$$\widetilde S(P,q,t) = t\,\widetilde S_1(P,q,h) + t^2\,\widetilde S_2(P,q,h) + t^3\,\widetilde S_3(P,q,h) + \ldots ,$$

insert it into (3.5) and compare powers of $t$. This allows us to obtain the functions $\widetilde S_j(p,q,h)$ recursively in terms of derivatives of $\widetilde H$:

$$\begin{aligned}
\widetilde S_1(p,q,h) &= \widetilde H(p,q),\\
2\,\widetilde S_2(p,q,h) &= \Bigl(\frac{\partial \widetilde H}{\partial q}\cdot\frac{\partial \widetilde S_1}{\partial P}\Bigr)(p,q,h),\\
3\,\widetilde S_3(p,q,h) &= \Bigl(\frac{\partial \widetilde H}{\partial q}\cdot\frac{\partial \widetilde S_2}{\partial P}\Bigr)(p,q,h) + \frac12\Bigl(\frac{\partial^2 \widetilde H}{\partial q^2}\Bigl(\frac{\partial \widetilde S_1}{\partial P}, \frac{\partial \widetilde S_1}{\partial P}\Bigr)\Bigr)(p,q,h).
\end{aligned} \tag{3.6}$$

We then write $\widetilde S_j$ as a series

$$\widetilde S_j(p,q,h) = \widetilde S_{j1}(p,q) + h\,\widetilde S_{j2}(p,q) + h^2\,\widetilde S_{j3}(p,q) + \ldots ,$$

insert it and the expansion (3.4) for $\widetilde H$ into (3.6), and compare powers of $h$. This yields $\widetilde S_{1k}(p,q) = H_k(p,q)$ and for $j > 1$ we see that $\widetilde S_{jk}(p,q)$ is a function of derivatives of $H_l$ with $l \le k$.

The requirement $S(p,q,h) = \widetilde S(p,q,h)$ finally shows $S_1(p,q) = \widetilde S_{11}(p,q)$, $S_2(p,q) = \widetilde S_{12}(p,q) + \widetilde S_{21}(p,q)$, etc., so that

$$S_j(p,q) = H_j(p,q) + \text{“function of derivatives of } H_k(p,q) \text{ with } k < j\text{”}.$$

For a given generating function $S(P,q,h)$, this recurrence relation allows us to determine successively the $H_j(p,q)$. We see from these explicit formulas that the functions $H_j$ are defined on the same domain as the $S_j$. □

As a consequence of Theorem 3.2 and Theorems VI.5.4 and VI.5.5 we obtain the following result.


Theorem 3.3. A symplectic (partitioned) Runge-Kutta method applied to a system with smooth Hamiltonian $H : D \to \mathbb{R}$ (with $D \subset \mathbb{R}^{2d}$ an arbitrary open set) has a modified Hamiltonian (3.4) with smooth functions $H_j : D \to \mathbb{R}$. □


Example 3.4 (Symplectic Euler Method). The symplectic Euler method is nothing other than (3.2) with $S(P,q,h) = hH(P,q)$. We therefore have (3.3) with $S_1(p,q) = H(p,q)$ and $S_j(p,q) = 0$ for $j > 1$. Following the constructive proof of Theorem 3.2 we obtain

$$\widetilde H = H - \frac{h}{2}\,H_p H_q + \frac{h^2}{12}\bigl(H_{pp} H_q^2 + H_{qq} H_p^2 + 4 H_{pq} H_p H_q\bigr) + \ldots \tag{3.7}$$

as the modified Hamiltonian of the symplectic Euler method. For vector-valued $p$ and $q$, the expression $H_p H_q$ is the scalar product of the vectors $H_p$ and $H_q$, and $H_{pp}H_q^2 = H_{pp}(H_q, H_q)$ with the second derivative interpreted as a bilinear mapping.

As a particular example consider the pendulum problem (I.1.13), which is Hamiltonian with $H(p,q) = p^2/2 - \cos q$, and apply the symplectic Euler method. By (3.7), the modified Hamiltonian is

$$\widetilde H(p,q) = H(p,q) - \frac{h}{2}\,p\sin q + \frac{h^2}{12}\bigl(\sin^2 q + p^2\cos q\bigr) + \ldots .$$

This example illustrates that the modified equation corresponding to a separable Hamiltonian (i.e., $H(p,q) = T(p) + U(q)$) is in general not separable. Moreover, it shows that the modified equation of a second order differential equation $\ddot q = -\nabla U(q)$ (or equivalently, $\dot q = p$, $\dot p = -\nabla U(q)$) is in general not a second order equation.
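The pendulum formula lends itself to a numerical experiment: along the symplectic Euler solution the Hamiltonian $H$ oscillates with an amplitude of size $\mathcal{O}(h)$, whereas the modified Hamiltonian truncated after the $h^2$ term should vary only by $\mathcal{O}(h^3)$. The sketch below is an illustration and not part of the original text; step size, number of steps and initial value are arbitrary assumptions.

```python
import math

def H(p, q):
    return 0.5 * p * p - math.cos(q)

def H_mod(p, q, h):
    """Modified Hamiltonian of the symplectic Euler method, truncated after the h^2 term."""
    return (H(p, q)
            - 0.5 * h * p * math.sin(q)
            + h * h / 12.0 * (math.sin(q) ** 2 + p * p * math.cos(q)))

def symplectic_euler(p, q, h):
    """One step of (3.2) with S = h*H(P, q): P = p - h*H_q(P, q), Q = q + h*H_p(P, q)."""
    p_new = p - h * math.sin(q)   # H_q = sin q does not depend on p, so the step is explicit
    q_new = q + h * p_new         # H_p = p, evaluated at the new momentum
    return p_new, q_new

def spread(values):
    return max(values) - min(values)

h, steps = 0.05, 4000
p, q = 0.0, 1.0                   # arbitrary initial value
energies, energies_mod = [], []
for _ in range(steps):
    energies.append(H(p, q))
    energies_mod.append(H_mod(p, q, h))
    p, q = symplectic_euler(p, q, h)

print(spread(energies), spread(energies_mod))
```

The second number is smaller than the first by several orders of magnitude, and neither shows a drift over the integration interval.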


In principle, the constructive proof of Theorem 3.2 allows us to explicitly compute the modified equation of every symplectic (partitioned) Runge-Kutta method. In Sect. IX.9.3 below we shall, however, give explicit formulas for the modified Hamiltonian in terms of trees. This also yields an alternative proof of Theorem 3.3.

IX.3.3 Poisson Integrators

Consider a Poisson system, i.e., a differential equation

$$\dot y = B(y)\nabla H(y), \tag{3.8}$$

where the structure matrix $B(y)$ satisfies the conditions of Lemma VII.2.3, and apply a Poisson integrator (Definition VII.4.6).

Theorem 3.5. If a Poisson integrator $\Phi_h(y)$ is applied to the Poisson system (3.8), then the modified equation is locally a Poisson system. More precisely, for every $y_0 \in \mathbb{R}^n$ there exist a neighbourhood $U$ and smooth functions $H_j : U \to \mathbb{R}$ such that on $U$, the modified equation is of the form

$$\dot{\tilde y} = B(\tilde y)\bigl(\nabla H(\tilde y) + h\nabla H_2(\tilde y) + h^2\nabla H_3(\tilde y) + \ldots\bigr). \tag{3.9}$$

Proof. We use the local change of coordinates $(u,c) = \chi(y)$ of the Darboux-Lie Theorem. By Corollary VII.3.6, this transforms (3.8) to

$$\dot u = J^{-1}\nabla_u K(u,c), \qquad \dot c = 0,$$

where $K(u,c) = H(y)$ and $\nabla_u$ is the gradient with respect to $u$. The same transformation takes $\Phi_h(y)$ to $\Psi_h(u,c) = \bigl(\Psi_h^1(u,c), c\bigr)$, where by Lemma VII.4.10 $u \mapsto \Psi_h^1(u,c)$ is a symplectic transformation for every $c$. By Theorem 3.1, the modified equation in the $(u,c)$ variables is of the form

$$\dot{\tilde u} = J^{-1}\nabla_u \widetilde K(\tilde u, c), \qquad \dot c = 0$$

with $\widetilde K(u,c) = K(u,c) + h\,K_2(u,c) + h^2 K_3(u,c) + \ldots$ . Transforming back to the $y$-variables gives the modified equation (3.9) with $H_j(y) = K_j(u,c)$. □

The above result is purely local in that it relies on the local transformation of the 
Darboux-Lie Theorem. It can be made more global under additional conditions on 
the differential equation. 

Theorem 3.6. If $H(y)$ and $B(y)$ are defined and smooth on a simply connected domain $D$, and if $B(y)$ is invertible on $D$, then a Poisson integrator $\Phi_h(y)$ has a modified equation (3.9) with smooth functions $H_j(y)$ defined on all of $D$.

Proof. By the construction of Sect. IX.1, the coefficient functions $f_j(y)$ of the modified equation (1.1) are defined and smooth on $D$. Since $B(y)$ is assumed invertible, there exist unique smooth functions $g_j(y)$ such that $f_j(y) = B(y)g_j(y)$. It remains to show that $g_j(y) = \nabla H_j(y)$ for a function $H_j(y)$ defined on $D$.

By the local result of Theorem 3.5, we know that for every $y_0 \in D$ there exist functions $H_j^0(y)$ such that $g_j(y) = \nabla H_j^0(y)$ in a neighbourhood of $y_0$. This implies that the Jacobian of $g_j(y)$ is symmetric on $D$. The Integrability Lemma VI.2.7 thus proves the existence of functions $H_j(y)$ defined on all of $D$ such that $g_j(y) = \nabla H_j(y)$. □





IX.4 Modified Equations of Splitting Methods 

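For splitting methods applied to a differential equation

$$\dot y = f^{[1]}(y) + f^{[2]}(y) \tag{4.1}$$

the modified differential equation is obtained directly with the calculus of Lie derivatives and the Baker-Campbell-Hausdorff formula. This approach is due to Yoshida (1993) who considered the case of separable Hamiltonian systems.

First-Order Splitting. Consider the splitting method

$$\Phi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]},$$

where $\varphi_h^{[i]}$ is the time-$h$ flow of $\dot y = f^{[i]}(y)$. In terms of the Lie derivatives $D_i$ defined by $D_i g(y) = g'(y) f^{[i]}(y)$, this method becomes, using Lemma III.5.1,

$$\Phi_h = \exp(hD_2)\exp(hD_1)\,\mathrm{Id},$$

and with the BCH formula (III.4.11), (III.4.12) this reads

$$\Phi_h = \exp(h\widetilde D)\,\mathrm{Id}$$

with

$$\widetilde D = D_1 + D_2 + \frac{h}{2}[D_2, D_1] + \frac{h^2}{12}\bigl([D_2,[D_2,D_1]] + [D_1,[D_1,D_2]]\bigr) + \ldots . \tag{4.2}$$

It follows that $\Phi_h$ is formally the exact time-$h$ flow of the modified equation

$$\dot{\tilde y} = \tilde f(\tilde y) \quad\text{with}\quad \tilde f = \widetilde D\,\mathrm{Id}. \tag{4.3}$$

This gives

$$\tilde f(y) = f(y) + h f_2(y) + h^2 f_3(y) + \ldots$$

with $f = f^{[1]} + f^{[2]}$ and

$$\begin{aligned}
f_2 &= \tfrac12\bigl(f^{[1]\prime} f^{[2]} - f^{[2]\prime} f^{[1]}\bigr),\\
f_3 &= \tfrac1{12}\Bigl(f^{[1]\prime\prime}(f^{[2]}, f^{[2]}) + f^{[1]\prime} f^{[2]\prime} f^{[2]} - f^{[2]\prime\prime}(f^{[1]}, f^{[2]}) - 2 f^{[2]\prime} f^{[1]\prime} f^{[2]} + f^{[2]\prime} f^{[2]\prime} f^{[1]}\\
&\qquad\; + f^{[2]\prime\prime}(f^{[1]}, f^{[1]}) + f^{[2]\prime} f^{[1]\prime} f^{[1]} - f^{[1]\prime\prime}(f^{[2]}, f^{[1]}) - 2 f^{[1]\prime} f^{[2]\prime} f^{[1]} + f^{[1]\prime} f^{[1]\prime} f^{[2]}\Bigr).
\end{aligned}$$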
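The passage from (4.2) to the coefficient functions is a short computation with Lie derivatives. A sketch, using only $D_i\,\mathrm{Id} = f^{[i]}$ and $D_i g = g' f^{[i]}$, and writing $c$ for the vector field that belongs to the commutator $[D_2, D_1]$ (the symbol $c$ is introduced here for the illustration only):

```latex
\begin{align*}
c := [D_2, D_1]\,\mathrm{Id}
  &= D_2 f^{[1]} - D_1 f^{[2]}
   = f^{[1]\prime} f^{[2]} - f^{[2]\prime} f^{[1]},
  & f_2 &= \tfrac12\, c,\\
[D_2, [D_2, D_1]]\,\mathrm{Id}
  &= D_2\, c - [D_2, D_1]\, f^{[2]}
   = c' f^{[2]} - f^{[2]\prime} c,\\
[D_1, [D_1, D_2]]\,\mathrm{Id}
  &= -\bigl(c' f^{[1]} - f^{[1]\prime} c\bigr),
  & f_3 &= \tfrac1{12}\Bigl(c'\bigl(f^{[2]} - f^{[1]}\bigr) - \bigl(f^{[2]\prime} - f^{[1]\prime}\bigr)c\Bigr).
\end{align*}
```

For linear vector fields $f^{[1]} = Ay$, $f^{[2]} = By$ this reduces to the matrix commutators $c = (AB - BA)y$ and $[D_2,[D_2,D_1]]\,\mathrm{Id} = (AB^2 - 2BAB + B^2A)\,y$, which is a convenient consistency check.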


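Strang Splitting. For the symmetric splitting

$$\Phi_h^{[S]} = \varphi_{h/2}^{[1]} \circ \varphi_h^{[2]} \circ \varphi_{h/2}^{[1]}$$

the symmetric BCH formula (III.4.14), (III.4.15) yields

$$\Phi_h^{[S]} = \exp\bigl(\tfrac h2 D_1\bigr)\exp(hD_2)\exp\bigl(\tfrac h2 D_1\bigr)\,\mathrm{Id} = \exp\bigl(h\widetilde D^{[S]}\bigr)\,\mathrm{Id}$$

with

$$\widetilde D^{[S]} = D_1 + D_2 + h^2\Bigl(-\frac1{24}[D_1,[D_1,D_2]] + \frac1{12}[D_2,[D_2,D_1]]\Bigr) + \ldots . \tag{4.4}$$

Hence, $\Phi_h^{[S]}$ is formally the exact flow of the modified equation

$$\dot{\tilde y} = \tilde f^{[S]}(\tilde y) \quad\text{with}\quad \tilde f^{[S]} = \widetilde D^{[S]}\,\mathrm{Id}. \tag{4.5}$$

This gives

$$\tilde f^{[S]}(y) = f(y) + h^2 f_3^{[S]}(y) + h^4 f_5^{[S]}(y) + \ldots$$

with $f = f^{[1]} + f^{[2]}$ and

$$\begin{aligned}
f_3^{[S]} &= \tfrac1{12}\Bigl(f^{[1]\prime\prime}(f^{[2]}, f^{[2]}) + f^{[1]\prime} f^{[2]\prime} f^{[2]} - f^{[2]\prime\prime}(f^{[1]}, f^{[2]}) - 2 f^{[2]\prime} f^{[1]\prime} f^{[2]} + f^{[2]\prime} f^{[2]\prime} f^{[1]}\Bigr)\\
&\quad - \tfrac1{24}\Bigl(f^{[2]\prime\prime}(f^{[1]}, f^{[1]}) + f^{[2]\prime} f^{[1]\prime} f^{[1]} - f^{[1]\prime\prime}(f^{[2]}, f^{[1]}) - 2 f^{[1]\prime} f^{[2]\prime} f^{[1]} + f^{[1]\prime} f^{[1]\prime} f^{[2]}\Bigr).
\end{aligned}$$

The modified equations for general splitting methods (III.5.13) are obtained in the same way, using Lemma III.5.5.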

Hamiltonian Splittings. Consider a differential equation (4.1) where the vector fields $f^{[i]}(y) = J^{-1}\nabla H^{[i]}(y)$ are Hamiltonian. Lemma VII.3.1 shows that the commutator of the Lie derivatives of two Hamiltonian vector fields is the Lie derivative of another Hamiltonian vector field which corresponds to the Poisson bracket of the two Hamiltonians: $[D_F, D_G] = D_{\{G,F\}}$. This implies in particular that the modified differential equations (4.3) and (4.5) are again Hamiltonian. For the first-order splitting, we thus get $f_j(y) = J^{-1}\nabla H_j(y)$, where by (4.2) and (4.3),

$$\begin{aligned}
H_2 &= \tfrac12\bigl\{H^{[1]}, H^{[2]}\bigr\},\\
H_3 &= \tfrac1{12}\Bigl(\bigl\{\{H^{[1]}, H^{[2]}\}, H^{[2]}\bigr\} + \bigl\{\{H^{[2]}, H^{[1]}\}, H^{[1]}\bigr\}\Bigr),
\end{aligned}$$

and for the Strang splitting, by (4.4) and (4.5),

$$H_3^{[S]} = -\tfrac1{24}\bigl\{\{H^{[2]}, H^{[1]}\}, H^{[1]}\bigr\} + \tfrac1{12}\bigl\{\{H^{[1]}, H^{[2]}\}, H^{[2]}\bigr\}.$$

The explicit expressions from the BCH-formula show that the modified Hamiltonian is defined on the same open set as the smooth Hamiltonians $H^{[i]}$.

For the splitting $H(p,q) = T(p) + U(q)$ of a separable Hamiltonian, this approach gives an alternative derivation of the modified equation (3.7) of the symplectic Euler method, and a simple construction of the modified equation of the Störmer-Verlet method (Yoshida 1993). Here, the formula simplifies to

$$\widetilde H^{[S]} = H + h^2\Bigl(-\frac1{24}\,U_{qq}(T_p, T_p) + \frac1{12}\,T_{pp}(U_q, U_q)\Bigr) + \ldots . \tag{4.6}$$
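Formula (4.6) can be tested on the pendulum, $T(p) = p^2/2$, $U(q) = -\cos q$. The sketch below is an illustration under assumptions that are ours: the Strang splitting is taken with $H^{[1]} = T$ and $H^{[2]} = U$ (half drift, full kick, half drift), and step size, number of steps and initial value are arbitrary. Along the numerical solution, $H$ should oscillate with amplitude $\mathcal{O}(h^2)$ and the truncation of (4.6) with amplitude $\mathcal{O}(h^4)$.

```python
import math

def H(p, q):
    return 0.5 * p * p - math.cos(q)          # T(p) = p^2/2, U(q) = -cos q

def H_mod(p, q, h):
    """(4.6) truncated after the h^2 term: T_p = p, T_pp = 1, U_q = sin q, U_qq = cos q."""
    return H(p, q) + h * h * (-math.cos(q) * p * p / 24.0 + math.sin(q) ** 2 / 12.0)

def strang_step(p, q, h):
    """Strang splitting with H^[1] = T and H^[2] = U: half drift, full kick, half drift."""
    q = q + 0.5 * h * p
    p = p - h * math.sin(q)
    q = q + 0.5 * h * p
    return p, q

def spread(values):
    return max(values) - min(values)

h, steps = 0.1, 2000
p, q = 0.0, 1.0                               # arbitrary initial value
energies, energies_mod = [], []
for _ in range(steps):
    energies.append(H(p, q))
    energies_mod.append(H_mod(p, q, h))
    p, q = strang_step(p, q, h)

print(spread(energies), spread(energies_mod))
```

Interchanging the roles of $T$ and $U$ in the splitting would require interchanging the coefficients $-\tfrac1{24}$ and $\tfrac1{12}$ accordingly.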
IX.5 Modified Equations of Methods on Manifolds 

We consider the relationship between numerical methods for differential equations on manifolds and the associated modified differential equations. We give applications to the study of first integrals, constrained Hamiltonian systems, and Lie-Poisson integrators.

IX.5.1 Methods on Manifolds and First Integrals

Consider a differential equation on a smooth manifold $\mathcal{M}$,

$$\dot y = f(y) \quad\text{with}\quad f(y) \in T_y\mathcal{M}, \tag{5.1}$$

with a smooth vector field $f(y)$ defined on $\mathcal{M}$.

Theorem 5.1. Let $\Phi_h : \mathcal{M} \to \mathcal{M}$ be an integrator on the manifold $\mathcal{M}$, with $\Phi_h(y)$ depending smoothly on $(y,h)$. Then, there exists a modified differential equation on $\mathcal{M}$,

$$\dot{\tilde y} = f(\tilde y) + h f_2(\tilde y) + h^2 f_3(\tilde y) + \ldots \tag{5.2}$$

with smooth $f_j(y) \in T_y\mathcal{M}$, such that $\varphi_{r,h}(y) = \Phi_h(y) + \mathcal{O}(h^{r+1})$, where $\varphi_{r,t}(y)$ denotes the flow of the truncation of (5.2) after $r$ terms.

For symmetric methods, the expansion (5.2) contains only even powers of $h$.

Proof. We choose a local parametrization $y = \chi(z)$ of the manifold $\mathcal{M}$. In the coordinates $z$ the differential equation (5.1) reads

$$\dot z = F(z) \quad\text{with } F(z) \text{ defined by}\quad \chi'(z)F(z) = f\bigl(\chi(z)\bigr),$$

and the numerical integrator becomes

$$\widehat\Phi_h(z) = \chi^{-1} \circ \Phi_h \circ \chi(z).$$

Since $F(z)$ and $\widehat\Phi_h(z)$ are smooth, the standard backward error analysis on $\mathbb{R}^n$ of Sect. IX.1 yields a modified equation for the integrator $\widehat\Phi_h(z)$,

$$\dot{\tilde z} = F(\tilde z) + hF_2(\tilde z) + h^2F_3(\tilde z) + \ldots .$$

Defining

$$f_j(y) = \chi'(z)F_j(z) \quad\text{for}\quad y = \chi(z)$$

gives the desired vector fields $f_j(y)$ on $\mathcal{M}$. It follows from the uniqueness of the modified equation in the parameter space that $f_j(y)$ is independent of the choice of the local parametrization.

The additional statement on symmetric methods follows from Theorem 2.2, because $\widehat\Phi_h$ is symmetric if and only if $\Phi_h$ is symmetric. □

Under an analyticity assumption, the converse statement also holds.

Theorem 5.2. Let the integrator $\Phi_h : U \to \mathbb{R}^n$ (with open $U \subset \mathbb{R}^n$) be real analytic in $h$, and let $\mathcal{M} = \{y \in U \,;\, g(y) = 0\}$ with real analytic $g : U \to \mathbb{R}^m$. If the coefficient functions $f_j(y)$ of the modified differential equation (5.2) satisfy $g'(y)f_j(y) = 0$ for all $y \in \mathcal{M}$, then the restriction of $\Phi_h$ to $\mathcal{M}$ defines an integrator on $\mathcal{M}$, i.e., $\Phi_h : \mathcal{M} \to \mathcal{M}$.

Proof. By the assumption on $f_j(y)$, the flow of the truncated modified equation satisfies $g \circ \varphi_{r,h}(y) = 0$ for all $r \ge 1$ and all $y \in \mathcal{M}$. Since $\varphi_{r,h}(y) = \Phi_h(y) + \mathcal{O}(h^{r+1})$, we have $g \circ \Phi_h(y) = \mathcal{O}(h^{r+1})$ for all $r$. The analyticity assumptions therefore imply $g \circ \Phi_h(y) = 0$. □

Theorems 5.1 and 5.2 apply to many situations treated in Chap. IV.

First Integrals. The following result was obtained by Gonzalez, Higham & Stuart (1999) and Reich (1999) with different arguments.

Corollary 5.3. Consider a differential equation $\dot y = f(y)$ with a first integral $I(y)$, i.e., $I'(y)f(y) = 0$ for all $y$. If the numerical method preserves this first integral, then every truncation of the modified equation has $I(y)$ as a first integral.

Proof. This follows from Theorem 5.1 by considering $\dot y = f(y)$ as a differential equation on the manifold $\mathcal{M} = \{y \,;\, I(y) = \mathit{Const}\}$, for which the tangent space is $T_y\mathcal{M} = \{v \,;\, I'(y)v = 0\}$. □

The following converse of Corollary 5.3 is a direct consequence of Theorem 5.2.

Corollary 5.4. Consider a differential equation $\dot y = f(y)$ with a real-analytic first integral $I(y)$. If the numerical method $\Phi_h(y)$ is real analytic in $h$, and if every truncation of the modified equation has $I(y)$ as a first integral, then the numerical method preserves $I(y)$ exactly, i.e., $I\bigl(\Phi_h(y)\bigr) = I(y)$ for all $y$. □
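A simple linear instance of Corollaries 5.3 and 5.4, given here only as an illustration: for $\dot y = Ay$ with the skew-symmetric matrix $A = \begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}$, the quantity $I(y) = y_1^2 + y_2^2$ is a first integral. The implicit midpoint rule turns out to be a rotation by the angle $\theta = 2\arctan(h/2)$, so that its modified equation is $\dot{\tilde y} = (\theta/h)A\tilde y$, which has the same first integral. The closed form of the step used below is our own elimination of the implicit stage.

```python
import math

def midpoint_step(y, h):
    """Implicit midpoint rule for y' = A y, A = [[0, -1], [1, 0]], solved in closed form."""
    c = (1.0 - h * h / 4.0) / (1.0 + h * h / 4.0)   # cos(theta)
    s = h / (1.0 + h * h / 4.0)                      # sin(theta)
    return (c * y[0] - s * y[1], s * y[0] + c * y[1])

h, n = 0.3, 200
y = (1.0, 0.0)
for _ in range(n):
    y = midpoint_step(y, h)

theta = 2.0 * math.atan(h / 2.0)                     # rotation angle per step
y_mod = (math.cos(n * theta), math.sin(n * theta))   # flow of y' = (theta/h) A y at t = n*h
invariant = y[0] ** 2 + y[1] ** 2

print(y, y_mod, invariant)
```

The numerical solution coincides with the solution of the modified equation up to rounding, and the invariant keeps its initial value 1.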

Projection Methods. Algorithm IV.4.2 defines a smooth mapping on the manifold if the direction of projection depends smoothly on the position. This is satisfied by orthogonal projection, but is not fulfilled if switching coordinate projections are used (as in Example 4.3). The symmetric orthogonal projection method of Algorithm V.4.1 gives a symmetric method on the manifold to which Theorem 5.1 can be applied.

Methods Based on Local Coordinates. If the parametrization of the manifold employed in Algorithms IV.5.3 and V.4.5 depends smoothly on the position, then again Theorem 5.1 applies. This is the case for the tangent space parametrization, but not for the generalized coordinate partitioning considered in Sect. IV.5.3.

Corollary 5.5 (Lie Group Methods). Consider a differential equation on a matrix Lie group $G$,

$$\dot Y = A(Y)\,Y,$$

where $A(Y)$ is in the associated Lie algebra $\mathfrak{g}$. A Lie group integrator $\Phi_h : G \to G$ has the modified equation

$$\dot{\widetilde Y} = \bigl(A(\widetilde Y) + hA_2(\widetilde Y) + h^2A_3(\widetilde Y) + \ldots\bigr)\widetilde Y \tag{5.3}$$

with $A_j(Y) \in \mathfrak{g}$ for $Y \in G$.

Proof. This is a direct consequence of Theorem 5.1 and (IV.6.3), viz., $T_YG = \{AY \mid A \in \mathfrak{g}\}$. □

IX.5.2 Constrained Hamiltonian Systems 

In Sect. VII.1 we studied symplectic numerical integrators for constrained Hamiltonian systems

$$\begin{aligned}
\dot q &= H_p(p,q)\\
\dot p &= -H_q(p,q) - G(q)^T\lambda\\
0 &= g(q).
\end{aligned} \tag{5.4}$$

Assuming the regularity condition (VII.1.13), the Lagrange parameter $\lambda = \lambda(p,q)$ is given by (VII.1.12). This system can be interpreted as a differential equation on the manifold

$$\mathcal{M} = \{(p,q) \mid g(q) = 0,\ G(q)H_p(p,q) = 0\}, \tag{5.5}$$

where $G(q) = g'(q)$. The symplectic Euler method (VII.1.19)-(VII.1.20), the RATTLE scheme (VII.1.26), and the Lobatto IIIA-IIIB pair (VII.1.27)-(VII.1.30) were found to be symplectic integrators $\Phi_h$ on the manifold $\mathcal{M}$.

Theorem 5.6. A symplectic integrator $\Phi_h : \mathcal{M} \to \mathcal{M}$ for the constrained Hamiltonian system (5.4) has a modified equation which is locally of the form

$$\begin{aligned}
\dot{\tilde q} &= \widetilde H_p(\tilde p, \tilde q)\\
\dot{\tilde p} &= -\widetilde H_q(\tilde p, \tilde q) - G(\tilde q)^T\tilde\lambda\\
0 &= g(\tilde q),
\end{aligned} \tag{5.6}$$

where $\tilde\lambda = \tilde\lambda(\tilde p, \tilde q)$ is given by (VII.1.12) with $H$ replaced by $\widetilde H$, and

$$\widetilde H(p,q) = H(p,q) + h\,H_2(p,q) + h^2 H_3(p,q) + \ldots \tag{5.7}$$

with $H_j(p,q)$ satisfying $G(q)\nabla_p H_j(p,q) = 0$ for $(p,q) \in \mathcal{M}$ and all $j$.

Proof. As explained in Example VII.2.7, a local parametrization $(p,q) = \chi(z)$ of the manifold $\mathcal{M}$ transforms (5.4) to the Poisson system

$$\dot z = B(z)\nabla K(z) \tag{5.8}$$

with $B(z) = \bigl(\chi'(z)^T J\,\chi'(z)\bigr)^{-1}$ and $K(z) = H\bigl(\chi(z)\bigr)$. Lemma VII.4.9 implies that the numerical method $\Phi_h(p,q)$ on $\mathcal{M}$ becomes a Poisson integrator $\Psi_h(z)$ for (5.8). By Theorem 3.5, $\Psi_h(z)$ has the modified equation

$$\dot{\tilde z} = B(\tilde z)\bigl(\nabla K(\tilde z) + h\nabla K_2(\tilde z) + h^2\nabla K_3(\tilde z) + \ldots\bigr). \tag{5.9}$$

Let $\pi$ be a smooth projection onto the manifold $\mathcal{M}$, defined on a neighbourhood of $\mathcal{M}$ in $\mathbb{R}^{2d}$. We then define

$$H_j(p,q) = K_j\bigl(\chi^{-1}(\pi(p,q))\bigr) + \mu(p,q)^T G(q)\nabla_p H(p,q)$$

where we choose $\mu(p,q)$ such that

$$G(q)\nabla_p H_j(p,q) = 0 \quad\text{for}\quad (p,q) \in \mathcal{M}. \tag{5.10}$$

This is possible because of the regularity assumption (VII.1.13), and because $G(q)\nabla_p H(p,q) = 0$ on $\mathcal{M}$. The condition (5.10) implies that the system (5.6) can be viewed as a differential equation on the original manifold $\mathcal{M}$. Using the same parametrization $(p,q) = \chi(z)$ as before shows that (5.6) is equivalent to (5.9). □

We note that, due to the arbitrary choice of the projection $\pi$, the functions $H_j(p,q)$ of the modified equation are uniquely defined only on $\mathcal{M}$.

Global Modified Hamiltonian. If we restrict our considerations to partitioned Runge-Kutta methods, it is possible to find $H_j(p,q)$ in (5.7) that are globally defined on $\mathcal{M}$. Such a result is proved by Reich (1996a) and by Hairer & Wanner (1996) for the constrained symplectic Euler method and the RATTLE algorithm, and by Hairer (2003) for general symplectic partitioned Runge-Kutta schemes. We follow the approach of the latter publication, but present the result only for the important special case of the RATTLE algorithm (VII.1.26). The construction of the $H_j(p,q)$ is done in the following three steps.

Step 1. Symplectic Extension of the Method to a Neighbourhood of the Manifold. The numerical solution $(p_1, q_1)$ of (VII.1.26) is well-defined only for initial values satisfying $(p_0, q_0) \in \mathcal{M}$. However, if we replace the condition “$g(q_1) = 0$” by

$$g(q_1) = g(q_0) + h\,G(q_0)H_p(p_0, q_0), \tag{5.11}$$

and the condition “$G(q_1)H_p(p_1, q_1) = 0$” by

$$G(q_1)H_p(p_1, q_1) = G(q_0)H_p(p_0, q_0), \tag{5.12}$$

then the numerical solution is well-defined for all $(p_0, q_0)$ in an $h$-independent open neighbourhood of $\mathcal{M}$ (cf. the existence and uniqueness proof of Sect. VII.1.3). Unfortunately, the so-obtained extension of (VII.1.26) is not symplectic.

Inspired by the formula of Lasagni for the generating function of (unconstrained) symplectic Runge-Kutta methods (see Sect. VI.5.2), we let

$$\begin{aligned}
S(p_1, q_0, h) &= \frac h2\Bigl(H(p_{1/2}, q_0) + H(p_{1/2}, q_1) + g(q_0)^T\lambda + g(q_1)^T\mu\Bigr)\\
&\quad - \frac{h^2}{4}\Bigl(H_q(p_{1/2}, q_1) + G(q_1)^T\mu\Bigr)^T\Bigl(H_p(p_{1/2}, q_0) + H_p(p_{1/2}, q_1)\Bigr),
\end{aligned} \tag{5.13}$$

where $p_0, p_{1/2}, p_1, q_0, q_1, \lambda, \mu$ are the values of the above extension. In the definition (5.13) of the generating function we consider $p_0, p_{1/2}, q_1, \lambda, \mu$ as functions of $(p_1, q_0)$, which is possible because $p_1 = p_0 + \mathcal{O}(h)$. With the help of $S(p,q,h)$ we define a new numerical method on a neighbourhood of $\mathcal{M}$ by

$$p_0 = p_1 + S_q(p_1, q_0, h), \qquad q_1 = q_0 + S_p(p_1, q_0, h). \tag{5.14}$$

This method is symplectic by definition, and it also coincides with the RATTLE algorithm on the manifold $\mathcal{M}$. Using the fact that the last expression in (5.13) equals $(p_1 - p_{1/2})^T(q_1 - q_0)$, this is seen by the same computation as in the proof of Theorem VI.5.4.

Step 2. Application of the Results of Sect. IX.3.2. The function $S(p_1, q_0, h)$ of (5.13) can be expanded into powers of $h$ with coefficients depending on $(p_1, q_0)$. These coefficient functions are composed of derivatives of $H(p,q)$ and $g(q)$ and, consequently, they are globally defined. For example, the $h$-coefficient is

$$S_1(p_1, q_0) = H(p_1, q_0) + g(q_0)^T\lambda(p_1, q_0), \tag{5.15}$$

where $\lambda(p,q)$ is the function defined in (VII.1.12).

We are thus exactly in the situation where we can apply Theorem 3.2. This proves that the method (5.14) has a modified differential equation with globally defined modified Hamiltonian

$$\widetilde H_{\mathrm{ext}}(p,q) = H_1(p,q) + h\,H_2(p,q) + \ldots . \tag{5.16}$$

In particular, the constructive proof of Theorem 3.2 shows that $H_1(p,q) = S_1(p,q)$ with $S_1(p,q)$ from (5.15).

Step 3. Backinterpretation for the Method on the Manifold. Since the RATTLE algorithm defines a one-step method on $\mathcal{M}$, it follows from Theorem 5.1 that every truncation of the modified differential equation

$$\dot{\tilde p} = -\nabla_q \widetilde H_{\mathrm{ext}}(\tilde p, \tilde q), \qquad \dot{\tilde q} = \nabla_p \widetilde H_{\mathrm{ext}}(\tilde p, \tilde q) \tag{5.17}$$

is a differential equation on the manifold $\mathcal{M}$. Terms of the form $g(q)^T\mu(p,q)$ in $\widetilde H_{\mathrm{ext}}(p,q)$, which vanish on $\mathcal{M}$, give rise to $-g(q)^T\mu_q(p,q) - G(q)^T\mu(p,q)$ and $g(q)^T\mu_p(p,q)$ in the vector field of (5.17). On the manifold $\mathcal{M}$, where $g(q) = 0$, only the expression $-G(q)^T\mu(p,q)$ remains. Consequently, we can arbitrarily remove terms of the form $g(q)^T\mu(p,q)$ from the functions $H_j(p,q)$ in (5.16), if we add a term $-G(q)^T\tilde\lambda$ in the differential equation for $p$ with $\tilde\lambda$ defined by the relation $g(q) = 0$. This then gives a problem of the form (5.6) with globally defined $H_j(p,q)$.


IX.5.3 Lie-Poisson Integrators 

As in Sect. VII.5.5 we consider a symplectic integrator

$$(P_1, Q_1) = \Phi_h(P_0, Q_0) \quad\text{on}\quad T^*G$$

for the left-invariant Hamiltonian system (VII.5.43) on a matrix Lie group $G$ with a Hamiltonian $H(P,Q)$ that is quadratic in $P$. We suppose that the method preserves the left-invariance (VII.5.54) so that it induces a one-step map

$$Y_1 = \Psi_h(Y_0) \quad\text{on}\quad \mathfrak{g}^*$$

by setting $Y_1 = Q_1^TP_1$ for $(P_1, Q_1) = \Phi_h(P_0, Q_0)$ with $Q_0^TP_0 = Y_0$. This is a numerical integrator for the differential equation (VII.5.37) on $\mathfrak{g}^*$, and in the coordinates $y = (y_j)$ with respect to the basis $(F_j)$ of $\mathfrak{g}^*$ this gives a map

$$y_1 = \psi_h(y_0) \quad\text{on}\quad \mathbb{R}^d,$$

which is a numerical integrator for the Lie-Poisson system $\dot y = B(y)\nabla H(y)$ with $B(y)$ given by (VII.5.35).

Theorem 5.7. If $\Phi_h(P,Q)$ is a symplectic and left-invariant integrator for (VII.5.43) which is real analytic in $h$, then its reduction $\psi_h(y)$ is a Poisson integrator. Moreover, $\Psi_h(Y)$ preserves the coadjoint orbits, i.e., $\Psi_h(Y) \in \{\mathrm{Ad}^*_{U^{-1}}Y \,;\, U \in G\}$.

Proof. (a) In the first step one shows, by the standard induction argument as in the proof of Theorem 2.3, that the modified equation given by Theorem 5.6,

$$\begin{aligned}
\dot P &= -\nabla_Q \widetilde H(P,Q) - \sum_{i=1}^m \lambda_i \nabla_Q g_i(Q), \qquad \dot Q = \nabla_P \widetilde H(P,Q)\\
0 &= g_i(Q), \qquad i = 1, \ldots, m
\end{aligned} \tag{5.18}$$

with

$$\widetilde H(P,Q) = H(P,Q) + h\,H_2(P,Q) + h^2 H_3(P,Q) + \ldots$$

is left-invariant, i.e.,

$$H_j(U^TP,\, U^{-1}Q) = H_j(P,Q) \quad\text{for all } U \in G \text{ and all } j. \tag{5.19}$$

(b) The Lie-Poisson reduction of Theorem VII.5.8 yields that if $(P(t), Q(t)) \in T^*G$ is a solution of the modified system (5.18), then $Y(t) = Q(t)^TP(t) \in \mathfrak{g}^*$ solves the differential equation

$$\langle \dot Y, X\rangle = \langle Y, [\widetilde H'(Y), X]\rangle \quad\text{for all } X \in \mathfrak{g}. \tag{5.20}$$

Theorem VII.5.6 shows that its solution lies on a coadjoint orbit. By Theorem VII.5.5, (5.20) is equivalent to the Poisson system

$$\dot y = B(y)\nabla \widetilde H(y). \tag{5.21}$$

(c) We know already from Theorem VII.5.11 that $\psi_h(y)$ is a Poisson map. Since all truncations of the modified equation (5.21) have the Casimirs as first integrals, their preservation by $\psi_h$ follows from Corollary 5.4. Similarly, the preservation of the coadjoint orbits follows from Theorem 5.2. □

In contrast to Theorem 3.5, we here obtain a global modified Hamiltonian in 
the modified Poisson system if the method is obtained by the discrete Lie-Poisson 
reduction of the RATTLE algorithm; see the preceding subsection.

IX.6 Modified Equations for Variable Step Sizes

The modified differential equation of a numerical integrator depends on the step
size employed. Therefore, if the step size is changed arbitrarily, a different modified
equation occurs at every step. This is the reason for the poor long-time behaviour
observed in Sect. VIII.1. On the other hand, a satisfactory backward error analysis
is possible for the variable-step approaches of Sects. VIII.2 and VIII.3.

Time Transformations. The adaptive approaches of Sect. VIII.2 amount to applying
a fixed step size method to a transformed differential equation. Hence, the backward
error analysis considered so far applies directly and yields modified equations
for the transformed problem. These modified equations are Hamiltonian for
Algorithm VIII.2.1 and reversible for method (VIII.2.12).

Proportional, Reversible Step Size Controllers. As in Sect. VIII.3.1 we let the
step size be of the form

$$h_{n+1/2} = \varepsilon\, s(y_n, \varepsilon), \tag{6.1}$$

where $\varepsilon$ is a small accuracy parameter. It is not allowed to use information from
previous steps. The idea is to work with expansions in powers of the fixed parameter
$\varepsilon$ instead of the step sizes, and to consider the exact solution of the modified equation
on a variable grid. The following development is given in Hairer & Stoffer (1997).
It extends the results of Sects. IX.1 and IX.2 to variable step sizes.

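Theorem 6.1. Let $\Phi_h(y)$ be a smooth one-step method.

a) The variable-step method $y \mapsto \Phi_{\varepsilon s(y,\varepsilon)}(y)$ has a modified differential equation

$$\dot y = f(y) + \varepsilon f_2(y) + \varepsilon^2 f_3(y) + \dots, \tag{6.2}$$

with smooth vector fields $f_j(y)$, such that

$$\varphi_{r,\varepsilon s(y,\varepsilon)}(y) = \Phi_{\varepsilon s(y,\varepsilon)}(y) + O(\varepsilon^{r+1}), \tag{6.3}$$

where $\varphi_{r,t}(y)$ denotes the flow of the truncation of (6.2) after $r$ terms.

b) If the method is symmetric (i.e., $\Phi_h(y) = \Phi_{-h}^{-1}(y)$) and $s(\hat y, -\varepsilon) = s(y,\varepsilon)$
holds with $\hat y = \Phi_{\varepsilon s(y,\varepsilon)}(y)$, then the expansion (6.2) is in even powers of $\varepsilon$, i.e.,

$$f_j(y) = 0 \quad\text{for even } j. \tag{6.4}$$

c) If the method is $\rho$-reversible (i.e., $\rho\circ\Phi_h = \Phi_h^{-1}\circ\rho$) and $s(\rho^{-1}\hat y, \varepsilon) = s(y,\varepsilon)$
holds with $\hat y = \Phi_{\varepsilon s(y,\varepsilon)}(y)$, then the modified equation (6.2) is $\rho$-reversible, i.e.,

$$\rho\circ f_j = -f_j\circ\rho \quad\text{for all } j. \tag{6.5}$$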


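Proof. a) The modified equation (6.2) is constructed by Taylor expansion of (6.3)
in the same way as (1.1), using $\varepsilon$-expansions instead of $h$-expansions.

For the proof of the statements (b) and (c) we denote, as we did in Sect. VIII.3,
$\Psi_\varepsilon(y) = \Phi_{\varepsilon s(y,\varepsilon)}(y)$. We then compute the dominant error term in (6.3) and obtain

$$\Psi_\varepsilon(y) = \varphi_{r,\varepsilon s(y,\varepsilon)}(y) + \varepsilon^{r+1} s(y,\varepsilon) f_{r+1}(y) + O(\varepsilon^{r+2}). \tag{6.6}$$

With the aim of getting an analogous formula for $\Psi_\varepsilon^{-1}$, we put $\hat y = \Psi_\varepsilon(y)$ and use
$\varphi_{r,t}^{-1}(y) = \varphi_{r,-t}(y)$ so that

$$y = \varphi_{r,-\varepsilon s(y,\varepsilon)}\bigl(\hat y - \varepsilon^{r+1} s(y,\varepsilon) f_{r+1}(y) + O(\varepsilon^{r+2})\bigr). \tag{6.7}$$

b) Inserting $s(y,\varepsilon) = s(\hat y,-\varepsilon)$ into (6.7) and using the facts that $\hat y = y + O(\varepsilon)$
and that the derivative $\varphi'_{r,t}(y)$ is $O(t)$-close to the identity, we obtain

$$\Psi_\varepsilon^{-1}(\hat y) = y = \varphi_{r,-\varepsilon s(\hat y,-\varepsilon)}(\hat y) - \varepsilon^{r+1} s(\hat y,0) f_{r+1}(\hat y) + O(\varepsilon^{r+2}). \tag{6.8}$$

By (VIII.3.3) we have $\Psi_\varepsilon = \Psi_{-\varepsilon}^{-1}$. Changing the sign of $\varepsilon$ in (6.8), a comparison
with (6.6) proves that $f_{r+1}(y) = (-1)^r f_{r+1}(y)$, implying (6.4).

c) With $s(y,\varepsilon) = s(\rho^{-1}\hat y,\varepsilon)$ formula (6.7) yields

$$\Psi_\varepsilon^{-1}(\hat y) = \varphi_{r,-\varepsilon s(\rho^{-1}\hat y,\varepsilon)}(\hat y) - \varepsilon^{r+1} s(\rho^{-1}\hat y,0) f_{r+1}(\hat y) + O(\varepsilon^{r+2}).$$

By an induction argument on $r$ we assume that $\rho\circ\varphi_{r,t} = \varphi_{r,-t}\circ\rho$. The
$\rho$-reversibility of $\Psi_\varepsilon$, i.e., $\rho\circ\Psi_\varepsilon = \Psi_\varepsilon^{-1}\circ\rho$, thus implies the statement (6.5). □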

Integrating, Reversible Step Size Controllers. We next study a backward error 
analysis for Algorithm VIII.3.4. It is possible to interpret this algorithm as the fixed 
step size method of (VIII.3.19) applied to the augmented system (VIII.3.17) and 
to apply the construction of Sect. IX.1. This approach has been taken in Hairer &
Söderlind (2004). In view of an error analysis for reversible integrable systems it
seems to be more convenient to consider the solution of the modified equation on a 
variable grid as it is done in Theorem 6.1. 

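Let us recall Algorithm VIII.3.4. For a given basic integrator $\Phi_h(y)$ and a given
time transformation $\sigma(y)$ we denote $G(y) = -(\sigma(y))^{-1}\nabla\sigma(y)^T f(y)$ and we
compute for a given initial value $y_0$ and with $z_0 = 1/\sigma(y_0)$

$$\begin{aligned}
z_{n+1/2} &= z_n + \varepsilon\, G(y_n)/2\\
y_{n+1} &= \Phi_{\varepsilon/z_{n+1/2}}(y_n)\\
z_{n+1} &= z_{n+1/2} + \varepsilon\, G(y_{n+1})/2.
\end{aligned} \tag{6.9}$$

The values $y_n$ approximate $y(t_n)$, where $t_{n+1} = t_n + \varepsilon/z_{n+1/2}$. We further use the
notation

$$\Psi_\varepsilon : \begin{pmatrix} y_n\\ z_n\end{pmatrix} \mapsto \begin{pmatrix} y_{n+1}\\ z_{n+1}\end{pmatrix} \quad\text{and}\quad \hat\rho = \begin{pmatrix} \rho & 0\\ 0 & 1\end{pmatrix}. \tag{6.10}$$

The step size used in this algorithm is

$$h_{n+1/2} = \frac{\varepsilon}{z_{n+1/2}} = \varepsilon\, s(y_n, z_n, \varepsilon) \quad\text{with}\quad s(y,z,\varepsilon) = \frac{1}{z + \varepsilon G(y)/2}. \tag{6.11}$$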
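The three stages of (6.9) translate directly into code. The following is only a minimal sketch under stated assumptions: the basic integrator is taken to be the Störmer-Verlet scheme for the pendulum, the time transformation is the hypothetical choice $\sigma(q,p) = (1+p^2)^{-1/2}$ (for which $G(q,p) = -p\sin q/(1+p^2)$), and the function names are illustrative. Because the basic method is symmetric, a step with $\varepsilon$ followed by a step with $-\varepsilon$ should return to the starting point up to round-off.

```python
import math

def verlet(h, y):
    """One Stoermer-Verlet step for the pendulum q' = p, p' = -sin q."""
    q, p = y
    p_half = p - 0.5 * h * math.sin(q)
    q_new = q + h * p_half
    p_new = p_half - 0.5 * h * math.sin(q_new)
    return (q_new, p_new)

def G(y):
    """G = -(grad sigma . f)/sigma for the assumed sigma(q,p) = (1+p^2)^(-1/2)."""
    q, p = y
    return -p * math.sin(q) / (1.0 + p * p)

def controller_step(phi, G, y, z, eps):
    """One step of (6.9); returns (y_new, z_new, h) with h = eps / z_{n+1/2}."""
    z_half = z + 0.5 * eps * G(y)
    h = eps / z_half
    y_new = phi(h, y)
    z_new = z_half + 0.5 * eps * G(y_new)
    return y_new, z_new, h

# Forward step followed by a step with -eps (symmetry of the map in (6.10)).
y0 = (1.0, 0.5)
z0 = math.sqrt(1.0 + y0[1] ** 2)          # z_0 = 1 / sigma(y_0)
y1, z1, h_fwd = controller_step(verlet, G, y0, z0, 0.1)
y2, z2, h_bwd = controller_step(verlet, G, y1, z1, -0.1)
```

The step size never has to be stored: it is recomputed from $z_{n+1/2}$, which is what makes the backward step use exactly the negative of the forward step size.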

The symmetric definition of the algorithm immediately yields

$$s(\hat y, \hat z, -\varepsilon) = s(y,z,\varepsilon) \quad\text{for}\quad (\hat y,\hat z) = \Psi_\varepsilon(y,z). \tag{6.12}$$

For a $\rho$-reversible differential equation $\dot y = f(y)$ and for $\sigma(y)$ satisfying $\sigma(\rho^{-1}y) = \sigma(y)$
we have $G(\rho^{-1}y) = -G(y)$. Consequently, the step size function $s(y,z,\varepsilon)$ of
(6.11) also satisfies

$$s(\rho^{-1}\hat y, \hat z, \varepsilon) = s(y,z,\varepsilon) \quad\text{for}\quad (\hat y,\hat z) = \Psi_\varepsilon(y,z). \tag{6.13}$$

With this preparation we are able to formulate the following result. 

Theorem 6.2. Let $\Phi_h(y)$ be a smooth one-step method, $\sigma(y)$ a smooth time
transformation, and $s(y,z,\varepsilon)$ the step size function of (6.11).

a) For the method $\Psi_\varepsilon$ of (6.10) there exists a modified differential equation

$$\begin{aligned}
\dot y &= f(y) + \varepsilon f_2(y,z) + \varepsilon^2 f_3(y,z) + \dots\\
\dot z &= z\,G(y) + \varepsilon G_2(y,z) + \varepsilon^2 G_3(y,z) + \dots,
\end{aligned} \tag{6.14}$$

with smooth vector fields $f_j(y,z)$, $G_j(y,z)$, such that

$$\varphi_{r,\varepsilon s(y,z,\varepsilon)}(y,z) = \Psi_\varepsilon(y,z) + O(\varepsilon^{r+1}), \tag{6.15}$$

where $\varphi_{r,t}(y,z)$ denotes the flow of the truncation of the system (6.14) after $r$ terms.

b) If the basic method is symmetric (i.e., $\Phi_h(y) = \Phi_{-h}^{-1}(y)$) then

$$f_j(y) = 0 \quad\text{for even } j. \tag{6.16}$$

c) If the basic method is $\rho$-reversible (i.e., $\rho\circ\Phi_h = \Phi_h^{-1}\circ\rho$) and $\sigma(\rho^{-1}y) = \sigma(y)$
holds, then the modified equation (6.14) is $\hat\rho$-reversible with $\hat\rho$ given by (6.10), i.e.,

$$\rho f_j(y,z) = -f_j(\rho y, z), \qquad G_j(y,z) = -G_j(\rho y, z) \quad\text{for all } j. \tag{6.17}$$

Proof. The proof is the same as for Theorem 6.1 and therefore omitted. Notice that
the step size function satisfies (6.12) and (6.13) which are needed in that proof. □

If the basic method is of order $p$ then the coefficient functions of (6.14) satisfy
$f_j(y,z) = 0$ for $j = 2,\dots,p$. We always have $G_2(y,z) = 0$ due to the symmetric
way of choosing $z_{n+1/2}$ in (6.9). However, $G_3(y,z) \ne 0$ in general, even if the
method $\Phi_h$ has an order higher than two.


IX.7 Rigorous Estimates - Local Error 

Wherefore it is highly desirable that it be clearly and rigorously shown 
why series of this kind, which at first converge very rapidly and then ever 
more slowly, and at length diverge more and more, nevertheless give a 
sum close to the true one if not too many terms are taken, and to what 
degree such a sum can safely be considered as exact. 

(a footnote in Gauss’ thesis, 1799) 

Up to now we have considered the modified equation (1.1) as a formal series without 
taking care of convergence issues. Here,

• we show that already in very simple situations the modified differential equation
does not converge;

• we give bounds on the coefficient functions $f_j(y)$ of the modified equation (1.1),
so that an optimal truncation index can be determined;

• we estimate the difference between the numerical solution $y_1 = \Phi_h(y_0)$ and the
exact solution $\tilde y(h)$ of the truncated modified equation.

These estimates will be the basis for rigorous statements concerning the long-time 
behaviour of numerical solutions. The rigorous estimates of the present section have 
been given in the articles Benettin & Giorgilli (1994), Hairer & Lubich (1997) and 
Reich (1999). We mainly follow the approach of Benettin & Giorgilli, but we also 
use ideas of the other two papers. 

Example 7.1. We consider the differential equation$^1$ $\dot y = f(t)$, $y(0) = 0$, and we
apply the trapezoidal rule $y_1 = h\,(f(0) + f(h))/2$. In this case, the numerical
solution has an expansion $\Phi_h(t,y) = y + h\,(f(t) + f(t+h))/2 = y + h f(t) +
h^2 f'(t)/2 + h^3 f''(t)/4 + \dots$, so that the modified equation is necessarily of the
form

$$\dot{\tilde y} = f(t) + h b_1 f'(t) + h^2 b_2 f''(t) + h^3 b_3 f'''(t) + \dots . \tag{7.1}$$

The real coefficients $b_k$ can be computed by putting $f(t) = e^t$. The relation
$\Phi_h(t,y) = \tilde y(t+h)$ (with initial value $\tilde y(t) = y$) yields after division by $e^t$

$$\frac{h}{2}\bigl(e^h + 1\bigr) = \bigl(1 + b_1 h + b_2 h^2 + b_3 h^3 + \dots\bigr)\bigl(e^h - 1\bigr).$$

This proves that $b_1 = 0$, and $b_k = B_k/k!$, where $B_k$ are the Bernoulli numbers (see
for example Hairer & Wanner (1997), Sect. II.10). Since these numbers behave like
$B_k/k! \sim \mathrm{Const}\cdot(2\pi)^{-k}$ for $k \to \infty$, the series (7.1) diverges for all $h \ne 0$, as
soon as the derivatives of $f(t)$ grow like $f^{(k)}(t) \sim k!\, M R^{-k}$. This is typically the
case for analytic functions $f(t)$ with finite poles.
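The identification $b_k = B_k/k!$ can be checked by exact rational arithmetic. The sketch below divides the relation $\frac h2(e^h+1) = (1 + b_1 h + \dots)(e^h-1)$ by $h$ and solves for the $b_k$ coefficient by coefficient; the Bernoulli numbers (with $B_1 = -1/2$) come from their standard recurrence. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli(n):
    """B_0..B_n (B_1 = -1/2) from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def trapezoidal_modified_coeffs(n):
    """b_0..b_n from (e^h + 1)/2 = (b_0 + b_1 h + ...) * (e^h - 1)/h, b_0 = 1."""
    lhs = [Fraction(1)] + [Fraction(1, 2 * factorial(k)) for k in range(1, n + 1)]
    e = [Fraction(1, factorial(k + 1)) for k in range(n + 1)]   # (e^h - 1)/h
    b = []
    for k in range(n + 1):
        # e[0] = 1, so the k-th coefficient can be solved for b_k directly
        b.append(lhs[k] - sum(b[m] * e[k - m] for m in range(k)))
    return b

B = bernoulli(12)
b = trapezoidal_modified_coeffs(12)
# |b_k| (2 pi)^k approaches 2 for even k, which is the geometric decay used above
ratio = abs(float(b[12])) * (2 * pi) ** 12
```

The geometric decay rate $(2\pi)^{-k}$ of the $b_k$ cannot compensate a factorial growth of $f^{(k)}$, which is the reason for the divergence.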

It is interesting to remark that the relation $\Phi_h(t,y) = \tilde y(t+h)$ is nothing other
than the Euler-MacLaurin summation formula.

As a particular example we choose the function

$$f(t) = \frac{5}{1 + 25t^2}.$$

Figure 7.1 shows the numerical solution and the exact solution of the modified
equation truncated at different values of $N$. For $h = 0.2$, there is an excellent agreement
for $N \le 12$, whereas oscillations begin to appear from $N = 14$ onwards. For the
halved step size $h = 0.1$, the oscillations become visible for $N$ twice as large.

$^1$ Observe that after adding the equation $\dot t = 1$, $t(0) = 0$, we get for $Y = (t,y)^T$ the
autonomous differential equation $\dot Y = F(Y)$ with $F(Y) = (1, f(t))^T$. Hence, all results
of this chapter are applicable.

Fig. 7.1. Numerical solution with the trapezoidal rule compared to the solution of the
truncated modified equation for h = 0.2 (upper four pictures), and for h = 0.1 (lower two
pictures)


The main ingredient of a rigorous backward error analysis is an analyticity
assumption on the differential equation $\dot y = f(y)$ and on the method. Throughout this
section we assume that $f(y)$ is analytic in a complex neighbourhood of $y_0$ and that

$$\|f(y)\| \le M \quad\text{for}\quad \|y - y_0\| \le 2R \tag{7.2}$$

i.e., for all $y$ of $B_{2R}(y_0) := \{y \in \mathbb{C}^d \,;\, \|y - y_0\| \le 2R\}$. Our strategy is the
following: using (7.2) and Cauchy's estimates we derive bounds for the coefficient
functions $d_j(y)$ of (1.3) on $B_R(y_0)$ (Sect. IX.7.1), then we estimate the functions
$f_j(y)$ of the modified differential equation on $B_{R/2}(y_0)$ (Sect. IX.7.2), and finally
we search for a suitable truncation for the formal series (1.1) and we prove the
closeness of the numerical solution to the exact solution of the truncated modified
equation (Sect. IX.7.3).

IX.7.1 Estimation of the Derivatives of the Numerical Solution 

If we apply a numerical method to $\dot y = f(y)$ with analytic $f(y)$, the expression
$\Phi_h(y)$ will usually be analytic in a neighbourhood of $h = 0$ and $y \in B_R(y_0)$.
Consequently, the coefficients $d_j(y)$ of the Taylor series expansion

$$\Phi_h(y) = y + h f(y) + h^2 d_2(y) + h^3 d_3(y) + \dots \tag{7.3}$$

are also analytic and the functions $d_j(y)$ can be estimated by the use of Cauchy's
inequalities. Let us demonstrate this for Runge-Kutta methods.

Theorem 7.2. For a Runge-Kutta method (II.1.4) let

$$\mu = \sum_{i=1}^{s} |b_i|, \qquad \kappa = \max_{i=1,\dots,s} \sum_{j=1}^{s} |a_{ij}|. \tag{7.4}$$

If $f(y)$ is analytic in the complex ball $B_{2R}(y_0)$ and satisfies (7.2), then the
coefficient functions $d_j(y)$ of (7.3) are analytic in $B_R(y_0)$ and satisfy

$$\|d_j(y)\| \le \mu M \Bigl(\frac{2\kappa M}{R}\Bigr)^{j-1} \quad\text{for}\quad \|y - y_0\| \le R. \tag{7.5}$$


Proof. For $y \in B_{3R/2}(y_0)$ and $\|\Delta y\| \le 1$ the function $\alpha(z) = f(y + z\Delta y)$ is
analytic for $|z| \le R/2$ and bounded by $M$. Cauchy's estimate therefore yields

$$\|f'(y)\Delta y\| = \|\alpha'(0)\| \le 2M/R.$$

Consequently, $\|f'(y)\| \le 2M/R$ for $y \in B_{3R/2}(y_0)$ in the operator norm.

For $y \in B_R(y_0)$, the Runge-Kutta method (II.1.4) requires the solution of the
nonlinear system $g_i = y + h\sum_{j=1}^{s} a_{ij} f(g_j)$, which can be solved by fixed point
iteration. If $|h|\,2\kappa M/R \le \gamma < 1$, it represents a contraction on the closed set
$\{(g_1,\dots,g_s) \,;\, \|g_i - y\| \le R/2\}$ and possesses a unique solution. Consequently,
the method is analytic for $|h| \le \gamma R/(2\kappa M)$ and $y \in B_R(y_0)$. This implies that the
functions $d_j(y)$ of (7.3) are also analytic. Furthermore, $\|\Phi_h(y) - y\| \le |h|\mu M$ for
$y \in B_R(y_0)$ so that, again by Cauchy's estimate,

$$\|d_j(y)\| = \Bigl\|\frac{1}{j!}\,\frac{d^j}{dh^j}\bigl(\Phi_h(y) - y\bigr)\Big|_{h=0}\Bigr\| \le \mu M \Bigl(\frac{2\kappa M}{\gamma R}\Bigr)^{j-1}$$

for $j \ge 1$. The statement is then obtained by considering the limit $\gamma \to 1$. □

Due to the consistency condition $\sum_{i=1}^{s} b_i = 1$, methods with positive weights
$b_i$ all satisfy $\mu = 1$. The values $\mu$, $\kappa$ of some classes of Runge-Kutta methods are
given in Table 7.1 (those for the Gauss methods and for the Lobatto IIIA methods
have been checked for $s \le 9$ and $s \le 5$, respectively).
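The constants of (7.4) are simple norms of the Butcher tableau and are easy to evaluate. The helper below is a small sketch; the name `rk_constants` and the list-of-rows representation of the coefficients $a_{ij}$, $b_i$ are assumptions made for illustration.

```python
import math

def rk_constants(A, b):
    """Return (mu, kappa) of (7.4): mu = sum |b_i|, kappa = max_i sum_j |a_ij|."""
    mu = sum(abs(bi) for bi in b)
    kappa = max(sum(abs(aij) for aij in row) for row in A)
    return mu, kappa

explicit_euler = rk_constants([[0.0]], [1.0])
implicit_midpoint = rk_constants([[0.5]], [1.0])
trapezoidal = rk_constants([[0.0, 0.0], [0.5, 0.5]], [0.5, 0.5])

# Two-stage Gauss method: kappa coincides with the largest node c_2 = 1/2 + sqrt(3)/6.
r = math.sqrt(3.0) / 6.0
gauss2 = rk_constants([[0.25, 0.25 - r], [0.25 + r, 0.25]], [0.5, 0.5])
```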

Estimates of the type (7.5), possibly with a different interpretation of M and R, 
hold for all one-step methods which are analytic in h and y, e.g., partitioned Runge- 
Kutta methods, splitting and composition methods, projection methods, Lie group 
methods, .... 


Table 7.1. The constants $\mu$ and $\kappa$ of formula (7.4)

method              $\mu$   $\kappa$      method             $\mu$   $\kappa$
explicit Euler      1       0             implicit Euler     1       1
implicit midpoint   1       1/2           trapezoidal rule   1       1
Gauss methods       1       $c_s$         Lobatto IIIA       1       1

IX.7.2 Estimation of the Coefficients of the Modified Equation

At the beginning of this chapter we gave an explicit formula for the first coefficient
functions of the modified differential equation (see (1.4)). Using the Lie derivative

$$(D_i g)(y) = g'(y) f_i(y) \tag{7.6}$$

(cf. (VI.5.2)) and $f_1(y) := f(y)$, these formulas can be written as

$$\begin{aligned}
f_2(y) &= d_2(y) - \frac{1}{2!}(D_1 f_1)(y)\\
f_3(y) &= d_3(y) - \frac{1}{3!}(D_1^2 f_1)(y) - \frac{1}{2!}(D_2 f_1 + D_1 f_2)(y).
\end{aligned}$$

We have the following recurrence relation for the general case.

Lemma 7.3. If the numerical method has an expansion of the form (7.3), then the
functions $f_j(y)$ of the modified differential equation (1.1) satisfy

$$f_j(y) = d_j(y) - \sum_{i=2}^{j} \frac{1}{i!} \sum_{k_1+\dots+k_i=j} \bigl(D_{k_1}\cdots D_{k_{i-1}} f_{k_i}\bigr)(y),$$

where $k_m \ge 1$ for all $m$. Observe that the right-hand expression only involves $f_k(y)$
with $k < j$.

Proof. The solution of the modified equation (1.1) with initial value $\tilde y(t) = y$ can
be formally written as (cf. (1.2))

$$\tilde y(t+h) = y + \sum_{i\ge 1} \frac{h^i}{i!}\,(D^{i-1}F)(y),$$

where $F(y) = f_1(y) + h f_2(y) + h^2 f_3(y) + \dots$ stands for the modified equation, and
$hD = hD_1 + h^2 D_2 + h^3 D_3 + \dots$ for the corresponding Lie operator. We expand
the formal sums and obtain

$$\tilde y(t+h) = y + \sum_{i\ge 1} \frac{1}{i!} \sum_{k_1,\dots,k_i} h^{k_1+\dots+k_i}\,\bigl(D_{k_1}\cdots D_{k_{i-1}} f_{k_i}\bigr)(y), \tag{7.7}$$

where all $k_m \ge 1$. Comparing like powers of $h$ in (7.3) and (7.7) yields the desired
recurrence relations for the functions $f_j(y)$. □
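For a scalar linear problem the recurrence of Lemma 7.3 reduces to arithmetic on numbers, which gives a quick consistency check. The sketch below makes the following assumptions, none of which come from the text: $f_k(y) = c_k y$ and $d_k(y) = d_k y$, so that $D_{k_1}\cdots D_{k_{i-1}} f_{k_i} = c_{k_1}\cdots c_{k_i}\,y$, and the inner sum over $k_1+\dots+k_i=j$ is the coefficient of $x^j$ in $(\sum_k c_k x^k)^i$. For the explicit Euler method applied to $\dot y = y$ (that is, $d_1 = 1$ and $d_j = 0$ otherwise) the modified equation is $\dot{\tilde y} = h^{-1}\ln(1+h)\,\tilde y$, so one expects $c_j = (-1)^{j-1}/j$.

```python
from fractions import Fraction
from math import factorial

def modified_coeffs_linear(d, J):
    """c[1..J] with f_j(y) = c[j]*y, given d_j(y) = d[j]*y (Lemma 7.3, linear case)."""
    c = [Fraction(0)] * (J + 1)
    for j in range(1, J + 1):
        total = Fraction(d.get(j, 0))
        power = c[:]                     # coefficients of (sum_k c_k x^k)^1
        for i in range(2, j + 1):
            new = [Fraction(0)] * (J + 1)
            for a in range(1, j + 1):
                if power[a] == 0:
                    continue
                for b in range(1, j + 1 - a):
                    new[a + b] += power[a] * c[b]
            power = new                  # coefficients of (sum_k c_k x^k)^i
            total -= power[j] / factorial(i)
        c[j] = total                     # uses only c[k] with k < j
    return c

# Explicit Euler for y' = y: Phi_h(y) = (1 + h) y.
c = modified_coeffs_linear({1: Fraction(1)}, 8)
```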

To get bounds for $\|f_j(y)\|$, we have to estimate repeatedly expressions like
$\|(D_i g)(y)\|$. The following variant of Cauchy's estimate will be extremely useful.

Lemma 7.4. For analytic functions $f_i(y)$ and $g(y)$ we have for $0 \le \sigma < \rho$ the
estimate

$$\|D_i g\|_\sigma \le \frac{1}{\rho - \sigma}\cdot\|f_i\|_\sigma\cdot\|g\|_\rho.$$

Here, $\|g\|_\rho := \max\{\|g(y)\| \,;\, y \in B_\rho(y_0)\}$, and $\|D_i g\|_\sigma$ and $\|f_i\|_\sigma$ are defined
similarly.

Proof. For a fixed $y \in B_\sigma(y_0)$ the function $\alpha(z) = g(y + z f_i(y))$ is analytic for
$|z| \le \varepsilon := (\rho - \sigma)/M$ with $M := \|f_i\|_\sigma$. Since $\alpha'(0) = g'(y) f_i(y) = (D_i g)(y)$,
we get from Cauchy's estimate that

$$\|(D_i g)(y)\| = \|\alpha'(0)\| \le \frac{1}{\varepsilon}\sup_{|z|\le\varepsilon}\|\alpha(z)\| \le \frac{M}{\rho - \sigma}\cdot\|g\|_\rho.$$

This proves the statement. □

We are now able to estimate the coefficients fj (y) of the modified differential 
equation. 

Theorem 7.5. Let $f(y)$ be analytic in $B_{2R}(y_0)$, let the Taylor series coefficients of
the numerical method (7.3) be analytic in $B_R(y_0)$, and assume that (7.2) and (7.5)
are satisfied. Then, we have for the coefficients of the modified differential equation

$$\|f_j(y)\| \le \ln 2\;\eta M \Bigl(\frac{\eta M j}{R}\Bigr)^{j-1} \quad\text{for}\quad \|y - y_0\| \le R/2, \tag{7.8}$$

where $\eta = 2\max\bigl(\kappa,\ \mu/(2\ln 2 - 1)\bigr)$.

Proof. We fix an index, say $J$, and we estimate (in the notation of Lemma 7.4)

$$\|f_j\|_{R-(j-1)\delta} \quad\text{for}\quad j = 1,2,\dots,J,$$

where $\delta = R/(2(J-1))$. This will then lead to the desired estimate for $\|f_J\|_{R/2}$.
In the following we abbreviate $\|\cdot\|_{R-(j-1)\delta}$ by $\|\cdot\|_j$. Using repeatedly Cauchy's
estimate of Lemma 7.4 we get for $k_1 + \dots + k_i = j$ that

$$\begin{aligned}
\|D_{k_1}\cdots D_{k_{i-1}} f_{k_i}\|_j &\le \frac{1}{\delta}\,\|f_{k_1}\|_j\,\|D_{k_2}\cdots D_{k_{i-1}} f_{k_i}\|_{j-1}\\
&\le \dots \le \frac{1}{\delta^{i-1}}\,\|f_{k_1}\|_j\,\|f_{k_2}\|_{j-1}\cdot\ldots\cdot\|f_{k_i}\|_{j-i+1}\\
&\le \frac{1}{\delta^{i-1}}\,\|f_{k_1}\|_{k_1}\,\|f_{k_2}\|_{k_2}\cdot\ldots\cdot\|f_{k_i}\|_{k_i}.
\end{aligned}$$

The last inequality follows from $\|g\|_j \le \|g\|_l$ for $l < j$, which is an immediate
consequence of $B_{R-(j-1)\delta}(y_0) \subset B_{R-(l-1)\delta}(y_0)$. It therefore follows from
Lemma 7.3 that

$$\|f_j\|_j \le \|d_j\|_j + \sum_{i=2}^{j} \frac{1}{i!} \sum_{k_1+\dots+k_i=j} \frac{1}{\delta^{i-1}}\,\|f_{k_1}\|_{k_1}\,\|f_{k_2}\|_{k_2}\cdot\ldots\cdot\|f_{k_i}\|_{k_i}.$$

By induction on $j$ ($1 \le j \le J$) we obtain that $\|f_j\|_j \le \delta\beta_j$, where $\beta_j$ is defined by

$$\beta_j = \frac{\mu M}{\delta}\Bigl(\frac{2\kappa M}{R}\Bigr)^{j-1} + \sum_{i=2}^{j} \frac{1}{i!} \sum_{k_1+\dots+k_i=j} \beta_{k_1}\cdot\beta_{k_2}\cdot\ldots\cdot\beta_{k_i}. \tag{7.9}$$

Fig. 7.2. Complex functions of the proof of Theorem 7.5 ($\gamma = q = 1$)

Observe that $\beta_j$ is defined for all $j \ge 1$. We let $b(\zeta) = \sum_{j\ge 1}\beta_j\zeta^j$ be its generating
function and we obtain (by multiplying (7.9) with $\zeta^j$ and summing over $j \ge 1$)

$$b(\zeta) = \frac{\gamma\zeta}{1 - q\zeta} + \sum_{i\ge 2}\frac{1}{i!}\,b(\zeta)^i = \frac{\gamma\zeta}{1 - q\zeta} + e^{b(\zeta)} - 1 - b(\zeta), \tag{7.10}$$

where we have used the abbreviations $\gamma := \mu M/\delta$ and $q := 2\kappa M/R$.

Whenever $e^{b(\zeta)} \ne 2$ (i.e., for $\zeta \ne (2b-1)/(\gamma + q(2b-1))$ with $b = \ln 2 + 2k\pi i$)
the implicit function theorem can be applied to (7.10). This implies that $b(\zeta)$ is
analytic in a disc with radius $1/\nu = (2\ln 2 - 1)/(\gamma + q(2\ln 2 - 1))$ and centre at the
origin. On the disc $|\zeta| \le 1/\nu$, the solution $b(\zeta)$ of (7.10) with $b(0) = 0$ is bounded
by $\ln 2$. This is seen as follows (Fig. 7.2): with the function $w = -\gamma\zeta/(1 - q\zeta)$
the disc $|\zeta| \le 1/\nu$ is mapped into a disc which, for all possible choices of $\gamma \ge 0$
and $q \ge 0$, lies in $|w| \le 2\ln 2 - 1$. The image of this disc under the mapping
$b(w)$ defined by $e^b - 1 - 2b = w$ and $b(0) = 0$ is completely contained in the disc
$|b| \le \ln 2$. Cauchy's inequalities therefore imply $|\beta_j| \le \ln 2\cdot\nu^j$, and we get

$$\|f_J\|_{R/2} = \|f_J\|_J \le \delta\beta_J \le \ln 2\cdot\delta\cdot\nu^J.$$

Since $\nu = q + \gamma/(2\ln 2 - 1) \le \eta M J/R$ with $\eta$ given by $\eta = 2\max(\kappa,\ \mu/(2\ln 2 - 1))$
and $\delta\nu \le \eta M$, this proves the statement for $J$. □
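The bound $|\beta_j| \le \ln 2\cdot\nu^j$ can be observed numerically by solving (7.10) as a formal power series. The sketch below is illustrative only: it uses the values $\gamma = q = 1$ of Fig. 7.2, exact rational arithmetic, and a fixed point sweep that determines one further coefficient per pass (the nonlinear part $e^b - 1 - b$ of order $j$ depends only on $\beta_k$ with $k < j$). The function name is an assumption.

```python
from fractions import Fraction
import math

def beta_coefficients(gamma, q, n):
    """beta_1..beta_n of (7.9), from b = gamma*z/(1 - q*z) + exp(b) - 1 - b."""
    g = [Fraction(0)] + [Fraction(gamma) * Fraction(q) ** (j - 1)
                         for j in range(1, n + 1)]
    b = [Fraction(0)] * (n + 1)
    for _ in range(n):                   # each sweep fixes one more coefficient
        # power series of E = exp(b), using E' = b' E and E_0 = 1
        E = [Fraction(1)] + [Fraction(0)] * n
        for k in range(1, n + 1):
            E[k] = sum(m * b[m] * E[k - m] for m in range(1, k + 1)) / k
        b = [Fraction(0)] + [g[j] + E[j] - b[j] for j in range(1, n + 1)]
    return b

beta = beta_coefficients(1, 1, 12)
nu = 1.0 + 1.0 / (2.0 * math.log(2.0) - 1.0)      # nu = q + gamma/(2 ln 2 - 1)
bounds_hold = all(float(beta[j]) <= math.log(2.0) * nu ** j for j in range(1, 13))
```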

IX.7.3 Choice of N and the Estimation of the Local Error 

To get rigorous estimates, we truncate the modified differential equation (1.1), and
we consider

$$\dot{\tilde y} = F_N(\tilde y), \qquad F_N(y) = f(y) + h f_2(y) + \dots + h^{N-1} f_N(y) \tag{7.11}$$

with initial value $\tilde y(0) = y_0$. It is common in the theory of asymptotic expansions
to truncate the series at the index where the corresponding term is minimal.
Motivated by the bound (7.8) and by the fact that $(\varepsilon x)^x$ admits a minimum
for $x = (\varepsilon e)^{-1}$ (see the accompanying picture, drawn for $\varepsilon = 0.15$), we suppose
that the truncation index $N$ satisfies

$$hN \le h_0 \quad\text{with}\quad h_0 = \frac{R}{e\,\eta M}. \tag{7.12}$$

Under the less restrictive assumption $hN \le e h_0$, the estimates (7.2) and (7.8) imply
for $\|y - y_0\| \le R/2$ that

$$\begin{aligned}
\|F_N(y)\| &\le M\Bigl(1 + \eta\ln 2\sum_{j=2}^{N}\Bigl(\frac{\eta M j h}{R}\Bigr)^{j-1}\Bigr)\\
&\le M\Bigl(1 + \eta\ln 2\sum_{j=2}^{N}\Bigl(\frac{j}{N}\Bigr)^{j-1}\Bigr) \le M(1 + 1.65\,\eta).
\end{aligned} \tag{7.13}$$

One can check that the sum in the lower formula of (7.13) is maximal for $N = 7$
and bounded by 2.38. For a $p$th order method we obtain under the same assumptions

$$\|F_N(y) - f(y)\| \le cMh^p, \tag{7.14}$$

where $c$ depends only on the method.

Theorem 7.6. Let $f(y)$ be analytic in $B_{2R}(y_0)$, let the coefficients $d_j(y)$ of the
method (7.3) be analytic in $B_R(y_0)$, and assume that (7.2) and (7.5) hold. If $h \le
h_0/4$ with $h_0 = R/(e\eta M)$, then there exists $N = N(h)$ (namely $N$ equal to the
largest integer satisfying $hN \le h_0$) such that the difference between the numerical
solution $y_1 = \Phi_h(y_0)$ and the exact solution $\tilde\varphi_{N,h}(y_0)$ of the truncated modified
equation (7.11) satisfies

$$\|\Phi_h(y_0) - \tilde\varphi_{N,h}(y_0)\| \le h\gamma M e^{-h_0/h},$$

where $\gamma = e(2 + 1.65\,\eta + \mu)$ depends only on the method (we have $\eta \le 5.18$
and $\gamma \le 31.4$ for the methods of Table 7.1).

The quotient $L = M/R$ is an upper bound of the first derivative $f'(y)$ and can
be interpreted as a Lipschitz constant for $f(y)$. The condition $h \le h_0/4$ is therefore
equivalent to $hL \le \mathrm{Const}$, where Const depends only on the method. Because of
this condition, Theorem 7.6 requires unreasonably small step sizes for the numerical
solution of stiff differential equations.

Proof of Theorem 7.6. We follow here the elegant proof of Benettin & Giorgilli
(1994). It is based on the fact that $\Phi_h(y_0)$ (as a convergent series (7.3)) and $\tilde\varphi_{N,h}(y_0)$
(as the solution of an analytic differential equation) are both analytic functions of $h$.
Hence,

$$g(h) := \Phi_h(y_0) - \tilde\varphi_{N,h}(y_0) \tag{7.15}$$

is analytic in a complex neighbourhood of $h = 0$. By definition of the functions
$f_j(y)$ of the modified equation (1.1), the coefficients of the Taylor series for $\Phi_h(y_0)$
and $\tilde\varphi_{N,h}(y_0)$ are the same up to the $h^N$ term, but not further due to the truncation
of the modified equation. Consequently, the function $g(h)$ contains the factor $h^{N+1}$,
and the maximum principle for analytic functions, applied to $g(h)/h^{N+1}$, implies
that

$$\|g(h)\| \le \Bigl(\frac{h}{\varepsilon}\Bigr)^{N+1}\max_{|z|\le\varepsilon}\|g(z)\| \quad\text{for}\quad 0 \le h \le \varepsilon, \tag{7.16}$$

if $g(z)$ is analytic for $|z| \le \varepsilon$. We shall show that we can take $\varepsilon = e h_0/N$, and we
compute an upper bound for $\|g(z)\|$ by estimating separately $\|\Phi_h(y_0) - y_0\|$ and
$\|\tilde\varphi_{N,h}(y_0) - y_0\|$.

The function $\Phi_z(y_0)$ is given by the series (7.3) which, due to the bounds of
Theorem 7.2, converges certainly for $|z| \le R/(4\kappa M)$, and therefore also for $|z| \le \varepsilon$
(because $2\kappa \le \eta$ and $N \ge 4$, which is a consequence of $h_0/h \ge 4$). Hence, it is
analytic in $|z| \le \varepsilon$. Moreover, we have from Theorem 7.2 that $\|\Phi_z(y_0) - y_0\| \le
|z|M(1 + \mu)$ for $|z| \le \varepsilon$.

Because of the bound (7.13) on $F_N(y)$, which is valid for $y \in B_{R/2}(y_0)$ and
for $|h| \le \varepsilon$, we have $\|\tilde\varphi_{N,z}(y_0) - y_0\| \le |z|M(1 + 1.65\,\eta)$ as long as the solution
$\tilde\varphi_{N,z}(y_0)$ stays in the ball $B_{R/2}(y_0)$. Because of $\varepsilon M(1 + 1.65\,\eta) \le R/2$, which is a
consequence of the definition of $\varepsilon$, of $N \ge 4$, and of $(1 + 1.65\,\eta) \le 1.85\,\eta$ (because
for consistent methods $\mu \ge 1$ holds and therefore also $\eta \ge 2/(2\ln 2 - 1) \ge 5$), this
is the case for all $|z| \le \varepsilon$. In particular, the solution $\tilde\varphi_{N,z}(y_0)$ is analytic in $|z| \le \varepsilon$.
Inserting $\varepsilon = e h_0/N$ and the bound on $\|g(z)\| \le \|\Phi_z(y_0) - y_0\| + \|\tilde\varphi_{N,z}(y_0) - y_0\|$
into (7.16) yields (with $C = 2 + 1.65\,\eta + \mu$)

$$\|g(h)\| \le \varepsilon MC\Bigl(\frac{h}{\varepsilon}\Bigr)^{N+1} \le hMC\Bigl(\frac{h}{\varepsilon}\Bigr)^{N} = hMC\Bigl(\frac{hN}{e h_0}\Bigr)^{N} \le hMC\,e^{-N}$$

because $hN \le h_0$. The statement now follows from the fact that $N \le h_0/h <
N + 1$, so that $e^{-N} \le e\cdot e^{-h_0/h}$. □

A different approach to a rigorous backward error analysis is developed by Moan
(2005). There, the modified differential equation contains an exponentially small
time-dependent perturbation, but its flow reproduces the numerical solution without
error.


IX.8 Long-Time Energy Conservation 

In particular, one easily explains in this way why symplectic algorithms 
give rise to a good energy conservation, with essentially no accumulation 
of errors in time. (G. Benettin & A. Giorgilli 1994) 

As a first application of Theorem 7.6 we study the long-time energy conservation of
symplectic numerical schemes applied to Hamiltonian systems $\dot y = J^{-1}\nabla H(y)$. It
follows from Theorem 3.1 that the corresponding modified differential equation is
also Hamiltonian. After truncation we thus get a modified Hamiltonian

$$\tilde H(y) = H(y) + h^p H_{p+1}(y) + \dots + h^{N-1} H_N(y), \tag{8.1}$$

which we assume to be defined on the same open set as the original Hamiltonian $H$;
see Theorem 3.2 and Sect. IX.4. We also assume that the numerical method satisfies
the analyticity bounds (7.5), so that Theorem 7.6 can be applied. The following
result is given by Benettin & Giorgilli (1994).

Theorem 8.1. Consider a Hamiltonian system with analytic $H : D \to \mathbb{R}$ (where
$D \subset \mathbb{R}^{2d}$), and apply a symplectic numerical method $\Phi_h(y)$ with step size $h$. If
the numerical solution stays in the compact set $K \subset D$, then there exist $h_0$ and
$N = N(h)$ (as in Theorem 7.6) such that

$$\begin{aligned}
\tilde H(y_n) &= \tilde H(y_0) + O(e^{-h_0/2h})\\
H(y_n) &= H(y_0) + O(h^p)
\end{aligned}$$

over exponentially long time intervals $nh \le e^{h_0/2h}$.

Proof. We let $\tilde\varphi_{N,t}(y_0)$ be the flow of the truncated modified equation. Since this
differential equation is Hamiltonian with $\tilde H$ of (8.1), $\tilde H(\tilde\varphi_{N,t}(y_0)) = \tilde H(y_0)$ holds
for all times $t$. From Theorem 7.6 we know that $\|y_{n+1} - \tilde\varphi_{N,h}(y_n)\| \le h\gamma M e^{-h_0/h}$
and, by using a global $h$-independent Lipschitz constant for $\tilde H$ (which exists by
Theorem 7.5), we also get $\tilde H(y_{n+1}) - \tilde H(\tilde\varphi_{N,h}(y_n)) = O(h e^{-h_0/h})$. From the
identity

$$\tilde H(y_n) - \tilde H(y_0) = \sum_{j=1}^{n}\bigl(\tilde H(y_j) - \tilde H(y_{j-1})\bigr) = \sum_{j=1}^{n}\bigl(\tilde H(y_j) - \tilde H(\tilde\varphi_{N,h}(y_{j-1}))\bigr)$$

we thus get $\tilde H(y_n) - \tilde H(y_0) = O(nh\,e^{-h_0/h})$, and the statement on the long-time
conservation of $\tilde H$ is an immediate consequence. The statement for the Hamiltonian
$H$ follows from (8.1), because $H_{p+1}(y) + hH_{p+2}(y) + \dots + h^{N-p-1}H_N(y)$ is
uniformly bounded on $K$ independently of $h$ and $N$. This follows from the proof of
Lemma VI.2.7 and from the estimates of Theorem 7.5. □

Example 8.2. Let us check explicitly the assumptions of Theorem 8.1 for the
pendulum problem $\dot q = p$, $\dot p = -\sin q$. The vector field $f(p,q) = (p, -\sin q)^T$ is also
well-defined for complex $p$ and $q$, and it is analytic everywhere on $\mathbb{C}^2$. We let $K$ be
a compact subset of $\{(p,q) \in \mathbb{R}^2 \,;\, |p| \le c\}$. As a consequence of $|\sin q| \le e^{|\mathrm{Im}\, q|}$,
we get the bound

$$\|f(p,q)\| \le \sqrt{c^2 + 4R^2 + e^{2R}}$$

for $\|(p,q) - (p_0,q_0)\| \le 2R$ and $(p_0,q_0) \in K$. If we choose $c \le 2$, $R = 1$,
and $M = 4$, the value $h_0$ of Theorem 7.6 is given by $h_0 = 1/(4e\eta) \approx 0.018$ for
the methods of Table 7.1. For step sizes that are smaller than $h_0/20$, Theorem 8.1
guarantees that the numerical Hamiltonian is well conserved on intervals $[0,T]$ with
$T \approx e^{10} \approx 2\cdot 10^4$.
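The constants entering this example follow from the formulas of Theorems 7.5 and 7.6. The short sketch below evaluates $\eta = 2\max(\kappa, \mu/(2\ln 2-1))$, $h_0 = R/(e\eta M)$ and $\gamma = e(2 + 1.65\eta + \mu)$, together with the local error bound $h\gamma M e^{-h_0/h}$; the function names are assumptions, and $\mu = \kappa = 1$ is used as the worst case among the methods of Table 7.1.

```python
import math

def backward_error_constants(mu, kappa, M, R):
    """Return (eta, h0, gamma) as defined in Theorems 7.5 and 7.6."""
    eta = 2.0 * max(kappa, mu / (2.0 * math.log(2.0) - 1.0))
    h0 = R / (math.e * eta * M)
    gamma = math.e * (2.0 + 1.65 * eta + mu)
    return eta, h0, gamma

def local_error_bound(h, mu, kappa, M, R):
    """Truncation index N and the bound h*gamma*M*exp(-h0/h) of Theorem 7.6."""
    eta, h0, gamma = backward_error_constants(mu, kappa, M, R)
    if h > h0 / 4.0:
        raise ValueError("Theorem 7.6 requires h <= h0/4")
    N = math.floor(h0 / h)               # largest integer with h*N <= h0
    return N, h * gamma * M * math.exp(-h0 / h)

eta, h0, gamma = backward_error_constants(mu=1.0, kappa=1.0, M=4.0, R=1.0)
N, bound = local_error_bound(0.0008, mu=1.0, kappa=1.0, M=4.0, R=1.0)
T = math.exp(h0 / (2.0 * (h0 / 20.0)))   # length of the interval for h = h0/20
```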

The numerical experiment of Fig. 8.1 shows that the estimates for $h_0$ are often
too pessimistic. We have drawn 200 000 steps of the numerical solution of the
implicit midpoint rule for various step sizes $h$ and for initial values $(p_0,q_0) =
(0,-1.5)$, $(p_0,q_0) = (0,-2.5)$, $(p_0,q_0) = (1.5,-\pi)$, and $(p_0,q_0) = (2.5,-\pi)$.
They are compared to the contour lines of the truncated modified Hamiltonian

$$\tilde H(p,q) = \frac{p^2}{2} - \cos q + \frac{h^2}{48}\bigl(\cos(2q) - 2p^2\cos q\bigr).$$

This shows that for step sizes as large as $h \le 0.7$ the Hamiltonian $\tilde H$ is extremely
well conserved. Beyond this value, the dynamics of the numerical method soon
turns into chaotic behaviour (see also Yoshida (1993) and Hairer, Nørsett & Wanner
(1993), page 336).

Fig. 8.1. Numerical solutions of the implicit midpoint rule with large step sizes

Theorem 8.1 explains the near conservation of the Hamiltonian with the symplectic
Euler method, the implicit midpoint rule and the Störmer-Verlet method as
observed in the numerical experiments of Chap. I: in Fig. I.1.4 for the pendulum
problem, in Fig. I.2.3 for the Kepler problem, and in Fig. I.4.1 for the frozen argon
crystal.

The linear drift of the numerical Hamiltonian for non-symplectic methods can
be explained by a computation similar to that of the proof of Theorem 8.1. From a
Lipschitz condition of the Hamiltonian and from the standard local error estimate,
we obtain $H(y_{n+1}) - H(\varphi_h(y_n)) = O(h^{p+1})$. Since $H(\varphi_h(y_n)) = H(y_n)$, a
summation of these terms leads to

$$H(y_n) - H(y_0) = O(th^p) \quad\text{for}\quad t = nh. \tag{8.2}$$

This explains the linear growth in the error of the Hamiltonian observed in Fig. I.2.3
and in Fig. I.4.1 for the explicit Euler method.
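The contrast between the two behaviours is easy to reproduce for the pendulum $H(p,q) = p^2/2 - \cos q$. The following sketch is an illustration only: the step size $h = 0.1$, the initial value $(q_0,p_0) = (1.5, 0)$ and the numbers of steps are arbitrary choices, and the function names are assumptions. It records the largest deviation of $H$ from its initial value for the Störmer-Verlet method (symplectic, order 2) and for the explicit Euler method (non-symplectic, order 1).

```python
import math

def hamiltonian(q, p):
    return 0.5 * p * p - math.cos(q)

def stoermer_verlet(q, p, h):
    """Symplectic, symmetric method of order 2 for q' = p, p' = -sin q."""
    p -= 0.5 * h * math.sin(q)
    q += h * p
    p -= 0.5 * h * math.sin(q)
    return q, p

def explicit_euler(q, p, h):
    """Non-symplectic method of order 1."""
    return q + h * p, p - h * math.sin(q)

def max_energy_error(step, q, p, h, n):
    """Largest |H(y_k) - H(y_0)| over the first n steps."""
    h_start = hamiltonian(q, p)
    worst = 0.0
    for _ in range(n):
        q, p = step(q, p, h)
        worst = max(worst, abs(hamiltonian(q, p) - h_start))
    return worst

err_verlet = max_energy_error(stoermer_verlet, 1.5, 0.0, 0.1, 20000)  # t = 2000
err_euler = max_energy_error(explicit_euler, 1.5, 0.0, 0.1, 1000)     # t = 100
```

In this experiment the Störmer-Verlet error should stay at the size $O(h^2)$ over the whole interval, whereas the explicit Euler error should grow with $t$ as in (8.2).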
IX.9 Modified Equation in Terms of Trees

By Theorem III.1.4 the numerical solution $y_1 = \Phi_h(y_0)$ of a Runge-Kutta method
can be written as a B-series

$$\Phi_h(y) = y + h f(y) + h^2 a([\bullet])\,(f'f)(y)
+ h^3\Bigl(\tfrac12\, a([\bullet,\bullet])\, f''(f,f)(y) + a([[\bullet]])\, f'f'f(y)\Bigr) + \dots . \tag{9.1}$$

For consistent methods, i.e., methods of order at least 1, we always have $a(\bullet) = 1$,
so that the coefficient of $h$ is equal to $f(y)$. In this section we exploit this special
structure of $\Phi_h(y)$ in order to get practical formulas for the coefficient functions of
the modified differential equation. Using (9.1) instead of (1.3), the equations (1.4)
yield

$$\begin{aligned}
f_2(y) &= \bigl(a([\bullet]) - \tfrac12\bigr)(f'f)(y)\\
f_3(y) &= \tfrac12\bigl(a([\bullet,\bullet]) - a([\bullet]) + \tfrac16\bigr) f''(f,f)(y)
+ \bigl(a([[\bullet]]) - a([\bullet]) + \tfrac13\bigr) f'f'f(y).
\end{aligned} \tag{9.2}$$

Continuing this computation, one is quickly convinced of the general formula

$$f_j(y) = \sum_{|\tau| = j} \frac{b(\tau)}{\sigma(\tau)}\, F(\tau)(y), \tag{9.3}$$

so that the modified equation (1.1) becomes

$$\dot{\tilde y} = \sum_{\tau\in T} \frac{h^{|\tau|-1}}{\sigma(\tau)}\, b(\tau)\, F(\tau)(\tilde y) \tag{9.4}$$

with $b(\bullet) = 1$, $b([\bullet]) = a([\bullet]) - \tfrac12$, etc. Since the coefficients $\sigma(\tau)$ are known from
Definition III.1.7, all we have to do is to find suitable recursion formulas for the real
coefficients $b(\tau)$.

IX.9.1 B-Series of the Modified Equation 

Recurrence formulas for the coefficients $b(\tau)$ in (9.4) were first given by Hairer
(1994) and by Calvo, Murua & Sanz-Serna (1994). We follow here the approach
of Hairer (1999), which uses the Lie-derivative of B-series and thus simplifies the
construction of the coefficients.

We make use of the notion of ordered trees introduced in Sect. III.1.3. For a
given tree $\tau$ we define the set of all splittings as

$$SP(\tau) = \{\theta \in OST(\tau) \,;\, \tau\setminus\theta \text{ consists of only one element}\}. \tag{9.5}$$

Here, $OST(\tau) = OST(\omega(\tau))$ is the set of ordered subtrees as defined in (III.1.33).

Lemma 9.1 (Lie-Derivative of B-series). Let $b(\tau)$ (with $b(\emptyset) = 0$) and $c(\tau)$ be
the coefficients of two B-series, and let $y(t)$ be a formal solution of the differential
equation $h\dot y(t) = B(b, y(t))$. The Lie derivative of the function $B(c,y)$ with
respect to the vector field $B(b,y)$ is again a B-series

$$h\,\frac{d}{dt}\, B(c, y(t)) = B(\partial_b c, y(t)).$$

Its coefficients are given by $\partial_b c(\emptyset) = 0$ and for $|\tau| \ge 1$ by

$$\partial_b c(\tau) = \sum_{\theta\in SP(\tau)} c(\theta)\, b(\tau\setminus\theta). \tag{9.6}$$

Fig. 9.1. Splitting of an ordered tree $\omega$ into a subtree $\theta$ and $\{\delta\} = \omega\setminus\theta$


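Proof. For the proof of this lemma it is convenient to work with ordered trees $\omega \in
OT$. Since $\nu(\tau)$ of (III.1.31) denotes the number of possible orderings of a tree
$\tau \in T$, a sum $\sum_{\tau\in T}\cdots$ becomes $\sum_{\omega\in OT}\nu(\omega)^{-1}\cdots$.

For the computation of the Lie derivative of $B(c,y)$ we have to differentiate
the elementary differential $F(\theta)(y(t))$ with respect to $t$. Using Leibniz' rule, this
yields $|\theta|$ terms, one for every vertex of $\theta$. Then we insert the series $B(b, y(t))$ for
$h\dot y(t)$. This means that all the trees $\delta$ appearing in $B(b,y(t))$ are attached with a
new branch to the distinguished vertex. Written out as formulas, this gives

$$h\,\frac{d}{dt}\, B(c,y(t)) = \sum_{\theta\in OT\cup\{\emptyset\}} \frac{h^{|\theta|}\, c(\theta)}{\nu(\theta)\sigma(\theta)} \sum_{\gamma}\sum_{\delta\in OT} \frac{h^{|\delta|}\, b(\delta)}{\nu(\delta)\sigma(\delta)}\, F(\theta\circ_\gamma\delta)(y(t)),$$

where $\sum_\gamma$ is a sum over all vertices of $\theta$, and $\theta\circ_\gamma\delta$ is the ordered tree obtained
when attaching the root of $\delta$ with a new branch to $\gamma$ (see Fig. 9.1). We choose one of
the $n(\gamma) + 1$ possibilities of doing this, where $n(\gamma)$ denotes the number of upwards
leaving branches of $\theta$ at the vertex $\gamma$. We now collect the terms with equal ordered
tree $\omega = \theta\circ_\gamma\delta$, and notice that $\nu(\theta)\sigma(\theta) = \kappa(\theta)$ with $\kappa(\theta)$ given by (III.1.32). This
gives

$$h\,\frac{d}{dt}\, B(c,y(t)) = \sum_{\omega\in OT} h^{|\omega|} \sum_{\theta\circ_\gamma\delta=\omega} \frac{c(\theta)\, b(\delta)}{\kappa(\theta)\,(n(\gamma)+1)\,\kappa(\delta)}\, F(\omega)(y(t)),$$

where $\sum_{\theta\circ_\gamma\delta=\omega}$ is over all triplets $(\theta,\gamma,\delta)$ such that $\theta\circ_\gamma\delta = \omega$. Because of
$\kappa(\omega) = \kappa(\theta)\kappa(\delta)(n(\gamma) + 1)$, we obtain

$$\begin{aligned}
h\,\frac{d}{dt}\, B(c,y(t)) &= \sum_{\omega\in OT} \frac{h^{|\omega|}}{\kappa(\omega)} \Bigl(\sum_{\theta\circ_\gamma\delta=\omega} c(\theta)\, b(\delta)\Bigr) F(\omega)(y(t))\\
&= \sum_{\tau\in T} \frac{h^{|\tau|}}{\sigma(\tau)} \Bigl(\sum_{\theta\in SP(\tau)} c(\theta)\, b(\tau\setminus\theta)\Bigr) F(\tau)(y(t)),
\end{aligned}$$

which proves the statement. □

Fig. 9.2. Illustration of the formula (9.6) for an ordered tree with 5 vertices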

Let us illustrate this proof and the formula (9.6) with an ordered tree having 5 vertices, $\omega = [[\bullet],\bullet,\bullet]$. All possible splittings $\omega = \theta \circ_\gamma \delta$ are given in Fig. 9.2. Notice that $\theta$ may be the empty tree $\emptyset$, and that always $|\delta| \ge 1$. We see that the tree $\omega$ is obtained in several ways: (i) differentiation of $F(\emptyset)(y) = y$ and adding $F(\omega)(y)$ as argument, (ii) differentiation of the factor corresponding to the root in $F(\theta)(y) = f''(f,f)(y)$ and adding $F([\bullet])(y) = (f'f)(y)$, (iii) differentiation of all $f$'s in $F(\theta)(y) = f'''(f,f,f)(y)$ and adding $F(\bullet)(y) = f(y)$, and finally, (iv) differentiation of the factor for the root in $F(\theta)(y) = f''(f'f,f)(y)$ and adding $F(\bullet)(y) = f(y)$. This proves that

$$\partial_b c([[\bullet],\bullet,\bullet]) = c(\emptyset)\,b([[\bullet],\bullet,\bullet]) + c([\bullet,\bullet])\,b([\bullet]) + c([\bullet,\bullet,\bullet])\,b(\bullet) + 2\,c([[\bullet],\bullet])\,b(\bullet).$$

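For the trees up to order 3 the formulas for $\partial_b c$ are:

$$\partial_b c(\bullet) = c(\emptyset)\,b(\bullet)$$
$$\partial_b c([\bullet]) = c(\emptyset)\,b([\bullet]) + c(\bullet)\,b(\bullet)$$
$$\partial_b c([\bullet,\bullet]) = c(\emptyset)\,b([\bullet,\bullet]) + 2\,c([\bullet])\,b(\bullet)$$
$$\partial_b c([[\bullet]]) = c(\emptyset)\,b([[\bullet]]) + c(\bullet)\,b([\bullet]) + c([\bullet])\,b(\bullet).$$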
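The splittings entering (9.6) can be enumerated mechanically. The following is a minimal sketch, not taken from the text: it assumes a hypothetical encoding of an unordered rooted tree as a sorted tuple of its subtrees (so `()` is the single vertex) and reads a splitting as cutting one edge of the tree, plus the extra splitting with the empty tree as $\theta$. Under these assumptions it reproduces the multiplicities of the formulas above.

```python
from collections import Counter


def canon(children):
    """Canonical form of an unordered rooted tree: a sorted tuple of subtrees."""
    return tuple(sorted(children))


def splittings(tau):
    """Yield all pairs (theta, delta) of SP(tau), with multiplicity.

    theta is the part that keeps the root (None stands for the empty tree),
    delta is the single branch that was cut off.
    """
    yield None, tau                          # theta = empty tree, delta = tau
    for i, child in enumerate(tau):
        rest = tau[:i] + tau[i + 1:]
        yield canon(rest), child             # cut the edge between root and child
        for theta_c, delta in splittings(child):
            if theta_c is not None:          # cut an edge inside this child
                yield canon(rest + (theta_c,)), delta


DOT = ()                                     # single vertex
T2 = (DOT,)                                  # [.]
BUSH3 = (DOT, DOT)                           # [., .]
TALL3 = (T2,)                                # [[.]]
OMEGA = canon([T2, DOT, DOT])                # [[.], ., .], the 5-vertex example

for name, t in [("[.,.]", BUSH3), ("[[.]]", TALL3), ("[[.],.,.]", OMEGA)]:
    print(name, dict(Counter(splittings(t))))
```

Each key `(theta, delta)` with count $k$ corresponds to a term $k\,c(\theta)\,b(\delta)$ in $\partial_b c(\tau)$; a tree with $n$ vertices always has $n$ splittings in this reading.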


The above lemma permits us to get recursion formulas for the coefficients $b(\tau)$ of the modified differential equation (9.4).

Theorem 9.2. If the method $\Phi_h(y)$ is given by (9.1), the functions $f_j(y)$ of the modified differential equation (1.1) satisfy (9.3), where the real coefficients $b(\tau)$ are recursively defined by $b(\emptyset) = 0$, $b(\bullet) = 1$ and

$$b(\tau) = a(\tau) - \sum_{j=2}^{|\tau|} \frac{1}{j!}\, \partial_b^{\,j-1} b(\tau). \tag{9.7}$$

Here, $\partial_b^{\,j-1}$ is the $(j-1)$-th iterate of the Lie-derivative $\partial_b$ defined in Lemma 9.1.

Proof. The right-hand side of the modified equation (9.4) is the B-series $B(b,\tilde y(t))$ divided by $h$. It therefore follows from an iterative application of Lemma 9.1 that

$$h^j\, \tilde y^{(j)}(t) = B(\partial_b^{\,j-1} b, \tilde y(t)),$$

so that by Taylor series expansion $\tilde y(t+h) = y + B\big(\sum_{j \ge 1} \frac{1}{j!}\, \partial_b^{\,j-1} b,\, y\big)$, where $y := \tilde y(t)$. Since we have to determine the coefficients $b(\tau)$ in such a way that $\tilde y(t+h) = \Phi_h(y) = B(a,y)$, a comparison of the two B-series gives $\sum_{j \ge 1} \frac{1}{j!}\, \partial_b^{\,j-1} b(\tau) = a(\tau)$. This proves the statement, because $\partial_b^{\,0} b(\tau) = b(\tau)$ for $\tau \in T$, and $\partial_b^{\,j-1} b(\tau) = 0$ for $j > |\tau|$ (as a consequence of $b(\emptyset) = 0$). □

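We present in Table 9.1 the formula (9.7) for trees up to order 3.

Table 9.1. Examples of formula (9.7)

  $\tau = \bullet$             $b(\bullet) = a(\bullet)$
  $\tau = [\bullet]$           $b([\bullet]) = a([\bullet]) - \frac12\, b(\bullet)^2$
  $\tau = [\bullet,\bullet]$   $b([\bullet,\bullet]) = a([\bullet,\bullet]) - b([\bullet])\,b(\bullet) - \frac13\, b(\bullet)^3$
  $\tau = [[\bullet]]$         $b([[\bullet]]) = a([[\bullet]]) - b([\bullet])\,b(\bullet) - \frac16\, b(\bullet)^3$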
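The recursion (9.7) is easy to run in exact rational arithmetic. The sketch below is an illustration under stated assumptions rather than the authors' code: trees are again hypothetical sorted tuples of subtrees, a splitting is read as cutting one edge, and the two test inputs are the exact flow, $a(\tau) = 1/\gamma(\tau)$ (a standard fact not stated in this section), and the implicit midpoint rule with $a(\tau) = 2^{1-|\tau|}$.

```python
from fractions import Fraction
from math import factorial


def canon(children):
    return tuple(sorted(children))


def order(tau):
    return 1 + sum(order(child) for child in tau)


def gamma(tau):
    """Density gamma(tau); the exact flow has a(tau) = 1/gamma(tau)."""
    g = order(tau)
    for child in tau:
        g *= gamma(child)
    return g


def proper_splittings(tau):
    """Pairs (theta, delta) of SP(tau) with theta non-empty.

    The splitting with the empty tree is skipped: its term vanishes in (9.7)
    because every coefficient map used there is zero on the empty tree.
    """
    for i, child in enumerate(tau):
        rest = tau[:i] + tau[i + 1:]
        yield canon(rest), child
        for theta_c, delta in proper_splittings(child):
            yield canon(rest + (theta_c,)), delta


def lie(b, c):
    """Coefficient map of the Lie derivative of B(c, .) along B(b, .), cf. (9.6)."""
    def dbc(tau):
        return sum((c(theta) * b(delta) for theta, delta in proper_splittings(tau)),
                   Fraction(0))
    return dbc


def modified_coefficients(a):
    """Return b with b(tau) = a(tau) - sum_{j=2}^{|tau|} d_b^{j-1} b(tau) / j!."""
    cache = {}

    def b(tau):
        if tau not in cache:
            total = Fraction(a(tau))
            iterate = b                      # the 0-th iterate is b itself
            for j in range(2, order(tau) + 1):
                iterate = lie(b, iterate)    # now the (j-1)-th iterate
                total -= iterate(tau) / factorial(j)
            cache[tau] = total
        return cache[tau]

    return b


DOT, T2, BUSH3, TALL3 = (), ((),), ((), ()), (((),),)
b_exact = modified_coefficients(lambda t: Fraction(1, gamma(t)))
b_mid = modified_coefficients(lambda t: Fraction(1, 2 ** (order(t) - 1)))
print([b_mid(t) for t in (DOT, T2, BUSH3, TALL3)])
```

For the exact flow all $b(\tau)$ with $|\tau| > 1$ should vanish; for the midpoint rule the values for the four trees of Table 9.1 should be $1, 0, -\frac1{12}, \frac1{12}$.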


We next consider the case when a symplectic method is applied to a Hamiltonian system $\dot y = J^{-1}\nabla H(y)$. It follows from Theorem 3.1 that the modified equation is again Hamiltonian. What does this imply for the coefficients of (9.4)?

Theorem 9.3. Suppose that for all Hamiltonians $H(y)$ the modified vector field (9.4), truncated after an arbitrary power of $h$, is (locally) Hamiltonian. Then,

$$b(u \circ v) + b(v \circ u) = 0 \quad \text{for all } u, v \in T. \tag{9.8}$$

Proof. Let $\varphi_{N,t}(y_0)$ be the flow of the modified differential equation (9.4), truncated after the $h^{N-1}$ terms. It is symplectic for all $t$, and in particular for $t = h$. As a consequence of the proof of Theorem 9.2 we obtain that $\varphi_{N,h}(y_0)$ is a symplectic B-series $B(a_N, y_0)$. The coefficients $a_N(\tau)$ are given by (9.7), where $b(\tau)$ is replaced with $0$ for $|\tau| > N$. For $u, v \in T$ with $|u| + |v| = N$ we therefore have

$$b(u \circ v) = a_N(u \circ v) - a_{N-1}(u \circ v).$$

Since $a_N(\tau) = a_{N-1}(\tau)$ for $|\tau| < N$, formula (9.8) is an immediate consequence of Theorem VI.7.6. □

Remark 9.4. Let $G = \{a : T \to \mathbb{R} \mid a(\emptyset) = 1\}$ be the Butcher group (see Sect. III.1.5), and consider the mapping $S : G \to \mathbb{R}$ defined by

$$S(a) = a(u \circ v) + a(v \circ u) - a(u) \cdot a(v).$$

If we denote by $e \in G$ the element corresponding to the identity (i.e., $e(\emptyset) = 1$ and $e(\tau) = 0$ for $|\tau| \ge 1$), we have for its derivative

$$S'(e)b = b(u \circ v) + b(v \circ u).$$

Hence, coefficient mappings $b(\tau)$ satisfying (9.8) lie in the tangent space at $e(\tau)$ of the symplectic subgroup of $G$ (i.e., $a \in G$ satisfying (VI.7.4)). This is in complete analogy to the fact that Hamiltonian vector fields can be considered as elements of the tangent space at the identity of the group of symplectic diffeomorphisms (see also Exercises 15 and 16).

IX.9.2 Elementary Hamiltonians 

If the modified differential equation (9.4) is Hamiltonian, can we find explicit formulas for $\tilde H(y)$? Let us start with an easy example, the implicit midpoint rule. Written as a B-series (9.1), its coefficients are $a(\tau) = 2^{1-|\tau|}$ (cf. Exercise 8) so that the first coefficient functions (9.2) of the modified equation satisfy $f_2(y) = 0$ and

$$f_3(y) = \frac{1}{24}\Big(2\,(f'f'f)(y) - f''(f,f)(y)\Big). \tag{9.9}$$

Since $f(y) = J^{-1}\nabla H(y)$, differentiation of

$$H_3(y) = -\frac{1}{24}\, H''(y)\big(J^{-1}\nabla H(y),\, J^{-1}\nabla H(y)\big) \tag{9.10}$$

shows that $f_3(y) = J^{-1}\nabla H_3(y)$, and we have found an explicit expression of the Hamiltonian corresponding to the vector field $f_3(y)$. It is recommended to compute also $f_5(y)$ and to try to find $H_5(y)$ such that $f_5(y) = J^{-1}\nabla H_5(y)$. Such computations lead to expressions that have been introduced in a different context by Sanz-Serna & Abia (1991). They call them canonical elementary differentials.
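A quick numerical plausibility check of (9.10) can be made on the pendulum. The sketch below is illustrative only: the test problem $H(p,q) = \frac12 p^2 - \cos q$, the step size, the initial values and the fixed-point solver are all assumptions chosen for the example, with $y = (p,q)$ and $J^{-1}\nabla H = (-H_q, H_p)$. It compares how well $H$ and $H + h^2 H_3$ are conserved along the implicit midpoint solution.

```python
import math


def f(p, q):
    """Vector field J^{-1} grad H for the pendulum H = p^2/2 - cos q."""
    return -math.sin(q), p


def H(p, q):
    return 0.5 * p * p - math.cos(q)


def H3(p, q):
    """(9.10): -1/24 * H''(f, f), with H'' = diag(1, cos q) in the variables (p, q)."""
    fp, fq = f(p, q)
    return -(fp * fp + math.cos(q) * fq * fq) / 24.0


def midpoint_step(p, q, h, sweeps=60):
    """One step of the implicit midpoint rule, solved by fixed-point iteration."""
    mp, mq = p, q                       # current guess for the midpoint value
    for _ in range(sweeps):
        fp, fq = f(mp, mq)
        mp, mq = p + 0.5 * h * fp, q + 0.5 * h * fq
    fp, fq = f(mp, mq)
    return p + h * fp, q + h * fq


def max_deviations(h=0.05, steps=4000, p=1.0, q=0.5):
    """Largest deviation of H and of H + h^2 H3 from their initial values."""
    H0, Hmod0 = H(p, q), H(p, q) + h * h * H3(p, q)
    dev_H = dev_mod = 0.0
    for _ in range(steps):
        p, q = midpoint_step(p, q, h)
        dev_H = max(dev_H, abs(H(p, q) - H0))
        dev_mod = max(dev_mod, abs(H(p, q) + h * h * H3(p, q) - Hmod0))
    return dev_H, dev_mod


dev_H, dev_mod = max_deviations()
print(dev_H, dev_mod)
```

If (9.10) is the correct leading correction, the second number should be markedly smaller than the first, since the remaining error is of size $O(h^4)$ instead of $O(h^2)$.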

Definition 9.5 (Elementary Hamiltonians). For a given smooth function $H : D \to \mathbb{R}$ (with open $D \subset \mathbb{R}^{2d}$) and for $\tau \in T$ we define the elementary Hamiltonian $H(\tau) : D \to \mathbb{R}$ by

$$H(\bullet)(y) = H(y), \qquad H(\tau)(y) = H^{(m)}(y)\big(F(\tau_1)(y), \ldots, F(\tau_m)(y)\big) \tag{9.11}$$

for $\tau = [\tau_1, \ldots, \tau_m]$. Here, $F(\tau_i)(y)$ are elementary differentials corresponding to $f(y) = J^{-1}\nabla H(y)$.

The expression in (9.10) is nothing else than the elementary Hamiltonian corresponding to the tree $[\bullet,\bullet]$ (up to the factor $-\frac{1}{24}$). Our aim is to prove that, for symplectic methods applied to Hamiltonian systems, the coefficient functions (9.3) of the modified differential equation satisfy $f_j(y) = J^{-1}\nabla H_j(y)$, where $H_j(y)$ is a linear combination of elementary Hamiltonians.

Lemma 9.6. Elementary Hamiltonians satisfy

$$H(u \circ v)(y) + H(v \circ u)(y) = 0 \quad \text{for all } u, v \in T. \tag{9.12}$$

In particular, we have $H(u \circ u)(y) = 0$ for all $u \in T$.

Proof. This follows immediately from the fact that for $u = [u_1, \ldots, u_m] \in T$ and for $v \in T$ we have $H(u \circ v) = H^{(m+1)}\big(F(u_1), \ldots, F(u_m), F(v)\big) = F(v)^T (\nabla H)^{(m)}\big(F(u_1), \ldots, F(u_m)\big) = F(v)^T J\, F(u)$, and from the skew-symmetry of $J$. □

The trees $u \circ v$ and $v \circ u$ have the same graph and differ only in the position of the root. The relation (9.12) thus motivates the consideration of the (smallest) equivalence relation on $T$ satisfying

$$u \circ v \sim v \circ u. \tag{9.13}$$

We want to select from each equivalence class, not containing a tree of the form $u \circ u$, exactly one element. This can be done as follows (cf. Chartier, Faou & Murua 2005): we choose a total ordering on the set $T$ that respects the number of vertices, i.e., $u < v$ whenever $|u| < |v|$, and we define

$$T^* = \{\bullet\} \cup \{\tau \mid \tau \text{ cannot be written as } \tau = u \circ v \text{ with } u < v\} \tag{9.14}$$
$$\phantom{T^*} = \{\bullet,\ [\bullet,\bullet],\ [\bullet,\bullet,\bullet],\ \ldots\}$$

(for the second line we assume $[\bullet,\bullet] < [[\bullet]]$). Every tree $\tau \in T$ is either equivalent to some $u \circ u$ or to a tree in $T^*$. This is a consequence of the fact that as long as $\tau = u \circ v$ with $u < v$, it can be changed to $v \circ u$ (which happens only a finite number of times). Moreover, two trees of $T^*$ can never be equivalent.
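The number of trees that $T^*$ contributes at each order can be checked by brute force. The sketch below is a hypothetical helper, not part of the text: it encodes trees as sorted tuples of subtrees, builds the equivalence classes generated by (9.13) through repeated root moves, and counts the classes that contain no tree of the form $u \circ u$. It does not need the total ordering, because only the number of classes is computed, not the representatives.

```python
from functools import lru_cache


def butcher(u, v):
    """Butcher product u o v: attach the root of v to the root of u."""
    return tuple(sorted(u + (v,)))


@lru_cache(maxsize=None)
def rooted_trees(n):
    """All rooted trees with n vertices (sorted tuples of subtrees)."""
    if n == 1:
        return frozenset({()})
    return frozenset(butcher(u, v)
                     for k in range(1, n)
                     for u in rooted_trees(k)
                     for v in rooted_trees(n - k))


def decompositions(tau):
    """All ways of writing tau = u o v (one for each branch at the root)."""
    for i, v in enumerate(tau):
        yield tau[:i] + tau[i + 1:], v


def equivalence_class(tau):
    """Trees reachable from tau by repeated root moves u o v -> v o u."""
    seen, stack = {tau}, [tau]
    while stack:
        for u, v in decompositions(stack.pop()):
            moved = butcher(v, u)
            if moved not in seen:
                seen.add(moved)
                stack.append(moved)
    return frozenset(seen)


def count_classes_without_uu(n):
    """Number of equivalence classes of order n containing no tree u o u."""
    classes = {equivalence_class(t) for t in rooted_trees(n)}
    return sum(1 for cls in classes
               if not any(u == v for t in cls for u, v in decompositions(t)))


print([count_classes_without_uu(n) for n in range(1, 7)])
```

The counts for orders 1 to 6 agree with the later discussion of the system (9.20): one condition at orders 3 and 4, three at order 5 and four at order 6.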

Lemma 9.7. For a tree $\tau \in T^*$ we have

$$J^{-1}\nabla H(\tau)(y) = \sigma(\tau) \sum_{\theta \sim \tau} \frac{(-1)^{\kappa(\tau,\theta)}}{\sigma(\theta)}\, F(\theta)(y), \tag{9.15}$$

where $\kappa(\tau,\theta)$ is the number of root changes that are necessary to obtain $\theta$ from $\tau$.

Proof. We compute $J^{-1}\nabla H(\tau)(y)$. The expression $H(\tau)(y)$ consists of $|\tau|$ factors corresponding to the vertices of $\tau$, each of which has to be differentiated by Leibniz' rule. Differentiation of $H^{(m)}(y)$ (cf. Definition 9.5) and pre-multiplication by the matrix $J^{-1}$ yields $F(\tau)(y)$. Before differentiating the other factors, we bring the corresponding vertex down to the root. In view of Lemma 9.6 this only multiplies $H(\tau)(y)$ by $(-1)^{\kappa(\tau,\theta)}$ and shows that a differentiation of the corresponding factor yields $F(\theta)(y)$. Since $\tau \in T^*$, the number of possibilities to obtain $\theta$ from $\tau$ by exchanging roots is equal to $\sigma(\tau)/\sigma(\theta)$. This factor has to be included. □

IX.9.3 Modified Hamiltonian

We are now in the position to give an explicit formula for the Hamiltonian of the modified differential equation provided that the numerical method can be written as a B-series. An extension to partitioned methods will be given in Sect. IX.10.

Theorem 9.8. Consider a numerical method that can be written as a B-series (9.1), and that is symplectic for every Hamiltonian system $\dot y = J^{-1}\nabla H(y)$. Its modified differential equation is then Hamiltonian with

$$\tilde H(y) = H_1(y) + h H_2(y) + h^2 H_3(y) + \ldots,$$

where

$$H_j(y) = \sum_{\tau \in T^*,\ |\tau| = j} \frac{b(\tau)}{\sigma(\tau)}\, H(\tau)(y), \tag{9.16}$$

and the coefficients $b(\tau)$ are those of Theorem 9.2. Notice that the sum in (9.16) is only over trees in $T^*$ as defined in (9.14).

Proof. We apply the method (9.1) to the Hamiltonian system, so that by Theorem 3.1 the modified differential equation is (locally) Hamiltonian. It therefore follows from Theorem 9.3 that the coefficients $b(\tau)$ of (9.4) satisfy (9.8). This relation implies $b(\theta) = (-1)^{\kappa(\tau,\theta)} b(\tau)$ whenever $\theta \sim \tau$. Inserted into (9.3), an application of Lemma 9.7 proves the statement. □
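As a consistency check, the case $|\tau| = 3$ can be worked out by hand. The derivation below is an added illustration, not part of the original proof; it uses $\sigma([\bullet,\bullet]) = 2$, $\sigma([[\bullet]]) = 1$, the fact that $[[\bullet]]$ is obtained from $[\bullet,\bullet]$ by one root change, and the midpoint value $b([\bullet,\bullet]) = 2^{-2} - \frac13 = -\frac1{12}$ from Table 9.1.

```latex
% Lemma 9.7 for \tau = [\bullet,\bullet]; the equivalent trees are
% \theta = [\bullet,\bullet] (no root change) and \theta = [[\bullet]] (one root change)
J^{-1}\nabla H([\bullet,\bullet])(y)
   = 2\Big(\tfrac12\,F([\bullet,\bullet])(y) - F([[\bullet]])(y)\Big)
   = f''(f,f)(y) - 2\,(f'f'f)(y).

% Theorem 9.8 with the only tree of order 3 in T^*:
H_3(y) = \frac{b([\bullet,\bullet])}{\sigma([\bullet,\bullet])}\,H([\bullet,\bullet])(y)
       = -\frac{1}{24}\,H''(y)\big(f(y),f(y)\big),
\qquad
J^{-1}\nabla H_3(y) = \frac{1}{24}\Big(2\,(f'f'f)(y) - f''(f,f)(y)\Big).
```

For the implicit midpoint rule this reproduces both (9.10) and (9.9).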

Remark 9.9. This theorem gives an explicit formula for the modified Hamiltonian (for methods expressed as B-series). Since the elementary Hamiltonians $H(\tau)(y)$ depend only on derivatives of $H(y)$, this modified Hamiltonian is globally defined. For Runge-Kutta methods this provides an alternative approach to the statement of Theorem 3.2.

For the sake of completeness we give in the following theorem a characterization of Hamiltonian vector fields of the form (9.4).

Theorem 9.10. The differential equation $h\dot y = B(b,y)$ with $b(\emptyset) = 0$ is Hamiltonian for all vector fields $f(y) = J^{-1}\nabla H(y)$, if and only if

$$b(u \circ v) + b(v \circ u) = 0 \quad \text{for all } u, v \in T. \tag{9.17}$$

Proof. The “only if” part follows from Theorem 9.3. The “if” part is a consequence of the proof of Theorem 9.8. □

IX.9.4 First Integrals Close to the Hamiltonian 

We have seen in Sect. IX.9.3 that for symplectic methods the modified differential equation (9.4) based on $f(y) = J^{-1}\nabla H(y)$ is Hamiltonian with a function of the form

$$H(c,y) = \sum_{\tau \in T^*} \frac{h^{|\tau|-1}}{\sigma(\tau)}\, c(\tau)\, H(\tau)(y) \tag{9.18}$$

and coefficients $c(\tau) = b(\tau)$. In this section we study whether for non-symplectic methods a function of the form (9.18) can be a first integral of (9.4). This question has been addressed by Faou, Hairer & Pham (2004), and we closely follow their presentation.

Lemma 9.11. Let $y(t)$ be a solution of the differential equation (9.4) which can be written as $h\dot y(t) = B(b,y(t))$. We then have

$$h\,\frac{d}{dt} H(c,y(t)) = H(\delta_b c, y(t)),$$

where $\delta_b c(\bullet) = 0$ and, for $\tau \in T^*$ with $|\tau| > 1$,

$$\delta_b c(\tau) = \sum_{\theta \sim \tau} (-1)^{\kappa(\tau,\theta)}\, \frac{\sigma(\tau)}{\sigma(\theta)} \sum_{\omega \in T^* \cap SP(\theta)} c(\omega)\, b(\theta \setminus \omega). \tag{9.19}$$

The first sum is over all trees $\theta$ that are equivalent to $\tau$ (see (9.13)), and the second sum is over all splittings of $\theta$ as in Lemma 9.1 (see Table 9.2).

Proof. The proof is nearly the same as that of Lemma 9.1. The first sum in (9.19) appears, because $H(\theta)(y) = (-1)^{\kappa(\tau,\theta)} H(\tau)(y)$ for $\theta \sim \tau$ and because the sum in (9.18) is only over trees in $T^*$. □

Table 9.2. Formulas for $\delta_b c(\tau)$ for trees $\tau \in T^*$ up to order 6

  $\delta_b c([\bullet,\bullet]) = -2\,c(\bullet)\,b([\bullet])$
  $\delta_b c([\bullet,\bullet,\bullet]) = 3\,c([\bullet,\bullet])\,b(\bullet) - 3\,c(\bullet)\,b([\bullet,\bullet])$
  $\delta_b c([\bullet,\bullet,\bullet,\bullet]) = 4\,c([\bullet,\bullet,\bullet])\,b(\bullet) - 4\,c(\bullet)\,b([\bullet,\bullet,\bullet])$
  $\delta_b c([\bullet,\bullet,[\bullet]]) = c([\bullet,\bullet,\bullet])\,b(\bullet) + c([\bullet,\bullet])\,b([\bullet]) + c(\bullet)\,b([[\bullet,\bullet]]) - 2\,c(\bullet)\,b([\bullet,[\bullet]])$
  $\delta_b c([[\bullet],[\bullet]]) = 2\,c(\bullet)\,b([[[\bullet]]]) - 2\,c([\bullet,\bullet])\,b([\bullet])$
  $\delta_b c([\bullet,\bullet,\bullet,\bullet,\bullet]) = 5\,c([\bullet,\bullet,\bullet,\bullet])\,b(\bullet) - 5\,c(\bullet)\,b([\bullet,\bullet,\bullet,\bullet])$
  [The rows for the remaining three trees of order 6 are not legible in the source.]

Corollary 9.12. The function $H(c,y)$ of (9.18) is a first integral of the differential equation (9.4) for every $H(y)$ if and only if

$$\delta_b c(\tau) = 0 \quad \text{for all } \tau \in T^*. \tag{9.20}$$

Proof. The sufficiency follows from Lemma 9.11 and the necessity is a consequence of the independence of the elementary Hamiltonians. To prove their independence we have to show that the series (9.18) vanishes for all smooth $H(y)$ only if $c(\tau) = 0$ for all $\tau \in T^*$. With the techniques of the proof of Theorem VI.7.4 one can show that for every tree $\tau \in T^*$ there exists a polynomial Hamiltonian such that the first component of $F(\tau)(0)$ vanishes for all trees except for $\tau$. Differentiating (9.18) and employing Lemma 9.7 proves that $c(\tau) = 0$. □

Solving the System (9.20). We consider a consistent method, i.e., $b(\bullet) = 1$, and we search for a first integral $H(c,y)$ close to the Hamiltonian, i.e., $c(\bullet) = 1$.

$|\tau| = 3$: The condition (9.20) for $\tau = [\bullet,\bullet]$ implies $b([\bullet]) = 0$, which means that the method has to be of order two.

$|\tau| = 4$: There is only one tree in $T^*$ with four vertices. The corresponding condition can be satisfied by putting $c([\bullet,\bullet]) = b([\bullet,\bullet])$.

$|\tau| = 5$: The third condition yields $b([[[\bullet]]]) = 0$. Letting $c([\bullet,\bullet,\bullet])$ be such that one of the other two conditions holds, we still have to satisfy

$$b([\bullet,\bullet,\bullet]) + b([[\bullet,\bullet]]) - 2\,b([\bullet,[\bullet]]) = 0. \tag{9.21}$$

This condition is satisfied for symplectic methods, for which $b(u \circ v) + b(v \circ u) = 0$, and also for symmetric methods, for which $b(\tau) = 0$ for trees with an even order.

$|\tau| = 6$: There are four conditions for three $c(\tau)$ coefficients. Assuming (9.20) for trees with less than five vertices, these four conditions admit a solution if and only if

5 6 (^) + 5 



-15 6(<K)-3 6(V)(6(V) +&(})) 


This relation is obviously satisfied by every symplectic method. However, as we 
shall see soon, there are symmetric methods that do not satisfy (9.22). 

For various symmetric methods of order 4 (i.e., $b(\tau) = 0$ for $1 < |\tau| < 5$) we compute the coefficients $b(\tau)$ of the leading perturbation term in (9.4) and also the expression (9.22), see Table 9.3. None of the considered methods is symplectic.

Surprisingly, the 3-stage collocation method Lobatto IIIA (see Table II. 1.2 for 
the coefficients) satisfies the condition (9.22). This implies for every Hamiltonian 
system (reversible or not reversible) that the dominating error term in the numerical 
Hamiltonian does not have any drift. 

The 3-stage Lobatto IIIB method (see Table II.1.4) does not satisfy the condition (9.22). We therefore expect a drift in the numerical Hamiltonian.

Table 9.3. Coefficients $b(\tau)$ and expression (9.22) for methods of order 4

  method        | [•,•,•,•] | [•,•,[•]] | [[•],[•]] | [[•,•,•]] | [[•,[•]]] | [•,[[•]]] | [[[•,•]]] | (9.22)
  Lobatto IIIA  |  1/120    |  1/240    |  1/480    | -1/120    | -1/240    |  1/720    | -1/360    |  0
  Lobatto IIIB  |  1/120    | -1/360    | -1/720    | -1/120    |  1/360    |  1/720    |  1/240    |  1/48
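The seven coefficient columns of Table 9.3 can be recomputed from the Butcher tableaux. The sketch below rests on two assumptions that are not spelled out in this section: the 3-stage Lobatto IIIA and IIIB coefficients are the standard ones (the text only refers to Tables II.1.2 and II.1.4), and for a method of order 4 the recursion (9.7) reduces to $b(\tau) = a(\tau) - 1/\gamma(\tau)$ for $|\tau| = 5$, because all $b(\tau)$ with $1 < |\tau| < 5$ vanish. Trees are hypothetical nested tuples of subtrees.

```python
from fractions import Fraction as Fr


def order(t):
    return 1 + sum(order(c) for c in t)


def gamma(t):
    g = order(t)
    for c in t:
        g *= gamma(c)
    return g


def stage_weights(A, t):
    """Phi_i(t): product over the branches of t of (A Phi(branch))_i."""
    s = len(A)
    phi = [Fr(1)] * s
    for child in t:
        inner = stage_weights(A, child)
        for i in range(s):
            phi[i] *= sum(A[i][j] * inner[j] for j in range(s))
    return phi


def a_coeff(A, w, t):
    """B-series coefficient a(t) of the Runge-Kutta method (A, w)."""
    return sum(wi * p for wi, p in zip(w, stage_weights(A, t)))


def b_order5(A, w, t):
    """Leading coefficient b(t) = a(t) - 1/gamma(t), valid for |t| = 5 and order 4."""
    return a_coeff(A, w, t) - Fr(1, gamma(t))


W = [Fr(1, 6), Fr(2, 3), Fr(1, 6)]                       # weights of both methods
IIIA = [[Fr(0), Fr(0), Fr(0)],
        [Fr(5, 24), Fr(1, 3), Fr(-1, 24)],
        [Fr(1, 6), Fr(2, 3), Fr(1, 6)]]
IIIB = [[Fr(1, 6), Fr(-1, 6), Fr(0)],
        [Fr(1, 6), Fr(1, 3), Fr(0)],
        [Fr(1, 6), Fr(5, 6), Fr(0)]]

D = ()                                                    # single vertex
TREES = [(D, D, D, D), (D, D, (D,)), ((D,), (D,)), ((D, D, D),),
         ((D, (D,)),), (D, ((D,),)), (((D, D),),)]        # column order of Table 9.3

for name, A in [("Lobatto IIIA", IIIA), ("Lobatto IIIB", IIIB)]:
    print(name, [str(b_order5(A, W, t)) for t in TREES])
```

The column headers of the table were identified by matching these computed values against the printed fractions; the entry in the last column, expression (9.22), is not recomputed here.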


Lemma 9.13. For given $b(\tau)$, $\tau \in T$ satisfying $b(\emptyset) = 0$, $b(\bullet) = 1$, and for fixed $c(\bullet)$, the linear system (9.20) for $c(\tau)$, $\tau \in T^*$ has at most one solution.

Proof. We prove by induction on $\tau \in T^*$ that $c(\tau)$ is uniquely determined by (9.20). For this we assume that the ordering on $T$ is such that, within trees of the same order, it is increasing when the number of vertices connected to the root decreases, cf. (9.14).

Let $\tau = [\tau_1, \ldots, \tau_m, \bullet, \ldots, \bullet] \in T^* \setminus \{\bullet\}$ with $|\tau_j| > 1$, and denote by $k$ the number of $\bullet$'s in this representation. Since the tree $\tau \circ \bullet$ is again in the set $T^*$, condition (9.20) yields

$$0 = \delta_b c(\tau \circ \bullet) = (k+1)\,c(\tau)\,b(\bullet) - (k+1)\,c(\bullet)\,b(\tau) + \ldots \tag{9.23}$$

For $m = 0$, no further terms are present and $c(\tau)$ is uniquely determined by this relation. For $m > 0$, the three dots in (9.23) represent a linear combination of $c(\mu)b(\nu)$ with $|\mu| < |\tau|$ (which, by the induction hypothesis, are already known) and of $c(\sigma)b(\bullet)$, where $\sigma \in T^*$ is the representative in $T^*$ of the equivalence class for $\tau'$. We use the notation $\tau'$ for some tree which is obtained from $\tau$ by removing one of the end vertices of $\tau_j$ and by adding it to the root of $\tau$.

In general we will have $\tau' \in T^*$ (so that $\sigma = \tau'$), and in this case its number of end vertices connected to the root is larger than that for $\tau$. Hence, $\sigma < \tau$, and the coefficient $c(\sigma)$ is known by the induction hypothesis.

If $\tau' \notin T^*$, which is only possible if $\tau = u \circ v$ with $|u| = |v|$ and $u > v$, we have $\tau' = u' \circ v$ and $u' < v$ (notice that $u' = v$ is not permitted for trees in $T^*$). In this case we have $\sigma = v \circ u' \in T^*$. Consequently, $c(\tau) = c(u \circ v)$ is expressed in terms of $c(v \circ u')$ and known quantities. Applying the same reasoning to $v \circ u'$ and observing that because of $u > v$ the tree $v$ has at least as many end vertices connected to the root as the tree $u$, we see that $c(v \circ u')$ is expressed in terms of already determined quantities. □

The expression (9.20) is bilinear in $b$ and $c$. Assuming that $h\dot y = B(b,y)$ is Hamiltonian, the mapping $b$ has the same degree of freedom as $c$. It is therefore not astonishing to have the following dual variant of Lemma 9.13.

Lemma 9.14. Let $c(\tau)$, $\tau \in T^*$ be given and assume $c(\bullet) = 1$ and $b(\emptyset) = 0$. Then, for fixed $b(\bullet)$, the linear system (9.20) for $b(\tau)$, $\tau \in T$ has at most one solution satisfying $b(u \circ v) + b(v \circ u) = 0$ for all $u, v \in T$.

Proof. By assumption on $b$, the coefficients $b(\tau)$, $\tau \in T \setminus T^*$ are uniquely determined by those for $\tau \in T^*$. The statement is thus obtained in the same way as that for Lemma 9.13 with the only difference that expressions $c(\bullet)b(\sigma)$ and not $c(\sigma)b(\bullet)$ have to be studied. □

Theorem 9.15 (Chartier, Faou & Murua 2005). The only symplectic method (as B-series) that conserves the Hamiltonian for arbitrary $H(y)$ is the exact flow of the differential equation.

Proof. If the method conserves exactly the Hamiltonian, we have (9.20) with $c(\bullet) = 1$ and $c(\tau) = 0$ for all other trees in $T^*$. By the uniqueness statement of Lemma 9.14 and the symplecticity of the method (Theorem 9.10), we obtain $b(\tau) = 0$ for $|\tau| > 1$. Consequently, no perturbation is permitted in the modified differential equation of the method. □

A closely related result is given in Ge & Marsden (1988). There, general symplectic methods are considered (not necessarily B-series methods) but a weaker result is obtained (in fact, they assume that the system does not have other conserved quantities than $H(y)$, and it is shown that the numerical flow coincides with the exact flow up to a reparametrization of time).

IX.9.5 Energy Conservation: Examples and Counter-Examples 

It is generally believed that symmetric methods applied to reversible Hamiltonian systems (reversible in the sense that $H(-p,q) = H(p,q)$) have the same long-time behaviour as symplectic methods. This is true in many situations of practical interest, and we shall prove this rigorously in Sect. XI.3 for integrable reversible systems. There are, however, interesting counter-examples to this general belief. They are taken from Faou, Hairer & Pham (2004).

Example 9.16. Our first example is a modification of the pendulum equation

$$H(p,q) = \tfrac12\, p^2 - \cos q + \tfrac15 \sin(2q), \tag{9.24}$$

where the additional term $\sin(2q)$ destroys the symmetry in $q$. The Hamiltonian still satisfies $H(-p,q) = H(p,q)$. We consider initial values $p(0) = 2.5$, $q(0) = 0$ with sufficiently large initial velocity, such that $p(t)$ stays positive for all times and the symmetry $p \leftrightarrow -p$ does not affect the numerical solution. The angle $q(t)$ increases without limit, but the potential is $2\pi$-periodic so that the solution stays on a closed curve of the cylinder $\mathbb{R} \times S^1$.

Fig. 9.3. Numerical Hamiltonian of Lobatto methods of order 4 for the perturbed pendulum (9.24); step size $h = 0.2$, integration interval $[0, 500]$

Fig. 9.4. Error in $H(p,q) + h^4 H_5(p,q)$ along the numerical solution of the 3-stage Lobatto IIIA method for the perturbed pendulum (9.24); step size $h = 0.2$, integration interval $[0, 500\,000]$

We apply the 3-stage Lobatto IIIA and IIIB methods to this problem. Figure 9.3 shows the error in the Hamiltonian along the numerical solutions. There is a visible energy drift of size $O(th^4)$ for the Lobatto IIIB method and no drift can be seen on this scale for the Lobatto IIIA method. To get more insight into its long-time behaviour, we apply the method with the same step size to a much longer time interval, and we plot the error in $H(p_n,q_n) + h^4 H_5(p_n,q_n)$, where the first perturbation term is computed from (9.18) and the linear system (9.20) as

H 5 (p,q) = ^{3U^(q)p 4 -2U^U\q)p 2 - (U"(q)p) 2 + U"(q)(U\q)) 2 ) 

with the potential $U(q) = -\cos q + 0.2 \sin(2q)$ (see Fig. 9.4). Repeating the same experiment with halved step size shows that there are oscillations with amplitude $O(h^6)$ and a drift with slope $O(h^8)$. Consequently, the error in the Hamiltonian for the Lobatto IIIA method behaves on this problem like $O(h^4 + th^8)$.

Without the term $\sin(2q)$ in (9.24) all symmetric one-step methods nearly conserve the Hamiltonian.

Example 9.17. For polynomial Hamiltonians $H(y)$ of degree at most four, the elementary Hamiltonian corresponding to the tree $[\bullet,\bullet,\bullet,\bullet,\bullet]$ vanishes identically. Therefore, the condition (9.20) need not be considered for this tree, and the remaining three conditions can always be satisfied by the three $c(\tau)$ coefficients. This implies that, for example for the Hénon-Heiles problem

$$H(p_1,p_2,q_1,q_2) = \tfrac12\,(p_1^2 + p_2^2) + \tfrac12\,(q_1^2 + q_2^2) + q_1^2 q_2 - \tfrac13\, q_2^3 \tag{9.25}$$

the leading error term in the numerical Hamiltonian remains bounded by all methods of order four. Numerical experiments indicate that in this case also higher order error terms are bounded by symmetric methods such as Lobatto IIIA and IIIB, even if the initial values are chosen so that the solution is chaotic.

Example 9.18. A concrete mechanical system with two degrees of freedom is described by the Hamiltonian

$$H(p,q) = \tfrac12\, p^T p + \tfrac{\omega^2}{2}\,\big(\|q\| - 1\big)^2 + q_2 - \frac{1}{\|q - a\|}. \tag{9.26}$$

It is a model of a planar spring pendulum with exterior forces. The spring has a harmonic potential with frequency $\omega$ (Hooke's law). The exterior forces are gravitation and attraction to a mass point situated at $a$, which has to be chosen so that no symmetry in the $q$-variables is present.

The numerical experiments, reported by Faou, Hairer & Pham (2004), use $\omega = 2$, $a = (-3, -5)^T$, and initial values for the position $q(0) = (0, 1)^T$ (upright position), and for the velocity $p(0) = (-1, -0.5)^T$. The pendulum thus turns around the fixed end of the spring which is at the origin.

As for the problem of Example 9.16 one clearly observes a drift for the 3-stage Lobatto IIIB method, and the error in the Hamiltonian behaves like $O(th^4)$. As predicted by the theory of the preceding section, the dominant error term for the 3-stage Lobatto IIIA method is bounded. There is, however, a drift already in the next term so that the error in the Hamiltonian behaves for this method as $O(h^4 + th^6)$.

Removing one of the exterior forces (gravitation or attraction to $a$), the error in the Hamiltonian remains bounded of size $O(h^4)$ without any drift (not even in higher order terms) for both Lobatto methods.


IX.10 Extension to Partitioned Systems

All results of Sect. IX.9 can be extended to partitioned methods whose discrete flow can be written as a P-series. This includes important geometric integrators such as the symplectic Euler method and the Störmer-Verlet scheme. Interestingly, many of the results have been originally presented and proved for this more general case (see Hairer (1994)).


IX.10.1 P-Series of the Modified Equation 


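We consider the partitioned system

$$\dot p = f(p,q), \qquad \dot q = g(p,q), \tag{10.1}$$

where, in view of an application to Hamiltonian systems, we use $(p,q)$ instead of $(y,z)$ for the variables. By Theorem III.2.4 all consistent partitioned Runge-Kutta methods can be written as P-series (cf. Definition III.2.1)

$$\begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = \begin{pmatrix} p_0 \\ q_0 \end{pmatrix} + h \begin{pmatrix} f \\ g \end{pmatrix}_0 + h^2 \begin{pmatrix} a([\bullet]_p)(f_p f) + a([\circ]_p)(f_q g) \\ a([\bullet]_q)(g_p f) + a([\circ]_q)(g_q g) \end{pmatrix}_0 + \ldots, \tag{10.2}$$

where the subscript 0 indicates an evaluation at the initial value $(p_0,q_0)$. The first perturbation term of the modified equation (1.1) can therefore be written as

$$\begin{pmatrix} f_2(p,q) \\ g_2(p,q) \end{pmatrix} = \begin{pmatrix} \big(a([\bullet]_p) - \tfrac12\big)(f_p f)(p,q) + \big(a([\circ]_p) - \tfrac12\big)(f_q g)(p,q) \\ \big(a([\bullet]_q) - \tfrac12\big)(g_p f)(p,q) + \big(a([\circ]_q) - \tfrac12\big)(g_q g)(p,q) \end{pmatrix}$$

and, in general, one finds

$$f_j(p,q) = \sum_{\tau \in TP_p,\ |\tau| = j} \frac{b(\tau)}{\sigma(\tau)}\, F(\tau)(p,q), \qquad g_j(p,q) = \sum_{\tau \in TP_q,\ |\tau| = j} \frac{b(\tau)}{\sigma(\tau)}\, F(\tau)(p,q).$$

Hence, the modified equation (1.1) is of the form

$$\begin{pmatrix} \dot p \\ \dot q \end{pmatrix} = \begin{pmatrix} \sum_{\tau \in TP_p} \frac{h^{|\tau|-1}}{\sigma(\tau)}\, b(\tau)\, F(\tau)(p,q) \\ \sum_{\tau \in TP_q} \frac{h^{|\tau|-1}}{\sigma(\tau)}\, b(\tau)\, F(\tau)(p,q) \end{pmatrix} \tag{10.4}$$

where $b(\tau) = 1$ for $|\tau| = 1$, $b(\tau) = a(\tau) - \tfrac12$ for $|\tau| = 2$. For $|\tau| > 2$, the coefficients $b(\tau)$ can be obtained recursively from Theorem 10.2 below. The proofs of the following two results are straightforward extensions of those for Lemma 9.1 and Theorem 9.2, and are therefore omitted.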

Lemma 10.1 (Lie-Derivative of P-series). Let $b(\tau)$ (with $b(\emptyset_p) = b(\emptyset_q) = 0$) and $c(\tau)$ be the coefficients of two P-series, and let $(p(t),q(t))$ be a formal solution of the differential equation $h(\dot p(t), \dot q(t))^T = P(b,(p(t),q(t)))$, i.e., (10.4). The Lie derivative of the function $P(c,(p,q))$ with respect to the vector field $P(b,(p,q))$ is again a P-series

$$h\,\frac{d}{dt} P(c,(p(t),q(t))) = P(\partial_b c,(p(t),q(t))).$$

Its coefficients are given by $\partial_b c(\emptyset_p) = \partial_b c(\emptyset_q) = 0$, and for $|\tau| \ge 1$ by

$$\partial_b c(\tau) = \sum_{\theta \in SP(\tau)} c(\theta)\, b(\tau \setminus \theta), \tag{10.5}$$

where, analogously to (9.5), $SP(\tau)$ denotes the set of splittings of $\tau \in TP$. □

In formula (10.5), $\emptyset_p \in SP(\tau)$ defines a splitting only if $\tau \in TP_p$, and $\emptyset_q \in SP(\tau)$ only if $\tau \in TP_q$. We therefore have $\partial_b c(\bullet) = c(\emptyset_p)\,b(\bullet)$, $\partial_b c(\circ) = c(\emptyset_q)\,b(\circ)$, and as examples for trees of order 3

$$\partial_b c([\circ,\circ]_p) = c(\emptyset_p)\,b([\circ,\circ]_p) + 2\,c([\circ]_p)\,b(\circ),$$
$$\partial_b c([\bullet,\circ]_p) = c(\emptyset_p)\,b([\bullet,\circ]_p) + c([\bullet]_p)\,b(\circ) + c([\circ]_p)\,b(\bullet).$$

Theorem 10.2. If the method $(p_1,q_1) = \Phi_h(p_0,q_0)$ can be written as (10.2), the modified differential equation is given by (10.4), where the real coefficients $b(\tau)$ are recursively defined by $b(\emptyset_p) = b(\emptyset_q) = 0$, $b(\tau) = 1$ for $|\tau| = 1$, and

$$b(\tau) = a(\tau) - \sum_{j=2}^{|\tau|} \frac{1}{j!}\, \partial_b^{\,j-1} b(\tau) \quad \text{for } \tau \in TP. \tag{10.6}$$

Here, $\partial_b^{\,j-1}$ denotes the $(j-1)$-th iterate of the Lie derivative $\partial_b$ defined in Lemma 10.1. □

Example 10.3. The symplectic Euler method

$$p_{n+1} = p_n + h f(p_{n+1}, q_n), \qquad q_{n+1} = q_n + h g(p_{n+1}, q_n) \tag{10.7}$$

is a partitioned Runge-Kutta method ($a_{11} = 1$, $\hat a_{11} = 0$, $b_1 = \hat b_1 = 1$) and can therefore be expressed as a P-series (10.2). From Theorem III.2.4 we get its coefficients:

$$a(\tau) = \begin{cases} 1 & \text{if all vertices (different from the root) are black,} \\ 0 & \text{otherwise.} \end{cases}$$

From Theorem 10.2 we can compute the coefficients $b(\tau)$ of the modified equation (10.4). They are given in Table 10.1 for the trees with a black root. Since $a(\tau)$ does not depend on the colour of the root of $\tau$, the same holds for the coefficients $b(\tau)$. Hence, we do not include the values of $b(\tau)$ for trees with a white root.


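Table 10.1. Coefficients $b(\tau)$ of the modified equation for symplectic Euler (10.7)

  τ     | •  | [•]_p | [∘]_p | [•,•]_p | [•,∘]_p | [∘,∘]_p | [[•]_p]_p | [[∘]_p]_p | [[•]_q]_p | [[∘]_q]_p
  b(τ)  | 1  | 1/2   | -1/2  | 1/6     | -1/3    | 1/6     | 1/3       | -1/6      | -1/6      | 1/3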
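The entries of Table 10.1 follow from the recursion (10.6) and can be recomputed in exact arithmetic. The sketch below is illustrative: it assumes a hypothetical encoding of a bicoloured tree as a pair (root colour, sorted tuple of subtrees) with colour `"p"` for black and `"q"` for white vertices, reads a splitting as cutting one edge, and uses the coefficients $a(\tau)$ of Example 10.3.

```python
from fractions import Fraction
from math import factorial

BLACK, WHITE = "p", "q"


def tree(colour, *children):
    """Bicoloured rooted tree: (root colour, sorted tuple of subtrees)."""
    return (colour, tuple(sorted(children)))


def order(t):
    return 1 + sum(order(child) for child in t[1])


def proper_splittings(t):
    """Pairs (theta, delta) with theta non-empty; the empty splitting contributes 0."""
    colour, kids = t
    for i, child in enumerate(kids):
        rest = kids[:i] + kids[i + 1:]
        yield (colour, rest), child
        for theta_c, delta in proper_splittings(child):
            yield (colour, tuple(sorted(rest + (theta_c,)))), delta


def lie(b, c):
    """Coefficient map of the Lie derivative, cf. (10.5)."""
    def dbc(t):
        return sum((c(theta) * b(delta) for theta, delta in proper_splittings(t)),
                   Fraction(0))
    return dbc


def modified_coefficients(a):
    """b(tau) from the recursion (10.6)."""
    cache = {}

    def b(t):
        if t not in cache:
            total = Fraction(a(t))
            iterate = b
            for j in range(2, order(t) + 1):
                iterate = lie(b, iterate)
                total -= iterate(t) / factorial(j)
            cache[t] = total
        return cache[t]

    return b


def all_black(t):
    return t[0] == BLACK and all(all_black(child) for child in t[1])


def a_symplectic_euler(t):
    """a(tau) = 1 if all vertices different from the root are black, else 0."""
    return Fraction(int(all(all_black(child) for child in t[1])))


B, W = tree(BLACK), tree(WHITE)
TABLE_TREES = [B, tree(BLACK, B), tree(BLACK, W),
               tree(BLACK, B, B), tree(BLACK, B, W), tree(BLACK, W, W),
               tree(BLACK, tree(BLACK, B)), tree(BLACK, tree(BLACK, W)),
               tree(BLACK, tree(WHITE, B)), tree(BLACK, tree(WHITE, W))]

b = modified_coefficients(a_symplectic_euler)
print([str(b(t)) for t in TABLE_TREES])
```

The order of `TABLE_TREES` is the column order used in Table 10.1 above.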


We know from Theorem 3.1 that the modified differential equation (10.4) of a symplectic method applied to a Hamiltonian system

$$\dot p = -H_q(p,q), \qquad \dot q = H_p(p,q) \tag{10.8}$$

is again Hamiltonian.

Theorem 10.4. Suppose that for all separable Hamiltonians $H(p,q) = T(p) + U(q)$ the modified vector field (10.4), truncated after an arbitrary power of $h$, is (locally) Hamiltonian. Then, we have

$$b(u \circ v) + b(v \circ u) = 0 \quad \text{for } u \in TP_p,\ v \in TP_q \tag{10.9}$$

for trees, where neighbouring vertices have different colours.

If it is (locally) Hamiltonian for all $H(p,q)$, then (10.9) holds for all $u \in TP_p$, $v \in TP_q$, and additionally we have

$$b(\tau) \text{ is independent of the colour of the root of } \tau \in TP. \tag{10.10}$$

If it is (locally) Hamiltonian for all $H(p,q) = \tfrac12\, p^T C p + c^T p + U(q)$ (with symmetric matrix $C$), then we have

b( o o u) + b(u o o) =0, b(u oov) — b(v oou) = 0 it, v £ TN p (10.11) 
(see Sect. VI.7.1 for the definition of TN p and u oov). 

The proof is the same as for Theorem 9.3 and therefore omitted. □

IX.10.2 Elementary Hamiltonians

We have already seen in Example 3.4 that the modified Hamiltonian of the symplectic Euler method is composed of expressions such as $H_p H_q$, $H_{pp}(H_q, H_q)$, $H_{pq}(H_q, H_p)$, etc. These will play the role of elementary Hamiltonians for partitioned methods. In the following definition, the elementary differentials $F(\tau)(p,q)$ correspond to the partitioned system $f(p,q) = -H_q(p,q)$, $g(p,q) = H_p(p,q)$.

Definition 10.5. For a given function $H : D \to \mathbb{R}$ (with open $D \subset \mathbb{R}^d \times \mathbb{R}^d$) and for $\tau \in TP$ we define the elementary Hamiltonian $H(\tau) : D \to \mathbb{R}$ by

$$H(\bullet)(p,q) = H(\circ)(p,q) = H(p,q),$$
$$H(\tau)(p,q) = \frac{\partial^{m+l} H(p,q)}{\partial p^m\, \partial q^l}\Big(F(u_1)(p,q), \ldots, F(u_m)(p,q),\ F(v_1)(p,q), \ldots, F(v_l)(p,q)\Big),$$

where $\tau = [u_1, \ldots, u_m, v_1, \ldots, v_l]_p$ or $\tau = [u_1, \ldots, u_m, v_1, \ldots, v_l]_q$ with trees $u_i \in TP_p$ and $v_i \in TP_q$.

Examples of elementary Hamiltonians are 

H(.) = H, H{f) = H q H p , 

H(Y) = H pp (H q , H q ), H{ Y) = fl„), ^(Y) = 

We notice that, in contrast to Sect. IX.9.2, non-vanishing elementary Hamiltonians 
exist for trees with two vertices. 

Lemma 10.6. Elementary Hamiltonians satisfy

$$H(u \circ v)(p,q) + H(v \circ u)(p,q) = 0 \quad \text{for } u \in TP_p \text{ and } v \in TP_q, \tag{10.12}$$

and they do not depend on the colour of the root.

Proof. The independence of the colour of the root is by definition, and formula (10.12) is proved in the same way as the statement of Lemma 9.6. □

The conditions (10.9) and (10.10) define relations between the coefficients b(r) 
of a Hamiltonian vector field (10.4). The previous lemma shows analogous relations 
between elementary Hamiltonians. This motivates the consideration of the following 
equivalence relation on TP (Hairer 1994). 

Definition 10.7. We denote by $\sim$ the smallest equivalence relation on $TP$ which satisfies the two properties

• $u \sim v$ if $u$ and $v$ are identical with the exception of the colour of the root;

• $u \circ v \sim v \circ u$ for $u \in TP_p$ and $v \in TP_q$.

Fig. 10.1. Groups of equivalent trees of orders up to three


Equivalent trees of orders up to three are grouped together in Fig. 10.1. We can 
change the colour of the root, and we can move the root to a neighbouring vertex if 
it has the opposite colour. 

In the case of separable Hamiltonians, one has to consider only trees for which 
neighbouring vertices have different colours. This implies that the first condition of 
Definition 10.7 is empty. The second condition means that the root can be moved ar¬ 
bitrarily in the tree without changing the equivalence class. For this special situation, 
equivalence classes have been considered already by Abia & Sanz-Serna (1993) and 
are named “bicolour (unrooted) trees”. 

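Similar to (9.14) we select representatives from the equivalence class as follows: we fix a total ordering on the set $TP$ that (i) respects the number of vertices, and (ii) is such that no tree is between trees that differ only in the colour of the root. The ordering of Fig. 10.1 is such a possible choice. We then define

$$TP^* = \{\bullet, \circ\} \cup \Big\{\tau \in TP \;\Big|\; \tau \text{ cannot be written as } \tau = u \circ v \text{ with } u < v, \text{ also not if the colour of the root is changed}\Big\}. \tag{10.13}$$

We further let $TP^*_p = TP^* \cap TP_p$ and $TP^*_q = TP^* \cap TP_q$.

Lemma 10.8. For a tree $\tau \in TP^*$ we have

$$\frac{\partial H(\tau)}{\partial q}(p,q) = -\sigma(\tau) \sum_{\theta \sim \tau,\ \theta \in TP_p} \frac{(-1)^{\kappa(\tau,\theta)}}{\sigma(\theta)}\, F(\theta)(p,q),$$
$$\frac{\partial H(\tau)}{\partial p}(p,q) = \sigma(\tau) \sum_{\theta \sim \tau,\ \theta \in TP_q} \frac{(-1)^{\kappa(\tau,\theta)}}{\sigma(\theta)}\, F(\theta)(p,q), \tag{10.14}$$

where $\kappa(\tau,\theta)$ is the number of root changes that are necessary to obtain $\theta$ from $\tau$.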

The proof is the same as for Lemma 9.7 and therefore omitted. □ 

We are now able to give the main result of this section. 

Theorem 10.9. Consider a numerical method that can be written as a P-series (10.2), and that is symplectic for every Hamiltonian (10.8). Its modified differential equation is then Hamiltonian with

$$\tilde H(p,q) = H_1(p,q) + h H_2(p,q) + h^2 H_3(p,q) + \ldots,$$

where

$$H_j(p,q) = \sum_{\tau \in TP^*_p,\ |\tau| = j} \frac{b(\tau)}{\sigma(\tau)}\, H(\tau)(p,q), \tag{10.15}$$

and the coefficients $b(\tau)$ are those of Theorem 10.2. Notice that $H_j(p,q)$ from (10.15) is independent of whether we sum over trees in $TP^*_p$ or $TP^*_q$.

Proof. This is the same as for Theorem 9.8. □

If the method (10.2) is known to be symplectic for separable Hamiltonians only, and if it is applied to $H(p,q) = T(p) + U(q)$, the statement of Theorem 10.9 is still valid. In this situation $H(\tau)(p,q)$ vanishes if a vertex of $\tau$ has sons with different colour (it then contains a factor $H_{pq\ldots} = 0$).

Example 10.10. Consider the 2-stage Lobatto IIIA - IIIB pair (cf. Table II.2.1),
which is the natural extension of the Störmer–Verlet scheme to non-separable problems.
We compute the coefficients $a(\tau)$ from Theorem III.2.4, and $b(\tau)$ from
Theorem 10.2. The result is given in Table 10.2. Notice that $a(\tau)$ and $b(\tau)$ are both
independent of the colour of the root. Theorem 10.9 then yields

$$\widetilde H = H + \frac{h^2}{24}\bigl(2 H_{pp} H_q^2 - H_{qq} H_p^2 + 2 H_{pq} H_q H_p\bigr) + \ldots \tag{10.16}$$

for the modified Hamiltonian. Since the method is symmetric, $\widetilde H$ is in even powers
of $h$. The next non-vanishing term requires the consideration of trees up to order 5.
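
The effect of the $h^2$ correction can be observed numerically in the separable case. The following is a minimal sketch under stated assumptions, not taken from the text: it uses a drift-kick-drift form of the Störmer–Verlet scheme for the pendulum $H = p^2/2 - \cos q$, for which the $h^2$ term of the modified Hamiltonian is $h^2\bigl(U'(q)^2/12 - p^2 U''(q)/24\bigr)$, the separable counterpart of the correction in (10.16). The function names, step size and initial values are hypothetical choices for illustration.

```python
import math

def drift_kick_drift(p, q, h, dU):
    """One step of a drift-kick-drift Stoermer-Verlet variant for H = p^2/2 + U(q)."""
    q = q + 0.5 * h * p      # half drift
    p = p - h * dU(q)        # full kick
    q = q + 0.5 * h * p      # half drift
    return p, q

def energy_drifts(h=0.1, steps=2000, p0=0.0, q0=1.5):
    """Return (max |H - H0|, max |Hmod - Hmod0|) along the numerical pendulum orbit."""
    U, dU, d2U = (lambda q: -math.cos(q)), math.sin, math.cos
    H = lambda p, q: 0.5 * p * p + U(q)
    # truncated modified Hamiltonian (assumed variant): H + h^2 (U'^2/12 - p^2 U''/24)
    Hmod = lambda p, q: H(p, q) + h * h * (dU(q) ** 2 / 12.0 - p * p * d2U(q) / 24.0)
    p, q = p0, q0
    H0, Hm0 = H(p, q), Hmod(p, q)
    err_H = err_Hmod = 0.0
    for _ in range(steps):
        p, q = drift_kick_drift(p, q, h, dU)
        err_H = max(err_H, abs(H(p, q) - H0))
        err_Hmod = max(err_Hmod, abs(Hmod(p, q) - Hm0))
    return err_H, err_Hmod
```

In such an experiment one expects the oscillation of $H$ along the numerical orbit to be of size $O(h^2)$, whereas the truncated modified Hamiltonian should vary only at the level $O(h^4)$.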


Table 10.2. Coefficients $a(\tau)$ and $b(\tau)$ for the Störmer–Verlet scheme (Table II.2.1)

$|\tau|$      1    2    2    3      3      3      3     3     3     3
$a(\tau)$     1   1/2  1/2  1/2    1/4    1/4    1/4   1/4    0    1/4
$b(\tau)$     1    0    0   1/6   -1/12  -1/12   1/12  1/12  -1/6  1/12


Remark 9.9, the characterization of symplectic vector fields (10.4), and the results
of Sect. IX.9.4 can be extended to the case of (partitioned) P-series. We refrain
from giving all the details here.


IX.11 Exercises

1. Change the Maple program of Example 1.1 in such a way that the modified
equations for the implicit Euler method, the implicit midpoint rule, or the trapezoidal
rule are obtained. Observe that for symmetric methods one gets expansions
in even powers of $h$.

2. Write a short Maple program which, for simple methods such as the symplectic
Euler method, computes some terms of the modified equation for a two-dimensional
system $\dot p = f(p,q)$, $\dot q = g(p,q)$. Check the modified equations of
Example 1.3.

3. Prove that the modified equation of the Störmer–Verlet scheme (I.1.15) applied
to $\ddot y = g(y)$ is a second order differential equation of the form $\ddot y = g_h(y, \dot y)$
with initial values given by $y(0) = y_0$ and $\dot y(0)$ such that $y(h) = y_1$ holds.
Hint. Taylor expansion shows that for a smooth function $y(t)$ satisfying $y(t_n) = y_n$
we have

$$\Bigl(1 + \frac{h^2}{12} D^2 + \frac{h^4}{360} D^4 + \ldots\Bigr)\, \ddot y(t) = g\bigl(y(t)\bigr),$$

where $D$ represents differentiation with respect to time.

Warning. In general, we do not have that $\dot y(t_n) = \dot y_n$.

4. Prove that for $\rho$-reversible differential equations the elementary differentials
satisfy

$$F(\tau)(\rho y) = (-1)^{|\tau|}\, \rho\, F(\tau)(y).$$

Use this to give an alternative proof of Theorem 2.3 for the case that the method
is symmetric and can be expressed as a B-series.

5. Find a first integral of the truncated modified equation for the symplectic Euler
method and the Lotka-Volterra problem (Example 1.3).

Hint. With the transformation $p = \exp P$, $q = \exp Q$ you will get a Hamiltonian
system.

Result. $\widetilde I(p,q) = I(p,q) - h\bigl((p+q)^2 - 8p - 10q + 2\ln p + 8\ln q\bigr)/4$.

6. (Field & Nijhoff 2003). Apply the symplectic Euler method to the system with
Hamiltonian $H(p,q) = \ln(\alpha + p) + \ln(\beta + q)$. Compute the modified Hamiltonian
and prove that the series converges for sufficiently small step sizes.

Hint. The method conserves exactly $I(p,q) = (\alpha + p)(\beta + q)$. Find linear two-term
recursions for $\{p_n\}$ and $\{q_n\}$, and use the ideas of Example 1.4. Result.

= T. ^lk + 1) ■ 

7. Compute $\partial_b c(\tau)$ for the tree $\tau = [[\bullet], \bullet]$ of order 4.

8. For the implicit midpoint rule compute the coefficients $a(\tau)$ of the expansion
(9.1), and also a few coefficients $b(\tau)$ of the modified equation.

Result. $a(\tau) = 2^{1-|\tau|}$, $b(\bullet) = 1$, $b([\bullet]) = 0$, $b(\tau) = a(\tau) - 1/\gamma(\tau)$ for
$|\tau| = 3$.

9. Check the formulas of Table 9.1. 

10. Consider a differential equation $\dot y = f(y)$ with a divergence-free vector field,
and apply a volume-preserving integrator. Show that every truncation of the
modified equation has again a divergence-free vector field.

Hint. Adapt the proof by induction of Theorems 2.3 and 3.1.

11. Consider explicit 2-stage Runge-Kutta methods of order 2, applied to the pendulum
problem $\dot q = p$, $\dot p = -\sin q$. With the help of Exercise 2 compute
$f_3(p,q)$ of the modified differential equation. Is there a choice of the free parameter
$c_2$, such that $f_3(p,q)$ is a Hamiltonian vector field?

12. Find at least two linear transformations $\rho$ for which the Kepler problem (I.2.2),
written as a first order system, is $\rho$-reversible.

13. Consider the Kepler problem (I.2.2), written as a Hamiltonian system (I.1.10).
Find constants $M$ and $R$ such that (7.2) holds for all $(p,q) \in \mathbb{R}^4$ satisfying

$\|p\| \le 2$ and $0.8 \le \|q\| \le 1.2$.

14. (McLachlan & Zanna 2005). Consider the RATTLE method (Algorithm VII.5.1)
applied to the Euler equations (VII.5.10) of the free rigid body, written as
$\dot y = f(y)$. Prove that the modified differential equation is of the form

$$\dot y = \bigl(1 + h^2 s_2(y) + h^4 s_4(y) + \ldots\bigr) f(y), \tag{11.1}$$

where the scalar functions $s_k(y)$ depend on $y$ only via the Casimir function
$C(y) = y_1^2 + y_2^2 + y_3^2$ and the Hamiltonian $H(y) = \frac12\bigl(y_1^2/I_1 + y_2^2/I_2 + y_3^2/I_3\bigr)$.
Consequently, all $s_k(y)$ are constant along solutions of the Euler equations.
Hint. Since $C(y)$ and $H(y)$ are exactly conserved by the numerical method
(see Sect. VII.5.3), the modified equation is a time transformation of the original
system. The special form of the functions $s_k(y)$ follows from the fact that
RATTLE is a Poisson integrator (Theorem VII.5.11) and from a transformation
to canonical form as in Theorem 3.5.

15. (Murua 1999). Let $\Phi_h(y) = B(a,y)$ be given by a B-series and denote with
$b(\tau)$ the coefficients of the corresponding modified differential equation, cf.
formula (9.4). Prove that the coefficients of the $n$th iterate $\Phi_h^n(y) = B(a_n, y)$
satisfy

$$a_n(\tau) = n\, b(\tau) + n^2 c(\tau, n) \quad\text{for } \tau \in T,$$

where $c(\tau, n)$ is a polynomial of degree $|\tau| - 2$ in $n$.

Hint. This follows from the Taylor series $\widetilde y(nh) = y(0) + nh\,\widetilde y'(0) + \ldots$ for the
solution of the modified differential equation.

16. With the help of Exercise 15, give an alternative proof of Theorem 9.3.

Hint. If $B(a,y)$ is symplectic, also $B(a_n, y)$ is symplectic and its coefficients
thus satisfy (VI.7.4).

17. (Murua 1997). Find a one-to-one correspondence between the equivalence 
classes of TP (corresponding to ~ of Definition 10.7) and oriented free trees 
(i.e., trees without a distinguished vertex (root), but with oriented edges), see 
Fig. 11.1. 


Fig. 11.1. Oriented free trees up to order four



Chapter X. 

Hamiltonian Perturbation Theory and 
Symplectic Integrators 


Perturbation theory is in fact an outgrowth of the necessity to determine
the orbits with ever greater accuracy. This problem can be solved today,
but in what is for the theoretician a rather disappointing way. With modern
calculating machines, one is now able to compute directly results even
more accurately than those provided by perturbation theory.

(J. Moser 1978)

... allows computer prediction of planetary positions far more accurate
(by brute computation) than anything provided by classical perturbation
theory. In a very real sense, one of the most exalted of human endeavors,
going back to the priests of Babylon and before, has been taken over by
the machine. (S. Sternberg 1969)


In this chapter we study the long-time behaviour of symplectic integrators, combining
backward error analysis and the perturbation theory of integrable Hamiltonian
systems.


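[Diagram: the original problem is related to a modified problem by backward error
analysis and to an approximate problem by perturbation theory; the numerical, exact
and approximate solutions are the corresponding solutions, and the errors between them
are the quantities in question.]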


During the 18th and 19th centuries, scientists struggled for the integration of complicated
problems of dynamics, with the main aim of solving them analytically by
“quadrature”. But only few problems could be treated successfully in this way. In
cases where the original problem could not be solved, much effort was put into replacing
it by an integrable approximate problem, by using and developing perturbation
theory. Thereby, a rich arsenal of very ingenious theories has been discovered
since the 19th century.

In the 1960s and 1970s, the enormous progress of “calculating machines” and
numerical software allowed many of the original problems to be solved with extreme
accuracy, so that for the first time numerical integration methods superseded analytical
perturbation methods in the computations of celestial mechanics (see the above
citations). Since then, the further increase in computing speed has allowed problems
to be treated on larger and larger time scales, where huge amounts of errors are accumulated
and need to be understood and controlled. In the spirit of backward error
analysis, these numerical errors are interpreted as those of a modified problem, for
the study of which perturbation theory is once again the appropriate tool.


X.1 Completely Integrable Hamiltonian Systems

Integrable Hamiltonian systems were originally of interest because their equations 
of motion can be solved analytically. Their interest in the present context lies in the 
fact that their flow is simply uniform motion on a Cartesian product of circles and 
straight lines in suitable coordinates, and that many physical systems can be viewed 
as perturbations of integrable systems. 


X.1.1 Local Integration by Quadrature

M. Liouville a fait voir qu’il fallait que toutes les combinaisons $(\alpha, \beta)$
des intégrales trouvées fussent nulles. (E. Bour 1855)

One of the great dreams of 18th and 19th century analytical mechanics was to solve
the equations of motion of mechanical systems by “quadrature”, that is, using only
evaluations and inversions of functions and calculating integrals of known functions.
In this spirit, Newton’s (1687) equations of motion of Kepler’s two-body problem
were solved by Joh. Bernoulli (1710) and Newton (1713), see Sect. I.2.2. Euler’s
(1760) solution of the problem of the attraction of a particle by two fixed centres,
and Lagrange’s (1766) study of motion of a particle in a field with one attracting
centre and under an additional constant force were among the important achievements
of the 18th century. The three-body problem, however, resisted all efforts
aiming at an integration by quadrature, and though it continued to do so, this problem
spurred the development of extremely useful mathematical theories of a much
wider scope throughout the 19th century, from Poisson to Poincaré via Hamilton, Jacobi,
Liouville, to name but a few of the most eminent mathematicians contributing
to analytical mechanics.

Consider the Hamiltonian system

$$\dot p = -\frac{\partial H}{\partial q}(p,q), \qquad \dot q = \frac{\partial H}{\partial p}(p,q), \tag{1.1}$$

with $d$ degrees of freedom: $(p,q) \in \mathbb{R}^d \times \mathbb{R}^d$. We try to find a symplectic
transformation $(p,q) \mapsto (x,y)$, such that the system has a more amenable form in the new
coordinates. In particular, this is the case if the Hamiltonian expressed in the new
variables,

$$H(p,q) = K(x), \tag{1.2}$$

does not depend on $y$. Since $\partial K/\partial y = 0$, the transformed system then becomes (recall
the conservation of the Hamiltonian form of the differential equations under
symplectic transformations, Theorem VI.2.8)

$$\dot x = 0, \qquad \dot y = \omega(x), \tag{1.3}$$

with $\omega(x) = \frac{\partial K}{\partial x}(x)$. This is readily integrated:

$$x(t) = x_0, \qquad y(t) = y_0 + \omega(x_0)\, t.$$

As we recall from Sect. VI.5, a symplectic transformation $(p,q) \mapsto (x,y)$ can be
constructed via a generating function $S(x,q)$ by the equations

$$y = \frac{\partial S}{\partial x}(x,q), \qquad p = \frac{\partial S}{\partial q}(x,q). \tag{1.4}$$

If $(p_0, q_0)$ and $(x_0, y_0)$ are related by (1.4), and if $\partial^2 S/\partial x\,\partial q$ is invertible at $(x_0, q_0)$,
then the equations (1.4) define a symplectic transformation between neighbourhoods
of $(p_0, q_0)$ and $(x_0, y_0)$.

The equation (1.2) together with the second equation of (1.4) give a partial differential
equation for $S$, the Hamilton-Jacobi equation

$$H\Bigl(\frac{\partial S}{\partial q}(x,q),\, q\Bigr) = K(x).$$

If $S(x,q)$ is a solution of such an equation (for some function $K$), then (1.3) shows
that $x_i = F_i(p,q)$ $(i = 1,\ldots,d)$, as given implicitly by the second equation of (1.4),
are first integrals of the Hamiltonian system (1.1). Moreover, these functions $F_i$ are
in involution, which means that their Poisson brackets vanish pairwise:

$$\{F_i, F_j\} = 0, \qquad i,j = 1,\ldots,d.$$

This is an immediate consequence of the definition $\{F, G\} = \nabla F^T J^{-1} \nabla G$ of the
Poisson bracket and of the symplecticity of the transformation (the left upper block
of $J^{-1}$ is 0).
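
The involution property can be checked numerically for concrete functions. The sketch below is an illustration only: it assumes the ordering $z = (p, q)$ and the convention $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$, so that $\{F,G\} = \sum_i (F_{q_i} G_{p_i} - F_{p_i} G_{q_i})$, approximates gradients by central differences, and evaluates the bracket of the planar Kepler Hamiltonian with the angular momentum. All function names are hypothetical.

```python
import math

def gradient(F, z, eps=1e-6):
    """Central-difference gradient of F at z = (p1, ..., pd, q1, ..., qd)."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        g.append((F(zp) - F(zm)) / (2.0 * eps))
    return g

def poisson_bracket(F, G, z):
    """{F, G} = sum_i (dF/dq_i dG/dp_i - dF/dp_i dG/dq_i) (assumed sign convention)."""
    d = len(z) // 2
    gF, gG = gradient(F, z), gradient(G, z)
    return sum(gF[d + i] * gG[i] - gF[i] * gG[d + i] for i in range(d))

def kepler_H(z):
    """Planar Kepler Hamiltonian in Cartesian coordinates z = (p1, p2, q1, q2)."""
    p1, p2, q1, q2 = z
    return 0.5 * (p1 * p1 + p2 * p2) - 1.0 / math.hypot(q1, q2)

def angular_momentum(z):
    """Angular momentum L = q1 p2 - q2 p1."""
    p1, p2, q1, q2 = z
    return q1 * p2 - q2 * p1
```

Evaluating `poisson_bracket(kepler_H, angular_momentum, z)` at an arbitrary point with $q \neq 0$ should give a value that vanishes up to the finite-difference error, whereas the bracket of $q_1$ with $p_1$ equals 1 in this convention.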

Conversely, it was realized by Bour (1855) and Liouville (1855) that a Hamil¬ 
tonian system having d first integrals in involution can locally be transformed to the 
form (1.3) by “quadrature”. This observation is based on the following completion 
result and its proof.

Lemma 1.1 (Liouville Lemma). Let $F_1,\ldots,F_d$ be smooth real-valued functions,
defined in a neighbourhood of $(p_0,q_0) \in \mathbb{R}^d \times \mathbb{R}^d$. Suppose that these functions
are in involution (i.e., all Poisson brackets $\{F_i, F_j\} = 0$), and that their gradients
are linearly independent at $(p_0, q_0)$. Then, there exist smooth functions $G_1,\ldots,G_d$,
defined on some neighbourhood of $(p_0, q_0)$, such that

$(F_1,\ldots,F_d,G_1,\ldots,G_d) : (p,q) \mapsto (x,y)$ is a symplectic transformation.

Proof. Let $F = (F_1,\ldots,F_d)^T$. The linear independence of the gradients $\nabla F_i$ implies
that there are $d$ columns of the $d \times 2d$ Jacobian $\partial F/\partial(p,q)$ that form an invertible
$d \times d$ submatrix. After some suitable symplectic transformations (see Exercise 1)
we may assume without loss of generality that $F_p = \partial F/\partial p$ is invertible. By the
implicit function theorem, we can then locally solve $x = F(p,q)$ for $p$:

$p = P(x,q)$ with partial derivatives $P_x = F_p^{-1}$, $P_q = -F_p^{-1} F_q$.

The condition that the $F_i$ are in involution, reads in matrix notation

$$F_p F_q^T - F_q F_p^T = 0.$$

Multiplying this equation with $F_p^{-1}$ from the left and with $F_p^{-T}$ from the right, we
obtain

$$-P_q^T + P_q = 0,$$

so that $P_q = \partial P/\partial q$ is symmetric. By the Integrability Lemma VI.2.7, $P(x,q)$
is thus locally the gradient with respect to $q$ of some function $S(x,q)$ (which is
constructed by quadrature). Moreover, $\partial^2 S/\partial x\,\partial q = P_x = F_p^{-1}$ is invertible. The equations
(1.4) define a symplectic transformation $(p,q) \mapsto (x,y)$, and by construction
$x = F(p,q)$. □

If, in a Hamiltonian system with $d$ degrees of freedom, we can find $d$ independent
first integrals in involution $H = F_1, F_2, \ldots, F_d$, then Lemma 1.1 yields
a symplectic change of coordinates, constructed by quadrature, which transforms
(1.1) locally to (1.2) with $K(x_1,\ldots,x_d) = x_1$.

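Example 1.2. Consider the Hamiltonian of motion in a central field,

$$H = \tfrac12\bigl(p_1^2 + p_2^2\bigr) + V(r) \quad\text{for } r = \sqrt{q_1^2 + q_2^2},$$

with a potential $V(r)$ that is defined and smooth for $r > 0$. The Kepler problem
corresponds to the special case $V(r) = -1/r$, and the perturbed Kepler problem to
$V(r) = -1/r - \mu/(3r^3)$. Changing to polar coordinates (see Example VI.5.2)

$$\begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = \begin{pmatrix} r\cos\varphi \\ r\sin\varphi \end{pmatrix}, \qquad
\begin{pmatrix} p_r \\ p_\varphi \end{pmatrix} = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -r\sin\varphi & r\cos\varphi \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \end{pmatrix},$$

this becomes

$$H(p_r, p_\varphi, r, \varphi) = \frac12\Bigl(p_r^2 + \frac{p_\varphi^2}{r^2}\Bigr) + V(r).$$

The system has the angular momentum $L = p_\varphi$ as a first integral, since $H$ does
not depend on $\varphi$. Clearly, $\{H, L\} = 0$ everywhere. The gradients of $H$ and $L$
are linearly independent unless both $p_r = 0$ and $p_\varphi^2 = r^3 V'(r)$. By inserting $p_\varphi^2 =
2r^2(H - V(r))$ and eliminating $r$ this becomes a condition of the form $a(H,L) = 0$,
which for the Kepler problem reads explicitly $L^2(1 + 2HL^2) = 0$. The conditions
of Lemma 1.1 are thus satisfied on the domain

$$M = \{(p_r, p_\varphi, r, \varphi) \;;\; r > 0,\ a(H,L) \ne 0\}.$$

The equations $x_1 = H = \frac12(p_r^2 + p_\varphi^2/r^2) + V(r)$, $x_2 = L = p_\varphi$ can be solved for

$$p_r = \pm\sqrt{2(H - V(r)) - L^2/r^2}, \qquad p_\varphi = L,$$

and $p_r = \partial S/\partial r$, $p_\varphi = \partial S/\partial\varphi$ with

$$S(H, L, r, \varphi) = L\varphi \pm \int_{r_0}^{r} \sqrt{2(H - V(\rho)) - L^2/\rho^2}\; d\rho.$$

The conjugate variables are

$$\begin{aligned}
y_1 &= \frac{\partial S}{\partial H} = \pm \int_{r_0}^{r} \frac{1}{\sqrt{2(H - V(\rho)) - L^2/\rho^2}}\; d\rho, \\
y_2 &= \frac{\partial S}{\partial L} = \varphi \mp \int_{r_0}^{r} \frac{L/\rho^2}{\sqrt{2(H - V(\rho)) - L^2/\rho^2}}\; d\rho.
\end{aligned} \tag{1.6}$$

This defines (locally) the transformation $(p_r, p_\varphi, r, \varphi) \mapsto (x_1, x_2, y_1, y_2)$. In these
variables, the equations of motion read $\dot x_1 = 0$, $\dot x_2 = 0$, $\dot y_1 = 1$, $\dot y_2 = 0$. Over any
time interval where $p_r(t)$ does not change sign, solutions therefore satisfy

$$\begin{aligned}
t_1 - t_0 &= \pm \int_{r(t_0)}^{r(t_1)} \frac{1}{\sqrt{2(H - V(\rho)) - L^2/\rho^2}}\; d\rho, \\
\varphi(t_1) - \varphi(t_0) &= \pm \int_{r(t_0)}^{r(t_1)} \frac{L/\rho^2}{\sqrt{2(H - V(\rho)) - L^2/\rho^2}}\; d\rho.
\end{aligned} \tag{1.7}$$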


X.1.2 Completely Integrable Systems 

Lemma 1.1 appears as a powerful tool for an explicit solution by quadrature. However,
because of its purely local nature this lemma does not tell us anything about
the dynamics of the system. This was not a concern at Liouville’s time, but the first
rigorous non-integrability results by Poincaré (1892) put a definite end to the hope
of being eventually able to construct explicit analytic solutions of most equations of
motion by quadrature, and shifted the interest to understanding the global, qualitative
behaviour of dynamical systems.

Lemma 1.1 can be globalized by a procedure similar to analytic continuation if
the conditions of the following definition are satisfied.

Definition 1.3. A Hamiltonian system with Hamiltonian $H : M \to \mathbb{R}$ ($M$ an open
subset of $\mathbb{R}^d \times \mathbb{R}^d$) is called completely integrable if there exist smooth functions
$F_1 = H, F_2, \ldots, F_d : M \to \mathbb{R}$ with the following properties:

1) $F_1,\ldots,F_d$ are in involution (i.e., all $\{F_i, F_j\} = 0$) on $M$.

2) The gradients of $F_1,\ldots,F_d$ are linearly independent at every point of $M$.

3) The solution trajectories of the Hamiltonian systems with Hamiltonian $F_i$ ($i =
1,\ldots,d$) exist for all times and remain in $M$.

Obviously, all the Hamiltonian systems with Hamiltonian $F_i$ $(i = 1,\ldots,d)$ are
then completely integrable, and so there will be no mathematical reason to further
distinguish $H = F_1$. We note that condition (1) of Definition 1.3 implies that all $F_j$
are first integrals of the Hamiltonian system with Hamiltonian $F_i$, and that the flows
$\varphi_t^{[i]}$ of these Hamiltonian systems commute: $\varphi_t^{[i]} \circ \varphi_s^{[j]} = \varphi_s^{[j]} \circ \varphi_t^{[i]}$ for all $i, j$ and
all $t, s$; see Lemma VII.3.2.

For $x = (x_i) \in \mathbb{R}^d$ we define the level set

$$M_x = \{(p,q) \in M \;;\; F_i(p,q) = x_i \text{ for } i = 1,\ldots,d\}. \tag{1.8}$$

Theorem 1.4. Suppose that $F_1,\ldots,F_d : M \to \mathbb{R}$ satisfy the conditions of Definition 1.3.
Assume that $M_x$ is connected (and non-empty) for all $x$ in a neighbourhood
of $x_0 \in \mathbb{R}^d$. Then, on some neighbourhood $B$ of $x_0$, there exists a symplectic and
surjective mapping

$$e : B \times \mathbb{R}^d \to \bigcup_{x \in B} M_x : (x,y) \mapsto (p,q) \in M_x$$

that linearizes, for all $i = 1,\ldots,d$, the flow $\varphi_t^{[i]}$ of the system with Hamiltonian $F_i$:

$$\text{if } (p,q) = e(x,y), \text{ then } \varphi_t^{[i]}(p,q) = e(x, y + t e_i), \tag{1.9}$$

where $e_i = (0,\ldots,1,\ldots,0)^T$ is the $i$th unit vector of $\mathbb{R}^d$.

Since $e$ is symplectic, $e$ is a local diffeomorphism. Its local inverse is a transformation
as constructed in Lemma 1.1. However, $(p,q)$ can have countably many
discretely lying pre-images $(x,y)$, so that $e^{-1}$ becomes a multi-valued function.
The situation is analogous to that of the complex exponential and logarithm. The
following example illustrates that this analogy is not incidental.

Example 1.5. Consider the harmonic oscillator, i.e., $d = 1$ and $H(p,q) = \frac12(p^2 +
q^2)$. For $x = \frac12 r^2$, we have $e(x,y) = (r\cos y,\, r\sin y)$.
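
The properties of $e$ claimed in Theorem 1.4 can be verified directly for this example. The following sketch is an illustration under the assumption that the equations of motion are $\dot p = -q$, $\dot q = p$ (so that the flow is a rotation); it checks the linearization property (1.9), the $2\pi$-periodicity in $y$ that makes $e^{-1}$ multi-valued, and area preservation via a finite-difference Jacobian. Function names are hypothetical.

```python
import math

def e(x, y):
    """e(x, y) = (r cos y, r sin y) with x = r^2 / 2; returns (p, q)."""
    r = math.sqrt(2.0 * x)
    return r * math.cos(y), r * math.sin(y)

def flow(t, p, q):
    """Exact flow of H = (p^2 + q^2)/2 with p' = -q, q' = p (assumed sign convention)."""
    return p * math.cos(t) - q * math.sin(t), p * math.sin(t) + q * math.cos(t)

def jacobian_det(x, y, eps=1e-6):
    """Determinant of d(p, q)/d(x, y) by central differences; 1 for an area-preserving map."""
    px = (e(x + eps, y)[0] - e(x - eps, y)[0]) / (2 * eps)
    qx = (e(x + eps, y)[1] - e(x - eps, y)[1]) / (2 * eps)
    py = (e(x, y + eps)[0] - e(x, y - eps)[0]) / (2 * eps)
    qy = (e(x, y + eps)[1] - e(x, y - eps)[1]) / (2 * eps)
    return px * qy - py * qx
```

For $d = 1$, symplecticity is the same as area preservation, which is why a unit Jacobian determinant suffices here.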

Proof of Theorem 1.4. We fix $(p_0, q_0) \in M_{x_0}$, and in a neighbourhood $U$ of $(p_0, q_0)$
we consider a symplectic transformation

$$\ell = (F_1,\ldots,F_d,G_1,\ldots,G_d) : (p,q) \mapsto (x,y)$$

as constructed in Lemma 1.1. We have $\ell(p_0, q_0) = (x_0, y_0)$ where we may assume
$y_0 = 0$. To every $v = (v_i) \in \mathbb{R}^d$ we associate the Hamiltonian

$$F_v = v_1 F_1 + \ldots + v_d F_d$$

and note that, because of the commutativity of the flows $\varphi_t^{[j]}$, the flow of the system
with Hamiltonian $F_v$ equals

$$\varphi_{tv} = \varphi_{t v_1}^{[1]} \circ \ldots \circ \varphi_{t v_d}^{[d]}.$$

In the neighbourhood $U$ of $(p_0, q_0)$, the system with Hamiltonian $F_v$ is transformed
under the symplectic mapping $\ell$ to

$$\dot x = 0, \qquad \dot y = v.$$

Hence, the following diagram commutes for $(p,q) \in U$ and for sufficiently small $tv$:

$$\begin{array}{ccc}
(p,q) & \xrightarrow{\ \varphi_{tv}\ } & \varphi_{tv}(p,q) \\
\big\uparrow{\scriptstyle \ell^{-1}} & & \big\uparrow{\scriptstyle \ell^{-1}} \\
(x,y) & \longrightarrow & (x, y + tv)
\end{array} \tag{1.10}$$

We now construct $e$ by extending this diagram to arbitrary $tv$:

$$\begin{array}{ccc}
(p,q) & \xrightarrow{\ \varphi_{y}\ } & \varphi_{y}(p,q) \\
\big\uparrow{\scriptstyle \ell^{-1}} & & \big\uparrow{\scriptstyle e} \\
(x,0) & & (x,y)
\end{array} \tag{1.11}$$

That is, we define on $B \times \mathbb{R}^d$ (with $B$ a neighbourhood of $x_0$ on which $\ell^{-1}(x,0)$ is
defined)

$$e(x,y) = \varphi_y\bigl(\ell^{-1}(x,0)\bigr).$$

For $(x,y)$ near some fixed $(\hat x, \hat y)$, we have by (1.10), applied with $0$ and $y - \hat y$ in the
roles of $y$ and $tv$, and by $\varphi_y = \varphi_{\hat y} \circ \varphi_{y - \hat y}$, that

$$e(x,y) = \varphi_{\hat y}\bigl(\ell^{-1}(x, y - \hat y)\bigr),$$

which shows that $e$ is symplectic, being locally the composition of symplectic transformations.
The property (1.9) is obvious from the definition of $e$ and from the commutativity
of the flows $\varphi_t^{[i]}$. Since $\ell^{-1}(x,0) \in M_x$ and $M_x$ is invariant under the
flows $\varphi_t^{[i]}$, we have $e(x,y) \in M_x$ for all $(x,y)$.

It remains to show that $e : \{x\} \times \mathbb{R}^d \to M_x$ is surjective for every $x$ near
$x_0$. Let $(p,q)$ be an arbitrary point on $M_x$. By assumption, there exists a path on
$M_x$ connecting $\ell^{-1}(x,0)$ and $(p,q)$. Moreover, by (1.10) and by the compactness
of the path, there is a $\delta > 0$ such that, for every $(\tilde p, \tilde q)$ on this path, the mapping
$y \mapsto \varphi_y(\tilde p, \tilde q)$ is a diffeomorphism between the ball $\|y\| < \delta$ and a neighbourhood
of $(\tilde p, \tilde q)$ on $M_x$. Therefore, $(p,q)$ can be reached from $\ell^{-1}(x,0)$ by a finite
composition of maps:

$$(p,q) = \varphi_{y^{(m)}} \circ \ldots \circ \varphi_{y^{(1)}}\bigl(\ell^{-1}(x,0)\bigr) = \varphi_y\bigl(\ell^{-1}(x,0)\bigr) = e(x,y),$$

where $y = y^{(1)} + \ldots + y^{(m)}$, once again by the commutativity of the flows $\varphi_t^{[i]}$. □


Illustration of the Liouville Transform. We illustrate the above construction by a
simple example, the pendulum (I.1.12) with Hamiltonian $H = p^2/2 - \cos q$. The
first coordinate is $x = H(p,q)$, a first integral. The second coordinate $y$ is, following
(1.11), the time $t$ which is necessary to reach the point $(p,q)$ from an initial line,
which we assume at $q = 0$. Then we have (Fig. 1.1 left) $dp\,dq = dH\,dt$ (because
of $dq = H_p\,dt$ and $dH = H_p\,dp$). We see again that we have area preservation,
because the symplecticity of the flow preserves this property for all times. This
symplectic change of coordinates $(p,q) \mapsto (x,y)$ is illustrated in Fig. 1.2, which
transforms the problem (A) to a much simpler form (B) with uniform horizontal
movement.


Fig. 1.1. Liouville and action-angle coordinate transforms



Fig. 1.2. Liouville and action-angle coordinates illustrated at the pendulum problem 


We are not yet completely satisfied, however, because the orbits have periods
$g = g(H)$ which are not all the same. We therefore append a second transform by
putting $\theta = \frac{2\pi}{g}\, t$ (see picture (C) in Fig. 1.1 and Fig. 1.2), which forces all periods
into a Procrustean bed of length $2\pi$. Area preservation $da\,d\theta = dH\,dt$ now requires
that $2\pi\, da = g(H)\, dH$, which is a differential equation between $a$ and $H$. The new
coordinates $(a, \theta)$ are the action-angle variables and we see that they transform the
phase space into $D \times \mathbb{T}^1$ where $D \subset \mathbb{R}^1$. We again have horizontal movement, but
this time the speed depends on $a$. The general existence for completely integrable
systems will be proved in Theorem 1.6 below.
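
The relation $2\pi\,da = g(H)\,dH$ can be checked numerically for the oscillating orbits of the pendulum ($-1 < H < 1$). The sketch below is an illustration with hypothetical function names: using the substitution $\sin(q/2) = k \sin s$ with $k^2 = (1+H)/2$ (an assumption of this example, not taken from the text), the period becomes $g(H) = 4\int_0^{\pi/2} (1 - k^2\sin^2 s)^{-1/2}\,ds$ and the enclosed area $\oint p\,dq = 16k^2 \int_0^{\pi/2} \cos^2 s\,(1 - k^2\sin^2 s)^{-1/2}\,ds$; both integrals are approximated by the midpoint rule.

```python
import math

def _midpoint(f, a, b, n=400):
    """Composite midpoint rule for the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (j + 0.5) * step) for j in range(n))

def period(H):
    """Period g(H) of the pendulum H = p^2/2 - cos q for oscillating orbits, -1 < H < 1."""
    k2 = 0.5 * (1.0 + H)                      # k = sin(q_max / 2)
    return 4.0 * _midpoint(lambda s: 1.0 / math.sqrt(1.0 - k2 * math.sin(s) ** 2),
                           0.0, 0.5 * math.pi)

def action(H):
    """Action a(H) = (area enclosed by the orbit of energy H) / (2 pi)."""
    k2 = 0.5 * (1.0 + H)
    area = 16.0 * k2 * _midpoint(
        lambda s: math.cos(s) ** 2 / math.sqrt(1.0 - k2 * math.sin(s) ** 2),
        0.0, 0.5 * math.pi)
    return area / (2.0 * math.pi)
```

A central difference quotient of `action` should then agree with `period(H) / (2 * math.pi)`, and for $H \to -1$ the period approaches $2\pi$, the period of the linearized pendulum.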
X.1.3 Action-Angle Variables

We show here that, under the hypotheses of Liouville’s theorem, we can
find symplectic coordinates $(I, \varphi)$ such that the first integrals $F$ depend
only on $I$, and $\varphi$ are angular coordinates on the torus $M_f$.

(V.I. Arnold 1989, p. 279)

We are now in the position to prove the main result of this section, which establishes
a symplectic change of coordinates to the so-called action-angle variables, such that
$d$ first integrals of a completely integrable system depend only on the actions, and
the angles are defined globally mod $2\pi$ (provided the level sets of the first integrals
are compact). This is known as the Arnold–Liouville theorem; cf. Arnold (1963,
1989), Arnold, Kozlov & Neishtadt (1997; Ch. 4, Sect. 2.1), Jost (1968). Here and
in the following,

$$\mathbb{T}^d = \mathbb{R}^d / 2\pi\mathbb{Z}^d = \{(\theta_1 \bmod 2\pi, \ldots, \theta_d \bmod 2\pi) \;;\; \theta_i \in \mathbb{R}\}$$

denotes the standard $d$-dimensional torus.

Theorem 1.6 (Arnold–Liouville Theorem). Let $F_1,\ldots,F_d : M \to \mathbb{R}$ be first
integrals of a completely integrable system as in Definition 1.3. Suppose that the
level sets $M_x$ (see (1.8)) are compact and connected for all $x$ in a neighbourhood
of $x_0 \in \mathbb{R}^d$. Then, there are neighbourhoods $B$ of $x_0$ and $D$ of $0$ in $\mathbb{R}^d$ such that
the following holds:

(i) For every $x \in B$, the level set $M_x$ is a $d$-dimensional torus that is invariant
under the flow of the system with Hamiltonian $F_i$ $(i = 1,\ldots,d)$.

(ii) There exists a bijective symplectic transformation

$$\psi : D \times \mathbb{T}^d \to \bigcup_{x \in B} M_x \subset \mathbb{R}^d \times \mathbb{R}^d : (a,\theta) \mapsto (p,q)$$

such that $(F_i \circ \psi)(a,\theta)$ depends only on $a$, i.e.,

$$F_i(p,q) = f_i(a) \quad\text{for } (p,q) = \psi(a,\theta) \quad (i = 1,\ldots,d)$$

with functions $f_i : D \to \mathbb{R}$.

The variables $(a,\theta) = (a_1,\ldots,a_d,\ \theta_1 \bmod 2\pi, \ldots, \theta_d \bmod 2\pi)$ are called
action-angle variables.

Remark 1.7. If the level sets $M_x$ are not compact, then the proof of Theorem 1.6
shows that $M_x$ is diffeomorphic to a Cartesian product of circles and straight lines
$\mathbb{T}^k \times \mathbb{R}^{d-k}$ for some $k < d$, and there is a bijective symplectic transformation
$(a,\theta) \mapsto (p,q)$ between $D \times (\mathbb{T}^k \times \mathbb{R}^{d-k})$ and a neighbourhood $\bigcup\{M_x : x \in B\}$
of $M_{x_0}$ such that the first integrals again depend only on $a$.

Remark 1.8. If the Hamiltonian is real-analytic, then the proof shows that also the
transformation to action-angle variables is real-analytic.

Proof of Theorem 1.6. (a) We return to Theorem 1.4. For $x \in B$, we consider the
set

$$\Gamma_x = \{y \in \mathbb{R}^d \;;\; e(x,y) = e(x,0)\}.$$

Since $e$ is locally a diffeomorphism, for every fixed $y_0 \in \Gamma_{x_0}$ there exists a unique
smooth function $\eta$ defined on a neighbourhood of $x_0$, such that $\eta(x_0) = y_0$ and
$\eta(x) \in \Gamma_x$ for $x$ near $x_0$. In particular, $\Gamma_x$ is a discrete subset of $\mathbb{R}^d$. By (1.9),
for $y \in \Gamma_x$ we have $e(x, y + v) = e(x, v)$ for all $v \in \mathbb{R}^d$. Therefore, $\Gamma_x$ is a
subgroup of $\mathbb{R}^d$, i.e., with $y, v \in \Gamma_x$ also $y + v \in \Gamma_x$ and $-y \in \Gamma_x$. It then follows
(see Exercise 4) that $\Gamma_x$ is a grid, generated by $k \le d$ linearly independent vectors
$g_1(x),\ldots,g_k(x) \in \mathbb{R}^d$:

$$\Gamma_x = \{m_1 g_1(x) + \ldots + m_k g_k(x) \;;\; m_i \in \mathbb{Z}\}.$$

We extend $g_1(x),\ldots,g_k(x)$ to a basis $g_1(x),\ldots,g_d(x)$ of $\mathbb{R}^d$. Then, $e$ induces a
diffeomorphism

$$\mathbb{T}^k \times \mathbb{R}^{d-k} \to M_x : \quad
(\theta_1,\ldots,\theta_k,\tau_{k+1},\ldots,\tau_d) \mapsto
e\Bigl(x,\ \sum_{i=1}^{k} \frac{\theta_i}{2\pi}\, g_i(x) + \sum_{j=k+1}^{d} \tau_j\, g_j(x)\Bigr).$$

If $M_x$ is compact, then necessarily $k = d$ and $M_x$ is a torus. The above map then
becomes the bijection

$$\mathbb{T}^d \to M_x : \quad \theta \mapsto e\Bigl(x,\ \sum_{i=1}^{d} \frac{\theta_i}{2\pi}\, g_i(x)\Bigr).$$

(b) Next we show that $g_i(x)$ is the gradient of some function $U_i(x)$. For notational
convenience, we omit the subscript $i$ and consider a differentiable function $g$
with

$$e(x, g(x)) = e(x, 0), \qquad x \in B,$$

or equivalently,

$$(\ell \circ e)(x, g(x)) = (x, 0), \qquad x \in B.$$

Differentiating this relation gives (with $I$ the $d$-dimensional identity)

$$A \begin{pmatrix} I \\ g'(x) \end{pmatrix} = \begin{pmatrix} I \\ 0 \end{pmatrix},$$

where $A$ is the Jacobian matrix of $\ell \circ e$ at $(x, g(x))$. We thus have

$$\begin{pmatrix} I & g'(x)^T \end{pmatrix} A^T J A \begin{pmatrix} I \\ g'(x) \end{pmatrix}
= \begin{pmatrix} I & 0 \end{pmatrix} J \begin{pmatrix} I \\ 0 \end{pmatrix} = 0.$$

Since $\ell \circ e$ is a symplectic transformation, we have $A^T J A = J$, and hence the
above equation reduces to

$$g'(x)^T - g'(x) = 0.$$

By the Integrability Lemma VI.2.7, there is a function $U$ such that $g(x) = \nabla U(x)$.
We may assume $U(x_0) = 0$.

(c) The result of (b) allows us to extend the bijection of (a) to a symplectic
transformation. For this, we consider the generating function

$$S(x, \theta) = \sum_{i=1}^{d} \frac{\theta_i}{2\pi}\, U_i(x).$$

With $u(x) = (U_1(x),\ldots,U_d(x))$, the mixed second derivative of $S$ is

$$S_{x\theta}(x,\theta) = \frac{1}{2\pi}\, u'(x)^T = \frac{1}{2\pi}\bigl(g_1(x),\ldots,g_d(x)\bigr),$$

which is invertible because of the linear independence of the $g_i$. The equations

$$a = \frac{\partial S}{\partial \theta}(x,\theta) = \frac{1}{2\pi}\, u(x), \qquad
y = \frac{\partial S}{\partial x}(x,\theta) = \frac{1}{2\pi} \sum_{i=1}^{d} \theta_i\, g_i(x)$$

define a bijective symplectic transformation (for some neighbourhood $D$ of $0$, and
possibly with a reduced neighbourhood $B$ of $x_0$)

$$\beta : D \times \mathbb{R}^d \to B \times \mathbb{R}^d : (a,\theta) \mapsto (x,y) = \Bigl(f(a),\ \frac{1}{2\pi} \sum_{i=1}^{d} \theta_i\, g_i(f(a))\Bigr)$$

where $x = f(a)$ is the inverse map of $a = \frac{1}{2\pi} u(x)$. We now define

$$\psi = e \circ \beta : D \times \mathbb{R}^d \to \bigcup_{x \in B} M_x.$$

By construction, this map is smooth and symplectic, and such that $f_i(a) = x_i =
F_i(p,q)$ for $(p,q) = \psi(a,\theta)$. It is surjective by Theorem 1.4. By part (a) of this
proof, it becomes injective when the $\theta_i$ are taken mod $2\pi$, thus yielding a transformation
defined on $D \times \mathbb{T}^d$ with the stated properties. □


X.1.4 Conditionally Periodic Flows 

An immediate and important consequence of Theorem 1.6 is the following. 

Corollary 1.9. In the situation of Theorem 1.6, consider the completely integrable
system with Hamiltonian $H = F_1$. In the action-angle variables $(a,\theta)$, the Hamiltonian
equations become

$$\dot a_i = 0, \qquad \dot\theta_i = \omega_i(a) \qquad (i = 1,\ldots,d)$$

with $\omega_i(a) = \partial K/\partial a_i(a)$, where $K(a) = H(p,q)$ for $(p,q) = \psi(a,\theta)$.

The flow of a differential system

$$\dot\theta = \omega, \qquad \omega = (\omega_i) \in \mathbb{R}^d$$

on the torus $\mathbb{T}^d$ is called conditionally periodic with frequencies $\omega_i$. The flow is
periodic if there exist integers $k_i$ such that for any two frequencies the relation
$\omega_i/\omega_j = k_i/k_j$ holds. Otherwise, the flow is called quasi-periodic. In particular, the
latter occurs when the frequencies are rationally independent, or non-resonant: the only
integers $k_i$ with $k_1\omega_1 + \ldots + k_d\omega_d = 0$ are $k_1 = \ldots = k_d = 0$. For non-resonant
frequencies, it is well known (see Arnold (1989), p. 287) that every trajectory
$\{\theta(t) : t \in \mathbb{R}\}$ is dense on the torus $\mathbb{T}^d$ and uniformly distributed.
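
The difference between resonant and non-resonant frequencies can be made visible with a small experiment for $d = 2$. The sketch below is illustrative only: it samples the trajectory $\theta(t) = t\,\omega \bmod 2\pi$ at equidistant times (the sampling step, the test box $[0,\pi/2) \times [\pi, 2\pi)$ and all names are assumptions of this example) and returns the fraction of samples falling into the box, whose area is $1/8$ of the torus.

```python
import math

TWO_PI = 2.0 * math.pi

def box_fraction(omega, dt, n_samples):
    """Fraction of the samples theta(k*dt) = k*dt*omega (mod 2*pi), k = 0..n_samples-1,
    that lie in the box 0 <= theta_1 < pi/2, pi <= theta_2 < 2*pi."""
    hits = 0
    for k in range(n_samples):
        t = k * dt
        th1 = (omega[0] * t) % TWO_PI
        th2 = (omega[1] * t) % TWO_PI
        if th1 < 0.5 * math.pi and th2 >= math.pi:
            hits += 1
    return hits / n_samples

# sampling step chosen as an irrational fraction of 2*pi (hypothetical choice)
DT = TWO_PI * (math.sqrt(5.0) - 1.0) / 2.0
frac_nonresonant = box_fraction((1.0, math.sqrt(2.0)), DT, 20000)
frac_resonant = box_fraction((1.0, 2.0), DT, 20000)
```

For the non-resonant frequencies $(1, \sqrt2)$ the fraction should be close to the relative area $1/8$, in line with uniform distribution. For the resonant pair $(1, 2)$ the trajectory is the closed curve $\theta_2 = 2\theta_1 \bmod 2\pi$, which never enters the box.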



Example 1.10. We take up again the example of motion in a central field, Example 1.2.
For given $H$ and $L$, we now assume that

$$\{r > 0 \;;\; 2(H - V(r)) - L^2/r^2 \ge 0\} = [r_0, r_1]$$

is a non-empty interval and the derivatives of $2(H - V(r)) - L^2/r^2$ are non-vanishing
at $r_0, r_1$. By (1.7), the motion from $r_0$ to $r_1$ and back again takes a time
$T$ and runs through an angle $\Phi$ which are given by

$$T = 2\int_{r_0}^{r_1} \frac{1}{\sqrt{2(H - V(\rho)) - L^2/\rho^2}}\; d\rho, \tag{1.12}$$

$$\Phi = 2\int_{r_0}^{r_1} \frac{L/\rho^2}{\sqrt{2(H - V(\rho)) - L^2/\rho^2}}\; d\rho. \tag{1.13}$$

Note that $r_0, r_1, T, \Phi$ are functions of $H$ and $L$. The solution is periodic if $\Phi$ is
a rational multiple of $2\pi$. This occurs for the Kepler problem, where $\Phi = 2\pi$ and
where $T = 2\pi/(-2H)^{3/2}$ (for $H < 0$) depends only on $H$; see Exercise I.5.
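
The quadratures (1.12) and (1.13) are easy to evaluate numerically once the inverse square-root singularities at the turning points are removed. The following sketch is one possible way to do this, not the method of the text: it substitutes $\rho = c - d\cos s$ with $c = (r_0 + r_1)/2$, $d = (r_1 - r_0)/2$ and applies the midpoint rule in $s \in (0, \pi)$. The function names and the closed-form Kepler turning points $r_{0,1} = (1 \mp \sqrt{1 + 2HL^2})/(-2H)$ are assumptions made for this illustration.

```python
import math

def radial_period_and_angle(V, H, L, r0, r1, n=2000):
    """Approximate T of (1.12) and Phi of (1.13) for the potential V between
    the turning points r0 < r1, using rho = c - d*cos(s) and the midpoint rule."""
    c, d = 0.5 * (r0 + r1), 0.5 * (r1 - r0)
    T = Phi = 0.0
    for j in range(n):
        s = math.pi * (j + 0.5) / n
        rho = c - d * math.cos(s)
        g = 2.0 * (H - V(rho)) - L * L / (rho * rho)      # radicand, > 0 inside (r0, r1)
        w = d * math.sin(s) / math.sqrt(g) * (math.pi / n)  # d(rho)/sqrt(g) in terms of s
        T += 2.0 * w
        Phi += 2.0 * w * L / (rho * rho)
    return T, Phi

def kepler_turning_points(H, L):
    """Turning points of the Kepler problem V(r) = -1/r for H < 0 and 1 + 2*H*L^2 > 0."""
    ecc = math.sqrt(1.0 + 2.0 * H * L * L)
    return (1.0 - ecc) / (-2.0 * H), (1.0 + ecc) / (-2.0 * H)
```

With $V(r) = -1/r$, $H = -1/2$ and $L = 0.8$, the computed values can be compared with $T = 2\pi/(-2H)^{3/2}$ and $\Phi = 2\pi$. For other potentials the turning points have to be supplied, for instance from a root finder.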

We now construct action-angle variables and compute the frequencies of the 
system. We begin by constructing the mapping e(x, y ) as defined by (1.11) for the 
variables x = (a^aq) = (H,L) and y = ( 2/1 , 2 / 2 ) of (1.6). For a given (x,y), 
we consider (x, 0 ) and we fix (p,q) with p = (p r ,p<r) and q = (r, 9 ?) such that 
^(p, q) = (a;, 0), e.g., by choosing r = ro, 99 = 0, p r = 0, p^ = L. The mapping 
e(x,y) is defined by the flow at time t = 1 corresponding to the Hamiltonian 


F y = y\H + y 2 L = yi[\{p 2 r + P%/r 2 ) + F(r)) + y 2 p v , 


i.e., by the solution at t = 1 of 


p; 


Pr = -2/1 -f - 2/1 , Per = 0 


P : 


P<r , 

yi + P2 • 


r = PlPr 5 


(1.14) 
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If we denote the flow of the original system with Hamiltonian H(p r ,p lf , r, ip) by 
( p t , then we have 


e(x, y) = ip yi (0, L, r 0 ,0) + (0,0,0, y 2 ) T 

with the last component taken modulo 27r. Hence, the values of y satisfying 

e(x, y ) = e(x, 0) are 

y = m\gi(x) + m 2 g 2 {x) 

with integers mi, m 2 and 



We know from the proof of Theorem 1.6 that $g_1$ and $g_2$ are the gradients of functions
$U_1(H, L)$ and $U_2(H, L)$, respectively. Clearly, $U_2 = 2\pi L$. The expression for $U_1$ is
less explicit. With the construction of the Integrability Lemma VI.2.7, this function
is obtained by quadrature, in a neighbourhood of $(H_0, L_0)$, as

$$U_1(H, L) = \int_0^1 \Bigl( (H - H_0)\,T\bigl(H_0 + s(H - H_0), L_0 + s(L - L_0)\bigr) - (L - L_0)\,\Phi\bigl(H_0 + s(H - H_0), L_0 + s(L - L_0)\bigr) \Bigr)\,ds .$$

(For the Kepler problem, $T = 2\pi/(-2H)^{3/2}$, $\Phi = 0 \bmod 2\pi$, and hence $U_1 =
2\pi/\sqrt{-2H}$.) For the action variables we thus obtain

$$a_1 = \frac{1}{2\pi}\,U_1(H, L), \quad a_2 = L .$$

The angle variables are given by $y = \frac{1}{2\pi}(\theta_1 g_1 + \theta_2 g_2)$, i.e.,

$$\theta_1 = \frac{2\pi}{T}\,y_1, \quad \theta_2 = y_2 + \frac{\Phi}{T}\,y_1 . \tag{1.15}$$

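Writing the total energy $H = K(a_1, L)$ if $a_1$ is given by the above formula, we
obtain, by differentiation of the identity $2\pi a_1 = U_1(K(a_1, L), L)$,

$$2\pi = \frac{\partial U_1}{\partial H}\frac{\partial K}{\partial a_1}, \quad 0 = \frac{\partial U_1}{\partial H}\frac{\partial K}{\partial a_2} + \frac{\partial U_1}{\partial L},$$

and hence, since $\partial U_1/\partial H = T$ and $\partial U_1/\partial L = -\Phi$, the frequencies

$$\omega_1 = \frac{\partial K}{\partial a_1} = \frac{2\pi}{T}, \quad \omega_2 = \frac{\partial K}{\partial a_2} = \frac{\Phi}{T}. \tag{1.16}$$


X.1.5 The Toda Lattice - an Integrable System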

Our method is based on the realization that the Toda lattice belongs to 
a class of evolution equations which can be studied, and in some cases 
solved, by utilization of a certain associated eigenvalue problem. 

(H. Flaschka 1974) 

Classical examples of integrable systems from mechanics include Kepler's problem
(Newton 1687/1713, Joh. Bernoulli 1710), the planar motion of a point mass attracted
by two fixed centres (Euler 1760), Kepler's problem in a homogeneous force
field (Lagrange 1766 solved this as the limit of the previous problem when one centre
is at infinity), various spinning tops (Euler 1758b, Lagrange 1788, Kovalevskaya
1889, Goryachev 1899 and Chaplygin 1901), a number of integrable cases of the
motion of a rigid body in a fluid, and the motion of point vortices in the plane. We refer
to Arnold, Kozlov & Neishtadt (1997) and Kozlov (1983) for interesting accounts
of these problems and for further references.

Here we consider the celebrated example of the Toda lattice which was the starting
point for a huge amount of work on integrable systems in the last few decades,
with fascinating relationships to soliton theory in partial differential equations (most
notably the Korteweg-de Vries equation) and to eigenvalue algorithms of Numerical
Analysis; see Deift (1996) for an account of these developments.

The Toda lattice (or chain) is a system of particles on a line interacting pairwise
with exponential forces. Such systems were studied by Toda (1970) as discrete models
for nonlinear wave propagation. The motion is determined by the Hamiltonian

$$H(p, q) = \sum_{k=1}^{n} \Bigl( \tfrac12 p_k^2 + \exp(q_k - q_{k+1}) \Bigr) . \tag{1.17}$$

Two types of boundary conditions have found particular attention in the literature:

(i) periodic boundary conditions: $q_{n+1} = q_1$;

(ii) put formally $q_{n+1} = +\infty$, so that the term $\exp(q_n - q_{n+1})$ does not appear.

It was found by Hénon, Flaschka and independently Manakov in 1974 that the periodic
Toda system is integrable. Moser (1975) then gave a detailed study of the
non-periodic case (ii).

Flaschka (1974) introduced new variables

$$a_k = -\tfrac12\,p_k , \quad b_k = \tfrac12 \exp\bigl(\tfrac12 (q_k - q_{k+1})\bigr) .$$

(Take $b_n = 0$ in case (ii).) Along a solution $(p(t), q(t))$ of the Toda system, the
corresponding functions $(a(t), b(t))$ satisfy the differential equations

$$\dot a_k = 2\,(b_k^2 - b_{k-1}^2) , \quad \dot b_k = b_k\,(a_{k+1} - a_k)$$

(with $a_{n+1} = a_1$ and $b_0 = b_n$ in case (i), and with $b_0 = b_n = 0$ in case (ii)). With the matrices

$$L = \begin{pmatrix}
a_1 & b_1 & & & b_n \\
b_1 & a_2 & b_2 & & 0 \\
 & b_2 & a_3 & \ddots & \\
 & & \ddots & \ddots & b_{n-1} \\
b_n & 0 & & b_{n-1} & a_n
\end{pmatrix}, \quad
B = B(L) = \begin{pmatrix}
0 & b_1 & & & -b_n \\
-b_1 & 0 & b_2 & & 0 \\
 & -b_2 & 0 & \ddots & \\
 & & \ddots & \ddots & b_{n-1} \\
b_n & 0 & & -b_{n-1} & 0
\end{pmatrix}$$

the differential equations can be written in the Lax pair form

$$\dot L = BL - LB . \tag{1.18}$$

This system has an isospectral flow, that is, along any solution $L(t)$ of (1.18) the
eigenvalues do not depend on $t$; see Lemma IV.3.4. The eigenvalues $\lambda_1, \dots, \lambda_n$ of
$L$ are therefore first integrals of the Toda system. They are independent and turn out
to be in involution, in a neighbourhood of every point where the $\lambda_i$ are all different;
see Exercise 6. Hence, the Toda lattice is a completely integrable system. Its
Hamiltonian can be written as

$$H = \sum_{k=1}^{n} \bigl( 2a_k^2 + 4b_k^2 \bigr) = 2\,\mathrm{trace}\,L^2 = 2\sum_{i=1}^{n} \lambda_i^2 .$$
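The Lax pair form and the trace formula for $H$ can be verified numerically at a single phase-space point. The sketch below assumes the periodic case (i) with $n \ge 3$; the helper names `flaschka`, `lax_matrices` and `lax_residual` are introduced here for illustration and do not come from the text.

```python
import numpy as np

def flaschka(p, q):
    """Flaschka variables for the periodic Toda lattice (q_{n+1} = q_1)."""
    a = -0.5 * p
    b = 0.5 * np.exp(0.5 * (q - np.roll(q, -1)))   # np.roll(q, -1)[k] = q[k+1]
    return a, b

def lax_matrices(a, b):
    """Periodic Lax pair (L, B) for n >= 3; zero-based k stands for k+1 in the text."""
    n = len(a)
    L = np.diag(a).astype(float)
    B = np.zeros((n, n))
    for k in range(n):
        j = (k + 1) % n
        L[k, j] = L[j, k] = b[k]
        B[k, j], B[j, k] = b[k], -b[k]
    return L, B

def lax_residual(p, q):
    """Max-norm of dL/dt - (BL - LB), with dL/dt built from the (a, b) equations."""
    a, b = flaschka(p, q)
    L, B = lax_matrices(a, b)
    da = 2.0 * (b**2 - np.roll(b, 1)**2)           # np.roll(b, 1)[k] = b[k-1]
    db = b * (np.roll(a, -1) - a)
    Ldot, _ = lax_matrices(da, db)                 # L depends linearly on (a, b)
    return np.max(np.abs(Ldot - (B @ L - L @ B)))

p = np.array([-1.5, 1.0, 0.5])
q = np.array([1.0, 2.0, -1.0])
a, b = flaschka(p, q)
L, _ = lax_matrices(a, b)
H = np.sum(0.5 * p**2 + np.exp(q - np.roll(q, -1)))
print(lax_residual(p, q))                  # should be at rounding level
print(H, 2.0 * np.trace(L @ L))            # the two values should agree
```

A check at one point does not prove the identity, but it catches sign errors in $B$ and in the boundary conventions quickly.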

We conclude this section with a numerical example for the periodic Toda lattice.
We choose $n = 3$ and the initial conditions $p_1 = -1.5$, $p_2 = 1$, $p_3 = 0.5$ and
$q_1 = 1$, $q_2 = 2$, $q_3 = -1$. We apply to the system with Hamiltonian (1.17) the
symplectic second-order Störmer-Verlet method and the non-symplectic classical
fourth-order Runge-Kutta method with two different step sizes. The left pictures of
Fig. 1.3 show the numerical approximations to the eigenvalues, and the right pictures
the deviations of the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ along the numerical solution from their
initial values. Clearly, the eigenvalues are not invariants of the numerical schemes.
However, Fig. 1.3 illustrates that the eigenvalues along the numerical solution remain
close to their correct values over very long time intervals for the symplectic
method, whereas they drift off for the non-symplectic method.
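The symplectic half of this experiment can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' program: it uses the initial values quoted above, the periodic boundary condition, and hypothetical helper names (`grad_U`, `stormer_verlet_step`, `lax_eigenvalues`, `eigenvalue_drift`); the Runge-Kutta comparison is left out for brevity.

```python
import numpy as np

def grad_U(q):
    """Gradient of U(q) = sum_k exp(q_k - q_{k+1}) with q_{n+1} = q_1."""
    e = np.exp(q - np.roll(q, -1))     # e[k] = exp(q_k - q_{k+1})
    return e - np.roll(e, 1)           # dU/dq_k = e[k] - e[k-1]

def stormer_verlet_step(p, q, h):
    """One Stormer-Verlet step for H(p, q) = p.p/2 + U(q)."""
    p = p - 0.5 * h * grad_U(q)
    q = q + h * p
    p = p - 0.5 * h * grad_U(q)
    return p, q

def lax_eigenvalues(p, q):
    """Eigenvalues (ascending) of the periodic Lax matrix L in Flaschka variables."""
    a = -0.5 * p
    b = 0.5 * np.exp(0.5 * (q - np.roll(q, -1)))
    n = len(p)
    L = np.diag(a).copy()
    for k in range(n):
        j = (k + 1) % n
        L[k, j] = L[j, k] = b[k]
    return np.linalg.eigvalsh(L)

def eigenvalue_drift(h, t_end):
    """Largest deviation of the eigenvalues from their initial values up to t_end."""
    p = np.array([-1.5, 1.0, 0.5])
    q = np.array([1.0, 2.0, -1.0])
    lam0 = lax_eigenvalues(p, q)
    drift = 0.0
    for _ in range(int(round(t_end / h))):
        p, q = stormer_verlet_step(p, q, h)
        drift = max(drift, np.max(np.abs(lax_eigenvalues(p, q) - lam0)))
    return lam0, drift, p, q

for h in (0.1, 0.05):
    lam0, drift, _, _ = eigenvalue_drift(h, t_end=100.0)
    print(h, drift)
```

Comparing the two printed deviations gives a rough impression of how the eigenvalue error of the second-order method scales with the step size.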

An explanation of the long-time near-preservation of the first integrals of completely
integrable systems by symplectic methods will be given in the following
sections, using backward error analysis and the perturbation theory for integrable
Hamiltonian systems.

[Figure panels: Störmer-Verlet method; classical Runge-Kutta method of order 4]

Fig. 1.3. Numerically obtained eigenvalues (left pictures) and errors in the eigenvalues (right
pictures) for the step sizes $h = 0.1$ (dotted) and $h = 0.05$ (solid line)


X.2 Transformations in the Perturbation Theory 
for Integrable Systems 

General problem of dynamics. We are thus led to pose the following
problem: Study the canonical equations

$$\frac{dx_i}{dt} = \frac{dF}{dy_i}, \quad \frac{dy_i}{dt} = -\frac{dF}{dx_i},$$

supposing that the function $F$ can be expanded in powers of a very small
parameter $\mu$ in the following manner:

$$F = F_0 + \mu F_1 + \mu^2 F_2 + \dots ,$$

supposing moreover that $F_0$ depends only on the $x$ and is independent
of the $y$; and that $F_1, F_2, \dots$ are periodic functions of period $2\pi$ with
respect to the $y$. (H. Poincaré 1892, p. 32f.)

Consider a small perturbation of a completely integrable Hamiltonian. In action-angle
variables $(a, \theta)$ on $D \times \mathbb{T}^d$ ($D$ an open subset of $\mathbb{R}^d$), this takes the form

$$H(a, \theta) = H_0(a) + \varepsilon H_1(a, \theta) , \tag{2.1}$$

where $\varepsilon$ is a small parameter. We assume that $H_0$ and $H_1$ are real-analytic, and that
the perturbation $H_1$ (which may depend also on $\varepsilon$) is bounded by a constant on a
complex neighbourhood of $D \times \mathbb{T}^d$ that is independent of $\varepsilon$. No other restriction
shall be imposed on the perturbation.

For the unperturbed system ($\varepsilon = 0$) we have seen that the motion is conditionally
periodic on invariant tori $\{a = \mathrm{const.},\ \theta \in \mathbb{T}^d\}$. Perturbation theory aims at an
understanding of the flow of the perturbed system. The basic tools are symplectic
coordinate transformations which take the system to a form that allows the long-time
behaviour (perpetually, or over time scales large compared to $\varepsilon^{-1}$) of solutions
of the system (certain solutions, or all solutions with initial values in some ball) to be
read off. There are different transformations that provide answers to these problems.
The emphasis in this section will be on the construction of suitable transformations,
not on the technical but equally important aspects of obtaining estimates for them.

The methods in Poincaré's Méthodes Nouvelles form the now classical part of
perturbation theory, but the theories of Birkhoff, Siegel, Kolmogorov/Arnold/Moser
(KAM) and Nekhoroshev in the 20th century have become "classics" in their own
right.


X.2.1 The Basic Scheme of Classical Perturbation Theory 


In the spirit of the preceding section, one might search for a symplectic change of
coordinates $(a, \theta) \mapsto (b, \varphi)$ close to the identity such that the perturbed Hamiltonian
written in the new variables $(b, \varphi)$ depends only on $b$, or more modestly, depends
only on $b$ up to a remainder term of order $O(\varepsilon^N)$ with a large $N > 1$, or to begin
even more modestly, with $N = 2$. We search for a generating function

$$S(b, \theta) = b \cdot \theta + \varepsilon S_1(b, \theta)$$

where $\cdot$ symbolizes the Euclidean product of vectors in $\mathbb{R}^d$ and $S_1$ is $2\pi$-periodic
in $\theta$. Naively, we require that the symplectic transformation defined by

$$a = \frac{\partial S}{\partial\theta}(b, \theta), \quad \varphi = \frac{\partial S}{\partial b}(b, \theta)$$

be such that the order-$\varepsilon$ term in the expansion of the Hamiltonian in the new variables,
$K(b, \varphi) = H(a, \theta)$, $K(b, \varphi) = H_0(b) + \varepsilon K_1(b, \varphi) + \dots$, depends only on $b$.
Since

$$H(a, \theta) = H\Bigl(b + \varepsilon\frac{\partial S_1}{\partial\theta}(b, \theta), \theta\Bigr) = H_0(b) + \varepsilon\Bigl\{\omega(b) \cdot \frac{\partial S_1}{\partial\theta}(b, \theta) + H_1(b, \theta)\Bigr\} + \dots$$

with the vector of frequencies

$$\omega(b) = \frac{\partial H_0}{\partial a}(b) ,$$

the function $S_1$ must satisfy the partial differential equation

$$\omega(b) \cdot \frac{\partial S_1}{\partial\theta}(b, \theta) + H_1(b, \theta) = \overline{H}_1(b) \tag{2.2}$$

for a function $\overline{H}_1$ that does not depend on $\theta$. Since $S_1$ is required to be $2\pi$-periodic
in $\theta$, the function $\overline{H}_1$ must equal the average of $H_1$ over the angles:

$$\overline{H}_1(b) = \frac{1}{(2\pi)^d}\int_{\mathbb{T}^d} H_1(b, \theta)\,d\theta .$$

Equation (2.2) is the basic equation of Hamiltonian perturbation theory. From the
Fourier series of $S_1$ and $H_1$,

$$S_1(b, \theta) = \sum_{k \in \mathbb{Z}^d} s_k(b)\,e^{ik\cdot\theta} , \quad H_1(b, \theta) = \sum_{k \in \mathbb{Z}^d} h_k(b)\,e^{ik\cdot\theta} ,$$

we obtain a formal solution of (2.2) by comparing Fourier coefficients: $s_0(b)$ is
arbitrary and

$$s_k(b) = -\frac{h_k(b)}{i\,k \cdot \omega(b)} , \quad k \ne 0 . \tag{2.3}$$

At this point, however, we are struck by the problem of small denominators. For any
values of the frequencies $\omega_j(b)$, the denominator $k \cdot \omega(b) = k_1\omega_1(b) + \dots + k_d\omega_d(b)$
becomes arbitrarily small for some $k = (k_1, \dots, k_d) \in \mathbb{Z}^d$, and even vanishes if the
frequencies are rationally dependent.
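For a trigonometric polynomial perturbation the formal solution (2.3) is a finite computation. The sketch below assumes a fixed $b$ (so that $h_k$ and $\omega$ are plain numbers), stores the Fourier coefficients in a dictionary keyed by integer tuples $k$, and sets the arbitrary coefficient $s_0$ to zero; the function names are illustrative only.

```python
import numpy as np

def solve_homological(h, omega):
    """Fourier coefficients s_k of S_1 from (2.3); h maps integer tuples k to h_k."""
    s = {}
    for k, hk in h.items():
        if any(k):                                   # k = 0 is skipped: s_0 is arbitrary
            denom = 1j * np.dot(k, omega)
            if denom == 0:
                raise ZeroDivisionError(f"resonance: k . omega = 0 for k = {k}")
            s[k] = -hk / denom
    return s

def residual(h, s, omega, theta):
    """omega . dS_1/dtheta + H_1 - average(H_1), evaluated at the angles theta."""
    H1 = sum(hk * np.exp(1j * np.dot(k, theta)) for k, hk in h.items())
    dS = sum(1j * np.dot(k, omega) * sk * np.exp(1j * np.dot(k, theta))
             for k, sk in s.items())
    H1_avg = h.get((0,) * len(omega), 0.0)           # h_0 is the angular average
    return dS + H1 - H1_avg

omega = np.array([1.0, np.sqrt(2.0)])
# A real-valued H_1 requires h_{-k} = conj(h_k).
h = {(0, 0): 0.3, (1, 0): 0.5, (-1, 0): 0.5, (1, -1): 0.25j, (-1, 1): -0.25j}
s = solve_homological(h, omega)
print(abs(residual(h, s, omega, np.array([0.7, -1.3]))))   # should be at rounding level
```

With rationally dependent frequencies, for instance $\omega = (1, 1)$ and a nonzero coefficient at $k = (1, -1)$, the same routine raises an error, which is the vanishing-denominator case described above.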

For a perturbation where only finitely many Fourier coefficients $h_k$ are nonzero,
the construction above excludes only a finite number of resonant frequencies
(i.e., those with $k \cdot \omega(b) = 0$ for a $k \in \mathbb{Z}^d$ with $h_k \ne 0$) and small neighbourhoods
around them. For $\omega(b)$ outside these neighbourhoods and for $\varphi$ on a complex
neighbourhood of $\mathbb{T}^d$, we obtain for the Hamiltonian in the new variables

$$K(b, \varphi) = H_0(b) + \varepsilon \overline{H}_1(b) + O(\varepsilon^2) .$$

In the general case, we can approximate the perturbation $H_1$ up to $O(\varepsilon^2)$ by a
trigonometric polynomial. For analytic $H_1$, the Fourier coefficients $h_k$ decay exponentially
with $|k| = \sum_i |k_i|$, and hence the required degree $m$ of the approximating
trigonometric polynomial grows logarithmically with $\varepsilon$, i.e., $m \sim |\log \varepsilon|$.
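The logarithmic growth of the cut-off degree can be made concrete in the simplest setting. The sketch assumes a single angle ($d = 1$) and a hypothetical decay bound $|h_k| \le M e^{-\sigma |k|}$; the constants `M` and `sigma` and the function name are assumptions for this illustration, not quantities from the text.

```python
import math

def cutoff_degree(eps, M=1.0, sigma=1.0):
    """Smallest m >= 0 with sum_{|k| > m} M*exp(-sigma*|k|) <= eps**2 (case d = 1)."""
    # Geometric tail: sum_{|k| > m} M e^{-sigma |k|} = tail_factor * exp(-sigma*(m+1)).
    tail_factor = 2.0 * M / (1.0 - math.exp(-sigma))
    m = math.ceil(math.log(tail_factor / eps**2) / sigma - 1.0)
    return max(m, 0)

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, cutoff_degree(eps))
```

Each reduction of $\varepsilon$ by a fixed factor adds a fixed number of Fourier modes, which is the behaviour $m \sim |\log \varepsilon|$.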

As $\varepsilon \to 0$, the remainder term is under control only for those frequencies
$\omega = \omega(b)$ for which the exponentially decaying Fourier coefficients $h_k$ of the perturbation
decay faster than the denominators $ik \cdot \omega$ with growing $|k|$. This is certainly
the case for frequencies satisfying Siegel's diophantine condition (or strong non-resonance
condition, as it is sometimes called)

$$|k \cdot \omega| \ge \gamma\,|k|^{-\nu} , \quad k \in \mathbb{Z}^d,\ k \ne 0 \tag{2.4}$$

for some positive constants $\gamma$, $\nu$. (Here again, $|k| = \sum_i |k_i|$.) If $\nu > d - 1$, the set of
frequencies in a fixed ball that do not satisfy (2.4) has Lebesgue measure bounded
by $\mathrm{Const} \cdot \gamma$ (Exercise 5). Therefore, almost all frequencies satisfy (2.4) for some
$\gamma > 0$. However, for any $\gamma$ and $\nu$, the complementary set is open and dense in $\mathbb{R}^d$.
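Condition (2.4) can be explored numerically by computing the largest admissible $\gamma$ over a finite range of $k$. The sketch below is only a finite-range experiment (it cannot certify the condition for all $k$); the function name and the choice of the golden-mean frequency vector are assumptions made for illustration.

```python
import itertools
import numpy as np

def diophantine_constant(omega, nu, kmax):
    """min over 0 < |k| <= kmax of |k . omega| * |k|**nu, with |k| = sum_i |k_i|."""
    d = len(omega)
    best = np.inf
    for k in itertools.product(range(-kmax, kmax + 1), repeat=d):
        norm = sum(abs(ki) for ki in k)
        if 0 < norm <= kmax:
            best = min(best, abs(np.dot(k, omega)) * norm**nu)
    return best

golden = (np.sqrt(5.0) - 1.0) / 2.0
print(diophantine_constant((1.0, golden), nu=1, kmax=40))   # stays bounded away from 0
print(diophantine_constant((1.0, 0.5), nu=1, kmax=10))      # resonant: k = (1, -2)
```

For the golden-mean frequencies the computed minimum does not decrease as `kmax` grows, whereas any rationally dependent pair gives zero as soon as the resonant $k$ is within range.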

X.2.2 Lindstedt-Poincare Series 

... for M. Lindstedt's method to be applicable, either in its original
form or in the one I have subsequently given it, it is necessary that, to
first approximation, the mean motions not be linked by any linear
relation with integer coefficients; ...

It thus seems permissible to conclude that the series (...) do not converge.
However, the preceding reasoning does not suffice to establish this point
with complete rigour. (H. Poincaré 1893, pp. vi, 103.)

Fig. 2.1. Henri Poincaré (left), born: 29 April 1854 in Nancy (France), died: 17 July 1912
in Paris; Anders Lindstedt (right), born: 27 June 1854 in Sundborn (Sweden), died: 1939.
Reproduced with permission of Bibl. Math. Univ. Genève


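The above construction is extended without any additional difficulty to arbitrary
finite order in $\varepsilon$. The generating function is now sought in the form

$$S(b, \theta) = b \cdot \theta + \varepsilon S_1(b, \theta) + \varepsilon^2 S_2(b, \theta) + \dots + \varepsilon^{N-1} S_{N-1}(b, \theta) \tag{2.5}$$

and, as before, the requirement that the first $N$ terms in the $\varepsilon$-expansion of the
Hamiltonian in the new variables be independent of the angles leads via a Taylor
expansion of the Hamiltonian to equations of the form (2.2) for $S_1, \dots, S_{N-1}$:

$$\omega(b) \cdot \frac{\partial S_j}{\partial\theta}(b, \theta) + K_j(b, \theta) = \overline{K}_j(b) \tag{2.6}$$

where $K_1 = H_1$,

$$K_2 = \frac12\,\frac{\partial^2 H_0}{\partial a^2}\Bigl(\frac{\partial S_1}{\partial\theta}, \frac{\partial S_1}{\partial\theta}\Bigr) + \frac{\partial H_1}{\partial a} \cdot \frac{\partial S_1}{\partial\theta} ,$$

and in general, $K_j$ is a sum of terms

$$\frac{1}{i!}\,\frac{\partial^i H_{k_0}}{\partial a^i}\Bigl(\frac{\partial S_{k_1}}{\partial\theta}, \dots, \frac{\partial S_{k_i}}{\partial\theta}\Bigr) \quad\text{with } k_0 + k_1 + \dots + k_i = j .$$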
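As a check of the expression for $K_2$, the Taylor expansion can be written out to second order. The following worked step assumes $N = 3$, i.e. $a = b + \varepsilon\,\partial S_1/\partial\theta + \varepsilon^2\,\partial S_2/\partial\theta$, and omits the arguments $(b, \theta)$:

```latex
\begin{align*}
H(a,\theta)
 &= H_0(b) + \varepsilon\,\omega(b)\cdot\frac{\partial S_1}{\partial\theta}
    + \varepsilon^2\,\omega(b)\cdot\frac{\partial S_2}{\partial\theta}
    + \frac{\varepsilon^2}{2}\,\frac{\partial^2 H_0}{\partial a^2}
      \Bigl(\frac{\partial S_1}{\partial\theta},\frac{\partial S_1}{\partial\theta}\Bigr) \\
 &\quad + \varepsilon H_1 + \varepsilon^2\,\frac{\partial H_1}{\partial a}\cdot
      \frac{\partial S_1}{\partial\theta} + O(\varepsilon^3) \\
 &= H_0(b) + \varepsilon\Bigl\{\omega(b)\cdot\frac{\partial S_1}{\partial\theta} + K_1\Bigr\}
    + \varepsilon^2\Bigl\{\omega(b)\cdot\frac{\partial S_2}{\partial\theta} + K_2\Bigr\}
    + O(\varepsilon^3).
\end{align*}
```

The two contributions to $K_2$ correspond to the index choices $(i; k_0, k_1, k_2) = (2; 0, 1, 1)$ and $(i; k_0, k_1) = (1; 1, 1)$ in the general term, both with $k_0 + k_1 + \dots + k_i = 2$.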


The function $\overline{K}_j$ denotes again the angular average of $K_j$. These equations can be
formally solved in the case of rationally independent frequencies. The Hamiltonian
in the new variables is then

$$K(b, \varphi) = H_0(b) + \varepsilon \overline{K}_1(b) + \varepsilon^2 \overline{K}_2(b) + \dots + \varepsilon^{N-1} \overline{K}_{N-1}(b) + \varepsilon^N R_N(b, \theta) . \tag{2.7}$$

The possible convergence of the series for $N \to \infty$ is a delicate issue that was
not resolved conclusively by Poincaré (1893) in his chapter on "Divergence des
séries de M. Lindstedt". If for some $b^*$, the series (2.5) together with its partial
derivatives converged as $N \to \infty$, then $\{b = b^*,\ \varphi \in \mathbb{T}^d\}$ would be an invariant
torus of the perturbed Hamiltonian system. However, it was not until Kolmogorov
(1954) that the existence of invariant tori - for diophantine frequencies - was found,
using a different construction. A direct proof of the convergence of the series of
classical perturbation theory for diophantine frequencies was obtained only in 1988
by Eliasson (published in 1996); also see Giorgilli & Locatelli (1997) and references
therein.

Nevertheless, already the truncated series (2.5) leads in a rather simple way to
strong conclusions about the flow over long time scales when it is combined with the
idea of approximating the Hamiltonian by a trigonometric polynomial: the "ultraviolet
cut-off", an idea briefly addressed by Poincaré (1893), p. 98f., and taken to
its full bearing by Arnold (1963) in his proof of the KAM theorem. We formulate a
lemma for a fixed truncation index $N$. Here, $\omega_{\varepsilon,N}(b)$ denotes the derivative of the
truncated series (2.7) with respect to $b$.

Lemma 2.1. Suppose that $\omega(b^*)$ satisfies the diophantine condition (2.4). For any
fixed $N \ge 2$, there are positive constants $\varepsilon_0, c, C$ such that the following holds for
$\varepsilon \le \varepsilon_0$: there exists a real-analytic symplectic change of coordinates $(a, \theta) \mapsto (b, \varphi)$
such that every solution $(b(t), \varphi(t))$ of the perturbed system in the new coordinates,
starting with $\|b(0) - b^*\| \le c\,|\log \varepsilon|^{-\nu-1}$, satisfies

$$\|b(t) - b(0)\| \le C\,t\,\varepsilon^N \quad\text{for } t \le \varepsilon^{-N+1} ,$$

$$\|\varphi(t) - \omega_{\varepsilon,N}(b(0))\,t - \varphi(0)\| \le C\,\bigl(t^2 + t\,|\log \varepsilon|^{\nu+1}\bigr)\,\varepsilon^N \quad\text{for } t^2 \le \varepsilon^{-N+1} .$$

Moreover, the transformation is $O(\varepsilon)$-close to the identity: $\|(a, \theta) - (b, \varphi)\| \le C\varepsilon$
holds for $(a, \theta)$ and $(b, \varphi)$ related by the above coordinate transform, for $\|b - b^*\| \le
c\,|\log \varepsilon|^{-\nu-1}$ and for $\varphi$ in an $\varepsilon$-independent complex neighbourhood of $\mathbb{T}^d$.

The constants $\varepsilon_0, c, C$ depend on $N$, $d$, $\gamma$, $\nu$ and on bounds of $H_0$ and $H_1$ on a
complex neighbourhood of $\{b^*\} \times \mathbb{T}^d$.

Proof. Using the relations (2.3) and their analogues for (2.6), it is a straightforward
but somewhat tedious exercise to show that at the given particular $b^*$, the functions
$K_j(b^*, \cdot)$, $S_j(b^*, \cdot)$ are all analytic on the same complex neighbourhood of $\mathbb{T}^d$, and
that the remainder term is bounded by

$$|R_N(b^*, \theta)| \le C = C(N, d, \gamma, \nu)$$

for all $\theta$ in a complex neighbourhood of $\mathbb{T}^d$ which is independent of $\varepsilon$. Here, $C$
depends in addition on the bound of $H_1$ on a complex neighbourhood of $\{b^*\} \times \mathbb{T}^d$,
or what amounts to the same by Cauchy's estimates, on bounds of the exponential
decay of the Fourier coefficients $h_k$ of $H_1$. (In case of doubt, see also Sect. X.4 for
explicit estimates.)

Assume first that $H_1(b, \theta)$ is a trigonometric polynomial in $\theta$ of degree $m$. Then
$K_j$, $S_j$ are trigonometric polynomials of degree $jm$. Since $|k \cdot \omega(b)| \ge |k \cdot \omega(b^*)| -
|k|\,(\max \|\omega'\|)\,\|b - b^*\|$, there is a $\delta > 0$ such that

$$|k \cdot \omega(b)| \ge \tfrac12\,\gamma\,|k|^{-\nu} \quad\text{for } \|b - b^*\| \le \delta,\ |k| \le Nm .$$

This number $\delta$ is proportional to $\gamma\,(Nm)^{-\nu-1}$. Consequently, since the construction
involves only the trigonometric polynomials $K_j$, $S_j$ of degree up to $Nm$, the above
estimate for the remainder term $R_N$ holds also for $\|b - b^*\| \le \delta$. To approximate
a general analytic $H_1$ by trigonometric polynomials up to $O(\varepsilon^N)$, we must choose
the degree $m$ proportional to $|\log \varepsilon^N|$. With the choice $\delta = c\,(N^2 |\log \varepsilon|)^{-\nu-1}$,
for a sufficiently small $c > 0$ independent of $\varepsilon$ (and $N$), the above bound for the
remainder $R_N(b, \theta)$ is then valid for $b$ in the complex ball $\|b - b^*\| \le 2\delta$ and for
$\varphi$ in a complex neighbourhood of $\mathbb{T}^d$ (which depends only on $N$). By Cauchy's
estimates, this implies

$$\Bigl\|\frac{\partial R_N}{\partial\theta}(b, \theta)\Bigr\| \le C , \quad \Bigl\|\frac{\partial R_N}{\partial b}(b, \theta)\Bigr\| \le \frac{C}{\delta}$$

for $\|b - b^*\| \le \delta$ and $\theta \in \mathbb{T}^d$. Hence, as long as $\|b(t) - b^*\| \le \delta$, the Hamiltonian
differential equations are of the form

$$\dot b = -\frac{\partial K}{\partial\varphi} = O(\varepsilon^N) , \quad \dot\varphi = \frac{\partial K}{\partial b} = \omega_{\varepsilon,N}(b) + O(\varepsilon^N/\delta) .$$

This implies the result. □

Hence, the tori $\{b = b(0),\ \varphi \in \mathbb{T}^d\}$ are nearly invariant over a time scale
$\varepsilon^{-N+1}$, and the flow is close to a quasiperiodic flow over times bounded by the
square root of $\varepsilon^{-N+1}$. Lemma 2.1 is just a preliminary to more substantial results
(which hold under appropriate additional conditions): invariant tori carrying a quasiperiodic
flow with diophantine frequencies persist under small Hamiltonian perturbations
(Kolmogorov 1954); every solution of the perturbed system remains close,
within a positive power of $\varepsilon$, to some torus over times that are exponentially long in
a negative power of $\varepsilon$ (Nekhoroshev 1977); solutions starting close to an invariant
torus with diophantine frequencies stay within twice the initial distance over time
intervals that are exponentially long in a negative power of the distance (Perry &
Wiggins 1994) or even exponentially long in the exponential of the inverse of the
distance (Morbidelli & Giorgilli 1995).

The symplectic transformations of this subsection were constructed using the
mixed-variable generating function $S(b, \theta)$. As was pointed out for example by
Benettin, Galgani & Giorgilli (1985), rigorous estimates for the remainder terms
are often obtained in a simpler way using the Lie method, which involves constructing
the near-identity symplectic transformation as the time-$\varepsilon$ flow of some auxiliary
Hamiltonian system with a suitably defined Hamiltonian $\chi(b, \varphi)$. As before, the
condition that the Hamiltonian $H(a, \theta) = K(b, \varphi)$ should depend on $\varphi$ only in
higher-order terms leads to equations of the form (2.2), now for $\chi$ instead of $S_1$.
We will use such a construction in the following subsection.


X.2.3 Kolmogorov's Iteration

It is easy to grasp the meaning of Theorem 1 for mechanics. It indicates 
that an s-parametric family of conditionally periodic motions [...] cannot, 
under conditions (3) and (4) [here: (2.4) and (2.9)], disappear as a result 
of a small change in the Hamilton function H. 

In this note we confine ourselves to the construction of the transforma¬ 
tion. (A.N. Kolmogorov 1954) 

For the completely integrable Hamiltonian $H_0(a)$, the phase space is foliated into
invariant tori parametrized by $a$. We now fix one such torus $\{a = a^*,\ \theta \in \mathbb{T}^d\}$ with
strongly diophantine frequencies $\omega = \omega(a^*)$. Without loss of generality, we may assume
$a^* = 0$. This particular torus is invariant under the flow of every Hamiltonian
$H(a, \theta)$ for which the linear terms in the Taylor expansion with respect to $a$ at $0$ are
independent of $\theta$:

$$H(a, \theta) = c + \omega \cdot a + \tfrac12\,a^T M(a, \theta)\,a \tag{2.8}$$

with $c \in \mathbb{R}$, $\omega \in \mathbb{R}^d$ and a real symmetric $d \times d$-matrix $M(a, \theta)$ analytic in its
arguments. Since the Hamiltonian equations are of the form

$$\dot a = O(\|a\|^2) , \quad \dot\theta = \omega + O(\|a\|) ,$$

the torus $\{a = 0,\ \theta \in \mathbb{T}^d\}$ is invariant and the flow on it is quasi-periodic with
frequencies $\omega$.

Consider now an analytic perturbation of such a Hamiltonian: $H(a, \theta) + \varepsilon G(a, \theta)$
with a small $\varepsilon$. Kolmogorov (1954) found a near-identity symplectic transformation
$(a, \theta) \mapsto (\tilde a, \tilde\theta)$, constructed by an iterative procedure, such that the perturbed
Hamiltonian in the new variables is again of the form (2.8) with the same $\omega$, and
hence has the invariant torus $\{\tilde a = 0,\ \tilde\theta \in \mathbb{T}^d\}$ carrying a quasi-periodic flow with
the frequencies of the unperturbed system. This holds under the conditions that $\omega$
satisfies the diophantine condition (2.4), and that the angular average

$$\overline{M}_0 := \frac{1}{(2\pi)^d}\int_{\mathbb{T}^d} M(0, \theta)\,d\theta \quad\text{is an invertible matrix.} \tag{2.9}$$

Here we describe the iterative construction of this symplectic transformation. The
proof of convergence of the iteration will be given in Sect. X.5.

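We construct a symplectic transformation $(a, \theta) \mapsto (b, \varphi)$ as the time-$\varepsilon$ flow of
an auxiliary Hamiltonian of the form

$$\chi(b, \varphi) = \xi \cdot \varphi + \chi_0(\varphi) + \sum_{i=1}^{d} b_i\,\chi_i(\varphi) , \tag{2.10}$$

where $\xi \in \mathbb{R}^d$ is a constant vector, and $\chi_0, \chi_1, \dots, \chi_d$ are $2\pi$-periodic functions.
(Quadratic and higher-order terms in $b$ play no role in the construction and are therefore
omitted right at the outset.) The old and new coordinates are then related by

$$a = b + \varepsilon\,\frac{\partial\chi}{\partial\varphi}(b, \varphi) + O(\varepsilon^2) , \quad \theta = \varphi - \varepsilon\,\frac{\partial\chi}{\partial b}(b, \varphi) + O(\varepsilon^2) .$$

We insert this into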

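$$H(a, \theta) + \varepsilon G(a, \theta) = c + \omega \cdot b + \tfrac12\,b^T M(b, \varphi)\,b + \varepsilon\Bigl\{\omega \cdot \frac{\partial\chi}{\partial\varphi}(b, \varphi) + b^T M(b, \varphi)\,\frac{\partial\chi}{\partial\varphi}(b, \varphi) + G(b, \varphi)\Bigr\} + O(\varepsilon\|b\|^2) + O(\varepsilon^2) .$$

We now require that the term in curly brackets be $\mathrm{Const} + O(\|b\|^2)$. Writing down
the Taylor expansion

$$G(b, \varphi) = G_0(\varphi) + \sum_{i=1}^{d} b_i\,G_i(\varphi) + b^T Q(b, \varphi)\,b$$

and inserting the above ansatz for $\chi$, this condition becomes

$$\omega \cdot \frac{\partial\chi_0}{\partial\varphi}(\varphi) + \sum_{i=1}^{d} b_i\Bigl(\omega \cdot \frac{\partial\chi_i}{\partial\varphi}(\varphi) + u_i(\varphi) + v_i(\varphi)\Bigr) + G_0(\varphi) + \sum_{i=1}^{d} b_i\,G_i(\varphi) = \mathrm{Const.} , \tag{2.11}$$

where $u = (u_1, \dots, u_d)^T$ and $v = (v_1, \dots, v_d)^T$ are defined by

$$u(\varphi) = M(0, \varphi)\,\xi , \tag{2.12}$$

$$v(\varphi) = M(0, \varphi)\,\frac{\partial\chi_0}{\partial\varphi}(\varphi) . \tag{2.13}$$

The condition is fulfilled if

$$\omega \cdot \frac{\partial\chi_0}{\partial\varphi}(\varphi) + G_0(\varphi) = \overline{G}_0 , \tag{2.14}$$

$$\omega \cdot \frac{\partial\chi_i}{\partial\varphi}(\varphi) + u_i(\varphi) + v_i(\varphi) + G_i(\varphi) = \overline{u}_i + \overline{v}_i + \overline{G}_i , \tag{2.15}$$

$$\overline{u}_i + \overline{v}_i + \overline{G}_i = 0 . \tag{2.16}$$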


Here the bars again denote angular averages. Note that equations (2.14), (2.15) are
of the form (2.2). Equation (2.14) determines $\chi_0$ and hence $v = (v_1, \dots, v_d)^T$ by
(2.13). Equations (2.16) then give $\overline{u} = (\overline{u}_1, \dots, \overline{u}_d)^T$. By (2.12), we need

$$\overline{u} = \overline{M}_0\,\xi ,$$

which determines $\xi$ uniquely because $\overline{M}_0$ is assumed to be invertible. Equation
(2.12) then yields $u = (u_1, \dots, u_d)^T$. Finally, (2.15) determines $\chi_1, \dots, \chi_d$, and
the construction of $\chi(b, \varphi)$ is complete. In the new variables $(b, \varphi)$, the perturbed
Hamiltonian then takes the form

$$H(a, \theta) + \varepsilon G(a, \theta) = \tilde c + \omega \cdot b + \tfrac12\,b^T \widetilde{M}(b, \varphi)\,b + \varepsilon^2\,\widetilde{G}(b, \varphi) \tag{2.17}$$

with unchanged frequencies $\omega$ and with $\widetilde{M}(b, \varphi) = M(b, \varphi) + O(\varepsilon)$. The perturbation
to the form (2.8) is thus reduced from $O(\varepsilon)$ to $O(\varepsilon^2)$. The iteration of this
procedure turns out to be convergent, see Sect. X.5. This finally yields a symplectic
change of coordinates that transforms the perturbed Hamiltonian to the form (2.8).
The perturbed system thus has an invariant torus carrying a quasi-periodic flow with
frequencies $\omega$, a KAM torus, as it is named after Kolmogorov, Arnold and Moser.


X.2.4 Birkhoff Normalization Near an Invariant Torus 

KAM tori are very sticky. 
(A.D. Perry & S. Wiggins 1994) 

In this subsection we describe a transformation studied by Pöschel (1993) and Perry
& Wiggins (1994) for systems with Hamiltonian in the Kolmogorov form (2.8) in a
neighbourhood of the invariant torus $\{a = 0,\ \theta \in \mathbb{T}^d\}$. This transformation is an
analogue of a transformation of Birkhoff (1927) for Hamiltonian systems near an
elliptic stationary point.

The symplectic change of coordinates $(a, \theta) \mapsto (b, \varphi)$ considered here transforms
a Hamiltonian (2.8) with diophantine frequencies $\omega$ to the form $H(a, \theta) =
K_N(b) + O(\|b\|^N)$ for arbitrary $N$, or more precisely, the Hamiltonian in the new
variables, $H_N(b, \varphi) = H(a, \theta)$, is of the form

$$H_N(b, \varphi) = \omega \cdot b + Z_N(b) + R_N(b, \varphi) \tag{2.18}$$

with $Z_N(b) = O(\|b\|^2)$ and $R_N(b, \varphi) = O(\|b\|^N)$. (We have taken the irrelevant
constant term in (2.8) as $c = 0$.) The equations of motion then take the form

$$\dot b = O(\|b\|^N) , \quad \dot\varphi = \omega + O(\|b\|) .$$

Therefore, in these variables $\{b = 0,\ \varphi \in \mathbb{T}^d\}$ is an invariant torus, and for sufficiently
small $r$,

$$\|b(0)\| \le r \quad\text{implies}\quad \|b(t)\| \le 2r \quad\text{for } t \le C_N\,r^{-N+1} .$$

A judicious choice of $N$ even yields time intervals that are exponentially long in a
negative power of $r$ on which solutions starting at a distance $r$ stay within twice the
initial distance (Perry & Wiggins 1994). Motion away from the torus can thus be
only very slow.
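The time estimate follows from the equation for $b$ by a short bootstrap argument. The worked step below assumes that the $O(\|b\|^N)$ term is bounded by $c_N \|b\|^N$ for $\|b\| \le 2r$, where $c_N$ is a constant name introduced here for illustration:

```latex
\begin{align*}
\|b(t)\| &\le \|b(0)\| + \int_0^t \|\dot b(s)\|\,ds
          \le r + c_N\,(2r)^N\,t
          && \text{as long as } \|b(s)\| \le 2r \text{ for } 0 \le s \le t, \\
r + c_N\,(2r)^N\,t &\le 2r
          && \text{whenever } t \le C_N\,r^{-N+1}, \quad C_N := \frac{1}{2^N c_N}.
\end{align*}
```

By continuity of $t \mapsto \|b(t)\|$, the solution therefore cannot leave the ball of radius $2r$ before the time $C_N\,r^{-N+1}$.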

The normal form (2.18) is constructed iteratively. Each iteration step is very
similar to the procedure in Sect. X.2.1, where now the distance to the torus plays the
role of the small parameter. Consider a Hamiltonian

$$H(a, \theta) = \omega \cdot a + Z(a) + R(a, \theta)$$

where $Z(a) = O(\|a\|^2)$ and $R(a, \theta) = O(\|a\|^k)$ for some $k \ge 2$ in a complex
neighbourhood of $\{0\} \times \mathbb{T}^d$. We construct a symplectic change of coordinates
$(a, \theta) \mapsto (b, \varphi)$ via a generating function $b \cdot \theta + S(b, \theta)$ as

$$a = b + \frac{\partial S}{\partial\theta}(b, \theta) , \quad \varphi = \theta + \frac{\partial S}{\partial b}(b, \theta) .$$

We expand (omitting the arguments $(b, \theta)$ in $\partial S/\partial\theta$ and $\partial H/\partial a$)

$$H\Bigl(b + \frac{\partial S}{\partial\theta}, \theta\Bigr) = H(b, \theta) + \frac{\partial H}{\partial a} \cdot \frac{\partial S}{\partial\theta} + Q(b, \theta) = \omega \cdot b + Z(b) + \Bigl\{R(b, \theta) + \frac{\partial H}{\partial a} \cdot \frac{\partial S}{\partial\theta}\Bigr\} + Q(b, \theta) ,$$

where $|Q(b, \theta)| \le \mathrm{Const.}\,\|\partial S/\partial\theta\|^2$. Since $\partial H/\partial a = \omega + O(\|b\|)$ at $(b, \theta)$, we can make
the expression in curly brackets independent of $\theta$ up to $O(\|b\|^{k+1})$ by determining
$S$ from the equation of the form (2.2):

$$\omega \cdot \frac{\partial S}{\partial\theta}(b, \theta) + R(b, \theta) = \overline{R}(b) .$$

For diophantine frequencies $\omega$, we obtain $S(b, \theta) = O(\|b\|^k)$ on a (reduced) complex
neighbourhood of $\{0\} \times \mathbb{T}^d$ from the corresponding estimate for $R(b, \theta)$. It follows
that the above symplectic transformation with generating function $b \cdot \theta + S(b, \theta)$
is well-defined for small $\|b\|$, and the Hamiltonian in the new variables, $\widetilde{H}(b, \varphi) =
H(a, \theta)$, becomes

$$\widetilde{H}(b, \varphi) = \omega \cdot b + \widetilde{Z}(b) + \widetilde{R}(b, \varphi)$$

with $\widetilde{Z}(b) = Z(b) + \overline{R}(b)$ and

$$\widetilde{R}(b, \varphi) = \Bigl(\frac{\partial H}{\partial a}(b, \theta) - \omega\Bigr) \cdot \frac{\partial S}{\partial\theta}(b, \theta) + Q(b, \theta) = O(\|b\|^{k+1}) ,$$

so that the order in $b$ of the remainder term is augmented by 1. The procedure can
be iterated, but unlike the iteration of the preceding subsection, this iteration is in
general divergent. Nevertheless, a suitable finite termination yields remainder terms
that are exponentially small in a positive power of $r$ for $\|b\| \le r$, by arguments
similar to those of Sect. X.4.


X.3 Linear Error Growth and Near-Preservation 
of First Integrals 

In the remaining part of this chapter we study the long-time behaviour of symplectic 
discretizations of integrable and near-integrable Hamiltonian systems. While here 
we will be concerned with general symplectic methods, it should be noted that some 
integrable problems admit integrable discretizations; see Suris (2003). 

In this section we are concerned with the error growth of symplectic numerical
methods and their approximate preservation of first integrals. A preliminary analysis
of linear error growth for the Kepler problem was first given by Calvo & Sanz-Serna
(1993). Using backward error analysis and KAM theory, Calvo & Hairer (1995a)
then showed linear error growth of symplectic methods applied to integrable systems
when the frequencies at the initial value satisfy a diophantine condition (2.4).
Here we give such a result under milder conditions on the initial values, combining
backward error analysis and Lemma 2.1. We derive also a first result on the long-time
near-preservation of all first integrals, which will be extended to exponentially
long times in Sections X.4.3 and X.5.2 (under stronger assumptions on the starting
values), and perpetually in Sect. X.6 (only for a Cantor set of step sizes).

Figure 3.1 illustrates the linear error growth of the symplectic Störmer-Verlet
method, as opposed to the quadratic error growth for the classical fourth-order
Runge-Kutta method, on the example of the Toda lattice. The same number of function
evaluations was used for both methods.

Fig. 3.1. Euclidean norm of the global error for the Störmer-Verlet scheme (step size $h =
0.02$) and the classical Runge-Kutta method of order 4 (step size $h = 0.08$) applied to the
Toda lattice with $n = 3$ and initial values as in Fig. 1.3


We consider a completely integrable Hamiltonian system (usually not given in
action-angle variables)

$$\dot p = -\frac{\partial H}{\partial q}(p, q) , \quad \dot q = \frac{\partial H}{\partial p}(p, q) \tag{3.1}$$

and apply to it a symplectic numerical method with step size $h$, yielding a numerical
solution sequence $(p_n, q_n)$. We assume that the Hamiltonian is real-analytic and that
the conditions of the Arnold-Liouville theorem, Theorem 1.6, are fulfilled. Consider
the symplectic transformation $(p, q) = \psi(a, \theta)$ to action-angle variables. We denote
the inverse transformation as

$$(a, \theta) = \bigl(I(p, q), \Theta(p, q)\bigr) . \tag{3.2}$$

We recall that the components $I_1, \dots, I_d$ of $I = (I_i)$ are first integrals of the system:
$I(p(t), q(t)) = I(p_0, q_0)$ for all $t$. In the action-angle variables, the Hamiltonian is
$\mathcal{H}(a) = H(p, q)$, and we denote the frequencies

$$\omega(a) = \frac{\partial \mathcal{H}}{\partial a}(a) . \tag{3.3}$$

We consider this in a neighbourhood of some $a^* \in \mathbb{R}^d$.

Theorem 3.1. Consider applying a symplectic numerical integrator of order $p$ to
the completely integrable Hamiltonian system (3.1). Suppose that $\omega(a^*)$ satisfies
the diophantine condition (2.4). Then, there exist positive constants $C$, $c$ and $h_0$
such that the following holds for all step sizes $h \le h_0$: every numerical solution
starting with $\|I(p_0, q_0) - a^*\| \le c\,|\log h|^{-\nu-1}$ satisfies

$$\|(p_n, q_n) - (p(t), q(t))\| \le C\,t\,h^p , \quad \|I(p_n, q_n) - I(p_0, q_0)\| \le C\,h^p \quad\text{for } t = nh \le h^{-p} .$$

The constants $h_0, c, C$ depend on $d$, $\gamma$, $\nu$, on bounds of the real-analytic Hamiltonian
$H$ on a complex neighbourhood of the torus $\{(p, q)\,;\ I(p, q) = a^*\}$, and on the
numerical method.


Proof. (a) In the action-angle variables $(a, \theta)$, the exact flow is given as

$$a(t) = a(0) , \quad \theta(t) = \omega(a(0))\,t + \theta(0) . \tag{3.4}$$

By Theorem IX.3.1 (and Theorem IX.1.2), the truncated modified equation of the
numerical method is Hamiltonian with$^1$

$$\widetilde{H}(p, q) = H(p, q) + h^p H_{p+1}(p, q) + \dots + h^r H_{r+1}(p, q) .$$

We choose $r = 2p$, and we denote by $(\tilde p(t), \tilde q(t))$ the solution of the modified equations
with initial values $(p_0, q_0)$. In the variables $(a, \theta)$, the modified Hamiltonian
becomes $\widetilde{H}(p, q) = \widetilde{\mathcal{H}}(a, \theta)$ with

$$\widetilde{\mathcal{H}}(a, \theta) = \mathcal{H}(a) + \varepsilon\,G_h(a, \theta) , \tag{3.5}$$

where $\varepsilon = h^p$ and the perturbation function $G_h$ is bounded independently of $h$ on
a complex neighbourhood of $\{a^*\} \times \mathbb{T}^d$. By Lemma 2.1 with $\varepsilon = h^p$ and $N \ge 3$,
there is a symplectic change of coordinates $O(h^p)$-close to the identity, such that
the solution of the modified equation in the new variables $(b, \varphi)$ is of the form

$$b(t) = b(0) + O(t\,h^{pN}) , \quad \varphi(t) = \omega_h(b(0))\,t + \varphi(0) + O(t\,h^{pN-1} + t^2 h^{pN}) \quad\text{for } t \le h^{-p} , \tag{3.6}$$

with $\omega_h(b) = \omega(b) + O(h^p)$. The constants symbolized by the $O$-notation are independent
of $h$, of $t \le h^{-p}$ and of $(b(0), \varphi(0))$ with $\|b(0) - a^*\| \le c\,|\log h|^{-\nu-1}$.
Since the transformation between the variables $(a, \theta)$ and $(b, \varphi)$ is $O(h^p)$ close to
the identity, it follows that the flow of the modified equations in the variables $(a, \theta)$
satisfies

1 We always assume, without further mention, that the modified Hamiltonian is well-defined 
on the same open set D as the original Hamiltonian. This is true for arbitrary symplectic 
methods if D is simply connected; on general domains it is satisfied for (partitioned) 
Runge-Kutta methods and for splitting methods; see Sections IX.3 and IX.4. 





$$\tilde a(t) = \tilde a(0) + O(h^p) , \quad \tilde\theta(t) = \omega(\tilde a(0))\,t + \tilde\theta(0) + t\,\omega_1 + O(h^p) ,$$

where $\omega_1 = \omega_h(b(0)) - \omega(\tilde a(0)) = O(h^p)$ yields the dominant contribution to the
error. By comparison with (3.4) and since $\tilde a(t) = I(\tilde p(t), \tilde q(t))$, the difference between
the exact solution and the solution of the modified equation therefore satisfies

$$(\tilde p(t), \tilde q(t)) - (p(t), q(t)) = O(t\,h^p) , \quad I(\tilde p(t), \tilde q(t)) - I(p_0, q_0) = O(h^p) \quad\text{for } 1 \le t \le h^{-p} .$$

The same bounds for $t \le 1$ follow by standard error estimates.

(b) It remains to bound the difference between the solution of the modified
equation and the numerical solution. By construction of the modified equation with
$r = 2p$ and by comparison with (3.6), one step of the method is of the form
\[ b_{n+1} = b_n + O(h^{r+1}) , \qquad \varphi_{n+1} = \omega_h(b_n)\, h + \varphi_n + O(h^{r+1}) . \]
It follows that for $t = nh$,
\[ b_n = \tilde b(t) + O(t h^r) , \qquad \varphi_n = \tilde\varphi(t) + O(t^2 h^r) . \]
For $t \le h^{-p}$ and $r = 2p$, we have $t h^r \le h^p$. Hence the difference between the
numerical solution and the solution of the modified equations in the original variables
$(p, q)$ is bounded by
\[ \begin{aligned}
(p_n, q_n) - (\tilde p(t), \tilde q(t)) &= O(t h^p) \\
I(p_n, q_n) - I(\tilde p(t), \tilde q(t)) &= O(h^p)
\end{aligned} \qquad \text{for } t = nh \le h^{-p} . \]
Together with the bound of part (a) this gives the result. □

Remark 3.2. The linear error growth holds also when the symplectic method is
applied to a perturbed integrable system with a perturbation parameter $\varepsilon$ bounded
by a positive power of the step size: $\varepsilon \le K h^{\alpha}$ for some $\alpha > 0$. The proof of this
generalization is the same as above, except that possibly a larger $N$ is required in
using Lemma 2.1.

Example 3.3 (Linear Error Growth for the Kepler Problem). From Example 1.10
we know that for the Kepler problem the frequencies (1.16) do not satisfy
the diophantine condition (2.4). Nevertheless we observed a linear error growth
for symplectic methods in the experiments of Fig. I.2.3 (see also Table I.2.1). This
can be explained as follows: in action-angle variables the Hamiltonian of the
Kepler problem is $\mathcal H(a_1, a_2)$, where $a_2 = L$ is the angular momentum. Since the
angular momentum is a quadratic invariant that is exactly conserved by symplectic
integrators such as symplectic partitioned Runge-Kutta methods, the modified
Hamiltonian
\[ \widetilde{\mathcal H}(a,\theta) = \mathcal H(a_1, a_2) + \varepsilon\, G_h(a_1, a_2, \theta_1) \]
does not depend on the angle variable $\theta_2$ (see Corollary IX.5.3). As in the proof of
Lemma 2.1 we average out the angle $\theta_1$ up to a certain power of $\varepsilon$. Since we are
concerned here with one degree of freedom, the diophantine condition is trivially
satisfied, and we can conclude as in Theorem 3.1.
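
The exact conservation of the angular momentum used in this example can be observed numerically. The following is a minimal sketch, not taken from the text: it assumes the Störmer-Verlet scheme as a representative symplectic partitioned Runge-Kutta method, the Kepler Hamiltonian $H = \tfrac12 |p|^2 - 1/|q|$, and an eccentricity of 0.6 chosen only for illustration. All function names are hypothetical.

```python
import math

def kepler_force(q):
    """Force -q/|q|^3 of the Kepler problem with H = |p|^2/2 - 1/|q|."""
    r3 = (q[0] ** 2 + q[1] ** 2) ** 1.5
    return (-q[0] / r3, -q[1] / r3)

def stormer_verlet(p, q, h, n):
    """Apply n Stormer-Verlet steps (half kick, drift, half kick)."""
    for _ in range(n):
        f = kepler_force(q)
        p = (p[0] + 0.5 * h * f[0], p[1] + 0.5 * h * f[1])
        q = (q[0] + h * p[0], q[1] + h * p[1])
        f = kepler_force(q)
        p = (p[0] + 0.5 * h * f[0], p[1] + 0.5 * h * f[1])
    return p, q

def energy(p, q):
    return 0.5 * (p[0] ** 2 + p[1] ** 2) - 1.0 / math.hypot(q[0], q[1])

def angular_momentum(p, q):
    return q[0] * p[1] - q[1] * p[0]

# Initial values on an ellipse of eccentricity e (illustrative choice).
e = 0.6
q0 = (1.0 - e, 0.0)
p0 = (0.0, math.sqrt((1.0 + e) / (1.0 - e)))

p, q = stormer_verlet(p0, q0, h=0.01, n=2000)
drift_L = abs(angular_momentum(p, q) - angular_momentum(p0, q0))
error_H = abs(energy(p, q) - energy(p0, q0))
```

In this sketch `drift_L` stays at the level of round-off, while `error_H` remains small but nonzero, in line with the modified Hamiltonian being independent of $\theta_2$.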
X.4 Near-Invariant Tori on Exponentially Long Times

We refine the results for the classical perturbation series of Sect. X.2.2 to yield lo¬ 
cally integrable behaviour, up to exponentially small deviations, over time intervals 
that are exponentially long in a power of the small perturbation parameter. We then 
combine this result with backward error analysis to show the near-preservation of 
invariant tori over exponentially long times in a negative power of the step size for 
symplectic integrators. We begin with the necessary technical estimates. 


X.4.1 Estimates of Perturbation Series 


We will estimate the coefficients of the perturbation series (2.5), which requires a
bound for the solution of (2.6). We use the following notation: for $\rho > 0$ and with
$\|\cdot\|$ the maximum norm on $\mathbb{R}^d$,
\[ U_\rho = \{\theta \in \mathbb{T}^d + i\mathbb{R}^d \;;\; \|\mathrm{Im}\,\theta\| < \rho\} \]
denotes the complex extension of the $d$-dimensional torus $\mathbb{T}^d$ of width $\rho$. For a
bounded analytic function $F$ on $U_\rho$, we write
\[ \|F\|_\rho = \sup_{\theta \in U_\rho} |F(\theta)| , \qquad
\Big\|\frac{\partial F}{\partial\theta}\Big\|_\rho = \sum_{j=1}^{d} \Big\|\frac{\partial F}{\partial\theta_j}\Big\|_\rho . \]

Following Arnold (1963), we prove the following bounds for the solution of the
basic partial differential equation (2.2).


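Lemma 4.1. Suppose $\omega \in \mathbb{R}^d$ satisfies the diophantine condition (2.4). Let $G$ be a
bounded real-analytic function on $U_\rho$, and let $\overline G$ denote the average of $G$ over $\mathbb{T}^d$.
Then, the equation
\[ \omega \cdot \frac{\partial F}{\partial\theta} + G = \overline G \]
has a unique real-analytic solution $F$ on $U_\rho$ with zero average $\overline F = 0$. For every
positive $\delta < \min(\rho, 1)$, $F$ is bounded on $U_{\rho-\delta}$ by
\[ \|F\|_{\rho-\delta} \le \kappa_0\, \delta^{-\alpha+1}\, \|G\|_\rho , \qquad
\Big\|\frac{\partial F}{\partial\theta}\Big\|_{\rho-\delta} \le \kappa_1\, \delta^{-\alpha}\, \|G\|_\rho , \]
where $\alpha = \nu + d + 1$ and $\kappa_0 = \gamma^{-1} 8^d 2^{\nu} \nu!$, $\kappa_1 = \gamma^{-1} 8^d 2^{\nu+1} (\nu+1)!$.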


Rüssmann (1975, 1976) has shown that the estimates hold with the optimal
exponent $\alpha = \nu + 1$ and with $\kappa_0 = 2^{d+1-\nu} (2\nu)!$ and $\kappa_1 = 2^{d-\nu} (2\nu + 2)!$. This
optimal value of $\alpha$ would yield slightly more favourable estimates in the following,
but here we content ourselves with the simpler result given above.
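Proof of Lemma 4.1. We have the Fourier series, convergent on the complex
extension $\|\mathrm{Im}\,\theta\| < \rho$,
\[ G(\theta) - \overline G = \sum_{k \ne 0} g_k\, e^{ik\cdot\theta} , \qquad F(\theta) = \sum_{k} f_k\, e^{ik\cdot\theta} \]
with Fourier coefficients $f_0 = \overline F = 0$ and
\[ f_k = -\frac{g_k}{i\, k\cdot\omega} \qquad \text{for } k \in \mathbb{Z}^d,\ k \ne 0 . \]
By Cauchy's estimates, $|g_k| \le M e^{-|k|\rho}$ with $M = \|G - \overline G\|_\rho \le 2\|G\|_\rho$ and
$|k| = \sum_i |k_i|$. It follows with (2.4) that
\[ \|F\|_{\rho-\delta} \le \sum_k |f_k|\, e^{|k|(\rho-\delta)} \le \frac{M}{\gamma} \sum_k |k|^{\nu} e^{-|k|\delta} , \]
\[ \Big\|\frac{\partial F}{\partial\theta}\Big\|_{\rho-\delta} \le \sum_k |f_k| \cdot |k|\, e^{|k|(\rho-\delta)} \le \frac{M}{\gamma} \sum_k |k|^{\nu+1} e^{-|k|\delta} . \]
It remains to bound the right-hand sums. We use the inequality $x^{\nu}/\nu! \le e^{x}$ with
$x = |k|\delta/2$ to obtain
\[ \sum_k |k|^{\nu} e^{-|k|\delta} \le 2^{\nu} \delta^{-\nu} \nu! \sum_k e^{-|k|\delta/2} . \]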

The last sum is bounded by
\[ \sum_k e^{-|k|\delta/2} = \Big(1 + 2\sum_{j=1}^{\infty} e^{-j\delta/2}\Big)^{d}
= \Big(\frac{1 + e^{-\delta/2}}{1 - e^{-\delta/2}}\Big)^{d} \le (4\delta^{-1})^{d} . \]
Taken together, the above inequalities yield the stated bound for $\|F\|_{\rho-\delta}$. The bound
for the derivative is obtained in the same way, with $\nu$ replaced by $\nu + 1$. □
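
The Fourier construction in this proof is easy to try out for a trigonometric polynomial. The sketch below is only an illustration under stated assumptions: the helper names, the frequency vector (1, golden mean) and the sample coefficients are hypothetical, and $G$ is represented by a dictionary mapping the index $k$ to $g_k$. It computes $f_k = -g_k/(i\,k\cdot\omega)$ and evaluates the residual of $\omega\cdot\partial F/\partial\theta + G = \overline G$ at one point.

```python
import cmath
import math

def solve_homological(g, omega):
    """Return the coefficients f_k = -g_k / (i k.omega) for k != 0 (f_0 = 0)."""
    f = {}
    for k, gk in g.items():
        if all(ki == 0 for ki in k):
            continue  # the average of G does not enter F
        k_dot_omega = sum(ki * wi for ki, wi in zip(k, omega))
        f[k] = -gk / (1j * k_dot_omega)
    return f

def evaluate(coeffs, theta):
    """Evaluate sum_k c_k exp(i k.theta)."""
    return sum(c * cmath.exp(1j * sum(ki * ti for ki, ti in zip(k, theta)))
               for k, c in coeffs.items())

def omega_derivative(coeffs, omega, theta):
    """Evaluate omega . dF/dtheta = sum_k i (k.omega) f_k exp(i k.theta)."""
    return sum(1j * sum(ki * wi for ki, wi in zip(k, omega)) * c
               * cmath.exp(1j * sum(ki * ti for ki, ti in zip(k, theta)))
               for k, c in coeffs.items())

omega = (1.0, (math.sqrt(5.0) - 1.0) / 2.0)   # illustrative frequencies
g = {(0, 0): 3.0, (1, 0): 0.5, (-1, 0): 0.5, (2, -1): 0.25j, (-2, 1): -0.25j}
f = solve_homological(g, omega)

theta = (0.3, 1.1)
residual = omega_derivative(f, omega, theta) + evaluate(g, theta) - g[(0, 0)]
```

The small divisors $k\cdot\omega$ appear in the denominators of `f`, which is where the diophantine condition (2.4) enters the estimates.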



The coefficients of the perturbation series (2.5) are bounded as follows. 


Lemma 4.2. Let $H_0$, $H_1$ be real-analytic and bounded by $M$ on the complex
$r$-neighbourhood $B_r(b^*)$ of $b^* \in \mathbb{R}^d$ and on $B_r(b^*) \times U_\rho$, respectively. Suppose
that $\omega(b^*) = (\partial H_0/\partial a)(b^*)$ satisfies the diophantine condition (2.4). Then, the
coefficients of the perturbation series (2.5) are bounded by
\[ \Big\|\frac{\partial S_j}{\partial\theta}(b^*, \cdot)\Big\|_{\rho/2} \le C_0\, (C_1 j^{\alpha})^{j-1} \]
for all $j \ge 1$. Here $C_0 = 2r$, and $C_1 = 128\, (\kappa_1 M / (r\rho^{\alpha}))^2$ with $\alpha$ and $\kappa_1$ of
Lemma 4.1.
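Proof. We recall from Sect. X.2.2 that $S_j$ is determined by (2.6), where $K_1 = H_1$
and for $j \ge 2$,
\[ K_j = \sum_{i=2}^{j} \sum_{k_1+\dots+k_i=j} \frac{1}{i!} \frac{\partial^i H_0}{\partial a^i}
\Big(\frac{\partial S_{k_1}}{\partial\theta}, \dots, \frac{\partial S_{k_i}}{\partial\theta}\Big)
+ \sum_{i=1}^{j-1} \sum_{k_1+\dots+k_i=j-1} \frac{1}{i!} \frac{\partial^i H_1}{\partial a^i}
\Big(\frac{\partial S_{k_1}}{\partial\theta}, \dots, \frac{\partial S_{k_i}}{\partial\theta}\Big) . \]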

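We fix an index, say $J$, set $\delta = \rho/(2J)$ and abbreviate
\[ \|K_k\|_j = \|K_k(b^*, \cdot)\|_{\rho - j\delta} \]
and similarly for $\partial S_k/\partial\theta$. By (2.6) and Lemma 4.1, we have
\[ \Big\|\frac{\partial S_j}{\partial\theta}\Big\|_{j} \le \kappa_1 \delta^{-\alpha}\, \|K_j\|_{j-1} . \]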

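We use the Cauchy estimate
\[ \Big\| \frac{1}{i!} \frac{\partial^i H_0}{\partial a^i}(b^*)\,(v_1, \dots, v_i) \Big\| \le \frac{M}{r^i}\, |v_1| \cdots |v_i| \]
(and likewise for $H_1$), where $|\cdot|$ denotes the sum norm on $\mathbb{C}^d$, and bound $\|\cdot\|_{j-1}$ by
$\|\cdot\|_k$ for $k \le j - 1$. We thus obtain from the above formula for $K_j$
\[ \|K_j\|_{j-1} \le \sum_{i=2}^{j} \sum_{k_1+\dots+k_i=j} \frac{M}{r^i}
\Big\|\frac{\partial S_{k_1}}{\partial\theta}\Big\|_{k_1} \cdots \Big\|\frac{\partial S_{k_i}}{\partial\theta}\Big\|_{k_i}
+ \sum_{i=1}^{j-1} \sum_{k_1+\dots+k_i=j-1} \frac{M}{r^i}
\Big\|\frac{\partial S_{k_1}}{\partial\theta}\Big\|_{k_1} \cdots \Big\|\frac{\partial S_{k_i}}{\partial\theta}\Big\|_{k_i} . \]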

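Combining the two bounds yields
\[ \Big\|\frac{\partial S_j}{\partial\theta}\Big\|_{j} \le r\,\beta_j , \]
where, with $\mu = (M/r)(\kappa_1/\delta^{\alpha})$, we have $\beta_1 = \mu$ and recursively for $j \ge 2$,
\[ \beta_j = \mu \sum_{i=2}^{j} \sum_{k_1+\dots+k_i=j} \beta_{k_1} \cdots \beta_{k_i}
+ \mu \sum_{i=1}^{j-1} \sum_{k_1+\dots+k_i=j-1} \beta_{k_1} \cdots \beta_{k_i} . \]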

Multiplying this equation with $\zeta^j$ and summing over $j$, we see that the generating
function $b(\zeta) = \sum_{j=1}^{\infty} \beta_j \zeta^j$ is given implicitly by
\[ b(\zeta) - \mu\zeta = \mu \Big( \frac{1}{1 - b(\zeta)} - 1 - b(\zeta) \Big) + \mu\zeta \Big( \frac{1}{1 - b(\zeta)} - 1 \Big) \]
or explicitly, after solving the quadratic equation, by
\[ b(\zeta) = \frac{1}{2(1+\mu)} - \sqrt{ \frac{1}{4(1+\mu)^2} - \frac{\mu\zeta}{1+\mu} } . \]
Hence, $b(\zeta)$ is analytic on the disc $|\zeta| < 1/(4\mu(1+\mu))$, and is there bounded by
$1/(2(1+\mu))$. For $\mu \ge 1$, Cauchy's estimate yields
\[ \Big\|\frac{\partial S_j}{\partial\theta}\Big\|_{j} \le r\,\beta_j \le 2r\, (8\mu^2)^{j-1} . \]
(For the uninteresting case $\mu < 1$ the bound is $2r \cdot 8^{j-1}$.) For $j = J$ this almost gives
the stated result upon inserting the definition of $\mu$, but with an exponent $2\alpha$ instead
of $\alpha$. This can be reduced to $\alpha$ if in the above proof $\delta$ is chosen as $\delta_1 = \rho/4$ in the
first step and in the other steps as $\delta_j = \rho/(4J)$. This leads to a more complicated
quadratic equation where now $b(\zeta)$ is analytic for $|\zeta| < (C_1 J^{\alpha})^{-1}$. We omit the
details of this refinement of the proof. □
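
The passage from the recursion for $\beta_j$ to the quadratic equation can be checked in exact arithmetic. The sketch below is an illustration, not part of the proof: it assumes the recursion in the form $\beta_1 = \mu$, $\beta_j = \mu\sum_{i=2}^{j}\sum_{k_1+\dots+k_i=j}\beta_{k_1}\cdots\beta_{k_i} + \mu\sum_{i=1}^{j-1}\sum_{k_1+\dots+k_i=j-1}\beta_{k_1}\cdots\beta_{k_i}$, and compares it with the Taylor coefficients of the root of $(1+\mu)b^2 - b + \mu\zeta = 0$ that vanishes at $\zeta = 0$, which are $\beta_j = C_{j-1}\mu^j(1+\mu)^{j-1}$ with the Catalan numbers $C_n$. The function names and the value of $\mu$ are hypothetical.

```python
from fractions import Fraction
from math import comb

def beta_coefficients(mu, J):
    """Compute beta_1..beta_J from the recursion (index 0 is unused)."""
    beta = [Fraction(0)] * (J + 1)
    beta[1] = Fraction(mu)
    for j in range(2, J + 1):
        total = Fraction(0)
        # prev[m] = sum over compositions of m into i-1 positive parts
        prev = [Fraction(0)] * (j + 1)
        prev[0] = Fraction(1)
        for i in range(1, j + 1):
            cur = [Fraction(0)] * (j + 1)
            for m in range(1, j + 1):
                # only the already known beta_1..beta_{j-1} may appear
                cur[m] = sum(prev[m - k] * beta[k]
                             for k in range(1, min(m, j - 1) + 1))
            if i >= 2:
                total += cur[j]        # first double sum (parts add up to j)
            if i <= j - 1:
                total += cur[j - 1]    # second double sum (parts add up to j-1)
            prev = cur
        beta[j] = mu * total
    return beta

def catalan(n):
    return comb(2 * n, n) // (n + 1)

mu = Fraction(3, 2)                     # illustrative value with mu >= 1
beta = beta_coefficients(mu, 8)
closed_form = [catalan(j - 1) * mu ** j * (1 + mu) ** (j - 1) for j in range(1, 9)]
```

Since all $\beta_j$ are nonnegative, Cauchy's estimate on the disc of radius $1/(4\mu(1+\mu))$ gives $\beta_j \le 2\mu\,(4\mu(1+\mu))^{j-1}$, which the test below also confirms for this sample.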


For the remainder term in (2.7) we then obtain the following bound. 

Lemma 4.3. In the situation of Lemma 4.2, with $r \le 1$ and for $C_1 N^{\alpha} \le 1/(2e)$,
\[ \|R_N(b^*, \cdot)\|_{\rho/2} \le 4Mr \Big( \frac{4C_1}{r} N^{\alpha} \Big)^{N} . \]

Proof The remainder term Rn in (2.7) is a sum of terms 

for ko + k\ + ... + ki = N , 

i\ oa l 

where 

dSk dSk +1 N-k-i dS N-i 

^ k dO + dO + "' + dO ' 

As long as CiN a < l/(2e), we have, by Lemma 4.2, 

N—l 

\\ Qk ( b *,-)\\ P/ 2 < 

j—k 

N—l 

< C 0 < 2C 0 (C\N a ) k . 

j—k 


This implies 


1 d i H ] 


k 0 


i\ da 1 


(Qk lt ■ ■ ■ , Qki)(b*,') 


p/2 


M 

< — 2Co(Ci7V“) 


a\N 


for + k\ + ... + ki = N. (This bound is also valid when an argument different 
from If appears in the derivatives of H 0 and Hi , as is needed for the remainder 
terms in the Taylor expansion.) Estimating the number of such expressions by 


^r ;- 1 



= 2 2N 


yields the result. 


□ 
X.4.2 Near-Invariant Tori of Perturbed Integrable Systems

The following result extends Lemma 2.1 to exponentially long times for sufficiently 
small values of the perturbation parameter. 

Theorem 4.4. Let $H_0$, $H_1$ be real-analytic on the complex $r$-neighbourhood $B_r(b^*)$
of $b^* \in \mathbb{R}^d$ and on $B_r(b^*) \times U_\rho$, respectively, with $r \le 1$ and $\rho \le 1$. Suppose that
$\omega(b^*) = (\partial H_0/\partial a)(b^*)$ satisfies the diophantine condition (2.4). There are positive
constants $\varepsilon_0$, $c_0$, $C$ such that the following holds for every positive $\beta \le 1$ and for
$\varepsilon \le \varepsilon_0$: there exists a real-analytic symplectic change of coordinates $(a,\theta) \mapsto (b,\varphi)$
such that every solution $(b(t), \varphi(t))$ of the perturbed system in the new coordinates,
starting with $\|b(0) - b^*\| \le c_0 \varepsilon^{2\beta}$, satisfies
\[ \|b(t) - b(0)\| \le C\, t \exp(-c\,\varepsilon^{-\beta/\alpha}) \qquad \text{for } t \le \exp(\tfrac12 c\,\varepsilon^{-\beta/\alpha}) . \]
Here, $\alpha = \nu + d + 1$ and $c = (16 C_1 e / r)^{-1/\alpha}$ with $C_1$ of Lemma 4.2. Moreover,
the transformation is such that, for $(a,\theta)$ and $(b,\varphi)$ related by the above coordinate
transform,
\[ \|a - b\| \le C\varepsilon \qquad \text{for } \|b - b^*\| \le c_0 \varepsilon^{2\beta} ,\ \varphi \in U_{\rho/2} . \]

The thresholds £ (J and e (J are such that e^ 3 is inversely proportional to yC\, and cq 
is proportional to 7 C\. 

Remark 4.5. Theorem 4.4 is a local result, showing that for $b_0$ near $b^*$ the tori
$\{b = b_0,\ \varphi \in \mathbb{T}^d\}$ are nearly invariant, up to exponentially small deviations, over
exponentially long times. Nekhoroshev (1977, 1979) has shown the global result,
under a “steepness condition” which is in particular satisfied for convex Hamiltonians,
that for sufficiently small $\varepsilon$ every solution of the perturbed Hamiltonian system
satisfies, for some positive constants $A, B < 1$ (proportional to the inverse of the
square of the dimension),
\[ \|a(t) - a(0)\| \le \varepsilon^{B} \qquad \text{for } t \le \exp(\varepsilon^{-A}) . \]

Remark 4.6. The constant $C_1$ in Lemma 4.2 and constants in similar estimates of
Hamiltonian perturbation theory are very large, with the consequence that the results
on the long-time behaviour derived from them are meaningful, in a rigorous sense,
only for extremely small values of the perturbation parameter $\varepsilon$. Nevertheless, apart
from their purely theoretical interest these results are of value as they describe the
behaviour to be expected if one presupposes that the constants obtained from the
worst-case estimations are unduly pessimistic for a given problem, as is typically
the case.

Proof of Theorem 4.4. The proof combines Lemmas 4.2 and 4.3 with the proof of 
Lemma 2.1. An appropriate choice of the truncation indices N and m then gives the 
exponential estimates. 
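As in the proof of Lemma 2.1, we approximate $H_1(b,\theta)$ by a trigonometric
polynomial of order $m$ in $\theta$. The error of this approximation is bounded by $O(e^{-m\rho/2})$
on $B_r(b^*) \times U_{\rho/2}$, which is $O(e^{-N})$ for the choice $m = 2N/\rho$ made below. By
the arguments of the proof of Lemma 2.1, the estimates of Lemmas 4.2 and 4.3 (for
$\gamma$ replaced by $\gamma/2$, which increases $C_1$ to $4C_1$) are then valid in $O((jm)^{-\alpha})$ and
$O((Nm)^{-\alpha})$ neighbourhoods of $b^*$: for a sufficiently small constant $c_*$ and with
$C_2 = 16 C_1/r$,
\[ \begin{aligned}
\Big\|\frac{\partial S_j}{\partial\theta}(b,\theta)\Big\| &\le C_0\, (C_2 j^{\alpha})^{j-1}
&& \text{for } \|b - b^*\| \le c_* (jm)^{-\alpha} ,\ \theta \in U_{\rho/2} , \\
|R_N(b,\theta)| &\le 4Mr\, (C_2 N^{\alpha})^{N}
&& \text{for } \|b - b^*\| \le c_* (Nm)^{-\alpha} ,\ \theta \in U_{\rho/2} .
\end{aligned} \]
We now consider the symplectic change of variables $(a,\theta) \mapsto (b,\varphi)$ defined by the
generating function $S(b,\theta)$. The Hamiltonian equations in the variables $(b,\varphi)$ are
then of the form, for $\|b - b^*\| \le c_* (Nm)^{-\alpha}$,
\[ \begin{aligned}
\dot b &= -\frac{\partial K}{\partial\varphi}(b,\varphi) = -\varepsilon^{N} \frac{\partial R_N}{\partial\theta} \frac{\partial\theta}{\partial\varphi} = O\big(\varepsilon^{N} (C_2 N^{\alpha})^{N}\big) \\
\dot\varphi &= \frac{\partial K}{\partial b}(b,\varphi) = \omega_{\varepsilon,N}(b) + O\big((Nm)^{\alpha} \cdot \varepsilon^{N} (C_2 N^{\alpha})^{N}\big) .
\end{aligned} \]
Choosing $m = 2N/\rho$ and $N$ such that $C_2 N^{\alpha} = 1/(e\,\varepsilon^{\beta})$ gives
\[ \begin{aligned}
\dot b &= O\big(\exp(-c\,\varepsilon^{-\beta/\alpha})\big) \\
\dot\varphi &= \omega_{\varepsilon,N}(b) + O\big(\varepsilon^{-2\beta} \exp(-c\,\varepsilon^{-\beta/\alpha})\big)
\end{aligned} \qquad \text{for } \|b - b^*\| \le c_0 \varepsilon^{2\beta} \tag{4.2} \]
with $c = (C_2 e)^{-1/\alpha}$, which yields the result.

□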
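
The choice of the truncation index at the end of this proof can be made concrete with a few lines of arithmetic. The following sketch is only illustrative: the function name and the sample values of $\varepsilon$, $\beta$, $C_2$ and $\alpha$ are assumptions, and $N$ is treated as a real number rather than rounded to an integer. It solves $C_2 N^{\alpha} = 1/(e\,\varepsilon^{\beta})$ for $N$ and evaluates the logarithm of the factor $\varepsilon^{N}(C_2 N^{\alpha})^{N}$.

```python
import math

def optimal_truncation(eps, beta, C2, alpha):
    """Return (N, log of eps^N (C2 N^alpha)^N) for C2 N^alpha = 1/(e eps^beta)."""
    # N = c * eps^(-beta/alpha) with c = (C2 e)^(-1/alpha)
    N = (C2 * math.e) ** (-1.0 / alpha) * eps ** (-beta / alpha)
    log_remainder = N * (math.log(eps) + math.log(C2) + alpha * math.log(N))
    return N, log_remainder

# Hypothetical sample values, chosen only to exercise the formulas.
eps, beta, C2, alpha = 1e-3, 0.5, 10.0, 3.0
N, log_remainder = optimal_truncation(eps, beta, C2, alpha)
```

With this choice the factor equals $(\varepsilon^{1-\beta}/e)^{N}$, so for $\beta \le 1$ and $\varepsilon \le 1$ it is at most $e^{-N} = \exp(-c\,\varepsilon^{-\beta/\alpha})$, which is the exponentially small term in (4.2).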


X.4.3 Near-Invariant Tori of Symplectic Integrators 

We return to the situation of Sect. X.3 and apply a symplectic numerical method to 
the integrable Hamiltonian system (3.1) with (3.2) and (3.3). 

Theorem 4.7. Consider applying a symplectic numerical integrator of order $p$
to the real-analytic completely integrable Hamiltonian system (3.1). Suppose that
$\omega(a^*)$ satisfies the diophantine condition (2.4). Then, there exist positive constants
$c_0$, $c$, $C$ and $h_0$ such that the following holds for all step sizes $h \le h_0$ and for
all $\mu \le \min(p, \alpha)$ with $\alpha = \nu + d + 1$: every numerical solution starting with
$\|I(p_0, q_0) - a^*\| \le c_0 h^{2\mu}$ satisfies
\[ \|I(p_n, q_n) - I(p_0, q_0)\| \le C h^{p} \qquad \text{for } nh \le \exp(c\, h^{-\mu/\alpha}) . \]
The constants $h_0, c_0, c, C$ depend on $d$, $\gamma$, $\nu$, on bounds of the real-analytic
Hamiltonian $H$ on a complex neighbourhood of the torus $\{(p,q) \,;\, I(p,q) = a^*\}$, and on
the numerical method.
Proof. The proof is obtained by following the arguments of the proof of
Theorem 3.1. Instead of Lemma 2.1, now Theorem 4.4 is applied to the modified
Hamiltonian system (3.5) with $\varepsilon = h^p$. This gives a change of coordinates $(a,\theta) \mapsto (b,\varphi)$,
$O(h^p)$-close to the identity, such that in the new variables, the solution $(\tilde b(t), \tilde\varphi(t))$
of (3.5) satisfies
\[ \tilde b(t) = b_0 + O\big(\exp(-c\, h^{-\mu/\alpha})\big) \qquad \text{for } t \le \exp(c\, h^{-\mu/\alpha}) . \]
On the other hand, using the exponentially small bound of Theorem IX.7.6, together
with Theorem 4.4 and the arguments of part (b) of the proof of Theorem 3.1, yields
for the numerical solution in the new variables
\[ b_n = \tilde b(t) + O\big(\exp(-c\, h^{-\mu/\alpha})\big) \qquad \text{for } t = nh \le \exp(c\, h^{-\mu/\alpha}) . \]
Together with $a_n - b_n = O(h^p)$ this gives the result. □

Remark 4.8. When the symplectic method is applied to a perturbed integrable
system as in Theorem 4.4, then the same argument yields for $\|I(p_0, q_0) - b^*\| \le c_0 \eta^{2\beta}$
with $\eta = \max(\varepsilon, h^p)$ and $\beta \le 1$ the bound
\[ \|I(p_n, q_n) - I(p_0, q_0)\| \le C\eta \qquad \text{for } t \le \exp(c\, \eta^{-\beta/\alpha}) . \]


X.5 Kolmogorov’s Theorem on Invariant Tori 

(The proof of this theorem was published in Dokl. Akad. Nauk SSSR 98 
(1954), 527-530 [MR 16, 924], but the convergence discussion does not 
seem convincing to the reviewer.) This very interesting theorem would 
imply that for an analytic canonical system which is close to an integrable 
one, all solutions but a set of small measure lie on invariant tori. 

(J. Moser 1959) 

It was a celebrated discovery by Kolmogorov (1954) that invariant tori carrying a 
conditionally periodic flow with diophantine frequencies persist under small pertur¬ 
bations of the Hamiltonian. Together with the extensions and refinements by Arnold 
(1963), Moser (1962) and later authors, Kolmogorov’s result forms what is now 
known as KAM theory. Here we give a proof of Kolmogorov’s theorem and use 
it in studying the long-time behaviour of symplectic numerical methods applied to 
perturbed integrable systems. 

X.5.1 Kolmogorov’s Theorem 

In Sect. X.2.3 we have already given Kolmogorov’s transformation which reduces
the size of a perturbation to a Hamiltonian of the form (2.8) from $O(\varepsilon)$ to $O(\varepsilon^2)$, at
least formally. The iteration of that procedure is convergent and yields the following
result.
Theorem 5.1 (Kolmogorov 1954). Consider a real-analytic Hamiltonian $H(a,\theta)$,
defined for $a$ in a neighbourhood of $0 \in \mathbb{R}^d$ and $\theta \in \mathbb{T}^d$, for which the linearization
at $a^* = 0$ does not depend on the angles:
\[ H(a,\theta) = c + \omega \cdot a + \tfrac12\, a^T M(a,\theta)\, a . \tag{5.1} \]
Suppose that $\omega$ satisfies the diophantine condition (2.4), viz.,
\[ |k \cdot \omega| \ge \gamma\, |k|^{-\nu} \qquad \text{for } k \in \mathbb{Z}^d ,\ k \ne 0 , \tag{5.2} \]
and that the angular average $\overline M_0$ of $M(0, \cdot)$ is an invertible $d \times d$ matrix:
\[ \|\overline M_0 v\| \ge \mu \|v\| \qquad \text{for } v \in \mathbb{R}^d , \tag{5.3} \]
with positive constants $\gamma$, $\nu$, $\mu$. Let $H_\varepsilon(a,\theta) = H(a,\theta) + \varepsilon G(a,\theta)$ be a
real-analytic perturbation of $H(a,\theta)$. Then, there exists $\varepsilon_0 > 0$ such that for every $\varepsilon$
with $|\varepsilon| \le \varepsilon_0$, there is an analytic symplectic transformation $\psi_\varepsilon : (b,\varphi) \mapsto (a,\theta)$,
$O(\varepsilon)$-close to the identity and depending analytically on $\varepsilon$, which puts the perturbed
Hamiltonian back to the form
\[ H_\varepsilon(a,\theta) = c_\varepsilon + \omega \cdot b + \tfrac12\, b^T M_\varepsilon(b,\varphi)\, b \qquad \text{for } (a,\theta) = \psi_\varepsilon(b,\varphi) . \tag{5.4} \]
The perturbed system therefore has the invariant torus $\{b = 0,\ \varphi \in \mathbb{T}^d\}$ carrying a
quasi-periodic flow with the same frequencies $\omega$ as the unperturbed system.

(The threshold $\varepsilon_0$ depends on $d$, $\nu$, $\gamma$, $\mu$ and on bounds of $H$ and $G$ on a complex
neighbourhood of $\{0\} \times \mathbb{T}^d$.)

2 Andrei Nikolaevich Kolmogorov, born: 25 April 1903 in Tambov (Russia), died: 20 October
1987 in Moscow.

3 Vladimir Igorevich Arnold, born: 12 June 1937 in Odessa (USSR).

4 Jürgen K. Moser, born: 4 July 1928 in Königsberg, now Kaliningrad, died: 17 December
1999 in Zürich (Switzerland).
Of particular interest is the case when $H(a,\theta) = H_0(a)$ is independent of $\theta$,
so that we are considering perturbations of an integrable system. In this case, the
theorem shows that all invariant tori with frequencies $\omega(a) = \partial H_0/\partial a\,(a)$ satisfying
(5.2) and with invertible Hessian $\partial^2 H_0/\partial a^2\,(a)$ persist under small perturbations
and are only slightly deformed.

Kolmogorov (1954) stated the theorem and formulated the iteration of
Section X.2.3, but did not give the details of the convergence estimates. Arnold (1963)
gave a first complete proof of the theorem for perturbed integrable systems, using a
construction based on the “ultra-violet cutoff” (cf. Lemma 2.1) which yields a single
transformation simultaneously for all frequencies satisfying the diophantine
condition (2.4), in contrast to Kolmogorov’s iteration which yields a different
transformation for every choice of diophantine frequencies. However, Arnold’s transformation
is no longer analytic in the perturbation parameter $\varepsilon$. Moser (1962) showed that the
analyticity of the Hamiltonian can be replaced by differentiability of sufficiently
high order. Full proofs of Kolmogorov’s theorem along his original construction
were published by Thirring (1977) (for a reduced model problem) and by Benettin,
Galgani, Giorgilli & Strelcyn (1984).

As in Remark 4.6, a practical difficulty with Theorem 5.1 is that the theoreti¬ 
cally obtained threshold £o is very small. The proof below requires So < ^o a with 
a = is + d + 1 of Lemma 4.1, where So is inversely proportional to is. This pes¬ 
simistic estimate of the threshold can be somewhat improved by first reducing the 
perturbation of an integrable Hamiltonian system via a perturbation series expan¬ 
sion as in the proof of Theorem 4.4 and then applying Kolmogorov’s theorem to the 
remainder of the truncated perturbation series. 

The proof of Theorem 5.1 uses iteratively the following lemma, which refers 
to the transformation constructed in Sect. X.2.3. Similar to Sect. X.4 we use the 
notation 

\[ \|G\|_\rho = \sup\{\, |G(a,\theta)| \;;\; \|a\| \le \rho ,\ \|\mathrm{Im}\,\theta\| \le \rho \,\} \]
for a bounded analytic function $G$ on $W_\rho := B_\rho(0) \times U_\rho$, where again $B_\rho(0)$ is the
complex ball of radius $\rho$ around 0 and $U_\rho$ is the complex extension of $\mathbb{T}^d$ of width $\rho$.
The same notation is used for vector- and matrix-valued functions, in which case the
underlying norm on $\mathbb{C}^d$ or $\mathbb{C}^{d \times d}$ is the maximum norm or its induced matrix norm,
respectively.

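Lemma 5.2. In the situation of Sect. X.2.3 and under the conditions of Theorem 5.1,
suppose that $H$ and $G$ are real-analytic and bounded on $W_\rho$. Then, there exists $\delta_0 > 0$
such that the following bounds hold for Kolmogorov’s transformation whenever
$0 < \delta \le \delta_0$:
\[ \text{if } \|\varepsilon G\|_\rho \le \delta^{5\alpha} , \text{ then } \|\varepsilon^2 \widehat G\|_{\rho-\delta} \le (\tfrac12 \delta)^{5\alpha} \]
\[ \text{and } \|\varepsilon \nabla\chi\|_{\rho-\delta} \le \delta^{3\alpha} , \qquad \|\widehat M - M\|_{\rho-\delta} \le \delta^{2\alpha} , \]
where $\alpha = \nu + d + 1$. The threshold $\delta_0$ depends only on $d$, $\nu$, $\gamma$, $\mu$ and on $\|H\|_\rho$.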
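Proof. We estimate the terms arising in the construction of Kolmogorov’s
transformation of Sect. X.2.3. For brevity we denote $\|\cdot\|_j = \|\cdot\|_{\rho - j\delta/4}$ for $j = 0, 1, 2, 3, 4$.

(a) The transformation $(b,\varphi) \mapsto (a,\theta)$ is constructed such that $(a,\theta) = y(\varepsilon)$,
where $y(t)$ is the solution of $\dot y = J^{-1} \nabla\chi(y)$ with $y(0) = (b,\varphi)$. Suppose for the
moment that
\[ \|\varepsilon \nabla\chi\|_3 \le \tfrac14 \delta . \tag{5.5} \]
Let $(b,\varphi) \in W_{\rho-\delta}$. Then, $\|y(t) - y(0)\| \le \tfrac14 \delta$ for $0 \le t \le \varepsilon$, and in particular
$\|(a,\theta) - (b,\varphi)\| \le \tfrac14 \delta$. We define
\[ \varepsilon^2 R(b,\varphi) := \Big( a - b + \varepsilon \frac{\partial\chi}{\partial\varphi}(b,\varphi) ,\ \theta - \varphi - \varepsilon \frac{\partial\chi}{\partial b}(b,\varphi) \Big)
= y(\varepsilon) - y(0) - \varepsilon J^{-1} \nabla\chi(y(0)) \]
and note
\[ \|R(b,\varphi)\| \le \tfrac12 \max_{0 \le t \le \varepsilon} \|\ddot y(t)\| \le \tfrac12 \|J^{-1} \nabla^2\chi\, J^{-1} \nabla\chi\|_3 \]
so that
\[ \|R\|_4 \le \tfrac12 \|\nabla^2\chi\|_3\, \|\nabla\chi\|_3 . \tag{5.6} \]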

(b) Tracing the construction of Sect. X.2.3, we find by Taylor expansion of 
H (a, 6) that the new matrix is 

M(b,(p)±M(b,(p)+eL(b,<p) 


with 



i=i ' 


dM d X 
da t dipt 


d_M_ dy\ 

dOi dbi) 


(' b, <p) + P(b, <p) + Q(b, <p) 


where P(b, ip) is symmetric with 

b T P(b, f)b = b T (. M(b, ip) - M( 0 , ip)) d -f 
and where Q(b, p) is given by ( 2 . 11 ). It follows that 


II M - M || 4 < 2e(||VM(l4 ||VxlU + ||V 2 G|| 4 ) . (5.7) 

From the construction of G we also find by simple estimates of Taylor remainders 

IIGU < \\VH\\ 3 \\R\\ 4 + ||VG|| 3 IIVxIU + ||V 2 i7|| 3 IIVxlll ■ (5.8) 


(c) Using Lemma 4.1 in the equations (2.12)-(2.16) defining x of (2.10), we 
obtain first 


dxo 

dp 


< K 1 5 _a ||G 0 ||o 


llxolli < k 0 8 a+1 ||G 0 ||o , 


i 
and by a second application of that lemma, for $i = 1, \dots, d$,

\\Xih < Ko<V°* (||«||i + |M|i + IIGilli) 

where, by construction of u and v, 

IMIi<||M|| 1 ||^|| i , IMIi < ll-^lliM -1 

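It then follows by Cauchy’s estimates that
\[ \|\nabla\chi\|_3 \le C \delta^{-2\alpha} \|G\|_0 , \qquad \|\nabla^2\chi\|_3 \le C \delta^{-2\alpha-1} \|G\|_0 . \tag{5.9} \]

(d) Combining the estimates (5.6)-(5.9) and using once more Cauchy’s
estimates to bound derivatives of $H$ and $G$ yields
\[ \begin{aligned}
\|\varepsilon^2 \widehat G\|_{\rho-\delta} &\le C \delta^{-4\alpha-1} \|\varepsilon G\|_\rho^2 \\
\|\varepsilon \nabla\chi\|_{\rho-\delta} &\le C \delta^{-2\alpha} \|\varepsilon G\|_\rho \\
\|\widehat M - M\|_{\rho-\delta} &\le C \delta^{-2\alpha-3} \|\varepsilon G\|_\rho .
\end{aligned} \]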

All this holds under the condition (5.5). By (5.9), this condition is satisfied if 
II^GUp < S 5a and 5 < 5 o with a sufficiently small So. (Tracing the above constants 
shows that So needs to be inversely proportional to , or inversely proportional 
to v.) This yields the stated bounds. □ 

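Proof of Theorem 5.1. Kolmogorov’s iteration yields sequences
\[ \begin{aligned}
G^{(0)} &= G ,\ G^{(1)} ,\ G^{(2)} , \dots \\
M^{(0)} &= M ,\ M^{(1)} ,\ M^{(2)} , \dots \\
\chi^{(0)} &,\ \chi^{(1)} ,\ \chi^{(2)} , \dots
\end{aligned} \]
By Lemma 5.2 they satisfy, provided that $\|\varepsilon G\|_\rho \le \delta^{5\alpha}$ with $\delta \le \min(\delta_0, \tfrac14\rho)$,
\[ \|\varepsilon^{2^j} G^{(j)}\|_{\rho^{(j)}} \le (2^{-j}\delta)^{5\alpha} \tag{5.10} \]
\[ \|M^{(j+1)} - M^{(j)}\|_{\rho^{(j)}} \le (2^{-j}\delta)^{2\alpha} \tag{5.11} \]
\[ \|\varepsilon^{2^j} \nabla\chi^{(j)}\|_{\rho^{(j)}} \le (2^{-j}\delta)^{3\alpha} \tag{5.12} \]
where $\rho^{(j)} = \rho - (1 + \tfrac12 + \dots + 2^{-j})\delta \ge \tfrac12 \rho$ for all $j$. Note that (5.11) implies
that the inverse of the angular average of $M^{(j)}(0,\cdot)$ is bounded by $2\mu^{-1}$ for all $j$, so that
the iterative use of Lemma 5.2 is justified. The time-$\varepsilon^{2^j}$ flow of $\chi^{(j)}$ is a symplectic
transformation $\sigma_\varepsilon^{(j)}$ which by (5.12) satisfies
\[ \|\sigma_\varepsilon^{(j)} - \mathrm{id}\|_{\rho/2} \le (2^{-j}\delta)^{3\alpha} . \tag{5.13} \]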
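The composed transformation
\[ \psi_\varepsilon^{(j)} := \sigma_\varepsilon^{(0)} \circ \sigma_\varepsilon^{(1)} \circ \dots \circ \sigma_\varepsilon^{(j)} \]
is constructed such that
\[ H_\varepsilon\big(\psi_\varepsilon^{(j-1)}(b,\varphi)\big) = c^{(j)} + \omega \cdot b + \tfrac12\, b^T M^{(j)}(b,\varphi)\, b + \varepsilon^{2^j} G^{(j)}(b,\varphi) . \tag{5.14} \]
By (5.13), the sequence $\psi_\varepsilon^{(j)}(b,\varphi)$ converges uniformly on $W_{\rho/2} \times (-\varepsilon_0, \varepsilon_0)$ to a
limit $\psi_\varepsilon(b,\varphi)$. By Weierstrass’ theorem, $\psi_\varepsilon(b,\varphi)$ is analytic in $(b, \varphi, \varepsilon)$ (and in any
further parameters on which $M$ and $G$ might possibly depend analytically). Since
$\psi_\varepsilon$ depends analytically on $\varepsilon$ and $\psi_0 = \mathrm{id}$, it follows that $\psi_\varepsilon$ is $O(\varepsilon)$-close to the
identity on $W_{\rho/2}$. By (5.10) and (5.14), the transformed Hamiltonian $H_\varepsilon \circ \psi_\varepsilon$ is of
the desired form (5.4). □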


X.5.2 KAM Tori under Symplectic Discretization 

Consider a Hamiltonian system
\[ \dot p = -\frac{\partial H}{\partial q}(p,q) , \qquad \dot q = \frac{\partial H}{\partial p}(p,q) , \tag{5.15} \]
for which, in suitable coordinates $(a,\theta)$, the Hamiltonian $H(p,q) = H(a,\theta) + \varepsilon G(a,\theta)$
satisfies the conditions of Theorem 5.1. Kolmogorov’s theorem yields a
transformation to variables $(b,\varphi)$ in terms of which
\[ H(p,q) = \omega \cdot b + \tfrac12\, b^T M_\varepsilon(b,\varphi)\, b , \]
so that the torus $T_\omega = \{b = 0,\ \varphi \in \mathbb{T}^d\}$ is invariant and the flow on it is
quasi-periodic with frequencies $\omega$.

For a symplectic integrator of order $p$ applied to (5.15), backward analysis gives
a modified Hamiltonian $\widetilde H(p,q)$ which is an $O(h^p)$ perturbation of $H(p,q)$:
\[ \widetilde H(p,q) = \omega \cdot b + \tfrac12\, b^T M_\varepsilon(b,\varphi)\, b + h^p G(b,\varphi) . \tag{5.16} \]
Kolmogorov’s theorem can be applied once more, yielding an invariant torus $\widetilde T_\omega$
of the modified Hamiltonian $\widetilde H(p,q)$ which again carries a quasi-periodic flow with
frequencies $\omega$. Combined with the exponentially small estimates of backward analysis
for the difference between numerical solutions and the flow of the modified
Hamiltonian system, this gives the following result of Hairer & Lubich (1997).

Theorem 5.3. In the above situation, for a symplectic integrator of order $p$ used
with sufficiently small step size $h$, there is a modified Hamiltonian $\widetilde H$ with an
invariant torus $\widetilde T_\omega$ carrying a quasi-periodic flow with frequencies $\omega$, $O(h^p)$-close
to the invariant torus $T_\omega$ of the original Hamiltonian $H$, such that the difference
between any numerical solution $(p_n, q_n)$ starting on the torus $\widetilde T_\omega$ and the solution
$(\tilde p(t), \tilde q(t))$ of the modified Hamiltonian system with the same starting values
remains exponentially small in $1/h$ over exponentially long times:
\[ \|(p_n, q_n) - (\tilde p(t), \tilde q(t))\| \le C e^{-\kappa/h} \qquad \text{for } t = nh \le e^{\kappa/h} . \]
The constants $C$ and $\kappa$ are independent of $n$, $h$, $\varepsilon$ (for $h$, $\varepsilon$ sufficiently small) and of
the initial value $(p_0, q_0) \in \widetilde T_\omega$.

Proof. (a) For sufficiently small $h$, Kolmogorov’s theorem applied to (5.16) yields a
change of coordinates $(b,\varphi) \mapsto (c,\psi)$, $O(h^p)$-close to the identity, which transforms
the modified Hamiltonian to the form
\[ \widetilde H(p,q) = \omega \cdot c + \tfrac12\, c^T M_{\varepsilon,h}(c,\psi)\, c , \]
with the invariant torus $\widetilde T_\omega = \{c = 0,\ \psi \in \mathbb{T}^d\}$. The corresponding differential
equations read in these coordinates
\[ \dot c = u(c,\psi) , \qquad \dot\psi = \omega + v(c,\psi) \tag{5.17} \]
where $u(c,\psi) = O(\|c\|^2)$ and $v(c,\psi) = O(\|c\|)$, and similarly for the derivatives
$\partial u/\partial c = O(\|c\|)$, $\partial u/\partial\psi = O(\|c\|^2)$, and $\partial v/\partial c = O(1)$, $\partial v/\partial\psi = O(\|c\|)$.
The constants in these $O$-terms are independent of $h$ and $\varepsilon$. Let $(c(t), \psi(t))$ and
$(\hat c(t), \hat\psi(t))$ be two solutions of (5.17) such that $\|c(t)\| \le \beta$, $\|\hat c(t)\| \le \beta$ ($\beta$
sufficiently small) for all $t$ under consideration. Then, an argument based on Gronwall’s
lemma shows that their difference is bounded over a time interval $0 \le t \le 1/\beta$ by
\[ \begin{aligned}
\|c(t) - \hat c(t)\| &\le C \big( \|c(0) - \hat c(0)\| + \beta\, \|\psi(0) - \hat\psi(0)\| \big) \\
\|\psi(t) - \hat\psi(t)\| &\le C \big( t\, \|c(0) - \hat c(0)\| + \|\psi(0) - \hat\psi(0)\| \big) ,
\end{aligned} \tag{5.18} \]
for some constant $C$ that does not depend on $\beta$, $h$ or $\varepsilon$.

(b) In the following we denote $y = (p,q)$ for brevity, and more specifically,
$y_n$ denotes the numerical solution starting from any $y_0$ on the torus $\widetilde T_\omega$, i.e., the
$c$-coordinate of $y_0$ vanishes: $c_0 = 0$. We denote by $\tilde y(t, s, z)$ the solution of the
modified Hamiltonian system with initial value $\tilde y(s, s, z) = z$, and more briefly
$\tilde y(t) = \tilde y(t, 0, y_0)$ the solution starting from $y_0$. By Theorem IX.7.6, the local error
of backward error analysis at $t_j = jh$ is bounded by
\[ \|y_j - \tilde y(t_j, t_{j-1}, y_{j-1})\| \le \delta := \mathrm{Const.}\; h\, e^{-3\kappa/h} \]
for some constant $\kappa$, as long as $y_j$ remains in a compact subset of the domain of
analyticity of $H$. We further denote the $c$-coordinates of $y_n$, $\tilde y(t)$ and $\tilde y(t, t_j, y_j)$ by
$c_n$, $\tilde c(t)$ and $\tilde c(t, t_j, y_j)$, respectively. To apply the error propagation estimate (5.18),
we assume that
\[ \|\tilde c(t, t_j, y_j)\| \le \beta \qquad \text{for } t_j \le t \le 1/\beta \tag{5.19} \]
and for all $j$ satisfying $t_j = jh \le 1/\beta$. This assumption will be justified by induction
later, and the value of $\beta$ will be specified in (5.21) below. By (5.18) we thus
obtain the bound
\[ \|\tilde y(t, t_j, y_j) - \tilde y(t, t_{j-1}, y_{j-1})\| \le C\, (1 + (t - t_j))\, \delta \qquad \text{for } t_j \le t \le 1/\beta . \]
Summing up from $j = 1$ to $n$ gives for $t_n \le t \le 1/\beta$ (and $t \ge 2$)
\[ \|\tilde y(t, t_n, y_n) - \tilde y(t)\| \le C \sum_{j=1}^{n} (1 + (t - t_j))\, \delta \le C h^{-1} \delta\, (t_n + t\, t_n - t_n^2/2)
\le C h^{-1} \delta\, t^2 \le C h^{-1} \delta / \beta^2 . \tag{5.20} \]
We now set
\[ \beta = (2 C h^{-1} \delta)^{1/3} , \tag{5.21} \]
so that $C h^{-1} \delta / \beta^2 = \beta/2$, and we obtain the desired estimate from (5.20) by putting
$t = t_n$.

(c) We still have to justify the assumption (5.19). This will be done by induction.
For $j = 0$ nothing needs to be shown, because $\tilde c(t, 0, y_0) = \tilde c(t) = 0$ as a
consequence of the fact that $\tilde y(t)$ stays on the invariant torus $\widetilde T_\omega = \{c = 0,\ \psi \in \mathbb{T}^d\}$.
Suppose now that (5.19) holds for $j \le n$. It then follows from (5.20) that
\[ \|\tilde c(t, t_n, y_n)\| \le C h^{-1} \delta / \beta^2 = \beta/2 \qquad \text{for } t_n \le t \le 1/\beta \]
(again because of $\tilde c(t) = 0$). Consequently we also have
\[ \|c_{n+1}\| \le \|c_{n+1} - \tilde c(t_{n+1}, t_n, y_n)\| + \|\tilde c(t_{n+1}, t_n, y_n)\| \le \delta + \beta/2 < \beta , \]
provided that $h$ is sufficiently small so that $\delta < \beta/2$. By continuity, $\tilde c(t, t_{n+1}, y_{n+1})$
is bounded by $\beta$ on a non-empty interval $[t_{n+1}, T_{n+1}]$. The computation of part (b)
shows that $\|\tilde c(t, t_{n+1}, y_{n+1})\| \le \beta/2$ on this interval. Hence, $T_{n+1}$ can be increased
until $T_{n+1} \ge 1/\beta$. This proves the estimate (5.19) for $j = n + 1$. □
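
The arithmetic behind (5.20) and (5.21) can be checked with sample numbers. The sketch below is purely illustrative: the values of $C$, $\kappa$, $h$, $n$ and $t$ are hypothetical and the constant in $\delta$ is set to 1. It verifies that the accumulated sum is bounded as claimed and that the choice of $\beta$ balances $C h^{-1}\delta/\beta^2$ against $\beta/2$.

```python
import math

def accumulated_sum(t, h, n):
    """Return sum_{j=1}^{n} (1 + (t - t_j)) with t_j = j h."""
    return sum(1.0 + (t - j * h) for j in range(1, n + 1))

def choose_beta(C, h, delta):
    """Return beta = (2 C delta / h)^(1/3), the choice (5.21)."""
    return (2.0 * C * delta / h) ** (1.0 / 3.0)

# Hypothetical sample values.
C, kappa, h = 2.0, 0.05, 0.01
delta = h * math.exp(-3.0 * kappa / h)   # local error bound, constant taken as 1
beta = choose_beta(C, h, delta)

n, t = 300, 5.0                          # t_n = n h = 3 <= t and t >= 2
t_n = n * h
lhs = accumulated_sum(t, h, n)
mid = (t_n + t * t_n - t_n ** 2 / 2.0) / h
```

Since $\delta$ carries the factor $e^{-3\kappa/h}$, the resulting $\beta$ is of size $e^{-\kappa/h}$ up to a power of $h$, which is the source of the exponents in Theorem 5.3.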


X.6 Invariant Tori of Symplectic Maps 

In the preceding section, backward error analysis combined with Kolmogorov’s the¬ 
orem has shown that a symplectic integrator applied to a Hamiltonian system with 
KAM tori possesses tori that are near-invariant, up to exponentially small terms, 
over exponentially long times in the inverse of the step size. To obtain truly invariant 
tori, we need a discrete KAM theorem for perturbations of integrable near-identity 
maps depending on a small parameter, the step size. Such a result was recently ob¬ 
tained by Shang (1999, 2000), who gave a discrete Arnold-type construction. Here, 
we use instead a discrete-time version of Kolmogorov’s iteration. This establishes 
the existence of invariant tori of symplectic integrators applied to integrable Hamil¬ 
tonian systems or to near-integrable systems with KAM tori, for a Cantor set of 
non-resonant step sizes. 
X.6.1 A KAM Theorem for Symplectic Near-Identity Maps

We consider a discrete-time analogue of the situation in Sections X.2.3 and X.5.1
and construct the corresponding version of Kolmogorov’s iteration. Consider the
symplectic map $\sigma_h : (a,\theta) \mapsto (\hat a, \hat\theta)$ for $a$ near $0 \in \mathbb{R}^d$, $\theta \in \mathbb{T}^d$ defined by
\[ \hat a = a - h \frac{\partial S}{\partial\theta}(\hat a, \theta) , \qquad \hat\theta = \theta + h \frac{\partial S}{\partial a}(\hat a, \theta) \tag{6.1} \]
where $h$ is a small parameter (the step size), and $S : B_r(0) \times \mathbb{T}^d \to \mathbb{R}$ is a
real-analytic generating function. If $S(a,\theta)$ has the form (cf. (2.8))
\[ S(a,\theta) = c + \omega \cdot a + \tfrac12\, a^T M(a,\theta)\, a , \tag{6.2} \]
then the associated symplectic map is of the form
\[ \hat a = a + O(h \|a\|^2) , \qquad \hat\theta = \theta + h\omega + O(h \|a\|) . \]
Hence, the torus $\{a = 0,\ \theta \in \mathbb{T}^d\}$ is invariant, and on it the map $\sigma_h$ reduces to
rotation by $h\omega$.
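
A one-dimensional toy instance makes the invariance of the torus concrete. The sketch below is an illustration under explicit assumptions: it takes the map in the mixed-variable form $\hat a = a - h\,\partial S/\partial\theta(\hat a,\theta)$, $\hat\theta = \theta + h\,\partial S/\partial a(\hat a,\theta)$, uses the hypothetical generating function $S(a,\theta) = \omega a + \tfrac12 a^2 M(\theta)$ with $M(\theta) = 1 + 0.3\cos\theta$, and solves the implicit equation for $\hat a$ by fixed-point iteration. The names `sigma_h` and `jacobian_determinant` are not from the text.

```python
import math

OMEGA = (math.sqrt(5.0) - 1.0) / 2.0      # illustrative frequency

def M(theta):
    return 1.0 + 0.3 * math.cos(theta)

def dM(theta):
    return -0.3 * math.sin(theta)

def sigma_h(a, theta, h, iterations=60):
    """One step of the map generated by S(a,theta) = OMEGA*a + a^2 M(theta)/2."""
    a_hat = a
    for _ in range(iterations):
        # a_hat = a - h * dS/dtheta(a_hat, theta), solved by fixed-point iteration
        a_hat = a - h * 0.5 * a_hat * a_hat * dM(theta)
    # theta_hat = theta + h * dS/da(a_hat, theta)
    theta_hat = theta + h * (OMEGA + a_hat * M(theta))
    return a_hat, theta_hat

def jacobian_determinant(a, theta, h, eps=1e-6):
    """Central-difference approximation of det d(a_hat,theta_hat)/d(a,theta)."""
    ap, tp = sigma_h(a + eps, theta, h)
    am, tm = sigma_h(a - eps, theta, h)
    bp, sp = sigma_h(a, theta + eps, h)
    bm, sm = sigma_h(a, theta - eps, h)
    da_da, dt_da = (ap - am) / (2 * eps), (tp - tm) / (2 * eps)
    da_dt, dt_dt = (bp - bm) / (2 * eps), (sp - sm) / (2 * eps)
    return da_da * dt_dt - da_dt * dt_da

h = 0.1
a, theta = 0.0, 0.4
for _ in range(100):
    a, theta = sigma_h(a, theta, h)
det = jacobian_determinant(0.2, 1.0, h)
```

Starting on the torus $a = 0$, the iterates keep $a = 0$ and advance $\theta$ by $h\omega$ per step; the determinant close to 1 reflects the area preservation of the map in this toy case.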

Consider now an analytic perturbation of such a generating function: $S(a,\theta) +
\varepsilon R(a,\theta)$ with a small $\varepsilon$. We construct a near-identity symplectic change of
coordinates, via an iterative procedure similar to Kolmogorov’s iteration of Sect. X.2.3,
such that the generating function of the perturbed symplectic map in the new
variables is again of the form (6.2) with the same $\omega$, and hence the perturbed map has an
invariant torus on which it is conjugate to rotation by $h\omega$. This holds if $h\omega$ satisfies
the following diophantine condition (cf. (2.4)):
\[ \Big| \frac{1 - e^{-ik \cdot h\omega}}{h} \Big| \ge \gamma_* |k|^{-\nu_*} \qquad \text{for } k \in \mathbb{Z}^d ,\ k \ne 0 , \tag{6.3} \]
for some positive constants $\gamma_*$, $\nu_*$; and if the angular average $\overline M_0$ of $M(0, \cdot)$ is
invertible:
\[ \|\overline M_0 v\| \ge \mu_* \|v\| \qquad \text{for } v \in \mathbb{R}^d \tag{6.4} \]
for a positive constant $\mu_*$. As in Sect. X.2.3, we construct a symplectic
transformation $(a,\theta) \mapsto (b,\varphi)$ as the time-$\varepsilon$ flow of an auxiliary Hamiltonian of the form
(2.10), viz.,
\[ \chi(b,\varphi) = \xi \cdot \varphi + \chi_0(\varphi) + \sum_{i=1}^{d} b_i\, \chi_i(\varphi) \]
where $\xi \in \mathbb{R}^d$ is a constant vector, and $\chi_0, \chi_1, \dots, \chi_d$ are $2\pi$-periodic functions.
We then consider the map conjugate to the perturbed map $(a,\theta) \mapsto (\hat a, \hat\theta)$ generated
by $S(a,\theta) + \varepsilon R(a,\theta)$:


1 — e 
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We construct x m such a wa^nthat the above composed symplectic map is generated 
by S(b<p) + £ 2 R(b , (p) with S of the form ( 6 . 2 ) and both S and R real-analytic and 
bounded independently of 6 and of h with (6.3). The map ( 6 , p) i—► ( 6 , (p) is then of 
the form 

b = b + 0(h\\b\\ 2 ) + 0(he 2 ) , p = p + huo + 0{h\\b\\) + 0(he 2 ) . 

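As an elementary calculation shows, this holds if $\chi$ satisfies for all $(b,\varphi)$ with $b$
near $0$, $\varphi \in \mathbb{T}^d$

$$\frac{\chi(b,\varphi) - \chi(b,\varphi - h\omega)}{h} + b^T M(0,\varphi)\,\frac{\partial\chi}{\partial\varphi}(b,\varphi - h\omega) + R(b,\varphi) = c_h + O(\|b\|^2)$$

where $c_h$ does not depend on $(b,\varphi)$ and $\varepsilon$. Writing down the Taylor expansion

$$R(b,\varphi) = R_0(\varphi) + \sum_{i=1}^{d} b_i R_i(\varphi) + O(\|b\|^2)$$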

and inserting the above ansatz for $\chi$, this condition becomes fulfilled if, with $u(\varphi) =
M(0,\varphi)\xi$ and $v(\varphi) = M(0,\varphi)(\partial\chi_0/\partial\varphi)(\varphi - h\omega)$,

$$\frac{\chi_0(\varphi) - \chi_0(\varphi - h\omega)}{h} + R_0(\varphi) = \bar R_0 \tag{6.5}$$

$$\frac{\chi_i(\varphi) - \chi_i(\varphi - h\omega)}{h} + u_i(\varphi) + v_i(\varphi) + R_i(\varphi) = \bar u_i + \bar v_i + \bar R_i \tag{6.6}$$

$$\bar u_i + \bar v_i + \bar R_i = 0 \qquad (i = 1,\ldots,d) \tag{6.7}$$

where the bars again denote angular averages. We note

$$\frac{\chi_0(\varphi) - \chi_0(\varphi - h\omega)}{h} = \sum_{k} \frac{1 - e^{-ik\cdot h\omega}}{h}\,\chi_{0,k}\,e^{ik\cdot\varphi},$$

where $\chi_{0,k}$ are the Fourier coefficients of $\chi_0$. Under the diophantine condition (6.3),
Equation (6.5) is thus solved like (2.14) under condition (2.4). Equations (6.6) are
of the same type. The above system is then solved in the same way as (2.12)-(2.16),
yielding that the perturbed map in the new coordinates, $(b,\varphi) \mapsto (\hat b,\hat\varphi)$, is generated
by

$$S^{(1)}(b,\varphi) = c^{(1)} + \omega\cdot b + \tfrac12\, b^T M^{(1)}(b,\varphi)\,b + \varepsilon^2 R^{(1)}(b,\varphi)$$

with unchanged frequencies $\omega$ and with $M^{(1)}(b,\varphi) = M(b,\varphi) + O(\varepsilon)$. The perturbation
to the form (6.2) is thus reduced from $O(\varepsilon)$ to $O(\varepsilon^2)$. By the same arguments
as in the proof of Theorem 5.1 it is shown that the iteration of this procedure
converges. This proves the following discrete-time version of Kolmogorov’s theorem.
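
The Fourier-series solution of a difference equation of the type (6.5) can be sketched numerically. The code below is a hedged illustration for $d = 1$ only: the function names `solve_difference_equation` and `evaluate`, the grid size and the sample right-hand side are assumptions made for this example, and NumPy's FFT stands in for the exact Fourier coefficients of a band-limited $R_0$.

```python
import numpy as np


def solve_difference_equation(R0_vals, h, omega):
    """Fourier coefficients of chi_0 solving
    (chi_0(phi) - chi_0(phi - h*omega))/h + R_0(phi) = mean(R_0)   (d = 1).

    R0_vals holds R_0 on the uniform grid 2*pi*j/n, j = 0..n-1.
    The k = 0 coefficient of chi_0 is free and is set to zero.
    """
    n = len(R0_vals)
    R_hat = np.fft.fft(R0_vals) / n          # Fourier coefficients of R_0
    k = np.fft.fftfreq(n, d=1.0 / n)         # integer wave numbers
    chi_hat = np.zeros(n, dtype=complex)
    nonzero = k != 0
    # Small divisors (1 - exp(-i*k*h*omega))/h, controlled by condition (6.3).
    divisor = (1.0 - np.exp(-1j * k[nonzero] * h * omega)) / h
    chi_hat[nonzero] = -R_hat[nonzero] / divisor
    return k, chi_hat


def evaluate(k, coeffs, phi):
    """Evaluate the trigonometric sum with coefficients coeffs at the points phi."""
    phi = np.atleast_1d(phi)
    return np.real(np.exp(1j * np.outer(phi, k)) @ coeffs)
```

Each Fourier mode is divided by $(1 - e^{-ikh\omega})/h$, which is exactly the quantity bounded from below in (6.3); when $h\omega$ is close to a resonance these divisors become small and the computed $\chi_0$ grows accordingly.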

Theorem 6.1. Consider a real-analytic function $S(a,\theta)$ of the form (6.2) with (6.4),
defined on a neighbourhood of $\{0\} \times \mathbb{T}^d$. Let $|h| \le h_0$ ($h_0$ so small that (6.1) is a
well-defined map) and suppose that $h\omega$ satisfies (6.3).
Let $S_\varepsilon(a,\theta) = S(a,\theta) + \varepsilon R(a,\theta)$ be an analytic perturbation of $S(a,\theta)$,
generating a symplectic map $\sigma_{h,\varepsilon} : (a,\theta) \mapsto (\hat a,\hat\theta)$ via (6.1) with $S_\varepsilon$ in place of $S$.

Then, there exists $\varepsilon_0 > 0$ such that for every $\varepsilon$ with $|\varepsilon| \le \varepsilon_0$, there is an analytic
symplectic transformation $\psi_{h,\varepsilon} : (b,\varphi) \mapsto (a,\theta)$, $O(\varepsilon)$ close to the identity
uniformly in $h$ satisfying (6.3) and analytic in $\varepsilon$, such that
$\psi_{h,\varepsilon}^{-1} \circ \sigma_{h,\varepsilon} \circ \psi_{h,\varepsilon} : (b,\varphi) \mapsto (\hat b,\hat\varphi)$
is generated, via (6.1), by a function $S_{h,\varepsilon}(b,\varphi)$ which is again of the form
(6.2), i.e.,

$$S_{h,\varepsilon}(b,\varphi) = c_{h,\varepsilon} + \omega\cdot b + \tfrac12\, b^T M_{h,\varepsilon}(b,\varphi)\,b.$$

The perturbed map $\sigma_{h,\varepsilon}$ therefore has an invariant torus on which it is conjugate to
rotation by $h\omega$.

(The threshold $\varepsilon_0$ depends only on $d, \nu^*, \gamma^*, \mu^*$ and on bounds of $S$ and $R$ on a
complex neighbourhood of $\{0\} \times \mathbb{T}^d$.) □


X.6.2 Invariant Tori of Symplectic Integrators 

As a direct consequence of Theorem 6.1 we obtain the following result on invariant 
tori of symplectic integrators applied to KAM systems. 

Theorem 6.2. Apply a symplectic integrator of order $p$ to a perturbed integrable
system with a KAM torus $T_\omega$ which carries a quasi-periodic flow with diophantine
frequencies $\omega$. Then, if the step size $h$ is sufficiently small and satisfies the strong
non-resonance condition (6.3), the numerical method has an invariant torus $T_{\omega,h}$
$O(h^p)$-close to $T_\omega$, on which it is conjugate to rotation by $h\omega$.

Proof. Theorem 6.1 applies directly, with $\varepsilon = h^p$, to the above situation. Here,
the generating function $S(a,\theta)$ of the time-$h$ flow of the Hamiltonian system
with the KAM torus $T_\omega$ is of the form (6.2) in the variables $(a,\theta)$ obtained by
Kolmogorov’s theorem. The matrix $M(a,\theta)$ in (6.2) then differs from the corresponding
matrix of (2.8) by $O(h)$, so that (5.3) implies (6.4). Finally, the generating function
of the numerical one-step map $\Phi_h$ is an $O(h^p)$-perturbation $S(a,\theta) + h^p R(a,\theta)$. □

X.6.3 Strongly Non-Resonant Step Sizes 

Theorem 6.2 leaves us with an interesting question: if $\omega \in \mathbb{R}^d$ is a vector of
frequencies that satisfies the diophantine condition (2.4), then which step sizes $h$ satisfy the
non-resonance condition (6.3)? Here we give a lemma in the spirit of results by
Shang (2000). It shows that the probability of picking an $h \in (0,h_0)$ satisfying
(6.3) tends to 1 as $h_0 \to 0$.
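
To get a feel for the non-resonance condition on the step size, one can test it for finitely many integer vectors $k$. The sketch below is an illustration under stated assumptions: the function name `satisfies_condition` is hypothetical, $|k|$ is taken as the sum of the absolute values of the components, and only $0 < |k| \le$ `kmax` is checked, so a `True` result is a necessary test over a finite range and not a proof that (6.3) holds for all $k$.

```python
import cmath
import itertools


def satisfies_condition(h, omega, gamma_star, nu_star, kmax):
    """Check |(1 - exp(-i k.(h*omega)))/h| >= gamma_star * |k|**(-nu_star)
    for all integer vectors k with 0 < |k| <= kmax (finite truncation)."""
    d = len(omega)
    for k in itertools.product(range(-kmax, kmax + 1), repeat=d):
        norm = sum(abs(kj) for kj in k)      # assumed norm: |k| = sum |k_j|
        if norm == 0 or norm > kmax:
            continue
        phase = h * sum(kj * wj for kj, wj in zip(k, omega))
        lhs = abs(1.0 - cmath.exp(-1j * phase)) / h
        if lhs < gamma_star * norm ** (-nu_star):
            return False
    return True
```

For a single frequency $\omega = 1$, the step size $h = 2\pi$ is exactly resonant and fails the test, whereas a small step size such as $h = 0.1$ passes it for moderate `kmax`.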

Lemma 6.3. Suppose $\omega \in \mathbb{R}^d$ satisfies (2.4), and let $h_0 > 0$. For any choice of
positive $\gamma^*$ and $\nu^*$, the set

$$Z(h_0) = \{h \in (0,h_0)\ ;\ h \text{ does not satisfy (6.3)}\}$$

is open and dense in $(0,h_0)$. If $\gamma^* \le \gamma$ and $\nu^* \ge \nu + d + r$ with $r > 1$, then the
Lebesgue measure of $Z(h_0)$ is bounded by

$$\operatorname{measure}(Z(h_0)) \le C\,\frac{\gamma^*}{\gamma}\,h_0^{r+1}$$

where $C$ depends only on $d$, $\nu$, $\nu^*$ and $\|\omega\|$.

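Proof. It is clear from the definition that $Z(h_0)$ is open and dense in $(0,h_0)$. It
remains to prove the estimate of the Lebesgue measure. For every $k \in \mathbb{Z}^d$ and
$|h| \le h_0$, there exists an integer $l = l(k,h)$ such that

$$\left|1 - e^{-ik\cdot h\omega}\right| \ge \frac{2}{\pi}\,|k\cdot h\omega - 2\pi l| = \frac{2}{\pi}\,|k\cdot\omega|\,\left|h - \frac{2\pi l}{|k\cdot\omega|}\right|.$$

For this $l$ we must have, by the triangle inequality,

$$2\pi|l| \le \pi + |k|\,h_0\,\|\omega\|,$$

so that in case $l \ne 0$

$$\frac{1}{|k|} \le \frac{h_0\,\|\omega\|}{2\pi\,(|l| - \tfrac12)}.$$

On the other hand, $l = 0$ yields

$$\left|\frac{1 - e^{-ik\cdot h\omega}}{h}\right| \ge \frac{2}{\pi}\,|k\cdot\omega| \ge \frac{2}{\pi}\,\gamma\,|k|^{-\nu},$$

which implies $h \notin Z(h_0)$. Hence, $h$ can be in $Z(h_0)$ only if there exist $k \in \mathbb{Z}^d$,
$k \ne 0$ and an integer $l \ne 0$ such that

$$\left|h - \frac{2\pi l}{|k\cdot\omega|}\right| < \frac{\pi}{2}\,\frac{|h|}{|k\cdot\omega|}\,\frac{\gamma^*}{|k|^{\nu^*}} \le \frac{\pi}{2}\,|h|\,\frac{|k|^{\nu}}{\gamma}\,\frac{\gamma^*}{|k|^{\nu^*}} \le \frac{\pi\gamma^*}{2\gamma}\,\frac{1}{|k|^{\nu^*-r-\nu}}\left(\frac{\|\omega\|}{2\pi}\,\frac{1}{|l| - \tfrac12}\right)^{r} h_0^{r+1}.$$

It follows that

$$\operatorname{measure}(Z(h_0)) \le \sum_{k \ne 0}\ \sum_{l \ne 0} \frac{\pi\gamma^*}{\gamma}\,\frac{1}{|k|^{\nu^*-r-\nu}}\left(\frac{\|\omega\|}{2\pi}\,\frac{1}{|l| - \tfrac12}\right)^{r} h_0^{r+1},$$

which yields the stated result. □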


X.7 Exercises 

1. Let R be a d x 2d matrix of rank d. Show that there exists a symplectic 2d x 2d 
matrix A such that RA = (P, Q) with an invertible d x d matrix P. 

Hint. Consider first the case d = 2 and then reduce the general situation to a 
sequence of transformations for that case. 
Fig. 7.1. Numerically obtained eigenvalues (left pictures) and errors in the eigenvalues (right
pictures) for the step sizes h = 0.1 (dotted) and h = 0.05 (solid line)

2. The transformation $(x,y) \mapsto (x, y + d(x,y))$ is symplectic if and only if the
partial derivatives of $d$ satisfy $d_x = d_x^T$, $d_y = 0$.

3. In the situation of Lemma 1.1, if $(F_1,\ldots,F_d,\tilde G_1,\ldots,\tilde G_d)^T$ is another such
symplectic transformation, then there exists a smooth function $W$ depending
only on $x = (x_1,\ldots,x_d)$ such that, for $x_j = F_j(p,q)$,

$$\tilde G_i(p,q) - G_i(p,q) = \frac{\partial W}{\partial x_i}(x).$$

Hint. Use the previous exercise.

4. Show that every discrete subgroup of $\mathbb{R}^d$ is a grid, generated by $k \le d$ linearly
independent vectors.

Solution. See e.g. Arnold (1989), Sect. 10D. 

5. Show the following bound of the Lebesgue measure of non-diophantine
frequencies (Arnold 1963): for any bounded domain $\Omega \subset \mathbb{R}^d$,

$$\operatorname{measure}\{\omega \in \Omega\ ;\ \omega \text{ does not satisfy (2.4) with } \nu > d\} \le C(d,\Omega)\,\gamma.$$

Hint. For a fixed $k$, decompose $\omega = \omega_0 + \alpha k/|k|$ with $\omega_0 \cdot k = 0$.

6. Show that the eigenvalues $\lambda_j$ of the matrix $L$ of the Toda system are first
integrals in involution.

Hint. For $P_\lambda = \det(\lambda I - L)$, show that $\{P_\lambda, P_\mu\} = 0$ for all $\lambda, \mu$.

7. We repeat the experiment of Fig. 1.3 with the Störmer-Verlet scheme, where
we keep the initial values for the $q$-variables, but change the initial values for
the $p$-variables to $p_1 = p_2 = p_3 = 0$. The numerical results, given in Fig. 7.1,
are qualitatively different from those in Fig. 1.3. The errors behave more like
$h\,c(th)$ rather than $h^2 c(t)$. We do not understand this behaviour; do you?

8. Show that for a non-symplectic numerical method, there is at worst quadratic 
error growth in time when it is applied to an integrable Hamiltonian system. 
9. Consider a numerical integrator of order $p$ (i.e., $\Phi_h(y) = \varphi_h(y) + O(h^{p+1})$),
and assume that

$$\Phi_h'(y)^T J\,\Phi_h'(y) = J + O(h^{q+1})$$

with $q > p$, when the method is applied to a Hamiltonian system. Prove that
under the assumptions of Theorem 3.1 the global error behaves for $t = nh$ like

$$y_n - y(t) = O(t h^p) + O(t^2 h^q),$$

and the action variables like

$$I(y_n) - I(y_0) = O(h^p) + O(t h^q).$$

Remark. Methods satisfying the assumptions of this exercise are called
pseudo-symplectic of order $(p,q)$ (Aubry & Chartier 1998). Pseudo-symplectic methods
behave like symplectic methods on time intervals of length $O(h^{p-q})$.

10. Using the theory of B-series, in particular Theorem VI.7.4, derive the conditions
for the coefficients of a Runge-Kutta method such that it is pseudo-symplectic
of order $(p,q)$. Prove that there exist explicit, pseudo-symplectic Runge-Kutta
methods of order (2,4) with 3 stages.



Chapter XI. 

Reversible Perturbation Theory 
and Symmetric Integrators 


There is a very close similarity between the behaviour of solutions of 
reversible systems and that of Hamiltonian ones. 

(M.B. Sevryuk 1986, p. 3) 

Numerical experiments indicate that symmetric methods applied to integrable and
near-integrable reversible systems share similar properties to symplectic methods
applied to (near-)integrable Hamiltonian systems: linear error growth, long-time
near-conservation of first integrals, existence of invariant tori. The present chapter
gives a theoretical explanation of the good long-time behaviour of symmetric
methods. The results and techniques are largely analogous to those of the previous
chapter - the extent of the analogy may indeed be seen as the most surprising feature
of this chapter.


XI.1 Integrable Reversible Systems

We consider a system of differential equations on a domain of $\mathbb{R}^m \times \mathbb{R}^n$,

$$\dot u = f(u,v), \qquad \dot v = g(u,v), \tag{1.1}$$

which is reversible with respect to the involution $(u,v) \mapsto (u,-v)$: for all $(u,v)$,

$$f(u,-v) = -f(u,v), \qquad g(u,-v) = g(u,v). \tag{1.2}$$

From Sect. V.1 we recall that the time-$t$ flow $\varphi_t$ of a reversible system is a reversible
map:

$$\varphi_t(u,v) = (\hat u,\hat v) \quad\text{implies}\quad \varphi_t(\hat u,-\hat v) = (u,-v).$$
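
The reversibility of a map can be checked directly in code. The following minimal sketch uses the pendulum written in the form (1.1) with $u = q$, $v = p$ and one step of the Störmer-Verlet scheme as the map; the choice of the pendulum, the helper names `f`, `g`, `verlet_step` and the numerical values are assumptions made for illustration.

```python
import math


def f(u, v):
    """Right-hand side of u' = f(u, v); odd in v."""
    return v


def g(u, v):
    """Right-hand side of v' = g(u, v); even in v (here independent of v)."""
    return -math.sin(u)


def verlet_step(u, v, h):
    """One Stormer-Verlet step for the pendulum u' = v, v' = -sin(u)."""
    v_half = v + 0.5 * h * g(u, v)
    u_new = u + h * f(u, v_half)
    v_new = v_half + 0.5 * h * g(u_new, v_half)
    return u_new, v_new
```

Applying `verlet_step` to `(u, v)`, flipping the sign of the resulting `v`, and stepping once more returns `(u, -v)` up to rounding, which is the reversibility property of the map stated above.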


A coordinate transform $u = \mu(x,y)$, $v = \nu(x,y)$ is said to preserve reversibility if
the relations

$$\mu(x,-y) = \mu(x,y), \qquad \nu(x,-y) = -\nu(x,y) \tag{1.3}$$

hold for all $(x,y)$. This implies that every reversible system (1.1) written in the new
variables $(x,y)$ is again reversible, and that every reversible map $(u,v) \mapsto (\hat u,\hat v)$
expressed in the variables $(x,y)$ again becomes a reversible map $(x,y) \mapsto (\hat x,\hat y)$.
Conversely, (1.3) is necessary for these properties.

For Hamiltonian systems, complete integrability is tied to the existence of a
symplectic transformation to action-angle variables; see Sect. X.1. For reversible
systems, we take the existence of a reversibility-preserving transformation to such
variables as the definition of integrability.

Definition 1.1. The system (1.1) is called an integrable reversible system if, for
every point $(u_0,v_0) \in \mathbb{R}^m \times \mathbb{R}^n$ in the domain of $(f,g)$, there exist a function
$\omega = (\omega_1,\ldots,\omega_n) : D \to \mathbb{R}^n$ and a diffeomorphism

$$\psi = (\mu,\nu) : D \times \mathbb{T}^n \to U \subset \mathbb{R}^m \times \mathbb{R}^n : (a,\theta) \mapsto (u,v)$$

(with $D$ and $U$ open sets in $\mathbb{R}^m$ and $\mathbb{R}^m \times \mathbb{R}^n$, respectively, and $(u_0,v_0) \in U$),
which preserves reversibility and transforms the system (1.1) to the form

$$\dot a = 0, \qquad \dot\theta = \omega(a). \tag{1.4}$$

We speak of a real-analytic integrable reversible system if all the functions
appearing in the above definition are real-analytic.

Example 1.2 (Motion in a Central Field). In Examples X.1.2 and X.1.10 we
constructed action-angle variables via a series of transformations

[Diagram: chain of transformations, via (X.1.5) and (X.1.6), leading to the variables $(H, L;\ y_1, y_2)$.]

It is easily verified that all these transformations preserve reversibility. They
transform the reversible system

$$\dot q_1 = p_1, \quad \dot p_2 = -q_2 V'(r)/r, \qquad \dot q_2 = p_2, \quad \dot p_1 = -q_1 V'(r)/r \tag{1.5}$$

(with $r = \sqrt{q_1^2 + q_2^2}$) to the form

$$\dot H = 0, \quad \dot L = 0, \qquad \dot\theta_1 = \frac{2\pi}{T}, \quad \dot\theta_2 = \frac{\Phi}{T} \tag{1.6}$$

with $T = T(H,L)$ and $\Phi = \Phi(H,L)$ given by (X.1.12) and (X.1.13).

As the following result shows, it is not incidental that the above transformations
preserve reversibility.
Theorem 1.3. In the situation of the Arnold-Liouville theorem, Theorem X.1.6, let
the first integrals $F_1,\ldots,F_d$ of the completely integrable Hamiltonian system be
such that all $F_i$ are even functions of the second half of the arguments:

$$F_i(u,v) = F_i(u,-v) \qquad (i = 1,\ldots,d). \tag{1.7}$$

Suppose that $\partial F_1/\partial u,\ldots,\partial F_d/\partial u$ are linearly independent everywhere (on
$\bigcup\{M_x : x \in B\}$) except possibly on a set that has no interior points. Further,
assume that for every $x \in B$ there exists $u$ such that $(u,0) \in M_x$. Then, the
transformation $\psi : (a,\theta) \mapsto (u,v)$ to action-angle variables as given by Theorem X.1.6
preserves reversibility.

Proof. The result follows by tracing the proofs of Lemma X.1.1, Theorem X.1.4 
and Theorem X.1.6. 

(a) For Fi satisfying (1.7) and at points where the Jacobian matrix dF/du is 
invertible, the construction of the local symplectic transformation £ = (F\,..., Fd, 
G i,..., Gd) : (it, v) i—► (x, y ) shows that the generating function S(x, v ) becomes 
odd in v when the integration constant is chosen such that S(x, 0) = 0. By (X.1.4), 
this implies that £ preserves reversibility. A continuity argument used together with 
the essential uniqueness of the transformation £ (see Exercise X.3) does away with 
the exceptional points where dF/du is singular. 

(b) In Theorem X.1.4, the construction of e(x, y ) = (f y (£~ 1 (x , 0)) =: (it, v) is 
such that 

e(x,-y) = ip^ y (e~ 1 (x,0)) = (u,-v) . 

This holds because by assumption the reference point on M x can be chosen as 
^ _ 1 (x, 0 ) = (ito, 0 ) for some ito, and because (p± y is the time ±1 flow of the 
Hamiltonian system with Hamiltonian y\F\ + ... + yaFd. Condition (1.7) implies 
that this is a reversible system, which in turn yields that e preserves reversibility as 
stated above. 

(c) The transformation in the proof of Theorem X.1.6 is of the form a = w(x), 
y = W(x)6 (with invertible W(x) = w'(x)) and hence preserves reversibility. □ 

Example 1.4. We now present an example with one degree of freedom where
Theorem 1.3 does not apply. In fact, all conditions are satisfied except that for some
$x$ there is no $u$ such that $(u,0) \in M_x$. We consider the Hamiltonian

$$H(u,v) = (v^2 - 1)^2 + \int_0^u s(s+1)^4\,ds.$$

Its level sets are shown in the picture to the right. For energy values such that the
level curve does not intersect the $u$-axis, Theorem 1.3 does not apply even though
$H(u,v)$ satisfies (1.7).

For these energy values the system is an integrable Hamiltonian system, but not an
integrable reversible system.
Example 1.5 (Motion in a Central Field, Continued). All the assumptions of
Theorem 1.3 are satisfied for $F_1 = H$, $F_2 = L = p_1 q_2 - p_2 q_1$ if we take the
symplectic coordinates $u = (q_1,p_2)$ and $v = (-p_1,q_2)$.

The condition (1.7) is also satisfied with $F_1 = H$, $F_2 = L^2$ ($L \ne 0$ as always)
for the choices $u = (p_1,p_2)$ and $v = (q_1,q_2)$, or $u = (q_1,q_2)$ and $v = (-p_1,-p_2)$.
However, in these situations, Theorem 1.3 cannot be applied, because there does not
exist $u$ such that $(u,0) \in M_x$.

Example 1.6 (Toda Lattice). Consider the Toda lattice of Sect. X.1.5. The
eigenvalues of the matrix $L$ are first integrals in involution. With the symplectic
coordinates $(u,v) = (q,-p)$ the Hamiltonian system corresponding to (X.1.17) satisfies
the reversibility conditions (1.2). However, since $v_1 + \ldots + v_n$ is a first integral of
this system, it is not possible to connect $(u,v)$ with $(u,-v)$ on a level set $M_x$, and
Theorem 1.3 cannot be applied.

Fortunately, as can be seen in Fig. 1.1, the Toda lattice contains many more
symmetries. With periodic boundary conditions it is, for example, $\rho$-reversible (i.e.,
$\rho f(y) = -f(\rho y)$, $y = (p,q)^T$, see the discussion in Chap. V) with

$$\rho = \begin{pmatrix} S & 0 \\ 0 & -S \end{pmatrix},$$

where $S$ inverses the components of a vector. To bring the system to the form (1.1)
with a vector field satisfying (1.2), we transform $S$ (and hence $\rho$) to diagonal form
and collect the variables corresponding to the eigenvalues $+1$ and $-1$ in $u$ and $v$,
respectively (see Exercise 1). This gives the (symplectic) coordinates

$$\begin{aligned}
u_k &= \tfrac{1}{\sqrt2}\,(p_k + p_{n-k+1}), & u_{n-k+1} &= \tfrac{1}{\sqrt2}\,(q_k - q_{n-k+1}),\\
v_k &= \tfrac{1}{\sqrt2}\,(q_k + q_{n-k+1}), & v_{n-k+1} &= \tfrac{1}{\sqrt2}\,(p_{n-k+1} - p_k)
\end{aligned} \tag{1.8}$$

for $k = 1,\ldots,n/2$ (if $n$ is even; for odd $n = 2\ell + 1$, (1.8) holds for $k = 1,\ldots,\ell$
and in addition we have $u_{\ell+1} = p_{\ell+1}$ and $v_{\ell+1} = q_{\ell+1}$).

In the following we restrict our considerations to the case $n = 3$, and we show
that all assumptions of Theorem 1.3 are satisfied, so that we have an integrable
reversible system. For $n = 3$, the new variables are

$$u_1 = \tfrac{1}{\sqrt2}(p_1 + p_3),\quad u_2 = p_2,\quad u_3 = \tfrac{1}{\sqrt2}(q_1 - q_3),\qquad v_1 = \tfrac{1}{\sqrt2}(q_1 + q_3),\quad v_2 = q_2,\quad v_3 = \tfrac{1}{\sqrt2}(p_3 - p_1),$$

and the expressions $a_k$ and $b_k$ of Sect. X.1.5 become

$$\begin{aligned}
a_1 &= -\tfrac{1}{2\sqrt2}\,(u_1 - v_3), & b_1 &= \tfrac12 \exp\Big(\tfrac12\big(\tfrac{1}{\sqrt2}(v_1 + u_3) - v_2\big)\Big),\\
a_2 &= -\tfrac12\,u_2, & b_2 &= \tfrac12 \exp\Big(\tfrac12\big(v_2 - \tfrac{1}{\sqrt2}(v_1 - u_3)\big)\Big),\\
a_3 &= -\tfrac{1}{2\sqrt2}\,(u_1 + v_3), & b_3 &= \tfrac12 \exp\Big(-\tfrac{1}{\sqrt2}\,u_3\Big).
\end{aligned}$$

Fig. 1.1. Three projections of the solution of the Toda lattice equations (n = 3) with initial
values as in Fig. X.1.3

One sees that $b_1^2 + b_2^2$ and $a_1 b_2^2 + a_3 b_1^2$ are even functions of $v$, so that all coefficients
of the characteristic polynomial of the matrix $L$

$$\chi(\lambda) = -\lambda^3 + (a_1 + a_2 + a_3)\lambda^2 - (a_1 a_2 + a_2 a_3 + a_3 a_1 - b_1^2 - b_2^2 - b_3^2)\lambda + (a_1 a_2 a_3 - a_1 b_2^2 - a_2 b_3^2 - a_3 b_1^2 + 2 b_1 b_2 b_3)$$

are even in $v$. This implies that also the eigenvalues of $L$ are even functions of $v$, so
that (1.7) is satisfied.

It remains to prove that for fixed $x$, i.e., for given real eigenvalues of $L$, the point
$(u_0,v_0)$ corresponding to $p(0), q(0)$ can be connected with an element of the form
$(u,0) \in \mathbb{R}^6$ without leaving the level set $M_x$. Equivalently, we have to find such a
path for which the corresponding coefficients of the characteristic polynomial $\chi(\lambda)$
take given values. For given $v(t)$ this yields a system of three nonlinear equations
for $u(t) \in \mathbb{R}^3$. For the eigenvalues corresponding to the initial values $p(0), q(0)$
used in Fig. X.1.3, we put $v(t) = v_0 t$ for $1 \ge t \ge 0$ and we check numerically with
a path-following algorithm that such a connection is possible.

Example 1.7 (Rigid Body Equations on the Unit Sphere). We reconsider an
example that has accompanied us all the way through Chapters IV, V, and VII.5: the
rigid body equations (IV.1.4), here considered as differential equations on the unit
sphere. We assume $I_3 < I_1, I_2$ for the inertia, which implies that any solution
starting with $y_3(0) > 0$ will have $y_3(t) > 0$ for all $t$. We consider the equations in the
neighbourhood of such a solution. We can then choose $u = y_1$, $v = y_2$ as coordinates
on the upper half-sphere $\{y_1^2 + y_2^2 + y_3^2 = 1,\ y_3 > 0\}$. This gives the reversible
system

$$\dot u = a_1 v \sqrt{1 - u^2 - v^2}, \qquad \dot v = a_2 u \sqrt{1 - u^2 - v^2} \tag{1.9}$$

with $a_1 = (I_2 - I_3)/I_2 I_3 > 0$ and $a_2 = (I_3 - I_1)/I_3 I_1 < 0$, which has $H =
u^2/I_1 + v^2/I_2 + (1 - u^2 - v^2)/I_3 = a_2 u^2 - a_1 v^2 + I_3^{-1}$ as an invariant. We
introduce polar coordinates $u = r\cos\varphi$, $v = r\sin\varphi$ and express $r$ as a function of
$H$ and $\varphi$:

$$r = \sqrt{\frac{I_3^{-1} - H}{a_1 \sin^2\varphi - a_2 \cos^2\varphi}}.$$

This leaves us with differential equations

$$\dot H = 0, \qquad \dot\varphi = \gamma(H,\varphi),$$

where $\gamma$ is even in $\varphi$ and has no zeros. The time needed to run through an angle $\varphi$ is

$$T(H,\varphi) = \int_0^{\varphi} \frac{1}{\gamma(H,\phi)}\,d\phi, \qquad\text{and}\qquad \omega(H) = \frac{2\pi}{T(H,2\pi)}$$

is the frequency. With $\theta = \omega(H)\,T(H,\varphi)$ we then have

$$\dot H = 0, \qquad \dot\theta = \omega(H).$$

The transformation from $(u,v)$ in the open unit disc (except the origin) to $(H,\theta) \in
(0, I_3^{-1}) \times \mathbb{T}$ is a diffeomorphism that preserves reversibility. This shows that the
rigid body equations (1.9) are an integrable reversible system.
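
The structural claims in this example (reversibility of the reduced system and invariance of $H$) are easy to confirm numerically. The sketch below is a hedged illustration: the moments of inertia are arbitrary sample values chosen with $I_3 < I_1, I_2$, and the function names are hypothetical.

```python
import math

I1, I2, I3 = 2.0, 1.5, 1.0          # sample moments of inertia, I3 < I1, I2 (assumption)
a1 = (I2 - I3) / (I2 * I3)          # positive
a2 = (I3 - I1) / (I3 * I1)          # negative


def vector_field(u, v):
    """Reduced rigid body equations on the upper half-sphere."""
    s = math.sqrt(1.0 - u * u - v * v)
    return a1 * v * s, a2 * u * s


def H(u, v):
    """Invariant written in the reduced form a2*u^2 - a1*v^2 + 1/I3."""
    return a2 * u * u - a1 * v * v + 1.0 / I3


def dH_dt(u, v):
    """Derivative of H along the vector field (should vanish identically)."""
    du, dv = vector_field(u, v)
    return 2.0 * a2 * u * du - 2.0 * a1 * v * dv
```

Evaluating `dH_dt` at any point of the open unit disc gives zero up to rounding, and replacing `v` by `-v` flips the sign of the first component of `vector_field` while leaving the second unchanged, which is the reversibility condition for this system.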


Example 1.8 (Rigid Body Equations in $\mathbb{R}^3$). We now consider the rigid body
equations (IV.1.4) in the ambient space $\mathbb{R}^3$, rather than on the unit sphere. The
system then has the invariants $H = y_1^2/I_1 + y_2^2/I_2 + y_3^2/I_3$ and $K = y_1^2 + y_2^2 + y_3^2$,
and it is reversible with respect to the partition $u = (y_1,y_3)$ and $v = y_2$. In the
case $I_3 < I_1, I_2$ we can again restrict our attention to $y_3 > 0$. We then write
$y_3 = \sqrt{K - y_1^2 - y_2^2}$ and introduce polar coordinates $y_1 = r\cos\varphi$, $y_2 = r\sin\varphi$.
As above, we express $r$ as a function of $H$, $K$ and $\varphi$ (this just requires replacing
$I_3^{-1}$ with $K/I_3$ in the above formula for $r$) and we obtain differential equations

$$\dot H = 0, \qquad \dot K = 0, \qquad \dot\varphi = \gamma(H,K,\varphi)$$

with $\gamma$ even in $\varphi$ and without zeros. In the same way as above, this is transformed to

$$\dot H = 0, \qquad \dot K = 0, \qquad \dot\theta = \omega(H,K).$$

The transformation $((y_1,y_3),y_2) \mapsto ((H,K),\theta)$ preserves reversibility. The rigid
body equations (IV.1.4) are thus an integrable reversible system. Note that this time
the dimensions differ.


XI.2 Transformations in Reversible Perturbation Theory

We consider perturbations of an integrable reversible system such that the perturbed
system is still reversible. This takes the form

$$\dot a = \varepsilon\,r(a,\theta), \qquad \dot\theta = \omega(a) + \varepsilon\,\rho(a,\theta) \tag{2.1}$$

where $\varepsilon$ is a small parameter, and $r$ is an odd function of $\theta$ and $\rho$ is an even function
of $\theta$:

$$r(a,-\theta) = -r(a,\theta), \qquad \rho(a,-\theta) = \rho(a,\theta). \tag{2.2}$$

Similar to Sect. X.2 for Hamiltonian perturbation theory, we study coordinate
transformations that change (2.1) to reversible systems which - in various ways - look
closer to an integrable system in action-angle variables than (2.1).


XI.2.1 The Basic Scheme of Reversible Perturbation Theory 


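We look for a transformation between neighbourhoods of $\{a_0\} \times \mathbb{T}^n$,

$$a = b + \varepsilon\,s(b,\varphi), \qquad \theta = \varphi + \varepsilon\,\sigma(b,\varphi), \tag{2.3}$$

which preserves reversibility and hence has $s$ even in $\varphi$ and $\sigma$ odd in $\varphi$, such that
the transformed system is of the form

$$\dot b = O(\varepsilon^2), \qquad \dot\varphi = \omega(b) + \varepsilon\,\mu(b) + O(\varepsilon^2). \tag{2.4}$$

Inserting (2.3) into (2.1) gives the system

$$\left( \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} + \varepsilon \begin{pmatrix} \partial s/\partial b & \partial s/\partial\varphi \\ \partial\sigma/\partial b & \partial\sigma/\partial\varphi \end{pmatrix} \right) \begin{pmatrix} \dot b \\ \dot\varphi \end{pmatrix} = \begin{pmatrix} \varepsilon\,r(a,\theta) \\ \omega(a) + \varepsilon\,\rho(a,\theta) \end{pmatrix}$$

with $(a,\theta)$ from (2.3). Inverting the matrix on the left-hand side and expanding in
powers of $\varepsilon$, it is seen that (2.4) requires that $s, \sigma$ satisfy the equations

$$\frac{\partial s}{\partial\varphi}(b,\varphi)\,\omega(b) = r(b,\varphi) \tag{2.5}$$

$$\frac{\partial\sigma}{\partial\varphi}(b,\varphi)\,\omega(b) = \rho(b,\varphi) + \omega'(b)\,s(b,\varphi) - \mu(b). \tag{2.6}$$

A necessary condition for the solvability of (2.5) is that the angular average of $r$
vanishes:

$$\bar r(b) = 0, \quad\text{where}\quad \bar r(b) = \frac{1}{(2\pi)^n}\int_{\mathbb{T}^n} r(b,\varphi)\,d\varphi. \tag{2.7}$$

In the Hamiltonian case this condition was satisfied because $r$ was a gradient with
respect to $\varphi$. Here, in the reversible case, this is satisfied because $r$ is an odd function
of $\varphi$.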
If (2.7) holds, then (2.5) can be solved by Fourier series expansion in the same
way as we solved (X.2.2), provided that the frequencies $\omega_1(b),\ldots,\omega_n(b)$ are
non-resonant. Of course, there is again the same problem of small denominators as in the
Hamiltonian case. Equations (2.6) are solved in the same way as (2.5), upon setting

$$\mu(b) = \bar\rho(b) + \omega'(b)\,\bar s(b). \tag{2.8}$$

Since $r$ is odd in $\varphi$, the solution $s$ of (2.5) becomes even in $\varphi$. It is determined
uniquely only up to a constant: we are still free to choose the angular average $\bar s(b)$.
If $\omega'(b)$ has rank $n$, we may actually choose $\bar s(b)$ such that $\mu(b) = 0$ results from
(2.8). Since the right-hand side of (2.6) is even in $\varphi$, the solution $\sigma$ of (2.6) becomes
odd in $\varphi$ if we choose $\bar\sigma(b) = 0$.
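
The parity statement for the solution of (2.5) can be illustrated in the simplest setting $n = 1$ with $b$ held fixed, where non-resonance just means $\omega \ne 0$. The code below is a minimal sketch under these assumptions; the function name `solve_homological` and the sample data are hypothetical, and the free angular average of $s$ is set to zero.

```python
import numpy as np


def solve_homological(r_vals, omega):
    """Solve (ds/dphi) * omega = r on the uniform grid 2*pi*j/n (case n = 1).

    r_vals must have zero mean (condition (2.7)); the returned grid values
    of s have zero angular average.
    """
    n = len(r_vals)
    r_hat = np.fft.fft(r_vals) / n           # Fourier coefficients of r
    k = np.fft.fftfreq(n, d=1.0 / n)         # integer wave numbers
    s_hat = np.zeros(n, dtype=complex)
    nonzero = k != 0
    s_hat[nonzero] = r_hat[nonzero] / (1j * k[nonzero] * omega)
    return np.real(np.fft.ifft(s_hat) * n)
```

For the odd input $r(\varphi) = \sin\varphi + \tfrac12\sin 3\varphi$ the routine returns grid values of $s(\varphi) = -(\cos\varphi + \tfrac16\cos 3\varphi)/\omega$, an even function, as the parity argument predicts.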


XI.2.2 Reversible Perturbation Series 

The above construction extends to arbitrary finite order in $\varepsilon$. The transformation is
now sought for in the form

$$a = b + \varepsilon s_1(b,\varphi) + \varepsilon^2 s_2(b,\varphi) + \ldots + \varepsilon^{N-1} s_{N-1}(b,\varphi) \tag{2.9}$$

$$\theta = \varphi + \varepsilon\sigma_1(b,\varphi) + \varepsilon^2\sigma_2(b,\varphi) + \ldots + \varepsilon^{N-1}\sigma_{N-1}(b,\varphi) \tag{2.10}$$

with $s_j$ even in $\varphi$ and $\sigma_j$ odd in $\varphi$ to preserve reversibility. This transformation is to
be chosen such that the system in the new variables is of the form

$$\dot b = \varepsilon^N r_N(b,\varphi), \qquad \dot\varphi = \omega_{\varepsilon,N}(b) + \varepsilon^N \rho_N(b,\varphi)$$

with $\omega_{\varepsilon,N}(b) = \omega(b) + \varepsilon\mu_1(b) + \ldots + \varepsilon^{N-1}\mu_{N-1}(b)$, and with $r_N(b,\varphi)$ odd in $\varphi$
and $\rho_N(b,\varphi)$ even in $\varphi$, and with all these functions bounded independently of $\varepsilon$.

Inserting the transformation into (2.1) and expanding in powers of $\varepsilon$, it is seen
that the functions $s_j$ and $\sigma_j$ must satisfy equations of the form of (2.5), (2.6):

$$\frac{\partial s_j}{\partial\varphi}(b,\varphi)\,\omega(b) = p_j(b,\varphi) \tag{2.11}$$

$$\frac{\partial\sigma_j}{\partial\varphi}(b,\varphi)\,\omega(b) = \pi_j(b,\varphi) + \omega'(b)\,s_j(b,\varphi) - \mu_j(b) \tag{2.12}$$

where $p_j, \pi_j$ are given by expressions that depend linearly on higher-order
derivatives of $r, \rho$ and polynomially on the functions $s_i, \sigma_i$ with $i < j$ and on their
first-order derivatives. Using the rules

$$\text{even}\cdot\text{even} = \text{even}, \quad \text{even}\cdot\text{odd} = \text{odd}, \quad \text{odd}\cdot\text{odd} = \text{even}
\qquad\text{and}\qquad
\frac{\partial\,\text{even}}{\partial\varphi} = \text{odd}, \quad \frac{\partial\,\text{odd}}{\partial\varphi} = \text{even},$$

it is found that $p_j$ is odd in $\varphi$ and $\pi_j$ is even in $\varphi$ for all $j$. For non-resonant
frequencies $\omega(b)$, the equations (2.11), (2.12) can therefore be solved with $s_j$ even in $\varphi$, $\sigma_j$
odd in $\varphi$. If $\omega'(b)$ is invertible, we can obtain $\mu_j(b) = 0$ for all $j$.

Beyond these formal calculations, there is the following reversible analogue of
Lemma X.2.1 in the Hamiltonian case. This result is obtained by the same
“ultraviolet cut-off” argument as the earlier result.


Lemma 2.1. Let the right-hand side functions of (2.1) be real-analytic in a
neighbourhood of $\{b^*\} \times \mathbb{T}^n$ and satisfy (2.2). Suppose that $\omega(b^*)$ satisfies the
diophantine condition (X.2.4). For any fixed $N \ge 2$, there are positive constants
$\varepsilon_0, c, C$ such that the following holds for $\varepsilon \le \varepsilon_0$: there exists a real-analytic
reversibility-preserving change of coordinates $(a,\theta) \mapsto (b,\varphi)$ such that every
solution $(b(t),\varphi(t))$ of the perturbed system in the new coordinates, starting with
$\|b(0) - b^*\| \le c\,|\log\varepsilon|^{-\nu-1}$, satisfies

$$\|b(t) - b(0)\| \le C\,t\,\varepsilon^N \quad\text{for } t \le \varepsilon^{-N+1},$$

$$\|\varphi(t) - \omega_{\varepsilon,N}(b(0))\,t - \varphi(0)\| \le C\,(t^2 + t\,|\log\varepsilon|^{\nu+1})\,\varepsilon^N \quad\text{for } t^2 \le \varepsilon^{-N+1}.$$

Moreover, the transformation is $O(\varepsilon)$-close to the identity: $\|(a,\theta) - (b,\varphi)\| \le C\varepsilon$
holds for $(a,\theta)$ and $(b,\varphi)$ related by the above coordinate transform, for $\|b - b^*\| \le
c\,|\log\varepsilon|^{-\nu-1}$ and for $\varphi$ in an $\varepsilon$-independent complex neighbourhood of $\mathbb{T}^n$.

The constants $\varepsilon_0, c, C$ depend on $N, n, \gamma, \nu$ and on bounds of $\omega, r, \rho$ on a
complex neighbourhood of $\{b^*\} \times \mathbb{T}^n$. □

The equations determining the coefficient functions of the perturbation series
are of the form to which Lemma X.4.1 applies. Therefore, that lemma is again the
tool for estimating the terms in the perturbation series, similar to Sect. X.4.1. This
yields a reversible analogue of Theorem X.4.4 showing near-invariance of tori (up
to exponentially small terms in a negative power of $\varepsilon$) over time intervals that are
exponentially large in a negative power of $\varepsilon$, with the same exponents $\alpha, \beta$ as in
Theorem X.4.4.


XI.2.3 Reversible KAM Theory 

For an integrable reversible system, just as for an integrable Hamiltonian system,
the phase space is foliated into invariant tori on which the flow is conditionally
periodic. We fix one such torus $\{a = a^*,\ \theta \in \mathbb{T}^n\}$ with diophantine frequencies
$\omega_1,\ldots,\omega_n$. For convenience we may assume $a^* = 0 \in \mathbb{R}^m$. This torus is invariant
under the flow of systems of the form $\dot a = O(\|a\|^2)$, $\dot\theta = \omega + O(\|a\|)$, or written
more explicitly,

$$\dot a = \tfrac12\,a^T K(a,\theta)\,a, \qquad \dot\theta = \omega + M(a,\theta)\,a. \tag{2.13}$$

Here, $K = [K_1,\ldots,K_m]$ where each $K_i(a,\theta)$ is a symmetric $m \times m$ matrix,
and $M(a,\theta)$ is an $n \times m$ matrix. The first equation is to be interpreted as $\dot a_i =
\tfrac12\,a^T K_i(a,\theta)\,a$ for the components $i = 1,\ldots,m$. Consider now a perturbation of
this system:

$$\dot a = \tfrac12\,a^T K(a,\theta)\,a + \varepsilon\,r(a,\theta), \qquad \dot\theta = \omega + M(a,\theta)\,a + \varepsilon\,\rho(a,\theta). \tag{2.14}$$

For the reversible case, i.e., for $K$ and $r$ odd in $\theta$ and for $M$ and $\rho$ even in $\theta$,
we construct a sequence of reversibility-preserving transformations in the spirit of
Kolmogorov’s transformation of Sect. X.2.3, which transform (2.14) back to the
form (2.13) in the new variables, showing the persistence of an invariant torus with
frequencies $\omega_i$ under small reversible perturbations of the system. This holds again
under the diophantine condition (X.2.4) on $\omega$ and additionally under the condition
that the angular average $\bar M_0$ of $M$ at $a = 0$ has rank $n$. A result of this type -
a reversible KAM theorem - was shown by Moser (1973), Chap. V, in a different
setting. See also Sevryuk (1986) for further results in that direction.

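We look for a transformation of the form

$$a = b + \varepsilon\big(s(\varphi) + S(\varphi)\,b\big), \qquad \theta = \varphi + \varepsilon\,\sigma(\varphi) \tag{2.15}$$

with an $m \times m$ matrix $S(\varphi)$. Preserving reversibility requires that $s$ and $S$ are even
functions and $\sigma$ is odd. Higher-order terms in $b$ play no role and are therefore omitted
from the beginning. We insert this into (2.14) and obtain

$$\begin{aligned}
\dot b = {}& \tfrac12\,b^T K(b,\varphi)\,b + \varepsilon\Big\{ r(0,\varphi) - \frac{\partial s}{\partial\varphi}(\varphi)\,\omega \\
& + \frac{\partial r}{\partial a}(0,\varphi)\,b - \frac{\partial s}{\partial\varphi}(\varphi)\,M(0,\varphi)\,b - \frac{\partial}{\partial\varphi}\big(S(\varphi)\,b\big)\,\omega + s(\varphi)^T K(0,\varphi)\,b \Big\} \\
& + O(\varepsilon^2) + O(\varepsilon\|b\|^2)
\end{aligned}$$

$$\dot\varphi = \omega + M(b,\varphi)\,b + \varepsilon\Big\{ \rho(0,\varphi) - \frac{\partial\sigma}{\partial\varphi}(\varphi)\,\omega + M(0,\varphi)\,s(\varphi) \Big\} + O(\varepsilon^2) + O(\varepsilon\|b\|).$$

We require that the terms in curly brackets vanish. This holds if the following
equations are satisfied (the last equation is written component-wise for notational
clarity):

$$\begin{aligned}
\frac{\partial s}{\partial\varphi}(\varphi)\,\omega &= r(0,\varphi) \\
\frac{\partial\sigma}{\partial\varphi}(\varphi)\,\omega &= \rho(0,\varphi) + M(0,\varphi)\,s(\varphi) \\
\frac{\partial S_{ij}}{\partial\varphi}(\varphi)\,\omega &= \frac{\partial r_i}{\partial a_j}(0,\varphi) - \sum_k \frac{\partial s_i}{\partial\varphi_k}(\varphi)\,M_{kj}(0,\varphi) + \sum_k s_k(\varphi)\,K_{i,kj}(0,\varphi).
\end{aligned} \tag{2.16}$$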
Since $r$ is odd in $\varphi$, the first equation can be solved for $s$ even in $\varphi$, uniquely up to a
constant, the angular average $\bar s$. Since the angular average of $M$ is assumed to be of
full rank $n$, $\bar s$ can be chosen such that the angular average of the right-hand side of
the equation for $\sigma$ becomes zero. Since the right-hand side is even, the equation can
then be solved uniquely for an odd $\sigma$. The equations for $S$ have an odd right-hand
side and can therefore be solved for an even $S$.

In this way, the perturbation to the form (2.13) is reduced from $O(\varepsilon)$ to $O(\varepsilon^2)$.
By the same arguments as in the Hamiltonian case (see Sect. X.5), the iteration of
this procedure is seen to be convergent. This finally yields a change of coordinates
that preserves reversibility and transforms the perturbed system (2.14) back to the
form (2.13). We summarize this in the following theorem, which is the reversible
analogue of Kolmogorov’s Theorem X.5.1.

Theorem 2.2. Consider a real-analytic reversible system (2.13). Suppose that $\omega \in
\mathbb{R}^n$ satisfies the diophantine condition (X.2.4), and that the angular average of
$M(0,\cdot)$ is an $n \times m$ matrix of rank $n$. Let (2.14) be a real-analytic reversible
perturbation of the system (2.13). Then, there exists $\varepsilon_0 > 0$ (which depends on the
perturbation functions only through a bound of their norms on a complex
neighbourhood of $\{0\} \times \mathbb{T}^n$) such that for every $\varepsilon$ with $|\varepsilon| \le \varepsilon_0$, there is a real-analytic
transformation $\psi_\varepsilon : (b,\varphi) \mapsto (a,\theta)$, $O(\varepsilon)$ close to the identity and depending
analytically on $\varepsilon$, which preserves reversibility and puts the perturbed system back to
the form (2.13) in the new variables: $\dot b = O(\|b\|^2)$, $\dot\varphi = \omega + O(\|b\|)$. The perturbed
system therefore has the invariant torus $\{b = 0,\ \varphi \in \mathbb{T}^n\}$ carrying a quasi-periodic
flow with the same frequencies $\omega$ as the unperturbed system. □

XI.2.4 Reversible Birkhoff-Type Normalization 

We show that, in the situation of diophantine frequencies $\omega$, there is a
reversibility-preserving transformation that takes a reversible system of the form
(2.13) to the form 

$$\dot b = r_k(b,\varphi), \qquad \dot\varphi = \omega + \zeta_k(b) + \rho_k(b,\varphi) \qquad\text{with}\quad r_k, \rho_k = O(\|b\|^k) \tag{2.17}$$

for arbitrary $k \ge 2$, where $\zeta_k = \bar\rho_1 + \ldots + \bar\rho_{k-1}$, the bars denoting
angular averages, and with $\rho_1(b,\varphi) = M(b,\varphi)b$. This implies again that the
invariant torus is “very sticky”: $\|b(0)\| \le \delta$ implies $\|b(t)\| \le 2\delta$ for
$t \le C_k \delta^{-k+1}$. As in the Hamiltonian case, a suitable choice of $k$ would even
yield time intervals exponentially long in a negative power of $\delta$ during which
solutions stay within twice the initial distance $\delta$. 

The transformation to the normal form (2.17) is constructed recursively. Suppose 
that in some variables $(a,\theta)$ we have, for some $k \ge 2$, 

$$\dot a = r_{k-1}(a,\theta), \qquad \dot\theta = \omega + \zeta_{k-1}(a) + \rho_{k-1}(a,\theta) \qquad\text{with}\quad r_{k-1}, \rho_{k-1} = O(\|a\|^{k-1}).$$

Note, for $k = 2$ we have $r_1 = O(\|a\|^2)$ by (2.13). We search for a transformation 


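$$a = b + s(b,\varphi), \qquad \theta = \varphi + \sigma(b,\varphi) \qquad\text{with}\quad s, \sigma = O(\|b\|^{k-1}),$$

(and $s = O(\|b\|^2)$ for $k = 2$) that preserves reversibility, i.e., has $s$ even in
$\varphi$ and $\sigma$ odd in $\varphi$, and is such that (2.17) holds. Inserting the
transformation into the above differential equation shows that this is indeed
achieved if $s$, $\sigma$ solve the following system of the form (2.5), (2.6): 

$$\begin{aligned}
\frac{\partial s}{\partial\varphi}(b,\varphi)\,\omega &= r_{k-1}(b,\varphi) \\
\frac{\partial\sigma}{\partial\varphi}(b,\varphi)\,\omega &= \rho_{k-1}(b,\varphi) + \zeta'_{k-1}(b)\,s(b,\varphi) - \mu_k(b).
\end{aligned}$$

Choosing $\bar s(b) = 0$ leads to $\mu_k = \bar\rho_{k-1}$ and gives (2.17) with
$\zeta_k = \zeta_{k-1} + \bar\rho_{k-1}$. 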


XI.3 Linear Error Growth and Near-Preservation 
of First Integrals 


We now study the error behaviour of reversible methods applied to integrable re¬ 
versible systems. Recall from Theorem V.1.5 that symmetric methods are reversible 
under the compatibility condition (V.1.4). We give an analogue of Theorem X.3.1 
on the error behaviour of symplectic methods applied to integrable Hamiltonian 
systems. We consider an integrable reversible system (1.1) (usually not given in 
action-angle variables) and let $(u,v) = \psi(a,\theta)$ be the reversibility-preserving
transformation to action-angle variables. The inverse transformation is denoted as 

$$(a,\theta) = \bigl(I(u,v), \Theta(u,v)\bigr).$$


The following is the reversible analogue of Theorem X.3.1. 

Theorem 3.1. Consider applying a reversible numerical integrator of order $p$ to the 
integrable reversible system (1.1) with real-analytic right-hand side. Suppose that 
$\omega(a^*)$ satisfies the diophantine condition (X.2.4). Then, there exist positive
constants $C$, $c$ and $h_0$ such that the following holds for all step sizes $h \le h_0$:
every numerical solution starting with $\|I(u_0,v_0) - a^*\| \le c\,|\log h|^{-\nu-1}$ satisfies 

$$\begin{aligned}
\|(u_n,v_n) - (u(t),v(t))\| &\le C\,t\,h^p \\
\|I(u_n,v_n) - I(u_0,v_0)\| &\le C\,h^p
\end{aligned}
\qquad\text{for}\quad t = nh \le h^{-p}. \tag{3.1}$$

The constants $h_0, c, C$ depend on $\gamma$, $\nu$ of (X.2.4), on the dimensions, on bounds
of the real-analytic functions $f, g$ on a complex neighbourhood of the torus
$\{(u,v) : I(u,v) = a^*\}$, and on the numerical method. 

Proof. The proof of Theorem X.3.1 relied on Theorem IX.3.1 and Lemma X.2.1. 
Using their reversible analogues Theorem IX.2.3 and Lemma 2.1 with the same 
arguments gives the above result for the reversible case. □ 

Remark 3.2. As in the analogous remark for the Hamiltonian case, the error bounds 
of Theorem 3.1 also hold when the reversible method is applied to a perturbed
integrable system with a perturbation parameter $\varepsilon$ bounded by a positive power of
the step size: $\varepsilon \le K h^\alpha$ for some $\alpha > 0$. 

We consider the Hamiltonian system of Example 1.4 and apply the symmetric 
but non-symplectic Lobatto IIIB method with step size $h = 0.01$. In the left picture 
of Fig. 3.1 we choose the initial value $(u_0, v_0) = (0, 1.5)$ for which the level curve 
of the Hamiltonian is symmetric with respect to the $u$-axis and the system is an
integrable reversible system. The good conservation of the Hamiltonian is in agreement 
with Theorem 3.1. In the right picture we choose $(u_0, v_0) = (0, 0.3)$ whose level 
curve is the fat line in the picture of Example 1.4 which does not intersect the $u$-axis. 
Since in this situation we do not have an integrable reversible system, Theorem 3.1 
cannot be applied and we cannot expect good energy conservation. 
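
The qualitative statement of Theorem 3.1 can be tried out on a much simpler problem than Example 1.4. The following is only a sketch under stated assumptions: it swaps in the pendulum $\dot q = p$, $\dot p = -\sin q$ (integrable, and reversible with respect to $p \mapsto -p$) for Example 1.4, and the symmetric trapezoidal rule for the Lobatto IIIB method. All function names are illustrative.

```python
import math

def f(q, p):
    """Pendulum vector field q' = p, p' = -sin(q); reversible w.r.t. p -> -p."""
    return p, -math.sin(q)

def energy(q, p):
    """First integral H(q, p) = p^2/2 - cos(q)."""
    return 0.5 * p * p - math.cos(q)

def explicit_euler_step(q, p, h):
    """Non-symmetric reference method."""
    dq, dp = f(q, p)
    return q + h * dq, p + h * dp

def trapezoidal_step(q, p, h, tol=1e-14, max_iter=100):
    """Symmetric trapezoidal rule; the implicit equation is solved by
    fixed-point iteration (a contraction for small h)."""
    dq0, dp0 = f(q, p)
    q1, p1 = q + h * dq0, p + h * dp0      # explicit Euler predictor
    for _ in range(max_iter):
        dq1, dp1 = f(q1, p1)
        q_new = q + 0.5 * h * (dq0 + dq1)
        p_new = p + 0.5 * h * (dp0 + dp1)
        converged = abs(q_new - q1) + abs(p_new - p1) < tol
        q1, p1 = q_new, p_new
        if converged:
            break
    return q1, p1

def max_energy_error(step, q0, p0, h, n_steps):
    """Largest deviation of H from its initial value along the numerical orbit."""
    h0 = energy(q0, p0)
    q, p, worst = q0, p0, 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
        worst = max(worst, abs(energy(q, p) - h0))
    return worst

if __name__ == "__main__":
    print("trapezoidal   :", max_energy_error(trapezoidal_step, 1.0, 0.0, 0.1, 10000))
    print("explicit Euler:", max_energy_error(explicit_euler_step, 1.0, 0.0, 0.1, 1000))
```

In this sketch the energy error of the symmetric method should stay small over the whole interval, while the explicit Euler method drifts away; the experiment illustrates the statement but of course proves nothing.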



Fig. 3.1. Numerical Hamiltonian of Example 1.4 for two different initial values 


For the Toda lattice example, Figures 3.2 and 3.3 illustrate the long-time con¬ 
servation of the first integrals and the linear error growth, respectively, of the Lo¬ 
batto IIIB method. 

Theorem 3.1 together with Examples 1.7 and 1.8 also explains the good behav¬ 
iour of symmetric (in fact, reversible) integrators on the rigid body equations which 
we observed in Chap. V (Figs. V.4.2 and V.4.6). 

Variable Step Sizes: Proportional, Reversible Controllers. As a consequence of 
the backward error analysis of Theorem IX.6.1 the statement (3.1) can be extended 
straightforwardly to proportional step size controllers as discussed in Sect. VIII.3.1. 
Under the assumption of Theorem 3.1 with $h$ and $h_0$ replaced by $\varepsilon$ and
$\varepsilon_0$ one has 

$$\begin{aligned}
\|(u_n,v_n) - (u(t_n),v(t_n))\| &\le C\,t_n\,\varepsilon^p \\
\|I(u_n,v_n) - I(u_0,v_0)\| &\le C\,\varepsilon^p
\end{aligned}
\qquad\text{for}\quad t_n \le \varepsilon^{-p}. \tag{3.2}$$

The grid $\{t_n\}$ is determined by the method and satisfies
$t_{n+1} = t_n + \varepsilon\, s(u_n, v_n, \varepsilon)$. 

Fig. 3.2. Numerically obtained eigenvalues (left picture) and errors in the eigenvalues (right 
picture) of the 3-stage Lobatto IIIB scheme (step size $h = 0.1$) applied to the Toda lattice 
with the data of Sect. X.1.5 

Fig. 3.3. Euclidean norm of the global error for the 3-stage Lobatto IIIB scheme (step size 
$h = 0.1$) applied to the Toda lattice with $n = 3$ and initial values as in Fig. 3.2 

Variable Step Sizes: Integrating, Reversible Controllers. We apply the backward 
error analysis of Theorem IX.6.2. The modified equation (IX.6.14) reduces to 

$$\dot y = f(y), \qquad \dot z = z\,G(y) \tag{3.3}$$

for $\varepsilon = 0$. Since $G(y) = -(\sigma(y))^{-1}\nabla\sigma(y)^T f(y)$ with an analytic step
size function $\sigma(y)$, the function $(y,z) \mapsto z\,\sigma(y)$ is a first integral of (3.3).
Suppose now that $\dot y = f(y)$ is the integrable reversible system (1.1). This means
that there exists a reversibility preserving diffeomorphism $y = \psi(a,\theta)$
transforming the system to action-angle variables. The diffeomorphism 

$$(a, A, \theta) \mapsto (y, z) = \Bigl(\psi(a,\theta),\ A/\sigma\bigl(\psi(a,\theta)\bigr)\Bigr)$$

is then also reversibility preserving if $\sigma(u,-v) = \sigma(u,v)$, and it transforms (3.3) to 

$$\dot a = 0, \qquad \dot A = 0, \qquad \dot\theta = \omega(a).$$


If the basic method of the algorithm (IX.6.9) is reversible and if $\sigma(u,-v) = \sigma(u,v)$ 
holds, the modified equation (IX.6.14) is a reversible perturbation of (3.3).
Consequently, Theorem 3.1 yields the statement (3.2) also for integrating step size
controllers. Since $A := z\,\sigma(u,v)$ is an action variable, we have in addition that 

$$|z_n\,\sigma(u_n,v_n) - z_0\,\sigma(u_0,v_0)| \le C\,\varepsilon^2$$

for $t_n \le \varepsilon^{-p}$. Notice that the transformation (2.9) is $O(\varepsilon^p)$-close to the
identity for the variables $a$ and $\theta$, but only $O(\varepsilon^2)$-close for $A$. This result
proves that the integrating step size controller is as robust as the proportional
controller. It also explains the excellent long-time behaviour observed in
Figs. VIII.3.2 and VIII.3.3. 


XI.4 Invariant Tori under Reversible Discretization 

In this section we study the question as to how invariant tori of reversible systems 
are preserved under discretization of the system by reversible numerical methods. 
We give reversible analogues of Theorems X.5.3 and X.6.1. 


XI.4.1 Near-Invariant Tori over Exponentially Long Times 

We consider a reversible system (1.1) which in suitable coordinates takes the
perturbed form (2.14). Under the conditions of the reversible KAM theorem,
Theorem 2.2, this system has an invariant torus carrying a quasi-periodic flow with
frequencies $\omega$ for sufficiently small $\varepsilon$. Consider now a reversible numerical
integrator applied to this system. By the same arguments as in Sect. X.5.2, using the
reversible KAM theorem 2.2 in place of Kolmogorov’s Theorem X.5.1, we obtain the
following analogue of Theorem X.5.3, which states the existence of a torus such that 
numerical solutions starting on this torus remain exponentially close to a
quasi-periodic flow on that torus over exponentially long times in $1/h$. 

Theorem 4.1. In the above situation, for a reversible numerical method of order $p$ 
used with sufficiently small step size $h$, there is a modified reversible system with an 
invariant torus $\widetilde{T}_\omega$ carrying a quasi-periodic flow with frequencies $\omega$,
$O(h^p)$ close to the invariant torus $T_\omega$ of the original reversible system, such that
the difference between any numerical solution $(u_n, v_n)$ starting on the torus
$\widetilde{T}_\omega$ and the solution $(\tilde u(t), \tilde v(t))$ of the modified reversible system with
the same starting values remains exponentially small in $1/h$ over exponentially
long times: 

$$\|(u_n, v_n) - (\tilde u(t), \tilde v(t))\| \le C e^{-\kappa/h} \qquad\text{for}\quad t = nh \le e^{\kappa/h}.$$

The constants $C$ and $\kappa$ are independent of $h$, $\varepsilon$ (for $h$, $\varepsilon$ sufficiently small)
and of the initial value $(u_0, v_0) \in \widetilde{T}_\omega$. □ 

The case of initial values lying close to, but not on $\widetilde{T}_\omega$, can again be treated by a 
reversible analogue of Theorem X.4.7. 

XI.4.2 A KAM Theorem for Reversible Near-Identity Maps 

To obtain truly invariant tori, we need a discrete analogue of the reversible KAM 
theorem, which is derived in this subsection. This result can also be viewed as the 
reversible analogue of Theorem X.6.1. It establishes the existence of invariant tori 
of reversible integrators, but as in the symplectic case, only for a Cantor set of non¬ 
resonant step sizes. 

A map $\Phi : (a,\theta) \mapsto (\hat a, \hat\theta)$ has the invariant torus
$\{a = 0,\ \theta \in \mathbb{T}^n\}$, and reduces on this torus to rotation by $h\omega$ ($h$ a real
parameter and $\omega \in \mathbb{R}^n$), when it is of the form (cf. (2.13)) 

$$\begin{aligned}
\hat a &= a + \tfrac12 h\, a^T K(a,\theta)\, a \\
\hat\theta &= \theta + h\omega + h M(a,\theta)\, a.
\end{aligned} \tag{4.1}$$

Here, $K = [K_1, \ldots, K_m]$ where each $K_i(a,\theta)$ is a symmetric $m \times m$ matrix, and 
$M(a,\theta)$ is an $n \times m$ matrix. The expression in the first equation is again to be 
interpreted as $a^T K_i(a,\theta)\, a$ for the components $i = 1, \ldots, m$. 

A necessary condition for the above map $\Phi$ to be reversible with respect to the 
involution $(a,\theta) \mapsto (a,-\theta)$, cf. Definition V.1.2, is seen to be 

$$\begin{aligned}
K(0,-\theta) &= -K(0,\theta - h\omega) \\
M(0,-\theta) &= M(0,\theta - h\omega).
\end{aligned} \tag{4.2}$$


Consider now a perturbed map 

$$\begin{aligned}
\hat a &= a + \tfrac12 h\, a^T K(a,\theta)\, a + h\varepsilon\, r(a,\theta) \\
\hat\theta &= \theta + h\omega + h M(a,\theta)\, a + h\varepsilon\, \rho(a,\theta)
\end{aligned} \tag{4.3}$$

where $r$ and $\rho$, which like $K$ and $M$ are assumed real-analytic, might depend
analytically also on $h$ and $\varepsilon$. Reversibility of this map implies, by direct computation, 
that in addition to (4.2), the following equations are satisfied up to an error $O(h\varepsilon)$: 

$r(0,-\theta) = -r(0,\theta - h\omega)$ 
dr dr 

= --M (4.4) 

$\rho(0,-\theta) = \rho(0,\theta - h\omega) - h M(0,\theta - h\omega)\, r(0,\theta - h\omega)$. 


Similar to Sect. XI.2.3, we construct a reversibility-preserving near-identity
transformation of coordinates $(a,\theta) \mapsto (b,\varphi)$ such that the above map in the new 
variables is of the form (4.3) with the perturbation terms reduced from $O(\varepsilon)$ to 
$O(\varepsilon^2)$. Similar to Sect. X.6.1, this is possible if $h\omega$ satisfies the diophantine
condition (X.6.3) and if the angular average $\overline{M}_0$ of $M(0,\cdot)$ has rank $n$. 

We look for the transformation in the form (2.15). The functions defining this 
transformation must satisfy the following equations, cf. (2.16): 


s(ip 4- hid) — s(ip) 
h 

a (ip 4- hid) — cr(ip) 
h 

Sjj(ip + hid) - Sij (ip) 
h 


r(0,<p) 

P(0,<p) + M(0,<p)s(<p) 

wd - E 

4 “ ^ ^ (p)Ki,kj ( 0 ; p) • 


(4.5) 


Under the conditions (X.6.3), (X.6.4) these equations can be solved by Fourier
expansion, in the same way as the analogous equations in Sections X.6.1 and XI.2.3, 
and the map in the variables $(b,\varphi)$ becomes of the form 

$$\begin{aligned}
\hat b &= b + \tfrac12 h\, b^T K(b,\varphi)\, b + O(h\varepsilon\|b\|^2) + O(h\varepsilon^2) \\
\hat\varphi &= \varphi + h\omega + h M(b,\varphi)\, b + O(h\varepsilon\|b\|) + O(h\varepsilon^2).
\end{aligned} \tag{4.6}$$

We still need to know that the change of variables $(a,\theta) \mapsto (b,\varphi)$ preserves
reversibility, i.e., that $s$ and $S$ are even functions of $\varphi$ and $\sigma$ is an odd function of 
$\varphi$. This is indeed a consequence of (4.2) and (4.4). (We may modify $r$ and $\rho$ such 
that (4.4) holds exactly, at the expense of introducing additional $O(h^2\varepsilon^2)$
perturbations in (4.3).) Let us show this property for $s$. The Fourier coefficients $s_k$ of
$s$ must satisfy 

$$\frac{e^{ik\cdot h\omega} - 1}{h}\, s_k = r_k.$$

Since (4.4) implies $r_{-k} = -r_k\, e^{-ik\cdot h\omega}$ for all $k$, it follows that $s_{-k} = s_k$, and hence 
$s$ is an even function of $\varphi$. Similarly it is shown that $S$ is even and $\sigma$ is odd. 
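
The evenness argument can be checked numerically in the simplest case $n = 1$. The sketch below is an illustration only: it builds a real trigonometric polynomial $r$ whose coefficients satisfy $r_{-k} = -r_k e^{-ikh\omega}$ (via the hypothetical parametrization $r_k = i c_k e^{ikh\omega/2}$ with real $c_k$), solves the difference equation coefficientwise, and evaluates the result. The function names and the chosen coefficients are assumptions, not part of the text.

```python
import cmath
import math

def r_coefficients(c, h, omega):
    """Fourier coefficients of a real function r with r(-t) = -r(t - h*omega).

    c maps k >= 1 to a real number c_k; we set r_k = i*c_k*exp(i*k*h*omega/2)
    and r_{-k} = conj(r_k), which gives r_{-k} = -r_k*exp(-i*k*h*omega), r_0 = 0.
    """
    coeffs = {0: 0j}
    for k, ck in c.items():
        coeffs[k] = 1j * ck * cmath.exp(1j * k * h * omega / 2)
        coeffs[-k] = -1j * ck * cmath.exp(-1j * k * h * omega / 2)
    return coeffs

def solve_difference_equation(r, h, omega):
    """Solve (s(phi + h*omega) - s(phi)) / h = r(phi) coefficient by coefficient."""
    s = {}
    for k, rk in r.items():
        if k == 0:
            s[k] = 0j            # free constant; solvability needs r_0 = 0
        else:
            s[k] = h * rk / (cmath.exp(1j * k * h * omega) - 1)
    return s

def evaluate(coeffs, phi):
    """Evaluate the trigonometric polynomial sum_k c_k exp(i*k*phi) (real part)."""
    return sum(ck * cmath.exp(1j * k * phi) for k, ck in coeffs.items()).real

if __name__ == "__main__":
    h, omega = 0.1, math.sqrt(2.0)
    r = r_coefficients({1: 1.0, 2: -0.5, 3: 0.25}, h, omega)
    s = solve_difference_equation(r, h, omega)
    for phi in (0.3, 1.1, 2.5):
        print(phi, evaluate(s, phi), evaluate(s, -phi))
```

For each sample angle the two printed values of $s$ should agree up to rounding, in line with $s_{-k} = s_k$. The small divisors $e^{ikh\omega} - 1$ are harmless here only because the sketch uses three modes; controlling them for all $k$ is exactly what the diophantine condition (X.6.3) is for.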

In summary, we have found a transformation $O(\varepsilon)$ close to the identity, which 
transforms the reversible map (4.3) to a reversible map (4.6), thus reducing the
perturbation terms from $O(\varepsilon)$ to $O(\varepsilon^2)$. The iteration of this procedure can again be 
shown to be convergent. This finally yields a transformation to coordinates in terms 
of which the perturbed map is back in the form (4.1). In this way we obtain the
following discrete analogue of Theorem 2.2 or reversible analogue of Theorem X.6.1. 

Theorem 4.2. Consider a real-analytic reversible map $\Phi_{h,\varepsilon}$ of the form (4.3),
defined on a neighbourhood of $\{0\} \times \mathbb{T}^n$, with $0 \in \mathbb{R}^m$. Suppose that $h\omega$
satisfies the diophantine condition (X.6.3), and that the angular average of $M(0,\cdot)$
has rank $n$. Then, there exists $\varepsilon_0 > 0$ such that for every $\varepsilon$ with
$|\varepsilon| < \varepsilon_0$, there is a real-analytic transformation
$\psi_{h,\varepsilon} : (b,\varphi) \mapsto (a,\theta)$, which preserves reversibility and is $O(\varepsilon)$
close to the identity uniformly in $h$ satisfying (X.6.3) and is analytic in $\varepsilon$, such that 
$\psi_{h,\varepsilon}^{-1} \circ \Phi_{h,\varepsilon} \circ \psi_{h,\varepsilon} : (b,\varphi) \mapsto (\hat b, \hat\varphi)$ is again of the form
(4.1): $\hat b = b + O(\|b\|^2)$, $\hat\varphi = \varphi + h\omega + O(\|b\|)$. The perturbed map
$\Phi_{h,\varepsilon}$ therefore has an invariant torus on which it is conjugate to rotation by
$h\omega$. □ 

As in the analogous situation of Sect. X.6.2, Theorem 4.2 applies directly, with 
$\varepsilon = h^p$, to the situation where a reversible numerical method of order $p$ is used 
to discretize an integrable reversible system, or more generally, a reversible
system with a KAM torus with diophantine frequencies $\omega$. Here (4.1) corresponds to 
the time-$h$ flow of the reversible system, and (4.3) represents the numerical map. 
This establishes the existence of invariant tori for reversible integrators, in perfect 
analogy to the symplectic counterpart Theorem X.6.2. 

Concerning condition (X.6.3) we refer back to Sect. X.6.3, where it is shown that 
this condition is satisfied for a Cantor set of step sizes $h$ if $\omega$ satisfies the diophantine 
condition (X.2.4). 


XI.5 Exercises 

1. This exercise shows that reversibility with respect to the particular involution 
$(u,v) \mapsto (u,-v)$ is not as special as it might seem at first glance. 

(a) If the system $\dot y = f(y)$ is $\rho$-reversible (i.e., $f(\rho y) = -\rho f(y)$), then the 
transformed system $\dot z = T^{-1} f(Tz)$ is $\sigma$-reversible with $\sigma = T^{-1}\rho T$. 

(b) Every linear involution ($\rho^2 = I$) is similar to a diagonal matrix with
entries $\pm 1$. 

2. Consider the Toda lattice equations with an arbitrary number n of degrees of 
freedom and with periodic boundary conditions. 

(a) Find all linear involutions $\rho$ for which the system is $\rho$-reversible. 

(b) Study for which $\rho$ the eigenvalues of the matrix $L$ are even functions of $v$. 

(c) Investigate (numerically) the set of initial values for which all the
assumptions of Theorem 1.3 are satisfied for some involution $\rho$. 

Hint. Generalize the discussion for $n = 3$ in Example 1.6. 

3. A reversible system of the form 

$$\dot a = 0, \qquad \dot\theta = \omega(a,\theta)$$

with $\omega$ an even function of $\theta \in \mathbb{T}^n$, also has a foliation of invariant tori.
Consider reversible perturbations of such systems like in (2.1) and search for a 
reversibility-preserving transformation (2.3) that takes the perturbed system to 
the form 

$$\dot b = O(\varepsilon^2), \qquad \dot\varphi = \omega(b,\varphi) + \varepsilon\mu(b,\varphi) + O(\varepsilon^2)$$

with $\mu$ even in $\varphi$. Write down the partial differential equations that the
transformation must satisfy and discuss (sufficient) conditions for their solvability. 

4. The torus $\{a = 0,\ \theta \in \mathbb{T}^n\}$ is invariant and carries a conditionally periodic 
flow with frequencies $\omega$ for reversible systems of the form $\dot a = O(\|a\|)$,
$\dot\theta = \omega + O(\|a\|)$, which is more general than (2.13) in the differential equation for $a$. 
Discuss the difficulties that arise in trying to transform a reversible perturbation 
of such a system back to this form. 

5. Apply an arbitrary (non-symmetric) Runge-Kutta method of even order $p = 2k$ 
to an integrable reversible system. Prove that under the assumptions of
Theorem 3.1 the global error behaves for $t = nh$ like 

$$y_n - y(t) = O(t h^p) + O(t^2 h^{p+1}),$$

and the action variables like 

$$I(y_n) - I(y_0) = O(h^p) + O(t h^{p+1}).$$



Chapter XII. 

Dissipatively Perturbed Hamiltonian 
and Reversible Systems 


Symplectic integrators also show a favourable long-time behaviour when they are 
applied to non-Hamiltonian perturbations of Hamiltonian systems. The same is true 
for symmetric methods applied to non-reversible perturbations of reversible sys¬ 
tems. In this chapter we study the behaviour of numerical integrators when they are 
applied to dissipative perturbations of integrable systems, where only one invariant 
torus persists under the perturbation and becomes weakly attractive. The simplest 
example of such a system is Van der Pol’s equation with small parameter, which has 
a single limit cycle in contrast to the infinitely many periodic orbits of the unper¬ 
turbed harmonic oscillator. 


XII.1 Numerical Experiments with Van der Pol’s 
Equation 


One of the first such methods is the method of Van-der-Pol. [...] It should, 
however, be noted that in the formulation given by Van-der-Pol, approxi¬ 
mation was effected by simple intuitive reasonings. 

(N.N. Bogoliubov & Y.A. Mitropolski 1961, p. 10f.) 


Consider Van der Pol’s equation 

$$\begin{aligned}
\dot p &= -q + \varepsilon(1 - q^2)\,p \\
\dot q &= p
\end{aligned} \tag{1.1}$$

with small positive $\varepsilon$, which is a perturbation of the harmonic oscillator. A
symplectic change to polar coordinates $p = \sqrt{2a}\cos\theta$, $q = \sqrt{2a}\sin\theta$ puts the
system into the form 

$$\begin{aligned}
\dot a &= \varepsilon\, 2a\cos^2\theta\,(1 - 2a\sin^2\theta) \\
\dot\theta &= 1 - \varepsilon\cos\theta\sin\theta\,(1 - 2a\sin^2\theta).
\end{aligned}$$

Since the angle $\theta$ evolves much faster than $a$, we may expect that the averaged 
system, which replaces the right-hand side functions by their angular averages, gives 
a good approximation: 

$$\dot a = \varepsilon a\,(1 - \tfrac12 a), \qquad \dot\theta = 1.$$

Approximating by the averaged equation is the “method of Van-der-Pol” cited 
above, and the belief in the long-time validity of such an approximation is the
averaging principle. The averaged differential equation for $a$ has an unstable equilibrium 
at zero, and an asymptotically stable equilibrium at $a^* = 2$. The averaged system 
therefore has the circle $\{a^* = 2,\ \theta \in \mathbb{R} \bmod 2\pi\}$ as an attractive limit cycle. This 
suggests that the original Van der Pol equation has a nearby limit cycle, which is 
indeed the case. 
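
The averaged right-hand side can be confirmed numerically. The following sketch (function names are illustrative) computes the angular averages of the two perturbation functions with an equispaced quadrature rule, which is exact for trigonometric polynomials of this degree, and compares them with $a(1 - \tfrac12 a)$ and $0$.

```python
import math

def r(a, theta):
    """Coefficient of eps in the a-equation of Van der Pol in polar form."""
    return 2 * a * math.cos(theta) ** 2 * (1 - 2 * a * math.sin(theta) ** 2)

def rho(a, theta):
    """Coefficient of eps in the theta-equation."""
    return -math.cos(theta) * math.sin(theta) * (1 - 2 * a * math.sin(theta) ** 2)

def angular_average(func, n=64):
    """Mean over theta in [0, 2*pi) using n equispaced points."""
    return sum(func(2 * math.pi * j / n) for j in range(n)) / n

def averaged_rhs(a):
    """Claimed angular average of r: a * (1 - a/2)."""
    return a * (1 - a / 2)

if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 3.0):
        r_bar = angular_average(lambda th: r(a, th))
        rho_bar = angular_average(lambda th: rho(a, th))
        print(a, r_bar, averaged_rhs(a), rho_bar)
```

The average of the angle perturbation vanishes, which is why the averaged system has $\dot\theta = 1$, and the average of $r$ changes sign at $a = 2$, the stable equilibrium.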

Following the numerical experiment of Hairer & Lubich (1999), we solve the 
equation (1.1) with two initial values, $(p_0, q_0) = (0, 1.3)$ and $(p_0, q_0) = (0, 2.7)$, 
and with three numerical methods: the non-symplectic explicit and implicit Euler 
methods, and the symplectic Euler method. All of them have order 1. The numerical 
results are displayed in Fig. 1.1. For large step sizes (compared to the perturbation 
parameter $\varepsilon$), the non-symplectic methods give a completely wrong numerical
solution, whereas that of the symplectic method is qualitatively correct. For smaller 
step sizes, the numerical solutions of the non-symplectic methods also show a limit 
cycle. 
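
A minimal sketch of such an experiment is given below. It is not the code behind Fig. 1.1: the step size $h = 0.05$, the integration length and the diagnostic (mean distance from the origin over roughly the last revolution) are assumptions chosen for illustration, and only the explicit and the symplectic Euler methods are compared.

```python
import math

EPS = 0.05  # perturbation parameter, as in Fig. 1.1

def explicit_euler_step(p, q, h, eps=EPS):
    """Explicit Euler step for p' = -q + eps*(1 - q^2)*p, q' = p."""
    return p + h * (-q + eps * (1 - q * q) * p), q + h * p

def symplectic_euler_step(p, q, h, eps=EPS):
    """Symplectic Euler step: implicit in p, explicit in q.

    p1 = p + h*(-q + eps*(1 - q^2)*p1) is linear in p1 and solved directly.
    """
    p_new = (p - h * q) / (1 - h * eps * (1 - q * q))
    q_new = q + h * p_new
    return p_new, q_new

def mean_radius(step, h, p0, q0, t_end=300.0):
    """Mean of sqrt(p^2 + q^2) over (roughly) the last revolution."""
    n = int(round(t_end / h))
    n_avg = int(round(2 * math.pi / h))
    p, q, total = p0, q0, 0.0
    for i in range(n):
        p, q = step(p, q, h)
        if i >= n - n_avg:
            total += math.hypot(p, q)
    return total / n_avg

if __name__ == "__main__":
    h = 0.05
    for q0 in (1.3, 2.7):
        print("q0 =", q0,
              "explicit:", mean_radius(explicit_euler_step, h, 0.0, q0),
              "symplectic:", mean_radius(symplectic_euler_step, h, 0.0, q0))
```

With $h$ of the size of $\varepsilon$, the symplectic Euler solution should settle near the circle of radius 2 from both initial values, whereas the explicit Euler solution settles on a visibly larger closed curve.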

For the moment we explain these observations by “simple intuitive reasonings”, 
that is, by the averaging principle and formal backward error analysis. The rigorous 
treatment is developed in the course of this chapter in a more general framework of 
perturbed integrable systems. 

For a differential equation 

$$\dot y = f(y) + \varepsilon g(y),$$

the numerical solution $y_n$ obtained by the explicit Euler method is the (formally) 
exact solution of a modified differential equation 

$$\dot y = f(y) + \varepsilon g(y) - \tfrac12 h f'(y) f(y) + O(h^2 + \varepsilon h).$$

For the Van der Pol equation in the above coordinates, the averaged modified
equation becomes 

$$\dot a = h a + \varepsilon a\,(1 - \tfrac12 a) + \ldots$$

which has approximately $a = 2 + 2h/\varepsilon$ as an equilibrium. Hence, the limit
cycle of the numerical solution of the explicit Euler method has approximate radius 
$2\sqrt{1 + h/\varepsilon}$ (Fig. 1.1) which is far from the correct value unless $h \ll \varepsilon$. 

The implicit Euler discretization is adjoint to the explicit Euler method.
Therefore, its modified differential equation is as above with $h$ replaced by $-h$. In this 
case, the radius of the limit cycle is approximately $2\sqrt{1 - h/\varepsilon}$ (for $h < \varepsilon$), which 
again agrees very well with the pictures of Fig. 1.1. 
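
The two predicted radii can be compared with a direct simulation. In the sketch below the step size $h = 0.01$, the integration length and the fixed-point solver for the implicit Euler equations are assumptions made for illustration; the measured quantity is the mean distance from the origin over roughly the last revolution.

```python
import math

EPS = 0.05  # perturbation parameter

def vdp(p, q, eps=EPS):
    """Right-hand side of (1.1): p' = -q + eps*(1 - q^2)*p, q' = p."""
    return -q + eps * (1 - q * q) * p, p

def explicit_euler_step(p, q, h):
    dp, dq = vdp(p, q)
    return p + h * dp, q + h * dq

def implicit_euler_step(p, q, h, tol=1e-14, max_iter=100):
    """Implicit Euler step, solved by fixed-point iteration (fine for small h)."""
    p1, q1 = p, q
    for _ in range(max_iter):
        dp, dq = vdp(p1, q1)
        p_new, q_new = p + h * dp, q + h * dq
        done = abs(p_new - p1) + abs(q_new - q1) < tol
        p1, q1 = p_new, q_new
        if done:
            break
    return p1, q1

def mean_radius(step, h, t_end=400.0, p0=0.0, q0=1.3):
    """Mean of sqrt(p^2 + q^2) over (roughly) the last revolution."""
    n = int(round(t_end / h))
    n_avg = int(round(2 * math.pi / h))
    p, q, total = p0, q0, 0.0
    for i in range(n):
        p, q = step(p, q, h)
        if i >= n - n_avg:
            total += math.hypot(p, q)
    return total / n_avg

if __name__ == "__main__":
    h = 0.01
    print("explicit:", mean_radius(explicit_euler_step, h),
          "predicted:", 2 * math.sqrt(1 + h / EPS))
    print("implicit:", mean_radius(implicit_euler_step, h),
          "predicted:", 2 * math.sqrt(1 - h / EPS))
```

Since the predictions come from a truncated, averaged modified equation, only approximate agreement should be expected.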

Fig. 1.1. Numerical experiments with Van der Pol’s equation (1.1), $\varepsilon = 0.05$ 

For the symplectic Euler method, the modified differential equation for Van der 
Pol’s equation is 

$$\begin{aligned}
\dot p &= -q + \varepsilon(1 - q^2)\,p + \tfrac12 h p + O(h^2 + \varepsilon h) \\
\dot q &= p - \tfrac12 h q + O(h^2 + \varepsilon h).
\end{aligned}$$

Here, the modified differential equation for the unperturbed harmonic oscillator is 
Hamiltonian (Theorem IX.3.1), and so all $\varepsilon$-independent terms in the averaged
modified equation vanish: 

$$\frac{1}{2\pi}\int_0^{2\pi} \frac{\partial \widetilde H}{\partial\theta}(a,\theta)\, d\theta = 0.$$

Therefore, the radius of the limit cycle is of size $2 + O(h)$ in accordance with 
Fig. 1.1. 


XII.2 Averaging Transformations 

The problem of nonlinear oscillations is currently of great importance in
the most diverse fields of engineering and physics. Among the analytical
methods for studying nonlinear oscillations, the asymptotic method of
series expansion with respect to a small parameter is particularly
effective. A whole series of monographs published in 1930-1938 by
N. Krylov and N. Bogolioubov, in Russian as well as in French, was devoted
to this question; unfortunately these works have today become
bibliographic rarities. Moreover, the methods presented have been
considerably developed since. 

(N. Bogolioubov & I. Mitropolski 1962, preface to the French translation) 

In this section we consider rather general perturbations of integrable systems. We 
study transformations that eliminate the dependence on the angles in the pertur¬ 
bation functions, up to arbitrary powers of the small perturbation parameter. The 
construction and properties of these “averaging ” transformations are obtained by a 
slight extension of the arguments in Sections X.2 and XI.2. 

XII.2.1 The Basic Scheme of Averaging 

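As in Sections X.2.1 and XI.2.1, we consider perturbations of an integrable system 
written in action-angle variables: 

$$\begin{aligned}
\dot a &= \varepsilon\, r(a,\theta) \\
\dot\theta &= \omega(a) + \varepsilon\, \rho(a,\theta)
\end{aligned} \tag{2.1}$$

where $\varepsilon$ is a small parameter and $r$, $\rho$ are real-analytic in a neighbourhood of
$\{a^*\} \times \mathbb{T}^d$. Unlike the situation of the previous chapters, we do not impose
conditions that make the angular average 

$$\bar r(a) = \frac{1}{(2\pi)^d}\int_{\mathbb{T}^d} r(a,\theta)\, d\theta \tag{2.2}$$

vanish identically. We look for a transformation to new variables $(b,\varphi)$, of the form 

$$\begin{aligned}
a &= b + \varepsilon\, s(b,\varphi) \\
\theta &= \varphi + \varepsilon\, \sigma(b,\varphi),
\end{aligned} \tag{2.3}$$

which eliminates the dependence on the angles in the $O(\varepsilon)$ terms of (2.1): 

$$\begin{aligned}
\dot b &= \varepsilon\, m(b) + O(\varepsilon^2) \\
\dot\varphi &= \omega(b) + \varepsilon\, \mu(b) + O(\varepsilon^2).
\end{aligned} \tag{2.4}$$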

This is just a minor modification of the problem in Sect. XI.2.1. The equations that 
$s$ and $\sigma$ must satisfy differ from (XI.2.5) and (XI.2.6) only in that the right-hand 
side $r(b,\varphi)$ of (XI.2.5) is replaced by $r(b,\varphi) - m(b)$, viz., 

$$\frac{\partial s}{\partial\varphi}(b,\varphi)\,\omega(b) = r(b,\varphi) - m(b) \tag{2.5}$$

$$\frac{\partial\sigma}{\partial\varphi}(b,\varphi)\,\omega(b) = \rho(b,\varphi) + \omega'(b)\, s(b,\varphi) - \mu(b). \tag{2.6}$$

Necessary conditions for solvability are now 

$$m(b) = \bar r(b), \qquad \mu(b) = \bar\rho(b), \tag{2.7}$$

where the second equation corresponds to the choice $\bar s(b) = 0$. In other words, the 
leading terms in (2.4) are the angular averages of the perturbations in (2.1). 

The equations (2.5), (2.6) are solvable for $b = b^*$ if $\omega(b^*)$ satisfies the
diophantine condition (X.2.4). The “ultraviolet cutoff” argument of the proof of
Lemma X.2.1 then shows that (2.4) holds uniformly as long as the solution remains in 
the ball $\|b - b^*\| \le c\,|\log\varepsilon|^{-\nu-1}$, with a sufficiently small constant $c$. This may hold 
over a very long time interval if the equation $\dot b = \varepsilon m(b)$ has a stable equilibrium in 
that ball. 
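
For Van der Pol's equation of Sect. XII.1 the first averaging step can be written down in closed form, since there $d = 1$ and $\omega \equiv 1$. The sketch below (function names are illustrative) uses the expansion $r(b,\varphi) = b(1 - \tfrac12 b) + b\cos 2\varphi + \tfrac12 b^2\cos 4\varphi$, takes $m = \bar r$, and checks that the zero-mean function $s(b,\varphi) = \tfrac12 b\sin 2\varphi + \tfrac18 b^2\sin 4\varphi$ satisfies (2.5).

```python
import math

OMEGA = 1.0  # frequency of the unperturbed harmonic oscillator

def r(b, phi):
    """Perturbation of the action equation for Van der Pol in polar form."""
    return 2 * b * math.cos(phi) ** 2 * (1 - 2 * b * math.sin(phi) ** 2)

def m(b):
    """Angular average of r, cf. (2.7)."""
    return b * (1 - b / 2)

def s(b, phi):
    """Zero-mean solution of (2.5)."""
    return 0.5 * b * math.sin(2 * phi) + 0.125 * b * b * math.sin(4 * phi)

def ds_dphi(b, phi):
    """Derivative of s with respect to phi."""
    return b * math.cos(2 * phi) + 0.5 * b * b * math.cos(4 * phi)

def residual_2_5(b, phi):
    """Left-hand side minus right-hand side of (2.5)."""
    return ds_dphi(b, phi) * OMEGA - (r(b, phi) - m(b))

if __name__ == "__main__":
    worst = max(abs(residual_2_5(b, 2 * math.pi * j / 100))
                for b in (0.5, 1.0, 2.0, 3.0) for j in range(100))
    print("largest residual of (2.5):", worst)
```

With a single constant frequency no small divisors occur, so this example shows the mechanics of (2.5) but none of the difficulties that the diophantine condition addresses.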


XII.2.2 Perturbation Series 


As in Sections X.2.2 and XI.2.2, the above construction extends to arbitrary finite 
order in $\varepsilon$. A transformation of the form (XI.2.9), which eliminates the angles in all 
terms up to order $\varepsilon^{N-1}$, is sought for: 

$$\begin{aligned}
\dot b &= \varepsilon m_1(b) + \varepsilon^2 m_2(b) + \ldots + \varepsilon^{N-1} m_{N-1}(b) + \varepsilon^N r_N(b,\varphi) \\
\dot\varphi &= \omega(b) + \varepsilon\mu_1(b) + \varepsilon^2\mu_2(b) + \ldots + \varepsilon^{N-1}\mu_{N-1}(b) + \varepsilon^N \rho_N(b,\varphi).
\end{aligned} \tag{2.8}$$

The equations determining the transformation are a slight modification of (XI.2.11) 
and (XI.2.12): on the right-hand side of (XI.2.11), $p_j(b,\varphi)$ is replaced by the
difference $p_j(b,\varphi) - m_j(b)$, with $m_j(b) = \bar p_j(b)$. We then have the following variant of 
Lemmas X.2.1 and XI.2.1. 


Lemma 2.1. Let the right-hand side functions of (2.1) be real-analytic in a
neighbourhood of $\{b^*\} \times \mathbb{T}^d$. Suppose that $\omega(b^*)$ satisfies the diophantine condition 
(X.2.4) with exponent $\nu$. For any fixed $N \ge 2$, there are positive constants
$\varepsilon_0, c, C$ such that the following holds for $|\varepsilon| \le \varepsilon_0$: there exists a real-analytic
change of coordinates $(a,\theta) \mapsto (b,\varphi)$ which transforms (2.1) to (2.8) with 

$$\|r_N(b,\varphi)\| \le C/\delta^{N-1}, \qquad \|\rho_N(b,\varphi)\| \le C/\delta^{N-1} \qquad\text{for}\quad \|b - b^*\| \le \delta,$$

where 

$$\delta = c\,|\log\varepsilon|^{-\nu-1}. \tag{2.9}$$

Moreover, the transformation is $O(\varepsilon)$-close to the identity:
$\|(a,\theta) - (b,\varphi)\| \le C\varepsilon$ holds for $(a,\theta)$ and $(b,\varphi)$ related by the above coordinate
transform, for $\|b - b^*\| \le \delta$ and for $\varphi$ in an $\varepsilon$-independent complex neighbourhood
of $\mathbb{T}^d$. 

The constants $\varepsilon_0, c, C$ depend on $N$, $d$, $\gamma$, $\nu$ and on bounds of $\omega$, $r$, $\rho$ on a
complex neighbourhood of $\{b^*\} \times \mathbb{T}^d$. 

Proof. The proof uses again the ultraviolet cutoff argument of the proof of
Lemma X.2.1. This makes all the functions $s_i$, $\sigma_i$, $m_i$, $\mu_i$ real-analytic in $b$ for
$\|b - b^*\| \le 2\delta$ and of $\varphi$ in an $\varepsilon$-independent complex neighbourhood of $\mathbb{T}^d$. The
powers of $\delta$ in the denominators of the estimates come from the presence of terms
$\partial s_j/\partial b$, $\partial\sigma_j/\partial b$ in $p_i(b,\varphi)$ and $\pi_i(b,\varphi)$ of (XI.2.11) and (XI.2.12) and from
Cauchy’s estimates applied to $s_j$, $\sigma_j$ on $\|b - b^*\| \le 2\delta$. □ 


XII.3 Attractive Invariant Manifolds 

Theorems on invariant manifolds for maps have been proved many times 
for many different settings. The first results were obtained by Hadamard 
(1901) and Perron (1929). [...] Our aim was to derive a global invariant 
manifold result with conditions that are easy to verify for the applications 
in mind. (K. Nipp & D. Stoffer 1992) 

In this section we give results on the existence and properties of attractive invari¬ 
ant manifolds of maps, with a very explicit handling of constants. These results are 
due to Kirchgraber, Lasagni, Nipp & Stoffer (1991) and Nipp & Stoffer (1992). 
They will allow us to understand the weakly attractive closed curves that we observed 
in Sect. XII.1. Beyond that particular example, these results are extremely 
useful for studying the long-time behaviour of numerical discretizations in a great 
variety of applications; see Nipp & Stoffer (1995, 1996) and Lubich (2001) and 
references therein, and also Stuart & Humphries (1996) for a related invariant man¬ 
ifold theorem and its use in analyzing the dynamics of numerical integrators for 
non-conservative problems. 

Consider a map $\Phi : X \times Y \to X \times Y$ defined on the Cartesian product of a 
Banach space $X$ and a closed bounded subset $Y$ of another Banach space. We write 
$\Phi(x,y) = (\hat x, \hat y)$ with 

$$\begin{aligned}
\hat x &= x + f(x,y) \\
\hat y &= g(x,y).
\end{aligned} \tag{3.1}$$

We assume that $f$ and $g$ are Lipschitz bounded, with Lipschitz constants $L_{xx}$, $L_{xy}$ 
and $L_{yx}$, $L_{yy}$ with respect to $x$, $y$. If these Lipschitz constants are sufficiently small, 
then the map has an attractive invariant manifold. More precisely, there is the
following result, stated without proof by Kirchgraber, Lasagni, Nipp & Stoffer (1991) 
and proved in a more general setting by Nipp & Stoffer (1992). 


Theorem 3.1. In the above situation, if 

$$L_{xx} + L_{yy} + 2\sqrt{L_{xy}L_{yx}} < 1, \tag{3.2}$$

then there exists a function $s : X \to Y$, which is Lipschitz bounded with the constant 
$\lambda = 2L_{yx}/(1 - L_{xx} - L_{yy})$, such that 

$M = \{(x, s(x)) : x \in X\}$ is invariant under $\Phi$. 

$M$ attracts orbits of $\Phi$ with the attractivity factor $\rho = \lambda L_{xy} + L_{yy} < 1$, that is, 
$\|\hat y - s(\hat x)\| \le \rho\,\|y - s(x)\|$ holds for all $(x,y) \in X \times Y$. 
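
The constants of Theorem 3.1 are easy to evaluate, and for a linear scalar map the whole statement can be checked by hand. The sketch below is an illustration under simplifying assumptions: it uses the map $\hat x = x + \alpha x + \beta y$, $\hat y = \gamma x + \delta y$ on $\mathbb{R} \times \mathbb{R}$ (so the boundedness of $Y$ required in the theorem is ignored), for which the Lipschitz constants are $|\alpha|, |\beta|, |\gamma|, |\delta|$ and the graph transform acts on slopes. All names and the sample coefficients are hypothetical.

```python
import math

def manifold_constants(Lxx, Lxy, Lyx, Lyy):
    """Return (lambda, rho) of Theorem 3.1, after checking condition (3.2)."""
    if not Lxx + Lyy + 2 * math.sqrt(Lxy * Lyx) < 1:
        raise ValueError("condition (3.2) is violated")
    lam = 2 * Lyx / (1 - Lxx - Lyy)
    rho = lam * Lxy + Lyy
    return lam, rho

def phi(x, y, al, be, ga, de):
    """Linear map of the form (3.1): x_hat = x + al*x + be*y, y_hat = ga*x + de*y."""
    return x + al * x + be * y, ga * x + de * y

def graph_transform_slope(sigma, al, be, ga, de):
    """Image slope of the line y = sigma*x under the map (Hadamard graph transform)."""
    return (ga + de * sigma) / (1 + al + be * sigma)

def invariant_slope(al, be, ga, de, n_iter=200):
    """Fixed point of the graph transform, obtained by simple iteration from 0."""
    sigma = 0.0
    for _ in range(n_iter):
        sigma = graph_transform_slope(sigma, al, be, ga, de)
    return sigma

if __name__ == "__main__":
    al, be, ga, de = 0.1, 0.2, 0.1, 0.3
    lam, rho = manifold_constants(abs(al), abs(be), abs(ga), abs(de))
    s = invariant_slope(al, be, ga, de)
    x, y = 0.4, 1.0
    xh, yh = phi(x, y, al, be, ga, de)
    print("lambda =", lam, "rho =", rho, "slope =", s)
    print("contraction observed:", abs(yh - s * xh) / abs(y - s * x))
```

In this example the invariant line $y = sx$ has $|s| \le \lambda$, and the distance to it shrinks in one step by a factor no larger than $\rho$, as the theorem asserts.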

Proof. (a) We search for a function $s : X \to Y$ such that for $(\hat x, \hat y) = \Phi(x,y)$, the 
relation $y = s(x)$ implies also $\hat y = s(\hat x)$. For an arbitrary function $\sigma : X \to Y$, 
we first study which relation holds between $\hat x$ and $\hat y$ if $y = \sigma(x)$. To write $\hat y$ as 
a function of $\hat x$, we need a bijective correspondence between $x$ and $\hat x$ via the first 
equation of (3.1). By the Banach fixed-point theorem, the equation 

$$\hat x = x + f(x, \sigma(x)) \qquad\text{has a unique solution}\qquad x = u_\sigma(\hat x)$$

for every $\hat x \in X$ if $x \mapsto f(x, \sigma(x))$ is a contraction. This is the case if $\sigma$ has the 
Lipschitz constant $\lambda$ and 

$$L_{xx} + L_{xy}\lambda < 1. \tag{3.3}$$

We then obtain $\hat y = \hat\sigma(\hat x)$ from the following scheme: 

$$\hat x \ \longmapsto\ x = u_\sigma(\hat x) \ \longmapsto\ y = \sigma(x) \ \longmapsto\ \hat y = g(x,y).$$

That is, we set $\hat y = \hat\sigma(\hat x) = g\bigl(u_\sigma(\hat x), \sigma(u_\sigma(\hat x))\bigr)$. By construction,
$(\hat x, \hat y) = \Phi(x,y)$. 

Under condition (3.3), the function $u_\sigma : X \to X$ is Lipschitz bounded by
$\mu = 1/(1 - L_{xx} - L_{xy}\lambda)$. Consequently, the function $\hat\sigma : X \to Y$ is Lipschitz bounded 
by $(L_{yx} + L_{yy}\lambda)\mu$. The condition that the transformed function $\hat\sigma$ is again Lipschitz 
bounded by the same $\lambda$ as $\sigma$, therefore reads 

$$\frac{L_{yx} + L_{yy}\lambda}{1 - L_{xx} - L_{xy}\lambda} \le \lambda \tag{3.4}$$

or equivalently, 

$$L_{xy}\lambda^2 - (1 - L_{xx} - L_{yy})\lambda + L_{yx} \le 0.$$

Under condition (3.2), there exists a non-empty real interval of values $\lambda$ satisfying 
this quadratic inequality. In particular, (3.4) then holds for 

$$\lambda = \frac{2L_{yx}}{1 - L_{xx} - L_{yy}}. \tag{3.5}$$

(This is close to the smallest possible value of $\lambda$ if $2\sqrt{L_{xy}L_{yx}} \approx 1 - L_{xx} - L_{yy}$.) 
It is easily checked that (3.2) and (3.5) imply (3.3). 

Under conditions (3.3) and (3.4), the transformation $H : \sigma \mapsto \hat\sigma$, which is called 
a Hadamard graph transform, maps the set of functions 

$$S = \{\sigma : X \to Y \mid \sigma \text{ is Lipschitz bounded by } \lambda\}$$

into itself, i.e., 

$$H : S \to S : \sigma \mapsto \hat\sigma.$$

$S$ is a closed subset of $C(X,Y)$, the Banach space of continuous functions from 
$X$ to the bounded closed set $Y$, equipped with the supremum norm
$\|\sigma\|_\infty = \sup_{x\in X}\|\sigma(x)\|$. If $H$ is a contraction, then the Banach fixed-point theorem tells 
us that there is a unique function $s \in S$ with $\hat s = s$. By construction, this 
means that if $(\hat x, \hat y) = \Phi(x,y)$ and $y = s(x)$, then also $\hat y = s(\hat x)$. The graph 
$M = \{(x, s(x)) : x \in X\}$ is then an invariant manifold for the map $\Phi$. 

(b) We now show that $H$ is already a contraction under condition (3.2). Let 
$\sigma_0, \sigma_1$ be two arbitrary functions in $S$, and $\hat x \in X$. With $x_i = u_{\sigma_i}(\hat x)$, 

$$\begin{aligned}
\|H\sigma_1(\hat x) - H\sigma_0(\hat x)\| &= \|g(x_1, \sigma_1(x_1)) - g(x_0, \sigma_0(x_0))\| \\
&\le \|g(x_1, \sigma_1(x_1)) - g(x_1, \sigma_0(x_1))\| + \|g(x_1, \sigma_0(x_1)) - g(x_0, \sigma_0(x_0))\| \\
&\le L_{yy}\,\|\sigma_1 - \sigma_0\|_\infty + (L_{yx} + L_{yy}\lambda)\,\|x_1 - x_0\|.
\end{aligned}$$

By definition, $\hat x = x_i + f(x_i, \sigma_i(x_i))$ for $i = 0, 1$. Subtracting these two equations 
yields similarly 

$$\begin{aligned}
\|x_1 - x_0\| &\le \|f(x_1, \sigma_1(x_1)) - f(x_0, \sigma_0(x_0))\| \\
&\le \|f(x_1, \sigma_1(x_1)) - f(x_1, \sigma_0(x_1))\| + \|f(x_1, \sigma_0(x_1)) - f(x_0, \sigma_0(x_0))\| \\
&\le L_{xy}\,\|\sigma_1 - \sigma_0\|_\infty + (L_{xx} + L_{xy}\lambda)\,\|x_1 - x_0\|.
\end{aligned}$$

Hence, 

$$\|x_1 - x_0\| \le \frac{L_{xy}}{1 - L_{xx} - L_{xy}\lambda}\,\|\sigma_1 - \sigma_0\|_\infty.$$

Combining both inequalities and recalling (3.4), we obtain 

$$\|H\sigma_1 - H\sigma_0\|_\infty \le (L_{yy} + \lambda L_{xy})\,\|\sigma_1 - \sigma_0\|_\infty.$$

Since the inequality 

$$L_{yy} + \lambda L_{xy} < 1 \tag{3.6}$$

is satisfied by the $\lambda$ of (3.5) under condition (3.2), $H$ is indeed a contraction. 

(c) It remains to show that the invariant manifold $M$ is attractive. With
$(\hat x, \hat y) = \Phi(x,y)$, we write 

$$\begin{aligned}
\hat y - s(\hat x) &= g(x,y) - s(x + f(x,y)) \\
&= \bigl(g(x,y) - g(x, s(x))\bigr) + \bigl(s(x + f(x, s(x))) - s(x + f(x,y))\bigr).
\end{aligned}$$

Here we used the identity 

$$s(x + f(x, s(x))) = \hat s(x + f(x, s(x))) = g(x, s(x)),$$

which holds because $\hat s = s$ and by construction of the Hadamard transform. It 
follows that 

$$\|\hat y - s(\hat x)\| \le (L_{yy} + \lambda L_{xy})\,\|y - s(x)\|,$$

which together with (3.6) yields the result. □ 

Next we study the effect of a perturbation of the map on the invariant manifold. 

Theorem 3.2. Consider maps $\Phi_0, \Phi_1$, both of which satisfy the
conditions of Theorem 3.1 with the same Lipschitz constants $L_{xx}, L_{xy}, L_{yx}, L_{yy}$.
Let $s_0$ and $s_1$ be the functions defining the attractive invariant manifolds $M_0$ and
$M_1$, respectively. If the bound

$$ \|\Phi_1(x,y) - \Phi_0(x,y)\| \le \delta \quad\text{for } (x,y) \in M_0 $$

holds in the norm $\|(x,y)\| = \lambda\|x\| + \|y\|$ on $X \times Y$, then

$$ \|s_1(x) - s_0(x)\| \le \frac{\delta}{1-\rho} \quad\text{for } x \in X . $$

(Here $\lambda$ and $\rho$ are defined as in Theorem 3.1.)

Proof. The proof is similar to part (b) of the previous proof. Let $x \in X$. For $i = 0,1$,
we have $s_i(x) = g_i(x_i, s_i(x_i))$ with $x_i$ defined by the equation
$x = x_i + f_i(x_i, s_i(x_i))$. We estimate

$$
\begin{aligned}
\|s_1(x) - s_0(x)\| &\le \|g_1(x_1,s_1(x_1)) - g_1(x_1,s_0(x_1))\| \\
&\quad + \|g_1(x_1,s_0(x_1)) - g_1(x_0,s_0(x_0))\| \\
&\quad + \|g_1(x_0,s_0(x_0)) - g_0(x_0,s_0(x_0))\| \\
&\le L_{yy}\,\|s_1 - s_0\|_\infty + (L_{yx} + L_{yy}\lambda)\,\|x_1 - x_0\| \\
&\quad + \|g_1(x_0,s_0(x_0)) - g_0(x_0,s_0(x_0))\|
\end{aligned}
$$

and in the same way

$$
\begin{aligned}
\|x_1 - x_0\| &\le \|f_1(x_1,s_1(x_1)) - f_1(x_1,s_0(x_1))\| \\
&\quad + \|f_1(x_1,s_0(x_1)) - f_1(x_0,s_0(x_0))\| \\
&\quad + \|f_1(x_0,s_0(x_0)) - f_0(x_0,s_0(x_0))\| \\
&\le L_{xy}\,\|s_1 - s_0\|_\infty + (L_{xx} + L_{xy}\lambda)\,\|x_1 - x_0\| \\
&\quad + \|f_1(x_0,s_0(x_0)) - f_0(x_0,s_0(x_0))\| .
\end{aligned}
$$

Inserting the second bound into the first one and using (3.4) and the assumed bound
on $\Phi_1 - \Phi_0$ gives

$$ \|s_1 - s_0\|_\infty \le (L_{yy} + \lambda L_{xy})\,\|s_1 - s_0\|_\infty + \delta , $$


which implies the result. □

XII.4 Weakly Attractive Invariant Tori of Perturbed
Integrable Systems

We assume that the perturbation is dissipative such that one torus persists
under the perturbation and gets attractive.

Our analysis is done by the method of averaging. The problem of this section
is classical, see e.g. Bogoliubov & Mitropolski (1961), Kirchgraber
& Stiefel (1978). (D. Stoffer 1998)

In the example of the Van der Pol equation, we have seen that only one of the periodic
orbits of the harmonic oscillator persists under the small nonlinear perturbation
and becomes an attractive limit cycle. More generally, we consider perturbations of
integrable systems

$$
\begin{aligned}
\dot a &= \varepsilon\, r(a,\theta) \\
\dot\theta &= \omega(a) + \varepsilon\, \rho(a,\theta)
\end{aligned}
\qquad (4.1)
$$

where (locally) just one invariant torus survives the perturbation and attracts nearby
solutions. Using the results of the two previous sections, it will be shown that this
situation occurs if, at some point $a^*$ where the frequencies $\omega_i(a^*)$ are diophantine,
the angular average $\bar r(a^*)$ is small and its Jacobian matrix

$$ A = \bar r\,'(a^*) $$

has all eigenvalues with negative real part.

The following theorem is a slight modification of a result of Stoffer (1998). Early 
versions of it are much older; see the citations above. The origins of the problem can 
be traced back to the work of Van der Pol (1927) and Krylov & Bogoliubov (1934). 

Here we assume the following: $\omega(a^*)$ satisfies the diophantine condition (X.2.4)
with exponent $\nu$. The perturbation functions $r(a,\theta)$ and $\rho(a,\theta)$ are real-analytic
on a fixed complex neighbourhood of $\{a^*\} \times \mathbb{T}^d$ and bounded independently of $\varepsilon$
(though they may depend on $\varepsilon$). In some norm $\|\cdot\|$ on $\mathbb{R}^d$ and its induced matrix
norm, the bounds

$$ \|\bar r(a^*)\| \le C\,|\log\varepsilon|^{-2(\nu+1)} \qquad (4.2) $$

$$ \|e^{tA}\| \le e^{-t\alpha} \quad\text{for } t \ge 0 \qquad (4.3) $$

hold with some constants $C$ and $\alpha > 0$.

Theorem 4.1. Under the above conditions, for sufficiently small $\varepsilon > 0$, the system
(4.1) has an invariant torus $\mathcal{T}_\varepsilon$ which attracts an $O(|\log\varepsilon|^{-\nu-1})$-neighbourhood of
$\{a^*\} \times \mathbb{T}^d$ with an exponential rate proportional to $\varepsilon$.

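Proof. The proof combines Lemma 2.1 and Theorem 3.1. For convenience we assume
$a^* = 0$ in the following. Lemma 2.1 (with $N = 3$) gives us a change of
coordinates $(a,\theta) \mapsto (b,\varphi)$, $O(\varepsilon)$-close to the identity, such that for $\|b\| \le \delta$ with
$\delta = c\,|\log\varepsilon|^{-\nu-1}$ of (2.9),

$$
\begin{aligned}
\dot b &= \varepsilon m_1(b) + \varepsilon^2 m_2(b) + O(\varepsilon^3/\delta^2) \\
\dot\varphi &= \omega(b) + \varepsilon\mu_1(b) + \varepsilon^2\mu_2(b) + O(\varepsilon^3/\delta^2) .
\end{aligned}
\qquad (4.4)
$$

Since $m_1(b) = \bar r(b) = Ab + O(\delta^2)$ by (4.2), this system is of the form

$$
\begin{aligned}
\dot b &= \varepsilon A b + O(\varepsilon\delta^2) \\
\dot\varphi &= \omega(b) + O(\varepsilon) .
\end{aligned}
$$

Similarly, the corresponding variational equation is of the form

$$
\begin{pmatrix} \dot B \\ \dot\Phi \end{pmatrix} =
\begin{pmatrix} \varepsilon A + O(\varepsilon\delta) & O(\varepsilon^3/\delta^2) \\ O(1) & O(\varepsilon^3/\delta^2) \end{pmatrix}
\begin{pmatrix} B \\ \Phi \end{pmatrix} .
$$

These relations and condition (4.3) imply that, for sufficiently small $\varepsilon$ and for any
fixed $\tau > 0$, the time-$\tau$ flow of (4.1) maps the strip
$D = \{(b,\varphi) : \|b\| \le \tfrac12\delta,\ \varphi \in \mathbb{T}^d\}$ into itself, and the following bounds hold for
the derivatives of the solution with respect to the initial values:

$$
\begin{aligned}
\Bigl\|\frac{\partial b(\tau)}{\partial b(0)}\Bigr\| &\le L_{bb} = e^{-\tau\varepsilon\alpha} + O(\varepsilon\delta) , &
\Bigl\|\frac{\partial b(\tau)}{\partial\varphi(0)}\Bigr\| &\le L_{b\varphi} = O(\varepsilon^3/\delta^2) , \\
\Bigl\|\frac{\partial\varphi(\tau)}{\partial b(0)}\Bigr\| &\le L_{\varphi b} = O(1) , &
\Bigl\|\frac{\partial\varphi(\tau)}{\partial\varphi(0)} - I\Bigr\| &\le L_{\varphi\varphi} = O(\varepsilon^3/\delta^2) .
\end{aligned}
\qquad (4.5)
$$

Hence, for sufficiently small $\varepsilon$,

$$ L_{\varphi\varphi} + L_{bb} + 2\sqrt{L_{\varphi b}L_{b\varphi}} \le e^{-\tau\varepsilon\alpha/2} < 1 . $$

Theorem 3.1 (and Exercise 1) used with $\varphi, b$ in the roles of $x, y$ now shows that the
time-$\tau$ flow has an attractive invariant torus $\{(s(\varphi),\varphi) : \varphi \in \mathbb{T}^d\}$, where
$s : \mathbb{T}^d \to \{\|b\| \le \tfrac12\delta\}$ is Lipschitz bounded by
$\lambda = 2L_{b\varphi}/(1 - L_{\varphi\varphi} - L_{bb}) = O(\varepsilon^3/\delta^2)$.

This invariant torus attracts orbits of the time-$\tau$ flow map in the strip $D$ with the
attractivity factor $\lambda L_{\varphi b} + L_{bb} \le e^{-\tau\varepsilon\alpha/2}$. As Exercise 2 shows, the torus is actually
invariant for the differential equation (4.1). □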


XII.5 Weakly Attractive Invariant Tori of Numerical 
Integrators 

Does the attractive invariant torus of Theorem 4.1 persist under numerical discretization
of the perturbed integrable system? This question was first studied by Stoffer
(1998) who worked directly with the discrete equations in his analysis. Here we take
up the approach of Hairer & Lubich (1999) where the problem was studied by combining
backward error analysis and perturbation theory, similar to what was done in
the two preceding chapters.

XII.5.1 Modified Equations of Perturbed Differential Equations

Below we need to use backward error analysis for the numerical solution of a perturbed
differential equation

$$ \dot y = f(y) + \varepsilon g(y,\varepsilon), \qquad y(0) = y_0 \qquad (5.1) $$

with real-analytic functions $f$ and $g$ and small parameter $\varepsilon$. We consider applying a
one-step method $y_1 = \Phi_h^\varepsilon(y_0)$ of order $p \ge 1$ with step size $h > 0$. The associated
modified differential equations constructed in Chap. IX are then of the form

$$ \dot{\tilde y} = \tilde f(\tilde y) + \varepsilon \tilde g(\tilde y,\varepsilon), \qquad \tilde y(0) = y_0 \qquad (5.2) $$

with suitably truncated series

$$
\begin{aligned}
\tilde f(y) &= f(y) + h^p f_{p+1}(y) + \ldots + h^{N-1} f_N(y) \\
\tilde g(y,\varepsilon) &= g(y,\varepsilon) + h^p g_{p+1}(y,\varepsilon) + \ldots + h^{N-1} g_N(y,\varepsilon) ,
\end{aligned}
\qquad (5.3)
$$

where the functions $f_j$ are independent of $\varepsilon, h, N$, whereas the functions $g_j$ are
allowed to depend on $\varepsilon$. The following adapts Theorem IX.7.6 to the above situation.

Theorem 5.1. Let $f(y) + \varepsilon g(y,\varepsilon)$ be real-analytic (in $y$ and $\varepsilon$) and bounded by $M$
for $y \in B_{2R}(y_0)$ and for all complex $\varepsilon$ with $|\varepsilon| \le \varepsilon_0$. Let the coefficients of the
Taylor series (in $h$) of the numerical method be analytic in $B_R(y_0)$ with bounds
(IX.7.5) for $|\varepsilon| \le \varepsilon_0$. Then, there exists $h_0 > 0$ (proportional to $R/M$), such that
for $h \le h_0/4$ and for $N = N(h)$ the largest integer with $hN \le h_0$, the difference
between the numerical solution $y_1 = \Phi_h^\varepsilon(y_0)$ and the exact solution $\tilde\varphi_{N,t}(y_0)$ of the
truncated modified equation (5.2)-(5.3) satisfies

$$ \|\Phi_h^\varepsilon(y_0) - \tilde\varphi_{N,h}(y_0)\| \le C h\, e^{-h_0/h} . $$

The functions $\tilde f$ and $\tilde g$ of (5.3) are real-analytic in $B_{R/2}(y_0)$ with

$$ \|\tilde f(y) - f(y)\| \le C h^p , \qquad \|\tilde g(y,\varepsilon) - g(y,\varepsilon)\| \le C h^p $$

for $y \in B_{R/2}(y_0)$ and $|\varepsilon| \le \varepsilon_0$. The constants $C$ are independent of $h \le h_0/4$ and
$|\varepsilon| \le \varepsilon_0$.

Proof. The exponentially small estimate for $\Phi_h^\varepsilon(y_0) - \tilde\varphi_{N,h}(y_0)$ is that of
Theorem IX.7.6 applied to the differential equation (5.1). The $O(h^p)$ bound for
$\tilde f(y) - f(y)$ is the estimate (IX.7.14) applied to $\dot y = f(y)$. By applying that estimate to
(5.1), a bound of the same type is obtained for
$(\tilde f(y) + \varepsilon\tilde g(y,\varepsilon)) - (f(y) + \varepsilon g(y,\varepsilon))$,
uniformly for all complex $\varepsilon$ in the complex disk $|\varepsilon| \le \varepsilon_0$. For any fixed
$y \in B_{R/2}(y_0)$, the difference

$$ \tilde g(y,\varepsilon) - g(y,\varepsilon) = \frac{1}{\varepsilon}\Bigl\{\bigl[(\tilde f(y) + \varepsilon\tilde g(y,\varepsilon)) - (f(y) + \varepsilon g(y,\varepsilon))\bigr] - \bigl[\tilde f(y) - f(y)\bigr]\Bigr\} $$

is an analytic function of $\varepsilon$ in the complex disk $|\varepsilon| \le \varepsilon_0$, which is bounded by $O(h^p)$
for $|\varepsilon| = \varepsilon_0$. By the maximum principle, the same bound then holds for $|\varepsilon| \le \varepsilon_0$. □

XII.5.2 Symplectic Methods

We apply a symplectic integrator with step size $h$ to a real-analytic perturbed
integrable Hamiltonian system in coordinates $(p,q)$:

$$
\begin{aligned}
\dot p &= -\frac{\partial H}{\partial q}(p,q) + \varepsilon\, k(p,q) \\
\dot q &= \frac{\partial H}{\partial p}(p,q) + \varepsilon\, \ell(p,q) .
\end{aligned}
\qquad (5.4)
$$

We assume that the unperturbed system ($\varepsilon = 0$) is a completely integrable system
which satisfies the conditions of the Arnold-Liouville theorem, Theorem X.1.6.
Hence, there exists a transformation to action-angle variables for the

    integrable system: $(p,q) \mapsto (a,\theta)$ by Theorem X.1.6.

This change of coordinates transforms the integrable system to the equations $\dot a = 0$,
$\dot\theta = \omega(a)$, and it transforms (5.4) to a system (4.1), for which we assume (4.2), (4.3)
and the diophantine condition (X.2.4) with exponent $\nu$ for $\omega(a^*)$. The following
theorem is a variant of results in Stoffer (1998) and Hairer & Lubich (1999). It
shows that for symplectic methods, the invariant torus persists under a very mild
restriction on the step size. For non-symplectic methods, this would require step
sizes $h$ with $h^p \ll \varepsilon$ (see Exercise 5).


Theorem 5.2. Let a symplectic numerical integrator of order $p$ be applied to a perturbed
integrable Hamiltonian system (5.4) which satisfies the conditions stated
above. Then, there exist $\varepsilon_0 > 0$ and $c_0 > 0$ such that, for $0 < \varepsilon \le \varepsilon_0$ and for
step sizes $h > 0$ satisfying

$$ h^p \le c_0\,|\log\varepsilon|^{-\kappa} \qquad (5.5) $$

with $\kappa = \max(\nu + d + 1, p)$, the numerical method has an attractive invariant
torus $\mathcal{T}_{\varepsilon,h}$. This torus is $O(h^p)$ close to the invariant torus $\mathcal{T}_\varepsilon$ of (5.4). It attracts an
$O(|\log\varepsilon|^{-2\kappa})$ neighbourhood with an exponential rate proportional to $\varepsilon$, uniformly
in $h$.


Remark 5.3. The exponent $\nu + d + 1$ comes from Lemma X.4.1. It could be reduced
to $\nu + 1$ by using Rüssmann's estimates in place of that lemma; cf. the remark after
Lemma X.4.1.


Proof of Theorem 5.2. The proof combines backward error analysis (Theorem
IX.3.1 and Theorem 5.1), perturbation theory (Theorem X.4.4 and Lemma 2.1),
and the invariant manifold theorem (Theorem 3.1).

(a) We begin by considering the symplectic method applied to the integrable
Hamiltonian system (5.4) with $\varepsilon = 0$. This leads us back to the questions of Chap. X.
We use backward error analysis and recall (Theorem IX.3.1) that the modified equation
is again Hamiltonian and an $O(h^p)$ perturbation of the integrable system, both
in the $(p,q)$ and the $(a,\theta)$ variables. We transform variables for the

    modified equation of the integrable system: $(a,\theta) \mapsto (\tilde a,\tilde\theta)$ by Theorem X.4.4,

with $h^p$ in the role of the perturbation parameter. By (X.4.1) with $N$ proportional to
$|\log\varepsilon|$, and by condition (5.5) with a sufficiently small $c_0$, the modified equations
in these variables become

$$
\begin{aligned}
\dot{\tilde a} &= O(\varepsilon^3) \\
\dot{\tilde\theta} &= \tilde\omega(\tilde a) + O(\varepsilon^3)
\end{aligned}
\qquad\text{for } \|\tilde a - a^*\| \le c^*|\log\varepsilon|^{-2\kappa} ,
$$

with $\tilde\omega(\tilde a) = \omega(\tilde a) + O(h^p)$. Moreover, the transformation $(a,\theta) \mapsto (\tilde a,\tilde\theta)$ is
$O(h^p)$ close to the identity.

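(b) The modified equations of the perturbed system, written in the $(\tilde a,\tilde\theta)$ variables,
become

$$
\begin{aligned}
\dot{\tilde a} &= \varepsilon\,\tilde r(\tilde a,\tilde\theta) + O(\varepsilon^3) \\
\dot{\tilde\theta} &= \tilde\omega(\tilde a) + \varepsilon\,\tilde\rho(\tilde a,\tilde\theta) + O(\varepsilon^3)
\end{aligned}
\qquad\text{for } \|\tilde a - a^*\| \le c^*|\log\varepsilon|^{-2\kappa} , \qquad (5.6)
$$

where $\tilde r(a,\theta) = r(a,\theta) + O(h^p)$ and $\tilde\rho(a,\theta) = \rho(a,\theta) + O(h^p)$ by Theorem 5.1.
Consider now these equations with the $O(\varepsilon^3)$ terms dropped. We change variables
for the

    modified equation of the perturbed system: $(\tilde a,\tilde\theta) \mapsto (\tilde b,\tilde\varphi)$ by Lemma 2.1.

(Note Exercise 4 with $\tilde\omega(a^*) = \omega(a^*) + O(h^p)$ and (5.5).) The system (5.6) is
transformed to the form of (4.4),

$$
\begin{aligned}
\dot{\tilde b} &= \varepsilon\,\tilde m(\tilde b) + O(\varepsilon^3/\delta^2) \\
\dot{\tilde\varphi} &= \tilde\omega(\tilde b) + \varepsilon\,\tilde\mu(\tilde b) + O(\varepsilon^3/\delta^2)
\end{aligned}
\qquad (5.7)
$$

with $\delta = c^*|\log\varepsilon|^{-2\kappa}$, and where
$\tilde m(b) = \bar{\tilde r}(b) + O(\varepsilon/\delta) = \bar r(b) + O(h^p) + O(\varepsilon/\delta)$,
and also the Jacobian of $\tilde m$ at $a^*$ is close to that of $\bar r$, so that it satisfies again (4.3),
at least with $\alpha$ replaced by $\alpha/2$. In the same way as in the proof of Theorem 4.1 and
with the same Lipschitz constants as in (4.5), we now obtain an attractive invariant
torus of the modified equation of the perturbed system. The time-$h$ flow of this
equation is an exponentially small (in $1/h$) Lipschitz perturbation of the numerical
one-step map, so that under condition (5.5) it is an $O(\varepsilon^3)$ perturbation. Therefore,
Theorem 3.1 yields an invariant torus $\mathcal{T}_{\varepsilon,h}$ of the numerical method.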

(c) It remains to bound the distance between the tori $\mathcal{T}_{\varepsilon,h}$ and $\mathcal{T}_\varepsilon$. We recall that
$\mathcal{T}_\varepsilon$ was obtained by a transformation of the

    perturbed system: $(a,\theta) \mapsto (b,\varphi)$ by Lemma 2.1,

which puts (4.1) into the form (4.4). We thus have the transformations

$$
\begin{array}{ccc}
(a,\theta) & \xrightarrow{\ \varepsilon\ } & (b,\varphi) \\
\Big\downarrow{\scriptstyle h^p} & & \\
(\tilde a,\tilde\theta) & \xrightarrow{\ \varepsilon\ } & (\tilde b,\tilde\varphi)
\end{array}
$$

where the symbols $h^p$ and $\varepsilon$ indicate that the transformation is $O(h^p)$ or $O(\varepsilon)$ close
to the identity. By the construction of Lemma 2.1, the composed transformation
$(b,\varphi) \mapsto (\tilde b,\tilde\varphi)$ is $O(h^p)$ close to the identity and moreover, the right-hand sides of
(4.4) and (5.7) differ by $O(\varepsilon h^p)$. Theorem 3.2 (with $\rho = e^{-\varepsilon h\alpha/2}$) now shows that
the functions $s_{\varepsilon,h}$ and $s_\varepsilon$ defining $\mathcal{T}_{\varepsilon,h}$ and $\mathcal{T}_\varepsilon$, respectively, differ by $O(h^p)$. This
yields the desired distance bound. □

XII.5.3 Symmetric Methods 

A result analogous to the theorem of the previous subsection holds for reversible
methods applied to perturbed reversible systems

$$
\begin{aligned}
\dot u &= f(u,v) + \varepsilon\, k(u,v) \\
\dot v &= g(u,v) + \varepsilon\, \ell(u,v)
\end{aligned}
$$

where the unperturbed system ($\varepsilon = 0$) is a real-analytic integrable reversible system.
If the perturbed system, written in action-angle variables of the unperturbed system,
satisfies the conditions of Theorem 4.1, then a reversible analogue of Theorem 5.2
holds, where the terms “symplectic” and “Hamiltonian” are simply replaced by
“reversible”. The proof remains the same, working with the reversible analogues of the
results used for the Hamiltonian case.


XII.6 Exercises 

1. In the situation of the invariant manifold theorem, Theorem 3.1, suppose in
addition that $f$ and $g$ are $a$-periodic in $x$: $f(x + a, y) = f(x,y)$, $g(x + a, y) = g(x,y)$
for all $x \in X$, $y \in Y$. Show that in this case the function $s$ defining the
invariant manifold is also $a$-periodic.

Hint. The Hadamard transform maps $a$-periodic functions to $a$-periodic functions.

2. Show that if the time-$\tau$ flow map $\Phi = \varphi_\tau$ of a differential equation has an
attractive invariant manifold $M$, and if the flow $\varphi_t$ maps a domain of attractivity
of $M$ under $\Phi$ into itself for every real $t$, then $M$ is also invariant under the flow
$\varphi_t$ for every real $t$.

Hint. Write $\varphi_t = \Phi^n \circ \varphi_t \circ \Phi^{-n}$ and use the attractivity of $M$ for $n \to \infty$.

3. Prove that in the situation of Theorem 3.1, iterates $(x_{n+1}, y_{n+1}) = \Phi(x_n, y_n)$
have the property of asymptotic phase (Nipp & Stoffer 1992): there exists a sequence
$(\tilde x_n, \tilde y_n)$ of iterates on the invariant manifold, i.e., with
$(\tilde x_{n+1}, \tilde y_{n+1}) = \Phi(\tilde x_n, \tilde y_n)$ and $\tilde y_n = s(\tilde x_n)$, such that for all $n \ge 0$,

$$
\begin{aligned}
\|x_n - \tilde x_n\| &\le c\,\|y_n - s(x_n)\| \\
\|y_n - \tilde y_n\| &\le (1 + \lambda c)\,\|y_n - s(x_n)\| ,
\end{aligned}
$$

where $c = \lambda^*/(1 - \lambda\lambda^*)$ with $\lambda = 2L_{yx}/(1 - L_{xx} - L_{yy})$ of (3.5) and
$\lambda^* = 2L_{xy}/(1 - L_{xx} - L_{yy})$. Note that $\|y_n - s(x_n)\| \le \rho^n\,\|y_0 - s(x_0)\|$
by Theorem 3.1.

Hint. Consider the sequences $(x_n^{(k)}, y_n^{(k)})$ defined by $x_k^{(k)} = x_k$, $y_k^{(k)} = s(x_k)$
and $(x_{n+1}^{(k)}, y_{n+1}^{(k)}) = \Phi(x_n^{(k)}, y_n^{(k)})$ for $n = k-1, \ldots, 1, 0$. Show that, for
fixed $n$, the sequence $(x_n^{(k)})$ ($k \ge n$) is a Cauchy sequence.

4. Show that Lemma 2.1 holds unchanged if the diophantine condition (X.2.4) for
$\omega(a^*)$ is weakened to $\omega(a^*) = \omega^* + O(\delta^2)$ with $\omega^*$ satisfying (X.2.4).

5. In the situation of Theorem 5.2, show that every numerical integrator of order $p$
has an attractive invariant torus if $h^p \ll \varepsilon$. This torus is $O(h^p/\varepsilon)$ close to the
invariant torus of the continuous system.



Chapter XIII. 

Oscillatory Differential Equations 
with Constant High Frequencies 


This chapter deals with numerical methods for second-order differential equations
with oscillatory solutions. These methods are designed to require a new complete
function evaluation only after a time step over one or many periods of the fastest
oscillations in the system. Various such methods have been proposed in the literature,
some of them decades ago, some very recently, motivated by problems from molecular
dynamics, astrophysics and nonlinear wave equations. For these methods it is
not obvious what implications geometric properties like symplecticity or reversibility
have on the long-time behaviour, e.g., on energy conservation. The backward
error analysis of Chap. IX, which was the backbone of the results of the three preceding
chapters, is no longer applicable when the product of the step size with the
highest frequency is not small, which is the situation of interest here. The “exponentially
small” remainder terms are now only $O(1)$! For differential equations where
the high frequencies of the oscillations remain nearly constant along the solution,
a substitute for the backward error analysis of Chap. IX is given by the modulated
Fourier expansions of the exact and the numerical solutions. Among other properties,
they permit us to understand the numerical long-time conservation of the total
and oscillatory energies (or the failure of conserving energy in certain cases). It turns
out that symmetry of the methods is still essential, but symplecticity plays no role in the
analysis and in the numerical experiments, and new conditions of an apparently
non-geometric nature come into play.


XIII.1 Towards Longer Time Steps in Solving
Oscillatory Equations of Motion

Dynamical systems with multiple time scales pose a major problem in
simulations because the small time steps required for stable integration
of the fast motions lead to large numbers of time steps required for the
observation of slow degrees of freedom and thus to the need to compute
a large number of forces.

(M. Tuckerman, B.J. Berne & G.J. Martyna 1992)

We describe numerical methods that have been proposed for solving highly oscillatory
second-order differential equations with fewer force evaluations than are
needed by standard integrators like the Störmer-Verlet method. We present the ideas
underlying the construction of the methods and leave numerical comparisons to
Sect. XIII.2 and the analysis of the methods to Sections XIII.3-XIII.6. We consider
only methods that are symmetric or symplectic. The presentation in this section
follows roughly the chronological order.

XIII.1.1 The Störmer-Verlet Method vs. Multiple Time Scales

Perhaps the most widely used method of integrating the equations of motion
is that initially adopted by Verlet (1967) and attributed to Störmer.

(M.P. Allen & D.J. Tildesley 1987, p. 78)

The Newtonian equations of motion of particle systems (in molecular dynamics,
astrophysics and elsewhere) are second-order differential equations

$$ \ddot q = -\nabla V(q) . \qquad (1.1) $$

To simplify the presentation, we omit the positive definite mass matrix $M$ which
would usually multiply $\ddot q$. This entails no loss of generality, since a transformation
$q \to M^{1/2} q$ and $V(q) \to V(M^{-1/2} q)$ gives the very form (1.1).

The standard numerical integrator of molecular dynamics is the Störmer-Verlet
scheme; see Chap. I. We recall that this method computes the new positions $q_{n+1}$ at
time $t_{n+1}$ from

$$ q_{n+1} - 2q_n + q_{n-1} = h^2 f_n \qquad (1.2) $$

with the force $f_n = -\nabla V(q_n)$. Velocity approximations are given by

$$ \dot q_n = \frac{q_{n+1} - q_{n-1}}{2h} . $$

In its one-step formulation (see (I.1.17)) the method reads¹

$$
\begin{aligned}
p_{n+1/2} &= p_n + \tfrac12 h f_n \\
q_{n+1} &= q_n + h\, p_{n+1/2} \\
p_{n+1} &= p_{n+1/2} + \tfrac12 h f_{n+1} .
\end{aligned}
\qquad (1.3)
$$

We recall that this is a symmetric and symplectic method of order 2. For linear
stability, i.e., for bounded error propagation in linearized equations, the step size
must be restricted to

$$ h\omega < 2 $$

where $\omega$ is the largest eigenfrequency (i.e., square root of an eigenvalue) of the
Hessian matrix $\nabla^2 V(q)$ along the numerical solution; see Sect. I.5.1. Good energy
conservation requires an even stronger restriction on the step size. Values of $h\omega \approx \tfrac12$
are frequently used in molecular dynamics simulations.
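As a small illustration of the stability bound $h\omega < 2$, the following sketch applies the one-step form (1.3) to the scalar test equation $\ddot q = -\omega^2 q$. The function name `stormer_verlet`, the test frequency and the two step sizes are choices made here for illustration only; they are not taken from the text.

```python
def stormer_verlet(force, q, p, h, steps):
    """Integrate q'' = force(q) with the one-step Stormer-Verlet scheme (1.3)."""
    f = force(q)
    for _ in range(steps):
        p_half = p + 0.5 * h * f      # half kick
        q = q + h * p_half            # drift
        f = force(q)                  # one new force evaluation per step
        p = p_half + 0.5 * h * f      # half kick
    return q, p

omega = 50.0                          # illustrative frequency

def force(q):
    return -omega**2 * q

def energy(q, p):
    return 0.5 * p * p + 0.5 * omega**2 * q * q

e0 = energy(1.0, 0.0)
# h*omega = 1: inside the stability interval
e_stable = energy(*stormer_verlet(force, 1.0, 0.0, 1.0 / omega, 200))
# h*omega = 2.1: just outside the stability interval
e_unstable = energy(*stormer_verlet(force, 1.0, 0.0, 2.1 / omega, 200))
print(e_stable / e0, e_unstable / e0)
```

In the first run the energy stays of the size of its initial value, whereas in the second run it grows by many orders of magnitude within 200 steps.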

¹ We write $p$ when the Hamiltonian structure and symplecticity are an issue, and $\dot q$ otherwise.

The potential $V(q)$ is often a sum of potentials that act on different time scales,

$$ V(q) = W(q) + U(q) \quad\text{with } \nabla^2 W(q) \text{ positive semi-definite and}\quad \|\nabla^2 W(q)\| \gg \|\nabla^2 U(q)\| . \qquad (1.4) $$

In this situation, solutions are in general highly oscillatory on the slow time scale
$\tau \sim 1/\|\nabla^2 U(q)\|^{1/2}$.

In particular when the fast forces $-\nabla W(q)$ are cheaper to evaluate than the
slow forces $-\nabla U(q)$, it is of interest to devise methods where the required number
of slow-force evaluations is not (or not severely) affected by the presence of the
fast forces which are responsible for the oscillatory behaviour and which restrict
the step size of standard integrators like the Störmer-Verlet scheme. This situation
occurs in molecular dynamics, where $W(q)$ corresponds to short-range molecular
bonds, whereas $U(q)$ includes inter alia long-range electrostatic potentials.

In some approaches to this computational problem, the differential model is
modified: highly oscillatory components are replaced by constraints (Ryckaert,
Ciccotti & Berendsen 1977), or stochastic and dissipative terms are added to the model
(see Schlick 1999). Such modifications may prove highly successful in some applications.
In the following, however, we restrict our attention to methods which aim
at long time steps directly for the problem (1.1) with (1.4).

Spatial semi-discretizations of nonlinear wave equations, such as the sine-Gordon
equation

$$ u_{tt} = u_{xx} - \sin u , $$

form another important class of equations (1.1) with (1.4). Here $W(q) = \tfrac12 q^T A q$,
where $A$ is the discretization matrix of the differential operator $-\partial^2/\partial x^2$.
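To see why such semi-discretizations fall under (1.4), one can look at the eigenfrequencies of $A$. The sketch below assumes, purely for illustration, standard second-order finite differences on a uniform grid with periodic boundary conditions (the text does not fix a particular discretization); `apply_A` and the grid parameters are hypothetical names and values.

```python
import math

def apply_A(q, dx):
    """Apply the periodic second-difference approximation of -d^2/dx^2 to q."""
    n = len(q)
    return [(2.0 * q[j] - q[j - 1] - q[(j + 1) % n]) / dx**2 for j in range(n)]

n = 64                                # number of grid points (illustrative)
dx = 2.0 * math.pi / n                # grid spacing on [0, 2*pi)
k = 5                                 # wave number of a test mode

mode = [math.cos(k * j * dx) for j in range(n)]
Aq = apply_A(mode, dx)

# Fourier modes are eigenvectors; the eigenvalue for wave number k is
# (2/dx * sin(k*dx/2))^2, so the eigenfrequencies range up to 2/dx.
eigenvalue = (2.0 / dx * math.sin(0.5 * k * dx)) ** 2
residual = max(abs(a - eigenvalue * m) for a, m in zip(Aq, mode))
max_frequency = 2.0 / dx
print(residual, max_frequency)
```

Under these assumptions the largest eigenfrequency of $A$ grows like $2/\Delta x$ as the grid is refined, while the Hessian of the potential belonging to the nonlinearity $-\sin u$ stays bounded by 1.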

XIII.1.2 Gautschi’s and Deuflhard’s Trigonometric Methods

It is anticipated that trigonometric methods can be applied, with similar
success, also to nonlinear differential equations describing oscillation
phenomena. (W. Gautschi 1961)

The oldest methods allowing the use of long time steps in oscillatory problems concern
the particular case of a quadratic potential $W(q) = \tfrac12\omega^2 q^T q$ with $\omega \gg 1$, for
which the equations take the form

$$ \ddot q = -\omega^2 q + g(q) . \qquad (1.5) $$

For such equations, Gautschi (1961) proposed a number of methods of multistep
type which are constructed to be exact if the solution is a trigonometric polynomial
in $\omega t$ of a prescribed degree. The simplest of these methods (and the only symmetric
one) reads

$$ q_{n+1} - 2q_n + q_{n-1} = h^2 \operatorname{sinc}^2(\tfrac12 h\omega)\, \ddot q_n , \qquad (1.6) $$

where $\operatorname{sinc}\xi = \sin\xi/\xi$ and $\ddot q_n = -\omega^2 q_n + g_n$ with $g_n = g(q_n)$, or equivalently

$$ q_{n+1} - 2\cos(h\omega)\, q_n + q_{n-1} = h^2 \operatorname{sinc}^2(\tfrac12 h\omega)\, g_n . \qquad (1.7) $$

The method gives the exact solution for equations (1.5) with $g = \mathrm{Const}$ and arbitrary
$\omega$ (see also Hersch (1958) for such a construction principle). This property is
readily verified with the variation-of-constants formula

$$
\begin{pmatrix} q(t) \\ \dot q(t) \end{pmatrix} =
\begin{pmatrix} \cos t\omega & \omega^{-1}\sin t\omega \\ -\omega\sin t\omega & \cos t\omega \end{pmatrix}
\begin{pmatrix} q_0 \\ \dot q_0 \end{pmatrix}
+ \int_0^t \begin{pmatrix} \omega^{-1}\sin (t-s)\omega \\ \cos (t-s)\omega \end{pmatrix} g(q(s))\, ds .
\qquad (1.8)
$$

This formula also shows that the following scheme for a velocity approximation
becomes exact for $g = \mathrm{Const}$:

$$ q_{n+1} - q_{n-1} = 2h \operatorname{sinc}(h\omega)\, \dot q_n . \qquad (1.9) $$

Starting values $q_1$ and $\dot q_1$ are also obtained from (1.8) with $g(q_0)$ in place of $g(q(s))$.
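The exactness of Gautschi's method for constant $g$ can be checked numerically. The following sketch implements the two-step recursion (1.7) for a scalar equation and compares it with the closed-form solution of $\ddot q = -\omega^2 q + g$; the function names and all parameter values are illustrative assumptions, and the step size is deliberately chosen with $h\omega > 2$.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(x) / x

def gautschi(g, omega, q0, q1, h, steps):
    """Two-step recursion (1.7); returns the list [q_0, q_1, ..., q_steps]."""
    qs = [q0, q1]
    psi = sinc(0.5 * h * omega) ** 2
    c = 2.0 * math.cos(h * omega)
    for _ in range(steps - 1):
        qs.append(c * qs[-1] - qs[-2] + h * h * psi * g(qs[-1]))
    return qs

omega, g_const, h = 10.0, 3.0, 0.37   # h*omega = 3.7, beyond the Verlet limit
q0, v0 = 1.0, 2.0

def exact(t):
    """Exact solution of q'' = -omega^2 q + g_const with q(0)=q0, q'(0)=v0."""
    shift = g_const / omega**2
    return shift + (q0 - shift) * math.cos(omega * t) + v0 / omega * math.sin(omega * t)

qs = gautschi(lambda q: g_const, omega, q0, exact(h), h, 50)
max_err = max(abs(q - exact(n * h)) for n, q in enumerate(qs))
print(max_err)
```

The error remains at round-off level although $h\omega = 3.7$ lies outside the stability interval of the Störmer-Verlet scheme.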

Deuflhard (1979) considered $h^2$-extrapolation based on the explicit symmetric
method that is obtained by replacing the integral term in (1.8) by its trapezoidal rule
approximation:

$$
\begin{pmatrix} q_{n+1} \\ h\dot q_{n+1} \end{pmatrix} =
\begin{pmatrix} \cos h\omega & \operatorname{sinc} h\omega \\ -h\omega\sin h\omega & \cos h\omega \end{pmatrix}
\begin{pmatrix} q_n \\ h\dot q_n \end{pmatrix}
+ \frac{h^2}{2} \begin{pmatrix} \operatorname{sinc}(h\omega)\, g_n \\ g_{n+1} + \cos(h\omega)\, g_n \end{pmatrix} .
\qquad (1.10)
$$

Eliminating the velocities yields the two-step formulation

$$ q_{n+1} - 2\cos(h\omega)\, q_n + q_{n-1} = h^2 \operatorname{sinc}(h\omega)\, g_n . \qquad (1.11) $$

The velocity approximation is obtained back from

$$ 2h \operatorname{sinc}(h\omega)\, \dot q_n = q_{n+1} - q_{n-1} \qquad (1.12) $$

or alternatively from

$$ \dot q_{n+1} - 2\cos(h\omega)\, \dot q_n + \dot q_{n-1} = h\, \frac{g_{n+1} - g_{n-1}}{2} . $$

Both Gautschi’s and Deuflhard’s method reduce to the Störmer-Verlet scheme for
$\omega = 0$. Both methods extend in a straightforward way to systems

$$ \ddot q = -Aq + g(q) \qquad (1.13) $$

with a symmetric positive semi-definite matrix $A$, by formally replacing $\omega$ by
$\Omega = A^{1/2}$ in the above formulas. The methods then require the computation of
products of entire functions of the matrix $h^2 A$ with vectors. This can be done by
diagonalizing $A$, which is efficient for problems of small dimension or in spectral
methods for nonlinear wave equations. In high-dimensional problems where
a diagonalization is not feasible, these matrix function times vector products can
be efficiently computed by superlinearly convergent Krylov subspace methods, see
Druskin & Knizhnerman (1995) and Hochbruck & Lubich (1997).

The above methods permit extensions to more general problems (1.1) with (1.4),
but this requires a reinterpretation to which we turn next.

XIII.1.3 The Impulse Method


Integrators based on r-RESPA [...] have led to considerable speed-up in
the CPU time for large scale simulations of biomacromolecular solutions.
Since r-RESPA is symplectic such integrators are very stable.

(B.J. Berne 1999)

The Störmer-Verlet method (1.3) can be interpreted as approximating the flow $\varphi_h^H$
of the system with Hamiltonian $H(p,q) = T(p) + V(q)$ with $T(p) = \tfrac12 p^T p$ by the
symmetric splitting

$$ \varphi_{h/2}^V \circ \varphi_h^T \circ \varphi_{h/2}^V , $$

which involves only the flows of the systems with Hamiltonians $T(p)$ and $V(q)$,
which are trivial to compute; see Sect. II.5.

In the situation (1.4) of a potential $V = W + U$, we may instead use a different
splitting of $H = (T + W) + U$ and approximate the flow $\varphi_h^H$ of the system by

$$ \varphi_{h/2}^U \circ \varphi_h^{T+W} \circ \varphi_{h/2}^U . $$

This gives a method that was proposed in the context of molecular dynamics by
Grubmüller, Heller, Windemuth & Schulten (1991) (their Verlet-I scheme) and by
Tuckerman, Berne & Martyna (1992) (their r-RESPA scheme). Following the terminology
of García-Archilla, Sanz-Serna & Skeel (1999) we here refer to this method
as the impulse method:

1. kick: set $p_n^+ = p_n - \tfrac12 h \nabla U(q_n)$

2. oscillate: solve $\ddot q = -\nabla W(q)$ with initial values $(q_n, p_n^+)$
   over a time step $h$ to obtain $(q_{n+1}, p_{n+1}^-)$

3. kick: set $p_{n+1} = p_{n+1}^- - \tfrac12 h \nabla U(q_{n+1})$

Step 2 must in general be computed approximately by a numerical integrator with
a smaller time step, which results in the multiple time stepping method that we
encountered in Sect. VIII.4. If the inner integrator is symplectic and symmetric, as it
would be for the natural choice of the Störmer-Verlet method, then the overall
method is also symplectic (as a composition of symplectic transformations) and
symmetric (as a symmetric composition of symmetric steps).

It is interesting to note that the impulse method (with exact solution of step 2)
reduces to Deuflhard’s method in the case of a quadratic potential $W(q) = \tfrac12 q^T A q$
(Exercise 1).
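This relation can be observed numerically in the scalar case $W(q) = \tfrac12\omega^2 q^2$. The sketch below performs one step of the impulse method with the exact rotation in step 2, and one step of Deuflhard's method (1.10) written for $(q, \dot q)$ instead of $(q, h\dot q)$. The slow potential $U(q) = q^4/4$, the function names and all parameter values are assumptions made for this illustration only.

```python
import math

def impulse_step(q, p, h, omega, grad_U):
    """Kick, exact oscillation of q'' = -omega^2 q, kick."""
    p_plus = p - 0.5 * h * grad_U(q)
    c, s = math.cos(h * omega), math.sin(h * omega)
    q_new = c * q + s / omega * p_plus
    p_minus = -omega * s * q + c * p_plus
    p_new = p_minus - 0.5 * h * grad_U(q_new)
    return q_new, p_new

def deuflhard_step(q, v, h, omega, g):
    """One step of (1.10), unscaled velocity, with g = -grad U."""
    c, s = math.cos(h * omega), math.sin(h * omega)
    q_new = c * q + s / omega * v + 0.5 * h * h * (s / (h * omega)) * g(q)
    v_new = -omega * s * q + c * v + 0.5 * h * (g(q_new) + c * g(q))
    return q_new, v_new

def grad_U(q):                        # illustrative slow potential U(q) = q^4/4
    return q**3

def g(q):
    return -grad_U(q)

omega, h = 20.0, 0.1
qi, pi_ = 1.0, 0.5                    # state of the impulse method
qd, vd = 1.0, 0.5                     # state of Deuflhard's method
max_diff = 0.0
for _ in range(50):
    qi, pi_ = impulse_step(qi, pi_, h, omega, grad_U)
    qd, vd = deuflhard_step(qd, vd, h, omega, g)
    max_diff = max(max_diff, abs(qi - qd), abs(pi_ - vd))
print(max_diff)
```

Both recursions produce the same iterates up to round-off, in agreement with the claim of Exercise 1 for this scalar example.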

Though the method does allow larger step sizes than the Störmer-Verlet method
in molecular dynamics simulations, it is not free from numerical difficulties.
Biesiadecki & Skeel (1993) and García-Archilla et al. (1999) report instabilities and
numerical resonance phenomena, and analyze them in linear model problems, when the
product of the step size $h$ with an eigenfrequency $\omega$ of $\nabla^2 W$ is near an integral multiple
of $\pi$.

XIII.1.4 The Mollified Impulse Method

We also propose a nontrivial improvement of the impulse method that
we call the mollified impulse method, for which superior stability and
accuracy is demonstrated.

(B. García-Archilla, J.M. Sanz-Serna & R.D. Skeel 1999)

Difficulties with the impulse method can be intuitively seen to come from two
sources: the slow force $-\nabla U(q)$ has an effect only at the ends of a time step, but it
does not enter into the oscillations in between; the slow force is evaluated, somewhat
arbitrarily, at isolated points of the oscillatory solution.

García-Archilla et al. (1999) propose to evaluate the slow force at an averaged
value $\bar q_n = a(q_n)$. They replace the potential $U(q)$ by $\bar U(q) = U(a(q))$ and hence
the slow force $-\nabla U(q)$ in the impulse method by the mollified force

$$ -\nabla\bar U(q) = -a'(q)^T\, \nabla U(a(q)) . \qquad (1.15) $$

Since this mollified impulse method is the impulse method for a modified potential,
it is again symplectic and symmetric.

There are numerous possibilities to choose the average $a(q_n)$, but care should be
taken that it is only a function of the position $q_n$ and thus independent of $p_n$, in order
to obtain a symplectic and symmetric method. This precludes taking averages of the
solution of the problem in the oscillation step (Step 2) of the algorithm. Instead, one
solves the auxiliary initial value problem

$$ \ddot x = -\nabla W(x) \quad\text{with } x(0) = q,\ \dot x(0) = 0 \qquad (1.16) $$

together with the variational equation (using the same method and the same step
size)

$$ \ddot X = -\nabla^2 W(x(t))\, X \quad\text{with } X(0) = I,\ \dot X(0) = 0 \qquad (1.17) $$

and computes the time average over an interval of length $ch$ for some $c > 0$:

$$ a(q) = \frac{1}{ch}\int_0^{ch} x(t)\, dt , \qquad a'(q) = \frac{1}{ch}\int_0^{ch} X(t)\, dt . \qquad (1.18) $$

García-Archilla et al. (1999) found that the choice $c = 1$ gives the best results.
Weighted averages instead of the simple average used above give no improvement.

Izaguirre, Reich & Skeel (1999) propose to take $a(q)$ as a projection of $q$ to the
manifold $\nabla W(q) = 0$ of rest positions of the fast forces, for situations where all
non-zero eigenfrequencies of $\nabla^2 W(q)$ are much larger than those of $\nabla^2 U(q)$. This
choice is motivated by the fact that solutions oscillate about this manifold.

We now turn to the interesting special case of a quadratic $W(q) = \tfrac12 q^T A q$ with
a symmetric positive semi-definite matrix $A$. In this case, the above average can be
computed analytically. It becomes

$$ a(q) = \phi(h\Omega)\, q $$

with $\Omega = A^{1/2}$ and the function $\phi(\xi) = \operatorname{sinc}(c\xi)$. For $a(q)$ defined by the orthogonal
projection to $Aq = 0$ we have $\phi(0) = 1$ and $\phi(\xi) = 0$ for $\xi$ away from 0. With
$g_n = -\phi(h\Omega)\, \nabla U(\phi(h\Omega) q_n)$, the mollified impulse method reduces to

$$
\begin{aligned}
p_n^+ &= p_n + \tfrac12 h g_n \\
\begin{pmatrix} q_{n+1} \\ p_{n+1}^- \end{pmatrix} &=
\begin{pmatrix} \cos h\Omega & h \operatorname{sinc} h\Omega \\ -\Omega \sin h\Omega & \cos h\Omega \end{pmatrix}
\begin{pmatrix} q_n \\ p_n^+ \end{pmatrix} \\
p_{n+1} &= p_{n+1}^- + \tfrac12 h g_{n+1} .
\end{aligned}
\qquad (1.19)
$$

This can equivalently be written as (1.10) with the same $g_n$ (and $\Omega$ in place of $\omega$),
or in the two-step form (1.11) with (1.12).
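The closed form $a(q) = \operatorname{sinc}(ch\Omega)\, q$ stated above can be checked in the scalar case, where the solution of (1.16) is $x(t) = \cos(\omega t)\, q$. The sketch below approximates the time average in (1.18) by composite Simpson quadrature; the function name, the quadrature rule and the parameter values are choices made here for illustration.

```python
import math

def averaged_position(q, omega, h, c=1.0, n=2000):
    """Approximate (1/(c h)) * integral_0^{c h} cos(omega t) q dt by Simpson's rule.

    n must be even (number of subintervals).
    """
    T = c * h
    dt = T / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.cos(omega * i * dt) * q
    return total * dt / 3.0 / T

omega, h, q = 30.0, 0.25, 1.7         # h*omega = 7.5 (illustrative values)
numeric = averaged_position(q, omega, h, c=1.0)
closed_form = math.sin(h * omega) / (h * omega) * q   # sinc(c*h*omega) * q, c = 1
print(numeric, closed_form)
```

The two values agree to quadrature accuracy, which illustrates how the averaging damps the argument of the slow force when $h\omega$ is large.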


XIII.1.5 Gautschi’s Method Revisited

We recall that Gautschi’s method (1.7) (with $\Omega = A^{1/2}$ in place of $\omega$) integrates
equations $\ddot q = -Aq + g(q)$ exactly in the case of a constant inhomogeneity
$g(q) = \mathrm{Const}$. This property is obviously kept if the argument of $g$ in the algorithm is
modified to

$$ g_n = g(\phi(h\Omega)\, q_n) $$

similar to the previous subsection. Such Gautschi-type methods were analyzed by
Hochbruck & Lubich (1999a). Functions $\phi$ with $\phi(0) = 1$ that vanish at integral
multiples of $\pi$ give a substantial improvement over the original Gautschi method.
The choice

$$ \phi(\xi) = \operatorname{sinc}\xi\, \bigl(1 + \tfrac13 \sin^2 \tfrac12\xi\bigr) \qquad (1.20) $$

was found to give particularly good accuracy. The methods are symmetric but not
symplectic.

The following symmetric method for general problems (1.1) with (1.4) was proposed
by Hochbruck & Lubich (1999a). The method reduces to Gautschi-type methods
for quadratic $W(q) = \tfrac12 q^T A q$. Given $q_n$ and $\dot q_n$, one computes an averaged
value $\bar q_n = a(q_n)$ and the solution of

$$ \ddot u = -\nabla W(u) - \nabla U(\bar q_n) \quad\text{with } u(0) = q_n,\ \dot u(0) = \dot q_n \qquad (1.21) $$

backwards and forwards on the intervals from 0 to $-h$ and 0 to $h$. Note that this
requires only one evaluation of the slow force $-\nabla U$. Then, $q_{n+1}$ and $\dot q_{n+1}$ are
computed from

$$
\begin{aligned}
q_{n+1} - 2q_n + q_{n-1} &= u(h) - 2u(0) + u(-h) \\
\dot q_{n+1} - \dot q_{n-1} &= \dot u(h) - \dot u(-h) .
\end{aligned}
\qquad (1.22)
$$

When the differential equation for $u$ is solved approximately by a symmetric numerical
method with smaller time steps, then this becomes a symmetric multiple
time-stepping method. For the interpretation as an averaged-force method and for
the corresponding one-step version, where the initial value for the velocity in (1.21)
is replaced by $\dot u(0) = 0$, we refer back to Sect. VIII.4 (where $q_n$ instead of the
average $\bar q_n = a(q_n)$ was taken as the argument of the slow force $-\nabla U$).

XIII.1.6 Two-Force Methods

Hairer & Lubich (2000a) compare the analytical solution and the numerical solu¬ 
tions given by the above methods in the Fermi-Pasta-Ulam model of Sect. 1.5.1, 
using the tool of modulated Fourier expansions (see Sections XIII.3 and XIII.5 be¬ 
low). Their analysis of the slow energy exchange between stiff springs leads them 
to propose the following method for equations q = — Aq + g(q), which requires two 
evaluations of the slow force per time step: with Q — A 1 ' 2 , set 

q n +1 - 2 cos (h(2) q n + q n -% = h 2 sine (hQ) g(q n ) + h 2 d n (1.23) 

with 

d n = sinc 2 (/ii?) g(q n ) — sinc(/ii?) g(smc(hf2)q n ) . (1-24) 

This method gives the correct slow energy exchange between stiff components in 
the model problem and has better energy conservation than the Deuflhard/impulse 
method. With the velocity approximation (1.12) the method can equivalently be 
written in the one-step forms (1.19) or (1.10). The method extends again to a sym¬ 
metric method for general problems (1.1) with (1.4), giving a correction to the im¬ 
pulse method: let g(q) = — VC/( q ) and let a(q) be defined by (1.18) with c = 1. Set 
Qn = a i ( in) and 

a(Qn) = ^(a(q n + \h 2 g{q n )) - a(q n )) . 

The method then consists of taking 

g n = g{q n ) + g(q n ) - g{q n ) 

instead of g(q n ) = — VU(q n ) in the impulse method (1.14). 

A two-force method with interesting properties, for situations where all non-zero 
eigenfrequencies of A are much larger than those of V 2 (7(g), is given by (1.23) with 

d n = sine 2 (\hQ) g(x{hQ)q n ) - sine (hf2) g(x{hf2)q n ) , (1.25) 

where x(0) = 1 and %(£) = 0 for £ away from 0. 


XIII.2 A Nonlinear Model Problem and Numerical 
Phenomena 

To gain insight into the properties of the various numerical methods described in
the previous section, it is helpful to study the methods when they are applied to
suitably chosen, rather simple model problems which show characteristic features
but are still accessible to an analysis. Such an approach has traditionally been very
successful for stiff differential equations (see, e.g., Hairer & Wanner 1996). For the
present stiff-oscillatory case we investigate the behaviour of the numerical methods
on nonlinear systems

$$\ddot x + \Omega^2 x = g(x) \tag{2.1}$$

with a smooth gradient nonlinearity $g(x) = -\nabla U(x)$ and with the square matrix

$$\Omega = \begin{pmatrix} 0 & 0 \\ 0 & \omega I \end{pmatrix} , \qquad \omega \gg 1 , \tag{2.2}$$

with blocks of arbitrary dimension. We consider only solutions whose energy is
bounded independently of $\omega$, so that in particular the initial values satisfy

$$\tfrac12 \|\dot x(0)\|^2 + \tfrac12 \|\Omega x(0)\|^2 \le E \tag{2.3}$$

with $E$ independent of $\omega$.

The Fermi-Pasta-Ulam (FPU) problem of Sect. I.5.1 belongs precisely to this
class, and we will present numerical experiments with this example. In the model
problem (2.1) with (2.2) we clearly impose strong restrictions in that the high
frequencies are confined to the linear part and that there is a single, constant high
frequency. The extension to several high frequencies will be given in Sect. XIII.9, and
constant-frequency systems with a position-dependent kinetic energy term are
considered in Sect. XIII.10. Oscillatory systems with time- or solution-dependent high
frequencies will be studied, with different techniques and for different numerical
methods, in Chap. XIV.

In any case, satisfactory behaviour of a method on the model problem (2.1) can 
be anticipated to be necessary for a successful treatment of more general situations. 

XIII.2.1 Time Scales in the Fermi-Pasta-Ulam Problem 

The FPU model shows different behaviour on different time scales: almost-harmonic
motion of the stiff springs on the time scale $\omega^{-1}$, motion of the soft springs on the
scale $\omega^0$, energy exchange between stiff springs on the time scale $\omega$, and
almost-preservation of the oscillatory energy over intervals that are exponentially long in
$\omega$. This is illustrated in the following.

We consider the FPU problem with three stiff springs with the data of Sect. I.5.1.
The four pictures of Fig. 2.1 show the evolution of the following quantities: the total
energy

$$H(x, \dot x) = \tfrac12 \dot x^T \dot x + \tfrac12 x^T \Omega^2 x + U(x) \tag{2.4}$$

(or rather $H - 0.8$ for graphical reasons), which is a conserved quantity; the oscillatory
energy

$$I = I_1 + I_2 + I_3 \quad\text{with}\quad I_j = \tfrac12 \dot x_{1,j}^2 + \tfrac12 \omega^2 x_{1,j}^2 , \tag{2.5}$$

where $x_{1,j}$ is the $j$th component of the lower half $x_1 \in \mathbb{R}^3$ of $x = (x_0, x_1)^T \in \mathbb{R}^6$,
decomposed according to the blocks of $\Omega$ in (2.2). We recall that $x_{1,j}$ represents the
elongation of the $j$th stiff spring. Further quantities shown are the kinetic energy of
the mass centre motion and of the relative motion of masses joined by a stiff spring,

$$T_0 = \tfrac12 \|\dot x_0\|^2 , \qquad T_1 = \tfrac12 \|\dot x_1\|^2 .$$

Fig. 2.1. Different time scales in the Fermi-Pasta-Ulam model ($\omega = 50$)

Time Scale $\omega^{-1}$. The vibration of the stiff linear springs is nearly harmonic with
almost-period $\pi/\omega$. This is illustrated by the plot of $T_1$ in the first picture.

Time Scale $\omega^0$. This is the time scale of the motion of the soft nonlinear springs, as
is exemplified by the plot of $T_0$ in the second picture of Fig. 2.1.

Time Scale $\omega$. A slow energy exchange among the stiff springs takes place on the
scale $\omega$. In the third picture, the initially excited first stiff spring passes energy to
the second one, and then also the third stiff spring begins to vibrate. The picture
also illustrates that the problem is very sensitive to perturbations of the initial data:
the grey curves of each of $I_1$, $I_2$, $I_3$ correspond to initial data where $10^{-5}$ has been
added to $x_{0,1}(0)$, $\dot x_{0,1}(0)$ and $\dot x_{1,1}(0)$. The displayed solutions of the first three
pictures have been computed very accurately by an adaptive integrator.

Time Scale $\omega^N$, $N \ge 2$. The oscillatory energy $I$ has only $O(\omega^{-1})$ deviations from
the initial value over very long time intervals. The fourth picture of Fig. 2.1 shows
the total energy $H$ and the oscillatory energy $I$ as computed by method (1.10)-(1.11)
of Sect. XIII.1.2 with the step size $h = 2/\omega$, which is nearly as large as the length
of the time interval of the first picture. No drift is seen for $H$ or $I$.

XIII.2.2 Numerical Methods

The methods described in Sect. XIII.1 all have in common that they reduce to the
Störmer-Verlet method when they are applied to (2.1) with $\Omega = 0$, and they become
exact solvers for the linear homogeneous problem with $g(x) = 0$. They can be
formulated as one-step or two-step schemes.

Two-Step Formulation. All the methods of Sections XIII.1.2-XIII.1.5, when applied
to the system (2.1), can be written in the two-step form

$$x_{n+1} - 2\cos(h\Omega)\, x_n + x_{n-1} = h^2\, \Psi\, g(\Phi x_n) . \tag{2.6}$$

Here $\Psi = \psi(h\Omega)$ and $\Phi = \phi(h\Omega)$, where the filter functions $\psi$ and $\phi$ are even,
real-valued functions with $\psi(0) = \phi(0) = 1$. In our numerical experiments we will
consider the following choices of $\psi$ and $\phi$, where again $\mathrm{sinc}(\xi) = \sin\xi/\xi$:


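(A)   $\psi(\xi) = \mathrm{sinc}^2(\tfrac12\xi)$       $\phi(\xi) = 1$                  Gautschi (1961)
(B)   $\psi(\xi) = \mathrm{sinc}(\xi)$                 $\phi(\xi) = 1$                  Deuflhard (1979)
(C)   $\psi(\xi) = \mathrm{sinc}(\xi)\,\phi(\xi)$      $\phi(\xi) = \mathrm{sinc}(\xi)$ Garcia-Archilla & al. (1999)
(D)   $\psi(\xi) = \mathrm{sinc}^2(\tfrac12\xi)$       $\phi(\xi)$ of (1.20)            Hochbruck & Lubich (1999a)
(E)   $\psi(\xi) = \mathrm{sinc}^2(\xi)$               $\phi(\xi) = 1$                  Hairer & Lubich (2000a)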


One-Step Formulation. The method (2.6) can be written as a symmetric one-step
method of a form that is motivated by the variation-of-constants formula (1.8). This
now also includes a velocity approximation $\dot x_n$:

$$x_{n+1} = \cos h\Omega\; x_n + \Omega^{-1} \sin h\Omega\; \dot x_n + \tfrac12 h^2\, \Psi\, g_n \tag{2.7}$$

$$\dot x_{n+1} = -\Omega \sin h\Omega\; x_n + \cos h\Omega\; \dot x_n + \tfrac12 h \bigl( \Psi_0\, g_n + \Psi_1\, g_{n+1} \bigr) \tag{2.8}$$

where $g_n = g(\Phi x_n)$ and $\Psi_0 = \psi_0(h\Omega)$, $\Psi_1 = \psi_1(h\Omega)$ with even functions $\psi_0$, $\psi_1$
satisfying $\psi_0(0) = 1$, $\psi_1(0) = 1$. Exchanging $n \leftrightarrow n+1$ and $h \leftrightarrow -h$ in the
method, it is seen that the method is symmetric if and only if

$$\psi(\xi) = \mathrm{sinc}(\xi)\, \psi_1(\xi) , \qquad \psi_0(\xi) = \cos(\xi)\, \psi_1(\xi) . \tag{2.9}$$

The method is then symplectic if and only if (Exercise 2)

$$\psi(\xi) = \mathrm{sinc}(\xi)\, \phi(\xi) . \tag{2.10}$$
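A scalar sketch of the one-step form (2.7)-(2.8), with $\psi_0$ and $\psi_1$ obtained from the symmetry relations (2.9), may help to see how the pieces fit together. The function name `one_step`, the scalar frequency, and the cubic force in the usage lines are illustrative assumptions; the sketch also assumes $\sin(h\omega) \ne 0$ so that $\psi_1 = \psi/\mathrm{sinc}$ is defined.

```python
import math


def sinc(x):
    """Unnormalised sinc: sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def one_step(x, v, h, omega, g, psi, phi):
    """One step of (2.7)-(2.8) for scalar x'' + omega^2 x = g(x).

    psi, phi are the filter functions; psi1 and psi0 follow from (2.9).
    Assumes sin(h*omega) != 0 so that the division below is defined.
    """
    xi = h * omega
    psi1 = psi(xi) / sinc(xi)          # (2.9): psi = sinc * psi1
    psi0 = math.cos(xi) * psi1         # (2.9): psi0 = cos * psi1
    gn = g(phi(xi) * x)
    x_new = math.cos(xi) * x + h * sinc(xi) * v + 0.5 * h * h * psi(xi) * gn
    gn1 = g(phi(xi) * x_new)
    v_new = (-omega * math.sin(xi) * x + math.cos(xi) * v
             + 0.5 * h * (psi0 * gn + psi1 * gn1))
    return x_new, v_new


# Usage with the Deuflhard filters (B): psi = sinc, phi = 1 (force chosen for illustration).
force = lambda x: -x ** 3
x1, v1 = one_step(0.5, 1.0, 0.02, 50.0, force, sinc, lambda xi: 1.0)
x2, v2 = one_step(x1, v1, 0.02, 50.0, force, sinc, lambda xi: 1.0)
```

Eliminating the velocities from two consecutive steps recovers the two-step relation (2.6), which gives a simple consistency check for an implementation.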

Two-Step Velocity Schemes. For a symmetric method (2.7)-(2.8) the velocity
approximation can be equivalently obtained from

$$2h\, \mathrm{sinc}(h\Omega)\, \dot x_n = x_{n+1} - x_{n-1} \tag{2.11}$$

(for $\sin(h\omega) \ne 0$) or from

$$\dot x_{n+1} - 2\cos(h\Omega)\, \dot x_n + \dot x_{n-1} = \tfrac12 h\, \Psi_1 \bigl( g_{n+1} - g_{n-1} \bigr) . \tag{2.12}$$

The latter formula gives a symmetric two-step method for arbitrary even functions
$\psi_1$ with $\psi_1(0) = 1$, which do not necessarily satisfy (2.9).

Multi-Force Methods. The methods of Sect. XIII.1.6 belong to the class of multi-force
methods, which generalize the right-hand side of (2.6) to a linear combination
of such terms:

$$x_{n+1} - 2\cos(h\Omega)\, x_n + x_{n-1} = h^2 \sum_{j=1}^{k} \Psi_j\, g(\Phi_j x_n) \tag{2.13}$$

with $\Psi_j = \psi_j(h\Omega)$, $\Phi_j = \phi_j(h\Omega)$, where $\psi_j$, $\phi_j$ are even functions with

$$\sum_{j=1}^{k} \psi_j(0) = 1 , \qquad \phi_j(0) = 1 \quad\text{for } j = 1, \dots, k .$$

In our numerical experiments we include the method

(F)   two-force method (1.23) with (1.24).

XIII.2.3 Accuracy Comparisons 

The accuracy of the methods (A)-(E) and the Störmer-Verlet method on a short
time interval is shown in Fig. 2.2, where the errors at $t = 1$ of the different solution
components in the FPU problem (with $\omega = 50$) are plotted as a function of the
step size $h$. Here and in all the following numerical experiments, the methods were
implemented in the one-step formulation (2.7)-(2.8) with (2.9). The errors in the
$x_0$-components are nearly identical for all the methods in the stability range of the
Störmer-Verlet method ($h\omega < 2$). Differences between the methods are however
visible for larger step sizes. For the other solution components $x_1$, $\dot x_0$, $\dot x_1$ there
are pronounced differences in the error behaviour of the methods. All five methods
(A)-(E) are considerably more accurate than the Störmer-Verlet method. Figure 2.3
shows the errors of methods (A)-(C) for step sizes beyond the stability range of
the Störmer-Verlet method. Methods (A) and (B) lose accuracy when $h\omega$ is near
integral multiples of $\pi$, a phenomenon that does not occur with method (C).

Fig. 2.2. Global error at $t = 1$ for the different components and for the five methods (A)-(E)
and the Störmer-Verlet method as a function of the step size $h$

Fig. 2.3. Global error at the first grid point after $t = 1$ for the different components as a
function of the step size $h$. The error for method (A) is drawn in black, for method (B) in
dark grey, and for method (C) in light grey. The vertical lines indicate step sizes for which
$h\omega$ equals $\pi$, $2\pi$, or


XIII.2.4 Energy Exchange between Stiff Components 

Figure 2.4 shows the energy exchange of the six methods (A)-(F) applied to the
Fermi-Pasta-Ulam problem with the same data as in Fig. 2.1. The figures show
again the oscillatory energies $I_1$, $I_2$, $I_3$ of the stiff springs, their sum $I = I_1 + I_2 + I_3$
and the total energy $H - 0.8$ as functions of time on the interval $0 \le t \le 200$. Only
the methods (B), (D) and (F) give a good approximation of the energy exchange
between the stiff springs. It will turn out in Sect. XIII.4.2 that a necessary condition
for a correct approximation of the energy exchange is $\psi(h\omega)\phi(h\omega) = \mathrm{sinc}(h\omega)$,
which is satisfied for method (B). The two-force method (F) satisfies an analogous
condition for multi-force methods. The good behaviour of method (D) comes from
the fact that here $\psi(h\omega)\phi(h\omega) \approx 0.95\, \mathrm{sinc}(h\omega)$ for $h\omega = 1.5$.
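The quoted factor can be reproduced by evaluating the quotient $\psi(\xi)\phi(\xi)/\mathrm{sinc}(\xi)$ at $\xi = h\omega = 1.5$ for the filter pairs (A)-(E). The dictionary `FILTERS` and the helper `exchange_ratio` below are names introduced only for this sketch.

```python
import math


def sinc(x):
    """Unnormalised sinc: sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def phi_120(xi):
    """Filter function (1.20)."""
    return sinc(xi) * (1.0 + math.sin(0.5 * xi) ** 2 / 3.0)


# (psi, phi) pairs for methods (A)-(E) of Sect. XIII.2.2
FILTERS = {
    "A": (lambda xi: sinc(0.5 * xi) ** 2, lambda xi: 1.0),
    "B": (lambda xi: sinc(xi), lambda xi: 1.0),
    "C": (lambda xi: sinc(xi) ** 2, lambda xi: sinc(xi)),
    "D": (lambda xi: sinc(0.5 * xi) ** 2, phi_120),
    "E": (lambda xi: sinc(xi) ** 2, lambda xi: 1.0),
}


def exchange_ratio(name, xi):
    """psi(xi)*phi(xi)/sinc(xi); the value 1 corresponds to the stated condition."""
    psi, phi = FILTERS[name]
    return psi(xi) * phi(xi) / sinc(xi)


ratios = {name: exchange_ratio(name, 1.5) for name in FILTERS}
for name in sorted(ratios):
    print(name, round(ratios[name], 3))
```

Method (B) gives the ratio 1 by construction, and method (D) comes out close to 0.95, in line with the statement above; the remaining methods deviate more strongly at this value of $h\omega$.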
Fig. 2.4. Energy exchange between stiff springs for methods (A)-(F) ($h = 0.03$, $\omega = 50$)

Fig. 2.5. Maximum error of the total energy on the interval [0,1000] for methods (A)-(F) as
a function of $h\omega$ (step size $h = 0.02$)


XIII.2.5 Near-Conservation of Total and Oscillatory Energy

Figure 2.5 shows the maximum error of the total energy $H$ as a function of the scaled
frequency $h\omega$ (step size $h = 0.02$). We consider the long time interval [0,1000]. The
pictures for the different methods show that in general the total energy is well conserved.
Exceptions are near integral multiples of $\pi$. Certain methods show a bad
energy conservation close to odd multiples of $\pi$, other methods close to even multiples
of $\pi$. Only method (E) shows a uniformly good behaviour for all frequencies. In
Fig. 2.6 we show in more detail what happens close to such integral multiples of $\pi$.
If there is a difficulty close to $\pi$, it is typically in an entire neighbourhood. Close to
$2\pi$, the picture is different. Method (C) has good energy conservation for values of
$h\omega$ that are very close to $2\pi$, but there are small intervals to the left and to the right,
where the error in the total energy is large. Unlike the other methods shown, method
(B) has poor energy conservation in rather large intervals around even multiples of
$\pi$. Methods (A) and (D) conserve the total energy particularly well, for $h\omega$ away
from integral multiples of $\pi$.

Fig. 2.6. Zoom (close to $\pi$ or $2\pi$) of the maximum error of the total energy on the interval
[0,1000] for three methods as a function of $h\omega$ (step size $h = 0.02$)

Fig. 2.7. Maximum deviation of the oscillatory energy on the interval [0,1000] for methods
(A)-(F) as a function of $h\omega$ (step size $h = 0.02$)

Figure 2.7 shows similar pictures where the total energy $H$ is replaced by the
oscillatory energy $I$ (cf. Sect. XIII.2.1). For the exact solution we have $I(t) =
\mathit{Const} + O(\omega^{-1})$. It is therefore not surprising that this quantity is not well conserved
for small values of $\omega$. For larger values of $\omega$, we observe that the methods
have difficulties in conserving the oscillatory energy when $h\omega$ is near integral multiples
of $\pi$. None of the considered methods conserves both quantities $H$ and $I$
uniformly for all values of $h\omega$.

XIII.3 Principal Terms of the Modulated Fourier
Expansion

The analytical tool for understanding the above numerical phenomena is provided
by modulated Fourier expansions, which decompose both the exact and the numerical
solution into a slowly varying part and into oscillatory components built up
of trigonometric functions multiplied with slowly varying coefficient functions. A
comparison of these expansions will serve as a partial substitute for the backward
error analysis of Chap. IX, which yields results only for $h\omega \to 0$ and is not applicable
to the situation of $h\omega \ge c > 0$ that is of interest here. In this section we derive
the first terms of the modulated Fourier expansion.

XIII.3.1 Decomposition of the Exact Solution 

Every solution of the linear equation $\ddot x + \Omega^2 x = g(t)$ with $\Omega$ of (2.2) can be written
as $y(t) + \cos(\omega t)\, u(t) + \sin(\omega t)\, v(t) + O(\omega^{-N})$ (for $\omega \to \infty$), where $y(t)$, $u(t)$,
$v(t)$ are truncated asymptotic expansions in powers of $\omega^{-1}$ (see Exercise 4). These
functions have the property that all their derivatives are bounded independently of
the parameter $\omega \ge 1$. Here and in the following, a smooth function is understood
to be a function with this property. We may hope to find a similar decomposition
for solutions of the nonlinear problem (2.1). So we look for a smooth real-valued
function $y(t)$ and a smooth complex-valued function $z(t) = u(t) + i v(t)$ such that
the function

$$x_*(t) = y(t) + e^{i\omega t} z(t) + e^{-i\omega t}\, \bar z(t) \tag{3.1}$$

gives a small defect when it is inserted into the differential equation (2.1) and has
the given initial values

$$x_*(0) = x(0) , \qquad \dot x_*(0) = \dot x(0) . \tag{3.2}$$

Under the condition (2.3) the exact solution $x(t)$ has bounded energy, and we
may expect the same of the approximation $x_*(t)$, which would then imply $z(t) =
O(\omega^{-1})$. We therefore insert the ansatz (3.1) into the differential equation (2.1)
and expand the nonlinearity around the smooth part $y(t)$. With the variables $y =
(y_0, y_1)$, $z = (z_0, z_1)$ partitioned according to the blocks of $\Omega$, this gives the
expressions

$$
\ddot x_* + \Omega^2 x_* =
\begin{pmatrix} \ddot y_0 \\ \ddot y_1 + \omega^2 y_1 \end{pmatrix}
+ e^{i\omega t} \begin{pmatrix} -\omega^2 z_0 + 2i\omega \dot z_0 + \ddot z_0 \\ 2i\omega \dot z_1 + \ddot z_1 \end{pmatrix}
+ e^{-i\omega t} \begin{pmatrix} -\omega^2 \bar z_0 - 2i\omega \dot{\bar z}_0 + \ddot{\bar z}_0 \\ -2i\omega \dot{\bar z}_1 + \ddot{\bar z}_1 \end{pmatrix}
$$

and, as long as $z(t) = O(\omega^{-1})$,

$$
g(x_*) = g(y) + g''(y)(z, \bar z) + e^{i\omega t} g'(y) z + e^{-i\omega t} g'(y) \bar z
+ e^{2i\omega t}\, \tfrac12 g''(y)(z, z) + e^{-2i\omega t}\, \tfrac12 g''(y)(\bar z, \bar z) + O(\omega^{-3}) .
$$

Equations for the Coefficient Functions. We now compare the coefficients of
$1$, $e^{i\omega t}$, $e^{-i\omega t}$ and require that the dominant terms in these expressions be equal:

$$
\begin{aligned}
\ddot y_0 &= g_0(y) + g_0''(y)(z, \bar z) \\
\omega^2 y_1 &= g_1(y) \\
-\omega^2 z_0 &= g_0'(y)\, z \\
2i\omega\, \dot z_1 &= g_1'(y)\, z .
\end{aligned}
\tag{3.3}
$$

This gives a system of differential equations for $y_0$, $z_1$ and expresses $y_1$, $z_0$ as
functions of $y_0$, $z_1$. We note that $y_0$ evolves on the time scale 1, whereas $z_1$
changes on the slow time scale $\omega$. As long as $y_0(t)$ stays in a bounded domain
and $z_1(t) = O(\omega^{-1})$, (3.3) implies the bounds

$$y_1(t) = O(\omega^{-2}) , \qquad z_0(t) = O(\omega^{-3}) , \qquad \dot z_1(t) = O(\omega^{-2}) . \tag{3.4}$$

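Initial Values. The initial values $y_0(0)$, $\dot y_0(0)$ and $z_1(0)$ are obtained from condition
(3.2), which gives a system that can be solved by fixed point iteration to yield

$$
\begin{aligned}
y_0(0) &= x_{0,0} + O(\omega^{-3}) , & \dot y_0(0) &= \dot x_{0,0} + O(\omega^{-2}) \\
2\,\mathrm{Re}\, z_1(0) &= x_{0,1} + O(\omega^{-2}) , & -\omega\, 2\,\mathrm{Im}\, z_1(0) &= \dot x_{0,1} + O(\omega^{-2}) .
\end{aligned}
\tag{3.5}
$$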

Defect. As long as $z_1(t) = O(\omega^{-1})$, the above equations show that the defect

$$d(t) = \ddot x_*(t) + \Omega^2 x_*(t) - g(x_*(t))$$

is of the form

$$
d(t) = \mathrm{Re} \begin{pmatrix} \omega^{-2} e^{i\omega t} a(t) + \omega^{-2} e^{2i\omega t} b(t) + O(\omega^{-3}) \\ O(\omega^{-2}) \end{pmatrix}
\tag{3.6}
$$

with smooth functions $a$, $b$. Together with (3.3) this also shows that the smooth
$O(\omega^{-2})$-term $g''(y)(z, \bar z)$ is the principal term describing the influence of oscillatory
solution components on the evolution of smooth components.

Example. To illustrate the approximation of the solution $x(t)$ by $x_*(t)$ of (3.1), we
have solved numerically, with high accuracy, the system (3.3) for the FPU problem
with the data of Sect. I.5.1. In Figure 3.1 we plot the oscillatory energy $I = I_1 +
I_2 + I_3$ with $x$ replaced by the approximation $x_*$ in the definition (2.5) of these
quantities. The figure agrees rather well with Figure I.5.2.

Fig. 3.1. Same experiment as in Fig. I.5.2 for the solution (3.1) of (3.3)

XIII.3.2 Decomposition of the Numerical Solution

For the numerical method (2.6), which solves linear equations $\ddot x = -\Omega^2 x$ exactly,
we look similarly to the above for a function of the form

$$x_h(t) = y_h(t) + e^{i\omega t} z_h(t) + e^{-i\omega t}\, \bar z_h(t) \tag{3.7}$$

with coefficient functions $y_h(t)$, $z_h(t)$ which are smooth in the sense that all their
derivatives are bounded independently of $h$ and $\omega$, such that $x_h(t)$ gives a small
defect when inserted into the difference scheme (2.6) and has the correct starting
values:

$$x_h(0) = x_0 , \qquad x_h(h) = x_1 . \tag{3.8}$$

Taylor expansion of $z_h(t \pm h)$ at the point $t$ shows, after some calculation,

$$
\begin{aligned}
\frac{1}{h^2} \bigl( x_h(t+h) - 2\cos(h\Omega)\, x_h(t) + x_h(t-h) \bigr)
&= \begin{pmatrix} \delta_h^2 y_{h,0}(t) \\ \delta_h^2 y_{h,1}(t) + \omega^2 \sigma_1^2\, y_{h,1}(t) \end{pmatrix} \\
&\quad + e^{i\omega t} \begin{pmatrix} -\omega^2 \sigma_1^2\, z_{h,0}(t) + \sigma_2\, 2i\omega\, \dot z_{h,0}(t) + \cos(h\omega)\, \ddot z_{h,0}(t) + \dots \\ \sigma_2\, 2i\omega\, \dot z_{h,1}(t) + \cos(h\omega)\, \ddot z_{h,1}(t) + \dots \end{pmatrix}
\end{aligned}
\tag{3.9}
$$

+ the complex conjugate of the expression in the previous line,

where $y_h(t) = (y_{h,0}(t), y_{h,1}(t))$ and $z_h(t) = (z_{h,0}(t), z_{h,1}(t))$ according to the
partitioning in (2.2),

$$\delta_h^2 y_h(t) = \frac{1}{h^2} \bigl( y_h(t+h) - 2 y_h(t) + y_h(t-h) \bigr)$$

is the symmetric second-order difference quotient, $\sigma_k = \mathrm{sinc}(\tfrac12 k h\omega)$, and the dots
stand for higher powers of $h$ multiplied by derivatives of $z_{h,0}$ or $z_{h,1}$. Taylor expansion
of the nonlinearity now gives

$$
\Psi g(\Phi x_h) = \Psi g(\Phi y_h) + \Psi g''(\Phi y_h)(\Phi z_h, \Phi \bar z_h)
+ e^{i\omega t}\, \Psi g'(\Phi y_h) \Phi z_h + e^{-i\omega t}\, \Psi g'(\Phi y_h) \Phi \bar z_h + \dots
\tag{3.10}
$$
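The calculation behind the $e^{i\omega t}$ bracket of (3.9) can be sketched as follows. This is a reconstruction from the Taylor expansion and from the definition of $\sigma_k$, added here for orientation; it keeps only the terms displayed in (3.9), writes $z$ for $z_{h,0}$ or $z_{h,1}$, and evaluates all derivatives at $t$.

```latex
% Block 0, where \cos(h\Omega) acts as 1:
\begin{aligned}
e^{i\omega(t+h)} z(t+h) - 2 e^{i\omega t} z(t) + e^{i\omega(t-h)} z(t-h)
 &= e^{i\omega t} \Bigl( (2\cos h\omega - 2)\, z + 2 i h \sin(h\omega)\, \dot z
    + h^2 \cos(h\omega)\, \ddot z + \dots \Bigr), \\
\frac{2\cos h\omega - 2}{h^2}
 &= -\frac{4 \sin^2(\tfrac12 h\omega)}{h^2} = -\omega^2 \sigma_1^2 ,
 \qquad
 \frac{2 i \sin h\omega}{h} = 2 i \omega\, \sigma_2 .
\end{aligned}
% Block 1, where \cos(h\Omega) acts as \cos(h\omega): the term 2\cos(h\omega) z
% cancels exactly, so only \sigma_2\, 2i\omega\, \dot z + \cos(h\omega)\, \ddot z + \dots remains.
```

The same identity $2 - 2\cos h\omega = h^2 \omega^2 \sigma_1^2$ produces the term $\omega^2 \sigma_1^2\, y_{h,1}$ in the smooth part of the second block.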

Modified Equations for the Numerical Coefficient Functions. For the moment
we consider the case where the absolute values of $\sigma_1$ and $\sigma_2$ are bounded from
below by a positive constant, so that $h\omega$ is assumed bounded and bounded away
from a non-zero integral multiple of $\pi$. We also assume $h\omega$ to be bounded away
from zero, which is the computational situation of interest. In this case, the first
term in each line of each bracket in (3.9) can be considered as the dominant one. We
therefore require that the functions $y_h$, $z_h$ satisfy

$$
\begin{aligned}
\delta_h^2 y_{h,0} &= g_0(\Phi y_h) + g_0''(\Phi y_h)(\Phi z_h, \Phi \bar z_h) \\
\mathrm{sinc}^2(\tfrac12 h\omega)\, \omega^2 y_{h,1} &= \psi(h\omega)\, g_1(\Phi y_h) \\
-\mathrm{sinc}^2(\tfrac12 h\omega)\, \omega^2 z_{h,0} &= \frac{\partial g_0}{\partial x_0}(\Phi y_h)\, z_{h,0} + \frac{\partial g_0}{\partial x_1}(\Phi y_h)\, \phi(h\omega)\, z_{h,1} \\
\mathrm{sinc}(h\omega)\, 2i\omega\, \dot z_{h,1} &= \psi(h\omega) \frac{\partial g_1}{\partial x_0}(\Phi y_h)\, z_{h,0} + \psi(h\omega) \frac{\partial g_1}{\partial x_1}(\Phi y_h)\, \phi(h\omega)\, z_{h,1} .
\end{aligned}
\tag{3.11}
$$

The first equation should be stated more precisely as $y_{h,0}$ being a solution of a
modified equation for the Störmer-Verlet method (see Exercise IX.3) applied to the
corresponding differential equation:

$$\ddot y_{h,0} = \Bigl( 1 - \frac{h^2}{12} \frac{d^2}{dt^2} + \dots \Bigr) \bigl( g_0(\Phi y_h) + g_0''(\Phi y_h)(\Phi z_h, \Phi \bar z_h) \bigr) ,$$

where the time derivatives of $y_{h,1}$, $z_h$ that result from applying the chain rule are
replaced by using the expressions in (3.11). As long as $y_{h,0}(t)$ remains in a bounded
domain and $z_{h,1}(t) = O(\omega^{-1})$, we have again bounds of the same type as for the
coefficients of the exact solution:

$$y_{h,1}(t) = O(\omega^{-2}) , \qquad z_{h,0}(t) = O(\omega^{-3}) , \qquad \dot z_{h,1}(t) = O(\omega^{-2}) . \tag{3.12}$$

Initial Values. We next determine the initial values $y_{h,0}(0)$, $\dot y_{h,0}(0)$ and $z_{h,1}(0)$
such that $x_h(0)$ and $x_h(h)$ coincide with the starting values $x_0 = x(0)$ and $x_1$ of
the numerical method. We let $x_1$ be computed from $x_0$ and $\dot x_0$ via the formula (2.7)
with $n = 0$, and we still assume that $\sigma_1$ and $\sigma_2$ are bounded away from zero. Using
(3.11), the condition $x_h(0) = x_0 = (x_{0,0}, x_{0,1})$ then becomes

$$
\begin{aligned}
x_{0,0} &= y_{h,0}(0) + O(\omega^{-2} z_{h,1}(0)) \\
x_{0,1} &= z_{h,1}(0) + \bar z_{h,1}(0) + O(\omega^{-2}) .
\end{aligned}
\tag{3.13}
$$

The formula for the first component of (2.7), $x_{1,0} - x_{0,0} = h \dot x_{0,0} + \tfrac12 h^2 g_0(\Phi x_0)$,
together with $x_{h,0}(h) - x_{h,0}(0) = h \dot y_{h,0}(0) + \tfrac12 h^2 g_0(\Phi x_0) + O(h^3) + O(\omega^{-2} z_{h,1}(0))$
implies that

$$\dot x_{0,0} = \dot y_{h,0}(0) + O(h^2) + O(\omega^{-1} z_{h,1}(0)) . \tag{3.14}$$

For the second component we have from (2.7)

$$x_{1,1} - \cos(h\omega)\, x_{0,1} = h\, \mathrm{sinc}(h\omega)\, \dot x_{0,1} + \tfrac12 h^2 \psi(h\omega)\, g_1(\Phi x_0) ,$$

and by Taylor expansion and (3.11),

$$
x_{h,1}(h) - \cos(h\omega)\, x_{h,1}(0) = (1 - \cos(h\omega))\, y_{h,1}(0) + O(h\omega^{-2})
+ i \sin(h\omega) \bigl( z_{h,1}(0) - \bar z_{h,1}(0) \bigr) + O(h\omega^{-1} z_{h,1}(0)) ,
$$

where we note the relation $(1 - \cos(h\omega))\, y_{h,1}(0) = \tfrac12 h^2 \psi(h\omega)\, g_1(\Phi y_h(0))$ by
(3.11) and a trigonometric identity. After division by $h\, \mathrm{sinc}\, h\omega = \omega^{-1} \sin h\omega$ the
above formulas yield

$$\dot x_{0,1} = i\omega \bigl( z_{h,1}(0) - \bar z_{h,1}(0) \bigr) + O(\omega^{-2}) + O(\omega^{-1} z_{h,1}(0)) . \tag{3.15}$$

The four equations (3.13), (3.14), (3.15) constitute a nonlinear system for the four
quantities $y_{h,0}(0)$, $\dot y_{h,0}(0)$, $\omega(z_{h,1}(0) + \bar z_{h,1}(0))$, and $\omega(z_{h,1}(0) - \bar z_{h,1}(0))$. By
fixed-point iteration and using the bounded-energy assumption (2.3), we get a locally
unique solution for sufficiently small $h$, with $z_{h,1}(0) = O(\omega^{-1})$ and hence

$$
\begin{aligned}
y_{h,0}(0) &= x_{0,0} + O(\omega^{-3}) , & \dot y_{h,0}(0) &= \dot x_{0,0} + O(h^2) \\
2\,\mathrm{Re}\, z_{h,1}(0) &= x_{0,1} + O(\omega^{-2}) , & -\omega\, 2\,\mathrm{Im}\, z_{h,1}(0) &= \dot x_{0,1} + O(h\omega^{-1}) .
\end{aligned}
\tag{3.16}
$$

Defect. As long as $z_{h,1}(t) = O(\omega^{-1})$, the defect

$$d_h(t) = \frac{1}{h^2} \bigl( x_h(t+h) - 2\cos(h\Omega)\, x_h(t) + x_h(t-h) \bigr) - \Psi g(\Phi x_h(t)) \tag{3.17}$$

is of size $O(h^2)$ by (3.9)-(3.10) and the very construction (3.11) of the coefficient
functions. This estimate refers again to the non-resonant case where $\sigma_1$ and $\sigma_2$
are bounded away from zero and hence $h\omega$ is bounded away from non-zero integral
multiples of $\pi$. The case of $h\omega$ near a multiple of $\pi$ requires a special treatment and
will be considered in the next subsection.


XIII.4 Accuracy and Slow Exchange 

A comparison of the principal terms of the modulated Fourier expansions of the
numerical and the exact solution gives much insight into the behaviour of the numerical
method and the role of the filter functions $\psi$ and $\phi$. From this comparison
we obtain error bounds over finite time intervals, and we discuss the slow energy
exchange between oscillatory components and the slow energy transfer from oscillatory
to smooth components which take place on the time scale $\omega$.


XIII.4.1 Convergence Properties on Bounded Time Intervals

As a first application of the modulated Fourier expansion we consider error bounds
on bounded time intervals. Second-order convergence estimates for more general
equations $\ddot x = -Ax + g(x)$ with symmetric positive semi-definite matrix $A$, uniformly
in the (arbitrarily large) eigenfrequencies of $A$, are given by Garcia-Archilla,
Sanz-Serna & Skeel (1999) for the mollified impulse method, by Hochbruck & Lubich
(1999a) for Gautschi-type methods, and by Grimm & Hochbruck (2005) for
general methods of the class (2.7)-(2.8) with appropriate filter functions. Those
results were proved with different techniques. The following bounds on the filter
functions $\psi$ and $\phi$ are needed for second-order error bounds of method (2.6):

$$
\begin{aligned}
|\psi(h\omega)| &\le C_1\, \mathrm{sinc}^2(\tfrac12 h\omega) , \\
|\phi(h\omega)| &\le C_2\, |\mathrm{sinc}(\tfrac12 h\omega)| , \\
|\psi(h\omega)\phi(h\omega)| &\le C_3\, |\mathrm{sinc}(h\omega)| .
\end{aligned}
\tag{4.1}
$$

Theorem 4.1. Consider the numerical solution of the system (2.1)-(2.3) by method
(2.6) with a step size $h \le h_0$ (with a sufficiently small $h_0$ independent of $\omega$) for
which $h\omega \ge c_0 > 0$. Let the starting value $x_1$ be given by (2.7) with $n = 0$. If the
conditions (4.1) are satisfied, then the error is bounded by

$$\|x_n - x(nh)\| \le C h^2 \quad\text{for } nh \le T .$$

If only $|\psi(h\omega)| \le C_0\, |\mathrm{sinc}(\tfrac12 h\omega)|$ holds instead of (4.1), then the order of convergence
reduces to one: $\|x_n - x(nh)\| \le C h$ for $nh \le T$. In both cases, $C$ is
independent of $\omega$, $h$ and $n$ with $nh \le T$ and of bounds of solution derivatives, but
depends on $T$, on $E$ of (2.3), on bounds of derivatives of the nonlinearity $g$, and on
$C_1$, $C_2$, $C_3$ or $C_0$.

To obtain second-order error bounds uniformly in $h\omega$, condition (4.1) requires
a double zero of $\psi$ and a zero of $\phi$ at even multiples of $\pi$, and a zero of $\psi$
or $\phi$ at odd multiples of $\pi$. This is satisfied for the mollified impulse method
with $\phi(\xi) = \mathrm{sinc}(\xi)$, for which $\psi(\xi) = \mathrm{sinc}^2(\xi)$. Gautschi-type methods have
$\psi(\xi) = \mathrm{sinc}^2(\tfrac12 \xi)$, so that the first condition on $\psi$ in (4.1) is trivially satisfied. The
conditions on $\phi$ hold, for example, for $\phi = \mathrm{sinc}$ or for $\phi$ of (1.20). The original
Gautschi method has $\phi = 1$, which does not satisfy the second condition of (4.1),
and the Deuflhard/impulse method ($\psi = \mathrm{sinc}$, $\phi = 1$) satisfies only the third condition
of (4.1). These latter methods are not of second order uniformly in $h\omega$.
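The three bounds in (4.1) can be probed numerically by looking at the quotients of the left-hand and right-hand sides on a grid of values $\xi = h\omega$. The sketch below is only a heuristic check and not a proof: the grid `GRID`, the threshold `BOUND`, and the helper names are assumptions made for illustration, and a quotient is declared bounded when its maximum over the grid stays below the threshold.

```python
import math


def sinc(x):
    """Unnormalised sinc: sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def phi_120(xi):
    """Filter function (1.20)."""
    return sinc(xi) * (1.0 + math.sin(0.5 * xi) ** 2 / 3.0)


# (psi, phi) pairs for methods (A)-(E) of Sect. XIII.2.2
FILTERS = {
    "A": (lambda xi: sinc(0.5 * xi) ** 2, lambda xi: 1.0),
    "B": (lambda xi: sinc(xi), lambda xi: 1.0),
    "C": (lambda xi: sinc(xi) ** 2, lambda xi: sinc(xi)),
    "D": (lambda xi: sinc(0.5 * xi) ** 2, phi_120),
    "E": (lambda xi: sinc(xi) ** 2, lambda xi: 1.0),
}

GRID = [0.001 * k for k in range(1, 20001)]   # xi in (0, 20], never an exact multiple of pi
BOUND = 10.0                                  # heuristic threshold for "bounded quotient"


def check_41(psi, phi):
    """Return three booleans, one per line of (4.1), based on the grid maxima."""
    c1 = max(abs(psi(x)) / sinc(0.5 * x) ** 2 for x in GRID)
    c2 = max(abs(phi(x)) / abs(sinc(0.5 * x)) for x in GRID)
    c3 = max(abs(psi(x) * phi(x)) / abs(sinc(x)) for x in GRID)
    return (c1 < BOUND, c2 < BOUND, c3 < BOUND)


results = {name: check_41(psi, phi) for name, (psi, phi) in FILTERS.items()}
for name in sorted(results):
    print(name, results[name])
```

On this grid the pairs (C) and (D) pass all three checks, (A) passes only the first, and (B) passes only the third, which matches the discussion of the Gautschi and Deuflhard/impulse methods above.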

Proof of Theorem 4.1. (a) First we consider the case where $h\omega$ is bounded away
from integral multiples of $\pi$, so that condition (4.1) is not needed. Comparing the
equations (3.3) and (3.11), which determine the modulated Fourier expansion coefficients,
shows

$$y_h(t) - y(t) = O(h^2) , \qquad z_h(t) - z(t) = O(h^2)$$

on bounded intervals, and hence

$$x_h(t) - x_*(t) = O(h^2) . \tag{4.2}$$

The variation-of-constants formula (1.8) and a Gronwall-type inequality show that,
on bounded intervals, the error $x_*(t) - x(t)$ is of the same magnitude as the defect:
by (3.6),

$$x_*(t) - x(t) = O(\omega^{-2}) .$$

The errors $e_n = x_n - x_h(t_n)$ satisfy

$$e_{n+1} - 2\cos(h\Omega)\, e_n + e_{n-1} = b_n \tag{4.3}$$

with $b_n = h^2 \bigl( \Psi g(\Phi x_n) - \Psi g(\Phi x_h(t_n)) - d_h(t_n) \bigr)$. This recurrence relation can be
solved to yield (Exercise 5)

$$e_{n+1} = -W_{n-1}\, e_0 + W_n\, e_1 + \sum_{j=1}^{n} W_{n-j}\, b_j \tag{4.4}$$

with

$$W_n = \begin{pmatrix} (n+1)\, I & 0 \\ 0 & \dfrac{\sin (n+1) h\omega}{\sin h\omega}\, I \end{pmatrix} .$$

A discrete Gronwall inequality now yields that on bounded intervals, $e_n$ is of the
same magnitude as the defect $d_h(t)$ of (3.17), which is $O(h^2)$ by the construction
of (3.11) and by $z_{h,1} = O(\omega^{-1})$. Hence,

$$e_n = O(h^2) .$$

Combining these estimates yields the desired second-order error bound.
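The solution formula (4.4) can be checked numerically in the scalar oscillatory block, where $W_n = \sin((n+1)\theta)/\sin\theta$ with $\theta = h\omega$. The sketch below iterates the recurrence (4.3) directly and compares it with the closed form; the function names and the sample data are illustrative assumptions.

```python
import math


def W(n, theta):
    """W_n = sin((n+1)*theta)/sin(theta); in particular W_{-1} = 0 and W_0 = 1."""
    return math.sin((n + 1) * theta) / math.sin(theta)


def solve_recurrence(e0, e1, b, theta):
    """Iterate e_{n+1} = 2*cos(theta)*e_n - e_{n-1} + b_n for n = 1..len(b).

    The list b stores b_1, b_2, ... (b[j-1] is b_j).
    """
    e = [e0, e1]
    for n in range(1, len(b) + 1):
        e.append(2.0 * math.cos(theta) * e[n] - e[n - 1] + b[n - 1])
    return e


def closed_form(n, e0, e1, b, theta):
    """e_{n+1} according to the discrete variation-of-constants formula (4.4)."""
    return (-W(n - 1, theta) * e0 + W(n, theta) * e1
            + sum(W(n - j, theta) * b[j - 1] for j in range(1, n + 1)))


theta = 1.0                                     # sample value of h*omega
b = [math.sin(3.0 * j) for j in range(1, 21)]   # arbitrary inhomogeneity b_1..b_20
e = solve_recurrence(0.3, -0.2, b, theta)
max_diff = max(abs(e[n + 1] - closed_form(n, 0.3, -0.2, b, theta))
               for n in range(1, 21))
print(max_diff)
```

The two computations agree up to rounding, as expected, since $W_n$ satisfies the homogeneous recurrence with $W_{-1} = 0$ and $W_0 = 1$.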

(b) We now consider the case where $\omega\, |\mathrm{sinc}(\tfrac12 h\omega)| \ge c$ with a sufficiently large
constant $c$, which depends only on bounds of derivatives of $g$. This condition means
that $h\omega$ is outside of an $O(h)$ neighbourhood of integral multiples of $2\pi$. Under
conditions (4.1), the equations (3.11) still give

$$y_{h,1}(t) = O(\omega^{-2}) , \qquad z_{h,0}(t) = O(\omega^{-2}) , \qquad \dot z_{h,1}(t) = O(\omega^{-2}) \tag{4.5}$$

as long as $z_{h,1}(t) = O(\omega^{-1})$. Here the first condition of (4.1) gives the bound of
$y_{h,1}$, the second one the bound of $z_{h,0}$, and the third one the bound of $\dot z_{h,1}$. As
in Sect. XIII.3.2, we determine the initial values $y_{h,0}(0)$, $\dot y_{h,0}(0)$ and $z_{h,1}(0)$ such
that $x_h(0)$ and $x_h(h)$ coincide with the starting values $x_0$ and $x_1$ of the numerical
method. Using once more (4.1), we obtain a system for the initial values similar to
(3.13)-(3.15):

$$
\begin{aligned}
x_{0,0} &= y_{h,0}(0) + O(\omega^{-1} z_{h,1}(0)) \\
x_{0,1} &= z_{h,1}(0) + \bar z_{h,1}(0) + O(\omega^{-2}) \\
\dot x_{0,0} &= \dot y_{h,0}(0) + O(h) + O(\omega^{-1} z_{h,1}(0)) \\
\dot x_{0,1} &= i\omega \bigl( z_{h,1}(0) - \bar z_{h,1}(0) \bigr) + O(\omega^{-1}) + O(z_{h,1}(0)) .
\end{aligned}
\tag{4.6}
$$

With the weaker estimates for $z_{h,0}(t)$ and in (4.6) we still obtain estimates for the
initial values of the type (3.16) with at most one factor $\omega^{-1}$ or $h$ less in the remainder
terms. Condition (2.3) implies again $z_{h,1}(0) = O(\omega^{-1})$, which ensures that (4.5)
holds for $0 \le t \le T$. The defect is then $d_h(t) = O(h^2)$, and as in part (a) we get
the second-order error bound.

(c) Now let $\omega\, |\mathrm{sinc}(\tfrac12 h\omega)| \le c$, so that $h\omega$ is $O(h)$ close to a multiple of $2\pi$. In
this case we replace the third equation in (3.11) simply by

$$z_{h,0} = 0 .$$

Under condition (4.1) we still obtain the bounds (4.5). The initial values are now
chosen to satisfy

$$
\begin{aligned}
x_{0,0} &= y_{h,0}(0) \\
x_{0,1} &= z_{h,1}(0) + \bar z_{h,1}(0) + \omega^{-2}\, \frac{\psi(h\omega)}{\mathrm{sinc}^2(\tfrac12 h\omega)}\, g_1(\Phi x_0) \\
\dot x_{0,0} &= \dot y_{h,0}(0) \\
\dot x_{0,1} &= i\omega \bigl( z_{h,1}(0) - \bar z_{h,1}(0) \bigr) .
\end{aligned}
\tag{4.7}
$$

They are then bounded as in (b) and, by the arguments used in the determination
of the initial values of Sect. XIII.3.2, yield the estimates $x_h(0) = x_0 + O(h^3)$
and $x_h(h) = x_1 + O(h^3)$, and again $z_{h,1}(t) = O(\omega^{-1})$. Since (4.1) implies
$\phi(h\omega)\, z_{h,1} = O(\omega^{-2})$ in the present situation of $|\mathrm{sinc}(\tfrac12 h\omega)| \le c\, \omega^{-1}$, the defect
is still $d_h(t) = O(h^2)$. The bound (4.2) is also seen to hold. Therefore the
second-order error bound remains valid in this case.

(d) If only $|\psi(h\omega)| \le C_0\, |\mathrm{sinc}(\tfrac12 h\omega)|$ holds, then we replace the third equation in
(3.11) by $z_{h,0} = 0$. If < 1, we also set $y_{h,1} = 0$. The defect is then
only $d_h(t) = O(h)$, which yields the first-order error bound. □

For the velocity approximation, we obtain the following for the method (2.12)
or its equivalent formulations.

Theorem 4.2. Under the conditions of Theorem 4.1, consider the velocity approximation
scheme (2.12) with a function $\psi_1$ satisfying $\psi_1(0) = 1$ and

$$|\psi_1(h\omega)| \le C_1'\, |\mathrm{sinc}(\tfrac12 h\omega)| . \tag{4.8}$$

Let the starting values satisfy $\dot x_0 = \dot x(0)$ and $\dot x_1 = \dot x(h) + h \sin(h\Omega)\, a_1 + O(h^2)$
with $a_1 = O(1)$. Then, the error in the velocities is bounded by

$$\|\dot x_n - \dot x(nh)\| \le C h \quad\text{for } nh \le T ,$$

where $C$ is independent of $\omega$, $h$ and $n$ with $nh \le T$ and of bounds of solution derivatives,
but depends on $T$, on $E$ of (2.3), on bounds of derivatives of the nonlinearity $g$,
and on $C_1$, $C_2$, $C_3$ and $C_1'$.

Proof. (a) By the variation-of-constants formula (1.8), the exact solution satisfies

$$
\dot x(t+h) - 2\cos(h\Omega)\, \dot x(t) + \dot x(t-h)
= \int_0^h \cos\bigl( (h-s)\Omega \bigr) \bigl( g(x(t+s)) - g(x(t-s)) \bigr)\, ds .
$$

With the modulated Fourier expansion, we write the exact solution as

$$x(t) = y(t) + e^{i\omega t} z(t) + e^{-i\omega t}\, \bar z(t) + O(\omega^{-2})$$

to obtain

$$
g(x(t+s)) - g(x(t-s))
= g'(y(t)) \bigl( 2s\, \dot y(t) - 4 \sin(\omega s)\, \mathrm{Im}\bigl( e^{i\omega t} z(t) \bigr) + O(s^2) + O(\omega^{-2}) \bigr) .
$$

Using the bounds (3.4), abbreviating $g_{ij} = \partial g_i / \partial x_j$ and omitting the arguments $t$
and $y(t)$ on the right-hand side, we therefore have

$$
\dot x(t+h) - 2\cos(h\Omega)\, \dot x(t) + \dot x(t-h)
= \begin{pmatrix}
h^2 g_{0,0}\, \dot y_0 - 2h^2\, \mathrm{sinc}^2(\tfrac12 h\omega)\, \omega\, g_{0,1}\, \mathrm{Im}(e^{i\omega t} z_1) + O(h^3) \\
h^2\, \mathrm{sinc}^2(\tfrac12 h\omega)\, g_{1,0}\, \dot y_0 - 2h^2\, \mathrm{sinc}(h\omega)\, \omega\, g_{1,1}\, \mathrm{Im}(e^{i\omega t} z_1) + O(h^3)
\end{pmatrix} .
$$

We now use the discrete variation-of-constants formula (4.4) and partial summation.
For example, the expression

$$\sum_{j=1}^{n} \frac{\sin\bigl( (n+1-j) h\omega \bigr)}{\sin h\omega}\, h^2\, \mathrm{sinc}^2(\tfrac12 h\omega)\, g_{1,0}(y(jh))\, \dot y_0(jh)$$

is seen to be $O(h)$ uniformly in $h\omega$ and for $nh \le T$ by partial summation, using
that the function $g_{1,0}(y(t))\, \dot y_0(t)$ has a bounded derivative and that

$$\frac{\sin(\tfrac12 h\omega)}{\sin(h\omega)} \sum_{j=1}^{k} \sin(j h\omega) = O(k) .$$


In this way we obtain 


x{nh) = - 11',,.. a ./• (()) + IT,, x{h) (4.9) 

+ ^EJ=i(«+ 1 ~ j)hgofi{y(jh))y 0 {jh)^ _ 

(b) For the numerical approximation we proceed similarly. Inserting the modulated Fourier expansion of the numerical solution,

$$x_n = y_h(t) + e^{i\omega t}z_h(t) + e^{-i\omega t}\,\bar z_h(t) + O(h^2) \qquad\text{for}\quad t = nh \le T ,$$

into the numerical scheme, we have with (3.12) or (4.5)

$$\dot x_{n+1} - 2\cos(h\Omega)\,\dot x_n + \dot x_{n-1} = h^2 \begin{pmatrix} g_{0,0}\,\dot y_{h,0} - 2\,\phi(h\omega)\,\mathrm{sinc}(h\omega)\,\omega\,g_{0,1}\,\mathrm{Im}(e^{i\omega t}z_{h,1}) + O(h) \\ \psi_1(h\omega)\,g_{1,0}\,\dot y_{h,0} - 2\,(\psi_1\phi)(h\omega)\,\mathrm{sinc}(h\omega)\,\omega\,g_{1,1}\,\mathrm{Im}(e^{i\omega t}z_{h,1}) + O(h) \end{pmatrix} ,$$

where the functions $g_{i,j}$ are evaluated at $\Phi y_h(t)$ and the argument $t = nh$ is to be inserted in $\dot y_{h,0}$ and $z_{h,1}$. Under the condition (4.8) on $\psi_1$, we obtain as in (4.9)

X n = -Wn-xXo + W n Xi ( 4 . 10 ) 

+ ( h YTj= i( n + 1 - j) h 9o,o($yh(jh))yh,o(jh)\ + _ 


Since we know from the estimates (3.12) and from the proof of Theorem 4.1 that $\Phi y_h(t) = y(t) + O(h^2)$ and $\dot y_h(t) = \dot y(t) + O(h^2)$, a comparison of (4.9) and (4.10) gives the result. □


XIII.4.2 Intra-Oscillatory and Oscillatory-Smooth Exchanges 

In this subsection we turn to the approximation of slow effects that take place on the time scale $\omega$. Since solutions may depart from each other exponentially, we cannot expect to obtain small point-wise error bounds on such a time scale. Instead, we take recourse to a kind of formal backward error analysis where we require that the equations determining the modulated Fourier expansion coefficients for the numerical method be small perturbations of those for the exact solution. It may be expected that methods with this property, ceteris paribus, show a better long-time behaviour, and this is indeed confirmed by the numerical experiments.

In the Fermi-Pasta-Ulam model, the oscillatory energy of the $j$th stiff spring is

$$I_j = \tfrac12\,\dot x_{1,j}^2 + \tfrac12\,\omega^2 x_{1,j}^2 ,$$

where $x_{1,j}$ is the $j$th component of the lower block $x_1$ of $x$. In terms of the modulated Fourier expansion, this is approximately, up to $O(\omega^{-1})$,

$$I_j \approx \tfrac12\,\bigl|i\omega z_{1,j}e^{i\omega t} - i\omega\bar z_{1,j}e^{-i\omega t}\bigr|^2 + \tfrac12\,\omega^2\bigl|z_{1,j}e^{i\omega t} + \bar z_{1,j}e^{-i\omega t}\bigr|^2 = 2\omega^2|z_{1,j}|^2 .$$

The energy exchange between stiff springs as shown in Fig. 2.1 is thus caused by the slow evolution of $z_1$ determined by (3.3). This should be modeled correctly by the numerical method.
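The identity $I_j = 2\omega^2|z_{1,j}|^2$ used above can be checked numerically. The following is only a minimal sketch for a single spring with a constant (frozen) modulation function $z$; the function names and the sample values are illustrative assumptions, not part of the text.

```python
import cmath

def oscillatory_energy(x1, v1, omega):
    """I = 1/2 v1^2 + 1/2 omega^2 x1^2 for one stiff spring (real scalars)."""
    return 0.5 * v1 * v1 + 0.5 * omega * omega * x1 * x1

def energy_from_modulation(z, omega, t):
    """Evaluate I for x1(t) = z e^{i omega t} + conj(z) e^{-i omega t}.

    z is treated as constant, so the velocity is exactly
    i omega (z e^{i omega t} - conj(z) e^{-i omega t}).
    """
    e = cmath.exp(1j * omega * t)
    ze = z * e
    x1 = (ze + ze.conjugate()).real                 # equals 2 Re(z e^{i omega t})
    v1 = (1j * omega * (ze - ze.conjugate())).real  # equals -2 omega Im(z e^{i omega t})
    return oscillatory_energy(x1, v1, omega)

def predicted_energy(z, omega):
    """The closed form 2 omega^2 |z|^2 from the modulated Fourier expansion."""
    return 2.0 * omega * omega * abs(z) ** 2
```

For a slowly varying $z(t)$ the same computation holds up to the $O(\omega^{-1})$ term mentioned in the text, because $\dot z$ then contributes a small correction to the velocity.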

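The term $g_0''(y)(z,\bar z)$ in the differential equation for $y_0$ in (3.3) is the dominant term by which the oscillations of the stiff springs exert an influence on the smooth motion. A correct incorporation of this term in the numerical method is desirable.

Upon eliminating $y_1$ and $z_0$ in (3.3), the differential equations for $y_0$ and $z_1$ become, up to $O(\omega^{-3})$ perturbations on the right-hand sides,

$$\begin{aligned} \ddot y_0 &= g_0\bigl(y_0,\ \omega^{-2}g_1(y_0,0)\bigr) + \frac{\partial^2 g_0}{\partial x_1^2}(y_0,0)(z_1,\bar z_1) \\ 2i\omega\,\dot z_1 &= \frac{\partial g_1}{\partial x_1}(y_0,0)\,z_1 . \end{aligned} \qquad (4.11)$$

This is to be compared with the analogous equations for the modulated Fourier expansion of the numerical method, which follow from (3.11):

$$\begin{aligned} \ddot y_{h,0} &= g_0\bigl(y_{h,0},\ \gamma\,\omega^{-2}g_1(y_{h,0},0)\bigr) + \beta\,\frac{\partial^2 g_0}{\partial x_1^2}(y_{h,0},0)(z_{h,1},\bar z_{h,1}) \\ 2i\omega\,\dot z_{h,1} &= \alpha\,\frac{\partial g_1}{\partial x_1}(y_{h,0},0)\,z_{h,1} \end{aligned} \qquad (4.12)$$

with

$$\alpha = \frac{(\psi\phi)(h\omega)}{\mathrm{sinc}(h\omega)} , \qquad \beta = \phi(h\omega)^2 , \qquad \gamma = \frac{(\psi\phi)(h\omega)}{\mathrm{sinc}^2(\tfrac12 h\omega)} . \qquad (4.13)$$

The differential equation for $z_{h,1}$ is consistent with that for $z_1$ only if $\alpha = 1$, i.e.,

$$\psi(h\omega)\,\phi(h\omega) = \mathrm{sinc}(h\omega) . \qquad (4.14)$$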


Among all the methods (2.6) considered, only the Deuflhard/impulse method ($\psi = \mathrm{sinc}$, $\phi = 1$) satisfies this condition. For this method we indeed observe a qualitatively correct approximation of the energy exchange between stiff springs in Fig. 2.4, but we have also seen that the energy conservation of this method is very sensitive to near-resonances.

A correct modeling of the slow oscillatory-smooth transfer would in addition require $\beta = 1$ and possibly $\gamma = 1$. For general $h\omega$ the condition $\gamma = 1$ is, however, incompatible with (4.14).
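Condition (4.14) is easy to test for concrete filter functions. The sketch below evaluates $\alpha = \psi(h\omega)\phi(h\omega)/\mathrm{sinc}(h\omega)$ and $\beta = \phi(h\omega)^2$ for four filter pairs. The pairs labelled A, B, C, E are assumptions recalled from the usual list of trigonometric methods in the literature (the definitions of the methods (2.6) are not reproduced in this section), and the sample values of $h\omega$ are arbitrary.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the removable singularity filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

# (psi, phi) pairs; assumed definitions, labelled as in the usual method list.
FILTERS = {
    "A (Gautschi)":               (lambda x: sinc(0.5 * x) ** 2, lambda x: 1.0),
    "B (Deuflhard/impulse)":      (lambda x: sinc(x),            lambda x: 1.0),
    "C (Garcia-Archilla et al.)": (lambda x: sinc(x) ** 2,       lambda x: sinc(x)),
    "E (Hairer & Lubich)":        (lambda x: sinc(x) ** 2,       lambda x: 1.0),
}

def alpha(psi, phi, xi):
    """Coefficient alpha of (4.13) at xi = h*omega (xi must not be a zero of sinc)."""
    return psi(xi) * phi(xi) / sinc(xi)

def beta(phi, xi):
    """Coefficient beta of (4.13) at xi = h*omega."""
    return phi(xi) ** 2

def satisfies_4_14(psi, phi, samples=(0.5, 1.0, 2.0, 4.0), tol=1e-12):
    """True if alpha = 1 at every sampled value of h*omega."""
    return all(abs(alpha(psi, phi, xi) - 1.0) < tol for xi in samples)

if __name__ == "__main__":
    for name, (psi, phi) in FILTERS.items():
        print(name, satisfies_4_14(psi, phi), alpha(psi, phi, 1.0), beta(phi, 1.0))
```

Under these assumed definitions only the pair B passes the check, in line with the statement above; a finite sample of $h\omega$ values can of course only refute (4.14), not prove it.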

Multi-force methods (2.13) offer a way out of these difficulties. For such methods, the coefficients of the modulated Fourier expansion satisfy (4.12) with (4.13) replaced by


' t (r\\ A !U \2 

a = =—t—77 ^-, P = 2 ^j(°)1 

3 


sine (hu) 


7 = 5>(0WM ■ 


(4.15) 


The two-force method (1.23) with (1.25) has $\alpha = \beta = \gamma = 1$, as desired.


XIII.5 Modulated Fourier Expansions 

The decomposition of the exact and the numerical solution into modulated exponentials and a remainder, as derived in Sect. XIII.3, was found useful for understanding several important aspects of the numerical behaviour. Those few terms are, however, not sufficient for explaining the long-time near-conservation of the total and the oscillatory energy. The expansion can be made more accurate by adding further terms $e^{\pm 2i\omega t}$, $e^{\pm 3i\omega t}$, etc., multiplied by slowly varying functions. This leads to an asymptotic expansion which we call the modulated Fourier expansion. This expansion is constructed in the present section, following Hairer & Lubich (2000a). (In that paper the modulated Fourier expansion was called the frequency expansion.)

XIII.5.1 Expansion of the Exact Solution 

The following theorem extends the construction of Sect. XIII.3.1 to arbitrary order in $\omega^{-1}$.

Theorem 5.1. Consider a solution $x(t)$ of (2.1) which satisfies the bounded-energy condition (2.3) and stays in a compact set $K$ for $0 \le t \le T$. Then, the solution admits an expansion

$$x(t) = y(t) + \sum_{0<|k|<N} e^{ik\omega t}z^k(t) + R_N(t) \qquad (5.1)$$

for arbitrary $N \ge 2$, where the remainder term and its derivative are bounded by

$$R_N(t) = O(\omega^{-N-2}) \quad\text{and}\quad \dot R_N(t) = O(\omega^{-N-1}) \qquad\text{for}\quad 0 \le t \le T . \qquad (5.2)$$

The real-valued functions $y = (y_0, y_1)$ and the complex-valued functions $z^k = (z_0^k, z_1^k)$ together with all their derivatives (up to arbitrary order $M$) are bounded by

$$\begin{aligned} y_0 &= O(1), & z_0^1 &= O(\omega^{-3}), & z_0^k &= O(\omega^{-k-2}) \\ y_1 &= O(\omega^{-2}), & z_1^1 &= O(\omega^{-1}), & z_1^k &= O(\omega^{-k-2}) \end{aligned} \qquad (5.3)$$

for $k = 2,\dots,N-1$. Moreover, $z^{-k} = \overline{z^k}$ for all $k$. These functions are unique up to terms of size $O(\omega^{-N-2})$. The constants symbolized by the $O$-notation are independent of $\omega$ and $t$ with $0 \le t \le T$ (but they depend on $N$, $T$, on $E$ of (2.3), on bounds of the derivatives of the nonlinearity $g(x)$ on $K$, and on the maximum order $M$ of considered derivatives).

Proof. We set

$$x_*(t) = y(t) + \sum_{0<|k|<N} e^{ik\omega t}z^k(t) \qquad (5.4)$$

and determine the smooth functions $y(t)$, $z(t) = z^1(t)$, and $z^2(t),\dots,z^{N-1}(t)$ such that $x_*(t)$ inserted into the differential equation (2.1) has a small defect, of size $O(\omega^{-N})$. To this end we expand $g(x_*(t))$ around $y(t)$ and compare the coefficients of $e^{ik\omega t}$. With the notation $g^{(m)}(y)z^\alpha = g^{(m)}(y)(z^{\alpha_1},\dots,z^{\alpha_m})$ for a multi-index $\alpha = (\alpha_1,\dots,\alpha_m)$, there results the following system of differential equations:

$$\begin{pmatrix} \ddot y_0 \\ \omega^2 y_1 \end{pmatrix} + \begin{pmatrix} 0 \\ \ddot y_1 \end{pmatrix} = g(y) + \sum_{s(\alpha)=0} \frac{1}{m!}\,g^{(m)}(y)z^\alpha \qquad (5.5)$$

$$\begin{pmatrix} -\omega^2 z_0 \\ 2i\omega\,\dot z_1 \end{pmatrix} + \begin{pmatrix} 2i\omega\,\dot z_0 + \ddot z_0 \\ \ddot z_1 \end{pmatrix} = \sum_{s(\alpha)=1} \frac{1}{m!}\,g^{(m)}(y)z^\alpha \qquad (5.6)$$

$$\begin{pmatrix} -k^2\omega^2 z_0^k \\ (1-k^2)\,\omega^2 z_1^k \end{pmatrix} + \begin{pmatrix} 2ki\omega\,\dot z_0^k + \ddot z_0^k \\ 2ki\omega\,\dot z_1^k + \ddot z_1^k \end{pmatrix} = \sum_{s(\alpha)=k} \frac{1}{m!}\,g^{(m)}(y)z^\alpha . \qquad (5.7)$$

Here the sums range over all $m \ge 1$ and all multi-indices $\alpha = (\alpha_1,\dots,\alpha_m)$ with integers $\alpha_j$ satisfying $0 < |\alpha_j| < N$, which have a given sum $s(\alpha) = \sum_{j=1}^m \alpha_j$. For large $\omega$, the dominating terms in these differential equations are given by the left-most expressions. However, since the central terms involve higher derivatives, we are confronted with singular perturbation problems. We are interested in smooth functions $y$, $z$, $z^k$ that satisfy the system up to a defect of size $O(\omega^{-N})$. In the spirit of Euler's derivation of the Euler-Maclaurin summation formula (see e.g. Hairer & Wanner 1997) we remove the disturbing higher derivatives by using iteratively the differentiated equations (5.5)-(5.7). This leads to a system

$$\begin{aligned} \ddot y_0 &= F_0(\dot y_0, y, z^1,\dots,z^{N-1},\omega^{-1}), & \dot z_1 &= \omega^{-1}F_1(\dot y_0, y, z^1,\dots,z^{N-1},\omega^{-1}) \\ z_0 &= \omega^{-2}G_0(\dot y_0, y, z^1,\dots,z^{N-1},\omega^{-1}), & y_1 &= \omega^{-2}G_1(\dot y_0, y, z^1,\dots,z^{N-1},\omega^{-1}) \\ z_0^k &= \omega^{-2}G_0^k(\dot y_0, y, z^1,\dots,z^{N-1},\omega^{-1}), & z_1^k &= \omega^{-2}G_1^k(\dot y_0, y, z^1,\dots,z^{N-1},\omega^{-1}) \end{aligned}$$

where $F_j$, $G_j$, $G_j^k$ are formal series in powers of $\omega^{-1}$. Since we get formal algebraic relations for $y_1$, $z_0$, $z^k$, we can further eliminate these variables in the functions $F_j$, $G_j$, $G_j^k$. We finally obtain for $y_1$, $z_0$, $z^k$ the algebraic relations

$$\begin{aligned} z_0 &= \omega^{-2}\bigl(G_{00}(y_0,\dot y_0,z_1) + \omega^{-1}G_{01}(y_0,\dot y_0,z_1) + \dots\bigr) \\ y_1 &= \omega^{-2}\bigl(G_{10}(y_0,\dot y_0,z_1) + \omega^{-1}G_{11}(y_0,\dot y_0,z_1) + \dots\bigr) \\ z_0^k &= \omega^{-2}\bigl(G^k_{00}(y_0,\dot y_0,z_1) + \omega^{-1}G^k_{01}(y_0,\dot y_0,z_1) + \dots\bigr) \\ z_1^k &= \omega^{-2}\bigl(G^k_{10}(y_0,\dot y_0,z_1) + \omega^{-1}G^k_{11}(y_0,\dot y_0,z_1) + \dots\bigr) \end{aligned} \qquad (5.8)$$

and a system of real second-order differential equations for $y_0$ and complex first-order differential equations for $z_1$:

$$\begin{aligned} \ddot y_0 &= F_{00}(y_0,\dot y_0,z_1) + \omega^{-1}F_{01}(y_0,\dot y_0,z_1) + \dots \\ \dot z_1 &= \omega^{-1}\bigl(F_{10}(y_0,\dot y_0,z_1) + \omega^{-1}F_{11}(y_0,\dot y_0,z_1) + \dots\bigr) . \end{aligned} \qquad (5.9)$$

At this point we can forget the above derivation and take it as a motivation for the ansatz (5.8)-(5.9), which is truncated after the $O(\omega^{-N})$ terms. We insert this ansatz and its first and second derivatives into (5.5)-(5.7) and compare powers of $\omega^{-1}$. This yields recurrence relations for the functions $F_{jl}$, $G^k_{jl}$, which in addition show that these functions together with their derivatives are all bounded on compact sets.

We determine initial values for (5.9) such that the function $x_*(t)$ of (5.4) satisfies $x_*(0) = x_0$ and $\dot x_*(0) = \dot x_0$. Because of the special ansatz (5.8)-(5.9), this gives a system which, by fixed-point iteration, yields (locally) unique initial values $y_0(0)$, $\dot y_0(0)$, $z_1(0)$ satisfying (3.5). The assumption (2.3) implies that $z_1(0) = O(\omega^{-1})$. It further follows from the boundedness of $F_{1l}$ that $z_1(t) = O(\omega^{-1})$ for $0 \le t \le T$. Going back to (5.7), it is seen that the functions $G^k_{jl}$ contain at least $k$ times the factor $z_1$. This implies the stated bounds for all other functions.

It remains to estimate the error $R_N(t) = x(t) - x_*(t)$. For this we consider the solution of (5.8)-(5.9) with the above initial values. By construction, these functions satisfy the system (5.5)-(5.7) up to a defect of $O(\omega^{-N})$. This gives a defect of size $O(\omega^{-N})$ when the function $x_*(t)$ of (5.4) is inserted into (2.1). On a finite time interval $0 \le t \le T$, this implies $R_N(t) = O(\omega^{-N})$ and $\dot R_N(t) = O(\omega^{-N})$. To obtain the slightly sharper bounds (5.2), we apply the above proof with $N$ replaced by $N+2$ and use the bounds (5.3) for $z^N$ and $z^{N+1}$. □
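The coefficient comparison that leads to (5.5)-(5.7) rests on one elementary computation, written out here as a worked step. It assumes that (2.1) has the form $\ddot x + \Omega^2 x = g(x)$ with $\Omega = \mathrm{diag}(0, \omega I)$, which is consistent with the Hamiltonian (6.1) but is not restated in this section.

```latex
% One mode e^{ik\omega t} z^k(t) of the ansatz (5.4), differentiated twice:
\frac{d^2}{dt^2}\Bigl(e^{ik\omega t} z^k\Bigr)
  = e^{ik\omega t}\Bigl(\ddot z^k + 2ik\omega\,\dot z^k - k^2\omega^2 z^k\Bigr),
\qquad
\Omega^2\, e^{ik\omega t} z^k
  = e^{ik\omega t}\begin{pmatrix} 0 \\ \omega^2 z_1^k \end{pmatrix}.
% Hence the coefficient of e^{ik\omega t} in \ddot x_* + \Omega^2 x_* is
\begin{pmatrix}
  -k^2\omega^2 z_0^k + 2ik\omega\,\dot z_0^k + \ddot z_0^k \\[2pt]
  (1-k^2)\,\omega^2 z_1^k + 2ik\omega\,\dot z_1^k + \ddot z_1^k
\end{pmatrix}.
```

For $k = 1$ the factor $(1-k^2)\omega^2$ vanishes in the lower block, so the leading term there is $2i\omega\,\dot z_1$. This is why $z_1$ is governed by a first-order differential equation, while all other coefficients are determined by algebraic relations.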

XIII.5.2 Expansion of the Numerical Solution 

Does the numerical solution of (2.1) have a modulated Fourier expansion similar to the analytical solution? This may of course be expected, but in Sect. XIII.3.2 we encountered difficulties in constructing the first terms of the expansion in the situation of a numerical resonance where $h\omega$ is close to an integral multiple of $\pi$. We therefore confine the discussion to the non-resonant case. We assume that $h$ and $\omega^{-1}$ lie in a subregion of the $(h,\omega^{-1})$-plane of small parameters for which there exists a positive constant $c$ such that

$$\bigl|\sin(\tfrac12 kh\omega)\bigr| \ge c\,\sqrt h \qquad\text{for}\quad k = 1,\dots,N, \quad\text{with}\quad N \ge 2 . \qquad (5.10)$$

This condition implies that $h\omega$ is outside an $O(\sqrt h)$ neighbourhood of integral multiples of $\pi$. For given $h$ and $\omega$, the condition imposes a restriction on $N$. In the following, $N$ is a fixed integer such that (5.10) holds. There is the following numerical analogue of Theorem 5.1.
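To see how (5.10) restricts $N$ for given $h$ and $\omega$, one can simply scan $k = 1, 2, \dots$ until the inequality fails. The helper below is a minimal sketch; the function name, the cap `n_cap` and the sample parameters are assumptions made for illustration.

```python
import math

def max_nonresonant_order(h, omega, c, n_cap=50):
    """Largest N <= n_cap with |sin(k*h*omega/2)| >= c*sqrt(h) for k = 1..N.

    Returns 0 if the condition already fails for k = 1.
    """
    threshold = c * math.sqrt(h)
    n = 0
    for k in range(1, n_cap + 1):
        if abs(math.sin(0.5 * k * h * omega)) < threshold:
            break  # first resonant k found: (5.10) cannot hold beyond n
        n = k
    return n

if __name__ == "__main__":
    h = 0.01
    # h*omega = pi/2: k = 4 gives sin(pi) = 0, so only N <= 3 is admissible.
    print(max_nonresonant_order(h, (math.pi / 2) / h, c=0.5))
    # h*omega = pi: k = 2 is resonant, so the requirement N >= 2 cannot be met.
    print(max_nonresonant_order(h, math.pi / h, c=0.5))
```

A return value below 2 signals a numerical resonance in the sense of the text, where the construction of this subsection does not apply.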

Theorem 5.2. Consider the numerical solution of the system (2.1)-(2.3) by method (2.6) with step size $h$. Let the starting value $x_1$ be given by (2.7) with $n = 0$. Assume $h\omega \ge c_0 > 0$, the non-resonance condition (5.10), and the bounds (4.1) for $\psi(h\omega)$ and $\phi(h\omega)$. Then, the numerical solution admits an expansion

$$x_n = y_h(t) + \sum_{0<|k|<N} e^{ik\omega t}z_h^k(t) + R_{h,N}(t) \qquad (5.11)$$

uniformly for $0 \le t = nh \le T$. The remainder term is of the form

$$R_{h,N}(t) = t^2h^N\,\Psi\,r(t) \qquad\text{with}\quad r(t) = O\bigl(\phi(h\omega)^N + h^m\bigr) , \qquad (5.12)$$

where $m \ge 0$ can be chosen arbitrarily. The coefficient functions together with all their derivatives (up to some arbitrarily fixed order) are bounded by

$$\begin{aligned} y_{h,0} &= O(1), & z^1_{h,0} &= O(\omega^{-2}), & z^k_{h,0} &= O(\omega^{-k}) \\ y_{h,1} &= O(\omega^{-2}), & z^1_{h,1} &= O(\omega^{-1}), & z^k_{h,1} &= O(\omega^{-k}) \end{aligned} \qquad (5.13)$$

for $k = 2,\dots,N-1$. Moreover, $z_h^{-k} = \overline{z_h^k}$ for all $k$. The constants symbolized by the $O$-notation are independent of $\omega$ and $h$ with (5.10), but they depend on $E$, $N$, $m$, $c$, and $T$.

The proof covers the remainder of this subsection. It constructs a function

$$x_h(t) = y_h(t) + \sum_{0<|k|<N} e^{ik\omega t}z_h^k(t) \qquad (5.14)$$

with smooth coefficient functions $y_h(t)$ and $z_h^k(t)$, which has a small defect when it is inserted into the numerical scheme (2.6). The following functional calculus is convenient for determining the coefficient functions.

Functional Calculus. Let $f$ be an entire complex function bounded by $|f(\zeta)| \le C\,e^{\gamma|\zeta|}$. Then,

$$f(hD)x(t) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}\,h^k x^{(k)}(t)$$

converges for every function $x$ which is analytic in a disk of radius $r > \gamma h$ around $t$. If $f_1$ and $f_2$ are two such entire functions, then

$$f_1(hD)f_2(hD)x(t) = (f_1f_2)(hD)x(t)$$

whenever both sides exist. We note $(hD)^kx(t) = h^kx^{(k)}(t)$ for $k = 0, 1, 2, \dots$ and $\exp(hD)x(t) = x(t+h)$.

We next consider the application of such an operator to functions of the form $e^{i\omega t}z(t)$. By Leibniz' rule of calculus we have $(hD)^k e^{i\omega t}z(t) = e^{i\omega t}(hD + ih\omega)^k z(t)$. After a short calculation this yields

$$f(hD)\,e^{i\omega t}z(t) = e^{i\omega t}f(hD + ih\omega)z(t) \qquad (5.15)$$

where $f(hD + ih\omega)z(t) = \sum_{k=0}^{\infty} \frac{f^{(k)}(ih\omega)}{k!}\,h^k z^{(k)}(t)$.

An $N$-times continuously differentiable function $x$ is replaced by its Taylor polynomial of degree $N-1$ at $t$, and $f(hD)x(t)$ is then considered up to $O(h^N)$.
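For polynomials the series defining $f(hD)x(t)$ terminates, so the functional calculus can be tried out exactly. The following sketch applies an operator given by its Taylor coefficients $f^{(k)}(0)/k!$ to a polynomial stored as a coefficient list; the helper names and the sample polynomial are illustrative assumptions.

```python
import math

def derivative(coeffs):
    """Derivative of a polynomial; coeffs[j] is the coefficient of t**j."""
    return [j * coeffs[j] for j in range(1, len(coeffs))]

def evaluate(coeffs, t):
    """Horner evaluation of the polynomial at t."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result

def apply_operator(taylor_coeff, coeffs, h, t):
    """Compute f(hD)x(t) = sum_k taylor_coeff(k) * h**k * x^(k)(t).

    taylor_coeff(k) must return f^(k)(0)/k!.  The loop stops once the
    repeated derivative of the polynomial has become identically zero.
    """
    total = 0.0
    current = list(coeffs)
    k = 0
    while current:
        total += taylor_coeff(k) * h ** k * evaluate(current, t)
        current = derivative(current)
        k += 1
    return total

def exp_coeff(k):
    """Taylor coefficients of f(z) = exp(z): the shift operator exp(hD)."""
    return 1.0 / math.factorial(k)

def second_difference_coeff(k):
    """Taylor coefficients of f(z) = exp(z) - 2 + exp(-z)."""
    return 2.0 / math.factorial(k) if k >= 2 and k % 2 == 0 else 0.0
```

With `x = [1.0, -2.0, 0.0, 1.0]` (that is, $x(t) = 1 - 2t + t^3$), `apply_operator(exp_coeff, x, h, t)` reproduces $x(t+h)$, and `second_difference_coeff` reproduces $x(t+h) - 2x(t) + x(t-h)$, which is the $\Omega = 0$ case of the difference operator factorized in (5.16).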

Modified Equations for the Coefficient Functions. The difference operator of the numerical method becomes in this notation

$$x(t+h) - 2\cos(h\Omega)\,x(t) + x(t-h) = \bigl(e^{hD} - 2\cos h\Omega + e^{-hD}\bigr)x(t) .$$

We factorize this operator as

$$\mathcal L(hD) := e^{hD} - 2\cos h\Omega + e^{-hD} = 2\bigl(\cos(ihD) - \cos h\Omega\bigr) = 4\sin\bigl(\tfrac12 h\Omega + \tfrac12 ihD\bigr)\sin\bigl(\tfrac12 h\Omega - \tfrac12 ihD\bigr) . \qquad (5.16)$$
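The three forms of the operator in (5.16) can be compared on the level of scalar symbols, replacing $hD$ by a complex number $\zeta$ and $h\Omega$ by a real number $\theta$. This is only a sanity check of the trigonometric identities, with hypothetical function names.

```python
import cmath
import math

def symbol_exponential(zeta, theta):
    """e^zeta - 2 cos(theta) + e^(-zeta)."""
    return cmath.exp(zeta) - 2.0 * math.cos(theta) + cmath.exp(-zeta)

def symbol_cosine(zeta, theta):
    """2 (cos(i zeta) - cos(theta))."""
    return 2.0 * (cmath.cos(1j * zeta) - math.cos(theta))

def symbol_factored(zeta, theta):
    """4 sin(theta/2 + i zeta/2) sin(theta/2 - i zeta/2)."""
    return 4.0 * cmath.sin(0.5 * theta + 0.5j * zeta) * cmath.sin(0.5 * theta - 0.5j * zeta)

if __name__ == "__main__":
    zeta, theta = 0.3 + 0.2j, 1.1
    print(symbol_exponential(zeta, theta))
    print(symbol_cosine(zeta, theta))
    print(symbol_factored(zeta, theta))
```

The factored form makes the zeros of the symbol visible: it vanishes exactly when $i\zeta = \pm\theta$ modulo $2\pi$, which is where the dominant terms of the expansion change character.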

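The function $x_h(t)$ of (5.14) should formally (up to $O(h^{N+2})$) satisfy the difference scheme

$$\mathcal L(hD)\,x_h(t) = h^2\,\Psi\,g(\Phi x_h(t)) . \qquad (5.17)$$

We insert the ansatz (5.14), expand the right-hand side into a Taylor series around $\Phi y_h(t)$, and compare the coefficients of $e^{ik\omega t}$. This yields the following formal equations for the functions $y_h(t)$ and $z_h^k(t)$:

$$\begin{aligned} \mathcal L(hD)\,y_h &= h^2\,\Psi\Bigl(g(\Phi y_h) + \sum_{s(\alpha)=0} \frac{1}{m!}\,g^{(m)}(\Phi y_h)(\Phi z_h)^\alpha\Bigr) \\ \mathcal L(hD + ikh\omega)\,z_h^k &= h^2\,\Psi \sum_{s(\alpha)=k} \frac{1}{m!}\,g^{(m)}(\Phi y_h)(\Phi z_h)^\alpha . \end{aligned} \qquad (5.18)$$

Here, $\alpha = (\alpha_1,\dots,\alpha_m)$ is a multi-index as in the proof of Theorem 5.1, $s(\alpha) = \sum_{j=1}^m \alpha_j$, and $(\Phi z)^\alpha$ is an abbreviation for the $m$-tuple $(\Phi z^{\alpha_1},\dots,\Phi z^{\alpha_m})$. To get smooth functions $y_h(t)$ and $z_h^k(t)$ which solve (5.18) up to a small defect, we look at the dominating terms in the Taylor expansions of $\mathcal L(hD)$ and $\mathcal L(hD + ikh\omega)$. With the abbreviations $s_k = \sin(\tfrac12 kh\omega)$ and $c_k = \cos(\tfrac12 kh\omega)$ we obtain

$$\begin{aligned} \mathcal L(hD) &= \begin{pmatrix} 0 & 0 \\ 0 & 4s_1^2 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}(ihD)^2 + \dots \\ \mathcal L(hD + ih\omega) &= \begin{pmatrix} -4s_1^2 & 0 \\ 0 & 0 \end{pmatrix} + 2s_2\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}(ihD) - c_2\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}(ihD)^2 + \dots \\ \mathcal L(hD + ikh\omega) &= \begin{pmatrix} -4s_k^2 & 0 \\ 0 & -4s_{k-1}s_{k+1} \end{pmatrix} + 2s_{2k}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}(ihD) - c_{2k}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}(ihD)^2 + \dots . \end{aligned} \qquad (5.19)$$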
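The leading Taylor coefficients in (5.19) follow from $\mathcal L(hD + ikh\omega) = 2(\cos(ihD - kh\omega) - \cos h\Omega)$ and can be checked blockwise with scalar symbols. In the sketch below, $\zeta$ stands for $hD$, `theta` for $h\omega$, and `block` selects the block of $\Omega$ with frequency 0 or $\omega$; all names are illustrative assumptions.

```python
import cmath
import math

def s(k, theta):
    """s_k = sin(k*h*omega/2)."""
    return math.sin(0.5 * k * theta)

def c(k, theta):
    """c_k = cos(k*h*omega/2)."""
    return math.cos(0.5 * k * theta)

def symbol_shifted(zeta, k, theta, block):
    """Scalar symbol of L(hD + i k h omega) in the block with frequency
    0 (block=0) or omega (block=1): 2 (cos(i zeta - k theta) - cos(freq))."""
    freq = 0.0 if block == 0 else theta
    return 2.0 * (cmath.cos(1j * zeta - k * theta) - math.cos(freq))

def symbol_taylor(zeta, k, theta, block):
    """Terms of (5.19) up to second order in (i zeta)."""
    if block == 0:
        lead = -4.0 * s(k, theta) ** 2
    else:
        lead = -4.0 * s(k - 1, theta) * s(k + 1, theta)
    iz = 1j * zeta
    return lead + 2.0 * s(2 * k, theta) * iz - c(2 * k, theta) * iz ** 2

if __name__ == "__main__":
    for k in (0, 1, 2, 3):
        for block in (0, 1):
            exact = symbol_shifted(1e-3, k, 0.9, block)
            print(k, block, abs(exact - symbol_taylor(1e-3, k, 0.9, block)))
```

For $k = 1$ the leading coefficient of the lower block is $-4s_0s_2 = 0$, so the first-order term $2s_2(ihD)$ becomes the dominant one there, exactly as in the second line of (5.19).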

Construction of the Coefficient Functions. Under the non-resonance condition (5.10), the first non-vanishing coefficients in (5.19) are the dominant ones, and the derivation of the defining relations for $y_h$ and $z_h^k$ is the same as for the analytical solution in Theorem 5.1; see also part (b) of the proof of Theorem 4.1. We insert (5.19) into (5.18) and we eliminate recursively the higher derivatives. This motivates the following ansatz for the computation of the functions $y_h$ and $z_h^k$:

jjh ,0 = /oo(') + /oi(-) + h /o 2 (') + • • • 

4.1 = ^^(/lo(-) + Vhf n (•) + •••) 

4,0 = ^2 (soo(-) + v^5oi(0 + •••) 

Vh,i = ^ff 10 (~) + Vhgn(-) + ■ ■ (5.20) 

4,0 = ~2 (ffoo(') + 4ft <7oi(‘) + • • •) 

4.1 = + + ■■■), 

Sk-hl^k-l v 7 

for $k = 2,\dots,N-1$, where the functions depend smoothly on the variables $y_{h,0}$, $\dot y_{h,0}$, $z^1_{h,1}$ and on the bounded parameters $\sqrt h/s_k$, $s_k$, $c_k$, $\psi(h\omega)$ and $(h\omega)^{-1}$. Inserting this ansatz and its derivatives into (5.18) and comparing powers of $\sqrt h$ yields recurrence relations for the functions $f_{jl}$, $g^k_{jl}$. The functions $g^k_{jl}$ (for $k \ge 1$) contain at least $k$ times the factor $\phi(h\omega)\,z^1_{h,1}$, and $f_{1l}$ contains this factor at
least once. Since the series in (5.20) need not converge, we truncate them after the 

yN+m+2 t erms . 

Initial Values. The conditions $x_h(0) = x_0$ and $x_h(h) = x_1$ determine the initial values $y_{h,0}(0)$, $\dot y_{h,0}(0)$ and $z^1_{h,1}(0)$ in the same way as in Sect. XIII.3.2. Condition (4.1) yields again (4.6), and (2.3) then implies $z^1_{h,1}(0) = O(\omega^{-1})$.

Defect. It follows from (4.1) that $h\,\psi(h\omega)\phi(h\omega)/s_2 = O(\omega^{-1})$, so that $\dot z^1_{h,1} = O(\omega^{-1}z^1_{h,1})$ by (5.20). This implies $z^1_{h,1}(t) = O(\omega^{-1})$ for $t \le T$. The other estimates (5.13) are directly obtained from (5.20), which indeed yields the following more refined bounds for the coefficient functions together with their derivatives:

$$\begin{aligned} &y_{h,0} = O(1), \qquad y_{h,1} = O(\omega^{-2}) \\ &z^1_{h,0} = O(\omega^{-3}/\sqrt h), \qquad z^1_{h,1} = O(\omega^{-1}), \qquad \dot z^1_{h,1} = O(\omega^{-2}) \\ &z^k_{h,0} = O\bigl(h\,\phi(h\omega)^k\omega^{-k}\bigr), \qquad z^k_{h,1} = O\bigl(h\,\psi(h\omega)\,\phi(h\omega)^k\omega^{-k}\bigr) . \end{aligned} \qquad (5.21)$$

Consequently, the values $x_h(nh)$ inserted into the numerical scheme (2.6) yield a defect of size $O(h^{N+2})$:

$$x_h(t+h) - 2\cos(h\Omega)\,x_h(t) + x_h(t-h) = h^2\,\Psi\Bigl(g(\Phi x_h(t)) + O\bigl(\phi(h\omega)^N\omega^{-N} + h^{N+m}\bigr)\Bigr) . \qquad (5.22)$$

Standard convergence estimates then show that, on bounded time intervals, $x_n - x_h(nh)$ is of size $O(t^2h^N)$ and actually satisfies the finer estimate (5.12). This completes the proof of Theorem 5.2. □


XIII.5.3 Expansion of the Velocity Approximation 

A similar expansion holds also for the velocities. We show this for the scheme (2.11) or its equivalent one-step formulation (2.8) with (2.9).

Theorem 5.3. Under the assumptions of Theorem 5.2, the velocity approximation $\dot x_n$ given by (2.11) has an expansion

$$\dot x_n = v_h(t) + \sum_{0<|k|<N} e^{ik\omega t}w_h^k(t) + O(t^2h^{N-1})$$

uniformly for $0 \le t = nh \le T$, where the real-valued functions $v_h = (v_{h,0}, v_{h,1})$ and the complex-valued functions $w_h^k = (w^k_{h,0}, w^k_{h,1})$ together with their derivatives up to arbitrary order satisfy

$$\begin{aligned} v_{h,0} &= \dot y_{h,0} + O(h^2), & w^1_{h,0} &= O(\omega^{-1}), & w^k_{h,0} &= O(\omega^{-k}) \\ v_{h,1} &= O(\omega^{-1}), & w^1_{h,1} &= i\omega\,z^1_{h,1} + O(\omega^{-1}), & w^k_{h,1} &= O(\omega^{-k}) \end{aligned}$$

for $k = 2,\dots,N-1$. Moreover, $w_h^{-k} = \overline{w_h^k}$. The constants symbolized by the $O$-notation are independent of $\omega$ and $h$ with (5.10), but depend on $E$, $N$, $c$, and $T$.

Proof. Let $u_h(t)$ be defined by the continuous analogue of (2.11),

$$2h\,\mathrm{sinc}(h\Omega)\,u_h(t) = x_h(t+h) - x_h(t-h) . \qquad (5.24)$$

Theorem 5.2 then yields that

$$\dot x_n = u_h(t) + O(t^2h^{N-1})$$

for $t = nh$ on bounded time intervals. Here we used that the remainder term in the lower component of (5.12) is of the form $O\bigl(\psi(h\omega)(\phi(h\omega) + h)\,t^2h^N\bigr)$, so that its quotient with $2h\,\mathrm{sinc}(h\omega)$ becomes $O(t^2h^{N-1})$ by the third of the conditions (4.1) and by (5.10). The function $u_h(t)$ can be written as

$$u_h(t) = v_h(t) + \sum_{0<|k|<N} e^{ik\omega t}w_h^k(t) . \qquad (5.25)$$

We insert the relation (5.14) into $-i\sin(ihD)\,x_h(t) = h\,\mathrm{sinc}(h\Omega)\,u_h(t)$, which is equivalent to (5.24), and compare the coefficients of $e^{ik\omega t}$ to obtain

$$\begin{aligned} \mathrm{sinc}(ihD)\,\dot y_{h,0} &= v_{h,0} \\ \mathrm{sinc}(ihD)\,\dot y_{h,1} &= \mathrm{sinc}(h\omega)\,v_{h,1} \\ (ih)^{-1}\sin(ihD - kh\omega)\,z^k_{h,0} &= w^k_{h,0} \\ (ih)^{-1}\sin(ihD - kh\omega)\,z^k_{h,1} &= \mathrm{sinc}(h\omega)\,w^k_{h,1} \end{aligned} \qquad (5.26)$$

for $k = 1,\dots,N-1$. In particular, for $w^1_{h,1}$ we get

$$w^1_{h,1} = i\omega\cos(ihD)\,z^1_{h,1} - i\omega\,\frac{\cos(h\omega)}{\sin(h\omega)}\,\sin(ihD)\,z^1_{h,1} . \qquad (5.27)$$

With the above equations, the estimates now follow with the bounds (5.21) of the coefficient functions and their derivatives, using again (4.1). □
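Two algebraic steps of this proof can be checked with scalar symbols, again writing $\zeta$ for $hD$: the equivalence $e^{\zeta} - e^{-\zeta} = -2i\sin(i\zeta)$ behind (5.24), and the passage from the last line of (5.26) with $k = 1$ to (5.27). The sketch below is an illustration only; the function names and sample values are assumptions.

```python
import cmath
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def central_difference_symbol(zeta):
    """Symbol of x(t+h) - x(t-h), namely e^zeta - e^(-zeta)."""
    return cmath.exp(zeta) - cmath.exp(-zeta)

def w_from_5_26(zeta, h, omega):
    """(ih)^(-1) sin(i zeta - h omega) / sinc(h omega): symbol mapping z to w (k = 1)."""
    theta = h * omega
    return cmath.sin(1j * zeta - theta) / (1j * h * sinc(theta))

def w_from_5_27(zeta, h, omega):
    """i omega cos(i zeta) - i omega cot(h omega) sin(i zeta)."""
    theta = h * omega
    cot = math.cos(theta) / math.sin(theta)
    return 1j * omega * cmath.cos(1j * zeta) - 1j * omega * cot * cmath.sin(1j * zeta)

if __name__ == "__main__":
    print(w_from_5_26(0.01 + 0.02j, 0.02, 37.0))
    print(w_from_5_27(0.01 + 0.02j, 0.02, 37.0))
```

At $\zeta = 0$ both symbols reduce to $i\omega$, which is the leading term $w^1_{h,1} = i\omega z^1_{h,1} + O(\omega^{-1})$ stated in Theorem 5.3.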


XIII.6 Almost-Invariants of the Modulated Fourier 
Expansions 

The system for the coefficients of the modulated Fourier expansion of the exact 
solution is shown to have two formal invariants, which are related to the total and the 
oscillatory energy. In particular, this explains the near-conservation of the oscillatory 
energy over very long times. Analogous almost-invariants are shown to exist also for 
the modulated Fourier expansion of the numerical solution. This forms the basis for 
results on the long-time energy conservation of numerical methods, which will be 
given in Sections XIII.7 and XIII.8. 


XIII.6.1 The Hamiltonian of the Modulated Fourier Expansion 

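The equation (2.1) is a Hamiltonian system with the Hamiltonian

$$H(x,\dot x) = \tfrac12\,\dot x^T\dot x + \tfrac12\,x^T\Omega^2x + U(x) . \qquad (6.1)$$

In the modulated Fourier expansion of the solution $x(t)$ of (2.1), denote $y^0(t) = y(t)$ and $y^k(t) = e^{ik\omega t}z^k(t)$ ($0 < |k| < N$), and let

$$\mathbf y = (y^{-N+1},\dots,y^{-1},y^0,y^1,\dots,y^{N-1}) .$$

By (5.5)-(5.7) these functions satisfy

$$\ddot y^k + \Omega^2y^k = -\sum_{s(\alpha)=k} \frac{1}{m!}\,U^{(m+1)}(y^0)\,\mathbf y^\alpha + O(\omega^{-N}) . \qquad (6.2)$$

Here, the sum is over all $m \ge 0$ and all multi-indices $\alpha = (\alpha_1,\dots,\alpha_m)$ with integers $\alpha_j$ ($0 < |\alpha_j| < N$) which have a given sum $s(\alpha) = \sum_{j=1}^m \alpha_j$, and we write $\mathbf y^\alpha = (y^{\alpha_1},\dots,y^{\alpha_m})$. We define

$$\mathcal U(\mathbf y) = U(y^0) + \sum_{s(\alpha)=0} \frac{1}{m!}\,U^{(m)}(y^0)\,\mathbf y^\alpha . \qquad (6.3)$$

From the above it follows that $\mathbf y(t)$ satisfies the system

$$\ddot y^k + \Omega^2y^k = -\nabla_{y^{-k}}\,\mathcal U(\mathbf y) + O(\omega^{-N}) \qquad (6.4)$$

which, neglecting the $O(\omega^{-N})$ term, is the Hamiltonian system (cf. Exercise 6)

$$\dot y^k = \frac{\partial\mathcal H}{\partial\dot y^{-k}}(\mathbf y,\dot{\mathbf y}) , \qquad \ddot y^k = -\frac{\partial\mathcal H}{\partial y^{-k}}(\mathbf y,\dot{\mathbf y}) \qquad (6.5)$$

with

$$\mathcal H(\mathbf y,\dot{\mathbf y}) = \frac12 \sum_{|k|<N} \Bigl((\dot y^{-k})^T\dot y^k + (y^{-k})^T\Omega^2y^k\Bigr) + \mathcal U(\mathbf y) . \qquad (6.6)$$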

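Theorem 6.1. Under the assumptions of Theorem 5.1, the Hamiltonian of the modulated Fourier expansion satisfies

$$\mathcal H(\mathbf y(t),\dot{\mathbf y}(t)) = \mathcal H(\mathbf y(0),\dot{\mathbf y}(0)) + O(\omega^{-N}) \qquad (6.7)$$

$$\mathcal H(\mathbf y(t),\dot{\mathbf y}(t)) = H(x(t),\dot x(t)) + O(\omega^{-1}) . \qquad (6.8)$$

The constants symbolized by $O$ are independent of $\omega$ and $t$ with $0 \le t \le T$, but depend on $E$, $N$ and $T$.

Proof. Multiplying (6.4) with $(\dot y^{-k})^T$ and summing up gives

$$\sum_{|k|<N} (\dot y^{-k})^T\bigl(\ddot y^k + \Omega^2y^k\bigr) + \frac{d}{dt}\,\mathcal U(\mathbf y) = O(\omega^{-N}) . \qquad (6.9)$$

Integrating from $0$ to $t$ and using $y^{-k} = \overline{y^k}$ then yields (6.7).

By the bounds of Theorem 5.1, we have for $0 \le t \le T$

$$\mathcal H(\mathbf y,\dot{\mathbf y}) = \tfrac12\,\|\dot y_0^0\|^2 + \|\dot y_1^1\|^2 + \omega^2\|y_1^1\|^2 + U(y^0) + O(\omega^{-1}) . \qquad (6.10)$$

On the other hand, we have from (6.1) and (5.1)

$$H(x,\dot x) = \tfrac12\,\|\dot y_0^0\|^2 + \tfrac12\,\|\dot y_1^1 + \dot y_1^{-1}\|^2 + \tfrac12\,\omega^2\|y_1^1 + y_1^{-1}\|^2 + U(y^0) + O(\omega^{-1}) . \qquad (6.11)$$

Using $y_1^1 = e^{i\omega t}z_1^1$ and $\dot y_1^1 = e^{i\omega t}(\dot z_1^1 + i\omega z_1^1)$ together with $y_1^{-1} = \overline{y_1^1}$, it follows from $\dot z_1^1 = O(\omega^{-1})$ that $\dot y_1^1 + \dot y_1^{-1} = i\omega(y_1^1 - y_1^{-1}) + O(\omega^{-1})$ and $\|\dot y_1^1\| = \omega\|y_1^1\| + O(\omega^{-1})$. Inserted into (6.10) and (6.11), this yields (6.8). □


XIII.6.2 A Formal Invariant Close to the Oscillatory Energy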

In addition to the Hamiltonian $\mathcal H(\mathbf y,\dot{\mathbf y})$, the system for the coefficients of the modulated Fourier expansion has another formally conserved quantity. This almost-invariant depends only on the oscillating part and is given by

$$\mathcal I(\mathbf y,\dot{\mathbf y}) = -i\omega \sum_{0<|k|<N} k\,(y^{-k})^T\dot y^k . \qquad (6.12)$$

This turns out to be close to the energy of the harmonic oscillator,

$$I(x,\dot x) = \tfrac12\,\|\dot x_1\|^2 + \tfrac12\,\omega^2\|x_1\|^2 . \qquad (6.13)$$

Theorem 6.2. Under the assumptions of Theorem 5.1,

$$\mathcal I(\mathbf y(t),\dot{\mathbf y}(t)) = \mathcal I(\mathbf y(0),\dot{\mathbf y}(0)) + O(\omega^{-N}) \qquad (6.14)$$

$$\mathcal I(\mathbf y(t),\dot{\mathbf y}(t)) = I(x(t),\dot x(t)) + O(\omega^{-1}) . \qquad (6.15)$$

The constants symbolized by $O$ are independent of $\omega$ and $t$ with $0 \le t \le T$, but depend on $E$, $N$ and $T$.

Proof. For the vector

$$\mathbf y(\lambda) = \bigl(e^{i(-N+1)\lambda}y^{-N+1},\dots,e^{-i\lambda}y^{-1},\,y^0,\,e^{i\lambda}y^1,\dots,e^{i(N-1)\lambda}y^{N-1}\bigr)$$

the definition (6.3) of $\mathcal U$ shows that $\mathcal U(\mathbf y(\lambda))$ does not depend on $\lambda$. Its derivative with respect to $\lambda$ thus yields

$$0 = \frac{d}{d\lambda}\,\mathcal U(\mathbf y(\lambda)) = \sum_{0<|k|<N} ik\,e^{ik\lambda}(y^k)^T\nabla_k\,\mathcal U(\mathbf y(\lambda)) ,$$

and putting $\lambda = 0$ we obtain

$$\sum_{0<|k|<N} ik\,(y^k)^T\nabla_k\,\mathcal U(\mathbf y) = 0 \qquad (6.16)$$

for all vectors $\mathbf y = (y^{-N+1},\dots,y^{-1},y^0,y^1,\dots,y^{N-1})$.

The proof of Theorem 6.2 is now very similar to that of Theorem 6.1. We multiply the relation (6.4) with $-i\omega k\,(y^{-k})^T$ instead of $(\dot y^{-k})^T$. Summing up yields, with the use of (6.16),

$$-i\omega \sum_{0<|k|<N} k\,(y^{-k})^T\bigl(\ddot y^k + \Omega^2y^k\bigr) = O(\omega^{-N}) . \qquad (6.17)$$

The time derivative of $\mathcal I(\mathbf y,\dot{\mathbf y})$ of (6.12) equals

$$\frac{d}{dt}\,\mathcal I(\mathbf y,\dot{\mathbf y}) = -i\omega \sum_{0<|k|<N} k\,\Bigl((\dot y^{-k})^T\dot y^k + (y^{-k})^T\ddot y^k\Bigr) . \qquad (6.18)$$

In the sums $\sum_k k\,(y^{-k})^T\Omega^2y^k$ and $\sum_k k\,(\dot y^{-k})^T\dot y^k$, the terms with $k$ and $-k$ cancel. Hence, (6.17) and (6.18) together yield

$$\frac{d}{dt}\,\mathcal I(\mathbf y,\dot{\mathbf y}) = O(\omega^{-N}) ,$$

which implies (6.14).

With $\dot y^k = e^{ik\omega t}(\dot z^k + ik\omega z^k) = ik\omega\,y^k + O(\omega^{-1})$, it follows from the bounds of Theorem 5.1 that

$$\mathcal I(\mathbf y,\dot{\mathbf y}) = 2\omega^2\|y_1^1\|^2 + O(\omega^{-1}) .$$

On the other hand, using the arguments of the proof of Theorem 6.1, we have

$$I(x,\dot x) = \tfrac12\,\|\dot y_1^1 + \dot y_1^{-1}\|^2 + \tfrac12\,\omega^2\|y_1^1 + y_1^{-1}\|^2 + O(\omega^{-1}) = 2\omega^2\|y_1^1\|^2 + O(\omega^{-1}) .$$

This proves the second statement of the theorem. □
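The key step of this proof is the invariance of $\mathcal U(\mathbf y(\lambda))$ under the phase rotation $y^k \mapsto e^{ik\lambda}y^k$, which holds because every term of (6.3) has $s(\alpha) = 0$. The sketch below evaluates the sum (6.3) for a scalar toy potential $U(x) = x^4/4$ with $N = 2$ (modes $k = -1, 0, 1$). The potential, the sample values and all function names are hypothetical choices for illustration; for a polynomial $U$ of degree 4 the sum over $m$ terminates at $m = 4$.

```python
import cmath
import itertools
import math

N = 2  # modes k with |k| < N, i.e. k = -1, 0, 1

def U_derivative(m, x):
    """m-th derivative of the toy potential U(x) = x**4 / 4."""
    if m == 0:
        return x ** 4 / 4
    if m == 1:
        return x ** 3
    if m == 2:
        return 3 * x ** 2
    if m == 3:
        return 6 * x
    if m == 4:
        return 6.0
    return 0.0

def modulated_potential(y):
    """U(y^0) + sum over m and multi-indices alpha with s(alpha) = 0 of
    U^(m)(y^0) / m! * y^{alpha_1} * ... * y^{alpha_m}; y maps k -> y^k."""
    total = U_derivative(0, y[0])
    nonzero_modes = [k for k in range(-N + 1, N) if k != 0]
    for m in range(1, 5):
        for alpha in itertools.product(nonzero_modes, repeat=m):
            if sum(alpha) != 0:
                continue
            term = U_derivative(m, y[0]) / math.factorial(m)
            for a in alpha:
                term *= y[a]
            total += term
    return total

def rotate(y, lam):
    """The vector y(lambda): multiply y^k by e^{i k lambda}."""
    return {k: cmath.exp(1j * k * lam) * v for k, v in y.items()}
```

For `y = {-1: 0.3 - 0.4j, 0: 0.5, 1: 0.3 + 0.4j}` the value is $U(0.5) + U''(0.5)\,|y^1|^2 + \tfrac14 U''''\,|y^1|^4 = 0.296875$, and it is unchanged by `rotate(y, lam)` for any real `lam`.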

Theorem 6.2 implies that the oscillatory energy is nearly conserved over long times:

Theorem 6.3. If the solution $x(t)$ of (2.1) stays in a compact set for $0 \le t \le \omega^N$, then

$$I(x(t),\dot x(t)) = I(x(0),\dot x(0)) + O(\omega^{-1}) + O(t\,\omega^{-N}) .$$

The constants symbolized by $O$ are independent of $\omega$ and $t$ with $0 \le t \le \omega^N$, but depend on $E$ and $N$.

Proof. With a fixed $T > 0$, let $\mathbf y_j$ denote the vector of the modulated Fourier expansion terms that correspond to starting values $(x(jT),\dot x(jT))$. For $t = (n+\theta)T$ with $0 \le \theta < 1$, we have by (6.15)

$$\begin{aligned} I(x(t),\dot x(t)) - I(x(0),\dot x(0)) &= \mathcal I(\mathbf y_n(\theta T),\dot{\mathbf y}_n(\theta T)) + O(\omega^{-1}) - \mathcal I(\mathbf y_0(0),\dot{\mathbf y}_0(0)) + O(\omega^{-1}) \\ &= \mathcal I(\mathbf y_n(\theta T),\dot{\mathbf y}_n(\theta T)) - \mathcal I(\mathbf y_n(0),\dot{\mathbf y}_n(0)) \\ &\quad + \sum_{j=0}^{n-1} \Bigl(\mathcal I(\mathbf y_{j+1}(0),\dot{\mathbf y}_{j+1}(0)) - \mathcal I(\mathbf y_j(0),\dot{\mathbf y}_j(0))\Bigr) + O(\omega^{-1}) . \end{aligned}$$

We note

$$\mathcal I(\mathbf y_{j+1}(0),\dot{\mathbf y}_{j+1}(0)) - \mathcal I(\mathbf y_j(0),\dot{\mathbf y}_j(0)) = O(\omega^{-N}) ,$$

because, by the quasi-uniqueness of the coefficient functions as stated by Theorem 5.1, we have $\mathbf y_{j+1}(0) = \mathbf y_j(T) + O(\omega^{-N})$ and $\dot{\mathbf y}_{j+1}(0) = \dot{\mathbf y}_j(T) + O(\omega^{-N})$, and we have the bound (6.14) of Theorem 6.2. The same argument applies to $\mathcal I(\mathbf y_n(\theta T),\dot{\mathbf y}_n(\theta T)) - \mathcal I(\mathbf y_n(0),\dot{\mathbf y}_n(0))$. This yields the result. □

In a different approach, Benettin, Galgani & Giorgilli (1987) use a sequence of coordinate transformations from Hamiltonian perturbation theory to show that $I$ has only small deviations over time intervals which grow exponentially with $\omega$, in the case of an analytic potential $U$. By carefully tracing the dependence on $N$ of the constants in the $O(\omega^{-N})$-terms, near-conservation of $I$ over exponentially long time intervals can be shown also within the present framework of modulated Fourier expansions; see Cohen, Hairer & Lubich (2003).
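The near-conservation stated in Theorem 6.3 can be observed on a small test problem. The sketch below assumes that (2.1) reads $\ddot x + \Omega^2x = -\nabla U(x)$ with one slow and one fast scalar component, uses the hypothetical toy potential $U(x) = \tfrac12x_0^2 + \tfrac14(x_1 - x_0)^4$, and integrates with the Störmer-Verlet scheme at a step size with $h\omega \ll 1$ as a stand-in for the exact flow. None of these choices is taken from the text.

```python
def grad_U(x0, x1):
    """Gradient of the toy potential U = x0^2/2 + (x1 - x0)^4/4."""
    d = (x1 - x0) ** 3
    return x0 - d, d

def oscillatory_energy(x1, v1, omega):
    """I = 1/2 v1^2 + 1/2 omega^2 x1^2 of the fast component."""
    return 0.5 * v1 * v1 + 0.5 * omega * omega * x1 * x1

def max_energy_deviation(omega, h, t_end):
    """Integrate with Stoermer-Verlet and return (I(0), max_t |I(t) - I(0)|)."""
    x0, x1, v0, v1 = 1.0, 0.0, 0.0, 1.0  # bounded-energy initial data
    i_start = oscillatory_energy(x1, v1, omega)
    g0, g1 = grad_U(x0, x1)
    a0, a1 = -g0, -omega * omega * x1 - g1
    worst = 0.0
    for _ in range(int(round(t_end / h))):
        v0 += 0.5 * h * a0          # half kick
        v1 += 0.5 * h * a1
        x0 += h * v0                # drift
        x1 += h * v1
        g0, g1 = grad_U(x0, x1)
        a0, a1 = -g0, -omega * omega * x1 - g1
        v0 += 0.5 * h * a0          # half kick
        v1 += 0.5 * h * a1
        worst = max(worst, abs(oscillatory_energy(x1, v1, omega) - i_start))
    return i_start, worst

if __name__ == "__main__":
    for omega in (25.0, 50.0, 100.0):
        print(omega, max_energy_deviation(omega, h=0.05 / omega, t_end=10.0))
```

In this setting the deviation of $I$ stays small compared with $I(0) = 0.5$ and is expected to shrink as $\omega$ grows, in line with the $O(\omega^{-1})$ term of the theorem; the experiment says nothing about the exponentially long time scales mentioned above.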


XIII.6.3 Almost-Invariants of the Numerical Method 


We show that the coefficients of the modulated Fourier expansion of the numerical solution have almost-invariants that are obtained similarly to the above. We denote

$$\begin{aligned} \mathbf y_h &= (y_h^{-N+1},\dots,y_h^{-1},y_h^0,y_h^1,\dots,y_h^{N-1}) \\ \mathbf z_h &= (z_h^{-N+1},\dots,z_h^{-1},z_h^0,z_h^1,\dots,z_h^{N-1}) \end{aligned}$$

with $y_h^0(t) = z_h^0(t) = y_h(t)$ and $y_h^k(t) = e^{ik\omega t}z_h^k(t)$, where $y_h$ and $z_h^k$ are the coefficients of the modulated Fourier expansion of Theorem 5.2. Similar to (6.3) we consider the function

$$\mathcal U_h(\mathbf y_h) = U(\Phi y_h^0) + \sum_{s(\alpha)=0} \frac{1}{m!}\,U^{(m)}(\Phi y_h^0)(\Phi\mathbf y_h)^\alpha , \qquad (6.19)$$

where the sum is again taken over all $m \ge 1$ and all multi-indices $\alpha = (\alpha_1,\dots,\alpha_m)$ with $0 < |\alpha_j| < N$ for which $s(\alpha) = \sum_j \alpha_j = 0$. It then follows from (5.22), multiplied with $h^{-2}\Psi^{-1}\Phi$, that the functions $y_h^k(t)$ satisfy

$$\Psi^{-1}\Phi\,h^{-2}\mathcal L(hD)\,y_h^k = -\nabla_{-k}\,\mathcal U_h(\mathbf y_h) + O(h^N) , \qquad (6.20)$$

where $\mathcal L(hD)$ of (5.16) denotes again the difference operator of the numerical method. The similarity of these relations to (6.4) allows us to obtain almost-conserved quantities that are analogues of $\mathcal H$ and $\mathcal I$ above.


The First Almost-Invariant. We multiply (6.20) by $(\dot y_h^{-k})^T$, and as in (6.9) we obtain

$$\sum_{|k|<N} (\dot y_h^{-k})^T\,\Psi^{-1}\Phi\,h^{-2}\mathcal L(hD)\,y_h^k + \frac{d}{dt}\,\mathcal U_h(\mathbf y_h) = O(h^N) .$$

Since we know bounds of the coefficient functions $z_h^k$ and of their derivatives from Theorem 5.2, we switch to the quantities $z_h^k$ and we get the equivalent relation

$$\sum_{|k|<N} \bigl(\dot z_h^{-k} - ik\omega\,z_h^{-k}\bigr)^T\,\Psi^{-1}\Phi\,h^{-2}\mathcal L(hD + ik\omega h)\,z_h^k + \frac{d}{dt}\,\mathcal U_h(\mathbf z_h) = O(h^N) . \qquad (6.21)$$

We shall show that the left-hand side is the total derivative of an expression that depends only on $z_h^k$ and derivatives thereof. Consider first the term for $k = 0$. The symmetry of the numerical method enters at this very point in the way that the expression $\mathcal L(hD)y = h^2\ddot y + c_4h^4y^{(4)} + c_6h^6y^{(6)} + \dots$ contains only terms with derivatives of an even order. Multiplied with $\dot y^T$, even-order derivatives of $y$ give a total derivative:

$$\dot y^Ty^{(2l)} = \frac{d}{dt}\Bigl(\dot y^Ty^{(2l-1)} - \ddot y^Ty^{(2l-2)} + \dots \mp (y^{(l-1)})^Ty^{(l+1)} \pm \tfrac12\,(y^{(l)})^Ty^{(l)}\Bigr) .$$

Thanks to the symmetry of the difference operator $\mathcal L(hD)$ only expressions of this type appear in the term for $k = 0$ in (6.21), with $z_h^0$ in the role of $y$. Similarly, we get for $z = z_h^k$ and $\bar z = z_h^{-k}$ with $0 < |k| < N$

$$\begin{aligned} \mathrm{Re}\,\dot{\bar z}^Tz^{(2l)} &= \mathrm{Re}\,\frac{d}{dt}\Bigl(\dot{\bar z}^Tz^{(2l-1)} - \dots \mp (\bar z^{(l-1)})^Tz^{(l+1)} \pm \tfrac12\,(\bar z^{(l)})^Tz^{(l)}\Bigr) \\ \mathrm{Re}\,\bar z^Tz^{(2l+1)} &= \mathrm{Re}\,\frac{d}{dt}\Bigl(\bar z^Tz^{(2l)} - \dots \pm (\bar z^{(l-1)})^Tz^{(l+1)} \mp \tfrac12\,(\bar z^{(l)})^Tz^{(l)}\Bigr) \\ \mathrm{Im}\,\dot{\bar z}^Tz^{(2l+1)} &= \mathrm{Im}\,\frac{d}{dt}\Bigl(\dot{\bar z}^Tz^{(2l)} - \ddot{\bar z}^Tz^{(2l-1)} + \dots \mp (\bar z^{(l)})^Tz^{(l+1)}\Bigr) \\ \mathrm{Im}\,\bar z^Tz^{(2l+2)} &= \mathrm{Im}\,\frac{d}{dt}\Bigl(\bar z^Tz^{(2l+1)} - \dot{\bar z}^Tz^{(2l)} + \dots \pm (\bar z^{(l)})^Tz^{(l+1)}\Bigr) . \end{aligned}$$

Using the formulas (5.19) for $\mathcal L(hD + ikh\omega)$, it is seen that the term for $k$ in (6.21) has an asymptotic $h$-expansion with expressions of the above type as coefficients. The left-hand side of (6.21) can therefore be written as the time derivative of a function $\mathcal H_h[\mathbf z_h](t)$ which depends on the values at $t$ of the coefficient function vector $\mathbf z_h$ and its first $N$ time derivatives. The relation (6.21) thus becomes

$$\frac{d}{dt}\,\mathcal H_h[\mathbf z_h](t) = O(h^N) .$$

Together with the estimates of Theorem 5.2, this construction of $\mathcal H_h$ yields the following result.

Lemma 6.4. Under the assumptions of Theorem 5.2, the coefficient functions $\mathbf z_h = (z_h^{-N+1},\dots,z_h^{-1},y_h,z_h^1,\dots,z_h^{N-1})$ of the modulated Fourier expansion of the numerical solution satisfy

$$\mathcal H_h[\mathbf z_h](t) = \mathcal H_h[\mathbf z_h](0) + O(th^N) \qquad (6.22)$$

for $0 \le t \le T$. Moreover,

$$\mathcal H_h[\mathbf z_h](t) = \tfrac12\,\|\dot y_{h,0}(t)\|^2 + \sigma(h\omega)\,2\omega^2\|z^1_{h,1}(t)\|^2 + U(\Phi y_h(t)) + O(h^2) , \qquad (6.23)$$

where $\sigma(h\omega) = \mathrm{sinc}(h\omega)\,\phi(h\omega)/\psi(h\omega)$. □

The Second Almost-Invariant. By the same calculation as in the proof of Theorem 6.2 we obtain for $\mathcal U_h(\mathbf y_h(t))$ of (6.19)

$$0 = \sum_{0<|k|<N} ik\omega\,(y_h^k)^T\nabla_k\,\mathcal U_h(\mathbf y_h) .$$

It then follows from (6.20) that

$$-i\omega \sum_{0<|k|<N} k\,(y_h^{-k})^T\,\Psi^{-1}\Phi\,h^{-2}\mathcal L(hD)\,y_h^k = O(h^N) .$$

Written in the $z$ variables, this becomes

$$-i\omega \sum_{0<|k|<N} k\,(z_h^{-k})^T\,\Psi^{-1}\Phi\,h^{-2}\mathcal L(hD + ik\omega h)\,z_h^k = O(h^N) . \qquad (6.24)$$

As in (6.21), the left-hand expression can be written as the time derivative of a function $\mathcal I_h[\mathbf z_h](t)$ which depends on the values at $t$ of the function $\mathbf z_h$ and its first $N$ derivatives:

$$\frac{d}{dt}\,\mathcal I_h[\mathbf z_h](t) = O(h^N) .$$

Together with the estimates of Theorem 5.2 this yields the following result.

Lemma 6.5. Under the assumptions of Theorem 5.2, the coefficient functions $\mathbf z_h$ of the modulated Fourier expansion of the numerical solution satisfy

$$\mathcal I_h[\mathbf z_h](t) = \mathcal I_h[\mathbf z_h](0) + O(th^N) \qquad (6.25)$$

for $0 \le t \le T$. Moreover,

$$\mathcal I_h[\mathbf z_h](t) = \sigma(h\omega)\,2\omega^2\|z^1_{h,1}(t)\|^2 + O(h^2) , \qquad (6.26)$$

where again $\sigma(h\omega) = \mathrm{sinc}(h\omega)\,\phi(h\omega)/\psi(h\omega)$. □

Symplectic methods have $\psi(\xi) = \mathrm{sinc}(\xi)\,\phi(\xi)$ and hence $\sigma(h\omega) = 1$. To be able to also treat methods where $\sigma(h\omega)$ can be small, we need to sharpen the estimates of Lemma 6.5. Close scrutiny of the equations (5.20) that determine the coefficient functions of the modulated Fourier expansion shows that the $O(h^2)$ term in (6.26) contains a factor $\phi(h\omega)^2$ and that the $O(th^N)$ term in (6.25) can be put in the form $O(t\,\phi(h\omega)^2 h^N) + O(th^{N+m})$ with an arbitrary integer $m > 0$; cf. (5.12). Assume now that

$\phi$ is analytic with no real zeros other than integral multiples of $\pi$. (6.27)

This condition ensures that $|\phi(h\omega)|^2 \ge ch^m$ for some $m$ if $h\omega$ satisfies (5.10).
Under the conditions of Theorem 5.2, in particular (4.1) and (5.10), the improved bounds of the remainder terms yield the following estimates for $\mathcal{I}_h^* = \mathcal{I}_h/\sigma(h\omega)$:

$$\mathcal{I}_h^*[\mathbf{z}_h](t) = \mathcal{I}_h^*[\mathbf{z}_h](0) + O(th^N) \qquad (6.28)$$

$$\mathcal{I}_h^*[\mathbf{z}_h](t) = 2\omega^2\|z^1_{h,1}(t)\|^2 + O(h^2) . \qquad (6.29)$$

Relationship with the Total and the Oscillatory Energy. The almost-invariants

$$\mathcal{I}_h^* = \frac{1}{\sigma(h\omega)}\,\mathcal{I}_h , \qquad \mathcal{H}_h^* = \mathcal{H}_h - \Big(1 - \frac{1}{\sigma(h\omega)}\Big)\,\mathcal{I}_h \qquad (6.30)$$

of the coefficient functions of the modulated Fourier expansion are then close to the total energy $H$ and the oscillatory energy $I$ along the numerical solution $(x_n, \dot x_n)$:

Theorem 6.6. Under the conditions of Theorem 5.2 and condition (6.27),

$$\mathcal{H}_h^*[\mathbf{z}_h](t) = \mathcal{H}_h^*[\mathbf{z}_h](0) + O(th^N) , \qquad \mathcal{I}_h^*[\mathbf{z}_h](t) = \mathcal{I}_h^*[\mathbf{z}_h](0) + O(th^N)$$

$$\mathcal{H}_h^*[\mathbf{z}_h](t) = H(x_n,\dot x_n) + O(h) , \qquad \mathcal{I}_h^*[\mathbf{z}_h](t) = I(x_n,\dot x_n) + O(h)$$

holds for $0 \le t = nh \le T$. The constants symbolized by $O$ depend on $E$, $N$ and $T$.

Proof. The upper two relations follow directly from (6.22) and (6.28). Theorems 5.2 and 5.3 show

$$\omega\,x_{n,1} = \omega\big(e^{i\omega t} z^1_{h,1}(t) + e^{-i\omega t} z^{-1}_{h,1}(t)\big) + O(h)$$
$$\dot x_{n,1} = i\omega\big(e^{i\omega t} z^1_{h,1}(t) - e^{-i\omega t} z^{-1}_{h,1}(t)\big) + O(h) .$$

With the identity $\|v + \bar v\|^2 + \|v - \bar v\|^2 = 4\|v\|^2$, this implies

$$I(x_n,\dot x_n) = 2\omega^2\|z^1_{h,1}(t)\|^2 + O(h) .$$

A comparison with (6.29) then gives the stated relation between $I$ and $\mathcal{I}_h^*$. The relation between $H$ and $\mathcal{H}_h^*$ is proved in the same way, using in addition (6.23). □


XIII.7 Long-Time Near-Conservation of Total 
and Oscillatory Energy 

With the results of the previous section, we can now show that the numerical method nearly preserves the total energy $H$ and the oscillatory energy $I$ over time intervals of length $C_N h^{-N+1}$, for any $N$ for which the non-resonance condition (5.10) is satisfied. Such a result is due to Hairer & Lubich (2000a).

For convenience we restate the assumptions:

• the energy bound (2.3): $\tfrac12\|\dot x(0)\|^2 + \tfrac12\|\Omega x(0)\|^2 \le E$ ;

• the condition on the numerical solution: the values $\Phi x_n$ stay in a compact subset of a domain on which the potential $U$ is smooth;





• the conditions on the filter functions: $\psi$ and $\phi$ are even, real-analytic, and have no real zeros other than integral multiples of $\pi$; they satisfy $\psi(0) = \phi(0) = 1$ and (4.1):

$$|\psi(h\omega)| \le C_1\,\mathrm{sinc}^2(\tfrac12 h\omega) , \qquad |\phi(h\omega)| \le C_2\,|\mathrm{sinc}(\tfrac12 h\omega)| , \qquad |\psi(h\omega)\phi(h\omega)| \le C_3\,|\mathrm{sinc}(h\omega)| ;$$

• the condition $h\omega \ge c_0 > 0$ ;

• the non-resonance condition (5.10): for some $N \ge 2$,

$$|\sin(\tfrac12 kh\omega)| \ge c\sqrt{h} \quad\text{for } k = 1,\dots,N .$$
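
The non-resonance condition is easy to test for a given step size before a long run. The following is a minimal sketch, not part of the original text: the function names `nonresonance_margin` and `is_nonresonant` and the choice of the constant `c` are assumptions made for illustration.

```python
import math


def nonresonance_margin(h, omega, N):
    """Smallest value of |sin(k*h*omega/2)| / sqrt(h) over k = 1, ..., N.

    The non-resonance condition (5.10) holds with constant c exactly when
    this margin is at least c.
    """
    return min(abs(math.sin(0.5 * k * h * omega)) for k in range(1, N + 1)) / math.sqrt(h)


def is_nonresonant(h, omega, N, c):
    """True if |sin(k*h*omega/2)| >= c*sqrt(h) for all k = 1, ..., N."""
    return nonresonance_margin(h, omega, N) >= c


if __name__ == "__main__":
    omega = 50.0
    for h in (0.02, 2.0 * math.pi / 150.0):  # h*omega = 1 and h*omega = 2*pi/3
        print(h * omega, [is_nonresonant(h, omega, N, c=1.0) for N in (2, 3)])
```

For $h\omega = 2\pi/3$ the condition fails at $k = 3$ (a third-order step-size resonance) although it holds for $k = 1, 2$, which is why the admissible $N$ in the theorem below depends on $h\omega$.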

Theorem 7.1. Under the above conditions, the numerical solution of (2.1) obtained by the method (2.7)-(2.8) with (2.9) satisfies

$$H(x_n,\dot x_n) = H(x_0,\dot x_0) + O(h)$$
$$I(x_n,\dot x_n) = I(x_0,\dot x_0) + O(h) \qquad\text{for } 0 \le nh \le h^{-N+1} .$$

The constants symbolized by $O$ are independent of $n$, $h$, $\omega$ satisfying the above conditions, but depend on $N$ and the constants in the conditions.


Proof. The estimates of Theorem 6.6 hold uniformly over bounded intervals. We now apply those estimates repeatedly on intervals of length $h$, for modulated Fourier expansions corresponding to different starting values. As long as $(x_n, \dot x_n)$ satisfies the bounded-energy condition (2.3) (possibly with a larger constant $E$), Theorem 5.2 gives a modulated Fourier expansion that corresponds to starting values $(x_n, \dot x_n)$. We denote the vector of coefficient functions of this expansion by $\mathbf{z}_n(t)$:

$$\mathbf{z}_n = (z_n^{-N+1},\dots,z_n^{-1},y_n,z_n^{1},\dots,z_n^{N-1})$$

(omitting the notational dependence on $h$ for simplicity). Because of the uniqueness, up to $O(h^{N+1})$, of the coefficient functions of the modulated Fourier expansion constructed by (5.20), the following diagram commutes up to terms of size $O(h^{N+1})$:

$$\begin{array}{ccc}
(x_n,\dot x_n) & \xrightarrow{\ \text{numerical method}\ } & (x_{n+1},\dot x_{n+1}) \\
\downarrow & & \downarrow \\
(\mathbf{z}_n(0),\dot{\mathbf{z}}_n(0)) & \xrightarrow{\ \text{flow}\ }\ (\mathbf{z}_n(h),\dot{\mathbf{z}}_n(h))\ = & (\mathbf{z}_{n+1}(0),\dot{\mathbf{z}}_{n+1}(0)) \quad (\text{up to } O(h^{N+1}))
\end{array}$$

The construction of the coefficient functions via (5.20) shows that also higher derivatives of $\mathbf{z}_n$ at $h$ and $\mathbf{z}_{n+1}$ at $0$ differ by only $O(h^{N+1})$. From the above diagram and Theorem 6.6 we thus obtain

$$\mathcal{H}_h^*[\mathbf{z}_{n+1}](0) = \mathcal{H}_h^*[\mathbf{z}_n](h) + O(h^{N+1}) = \mathcal{H}_h^*[\mathbf{z}_n](0) + O(h^{N+1}) .$$

Repeated use of this relation gives

$$\mathcal{H}_h^*[\mathbf{z}_n](0) = \mathcal{H}_h^*[\mathbf{z}_0](0) + O(nh^{N+1}) .$$

Moreover, by Theorem 6.6 the coefficient functions corresponding to the starting values $(x_n, \dot x_n)$ and $(x_0, \dot x_0)$ satisfy

$$\mathcal{H}_h^*[\mathbf{z}_n](0) = H(x_n,\dot x_n) + O(h) ,$$
$$\mathcal{H}_h^*[\mathbf{z}_0](0) = H(x_0,\dot x_0) + O(h) .$$

So we obtain

$$H(x_n,\dot x_n) - H(x_0,\dot x_0) = \mathcal{H}_h^*[\mathbf{z}_n](0) - \mathcal{H}_h^*[\mathbf{z}_0](0) + O(h) = O(nh^{N+1}) + O(h) ,$$

which gives the desired bound for the deviation of the total energy along the numerical solution. The same argument applies to $I(x_n, \dot x_n)$. □
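
The time scale in Theorem 7.1 can be read off from the last estimate. Spelled out as a one-line calculation (using only the restriction $nh \le h^{-N+1}$ from the theorem):

```latex
\[
  O(n h^{N+1}) \;=\; O\bigl((nh)\,h^{N}\bigr) \;\le\; O\bigl(h^{-N+1}\,h^{N}\bigr) \;=\; O(h)
  \qquad \text{for } 0 \le nh \le h^{-N+1},
\]
```

so the accumulated error of the almost-invariant stays of the same size $O(h)$ as the difference between the almost-invariant and the energy itself.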

The imposed bounds on $\psi$ and $\phi$ become important when $h\omega$ is close to an integral multiple of $\pi$. Are these conditions also sufficient to guarantee favourable energy behaviour uniformly in $h\omega$, arbitrarily close to multiples of $\pi$? Unfortunately the answer is negative (see Fig. 2.5 to Fig. 2.7). The analysis of method (2.7)-(2.9) for exact resonances $h\omega = m\pi$ with integer $m$ shows that stronger conditions

$$|\psi(h\omega)| \le C\,|\mathrm{sinc}(h\omega)| , \qquad |\psi(h\omega)\phi(h\omega)| \le C\,\mathrm{sinc}^2(h\omega) \qquad (7.2)$$

are required. Even this is not sufficient for near-conservation of the total and the oscillatory energy for $h\omega$ near a multiple of $\pi$. For linear problems

$$\ddot x + \begin{pmatrix} 0 & 0 \\ 0 & \omega^2 \end{pmatrix} x = -Ax$$

with a two-dimensional symmetric matrix $A$ with $a_{00} > 0$, and with initial values satisfying the bounded-energy condition (2.3), Hairer & Lubich (2000a) show that the numerical method conserves the total energy up to $O(h)$ uniformly for all times and for all values of $h\omega$, if and only if

$$\psi(\xi) = \mathrm{sinc}^2(\xi)\,\phi(\xi) . \qquad (7.3)$$

There is no method (2.7)-(2.8) which approximately preserves the oscillatory energy $I$ uniformly for all $h\omega$ in a fixed open interval that contains a multiple of $2\pi$.

In summary, the bad effect of step-size resonances on the energy behaviour of the method cannot be eliminated, but it can be considerably mitigated by an appropriate choice of the filter functions $\psi$ and $\phi$.


XIII.8 Energy Behaviour of the Störmer-Verlet
Method

The results of Sections XIII.5-XIII.7 provide new insight into the energy behaviour of the classical Störmer-Verlet method. We present in this section weakened versions of results of Hairer & Lubich (2000b).

In applications, the Störmer-Verlet method is typically used with step sizes $h$ for which the product with the highest frequency $\omega$ is in the range of linear stability, but is bounded away from 0. For example, in spatially discretized wave equations, $h\omega$ is known as the CFL number, which is typically kept near 1. Values of $h\omega$ around $\tfrac12$ are often used in molecular dynamics. In contrast, the backward error analysis of Chap. IX explains the long-time energy behaviour only for $h\omega \to 0$.

Consider now applying the Störmer-Verlet method to the nonlinear model problem (2.1)-(2.3),

$$x_{n+1} - 2x_n + x_{n-1} = -h^2\Omega^2 x_n - h^2\nabla U(x_n) , \qquad (8.1)$$

with $h\omega < 2$ for linear stability. The method is made accessible to the analysis of Sections XIII.3-XIII.7 by rewriting it as a trigonometric method (2.6) with a modified frequency:

$$x_{n+1} - 2\cos(h\tilde\Omega)\,x_n + x_{n-1} = -h^2\nabla U(x_n) , \qquad (8.2)$$

where

$$\tilde\Omega = \begin{pmatrix} 0 & 0 \\ 0 & \tilde\omega I \end{pmatrix} \qquad\text{with}\quad \sin(\tfrac12 h\tilde\omega) = \tfrac12 h\omega .$$

The velocity approximation

$$\dot x_n = \frac{x_{n+1} - x_{n-1}}{2h} \qquad (8.3)$$

does not correspond to the velocity approximation (2.11) of the trigonometric method, but this presents only a minor technical difficulty. We show that the following modified energies are well conserved by the Störmer-Verlet method:

$$H^*(x,\dot x) = H(x,\dot x) + \frac{\gamma}{2}\,\|\dot x_1\|^2 , \qquad I^*(x,\dot x) = I(x,\dot x) + \frac{\gamma}{2}\,\|\dot x_1\|^2 , \qquad\text{with}\quad \gamma = \frac{1}{1 - (\tfrac12 h\omega)^2} - 1 . \qquad (8.4)$$

Here $H$ and $I$ are again the total and the oscillatory energy of the system (2.1) (defined with the original $\omega$, not with $\tilde\omega$).

Theorem 8.1. Let the Störmer-Verlet method be applied to the problem (2.1)-(2.3) with a step size $h$ for which $0 < c_0 \le h\omega \le c_1 < 2$ and $|\sin(\tfrac12 kh\tilde\omega)| \ge c\sqrt{h}$ for $k = 1,\dots,N$ for some $N \ge 2$ and $c > 0$. Suppose further that the numerical solution values $x_n$ stay in a region on which all derivatives of $U$ are bounded. Then, the modified energies along the numerical solution satisfy

$$H^*(x_n,\dot x_n) = H^*(x_0,\dot x_0) + O(h)$$
$$I^*(x_n,\dot x_n) = I^*(x_0,\dot x_0) + O(h) \qquad\text{for } 0 \le nh \le h^{-N+1} . \qquad (8.5)$$

The constants symbolized by $O$ are independent of $n$, $h$, $\omega$ with the above conditions.

Proof. With the modified velocities $x'_n$ defined by

$$2h\,\mathrm{sinc}(h\tilde\Omega)\,x'_n = x_{n+1} - x_{n-1} ,$$

method (8.2) becomes a method (2.6) with (2.11), or equivalently (2.7)-(2.8), with $\tilde\omega$ instead of $\omega$ and with $\psi = \phi = 1$.

The condition $0 < c_0 \le h\omega \le c_1 < 2$ implies $|\sin(\tfrac12 kh\tilde\omega)| \ge c_2 > 0$ for $k = 1, 2$, and hence conditions (7.1) are trivially satisfied with $h\tilde\omega$ instead of $h\omega$. We are thus in the position to apply Theorem 7.1, which yields


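$$\tilde H(x_n,x'_n) = \tilde H(x_0,x'_0) + O(h)$$
$$\tilde I(x_n,x'_n) = \tilde I(x_0,x'_0) + O(h) \qquad\text{for } 0 \le nh \le h^{-N+1} , \qquad (8.6)$$

where $\tilde H$ and $\tilde I$ are defined in the same way as $H$ and $I$, but with $\tilde\omega$ in place of $\omega$. The components of the Störmer-Verlet velocities $\dot x_n$ and the modified velocities $x'_n$ are related by

$$\dot x_{n,0} = x'_{n,0} , \qquad \dot x_{n,1} = \mathrm{sinc}(h\tilde\omega)\,x'_{n,1} = \frac{\omega}{\tilde\omega}\sqrt{1 - \tfrac14 h^2\omega^2}\;x'_{n,1} , \qquad (8.7)$$

so that

$$\tilde I(x_n,x'_n) = \tfrac12\|x'_{n,1}\|^2 + \tfrac12\tilde\omega^2\|x_{n,1}\|^2 = \frac{\tilde\omega^2}{\omega^2}\,I^*(x_n,\dot x_n) . \qquad (8.8)$$

Similarly,

$$H^*(x_n,\dot x_n) = \tfrac12\|\dot x_{n,0}\|^2 + U(x_n) + I^*(x_n,\dot x_n) = \tilde H(x_n,x'_n) + \Big(\frac{\omega^2}{\tilde\omega^2} - 1\Big)\,\tilde I(x_n,x'_n) , \qquad (8.9)$$

and hence (8.6) yields the result. □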
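
The role of the modified frequency $\tilde\omega$ and of the correction factor $\gamma$ can be checked on the simplest possible case, the scalar linear oscillator $\ddot x = -\omega^2 x$ (that is, $U = 0$ and no slow component). The sketch below is an illustration under that assumption, not the nonlinear setting of Theorem 8.1; the function names and the velocity-Verlet starting step are choices made here. In this linear case the discrete solution is a pure oscillation with frequency $\tilde\omega$, and $I^*$ of (8.4) is constant up to round-off, while $I$ itself oscillates.

```python
import math


def modified_frequency(h, omega):
    """Solve sin(h*omega_tilde/2) = h*omega/2 for omega_tilde (requires h*omega < 2)."""
    return 2.0 / h * math.asin(0.5 * h * omega)


def gamma(h, omega):
    """Correction factor gamma of (8.4)."""
    return 1.0 / (1.0 - (0.5 * h * omega) ** 2) - 1.0


def verlet_linear(omega, h, x0, v0, nsteps):
    """Stormer-Verlet recursion (8.1) for x'' = -omega^2 x (U = 0, scalar case).

    Returns (x_n, v_n) for n = 1, ..., nsteps, with the central-difference
    velocities v_n = (x_{n+1} - x_{n-1}) / (2h) of (8.3).
    """
    # Starting step: an assumption made here (velocity-Verlet start).
    xs = [x0, x0 + h * v0 - 0.5 * h * h * omega ** 2 * x0]
    for _ in range(nsteps):
        xs.append(2.0 * xs[-1] - xs[-2] - h * h * omega ** 2 * xs[-1])
    return [(xs[n], (xs[n + 1] - xs[n - 1]) / (2.0 * h)) for n in range(1, nsteps + 1)]


def energy_I(x, v, omega):
    """Oscillatory energy I = v^2/2 + omega^2 x^2/2."""
    return 0.5 * v * v + 0.5 * omega ** 2 * x * x


def energy_I_star(x, v, omega, h):
    """Modified oscillatory energy I* = I + (gamma/2) v^2 of (8.4)."""
    return energy_I(x, v, omega) + 0.5 * gamma(h, omega) * v * v


if __name__ == "__main__":
    omega, h = 50.0, 0.016  # h*omega = 0.8
    states = verlet_linear(omega, h, 0.02, 1.0, 2000)
    I = [energy_I(x, v, omega) for x, v in states]
    I_star = [energy_I_star(x, v, omega, h) for x, v in states]
    print("spread of I :", max(I) - min(I))
    print("spread of I*:", max(I_star) - min(I_star))
```

The identity $2\cos(h\tilde\omega) = 2 - h^2\omega^2$, which is what turns (8.1) into (8.2), follows from $\cos(h\tilde\omega) = 1 - 2\sin^2(\tfrac12 h\tilde\omega)$.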


For fixed $h\omega \ge c_0 > 0$ and $h \to 0$, the maximum deviation in the energy does not tend to 0, due to the highly oscillatory term $\tfrac12\gamma\|\dot x_1\|^2$ in $H^*(x,\dot x)$ and $I^*(x,\dot x)$. We show, however, that time averages of $H$ and $I$ are nearly preserved over long time. For an arbitrary fixed $T > 0$, consider the averages over intervals of length $T$,

$$\bar H_n = \frac{h}{T}\sum_{|jh| \le T/2} H(x_{n+j},\dot x_{n+j}) , \qquad \bar I_n = \frac{h}{T}\sum_{|jh| \le T/2} I(x_{n+j},\dot x_{n+j}) . \qquad (8.10)$$

Theorem 8.2. Under the conditions of Theorem 8.1, the time averages of the total and the oscillatory energy along the numerical solution satisfy

$$\bar H_n = \bar H_0 + O(h)$$
$$\bar I_n = \bar I_0 + O(h) \qquad\text{for } 0 \le nh \le h^{-N+1} . \qquad (8.11)$$

The constants symbolized by $O$ are independent of $n$, $h$, $\omega$ with the above conditions.


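Proof. We show

$$\bar H_n = H^*(x_n,\dot x_n) - \frac{\gamma}{2(1+\gamma)}\,I^*(x_n,\dot x_n) + O(h)$$
$$\bar I_n = I^*(x_n,\dot x_n) - \frac{\gamma}{2(1+\gamma)}\,I^*(x_n,\dot x_n) + O(h) , \qquad (8.12)$$

which implies the result by Theorem 8.1. Consider the modulated Fourier expansions of $x_n$ and $x'_n$ for $t = nh$ in a bounded interval. Theorem 5.3 shows that

$$x'_{n,1} = i\tilde\omega\big(e^{i\tilde\omega t} z^1_{h,1}(t) - e^{-i\tilde\omega t} z^{-1}_{h,1}(t)\big) + O(h) , \qquad t = nh ,$$

with $z^1_{h,1}(t)$ from the modulated Fourier expansion of Theorem 5.2 (with $\tilde\omega$ instead of $\omega$). With (8.7) it follows that

$$\dot x_{n,1} = i\omega\sqrt{1 - \tfrac14 h^2\omega^2}\,\big(e^{i\tilde\omega t} z^1_{h,1}(t) - e^{-i\tilde\omega t} z^{-1}_{h,1}(t)\big) + O(h) ,$$

and therefore, recalling the definition of $\gamma$,

$$\|\dot x_{n,1}\|^2 = \frac{\omega^2}{1+\gamma}\Big(2\|z^1_{h,1}(t)\|^2 - 2\,\mathrm{Re}\,e^{2i\tilde\omega t} z^1_{h,1}(t)^2\Big) + O(h) .$$

Theorems 5.2 and 5.3 yield

$$2\tilde\omega^2\|z^1_{h,1}(t)\|^2 = \tilde I(x_n,x'_n) + O(h)$$

and hence, by (8.8),

$$2\omega^2\|z^1_{h,1}(t)\|^2 = I^*(x_n,\dot x_n) + O(h) .$$

A partial summation shows that the time average over the highly oscillatory terms $e^{2i\tilde\omega t}\omega^2 z^1_{h,1}(t)^2$ is $O(h)$. This finally gives

$$\frac{h}{T}\sum_{|jh| \le T/2} \|\dot x_{n+j,1}\|^2 = \frac{1}{1+\gamma}\,I^*(x_n,\dot x_n) + O(h) .$$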

Taking the time averages in the expressions of the definition (8.4) of $H^*$ and $I^*$ then yields (8.12). □

Fig. 8.1. Total energies (left) and their predicted averages (right) for the Störmer-Verlet method and for two different initial values, with $\omega = 50$ and $h$ such that $h\omega = 0.8$


Figure 8.1 illustrates the above result. It shows the total energy $H$ for two different initial values on the left, and the averages as predicted by the expression on the right-hand side of (8.12) on the right picture. The initial values are as in Chap. I with the exception of $x_{1,1}(0)$ and $\dot x_{1,1}(0)$. We take $x_{1,1}(0) = \sqrt{2}/\omega$, $\dot x_{1,1}(0) = 0$ for one set of initial values and $x_{1,1}(0) = 0$, $\dot x_{1,1}(0) = \sqrt{2}$ for the other. The total energies at the initial values are 2.00240032 and 2, respectively.
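
The windowed averages of (8.10) are straightforward to evaluate on a stored energy sequence. The following sketch is illustrative only: the name `window_average`, the small rounding guard, and the synthetic test signal are assumptions made here, not part of the original text. It shows that a term oscillating with frequency $2\tilde\omega$ is largely averaged out, which is the mechanism behind Theorem 8.2.

```python
import math


def window_average(values, n, h, T):
    """(h/T) * sum of values[n+j] over all integers j with |j*h| <= T/2, as in (8.10).

    `values[m]` is the quantity (for example an energy) at grid point t_m = m*h.
    Note that the weight is h/T, so a constant sequence c is mapped to
    c * (2J+1) * h / T with J = floor(T/(2h)), which is c up to O(h).
    """
    J = int(math.floor(T / (2.0 * h) + 1e-12))  # guard against rounding of T/(2h)
    if n - J < 0 or n + J >= len(values):
        raise ValueError("averaging window does not fit inside the sequence")
    return h / T * sum(values[n + j] for j in range(-J, J + 1))


if __name__ == "__main__":
    h, T = 0.016, 1.0
    omega_tilde = 51.44  # illustrative value, roughly the modified frequency for h*omega = 0.8
    # synthetic signal: constant level 2 plus a highly oscillatory term
    signal = [2.0 + math.cos(2.0 * omega_tilde * m * h) for m in range(400)]
    print(window_average(signal, 200, h, T))
```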


XIII.9 Systems with Several Constant Frequencies 

This section studies the conservation of invariants and almost-invariants along numerical approximations of an extension of (2.1) to systems with the Hamiltonian function

$$H(p,q) = \tfrac12\,p^T M^{-1} p + \frac{1}{2\varepsilon^2}\,q^T A q + U(q) \qquad (9.1)$$

with a positive definite constant matrix $M$ and a positive semi-definite constant matrix $A$. With the Cholesky decomposition $M = LL^T$ and the canonical transformation $\tilde p = L^{-1}p$, $\tilde q = L^T q$ we obtain a Hamiltonian where the mass matrix is the identity matrix and $A$ is transformed to $\tilde A = L^{-1}AL^{-T}$. Diagonalizing $\tilde A = Q\Lambda Q^T$ and transforming to $x = Q^T\tilde q$ then yields a Hamiltonian of the form (we omit the tilde on $U(x) = U(q)$ and $H(x,\dot x) = H(p,q)$)

$$H(x,\dot x) = \frac12\sum_{j=0}^{\ell}\Big(\|\dot x_j\|^2 + \frac{\lambda_j^2}{\varepsilon^2}\,\|x_j\|^2\Big) + U(x) , \qquad (9.2)$$

where $x = (x_0, x_1, \dots, x_\ell)$ with $x_j \in \mathbb{R}^{d_j}$, $\lambda_0 = 0$, and $\lambda_j > 0$ for $j \ge 1$ are all distinct. After rescaling $\varepsilon$ we may assume $\lambda_j \ge 1$ for $j = 1,\dots,\ell$.

Following Cohen, Hairer & Lubich (2004) we extend the results of the previous sections to the multi-frequency case $\ell > 1$. Modulated Fourier expansions are again the basic analytical tool. A new aspect is possible resonance among the $\lambda_j$.


XIII.9.1 Oscillatory Energies and Resonances

The equations of motion for the Hamiltonian system (9.2) can be written as the system of second-order differential equations

$$\ddot x = -\Omega^2 x + g(x) , \qquad (9.3)$$

where $\Omega = \mathrm{diag}(\omega_j I)$ with the frequencies $\omega_j = \lambda_j/\varepsilon$ and $g(x) = -\nabla U(x)$. As suitable numerical methods we consider again the class of trigonometric integrators studied in Sect. XIII.2, (2.6) with (2.11), with filter functions $\psi$ and $\phi$.
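
For orientation, here is a minimal scalar sketch of the two-step recursion behind this class of methods. It assumes that (2.6) has the form $x_{n+1} - 2\cos(h\Omega)x_n + x_{n-1} = h^2\Psi\,g(\Phi x_n)$ with $\Psi = \psi(h\Omega)$, $\Phi = \phi(h\Omega)$, which is inferred here from the difference operator $L(hD) = e^{hD} - 2\cos h\Omega + e^{-hD}$ and from (8.2); the function `trig_two_step`, its argument list and the restriction to a single frequency are illustrative choices, not the text's formulation.

```python
import math


def sinc(xi):
    """sinc(xi) = sin(xi)/xi with sinc(0) = 1."""
    return 1.0 if xi == 0.0 else math.sin(xi) / xi


def trig_two_step(omega, h, g, psi, phi, x0, x1, nsteps):
    """Scalar two-step recursion (assumed form of (2.6)):

        x_{n+1} - 2 cos(h*omega) x_n + x_{n-1} = h^2 psi(h*omega) g(phi(h*omega) x_n).

    Returns the list [x_0, x_1, ..., x_nsteps].
    """
    c = math.cos(h * omega)
    Psi = psi(h * omega)
    Phi = phi(h * omega)
    xs = [x0, x1]
    for _ in range(nsteps - 1):
        xs.append(2.0 * c * xs[-1] - xs[-2] + h * h * Psi * g(Phi * xs[-1]))
    return xs


if __name__ == "__main__":
    omega, h = 70.0, 2.0 / 70.0  # h*omega = 2, a large step for this frequency
    # with g = 0 the recursion reproduces cos(omega*t) exactly, for any step size
    xs = trig_two_step(omega, h, lambda x: 0.0, sinc, lambda xi: 1.0,
                       1.0, math.cos(h * omega), 100)
    print(max(abs(x - math.cos(omega * n * h)) for n, x in enumerate(xs)))
```

The defining feature visible here is that the linear part is integrated exactly regardless of $h\omega$; the filter functions only enter through the nonlinear term.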

We are interested in the long-time near-conservation of the total energy $H(x,\dot x)$ and the oscillatory energies

$$I_j(x,\dot x) = \frac12\Big(\|\dot x_j\|^2 + \frac{\lambda_j^2}{\varepsilon^2}\,\|x_j\|^2\Big) \qquad\text{for } j \ge 1 \qquad (9.4)$$

or suitable linear combinations thereof. Benettin, Galgani & Giorgilli (1989) have shown that the quantities

$$I_\mu(x,\dot x) = \sum_{j=1}^{\ell}\frac{\mu_j}{\lambda_j}\,I_j(x,\dot x) \qquad (9.5)$$

are approximately preserved along every bounded solution of the Hamiltonian system that has a total energy bounded independently of $\varepsilon$, on exponentially long time intervals of size $O(e^{c/\varepsilon})$ if the potential $U(x)$ is analytic and $\mu = (\mu_1,\dots,\mu_\ell)$ is orthogonal to the resonance module

$$\mathcal{M} = \{k \in \mathbb{Z}^\ell : k_1\lambda_1 + \dots + k_\ell\lambda_\ell = 0\} , \qquad (9.6)$$

if a diophantine non-resonance condition holds outside $\mathcal{M}$. (Cf. also Sect. XIII.9.4 below.)

Since $\mu = \lambda$ is orthogonal to $\mathcal{M}$, the total oscillatory energy $\sum_{j=1}^{\ell} I_j(x,\dot x)$ of the system is approximately preserved independently of the resonance module $\mathcal{M}$. Subtracting this expression from the total energy (1.7), we see that also the smooth energy

$$K(x,\dot x) = \tfrac12\|\dot x_0\|^2 + U(x) \qquad (9.7)$$

is approximately preserved. With an $\varepsilon$-independent bound of the total energy $H(x,\dot x)$ we have $x_j = O(\varepsilon)$ for $j = 1,\dots,\ell$, so that $K(x,\dot x)$ is close to the Hamiltonian of the reduced system in which all oscillatory degrees of freedom are taken out,

$$H_0(x_0,\dot x_0) = \tfrac12\|\dot x_0\|^2 + U(x_0,0,\dots,0) .$$

Example 9.1. To illustrate the conservation of the various energies, we consider a Hamiltonian (1.7) with $\ell = 3$, $\lambda = (1, \sqrt{2}, 2)$ and we assume that the dimensions of $x_j$ are all 1 with the exception of that of $x_1 = (x_{1,1}, x_{1,2})$ which is 2. The resonance module is then $\mathcal{M} = \{(k_1, 0, k_3)\,;\ k_1 + 2k_3 = 0\}$. We take $\varepsilon^{-1} = \omega = 70$, the potential

$$U(x) = (0.05 + x_{1,1} + x_{1,2} + x_2 + 2.5\,x_3)^4 + \tfrac18\,x_0^2\,x_{1,1}^2 + \tfrac12\,x_0^2 , \qquad (9.8)$$

and $x(0) = (1, 0.3\varepsilon, 0.8\varepsilon, -1.1\varepsilon, 0.7\varepsilon)$, $\dot x(0) = (-0.2, 0.6, 0.7, -0.9, 0.8)$ as initial values. We consider $I_\mu$ for $\mu = (1, 0, 2)$ and $\mu = (0, \sqrt{2}, 0)$, which are both orthogonal to $\mathcal{M}$. In Fig. 9.1 we plot the oscillatory energies for the individual components of the system. The corresponding frequencies are attached to the curves. We also plot the sum $I_1 + I_3$ of the three oscillatory energies corresponding to the resonant frequencies $1/\varepsilon$ and $2/\varepsilon$. We see that $I_1 + I_3$ as well as $I_2$ (which are $I_\mu$ for the above two vectors $\mu \perp \mathcal{M}$) are well conserved over long times up to small oscillations of size $O(\varepsilon)$. There is an energy exchange between the two components corresponding to the same frequency $1/\varepsilon$, and on a larger scale an energy exchange between $I_1$ and $I_3$.

Fig. 9.1. Oscillatory energies of the individual components (the frequencies $\lambda_j\omega = \lambda_j/\varepsilon$ are indicated) and the sum $I_1 + I_3$ of the oscillatory energies corresponding to the resonant frequencies $\omega$ and $2\omega$
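
The resonance module and the combinations $I_\mu$ of Example 9.1 can be checked by brute force. The sketch below is an illustration, not part of the original text: the function names, the search bound `N` and the floating-point tolerance are assumptions. It enumerates the elements of $\mathcal{M}$ with $|k| \le N$ for $\lambda = (1, \sqrt{2}, 2)$, tests orthogonality of $\mu$, and evaluates (9.4) and (9.5) at the initial values of the example.

```python
import itertools
import math


def resonance_vectors(lam, N, tol=1e-12):
    """All integer vectors k != 0 with |k| = sum |k_i| <= N and |k . lam| < tol."""
    found = []
    for k in itertools.product(range(-N, N + 1), repeat=len(lam)):
        norm1 = sum(abs(ki) for ki in k)
        if 0 < norm1 <= N and abs(sum(ki * li for ki, li in zip(k, lam))) < tol:
            found.append(k)
    return found


def is_orthogonal(mu, vectors, tol=1e-12):
    """True if mu . k = 0 for every k in `vectors`."""
    return all(abs(sum(m * ki for m, ki in zip(mu, k))) < tol for k in vectors)


def oscillatory_energy(xj, vj, lam_j, eps):
    """I_j of (9.4) for one frequency block with positions xj and velocities vj."""
    return 0.5 * (sum(v * v for v in vj) + (lam_j / eps) ** 2 * sum(x * x for x in xj))


def I_mu(mu, lam, energies):
    """Linear combination (9.5): sum_j (mu_j / lambda_j) I_j."""
    return sum(m / l * I for m, l, I in zip(mu, lam, energies))


if __name__ == "__main__":
    lam = (1.0, math.sqrt(2.0), 2.0)
    eps = 1.0 / 70.0
    module = resonance_vectors(lam, N=6)
    print("resonant k with |k| <= 6:", module)
    print("mu = (1,0,2) orthogonal:", is_orthogonal((1.0, 0.0, 2.0), module))
    # initial values of Example 9.1, split into the blocks x_1 (dimension 2), x_2, x_3
    I1 = oscillatory_energy((0.3 * eps, 0.8 * eps), (0.6, 0.7), lam[0], eps)
    I2 = oscillatory_energy((-1.1 * eps,), (-0.9,), lam[1], eps)
    I3 = oscillatory_energy((0.7 * eps,), (0.8,), lam[2], eps)
    print("I_1 + I_3 =", I_mu((1.0, 0.0, 2.0), lam, (I1, I2, I3)))
```

The smallest resonant vectors found are $\pm(2, 0, -1)$, so the lowest resonance order for this example is 3.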

Numerical Experiment. As a first method we take (2.6) with $\phi(\xi) = 1$ and $\psi(\xi) = \mathrm{sinc}(\xi)$, and we apply it with large step sizes so that $h\omega = h/\varepsilon$ takes the values 1, 2, 4, and 8. Figure 9.2 shows the various oscillatory energies which can be compared to the exact values in Fig. 9.1. For all step sizes, the oscillatory energy corresponding to the frequency $\sqrt{2}\,\omega$ and the sum $I_1 + I_3$ are well conserved on long time intervals. Oscillations in these expressions increase with $h$. The energy exchange between resonant frequencies is close to that of the exact solution. We have not plotted the total energy $H(x_n,\dot x_n)$ nor the smooth energy $K(x_n,\dot x_n)$ of (9.7). Both are well conserved over long times.

We repeat this experiment with the method where $\phi(\xi) = 1$ and $\psi(\xi) = \mathrm{sinc}^2(\xi/2)$ (Fig. 9.3). Only the oscillatory energy corresponding to $\sqrt{2}\,\omega$ is approximately conserved over long times. Neither the expression $I_1 + I_3$ nor the total energy (not shown) are conserved. The smooth energy $K(x_n,\dot x_n)$ is, however, well conserved.

Figure 9.4 shows the corresponding result for the method with $\phi(\xi) = \mathrm{sinc}(\xi)$ and $\psi(\xi) = \mathrm{sinc}(\xi)\phi(\xi)$. The oscillatory energy for $\sqrt{2}\,\omega$ and also $I_1 + I_3$ are well conserved. However, the energy exchange between the resonant frequencies is not correctly reproduced.

Fig. 9.2. Oscillatory energies as in Fig. 9.1 along the numerical solution of (2.6) with $\phi(\xi) = 1$ and $\psi(\xi) = \mathrm{sinc}(\xi)$

Fig. 9.3. Oscillatory energies as in Fig. 9.1 along the numerical solution of (2.6) with $\phi(\xi) = 1$ and $\psi(\xi) = \mathrm{sinc}^2(\xi/2)$

Fig. 9.4. Oscillatory energies as in Fig. 9.1 along the numerical solution of (2.6) with $\phi(\xi) = \mathrm{sinc}(\xi)$ and $\psi(\xi) = \mathrm{sinc}(\xi)\phi(\xi)$


XIII.9.2 Multi-Frequency Modulated Fourier Expansions 

The above numerical phenomena can be understood with a multi-frequency version of the modulated Fourier expansions studied in the previous chapter. We only outline the derivation and properties, since they are in large parts similar to the single-frequency case. More details can be found in Cohen, Hairer & Lubich (2004). We assume conditions that extend those of the previous sections:

• The energy of the initial values is bounded independently of $\varepsilon$,

$$\tfrac12\|\dot x(0)\|^2 + \tfrac12\|\Omega x(0)\|^2 \le E . \qquad (9.9)$$

• The numerical solution values $\Phi x_n$ stay in a compact subset of a domain on which the potential $U$ is smooth.

• We impose a lower bound on the step size: $h/\varepsilon \ge c_0 > 0$.

• We assume the numerical non-resonance condition

$$\Big|\sin\Big(\frac{h}{2\varepsilon}\,k\cdot\lambda\Big)\Big| \ge c\sqrt{h} \qquad\text{for all } k \in \mathbb{Z}^\ell \setminus \mathcal{M} \text{ with } |k| \le N , \qquad (9.10)$$

for some $N \ge 2$ and $c > 0$.

For the filter functions we assume that for $\xi_j = h\lambda_j/\varepsilon$ ($j = 1,\dots,\ell$)

$$|\psi(\xi_j)| \le C_1\,\mathrm{sinc}^2(\tfrac12\xi_j) , \qquad |\phi(\xi_j)| \le C_2\,|\mathrm{sinc}(\tfrac12\xi_j)| , \qquad |\psi(\xi_j)| \le C_3\,|\mathrm{sinc}(\xi_j)\,\phi(\xi_j)| . \qquad (9.11)$$

The conditions on the filter functions are somewhat stronger than necessary, but they facilitate the presentation in the following.

For a given vector $\lambda = (\lambda_1,\dots,\lambda_\ell)$ and for the resonance module $\mathcal{M}$ defined by (9.6), we let $\mathcal{K}$ be a set of representatives of the equivalence classes in $\mathbb{Z}^\ell/\mathcal{M}$ which are chosen such that for each $k \in \mathcal{K}$ the sum $|k| = |k_1| + \dots + |k_\ell|$ is minimal in the equivalence class $[k] = k + \mathcal{M}$, and with $k \in \mathcal{K}$, also $-k \in \mathcal{K}$. We denote, for $N$ of (6.3),

$$\mathcal{N} = \{k \in \mathcal{K} : |k| \le N\} , \qquad \mathcal{N}^* = \mathcal{N} \setminus \{(0,\dots,0)\} . \qquad (9.12)$$

The following multi-frequency version of Theorem XIII.5.2 establishes a modulated Fourier expansion for the numerical solution.

Theorem 9.2. Consider the numerical solution of the system (9.3) by the method (2.6) with step size $h$. Under conditions (9.9)-(9.11), the numerical solution admits an expansion

$$x_n = y(t) + \sum_{k \in \mathcal{N}^*} e^{ik\cdot\omega t}\,z^k(t) + \Psi\cdot O(t^2 h^N) \qquad (9.13)$$

with $\omega = \lambda/\varepsilon$, uniformly for $0 \le t = nh \le T$ and $\varepsilon$ and $h$ satisfying $h/\varepsilon \ge c_0 > 0$. The modulation functions together with all their derivatives (up to some arbitrarily fixed order) are bounded by

$$y_0 = O(1) , \qquad y_j = O(\varepsilon^2)$$
$$z_j^{\pm\langle j\rangle} = O(\varepsilon) , \qquad \dot z_j^{\pm\langle j\rangle} = O(\varepsilon^2) \qquad (9.14)$$
$$z_j^{k} = O(\varepsilon^{|k|}) \qquad\text{for } k \ne \pm\langle j\rangle$$

for $j = 1,\dots,\ell$. Here, $\langle j\rangle = (0,\dots,1,\dots,0)$ is the $j$th unit vector. The last estimate holds also for $z_0^k$ for all $k \in \mathcal{N}^*$. Moreover, the function $y$ is real-valued and $z^{-k} = \overline{z^k}$ for all $k \in \mathcal{N}^*$. The constants symbolized by the $O$-notation are independent of $h$, $\varepsilon$ and $\lambda_j$ with (9.10), but they depend on $E$, $N$, $c$, and $T$.

The proof extends that of Theorem XIII.5.2. In terms of the difference operator of the method, $L(hD) = e^{hD} - 2\cos h\Omega + e^{-hD}$, the functions $y(t)$ and $z^k(t)$ are constructed such that, up to terms of size $\Psi\cdot O(h^{N+2})$,

$$L(hD)\,y = h^2\Psi\Big(g(\Phi y) + \sum_{s(\alpha)\sim 0}\frac{1}{m!}\,g^{(m)}(\Phi y)(\Phi z)^\alpha\Big)$$

$$L(hD + ihk\cdot\omega)\,z^k = h^2\Psi\sum_{s(\alpha)\sim k}\frac{1}{m!}\,g^{(m)}(\Phi y)(\Phi z)^\alpha .$$

Here, the sums on the right-hand side are over all $m \ge 1$ and over multi-indices $\alpha = (\alpha_1,\dots,\alpha_m)$ with $\alpha_j \in \mathcal{N}^*$, for which the sum $s(\alpha) = \sum_{j=1}^{m}\alpha_j$ satisfies the relation $s(\alpha) \sim k$, that is, $s(\alpha) - k \in \mathcal{M}$. The notation $(\Phi z)^\alpha$ is short for the $m$-tuple $(\Phi z^{\alpha_1},\dots,\Phi z^{\alpha_m})$.

A similar expansion to that for $x_n$ exists also for the velocity approximation $\dot x_n$, like in Theorem XIII.5.3. As a consequence, the oscillatory energy (9.4) along the numerical solution takes the form, at $t = nh \le T$,

$$I_j(x_n,\dot x_n) = 2\omega_j^2\,\|z_j^{\langle j\rangle}(t)\|^2 + O(\varepsilon) . \qquad (9.15)$$

With the first terms of the modulated Fourier expansion one proves, as in Theorems XIII.4.1 and XIII.4.2, error bounds over bounded time intervals which are of second order in the positions and of first order in the velocities:

$$\|x_n - x(t_n)\| \le Ch^2 , \qquad \|\dot x_n - \dot x(t_n)\| \le Ch , \qquad (9.16)$$

where $C$ is independent of $\varepsilon$, $h$ and $n$ with $nh \le T$ and of bounds of solution derivatives.

XIII.9.3 Almost-Invariants of the Modulation System 

With $y^0(t) = z^0(t) = y(t)$ and $y^k(t) = e^{ik\cdot\omega t}z^k(t)$ for $k \in \mathcal{N}^*$, where $y$ and $z^k$ are the modulation functions of Theorem 9.2, we denote

$$\mathbf{y} = (y^k)_{k\in\mathcal{N}} , \qquad \mathbf{z} = (z^k)_{k\in\mathcal{N}} .$$

We introduce the extended potential

$$\mathcal{U}(\mathbf{y}) = U(\Phi y^0) + \sum_{s(\alpha)\sim 0}\frac{1}{m!}\,U^{(m)}(\Phi y^0)(\Phi\mathbf{y})^\alpha , \qquad (9.17)$$

where the sum is again taken over all $m \ge 1$ and all multi-indices $\alpha = (\alpha_1,\dots,\alpha_m)$ with $\alpha_j \in \mathcal{N}^*$ for which $s(\alpha) = \sum_j \alpha_j \in \mathcal{M}$. The functions $y^k(t)$ then satisfy

$$\Psi^{-1}\Phi\,h^{-2}L(hD)\,y^k = -\nabla_{-k}\,\mathcal{U}(\mathbf{y}) + \Phi\cdot O(h^N) , \qquad (9.18)$$

where $\nabla_{-k}$ denotes the gradient with respect to the variable $y^{-k}$. This system has almost-invariants that are related to the Hamiltonian $H$ and the oscillatory energies $I_\mu$ with $\mu \perp \mathcal{M}$.

The Energy-Type Almost-Invariant of the Modulation System. We multiply (9.18) by $(\dot y^{-k})^T$ and sum over $k \in \mathcal{N}$ to obtain

$$\sum_{k\in\mathcal{N}} (\dot y^{-k})^T\,\Psi^{-1}\Phi\,h^{-2}L(hD)\,y^k + \frac{d}{dt}\,\mathcal{U}(\mathbf{y}) = O(h^N) .$$

Since we know bounds of the modulation functions $z^k$ and of their derivatives from Theorem 9.2, we rewrite this relation in terms of the quantities $z^k$:

$$\sum_{k\in\mathcal{N}} (\dot z^{-k} - ik\cdot\omega\,z^{-k})^T\,\Psi^{-1}\Phi\,h^{-2}L(hD + ihk\cdot\omega)\,z^k + \frac{d}{dt}\,\mathcal{U}(\mathbf{z}) = O(h^N) . \qquad (9.19)$$

As in (6.21) we obtain that the left-hand side of (9.19) can be written as the time derivative of a function $\mathcal{H}^*[\mathbf{z}](t)$ which depends on the values at $t$ of the modulation-function vector $\mathbf{z}$ and its first $N$ time derivatives. The relation (9.19) thus becomes

$$\frac{d}{dt}\,\mathcal{H}^*[\mathbf{z}](t) = O(h^N) .$$

Together with the estimates of Theorem 9.2 this construction of $\mathcal{H}^*$ yields the following multi-frequency extension of Lemma XIII.6.4.

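Lemma 9.3. Under the assumptions of Theorem 9.2, the modulation functions $\mathbf{z} = (z^k)_{k\in\mathcal{N}}$ of the numerical solution satisfy

$$\mathcal{H}^*[\mathbf{z}](t) = \mathcal{H}^*[\mathbf{z}](0) + O(th^N) \qquad (9.20)$$

for $0 \le t \le T$. Moreover, at $t = nh$ we have

$$\mathcal{H}^*[\mathbf{z}](t) = H^*(x_n,\dot x_n) + O(h) , \qquad (9.21)$$

where, with $\sigma(\xi) = \mathrm{sinc}(\xi)\,\phi(\xi)/\psi(\xi)$ and $\xi_j = h\lambda_j/\varepsilon$,

$$H^*(x,\dot x) = H(x,\dot x) + \sum_{j=1}^{\ell}\big(\sigma(\xi_j) - 1\big)\,I_j(x,\dot x) . \qquad (9.22)$$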

The Momentum-Type Almost-Invariants of the Modulation System. The equations (9.18) have further almost-invariants that result from invariance properties of the extended potential $\mathcal{U}$, similarly as the conservation of angular momentum results from an invariance of the potential $U$ in a mechanical system by Noether's theorem.

For $\mu \in \mathbb{R}^\ell$ and $\mathbf{y} = (y^k)_{k\in\mathcal{N}}$ we set

$$S_\mu(\tau)\,\mathbf{y} = \big(e^{i(k\cdot\mu)\tau}\,y^k\big)_{k\in\mathcal{N}} , \qquad \tau \in \mathbb{R} ,$$

so that, by the multi-linearity of the derivative, the definition (9.17) yields

$$\mathcal{U}(S_\mu(\tau)\,\mathbf{y}) = U(\Phi y^0) + \sum_{s(\alpha)\sim 0}\frac{e^{i(s(\alpha)\cdot\mu)\tau}}{m!}\,U^{(m)}(\Phi y^0)(\Phi\mathbf{y})^\alpha . \qquad (9.23)$$

If $\mu \perp \mathcal{M}$, then the relation $s(\alpha) \sim 0$ implies $s(\alpha)\cdot\mu = 0$, and hence the expression (9.23) is independent of $\tau$. It therefore follows that

$$0 = \frac{d}{d\tau}\Big|_{\tau=0}\,\mathcal{U}(S_\mu(\tau)\,\mathbf{y}) = \sum_{k\in\mathcal{N}} i(k\cdot\mu)\,(y^k)^T\,\nabla_k\,\mathcal{U}(\mathbf{y})$$

for all vectors $\mathbf{y} = (y^k)_{k\in\mathcal{N}}$. If $\mu$ is not orthogonal to $\mathcal{M}$, some terms in the sum of (9.23) depend on $\tau$. However, for these terms with $s(\alpha) \in \mathcal{M}$ and $s(\alpha)\cdot\mu \ne 0$ we have $|s(\alpha)| \ge M = \min\{|k| : 0 \ne k \in \mathcal{M}\}$ and if $\mu \perp \mathcal{M}_N$, then $|s(\alpha)| \ge N + 1$. The bounds (5.13) then yield

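$$\sum_{k\in\mathcal{N}} i(k\cdot\mu)\,(y^k)^T\,\nabla_k\,\mathcal{U}(\mathbf{y}) = \begin{cases} O(\varepsilon^{M}) & \text{for arbitrary } \mu \\ O(\varepsilon^{N+1}) & \text{for } \mu \perp \mathcal{M}_N \end{cases} \qquad (9.24)$$

for the vector $\mathbf{y} = \mathbf{y}(t)$ as given by Theorem 9.2. Multiplying the relation (9.18) by $\frac{i}{\varepsilon}(-k\cdot\mu)\,(y^{-k})^T$ and summing over $k \in \mathcal{N}$, we obtain with (9.24) that

$$-\frac{i}{\varepsilon}\sum_{k\in\mathcal{N}} (k\cdot\mu)\,(y^{-k})^T\,\Psi^{-1}\Phi\,h^{-2}L(hD)\,y^k = O(h^N) + O(\varepsilon^{M-1}) .$$

The $O(\varepsilon^{M-1})$ term is not present for $\mu \perp \mathcal{M}_N$. Written in the $z$ variables, this becomes

$$-\frac{i}{\varepsilon}\sum_{k\in\mathcal{N}} (k\cdot\mu)\,(z^{-k})^T\,\Psi^{-1}\Phi\,h^{-2}L(hD + ihk\cdot\omega)\,z^k = O(h^N) + O(\varepsilon^{M-1}) . \qquad (9.25)$$

As in (9.19), the left-hand expression turns out to be the time derivative of a function $\mathcal{I}^*_\mu[\mathbf{z}](t)$ which depends on the values at $t$ of the function $\mathbf{z}$ and its first $N$ derivatives:

$$\frac{d}{dt}\,\mathcal{I}^*_\mu[\mathbf{z}](t) = O(h^N) + O(\varepsilon^{M-1}) .$$

Together with Theorem 9.2 this yields the following.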

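Lemma 9.4. Under the assumptions of Theorem 9.2, the modulation functions $\mathbf{z}$ satisfy

$$\mathcal{I}^*_\mu[\mathbf{z}](t) = \mathcal{I}^*_\mu[\mathbf{z}](0) + O(th^N) + O(t\varepsilon^{M-1}) \qquad (9.26)$$

for all $\mu \in \mathbb{R}^\ell$ and for $0 \le t \le T$. They satisfy

$$\mathcal{I}^*_\mu[\mathbf{z}](t) = \mathcal{I}^*_\mu[\mathbf{z}](0) + O(th^N) \qquad (9.27)$$

for $\mu \perp \mathcal{M}_N$ and $0 \le t \le T$. Moreover, at $t = nh$,

$$\mathcal{I}^*_\mu[\mathbf{z}](t) = I^*_\mu(x_n,\dot x_n) + O(\varepsilon) , \qquad (9.28)$$

where, again with $\sigma(\xi) = \mathrm{sinc}(\xi)\,\phi(\xi)/\psi(\xi)$,

$$I^*_\mu(x,\dot x) = \sum_{j=1}^{\ell}\sigma(\xi_j)\,\frac{\mu_j}{\lambda_j}\,I_j(x,\dot x) . \qquad (9.29)$$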

XIII.9.4 Long-Time Near-Conservation of Total and Oscillatory 
Energies 


With the proof of Theorem XIII.7.1, the above two lemmas yield the following results from Cohen, Hairer & Lubich (2004).

Theorem 9.5. Under conditions (9.9)-(9.11), the numerical solution obtained by method (2.6) with (2.11) satisfies, for $H^*$ and $I^*_\mu$ defined by (9.22) and (9.29),

$$H^*(x_n,\dot x_n) = H^*(x_0,\dot x_0) + O(h)$$
$$I^*_\mu(x_n,\dot x_n) = I^*_\mu(x_0,\dot x_0) + O(h) \qquad\text{for } 0 \le nh \le h^{-N+1}$$

for $\mu \in \mathbb{R}^\ell$ with $\mu \perp \mathcal{M}_N = \{k \in \mathcal{M} : |k| \le N\}$. The constants symbolized by $O$ are independent of $n$, $h$, $\varepsilon$, $\lambda_j$ satisfying the above conditions, but depend on $N$ and the constants in the conditions.

Since $\mu = \lambda$ is always orthogonal to $\mathcal{M}$ and to $\mathcal{M}_N$, the relation

$$K(x,\dot x) = H^*(x,\dot x) - I^*_\lambda(x,\dot x)$$

for the smooth energy (9.7) implies

$$K(x_n,\dot x_n) = K(x_0,\dot x_0) + O(h) \qquad\text{for } 0 \le nh \le h^{-N+1} . \qquad (9.30)$$

For $\sigma(\xi) = 1$ (or equivalently $\psi(\xi) = \mathrm{sinc}(\xi)\phi(\xi)$) the modified energies $H^*$ and $I^*_\mu$ are identical to the original energies $H$ and $I_\mu$ of (9.2) and (9.5). The condition $\psi(\xi) = \mathrm{sinc}(\xi)\phi(\xi)$ is known to be equivalent to the symplecticity of the one-step method $(x_n,\dot x_n) \mapsto (x_{n+1},\dot x_{n+1})$, but its appearance in the above theorem is caused by a different mechanism which is not in any obvious way related to symplecticity. Without this condition we still have the following result, which also considers the long-time near-conservation of the individual oscillatory energies $I_j$ for $j = 1,\dots,\ell$.

Theorem 9.6. Under conditions (9.9)-(9.11), the numerical solution obtained by method (2.6) with (2.11) satisfies

$$H(x_n,\dot x_n) = H(x_0,\dot x_0) + O(h)$$
$$I_j(x_n,\dot x_n) = I_j(x_0,\dot x_0) + O(h) \qquad\text{for } 0 \le nh \le h\cdot\min(\varepsilon^{-M+1}, h^{-N})$$

for $j = 1,\dots,\ell$, with $M = \min\{|k| : 0 \ne k \in \mathcal{M}\}$. The constants symbolized by $O$ are independent of $n$, $h$, $\varepsilon$, $\lambda_j$ satisfying the above conditions, but depend on $N$ and the constants in the conditions.

For the non-resonant case $\mathcal{M} = \{0\}$ we have $M = \infty$ and hence the length of the interval with energy conservation is only restricted by (9.10). Notice that always $M \ge 3$, and $M = 3$ only in the case of a 1:2 resonance among the $\lambda_j$. For a 1:3 resonance we have $M = 4$ and in all other cases $M \ge 5$.

Explanation of the Numerical Experiment of Sect. XIII.9.1. All numerical methods in Figs. 9.2-9.4 satisfy the conditions of Theorems 9.6 and 9.5 for the step sizes considered.

In Fig. 9.2 we have the (symplectic) method (2.6) with $\phi(\xi) = 1$ and $\psi(\xi) = \mathrm{sinc}(\xi)$, which has $\sigma(\xi) = 1$, so that $H$ and $H^*$, and $I_\mu$ and $I^*_\mu$ coincide. For all step sizes, the oscillatory energy $I_2$ corresponding to the non-resonant frequency $\sqrt{2}\,\omega$ and the sum $I_1 + I_3$ are well conserved on long time intervals, in accordance with Theorem 9.5. The individual energies $I_1$ and $I_3$ corresponding to the resonant frequencies $\omega = 1/\varepsilon$ and $2/\varepsilon$ are not preserved on the time scale considered here, cf. Fig. 9.1. In fact, Theorem 9.6 here yields only a time scale $O(h\varepsilon^{-2})$.

In Fig. 9.3 we use the method with $\phi(\xi) = 1$ and $\psi(\xi) = \mathrm{sinc}^2(\xi/2)$, for which $\sigma(\xi)$ is not identical to 1, and hence $H$ and $H^*$, and $I_\mu$ and $I^*_\mu$ do not coincide. The oscillatory energy $I_2 = \sigma_2^{-1}I^*_\mu$ with $\mu = (0, \sqrt{2}, 0) \perp \mathcal{M}$, which corresponds to the non-resonant frequency $\sqrt{2}\,\omega$, is approximately conserved over long times. Since Theorem 9.5 only states that the modified energies are well preserved, it is not surprising that neither $I_1 + I_3$ nor the original total energy $H$ (not shown in the figure) are conserved. The modified energies $H^*$ and $\sigma_1 I_1 + \sigma_3 I_3$ (not shown) are indeed well conserved, and so is the smooth energy $K$, in agreement with (9.30).

Figure 9.4 shows the result for the (symplectic) method with $\phi(\xi) = \mathrm{sinc}(\xi)$ and $\psi(\xi) = \mathrm{sinc}(\xi)\phi(\xi)$. Since $\sigma(\xi) = 1$, the oscillatory energy $I_2$ for $\sqrt{2}\,\omega$ and also $I_1 + I_3$ are well conserved, in agreement with Theorem 9.5. However, the energy exchange between the resonant frequencies is not correctly reproduced. This behaviour is not explained by Theorems 9.5 and 9.6, but it corresponds to the analysis in Sect. XIII.4.2 which, for the single-frequency case, explains the incorrect energy exchange of methods that do not satisfy $\psi(\xi)\phi(\xi) = \mathrm{sinc}(\xi)$ (and thus, of all symplectic methods (2.7)-(2.10), with the exception of the above method with $\phi(\xi) = 1$ and $\psi(\xi) = \mathrm{sinc}(\xi)$). That analysis could be extended to the multi-frequency case considered here.
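
The factor $\sigma(\xi) = \mathrm{sinc}(\xi)\phi(\xi)/\psi(\xi)$ that separates the three experiments can be tabulated directly. The sketch below is illustrative: the dictionary `METHODS`, the function names and the sample values of $\xi$ are choices made here. It evaluates $\sigma$ for the filter pairs of Figs. 9.2-9.4 and forms the modified total energy (9.22) from given values of $H$ and $I_j$.

```python
import math


def sinc(xi):
    """sinc(xi) = sin(xi)/xi with sinc(0) = 1."""
    return 1.0 if xi == 0.0 else math.sin(xi) / xi


def sigma(xi, phi, psi):
    """sigma(xi) = sinc(xi) * phi(xi) / psi(xi); undefined where psi(xi) = 0."""
    return sinc(xi) * phi(xi) / psi(xi)


# (phi, psi) pairs of the three experiments
METHODS = {
    "Fig. 9.2": (lambda xi: 1.0, sinc),
    "Fig. 9.3": (lambda xi: 1.0, lambda xi: sinc(0.5 * xi) ** 2),
    "Fig. 9.4": (sinc, lambda xi: sinc(xi) * sinc(xi)),
}


def modified_total_energy(H, energies, xis, phi, psi):
    """H* of (9.22): H + sum_j (sigma(xi_j) - 1) * I_j."""
    return H + sum((sigma(xi, phi, psi) - 1.0) * I for xi, I in zip(xis, energies))


if __name__ == "__main__":
    lam = (1.0, math.sqrt(2.0), 2.0)
    h_omega = 1.0                        # one of the step sizes of the experiment
    xis = [h_omega * l for l in lam]     # xi_j = h * lambda_j / eps
    for name, (phi, psi) in METHODS.items():
        print(name, [round(sigma(xi, phi, psi), 6) for xi in xis])
```

For the method of Fig. 9.3 one finds $\sigma(\xi) = \cos(\xi/2)/\mathrm{sinc}(\xi/2)$, which differs from 1 for $\xi \ne 0$, consistent with the observation that only the modified energies are conserved there.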

We remark that the techniques of Sects. XIII.9.2 and XIII.9.3 can also be used to study the energy error of the Störmer-Verlet method, as in Sect. XIII.8; see Theorem 5.1 in Cohen, Hairer & Lubich (2004). The modulated Fourier expansion of the exact solution yields results on the near-preservation of the oscillatory energies along a bounded exact solution: under the energy bound (9.9) and the non-resonance condition

$$|k\cdot\lambda| \ge c\sqrt{\varepsilon} \qquad\text{for } k \in \mathbb{Z}^\ell \setminus \mathcal{M} \text{ with } |k| \le N \qquad (9.31)$$

we have (see Theorem 6.1 in Cohen, Hairer & Lubich 2004)

$$I_\mu(x(t),\dot x(t)) = I_\mu(x(0),\dot x(0)) + O(\varepsilon) \qquad\text{for } 0 \le t \le \varepsilon^{-N+1} \qquad (9.32)$$

for $\mu \in \mathbb{R}^\ell$ with $\mu \perp \mathcal{M}_N = \{k \in \mathcal{M} : |k| \le N\}$. We further have

$$I_j(x(t),\dot x(t)) = I_j(x(0),\dot x(0)) + O(\varepsilon) \qquad\text{for } 0 \le t \le \varepsilon\cdot\min(\varepsilon^{-M+1}, \varepsilon^{-N}) \qquad (9.33)$$

for $j = 1,\dots,\ell$, with $M = \min\{|k| : 0 \ne k \in \mathcal{M}\}$.


XIII.10 Systems with Non-Constant Mass Matrix 

The high frequencies of the linearized differential equation remain constant up to 
small deviations for mechanical systems with a Hamiltonian of the form 

\[
  H(p,q) = \tfrac12 p_0^T M_0(q)^{-1} p_0 + \tfrac12 p_1^T M_1^{-1} p_1 + \tfrac12 p^T R(q) p
           + \frac{1}{2\varepsilon^2}\, q_1^T A_1 q_1 + U(q) \tag{10.1}
\]

with a symmetric positive definite matrix $M_0(q)$, constant symmetric positive definite
matrices $M_1$ and $A_1$, a symmetric matrix $R(q)$ with

\[
  R(q_0, 0) = 0,
\]

and a potential $U(q)$. All the functions are assumed to depend smoothly on $q$.
Bounded energy then requires $q_1 = O(\varepsilon)$, so that $p^T R(q) p = O(\varepsilon)$, but the derivative
of this term with respect to $q_1$ is $O(1)$.

As in (9.1), we may assume, after an appropriate canonical linear transformation 
based on a Cholesky decomposition of the mass matrix and a diagonalization of the 
resulting stiffness matrix, that the Hamiltonian is of the form 

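\[
  H(p,q) = \tfrac12 p_0^T M_0(q)^{-1} p_0
         + \tfrac12 \sum_{j=1}^{\ell} \Bigl( \|p_j\|^2 + \frac{\lambda_j^2}{\varepsilon^2} \|q_j\|^2 \Bigr)
         + \tfrac12 p^T R(q) p + U(q) \tag{10.2}
\]

with distinct, constant $\lambda_j \ge 1$.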

The necessity for such a generalization results from the fact that oscillatory me¬ 
chanical systems with near-constant frequencies in 2 or 3 space dimensions typically 
cannot be put in the form (9.1), but in the more general form (10.1) or (10.2). 

Example 10.1 (Stiff Spring Pendulum). The motion of a mass point (of mass 1)
hanging on a massless stiff spring (with spring constant $1/\varepsilon^2$) is described in polar
coordinates $x_1 = r\sin\varphi$, $x_2 = -r\cos\varphi$ by the Lagrangian with kinetic energy
$T = \tfrac12(\dot x_1^2 + \dot x_2^2) = \tfrac12(\dot r^2 + r^2\dot\varphi^2)$ and potential energy
$U = \frac{1}{2\varepsilon^2}(r-1)^2 - r\cos\varphi$.
With the coordinates $q_0 = \varphi$, $q_1 = r - 1$ and the conjugate momenta $p_i = \partial T/\partial \dot q_i$
this gives the Hamiltonian

\[
  H(p,q) = \tfrac12\bigl((1+q_1)^{-2} p_0^2 + p_1^2\bigr) + \frac{1}{2\varepsilon^2}\, q_1^2 - (1+q_1)\cos q_0,
\]

which is of the form (10.2).
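The change to polar coordinates in Example 10.1 can be checked numerically. The following is only a sketch: the function names are illustrative, and the formulas assume the conventions $q_0 = \varphi$, $q_1 = r - 1$, $p_0 = r^2\dot\varphi$, $p_1 = \dot r$ that follow from $p_i = \partial T/\partial\dot q_i$.

```python
import numpy as np

def hamiltonian_polar(p0, p1, q0, q1, eps):
    """H(p, q) of the stiff spring pendulum with q0 = phi, q1 = r - 1."""
    kinetic = 0.5 * (p0**2 / (1.0 + q1) ** 2 + p1**2)
    spring = q1**2 / (2.0 * eps**2)
    gravity = -(1.0 + q1) * np.cos(q0)
    return kinetic + spring + gravity

def polar_to_cartesian(p0, p1, q0, q1):
    """Positions and velocities in x-coordinates (x1 = r sin(phi), x2 = -r cos(phi))."""
    r = 1.0 + q1
    r_dot = p1               # p1 = dT/d(r_dot) = r_dot
    phi_dot = p0 / r**2      # p0 = dT/d(phi_dot) = r^2 * phi_dot
    x = np.array([r * np.sin(q0), -r * np.cos(q0)])
    v = np.array([r_dot * np.sin(q0) + r * phi_dot * np.cos(q0),
                  -r_dot * np.cos(q0) + r * phi_dot * np.sin(q0)])
    return x, v

def energy_cartesian(x, v, eps):
    """T + U in Cartesian coordinates; -r cos(phi) equals x2."""
    r = np.hypot(x[0], x[1])
    return 0.5 * (v @ v) + (r - 1.0) ** 2 / (2.0 * eps**2) + x[1]
```

Both expressions should agree for any state, for instance with $\varepsilon = 0.01$ and a stretch $q_1$ of size $O(\varepsilon)$, as bounded energy requires.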
Numerical methods for systems (10.2) are studied by Cohen (2005). He splits
the small term $\tfrac12 p^T R(q) p$ from the principal terms of the Hamiltonian and proposes
the following method, where

\[
  K(p_0, q) = \tfrac12 p_0^T M_0(q)^{-1} p_0 + U(q).
\]

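Algorithm 10.2. 1. A half-step with the symplectic Euler method applied to the
system with Hamiltonian $\tfrac12 p^T R(q) p$ gives

\[
  \begin{aligned}
  \hat p^n &= p^n - \frac h2 \nabla_q \bigl( \tfrac12 (\hat p^n)^T R(q^n) \hat p^n \bigr) \\
  \hat q^n &= q^n + \frac h2 R(q^n) \hat p^n .
  \end{aligned} \tag{10.3}
\]

2. Treating the oscillatory components of the variables $p$ and $q$ with a trigonometric
method (2.7)-(2.8) and the slow components with the Störmer-Verlet scheme
yields (for $j = 1,\dots,\ell$ and with $\omega_j = \lambda_j/\varepsilon$ and $\xi_j = h\omega_j$)

\[
  \begin{aligned}
  \hat p_0^{n+1/2} &= \hat p_0^n - \frac h2 \nabla_{q_0} K(\hat p_0^{n+1/2}, \Phi \hat q^n) \\
  \hat q_0^{n+1} &= \hat q_0^n + \frac h2 \bigl( \nabla_{p_0} K(\hat p_0^{n+1/2}, \Phi \hat q^n)
                     + \nabla_{p_0} K(\hat p_0^{n+1/2}, \Phi \hat q^{n+1}) \bigr) \\
  \hat q_j^{n+1} &= \cos(\xi_j)\, \hat q_j^n + \omega_j^{-1} \sin(\xi_j)\, \hat p_j^n
                     - \frac{h^2}{2}\, \psi(\xi_j) \nabla_{q_j} K(\hat p_0^{n+1/2}, \Phi \hat q^n) \\
  \hat p_j^{n+1} &= -\omega_j \sin(\xi_j)\, \hat q_j^n + \cos(\xi_j)\, \hat p_j^n
                     - \frac h2 \bigl( \psi_0(\xi_j) \nabla_{q_j} K(\hat p_0^{n+1/2}, \Phi \hat q^n)
                     + \psi_1(\xi_j) \nabla_{q_j} K(\hat p_0^{n+1/2}, \Phi \hat q^{n+1}) \bigr) \\
  \hat p_0^{n+1} &= \hat p_0^{n+1/2} - \frac h2 \nabla_{q_0} K(\hat p_0^{n+1/2}, \Phi \hat q^{n+1})
  \end{aligned} \tag{10.4}
\]

where $\Phi = \phi(h\Omega)$ with $\Omega = \mathrm{diag}(\omega_j I)$.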

3. A half-step with the adjoint symplectic Euler method applied to the system with
Hamiltonian $\tfrac12 p^T R(q) p$ gives

\[
  \begin{aligned}
  p^{n+1} &= \hat p^{n+1} - \frac h2 \nabla_q \bigl( \tfrac12 (\hat p^{n+1})^T R(q^{n+1}) \hat p^{n+1} \bigr) \\
  q^{n+1} &= \hat q^{n+1} + \frac h2 R(q^{n+1}) \hat p^{n+1} .
  \end{aligned} \tag{10.5}
\]

The filter functions are again real-valued functions with $\psi(0) = \psi_0(0) = \psi_1(0) = \phi(0) = 1$
that satisfy (2.9). The method is still symplectic if
and only if (2.10) holds. Note that Step 2 of the algorithm is explicit if $M_0(q)$ does
not depend on $q_0$.

Cohen (2004, 2005) studies the modulated Fourier expansion of this method and 
shows that the long-time near-conservation of total and oscillatory energies as given 
by Theorem 9.6 remains valid also in this more general situation. 
Example 10.3 (Triatomic Molecule). The motion of a near-rigid triatomic molecule
is described by a Hamiltonian system with a Hamiltonian (10.2). For simplicity
we fix the position of the central atom. We then have two stiff-spring pendulums
strongly coupled by another spring. With angles and distances as shown in Fig. 10.1,
we use the position coordinates $\varphi_1$, $q_1 = r_1 - 1$, $\varphi_2$, $q_2 = r_2 - 1$ with the conjugate
momenta $\pi_1$, $p_1$, $\pi_2$, $p_2$, respectively. The Hamiltonian then reads

\[
  \begin{aligned}
  H(\pi,p,\varphi,q) ={}& \tfrac12 \bigl( (1+q_1)^{-2}\pi_1^2 + p_1^2 + (1+q_2)^{-2}\pi_2^2 + p_2^2 \bigr) \\
  &+ \frac{1}{2\varepsilon^2} \Bigl( q_1^2 + q_2^2 + \frac{\alpha^2}{2} (\varphi_2 - \varphi_1)^2 \Bigr) + U(\varphi,q)
  \end{aligned} \tag{10.6}
\]

with a spring constant $\alpha^2/\varepsilon^2$ for connecting the two pendulums and an external
potential $U$. With the canonical change of variables


\ = J_ (-1 l\ f (fi\ 
Qo) y/2 V 1 

the Hamiltonian takes the form (10.2): 



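\[
  H(p,q) = \tfrac12 (p_0^2 + p_1^2 + p_2^2 + p_3^2) + \frac{1}{2\varepsilon^2} (q_1^2 + q_2^2 + \alpha^2 q_3^2)
         + \tfrac12 p^T R(q) p + U(q) \tag{10.7}
\]

with

\[
  \tfrac12 p^T R(q) p = -\frac14\, \frac{2q_2 + q_2^2}{(1+q_2)^2}\, (p_0 - p_3)^2
                        - \frac14\, \frac{2q_1 + q_1^2}{(1+q_1)^2}\, (p_0 + p_3)^2
\]

and $U(q) = U(\varphi_1, \varphi_2, q_1, q_2)$.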

For the water molecule the ratio between the frequencies of the bond angle and
the bond lengths is $\alpha \approx 0.2$, according to some popular models. In our numerical
experiments, we observed good conservation of all the oscillatory energies and the
total energy. More interesting phenomena occur in a near-resonance situation. We
consider $\alpha = 0.49$ and $\varepsilon = 0.01$, no exterior potential ($U = 0$), and initial values
q(0) = (0, 5 / 2 , 5 , a/s) and $p(0) = (1.1, 0.2, -0.8, 1.3)$. In Fig. 10.2 we apply
the method of Algorithm 10.2 with step sizes $h = 0.5\,\varepsilon$ and $h = 2\varepsilon$ and obtain
numerical results that agree very well with a solution obtained with very small step
sizes. For comparison we show in Fig. 10.3 the results of the Störmer-Verlet method
with step sizes $h = 0.2\,\varepsilon$ and $h = 0.5\,\varepsilon$, for which the energy exchange is not
correct. For the reason explained in Sect. VI.3, (3.2)-(3.3), both methods are fully
explicit for this problem.

Fig. 10.1. Water molecule and reference configuration as gray shadow

Fig. 10.2. Oscillatory energies and total energy for the method of Algorithm 10.2

Fig. 10.3. Oscillatory energies and total energy for the Störmer-Verlet method


XIII.11 Exercises

1. Show that the impulse method (with exact solution of the fast system) reduces
to Deuflhard's method in the case of a quadratic potential $W(q) = \tfrac12 q^T A q$.

2. Show that a method (2.7)-(2.8) satisfying (2.9) is symplectic if and only if

\[
  \psi(\xi) = \mathrm{sinc}(\xi)\, \phi(\xi) \quad \text{for } \xi = h\omega.
\]

3. The change of coordinates $z_n = \chi(h\Omega) x_n$ transforms (2.7)-(2.8) into a
method of identical form with $\phi, \psi, \psi_0, \psi_1$ replaced by $\chi\phi$, $\chi^{-1}\psi$, $\chi^{-1}\psi_0$,
$\chi^{-1}\psi_1$. Prove that, for $h\omega$ satisfying $\mathrm{sinc}(h\omega)\phi(h\omega)/\psi(h\omega) > 0$, it is possible
to find $\chi(h\omega)$ such that the transformed method is symplectic.

4. Prove that for infinitely differentiable functions $g(t)$ the solution of $\ddot x + \omega^2 x = g(t)$
can be written as

\[
  x(t) = y(t) + \cos(\omega t)\, u(t) + \sin(\omega t)\, v(t),
\]

where $y(t)$, $u(t)$, $v(t)$ are given by asymptotic expansions in powers of $\omega^{-1}$.
Hint. Use the variation-of-constants formula and apply repeated partial integration.

5. Show that the recurrence relation $e_{n+1} - 2\cos(h\Omega)\, e_n + e_{n-1} = b_n$ has the
solution

\[
  e_{n+1} = -W_{n-1}\, e_0 + W_n\, e_1 + \sum_{j=1}^{n} W_{n-j}\, b_j
\]

with $W_n = \sin(h\Omega)^{-1} \sin((n+1)h\Omega)$ (or the appropriate limit when
$\sin(h\Omega)$ is not invertible).

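6. Consider a Hamiltonian $H(p_R, p_I, q_R, q_I)$ and let

\[
  \mathcal{H}(p, q) = 2H(p_R, p_I, q_R, q_I)
\]

for $p = p_R + i p_I$, $q = q_R + i q_I$. Prove that in the new variables $p$, $q$ the
Hamiltonian system becomes

\[
  \dot p = -\frac{\partial \mathcal{H}}{\partial \bar q}(p, q), \qquad
  \dot q = \frac{\partial \mathcal{H}}{\partial \bar p}(p, q).
\]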


7. Prove the following refinement of Theorem 6.3: along the solution $x(t)$ of (2.1),
the modified oscillatory energy $J(x, \dot x) = I(x, \dot x) - x_1^T g_1(x)$ satisfies

\[
  J(x(t), \dot x(t)) = J(x(0), \dot x(0)) + O(\omega^{-2}) + O(t\omega^{-N}).
\]

8. Define $\widehat H(x, \dot x) = H(x, \dot x) - \rho\, x_1^T g_1(x)$, $\widehat J(x, \dot x) = J(x, \dot x) - \rho\, x_1^T g_1(x)$ with
$J(x, \dot x)$ of the previous exercise and with

'ip(hcj) 
sine 2 {^huo) 


In the situation of Theorem 7.1, show that

\[
  \begin{aligned}
  \widehat H(x_n, \dot x_n) &= \widehat H(x_0, \dot x_0) + O(h^2) \\
  \widehat J(x_n, \dot x_n) &= \widehat J(x_0, \dot x_0) + O(h^2)
  \end{aligned}
  \qquad \text{for } 0 \le nh \le h^{-N+1}.
\]

Notice that the total energy $H(x_n, \dot x_n)$ and the modified oscillatory energy
$J(x_n, \dot x_n)$ are conserved up to $O(h^2)$ if $\rho = 0$, i.e., if $\psi(\xi) = \mathrm{sinc}^2(\tfrac12 \xi)$. This
explains the excellent energy conservation of methods (A) and (D) in Figure 2.5
away from resonances.

9. Generalizing the analysis of Sect. XIII.8, study the energy behaviour of the impulse
or averaged-force multiple time-stepping method of Sect. VIII.4 with a
fixed number $N$ of Störmer-Verlet substeps per step, when the method is applied
to the model problem with $h\omega$ bounded away from zero.



Chapter XIV. 

Oscillatory Differential Equations 
with Varying High Frequencies 


New aspects come into play when the high frequencies in an oscillatory system 
and their associated eigenspaces do not remain nearly constant, as in the previous 
chapter, but change with time or depend on the solution. We begin by studying 
linear differential equations with a time-dependent skew-hermitian matrix and then 
turn to nonlinear oscillatory mechanical systems with time- or solution-dependent 
frequencies. Our analysis uses canonical coordinate transforms that separate slow 
and fast motions and relate the fast oscillations to the skew-hermitian linear case. For 
the numerical treatment we consider suitably constructed long-time-step methods 
(“adiabatic integrators”) and multiple time-stepping methods. 


XIV.1 Linear Systems with Time-Dependent
Skew-Hermitian Matrix 

We consider first-order linear differential equations with a skew-hermitian matrix 
that changes slowly compared to the rapid oscillations in the solution, a problem 
that has attracted much attention in quantum mechanics. We present a suitable class 
of numerical methods, termed adiabatic integrators, which can take time steps that 
are substantially larger than the almost-periods of the oscillations. 

XIV.1.1 Adiabatic Transformation and Adiabatic Invariants

It comes from the greek αδιαβατικος, “which cannot be crossed”.

... we arrive by analogy to the “adiabatic principle” used in Quantum 
and then Classical Mechanics. It is based upon the fact that the harmonic 
oscillator (and other simple dynamical systems as it was found later) sub¬ 
mitted to slow variations of its parameters modifies its energy but keeps 
its action (energy divided by frequency) constant. 

As we can see, the path from the word “adiabatic” used in thermodynam¬ 
ics to the above “adiabatic principle” is tortuous and our greek colleagues 
are certainly puzzled by sentences such as “the changes in the adiabatic 
invariant due to [...] crossing” which we shall use later. 

(J. Henrard 1993) 


We consider the linear differential equation 
\[
  \dot y(t) = \frac{1}{\varepsilon}\, Z(t)\, y(t), \tag{1.1}
\]

where $Z(t)$ is a real skew-symmetric (or complex skew-hermitian) matrix-valued
function with time derivatives bounded independently of the small parameter $\varepsilon$.
In quantum dynamics such equations arise with $Z(t) = -iH(t)$, where the real
symmetric (or hermitian) matrix $H(t)$ represents the quantum Hamiltonian operator
in a discrete-level Schrödinger equation. We will also encounter real equations
of this type in the treatment of oscillatory classical mechanical systems with
time-dependent frequencies. Solutions oscillate with almost-periods $\sim \varepsilon$, while the system
matrix changes on a slower time scale $\sim 1$.

Transforming the Problem. We begin by looking for a time-dependent linear 
transformation 

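\[
  \eta(t) = T_\varepsilon(t)\, y(t), \tag{1.2}
\]

taking the system to the form

\[
  \dot\eta(t) = S_\varepsilon(t)\, \eta(t) \quad \text{with} \quad
  S_\varepsilon = \dot T_\varepsilon T_\varepsilon^{-1} + \frac{1}{\varepsilon}\, T_\varepsilon Z T_\varepsilon^{-1}, \tag{1.3}
\]

which is chosen such that $S_\varepsilon(t)$ is of smaller norm than the matrix $\frac{1}{\varepsilon} Z(t)$ of (1.1).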

Remark 1.1. A first idea is to freeze $Z(t) \approx Z_*$ over a time step and to choose the
transformation

\[
  T_\varepsilon(t) = \exp\Bigl(-\frac{t}{\varepsilon} Z_*\Bigr) \quad \text{yielding} \quad
  S_\varepsilon(t) = \frac{1}{\varepsilon} \exp\Bigl(-\frac{t}{\varepsilon} Z_*\Bigr) \bigl(Z(t) - Z_*\bigr) \exp\Bigl(\frac{t}{\varepsilon} Z_*\Bigr).
\]

This matrix function $S_\varepsilon(t)$ is highly oscillatory and bounded in norm by $O(h/\varepsilon)$
for $|t - t_0| \le h$, if $Z_* = Z(t_0 + h/2)$. Numerical integrators based on this transformation
are given by Lawson (1967) and more recently by Hochbruck & Lubich
(1999b), Iserles (2002, 2004), and Degani & Schiff (2003). Reasonable accuracy
still requires step sizes $h = O(\varepsilon)$ in general; see also Exercise 3. In the above papers
this transformation has, however, been put to good use in situations where the
time derivatives of the matrix in the differential equation have much smaller norm
than the matrix itself.

Adiabatic Transformation. In order to obtain a differential equation (1.3) with a
uniformly bounded matrix $S_\varepsilon(t)$ we diagonalize

\[
  Z(t) = U(t)\, i\Lambda(t)\, U(t)^*
\]

with a real diagonal matrix $\Lambda(t) = \mathrm{diag}(\lambda_j(t))$ and a unitary matrix
$U(t) = (u_1(t), \dots, u_n(t))$ of eigenvectors depending smoothly on $t$ (possibly except where
eigenvalues cross). We define $\eta(t)$ by the unitary adiabatic transformation

\[
  \eta(t) = \exp\Bigl(-\frac{i}{\varepsilon} \Phi(t)\Bigr) U(t)^* y(t) \quad \text{with} \quad
  \Phi(t) = \mathrm{diag}(\phi_j(t)) = \int_0^t \Lambda(s)\, ds, \tag{1.4}
\]
which represents the solution in a rotating frame of eigenvectors. Each component
of $\eta(t)$ is a coefficient in the eigenbasis representation of $y(t)$ rotated in the complex
plane by the negative phase. Such transformations have been in use in quantum
mechanics since the work of Born & Fock (1928) on adiabatic invariants in
Schrödinger equations, as discussed in the next paragraph. The transformation (1.4)
yields a differential equation where the $\varepsilon$-independent skew-hermitian matrix

\[
  W(t) = \dot U(t)^* U(t)
\]

is framed by oscillatory diagonal matrices:

\[
  \dot\eta(t) = \exp\Bigl(-\frac{i}{\varepsilon} \Phi(t)\Bigr) W(t) \exp\Bigl(\frac{i}{\varepsilon} \Phi(t)\Bigr) \eta(t). \tag{1.5}
\]

Numerical integrators for (1.1) based on the transformation to the differential equation
(1.5) with bounded, though highly oscillatory right-hand side, are given by
Jahnke & Lubich (2003) and Jahnke (2004a); see Sect. XIV.1.2.

Adiabatic Invariants. Possibly after a time-dependent rephasing of the eigenvectors,
$u_k(t) \to e^{i\alpha_k(t)} u_k(t)$, we can assume that $\dot u_k(t)$ is orthogonal to $u_k(t)$ for
all $t$. (This is automatically satisfied if $U(t)$ is a real orthogonal matrix, as is the
case for $Z(t) = -iH(t)$ with a real symmetric matrix $H(t)$.) We then have the
matrix $W = (w_{jk}) = (\dot u_j^* u_k)$ with zero diagonal.

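After integration of both sides of the differential equation (1.5) from 0 to $t$,
partial integration of the terms on the right-hand side yields for $j \ne k$ (terms for
$j = k$ do not appear since $w_{jj} = 0$)

\[
  \begin{aligned}
  \int_0^t \exp\Bigl(-\frac{i}{\varepsilon}\bigl(\phi_j(s) - \phi_k(s)\bigr)\Bigr) w_{jk}(s)\, \eta_k(s)\, ds
  ={}& i\varepsilon \exp\Bigl(-\frac{i}{\varepsilon}\bigl(\phi_j(s) - \phi_k(s)\bigr)\Bigr)
       \frac{w_{jk}(s)\, \eta_k(s)}{\lambda_j(s) - \lambda_k(s)} \bigg|_0^t \\
  &- i\varepsilon \int_0^t \exp\Bigl(-\frac{i}{\varepsilon}\bigl(\phi_j(s) - \phi_k(s)\bigr)\Bigr)
       \frac{d}{ds} \frac{w_{jk}(s)\, \eta_k(s)}{\lambda_j(s) - \lambda_k(s)}\, ds .
  \end{aligned} \tag{1.6}
\]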


At this point, suppose that the eigenvalues $\lambda_j(t)$ are, for all $t$, separated from each
other by a positive distance $\delta$ independent of $\varepsilon$:

\[
  |\lambda_j(t) - \lambda_k(t)| \ge \delta \quad \text{for all } j \ne k. \tag{1.7}
\]

Then the reciprocals of their differences and the coupling matrix $W(t)$ are bounded
independently of $\varepsilon$, as are their derivatives. Together with the boundedness of $\dot\eta$ as
implied by (1.5), this shows

\[
  \eta(t) = \eta(0) + O(\varepsilon) \quad \text{for } t \le \mathrm{Const.} \tag{1.8}
\]


This result is a version of the quantum-adiabatic theorem of Born & Fock (1928)
which states that the actions $|\eta_j|^2$ (the energy in the $j$th state, $\langle \eta_j u_j, H \eta_j u_j \rangle =
\lambda_j |\eta_j|^2$, divided by the frequency $\lambda_j$) remain approximately constant for times $t =
O(1)$. Such functions $I(y,t)$ that satisfy $I(y(t),t) = I(y(0),0) + O(\varepsilon)$ for $t =
O(1)$ along every $O(1)$-bounded solution $y(t)$ of the differential equation, are called
adiabatic invariants.

Super-Adiabatic Transformations. Adiabatic invariants are obtained over longer
time scales by refining the transformation; see Lenard (1959) and Garrido (1964).
Here we show that the transformation matrix $T_\varepsilon$ of (1.2) can be constructed such
that the matrix $S_\varepsilon$ in the transformed differential equation (1.3) is of size $O(\varepsilon^N)$.
Let us make the ansatz of a unitary transformation matrix

\[
  T_\varepsilon = \exp\Bigl(-\frac{i}{\varepsilon} \Phi\Bigr) \exp(-i\Phi_1) \exp(\varepsilon X_1) \cdots
                  \exp(-i\varepsilon^{N-1} \Phi_N) \exp(\varepsilon^N X_N)\, U^*
\]

with real diagonal matrices $\Phi_n(t)$ and complex skew-hermitian matrices $X_n(t)$. We
find that $S_\varepsilon = \frac{1}{\varepsilon} T_\varepsilon Z T_\varepsilon^* + \dot T_\varepsilon T_\varepsilon^*$ is $O(\varepsilon)$ if and only if $X_1$ and $\Lambda_1 := \dot\Phi_1$ satisfy

\[
  \frac{1}{\varepsilon} \bigl( \exp(\varepsilon X_1)\, i\Lambda \exp(-\varepsilon X_1) - i\Lambda \bigr) - i\Lambda_1 + W = O(\varepsilon),
\]

or equivalently, if $X_1$ and $\Lambda_1$ solve the commutator equation

\[
  [i\Lambda, X_1] + i\Lambda_1 = W.
\]

This is solved by setting $i\Lambda_1$ equal to the diagonal of $W$ and determining the off-diagonal
entries of $X_1$ from the scalar equations

\[
  i(\lambda_j - \lambda_k)\, x_{jk} = w_{jk}, \quad j \ne k,
\]

which can be done as long as the eigenvalues are separated. The diagonal of $X_1$ is
set to zero. Since $W$ is skew-hermitian, so is $X_1$. Similarly we obtain for higher
powers of $\varepsilon$ the equations

\[
  [i\Lambda, X_n] + i\Lambda_n = W_{n-1},
\]

where the matrix $W_{n-1}$ contains only previously constructed terms up to index $n-1$
and derivatives up to order $n$ and is skew-hermitian because $S_\varepsilon$ is skew-hermitian.
In this way we obtain a unitary transformation such that

\[
  \eta^{(N)}(t) = T_\varepsilon^{(N)}(t)\, y(t) \quad \text{satisfies} \quad \dot\eta^{(N)} = O(\varepsilon^N).
\]

We remark that the above construction of $T_\varepsilon^{(N)}$ is analogous to transformations in
Hamiltonian perturbation theory; cf. Sect. X.2.
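The first step of this construction is a small linear-algebra computation. The sketch below assumes the commutator equation in the form $[i\Lambda, X_1] + i\Lambda_1 = W$ with separated eigenvalues; the function name `solve_commutator` is illustrative and not taken from the text.

```python
import numpy as np

def solve_commutator(lam, W):
    """Return (lam1, X1) with [i*Lambda, X1] + i*Lambda_1 = W.

    lam : real eigenvalues lambda_j (assumed pairwise distinct)
    W   : skew-Hermitian coupling matrix
    """
    lam = np.asarray(lam, dtype=float)
    W = np.asarray(W, dtype=complex)
    gaps = lam[:, None] - lam[None, :]
    off = ~np.eye(len(lam), dtype=bool)
    X1 = np.zeros_like(W)
    # scalar equations i*(lambda_j - lambda_k) * x_jk = w_jk for j != k
    X1[off] = W[off] / (1j * gaps[off])
    # i*Lambda_1 is the diagonal of W, which is purely imaginary
    lam1 = (np.diag(W) / 1j).real
    return lam1, X1
```

The returned $X_1$ has zero diagonal and inherits skew-hermitian symmetry from $W$, which is what keeps the transformation unitary.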

The differential equation (1.1) thus has adiabatic invariants over times $O(\varepsilon^{-N})$
for arbitrary $N \ge 1$, and in fact even over exponentially long time intervals
$t = O(e^{c/\varepsilon})$ if the functions have a bounded analytic extension to a complex strip,
as is shown by Joye & Pfister (1993) and Nenciu (1993). The leading term in the
exponentially small deviation of $|\eta_j^{(N)}(t)|^2$ in the optimally truncated super-adiabatic
basis has been rigorously made explicit by Betz & Teufel (2005a, 2005b), proving
a conjecture by Berry (1990).
Avoided Crossing of Eigenvalues and Non-Adiabatic Transitions. To illustrate
the effects of a violation of the separation condition (1.7), we consider the generic
two-dimensional example studied by Zener (1932), with the matrix

\[
  Z(t) = -i \begin{pmatrix} t & \delta \\ \delta & -t \end{pmatrix}, \tag{1.9}
\]

which has the eigenvalues $\pm i\sqrt{t^2 + \delta^2}$. The minimal distance of the eigenvalues is
$2\delta$ at $t = 0$. For $\delta = O(\sqrt{\varepsilon})$ the adiabatic invariance (1.8) is no longer valid, and
$\eta$ can undergo $O(1)$ changes in an $O(\delta)$ neighbourhood of $t = 0$: a non-adiabatic
transition in physical terminology. The changes in the adiabatic invariant due to the
avoided crossing of eigenvalues are illustrated in Fig. 1.1 and can be explained as
follows.



Fig. 1.1. Non-adiabatic transition: $|\eta_1(t)|^2$ and $|\eta_2(t)|^2$ as functions of $t$ for $\varepsilon = 0.01$ and
$\delta = 2^{-1}, 2^{-3}, 2^{-5}, 2^{-7}$ (increasing darkness)


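Near the avoided crossing, a new time scale $\tau = t/\delta$ is appropriate. The decomposition
$Z(t) = U(t)\, i\Lambda(t)\, U(t)^T$ of the matrix yields

\[
  U(t) = \widetilde U(\tau) = \begin{pmatrix} \cos\alpha(\tau) & -\sin\alpha(\tau) \\ \sin\alpha(\tau) & \cos\alpha(\tau) \end{pmatrix}, \qquad
  \Lambda(t)/\delta = \widetilde\Lambda(\tau) = \begin{pmatrix} -\sqrt{\tau^2 + 1} & 0 \\ 0 & \sqrt{\tau^2 + 1} \end{pmatrix},
\]

with $\alpha(\tau) = \frac{\pi}{4} - \frac12 \arctan(\tau)$. We introduce the rescaled matrices

\[
  \widetilde\Phi(\tau) = \int_0^\tau \widetilde\Lambda(\sigma)\, d\sigma = \Phi(t)/\delta^2, \qquad
  \widetilde W(\tau) = \Bigl(\frac{d\widetilde U}{d\tau}\Bigr)^T \widetilde U
                     = -\frac{1}{2(\tau^2 + 1)} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \delta\, W(t).
\]

Note that the entries of $W(t)$ have a sharp peak of height $(2\delta)^{-1}$ at $t = 0$. The
rescaled function $\widetilde\eta(\tau) = \eta(t)$ is a solution of the differential equation

\[
  \frac{d\widetilde\eta}{d\tau}(\tau) = \exp\Bigl(-\frac{i\delta^2}{\varepsilon} \widetilde\Phi(\tau)\Bigr)\, \widetilde W(\tau)\,
     \exp\Bigl(\frac{i\delta^2}{\varepsilon} \widetilde\Phi(\tau)\Bigr)\, \widetilde\eta(\tau).
\]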
For $\delta^2 \le \varepsilon$ and $|\tau| = |t/\delta| \le 1$, the matrix on the right-hand side is bounded of
norm $\sim 1$ and has bounded derivatives with respect to $\tau$. The function $\widetilde\eta(\tau)$ therefore
changes its value by an amount of size $O(1)$ in the interval $|\tau| \le 1$. We also note
that any numerical integrator using piecewise polynomial approximations of $W(t)$
and hence of $\widetilde W(\tau)$ must take step sizes $\Delta\tau = h/\delta \ll 1$, i.e., $h \ll \delta$. On the
other hand, the rescaling shows that the number of time steps needed to resolve the
non-adiabatic transition up to a specified accuracy is independent of $\delta$.
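The contrast between a well-separated and a nearly crossing pair of eigenvalues can be reproduced with a brute-force reference computation. The sketch below is not an adiabatic integrator: it resolves every oscillation with an exponential midpoint rule using a tiny step. It assumes $Z(t) = -iH(t)$ with $H(t) = \begin{pmatrix} t & \delta \\ \delta & -t \end{pmatrix}$ and the time interval $[-1, 1]$; the interval, the step count and all names are illustrative choices.

```python
import numpy as np

def hamiltonian(t, delta):
    """Real symmetric H(t) with Z(t) = -i H(t) (assumed form of the Zener matrix)."""
    return np.array([[t, delta], [delta, -t]], dtype=float)

def adiabatic_populations(eps, delta, t0=-1.0, t1=1.0, n_steps=20000):
    """Integrate y' = Z(t) y / eps and return |u_k(t1)^* y(t1)|^2 for both eigenvectors.

    The initial value is the eigenvector of H(t0) with the smaller eigenvalue.
    """
    h = (t1 - t0) / n_steps
    _, vecs = np.linalg.eigh(hamiltonian(t0, delta))
    y = vecs[:, 0].astype(complex)
    for n in range(n_steps):
        tm = t0 + (n + 0.5) * h
        H = hamiltonian(tm, delta)
        r = np.hypot(tm, delta)          # H @ H = r^2 * identity
        theta = h * r / eps
        # exact exponential exp(-i h H / eps) of the frozen midpoint matrix
        step = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) / r * H
        y = step @ y
    _, vecs = np.linalg.eigh(hamiltonian(t1, delta))
    return np.abs(vecs.conj().T @ y) ** 2
```

With $\varepsilon = 0.01$, the choice $\delta = 2^{-1}$ should leave almost all weight in the initial eigenstate, while $\delta = 2^{-7}$ should move most of it to the other one.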

XIV.1.2 Adiabatic Integrators

We discuss symmetric long-time-step integrators for the rotating-frame differential 
equation (1.5) that describes skew-hermitian systems in adiabatic variables. The 
construction follows Jahnke & Lubich (2003) and Jahnke (2004a); see also Lorenz, 
Jahnke & Lubich (2005). 

First-Order Integrators. We consider the differential equation (1.5) and integrate
both sides from $t_n$ to $t_{n+1} = t_n + h$:

\[
  \eta(t_{n+1}) = \eta(t_n) + \int_{t_n}^{t_{n+1}} \exp\Bigl(-\frac{i}{\varepsilon} \Phi(s)\Bigr) W(s) \exp\Bigl(\frac{i}{\varepsilon} \Phi(s)\Bigr) \eta(s)\, ds, \tag{1.10}
\]

where $W(t)$ is an $\varepsilon$-independent matrix, continuously differentiable in $t$, and the
real diagonal matrix of phases $\Phi(t)$ is given as the integral of $\Lambda(t) = \mathrm{diag}(\lambda_j(t))$.
In the applications, $W(t)$ and $\Phi(t)$ are not given explicitly, but need to be computed
using numerical differentiation and integration, respectively. For simplicity, we here
ignore this approximation and consider $W$, $\Lambda$ as given time-dependent functions.

Since $\eta$ and $W$ have bounded derivatives, the following averaged version of the
implicit midpoint rule has a local error of $O(h^2)$ uniformly in $\varepsilon$: [1]

\[
  \eta_{n+1} = \eta_n + \int_{t_n}^{t_{n+1}} \exp\Bigl(-\frac{i}{\varepsilon} \Phi(s)\Bigr) W(t_{n+1/2}) \exp\Bigl(\frac{i}{\varepsilon} \Phi(s)\Bigr) ds\;
               \tfrac12 (\eta_{n+1} + \eta_n). \tag{1.11}
\]

The problem then remains to compute the oscillatory integral. The integrand can be
rewritten as

\[
  E(\Phi(s)) \bullet W(t_{n+1/2}),
\]

where $\bullet$ denotes the entrywise product of matrices and

\[
  E(\Phi) = (e_{jk}) \quad \text{with} \quad e_{jk} = \exp\Bigl(-\frac{i}{\varepsilon} (\phi_j - \phi_k)\Bigr).
\]

With a linear phase approximation (of an error $O(h^2)$)

\[
  \Phi(t_{n+1/2} + \theta h) \approx \Phi(t_{n+1/2}) + \theta h\, \Lambda(t_{n+1/2}),
\]

[1] Because of the oscillatory integrals, the local error is not $O(h^3)$ as might at first glance be
expected for a symmetric method.
the integral is approximated by

\[
  h\, E(\Phi(t_{n+1/2})) \bullet I(t_{n+1/2}) \bullet W(t_{n+1/2}),
\]

where $I(t)$ is the matrix of integrated exponentials with entries (we omit the argument $t$)

\[
  I_{jk} = \int_{-1/2}^{1/2} \exp\Bigl(-\frac{i\theta h}{\varepsilon} (\lambda_j - \lambda_k)\Bigr) d\theta
         = \mathrm{sinc}\Bigl(\frac{h}{2\varepsilon} (\lambda_j - \lambda_k)\Bigr).
\]

The error in the integral approximation comes solely from the linear phase approximation
and is bounded by $O(h^2)$ if the $\lambda_j$ are separated, because
then the integral is of size $O(\varepsilon)$. We thus obtain the following averaged implicit
midpoint rule with a local error of $O(h^2)$ uniformly in $\varepsilon$:

\[
  \eta_{n+1} = \eta_n + h \Bigl( E(\Phi(t_{n+1/2})) \bullet I(t_{n+1/2}) \bullet W(t_{n+1/2}) \Bigr)\, \tfrac12 (\eta_{n+1} + \eta_n). \tag{1.12}
\]

An analogue of the explicit midpoint rule is similarly constructed, and from the
Magnus series (IV.7.5) of the solution we obtain the following averaged exponential
midpoint rule, again with an $O(h^2)$ local error uniformly in $\varepsilon$:

\[
  \eta_{n+1} = \exp\Bigl( h\, E(\Phi(t_{n+1/2})) \bullet I(t_{n+1/2}) \bullet W(t_{n+1/2}) \Bigr)\, \eta_n. \tag{1.13}
\]

For skew-hermitian $W(t)$, also the matrix in (1.12) and (1.13) is skew-hermitian,
and hence both of the above integrators preserve the Euclidean norm of $\eta$ exactly.
We summarize the local error bounds for these methods under conditions that include
the case of an avoided crossing of eigenvalues.


Theorem 1.2 (Local Error). Suppose that for $t_0 \le t \le t_0 + h$ and all $j, k$,
\Xj(t)-X k (t)\>5, \Xj(t)\<C 0 , \\W(t)\\<f, \\W(t)\\ 

with $\delta > 0$. Then, the local error of methods (1.12) and (1.13) is bounded by

II Vi ~v(to + h)\\ < C ^ \\r]o\\. 

The constant $C$ is independent of $h$, $\varepsilon$, $\delta$.


Proof. The result is obtained with the arguments and approximation estimates given
above, taking in addition account of the dependence on $\delta$. □
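One step of the averaged exponential midpoint rule (1.13) can be sketched as follows. The sketch assumes that the midpoint values of $\Lambda$, $\Phi$ and $W$ are supplied by the caller, uses $\mathrm{sinc}(x) = \sin(x)/x$, and computes the matrix exponential through the eigendecomposition of a hermitian matrix; all function names are illustrative.

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc, sin(x)/x with value 1 at x = 0 (np.sinc is the normalized variant)."""
    return np.sinc(np.asarray(x) / np.pi)

def averaged_matrix(h, eps, lam, phi, W):
    """h * E(Phi) . I . W with entrywise products; lam, phi, W are taken at t_{n+1/2}."""
    dlam = lam[:, None] - lam[None, :]
    dphi = phi[:, None] - phi[None, :]
    E = np.exp(-1j * dphi / eps)
    I = sinc(h * dlam / (2.0 * eps))
    return h * E * I * W

def exp_skew_hermitian(M):
    """exp(M) for skew-Hermitian M, via the Hermitian matrix -iM."""
    d, V = np.linalg.eigh(-1j * M)
    return (V * np.exp(1j * d)) @ V.conj().T

def exponential_midpoint_step(eta, h, eps, lam, phi, W):
    """One step eta_n -> eta_{n+1} of the averaged exponential midpoint rule."""
    return exp_skew_hermitian(averaged_matrix(h, eps, lam, phi, W)) @ eta
```

Because $E$ is hermitian in the sense $e_{kj} = \bar e_{jk}$ and $I$ is real symmetric, the entrywise product with a skew-hermitian $W$ stays skew-hermitian, so the step is unitary.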


The local error contains smooth, non-oscillatory components which accumulate
to a global error $\eta_n - \eta(t_n) = O(h)$ on bounded intervals if the eigenvalues remain
well separated. Using that in this case $\eta$ is constant up to $O(\varepsilon)$, this error bound
can be improved to $O(\min\{\varepsilon, h\})$. The integrators thus do not resolve the $O(\varepsilon)$
oscillations in $\eta$ for large step sizes $h > \varepsilon$, but like in Jahnke & Lubich (2003)
they can be combined with a (symmetric and scaling-invariant) adaptive step size
strategy such that the methods follow the non-adiabatic transitions through avoided
crossings of eigenvalues with small steps and take large steps elsewhere.

We here consider applying an integrating reversible step size controller as in
Sect. VIII.3.2 with the step size density function

\[
  \sigma(t) = \bigl( \|W(t)\|^2 + \alpha^2 \bigr)^{-1/2}
\]

for a parameter $\alpha$ that can be interpreted as the ratio of the accuracy parameter
and the maximum admissible step size. Choosing the Frobenius norm $\|W\| =
(\mathrm{trace}\, W^T W)^{1/2}$, we then obtain the following version of Algorithm VIII.3.4,
where $\mu$ is the accuracy parameter and

\[
  G(t) = -\frac{\dot\sigma(t)}{\sigma(t)} = \bigl( \|W(t)\|^2 + \alpha^2 \bigr)^{-1}\, \mathrm{trace}\bigl( \dot W(t)^T W(t) \bigr).
\]

Set $z_0 = 1/\sigma(t_0)$ and, for $n \ge 0$,

\[
  \begin{aligned}
  z_{n+1/2} &= z_n + \tfrac12 \mu\, G(t_n) \\
  h_{n+1/2} &= \mu / z_{n+1/2} \\
  t_{n+1} &= t_n + h_{n+1/2} \\
  \eta_n &\mapsto \eta_{n+1} \ \text{by (1.12) or (1.13) with step size } h_{n+1/2} \\
  z_{n+1} &= z_{n+1/2} + \tfrac12 \mu\, G(t_{n+1}).
  \end{aligned} \tag{1.14}
\]

We remark that the schemes (1.12) and (1.13) can be modified such that they use
evaluations at $t_n$ and $t_{n+1}$ instead of $t_{n+1/2}$ (Exercise 6).

Applying the above algorithm with accuracy parameter $\mu = 0.01$ and $\alpha = 0.1$
to the problem of Fig. 1.1 with $\varepsilon = 0.01$ and $\delta = 2^{-1}, 2^{-3}, 2^{-5}, 2^{-7}$ yields the step
size sequences shown in Fig. 1.2. In each case the error at the end-point $t = 1$ was
between $0.5 \cdot 10^{-3}$ and $2 \cdot 10^{-3}$.
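The step size recursion alone, without the integrator, can be sketched for the two-level avoided-crossing example. The sketch assumes that the eigenvector matrix is a rotation by an angle whose time derivative has modulus $\delta / (2(t^2 + \delta^2))$, so that the Frobenius norm is $\|W(t)\| = \delta / (\sqrt{2}\,(t^2 + \delta^2))$; it also assumes the start time $t_0 = -1$. Both assumptions and all names are illustrative.

```python
import numpy as np

def w_norm(t, delta):
    """Frobenius norm of W(t) for the two-level example (assumed closed form)."""
    return delta / (np.sqrt(2.0) * (t**2 + delta**2))

def G(t, delta, alpha):
    """G = -sigma'/sigma for sigma = (||W||^2 + alpha^2)^(-1/2)."""
    w = w_norm(t, delta)
    w_dot = -2.0 * t * w / (t**2 + delta**2)
    return w * w_dot / (w**2 + alpha**2)

def step_sizes(delta, mu=0.01, alpha=0.1, t0=-1.0, t_end=1.0):
    """Return the times t_n and step sizes h_{n+1/2} chosen by the controller."""
    t = t0
    z = np.sqrt(w_norm(t, delta) ** 2 + alpha**2)   # z_0 = 1 / sigma(t_0)
    ts, hs = [], []
    while t < t_end:
        z_half = z + 0.5 * mu * G(t, delta, alpha)
        h = mu / z_half
        ts.append(t)
        hs.append(h)
        t += h
        # an integrator step such as (1.12) or (1.13) with step size h would go here
        z = z_half + 0.5 * mu * G(t, delta, alpha)
    return np.array(ts), np.array(hs)
```

The step sizes should shrink sharply near $t = 0$, where $\|W\|$ peaks, and stay close to $\mu/\alpha$ far from the avoided crossing.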



Fig. 1.2. Non-adiabatic transition: step sizes as function of $t$ for $\varepsilon = 0.01$ and $\delta =
2^{-1}, 2^{-3}, 2^{-5}, 2^{-7}$ (increasing darkness)


Second-Order Integrators. The $O(\varepsilon)$ oscillations in $\eta$ are resolved with step sizes
up to $h = O(\sqrt{\varepsilon})$ for methods that give $O(h^2)$ accuracy uniformly in $\varepsilon$. Such
methods require a quadratic phase approximation, and one needs further terms obtained
from reinserting $\eta(s)$ under the integral in (1.10) once again by the same
formula, thus yielding terms with iterated integrals (this procedure is known as the
Neumann or Peano or Dyson expansion in different communities, cf. Iserles 2004),
or by including the first commutator in the Magnus expansion (IV.7.5). Symmetric
second-order methods of both types are constructed by Jahnke (2004a).

Care must be taken in computing the arising oscillatory integrals. Iserles (2004) 
proposes and analyses Filon quadrature (after Filon, 1928), which is applicable 
when the moments, i.e., the integrals over products of oscillatory exponentials 
and polynomials, are known analytically. This is not the case, however, for all of 
the integrals appearing in the second-order methods. The alternative chosen by 
Jahnke (2004a) is to use an expansion technique based on partial integration. The 
idea can be illustrated on an integral such as 



with a 7 ^ 0. Partial integration that integrates the first factor and differentiates the 
second factor yields a boundary term and again an integral of the same type, but 
now with an additional factor (9(| • ^-) = 0(h). Using this technique repeatedly 
in the oscillatory integrals appearing in the second-order methods permits to ap¬ 
proximate all of them up to 0(h 3 ) as needed. We refer to Jahnke (2004a) for the 
precise formulation and error analysis of these second-order methods, which are 
complicated to formulate, but do not require substantially more computational work 
than the first-order methods described above, and just the same number of matrix 
evaluations. 

Higher-Order Integrators. Integrators of general order p > 1 are obtained with a 
phase approximation by polynomials of degree p and by including all terms of the 
Neumann or Magnus expansion for (1.5) with up to p-fold integrals. 


XIV.2 Mechanical Systems with Time-Dependent 
Frequencies 

We study oscillatory mechanical systems with explicitly time-dependent frequencies,
where the time-dependent Hamiltonian is

\[
  H(p,q,t) = \tfrac12 p^T M(t)^{-1} p + \frac{1}{2\varepsilon^2}\, q^T A(t) q + U(q,t) \tag{2.1}
\]

with a positive definite mass matrix $M(t)$ and a positive semi-definite stiffness matrix
$A(t)$ of constant rank whose derivatives are bounded independently of $\varepsilon$. Such
a Hamiltonian describes oscillations in a mechanical system that at the same time
exerts a driven motion on a slower time scale. We consider motions of bounded
energy:

\[
  H(p(t), q(t), t) \le \mathrm{Const.} \tag{2.2}
\]

We transform (2.1) to a more amenable form by a series of linear time-dependent 
canonical coordinate transforms. The transformations turn the equations of motion 
into a form that approximately separates the time scales. This makes the problem 
more accessible to numerical discretization with large time steps and to the error 
analysis of multiple time-stepping methods applied directly to (2.1) in the originally 
given coordinates. 

XIV.2.1 Canonical Transformation to Adiabatic Variables 

By a series of canonical time-dependent linear transformations, which can all be 
done numerically with standard linear algebra routines, we now take the Hamil¬ 
tonian system (2.1) to a form from which adiabatic invariants can be read off and 
which will serve as the base camp for both the construction and error analysis of 
numerical methods. 

We introduce the energy $E$ as the conjugate variable to time $t$ and extend the
Hamiltonian to

\[
  H(p,E,q,t) = H(p,q,t) + E. \tag{2.3}
\]

The canonical equations of motion are then (the gradient $\nabla$ refers only to $q$)

\[
  \begin{aligned}
  \dot p &= -\frac{1}{\varepsilon^2} A(t) q - \nabla U(q,t) \\
  \dot q &= M(t)^{-1} p
  \end{aligned}
\]

along with $\dot E = -\partial H/\partial t$ and $\dot t = 1$.

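Transforming the Mass Matrix into the Identity Matrix. We change variables
such that the mass matrix $M(t)$ in the kinetic energy part is replaced by the identity.
With a smooth factorization

\[
  M(t)^{-1} = C(t)\, C(t)^T, \tag{2.4}
\]

e.g., from a Cholesky decomposition of $M(t)$, we transform to variables $(\tilde q, \tilde t)$ by

\[
  q = C(t)\, \tilde q, \qquad t = \tilde t.
\]

Then, the conjugate momenta are given by (see Example VI.5.2)

\[
  \begin{pmatrix} \tilde p \\ \tilde E \end{pmatrix}
  = \begin{pmatrix} C & \dot C \tilde q \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} p \\ E \end{pmatrix}
  = \begin{pmatrix} C^T p \\ \tilde q^T \dot C^T p + E \end{pmatrix}.
\]

With the transformed matrix $\tilde A = C^T A C$, the Hamiltonian $\tilde H(\tilde p, \tilde E, \tilde q, \tilde t) =
H(p, E, q, t)$ in the new variables then takes the form (we omit all tildes)

\[
  H(p,E,q,t) = \tfrac12 p^T p + \frac{1}{2\varepsilon^2}\, q^T A(t) q - q^T \dot C(t)^T C(t)^{-T} p + U(C(t) q, t) + E. \tag{2.5}
\]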
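Diagonalizing the Stiffness Matrix. We diagonalize the matrix $A(t)$ in (2.5),

\[
  A(t) = Q(t) \begin{pmatrix} 0 & 0 \\ 0 & \Omega(t)^2 \end{pmatrix} Q(t)^T \tag{2.6}
\]

with the diagonal matrix $\Omega(t) = \mathrm{diag}(\omega_j(t))$ of frequencies and an orthogonal
matrix $Q(t)$, which depends smoothly on $t$ if the frequencies remain separated. The
matrix $Q(t)$ can be obtained as the product

\[
  Q(t) = Q_0(t) \begin{pmatrix} I & 0 \\ 0 & Q_*(t) \end{pmatrix}, \tag{2.7}
\]

where the transformation with $Q_0(t)$ takes $A(t)$ to the block-diagonal form

\[
  Q_0(t)^T A(t)\, Q_0(t) = \begin{pmatrix} 0 & 0 \\ 0 & A_*(t) \end{pmatrix}
\]

and $Q_*(t)$ diagonalizes $A_*(t)$. The effect of an avoided crossing of frequencies is
localized to $Q_*(t)$, which then can have large derivatives, whereas those of $Q_0(t)$
remain moderately bounded. The transformation

\[
  q = Q(t)\, \hat q, \qquad t = \hat t
\]

with the conjugate momenta

\[
  \hat p = Q(t)^T p, \qquad \hat E = \hat q^T \dot Q(t)^T p + E
\]

yields the Hamiltonian in the new variables $(\hat p, \hat E, \hat q, \hat t)$ as (we omit all hats)

\[
  H = \tfrac12 p^T p + \frac{1}{2\varepsilon^2}\, q^T \begin{pmatrix} 0 & 0 \\ 0 & \Omega(t)^2 \end{pmatrix} q
      + q^T K(t) p + U(C(t) Q(t) q, t) + E \tag{2.8}
\]

with

\[
  K = \begin{pmatrix} K_{00} & K_{01} \\ K_{10} & K_{11} \end{pmatrix} = Q^T \dot Q - Q^T \dot C^T C^{-T} Q.
\]

We decompose also

\[
  p = \begin{pmatrix} p_0 \\ p_1 \end{pmatrix}, \qquad q = \begin{pmatrix} q_0 \\ q_1 \end{pmatrix}
\]

according to the blocks in (2.6) and refer to $q_0$ and $q_1$ ($p_0$ and $p_1$) as the slow and
fast positions (slow and fast momenta), respectively. With the energy bound (2.2)
we have

\[
  p_1 = O(1), \qquad q_1 = O(\varepsilon). \tag{2.9}
\]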
(2.9) 
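Rescaling Positions and Momenta. We transform

\[
  q_0 = \check q_0, \qquad q_1 = \varepsilon^{1/2}\, \Omega(t)^{-1/2}\, \check q_1, \qquad t = \check t
\]

with the conjugate momenta

\[
  \check p_0 = p_0, \qquad \check p_1 = \varepsilon^{1/2}\, \Omega^{-1/2} p_1, \qquad
  \check E = -\tfrac12\, \check q_1^T\, \varepsilon^{1/2}\, \Omega^{-3/2} \dot\Omega\, p_1 + E.
\]

In the new variables, the Hamiltonian becomes (we omit the haceks on all variables)

\[
  H = \tfrac12 p_0^T p_0 + \frac{1}{2\varepsilon}\, p_1^T \Omega(t) p_1 + \frac{1}{2\varepsilon}\, q_1^T \Omega(t) q_1
      + q^T K(t) p + U(T(t) q, t) + E \tag{2.10}
\]

with

\[
  K = \begin{pmatrix} K_{00} & \varepsilon^{-1/2} K_{01} \Omega^{1/2} \\
                      \varepsilon^{1/2} \Omega^{-1/2} K_{10} & \Omega^{-1/2} K_{11} \Omega^{1/2} + \tfrac12 \Omega^{-1} \dot\Omega \end{pmatrix}, \qquad
  T = \begin{pmatrix} T_0 & \varepsilon^{1/2} T_1 \end{pmatrix} = C Q \begin{pmatrix} I & 0 \\ 0 & \varepsilon^{1/2} \Omega^{-1/2} \end{pmatrix},
\]

where the blocks $K_{jk}$ on the right-hand side are those of (2.8).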

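Eliminating the Singular Block. We next remove the $O(\varepsilon^{-1/2})$ off-diagonal block
in $K$ by the canonical transformation

$$\bar p_1 = p_1 + \varepsilon^{1/2}\Omega^{-1/2}K_{01}^Tq_0, \qquad \bar q_0 = q_0, \qquad \bar t = t$$

with the conjugate variables

$$\bar q_1 = q_1, \qquad \bar p_0 = p_0 + \varepsilon^{1/2}K_{01}\Omega^{-1/2}q_1, \qquad \bar E = E + \varepsilon^{1/2}q_0^T\frac{d}{dt}\bigl(K_{01}\Omega^{-1/2}\bigr)q_1.$$

In these coordinates, the Hamiltonian takes the form (we omit all bars)

$$H = \tfrac12\,p_0^Tp_0 + \frac{1}{2\varepsilon}\,p_1^T\Omega(t)p_1 + \frac{1}{2\varepsilon}\,q_1^T\Omega(t)q_1 + q^TL(t)p + \tfrac12\,q^TS(t)q + U(T(t)q,t) + E \qquad (2.11)$$

with the lower block-triangular matrix

$$L = \begin{pmatrix}L_{00} & 0\\ \varepsilon^{1/2}L_{10} & L_{11}\end{pmatrix} = \begin{pmatrix}K_{00} & 0\\ \varepsilon^{1/2}\Omega^{-1/2}(K_{10} - K_{01}^T) & \Omega^{-1/2}K_{11}\Omega^{1/2} + \tfrac12\Omega^{-1}\dot\Omega\end{pmatrix}$$

and the symmetric matrix

$$S = \begin{pmatrix}S_{00} & \varepsilon^{1/2}S_{01}\\ \varepsilon^{1/2}S_{10} & \varepsilon S_{11}\end{pmatrix},$$

where

$$\begin{aligned}
S_{00} &= -K_{01}K_{01}^T,\\
S_{01} = S_{10}^T &= -K_{00}K_{01}\Omega^{-1/2} - K_{01}\Omega^{-1/2}\bigl(\Omega^{1/2}K_{11}^T\Omega^{-1/2} + \tfrac12\Omega^{-1}\dot\Omega\bigr) - \frac{d}{dt}\bigl(K_{01}\Omega^{-1/2}\bigr),\\
S_{11} &= \Omega^{-1/2}\bigl(-K_{10}K_{01} - K_{01}^TK_{10}^T + K_{01}^TK_{01}\bigr)\Omega^{-1/2}.
\end{aligned}$$

We note that with the energy bound (2.2) we now have

$$p_1 = O(\varepsilon^{1/2}), \qquad q_1 = O(\varepsilon^{1/2}). \qquad (2.12)$$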


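Equations of Motion. The differential equations now take the form

$$\begin{aligned}
\dot p_0 &= f_0(p,q,t)\\
\dot q_0 &= p_0 + g_0(q,t)\\
\begin{pmatrix}\dot p_1\\ \dot q_1\end{pmatrix} &= \frac1\varepsilon\begin{pmatrix}0 & -\Omega(t)\\ \Omega(t) & 0\end{pmatrix}\begin{pmatrix}p_1\\ q_1\end{pmatrix} + \begin{pmatrix}f_1(p,q,t)\\ g_1(q,t)\end{pmatrix}
\end{aligned} \qquad (2.13)$$

with the functions bounded uniformly in $\varepsilon$,

$$\begin{pmatrix}f_0\\ f_1\end{pmatrix} = -L(t)p - S(t)q - T(t)^T\nabla U(T(t)q,t), \qquad \begin{pmatrix}g_0\\ g_1\end{pmatrix} = L(t)^Tq.$$

The matrix in the system is diagonalized by a constant unitary matrix: with

$$\Gamma = \frac{1}{\sqrt2}\begin{pmatrix}I & I\\ -iI & iI\end{pmatrix} \qquad (2.14)$$

we have

$$\Gamma^*\begin{pmatrix}0 & -\Omega(t)\\ \Omega(t) & 0\end{pmatrix}\Gamma = \begin{pmatrix}i\Omega(t) & 0\\ 0 & -i\Omega(t)\end{pmatrix}. \qquad (2.15)$$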
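The diagonalization can be checked by hand for a single frequency. The following is a minimal sketch, assuming the scalar case (one fast degree of freedom, so that $I$ and $\Omega$ are numbers) and a unitary matrix with columns $(1,-i)/\sqrt2$ and $(1,i)/\sqrt2$; the helper names `matmul`, `conj_transpose` and `diagonalized_block` are illustrative only.

```python
import math

def matmul(A, B):
    """Product of two small dense matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def conj_transpose(A):
    """Conjugate transpose of a nested-list matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def diagonalized_block(omega):
    """Return Gamma^* [[0, -omega], [omega, 0]] Gamma for a scalar frequency."""
    s = 1.0 / math.sqrt(2.0)
    gamma = [[s, s], [-1j * s, 1j * s]]      # assumed form of the unitary matrix
    rotation = [[0.0, -omega], [omega, 0.0]]  # fast block of the equations of motion
    return matmul(conj_transpose(gamma), matmul(rotation, gamma))

D = diagonalized_block(3.0)
print(D)  # diagonal entries i*omega and -i*omega, off-diagonal entries zero
```

The same computation applies blockwise when $\Omega$ is a diagonal matrix, since all blocks commute.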


Remark. Action-angle variables pij = cos Oj, qij = ^/a^sin^ for the har¬ 
monic oscillators would now put the Hamiltonian into the form H = \w(t) • a + 
G(a, po> (7o> £)> which could be studied further using averaging techniques, that is, 
using coordinate transforms that reduce the dependence on the angles in the Hamil¬ 
tonian; see Neishtadt (1984) for averaging out up to an exponentially small remain¬ 
der in the case of a single high frequency. The first-order averaging transform might 
be done numerically (cf. the formulas in Sect. XII.2), but the higher-order trans¬ 
forms involve increasingly higher derivatives of the functions involved and there¬ 
fore become impractical from the numerical viewpoint. For systems with several 
frequencies the averaging transforms require multi-dimensional integrals which are 
expensive to compute. For our numerical purposes we therefore continue differently,
adapting the adiabatic transformation of Sect. XIV.1.1.

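The System in Adiabatic Variables. Let the diagonal phase matrix be given as

$$\Phi(t) = \int_{t_0}^{t}\Lambda(s)\,ds \quad\text{with}\quad \Lambda(t) = \begin{pmatrix}\Omega(t) & 0\\ 0 & -\Omega(t)\end{pmatrix}.$$

Our final transformation follows (1.4) and sets

$$\eta = \varepsilon^{-1/2}\exp\Bigl(-\frac{i}{\varepsilon}\Phi(t)\Bigr)\Gamma^*\begin{pmatrix}p_1\\ q_1\end{pmatrix}. \qquad (2.16)$$

The factor $\varepsilon^{-1/2}$ is chosen for convenience so that (2.12) implies

$$\eta = O(1). \qquad (2.17)$$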



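$$\begin{pmatrix}\dot p_0\\ \dot q_0\\ \dot\eta\end{pmatrix} = \begin{pmatrix}f_0(p,q,t)\\ p_0 + g_0(q,t)\\ \varepsilon^{-1/2}\exp\bigl(-\frac{i}{\varepsilon}\Phi(t)\bigr)\Gamma^*\begin{pmatrix}f_1(p,q,t)\\ g_1(q,t)\end{pmatrix}\end{pmatrix}$$

with $p_1, q_1$ expressed in terms of $\eta$ by (2.20). Written out, the differential equations
for $p_0, q_0$ read

$$\begin{aligned}
\dot p_0 &= -L_{00}p_0 - S_{00}q_0 - T_0^T\nabla U(T_0q_0,t) - \varepsilon S_{01}Q_1\eta\\
&\quad - T_0^T\bigl(\nabla U(T_0q_0 + \varepsilon T_1Q_1\eta,t) - \nabla U(T_0q_0,t)\bigr)\\
\dot q_0 &= p_0 + L_{00}^Tq_0 + \varepsilon L_{10}^TQ_1\eta.
\end{aligned} \qquad (2.21)$$

The matrix multiplying $\eta$ after substituting the expressions $f_1$ and $g_1$ in the
differential equation for $\eta$ becomes, apart from the oscillatory exponentials,

$$W = \Gamma^*\begin{pmatrix}-L_{11} & -\varepsilon S_{11}\\ 0 & L_{11}^T\end{pmatrix}\Gamma = -\frac12\begin{pmatrix}L_{11}-L_{11}^T & L_{11}+L_{11}^T\\ L_{11}+L_{11}^T & L_{11}-L_{11}^T\end{pmatrix} - \frac{i\varepsilon}{2}\begin{pmatrix}-S_{11} & S_{11}\\ -S_{11} & S_{11}\end{pmatrix}, \qquad (2.22)$$

which has a diagonal of size $O(\varepsilon)$. The equation for $\eta$ then reads

$$\dot\eta = \exp\Bigl(-\frac{i}{\varepsilon}\Phi(t)\Bigr)W(t)\exp\Bigl(\frac{i}{\varepsilon}\Phi(t)\Bigr)\eta - P_1^*\bigl(L_{10}p_0 + S_{10}q_0 + T_1^T\nabla U(T_0q_0 + \varepsilon T_1Q_1\eta,t)\bigr). \qquad (2.23)$$

The matrix multiplying $\eta$ is bounded independently of $\varepsilon$, but highly oscillatory. Note
that the coordinate transforms leading to (2.21), (2.23) are linear and can be carried
out by standard numerical linear algebra routines.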

Adiabatic Invariants. We suppose that the eigenfrequencies $\omega_j(t)$ remain separated
and bounded away from 0: there are $\delta > 0$ and $c > 0$ such that for any pair $\omega_j(t)$
and $\omega_k(t)$ with $j \ne k$ ($j,k = 1,\dots,m$), the lower bounds

$$|\omega_j(t) - \omega_k(t)| \ge \delta, \qquad \omega_j(t) \ge c \qquad (2.24)$$

hold for all $t$ under consideration. Under condition (2.24) the right-hand side $r(t)$
in the differential equation for $\eta$ consists only of oscillatory terms, up to $O(\varepsilon)$. (No
smooth terms larger than $O(\varepsilon)$ arise because the matrix $W$ has a diagonal of size
$O(\varepsilon)$.) It then follows by partial integration that

$$\int_0^t r(s)\,ds = O(\varepsilon) \quad\text{for } t \le \text{Const.}, \qquad (2.25)$$

and as in (1.6) we then obtain

$$\eta(t) = \eta(0) + O(\varepsilon) \quad\text{for } t \le \text{Const.} \qquad (2.26)$$

The functions defined by

$$I_j = |\eta_j|^2 = \frac{1}{2\varepsilon}\bigl(p_{1,j}^2 + q_{1,j}^2\bigr) \qquad (2.27)$$

(with the rescaled fast variables of (2.12)) are thus adiabatic invariants:

$$I_j(t) = I_j(0) + O(\varepsilon) \quad\text{for } t \le \text{Const.} \qquad (2.28)$$

Starting from a Hamiltonian system (2.1), where the mass matrix equals the identity
and the stiffness matrix is already diagonal, we find that $I_j$ is the action (energy
divided by frequency)

$$I_j(t) = \frac{1}{\omega_j(t)}\Bigl(\tfrac12\,p_j(t)^2 + \frac{\omega_j(t)^2}{2\varepsilon^2}\,q_j(t)^2\Bigr),$$

which for a constant frequency $\omega_j$ becomes a constant multiple of the oscillatory
energy considered in Sect. XIII.9.

The Slow Limit System. As $\varepsilon \to 0$, the evolution of the slow variables $p_0, q_0$ is
governed by the equations

$$\begin{aligned}
\dot p_0 &= -L_{00}(t)p_0 - S_{00}(t)q_0 - T_0(t)^T\nabla U(T_0(t)q_0,t)\\
\dot q_0 &= p_0 + L_{00}(t)^Tq_0
\end{aligned} \qquad (2.29)$$

which is the system with the time-dependent Hamiltonian

$$H_0(p_0,q_0,t) = \tfrac12\,p_0^Tp_0 + q_0^TL_{00}(t)p_0 + \tfrac12\,q_0^TS_{00}(t)q_0 + U(T_0(t)q_0,t).$$

We conclude this subsection with a simple illustration of the above procedure.


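Example 2.1 (Harmonic oscillator with slowly varying frequency). For the scalar
second-order differential equation

$$\ddot q + \frac{\omega(t)^2}{\varepsilon^2}\,q = 0,$$

where $\omega(t)$ is bounded away from 0 and has a derivative bounded independently
of $\varepsilon$, the above transformations simplify considerably. The Hamiltonian in the
original variables is already of the form

$$H = \tfrac12\,p^2 + \frac{\omega(t)^2}{2\varepsilon^2}\,q^2,$$

and hence the first two transformations are not needed at all, and there are no slow
variables $p_0, q_0$. The rescaling transformation yields the Hamiltonian (2.10) in the
form

$$H = \frac{\omega(t)}{2\varepsilon}\,p^2 + \frac{\omega(t)}{2\varepsilon}\,q^2 + \frac12\,\frac{\dot\omega(t)}{\omega(t)}\,pq.$$

With the adiabatic transformation (2.19) we thus represent the solution as

$$\sqrt{\frac{\varepsilon}{\omega(t)}}\;\dot q(t) + i\,\sqrt{\frac{\omega(t)}{\varepsilon}}\;q(t) = \exp\Bigl(\frac{i}{\varepsilon}\int_{t_0}^{t}\omega(s)\,ds\Bigr)\zeta(t),$$

where $\zeta$ solves the differential equation

$$\dot\zeta(t) = -\frac{\dot\omega(t)}{2\omega(t)}\exp\Bigl(-\frac{2i}{\varepsilon}\int_{t_0}^{t}\omega(s)\,ds\Bigr)\overline{\zeta(t)}$$

and satisfies $\zeta(t) = \zeta(t_0)\bigl(1 + O(\varepsilon)\bigr)$ for $t = O(1)$. (In the above notation, we have
$\eta = \tfrac{1}{\sqrt2}\,\varepsilon^{-1/2}(\zeta,\bar\zeta)^T$.) The action

$$I(t) = \frac{1}{\omega(t)}\Bigl(\tfrac12\,\dot q(t)^2 + \frac{\omega(t)^2}{2\varepsilon^2}\,q(t)^2\Bigr)$$

is an adiabatic invariant.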
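The near-invariance of the action in Example 2.1 can be observed numerically. The following is a minimal sketch, not part of the original text: it assumes the particular frequency $\omega(t) = 1 + t/2$, the initial values $q(0) = 0$, $\dot q(0) = 1$, and a brute-force Störmer-Verlet integration with steps much smaller than $\varepsilon$ (so it illustrates the invariant, not a long-time-step method). The function name `simulate_action` and its parameters are hypothetical.

```python
def simulate_action(eps=0.01, t_end=1.0, substeps_per_eps=200):
    """Integrate q'' + (omega(t)/eps)^2 q = 0 and monitor the action.

    Returns the initial action, its largest deviation along the trajectory,
    and the ratio of final to initial oscillatory energy.
    """
    omega = lambda t: 1.0 + 0.5 * t                      # assumed slowly varying frequency
    accel = lambda t, q: -(omega(t) / eps) ** 2 * q
    energy = lambda t, q, v: 0.5 * v * v + 0.5 * (omega(t) / eps) ** 2 * q * q
    action = lambda t, q, v: energy(t, q, v) / omega(t)  # energy divided by frequency

    dt = eps / substeps_per_eps
    n_steps = round(t_end / dt)
    t, q, v = 0.0, 0.0, 1.0
    action0, energy0 = action(t, q, v), energy(t, q, v)
    max_dev = 0.0
    for k in range(n_steps):
        v_half = v + 0.5 * dt * accel(t, q)   # velocity Verlet half kick
        q = q + dt * v_half
        t = (k + 1) * dt
        v = v_half + 0.5 * dt * accel(t, q)
        max_dev = max(max_dev, abs(action(t, q, v) - action0))
    return action0, max_dev, energy(t, q, v) / energy0

action0, max_dev, energy_ratio = simulate_action()
print(action0, max_dev, energy_ratio)
```

In this setting the energy grows roughly in proportion to $\omega(t)$, while the action stays within a small multiple of $\varepsilon$ of its initial value, in line with (2.28).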


XIV.2.2 Adiabatic Integrators 

A simple long-time-step integrator for the oscillatory mechanical system with time- 
dependent Hamiltonian (2.1) now reads as follows: 

- Solve the slow limit system (2.29) for $p_0, q_0$, e.g., by the Störmer-Verlet method.

- Keep the adiabatic variable $\eta$ constant at its initial value.

Under the condition of bounded energy (2.1) and the frequency separation condition 
(2.24), the error in $\eta$ is then $O(\varepsilon)$ over intervals $t \le$ Const. by (2.26). The difference
between the solutions of (2.21) and the limit equation (2.29) is bounded by $O(\varepsilon^2)$
for $t \le$ Const., as can be shown by forming the difference of the equations,
integrating, estimating the integral of the extra terms by $O(\varepsilon^2)$ using (2.26) and partial
integration, and applying the Gronwall inequality. In the original variables $p, q$ of
(2.1) this yields an error $O(\varepsilon^2)$ in the positions and $O(\varepsilon)$ in the momenta.

More refined integrators are needed for two independent reasons: 

1. to keep control of $\eta$ on subintervals where the frequencies are not well separated
and where $\eta$ may thus deviate from its near-constant value;

2. to obtain higher order of approximation on intervals with separated frequencies. 

We simplify the following presentation by assuming that the potential U is quadratic: 

$$U(q,t) = \tfrac12\,q^TG(t)q,$$

with a symmetric matrix $G(t)$ depending smoothly on $t$. We leave the required
modifications for general $U$ to the interested reader. Alternatively, the method with $G = 0$
can be used in the splitting approach of Sect. XIV.2.3 below.

An adiabatic integrator as described in Sect. XIV.1.2 can be extended to (2.23)
and combined with a symmetric splitting between the weakly coupled systems
(2.21) and (2.23): we begin with a symplectic Euler half-step for $p_0, q_0$ (denoting
the time levels by superscripts),

$$\begin{aligned}
p_0^{1/2} &= p_0^0 - \frac h2\Bigl(L_{00}\,p_0^{1/2} + (S_{00} + T_0^TGT_0)\,q_0^0 + \varepsilon(S_{01} + T_0^TGT_1)\,\mathcal Q_1^-\eta^0\Bigr)\\
q_0^{1/2} &= q_0^0 + \frac h2\Bigl(p_0^{1/2} + L_{00}^Tq_0^0 + \varepsilon L_{10}^T\mathcal Q_1^-\eta^0\Bigr).
\end{aligned} \qquad (2.30)$$

Here the matrix functions $L_{00}, L_{10}, S_{00}, S_{01}, T_0, T_1$ are evaluated at $t_{1/2} = t_0 + h/2$,
and $\mathcal Q_1^-$ is the average of the oscillatory function $Q_1$ of (2.20) over the half-step,

$$\mathcal Q_1^- \approx \frac2h\int_{t_0}^{t_{1/2}}Q_1(t)\,dt,$$

obtained with a linear approximation of the phase $\Phi(t)$ and analytic computation of
the integral. We then make a full step for $\eta$ with Eq. (2.23) like in (1.12),

$$\eta^1 = \eta^0 + h\bigl(E(\Phi)\bullet I\bullet W\bigr)\,\tfrac12(\eta^1 + \eta^0) - h\,\mathcal P_1^*\bigl(L_{10}\,p_0^{1/2} + (S_{10} + T_1^TGT_0)\,q_0^{1/2}\bigr), \qquad (2.31)$$

where again all matrix functions are evaluated at $t_{1/2}$, and $\mathcal P_1$ is the linear-phase
approximation to the average

$$\mathcal P_1 \approx \frac1h\int_{t_0}^{t_1}P_1(t)\,dt.$$

The matrix $W$ is as in (2.22), but with $S_{11}$ replaced by $S_{11} + T_1^TGT_1$. The step is
completed by a half-step for $p_0, q_0$ with the adjoint symplectic Euler method:

$$\begin{aligned}
p_0^1 &= p_0^{1/2} - \frac h2\Bigl(L_{00}\,p_0^{1/2} + (S_{00} + T_0^TGT_0)\,q_0^1 + \varepsilon(S_{01} + T_0^TGT_1)\,\mathcal Q_1^+\eta^1\Bigr)\\
q_0^1 &= q_0^{1/2} + \frac h2\Bigl(p_0^{1/2} + L_{00}^Tq_0^1 + \varepsilon L_{10}^T\mathcal Q_1^+\eta^1\Bigr),
\end{aligned} \qquad (2.32)$$

where the matrix functions are still evaluated at $t_{1/2}$, and $\mathcal Q_1^+$ approximates the
average of $Q_1$ over the second half-step.

We now give local error bounds for this integrator, under conditions that include 
the case of an avoided crossing of frequencies. 

Theorem 2.2. Suppose that the functions in (2.1) are smooth and the frequencies
satisfy (2.24) with minimal distance $\delta > 0$ for $t_0 \le t \le t_0 + h$, and the orthogonal
matrix $Q_*(t)$ of (2.7), which diagonalizes the nonsingular part of the stiffness
matrix, has derivatives bounded by $\dot Q_*(t) = O(\delta^{-1})$, $\ddot Q_*(t) = O(\delta^{-2})$. Assume
further the energy bound (2.2) for the initial values. Then, the local error of method
(2.30)-(2.32) is bounded by

$$\begin{aligned}
p_0^1 - p_0(t_0 + h) &= O(h^3/\delta^2) + O(\varepsilon h^2/\delta^2)\\
q_0^1 - q_0(t_0 + h) &= O(h^3/\delta) + O(\varepsilon h^2/\delta^2)\\
\eta^1 - \eta(t_0 + h) &= O(h^2/\delta^2).
\end{aligned}$$

The constants symbolized by $O$ do not depend on $\varepsilon$, $h$, and $\delta$.

Proof. (a) Under the given conditions we have

$$K_{00} = O(1), \quad K_{01} = O(1), \quad K_{10} = O(1), \quad K_{11} = O(\delta^{-1}),$$

and

$$\dot K_{00} = O(1), \quad \dot K_{01} = O(\delta^{-1}), \quad \dot K_{10} = O(\delta^{-1}), \quad \dot K_{11} = O(\delta^{-2}).$$

This yields the bounds

$$L_{00},\ L_{10},\ S_{00},\ S_{11} = O(1)$$

and similarly for their derivatives, and

$$L_{11},\ S_{01},\ S_{10} = O(\delta^{-1}), \qquad \dot L_{11},\ \dot S_{01},\ \dot S_{10} = O(\delta^{-2}),$$

and hence also

$$W = O(\delta^{-1}), \qquad \dot W = O(\delta^{-2}).$$

So we have from the energy bound and the differential equation (2.23) for $\eta$,

$$\eta = O(1), \qquad \dot\eta = O(\delta^{-1}).$$

From the differential equations (2.21) for $p_0, q_0$ we conclude

$$\ddot p_0 = O(\delta^{-1}) + O(\varepsilon\delta^{-2}), \qquad \ddot q_0 = O(\delta^{-1}).$$

(b) To study the local error in $\eta$, we integrate (2.23) from $t_0$ to $t_0 + h$ and compare
with the corresponding term in (2.31):

$$\begin{aligned}
&\int_{t_0}^{t_0+h}P_1^*(t)\bigl(L_{10}\,p_0 + (S_{10} + T_1^TGT_0)\,q_0\bigr)(t)\,dt\\
&\qquad - h\,\mathcal P_1^*\bigl(L_{10}(t_{1/2})\,p_0^{1/2} + (S_{10} + T_1^TGT_0)(t_{1/2})\,q_0^{1/2}\bigr) = O(h^2/\delta^2),
\end{aligned}$$

where we have used the above bounds and the error estimate for the linear phase
approximation in the average of $P_1(t)$, cf. Sect. XIV.1.2,

$$\mathcal P_1 - \frac1h\int_{t_0}^{t_1}P_1(t)\,dt = O(h/\delta).$$

Combining this estimate with the error bound of the adiabatic midpoint rule for the
homogeneous equation as given in Theorem 1.2 yields the stated error bound for $\eta^1$.

(c) The error bound for the components $p_0, q_0$ comes about by combining error
bounds for the Störmer-Verlet method (which require the bounds for $\ddot p_0, \ddot q_0$) and the
estimates

$$\int_{t_0}^{t_0+h/2}\varepsilon(S_{01} + T_0^TGT_1)Q_1\eta(t)\,dt - \frac h2\,\varepsilon(S_{01} + T_0^TGT_1)(t_{1/2})\,\mathcal Q_1^-\eta^0 = O(\varepsilon h^2/\delta^2)$$

and

$$\int_{t_0}^{t_0+h/2}\varepsilon L_{10}^TQ_1\eta(t)\,dt - \frac h2\,\varepsilon L_{10}(t_{1/2})^T\mathcal Q_1^-\eta^0 = O(\varepsilon h^2/\delta)$$

and the same estimates for the second half-step. See also Exercise 7 for a similar
situation. □


In the case of well-separated eigenvalues, the global error on bounded time intervals
is thus bounded by $O(h^2) + O(h\varepsilon)$ in $p_0, q_0$ for $t \le$ Const. and by $O(h)$ in $\eta$. In the
original variables $p, q$ of (2.1), this then yields an error

$$q_n - q(t_n) = O(h^2) + O(h\varepsilon), \qquad p_n - p(t_n) = O(h) \quad\text{for } t_n \le \text{Const.}$$

With an adaptive step size strategy as in Sect. XIV.1.2, it is again possible to follow
$\eta$ through non-adiabatic transitions near avoided crossings of eigenvalues.

A higher-order scheme with a global error of $O(h^2)$ in $\eta$, in the situation
of separated eigenvalues, is obtained by replacing the upper line in (2.31) by a
second-order adiabatic integrator as discussed in Sect. XIV.1.2, leaving the last term
in (2.31) unaltered. In the original variables $p, q$ of (2.1), the error is then $O(h^2)$
both in positions and (fast and slow) momenta. The error is even $O(\varepsilon h^2)$ in the fast
positions $q_1$ of (2.8), which oscillate with an amplitude $O(\varepsilon)$. We refer to Lorenz,
Jahnke & Lubich (2005) for the particular case of second-order differential
equations $\ddot q + \varepsilon^{-2}A(t)q = 0$ with a positive definite matrix $A(t)$.


XIV.2.3 Error Analysis of the Impulse Method 

The transformation to adiabatic variables of Sect. XIV.2.1 also gives new insight 
into the error behaviour of multiple time stepping methods such as the impulse or 
mollified impulse method discussed in Sections VIII.4 and XIII. 1, which do not 
use coordinate transforms in the method formulation. These methods are of interest 
when the eigendecompositions needed in adiabatic integrators are computationally 
more expensive than doing many small steps with the fast subsystem, and when 
evaluations of the potential force are so costly that the computational work for the 
fast subsystem becomes irrelevant. We consider the splitting 

$$H = H^{\mathrm{fast}} + H^{\mathrm{slow}}$$

of the Hamiltonian (2.3) with

$$\begin{aligned}
H^{\mathrm{fast}}(p,E,q,t) &= \tfrac12\,p^TM(t)^{-1}p + \frac{1}{2\varepsilon^2}\,q^TA(t)q + E\\
H^{\mathrm{slow}}(p,E,q,t) &= U(q,t).
\end{aligned}$$

The impulse method is given as the composition of the exact flows of the subsystems
(see Sections VIII.4 and XIII.1.3):

$$\Phi_h = \varphi_{h/2}^{\mathrm{slow}}\circ\varphi_h^{\mathrm{fast}}\circ\varphi_{h/2}^{\mathrm{slow}},$$

where we are interested in taking long time steps $h \ge c\varepsilon$ (with a positive constant $c$).
The equations of motion of the slow subsystem,

$$\dot p = -\nabla U(q,t), \qquad \dot q = 0, \qquad \dot t = 0,$$

are solved trivially by

$$\hat p = p - \frac h2\,\nabla U(q,t), \qquad \hat q = q, \qquad \hat t = t.$$

In contrast, the fast subsystem needs to be integrated approximately, e.g., by many
small substeps with the Störmer-Verlet method in the original variables $(p,q)$ or by
one step of the method (2.30)-(2.32) with $G = 0$ in adiabatic variables $(p_0,q_0,\eta)$.
In the following we ignore the error resulting from this additional approximation
and study the splitting method with exact flows.
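The kick-flow-kick structure of the impulse method can be sketched for a scalar model problem. This is a minimal illustration under simplifying assumptions that are not in the text: unit mass, a constant frequency $\omega$ (so that the fast flow is an exact rotation and no substepping is needed), and a time-independent slow potential supplied through a hypothetical callback `grad_u`.

```python
import math

def impulse_step(p, q, h, omega, eps, grad_u):
    """One impulse step for H = p^2/2 + (omega/eps)^2 q^2/2 + U(q).

    Half kick with the slow force, exact flow of the fast oscillator over h,
    then another half kick. Constant omega is an assumption of this sketch.
    """
    p = p - 0.5 * h * grad_u(q)              # slow flow over h/2: only p changes
    w = omega / eps                          # fast angular frequency
    c, s = math.cos(w * h), math.sin(w * h)
    p, q = c * p - w * s * q, (s / w) * p + c * q   # exact harmonic oscillator flow
    p = p - 0.5 * h * grad_u(q)              # second half kick
    return p, q

# Usage: a long step h = 10*eps with the slow potential U(q) = q^4/4.
p1, q1 = impulse_step(1.0, 0.01, 0.1, 1.0, 0.01, lambda q: q ** 3)
print(p1, q1)
```

The composition is symmetric, so a step with $h$ followed by a step with $-h$ returns the initial values up to rounding.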

The error behaviour of this method can be understood with the help of the
transformation to adiabatic variables of Sect. XIV.2.1. The impulse method in the
adiabatic variables $p_0, q_0, \eta$ is obtained by splitting the differential equations (2.21) and
(2.23). The fast subsystem is obtained by simply putting $U = 0$ in these equations,
and the slow subsystem reads

$$\begin{aligned}
\dot p_0 &= -T_0^T\nabla U(T_0q_0 + \varepsilon T_1Q_1\eta,t), \qquad \dot q_0 = 0\\
\dot\eta &= -P_1^*T_1^T\nabla U(T_0q_0 + \varepsilon T_1Q_1\eta,t)
\end{aligned}$$

along with $\dot t = 0$, so that the argument in all the matrices is frozen at the initial time.
Here $P_1(t)$ and $Q_1(t)$ are again the highly oscillatory matrix functions of (2.20).
Since $Q_1P_1^* = 0$ we have $Q_1\eta =$ Const., and therefore, in these variables the flow
$\varphi_{h/2}^{\mathrm{slow}}$ is the mapping given by

$$\begin{aligned}
\hat p_0 &= p_0 - \frac h2\,T_0^T\nabla U(T_0q_0 + \varepsilon T_1Q_1\eta,t_0), \qquad \hat q_0 = q_0\\
\hat\eta &= \eta - \frac h2\,P_1^*T_1^T\nabla U(T_0q_0 + \varepsilon T_1Q_1\eta,t_0),
\end{aligned} \qquad (2.33)$$

where the matrices $T_0, T_1, P_1, Q_1$ are evaluated at $t_0$. In the impulse method, the
above values are the starting values for a step with $\varphi_h^{\mathrm{fast}}$, which is followed by
another application of $\varphi_{h/2}^{\mathrm{slow}}$.

A disturbing feature in (2.33) is the appearance of the particular value $P_1(t_0)$ of
the highly oscillatory function instead of the average $\mathcal P_1$ as in (2.31).
We now consider the error propagation for $\eta$ in the case of well-separated
frequencies. Recall that the exact solution then satisfies $\eta(t) = \eta(0) + O(\varepsilon)$ for
$t \le$ Const. For ease of presentation we consider a constant step size $h$.

Lemma 2.3. Assume the energy bound (2.2) for the initial values. If the frequencies
$\omega_j(t)$ remain separated from each other, then the result after $n$ steps satisfies, for
$nh \le T \le$ Const.,

$$\eta_n = \eta_0 + \sigma_n + O(\varepsilon), \qquad (2.34)$$

where

$$\|\sigma_n\| \le C\kappa \quad\text{with}\quad \kappa = \max_{0\le nh\le T}\ \max_k\ \Bigl|\,h\sum_{j=0}^{n}\exp\Bigl(\frac{i}{\varepsilon}\phi_k(t_j)\Bigr)\Bigr|. \qquad (2.35)$$

Proof. We have $\eta_n = \eta_h(t_n)$, where $\eta_h(t)$ solves the differential equation with
impulses,

$$\dot\eta_h = \exp\Bigl(-\frac{i}{\varepsilon}\Phi\Bigr)W\exp\Bigl(\frac{i}{\varepsilon}\Phi\Bigr)\eta_h + r + \sum_j\Delta\eta_j\,\delta_j.$$

Here $W(t)$ is the matrix (2.22) appearing in (2.23), and

$$r(t) = -P_1(t)^*\bigl(L_{10}(t)\,p_{0,h}(t) + S_{10}(t)\,q_{0,h}(t)\bigr)$$

with $p_{0,h}(t), q_{0,h}(t)$ denoting the piecewise constant functions that take the values
of the numerical solution. Further we have

$$\Delta\eta_j = -h\,P_1(t_j)^*T_1(t_j)^T\nabla U\bigl(T_0(t_j)q_{0,j} + \varepsilon T_1(t_j)Q_1(t_j)\eta_j,\,t_j\bigr),$$

the expression on the right-hand side of (2.33), and $\delta_j$ is a Dirac impulse located
at $t_j$. It follows that, for $t = nh$,

$$\eta_n - \eta_0 = \eta_h(t_n) - \eta_h(0) = \int_0^t\exp\Bigl(-\frac{i}{\varepsilon}\Phi(s)\Bigr)W(s)\exp\Bigl(\frac{i}{\varepsilon}\Phi(s)\Bigr)\eta_h(s)\,ds + \int_0^t r(s)\,ds + \sigma_n,$$

where $\sigma_n$ is the trapezoidal sum of the terms on the right-hand side of (2.33):

$$\sigma_n = -h\sum_{j=0}^{n}{}'\;P_1(t_j)^*T_1(t_j)^T\nabla U\bigl(T_0(t_j)q_{0,j} + \varepsilon T_1(t_j)Q_1(t_j)\eta_j,\,t_j\bigr). \qquad (2.36)$$

The prime on the sum indicates that the first and last term are taken with the factor $\tfrac12$.
Using partial integration as in (1.6), we obtain

$$\int_0^t\exp\Bigl(-\frac{i}{\varepsilon}\Phi(s)\Bigr)W(s)\exp\Bigl(\frac{i}{\varepsilon}\Phi(s)\Bigr)\eta_h(s)\,ds = O(\varepsilon),$$

and by partial integration as in (2.25),

$$\int_0^t r(s)\,ds = O(\varepsilon).$$

This shows (2.34). A partial summation in (2.36), summing up the oscillatory terms
$P_1(t_j)$ and differencing the smoother other terms, then yields (2.35). □

The size of $\kappa$ of (2.35) depends on possible resonances between the step size and
the frequencies, yielding $\kappa$ between $O(h)$ and $O(1)$. For the error of the method we
have the following.

Theorem 2.4. Assume the energy bound (2.2) for the initial values. If the
frequencies $\omega_j(t)$ remain separated from each other, then the error of the impulse method
after $n$ steps with step size $h \ge c\varepsilon$ satisfies

$$\begin{aligned}
p_n - p(t_n) &= O(\kappa)\\
q_n - q(t_n) &= O(h^2) + O(\varepsilon\kappa).
\end{aligned}$$

The constants symbolized by $O$ do not depend on $\varepsilon$, $h$ and $n$ with $nh \le$ Const.

Proof. The error of size $O(\kappa)$ in $\eta$ immediately implies an error of size $O(\kappa)$ in the
actions $I_j = |\eta_j|^2$, and an error of $O(\kappa)$ in the fast momenta $p_1$ and of $O(\varepsilon\kappa)$ in
the fast positions $q_1$ of (2.9); recall the transformation (2.16) and the rescaling. In
the slow components $p_0, q_0$ the method is a perturbed variant of the Störmer-Verlet
method. The contribution of the perturbations $\varepsilon T_1Q_1\eta$ to the error is of size $O(\varepsilon\kappa)$.
This is seen by applying the simple lemma below with $y = (p_0, q_0)$ and

d = (^hTQ(t n ) T V 2 U(To(t n )qQ,n,t n )£Ti(t n )Qi{t n )p n '\^ + QQ l 2 £ ^ 

and using partial summation of the d n , summing up the oscillatory terms Qi(t n ) 
and differencing the other terms. □ 
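The two extremes of the resonance quantity $\kappa$ of (2.35) can be made concrete for a single constant frequency, where the phase is $\phi(t) = \omega t$ and the sum is geometric. The sketch below rests on that assumption; the function name `kappa` and its argument list are illustrative and do not come from the text.

```python
import cmath
import math

def kappa(h, omega, eps, t_end):
    """Largest modulus of h * sum_{j=0}^{n} exp(i*omega*t_j/eps) over nh <= t_end."""
    n_max = int(math.floor(t_end / h + 1e-12))
    partial = 0j
    worst = 0.0
    for j in range(n_max + 1):
        partial += h * cmath.exp(1j * omega * (j * h) / eps)
        worst = max(worst, abs(partial))
    return worst

eps = 0.01
resonant = kappa(2.0 * math.pi * eps, 1.0, eps, 1.0)   # h*omega/eps = 2*pi: all terms equal 1
non_resonant = kappa(0.05, 1.0, eps, 1.0)              # h*omega/eps = 5: terms rotate
print(resonant, non_resonant)
```

At the resonant step size the sum grows like the length of the time interval, whereas away from resonance the geometric sum keeps it below $2h/|1 - e^{ih\omega/\varepsilon}|$, which is of size $O(h)$.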

Lemma 2.5. Let $\Phi_h(y) = y + hF_h(y)$ be a one-step method where $F_h$ has Lipschitz
constant $L$. Consider the method and a perturbation,

$$y_{n+1} = \Phi_h(y_n) \quad\text{and}\quad \tilde y_{n+1} = \Phi_h(\tilde y_n) + d_n,$$

with the same starting values $\tilde y_0 = y_0$. Then, the difference is bounded by

$$\|\tilde y_n - y_n\| \le e^{nhL}\cdot\max_{0\le k\le n-1}\Bigl\|\sum_{j=0}^{k}d_j\Bigr\|.$$

Proof. The result follows from

$$\tilde y_n - y_n = h\sum_{j=0}^{n-1}\bigl(F_h(\tilde y_j) - F_h(y_j)\bigr) + \sum_{j=0}^{n-1}d_j$$

with the discrete Gronwall inequality. □
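The point of Lemma 2.5 is that the error is controlled by partial sums of the perturbations, not by the sum of their norms, so oscillatory perturbations largely cancel. The following minimal sketch checks the bound for a scalar example; the choices $F_h(y) = \cos y$ (Lipschitz constant 1) and $d_j = h\,\delta\cos(j\theta)$, as well as the function name `compare`, are assumptions made only for this illustration.

```python
import math

def compare(h=0.01, n=500, delta=0.1, theta=2.0, y0=0.3, lipschitz=1.0):
    """Run y_{k+1} = y_k + h*cos(y_k) with and without oscillatory perturbations."""
    y = y_pert = y0
    partial = 0.0          # running sum d_0 + ... + d_k
    max_partial = 0.0      # largest modulus of the partial sums
    for j in range(n):
        d = h * delta * math.cos(theta * j)
        y = y + h * math.cos(y)
        y_pert = y_pert + h * math.cos(y_pert) + d
        partial += d
        max_partial = max(max_partial, abs(partial))
    difference = abs(y_pert - y)
    bound = math.exp(n * h * lipschitz) * max_partial
    return difference, bound, max_partial

difference, bound, max_partial = compare()
print(difference, bound, max_partial)
```

Here the sum of the moduli $|d_j|$ is of the order of $nh\delta$, while the partial sums stay of size $O(h\delta)$, which is what the lemma exploits.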
XIV.2.4 Error Analysis of the Mollified Impulse Method


The problem with possible step-size resonances can be greatly alleviated by the
mollified impulse method (see Sect. XIII.1.4) where the potential $U(q,t)$ is replaced
by a modified potential $\bar U(q,t)$. A good choice is

$$\bar U(q,t) = U(A(t)q,t) \quad\text{with}\quad A(t) = C(t)Q(t)\begin{pmatrix}I & 0\\ 0 & \mathcal S(t)\end{pmatrix}Q(t)^TC(t)^{-1} \qquad (2.37)$$

with $C$ and $Q$ of (2.4) and (2.6), and

$$\mathcal S(t) = \mathrm{sinc}\Bigl(\frac{h\,\Omega(t)}{\varepsilon}\Bigr) = \frac{1}{2h}\int_{-h}^{h}\exp\Bigl(\pm\frac{is}{\varepsilon}\,\Omega(t)\Bigr)\,ds.$$
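The identity between the sinc filter and the averaged exponential can be checked numerically for one frequency. This is a minimal sketch assuming a scalar $\Omega = \omega$ and a composite midpoint rule for the integral; the names `sinc` and `averaged_exponential` are illustrative.

```python
import cmath
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the removable singularity filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def averaged_exponential(h, omega, eps, n=20000):
    """Midpoint-rule approximation of (1/(2h)) * int_{-h}^{h} exp(i*s*omega/eps) ds."""
    ds = 2.0 * h / n
    total = 0j
    for k in range(n):
        s = -h + (k + 0.5) * ds
        total += cmath.exp(1j * s * omega / eps)
    return total * ds / (2.0 * h)

h, omega, eps = 0.05, 1.0, 0.01
print(averaged_exponential(h, omega, eps), sinc(h * omega / eps))
```

The filter vanishes when $h\omega/\varepsilon$ is a nonzero multiple of $\pi$, so the fast components of the force are suppressed at exactly those step sizes where the unfiltered impulse method resonates.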

A calculation shows that it replaces (2.33) by

$$\begin{aligned}
\hat p_0 &= p_0 - \frac h2\,T_0^T\nabla U(T_0q_0 + \varepsilon T_1\mathcal Q_1\eta,t_0), \qquad \hat q_0 = q_0\\
\hat\eta &= \eta - \frac h2\,\mathcal P_1^*T_1^T\nabla U(T_0q_0 + \varepsilon T_1\mathcal Q_1\eta,t_0),
\end{aligned} \qquad (2.38)$$

with matrix functions evaluated at $t_0$, where $\mathcal P_1(t)$ and $\mathcal Q_1(t)$ are the linear-phase
approximations to the average over the interval $[t-h,t+h]$ of $P_1$ and $Q_1$,
respectively,

$$\begin{aligned}
\mathcal P_1(t) &= \mathcal S(t)P_1(t) = \frac{1}{2h}\int_{t-h}^{t+h}P_1(s)\,ds + O(h)\\
\mathcal Q_1(t) &= \mathcal S(t)Q_1(t) = \frac{1}{2h}\int_{t-h}^{t+h}Q_1(s)\,ds + O(h).
\end{aligned}$$

Therefore, (2.34) and (2.36) hold with the highly oscillatory $P_1(t_j)$ replaced by the
averages $\mathcal P_1(t_j)$. Using a partial summation in (2.36) and noting that, for $t = nh \le$
Const.,

$$\Bigl\|\,h\sum_{j=1}^{n}\mathcal P_1(t_j)\Bigr\| = \Bigl\|\int_0^tP_1(s)\,ds\Bigr\| + O(h) = O(\varepsilon) + O(h),$$

we obtain an estimate

$$\eta_n = \eta_0 + O(h)$$

instead of the corresponding bound (2.34) with (2.35). This eliminates the bad
effect of step size resonances (large $\kappa$) on the propagation in the fast variables over
bounded time intervals $t \le$ Const. (though not on longer intervals, as we know from
Chap. XIII). The more harmless effect of step size resonances on the slow variables,
as visible in the term $O(\varepsilon\kappa)$ in Theorem 2.4, is likewise reduced to $O(\varepsilon h)$. We thus
obtain the following improvement over the error bounds in Theorem 2.4.
Theorem 2.6. Assume the energy bound (2.2) for the initial values. If the
frequencies $\omega_j(t)$ remain separated from each other, then the error of the above mollified
impulse method after $n$ steps with step size $h \ge c\varepsilon$ satisfies

$$\begin{aligned}
p_n - p(t_n) &= O(h)\\
q_n - q(t_n) &= O(h^2).
\end{aligned}$$

The constants symbolized by $O$ do not depend on $\varepsilon$, $h$ and $n$ with $nh \le$ Const. □

A direct implementation of this method requires just the same matrix decompo¬ 
sitions that are needed for the integrators in adiabatic variables. It is then reasonable 
to use one step of the adiabatic integrator of Sect. XIV.2.2 for solving the fast sub¬ 
system over a time step. 

An alternative is to compute the average $A(t)$ by small time steps from the linear
differential equation with the Hamiltonian $H^{\mathrm{fast}}$, as formulated in Sect. XIII.1.4.
The method described here then corresponds to (XIII.1.18) with $c = 1$.

XIV.3 Mechanical Systems with Solution-Dependent 
Frequencies 

We² consider the Hamiltonian

$$H(p,q) = \tfrac12\,p^TM(q)^{-1}p + U(q) + \frac{1}{\varepsilon^2}\,V(q) \qquad (3.1)$$

with a strong potential $\varepsilon^{-2}V(q)$ that penalizes some directions of motion. Analytical
studies of this problem were done by Rubin & Ungar (1957), Takens (1980), and
Bornemann (1998). In an alternative approach to these works, we here describe a
transformation of the problem to adiabatic variables. This gives new insight into the
solution behaviour and can be used as the starting point for the construction of
long-time-step integrators. It also enables us to analyse the error of multiple time-stepping
methods.

XIV.3.1 Constraining Potentials 

We consider the Hamiltonian (3.1), where $M(q)$ is a symmetric positive definite
mass matrix depending smoothly on the positions $q \in \mathbb R^n$, $U$ is a smooth potential,
and the constraining potential is assumed to satisfy the following:

² This section was written in cooperation with Katina Lorenz (Doctoral Thesis,
Univ. Tübingen, in preparation).
The smooth function $V : D \subset \mathbb R^n \to \mathbb R$ attains its minimum value 0 on a
$d$-dimensional manifold $\mathcal V \subset \mathbb R^n$,

$$\mathcal V = \{q \in D \mid V(q) = \min V = 0\}. \qquad (3.2)$$

In a neighbourhood of $\mathcal V$, the potential $V$ is strongly convex along directions
non-tangential to $\mathcal V$, that is, there exists $\alpha > 0$ such that for $q \in \mathcal V$, the Hessian
$\nabla^2V(q)$ satisfies

$$v^T\nabla^2V(q)\,v \ge \alpha\cdot v^TM(q)\,v \qquad (3.3)$$

for all vectors $v$ in the $M(q)$-orthogonal complement of the tangent space $T_q\mathcal V$.

We let $m = n - d$ be the number of independent constraints that locally describe
the manifold $\mathcal V$.

Example 3.1 (Chain of Stiff Springs). The position of $m + 1$ mass points in a
plane, arranged in a chain connected by stiff springs with spring constants $\alpha_i^2/\varepsilon^2$, is
determined by the Cartesian coordinates of the first mass point and by $m$ angles $\varphi_i$
and the elongations $d_i$ of the $m$ springs. The constraining potential is

$$V = \frac12\sum_{i=1}^{m}\alpha_i^2\,d_i^2$$

and the constraint manifold is described by $d_1 = \dots = d_m = 0$ corresponding to
non-elongated springs. The frequencies of the vibrations in such a chain depend on
the angles.

In the above example we have, in the coordinates given by the angles and
elongations, a potential $V$ of the form

$$V(q) = \tfrac12\,q_1^TA(q_0)\,q_1 \qquad (3.4)$$

for $q = (q_0,q_1) \in \mathbb R^d\times\mathbb R^m$, with a positive definite matrix $A(q_0)$. The manifold
of constraints is here simply $\mathcal V = \mathbb R^d\times 0$. As the following lemma shows, this is
already the general situation in suitable local coordinates.

Lemma 3.2. Under conditions (3.2)-(3.3), there exists a smooth local change of
coordinates $q = \chi(y)$ such that

$$V(q) = \tfrac12\,y_1^TA(y_0)\,y_1 \quad\text{for } q = \chi(y)$$

with $y = (y_0,y_1)$ near 0 in $\mathbb R^d\times\mathbb R^m$, where $A(y_0)$ is a symmetric positive definite
$m\times m$ matrix.
Proof. In a first step, we choose local coordinates $q = \psi(x)$ with $x = (x_0,x_1)$ near
0 in $\mathbb R^d\times\mathbb R^m$, such that $q = \psi(x) \in \mathcal V$ if and only if $x_1 = 0$. In these coordinates,
denoting $\hat V(x) = V(q)$ for $q = \psi(x)$, we then have

$$\hat V(x_0,0) = 0, \qquad \nabla\hat V(x_0,0) = 0$$

by (3.2), and

$$A(x_0) := \nabla_{x_1}^2\hat V(x_0,0) \ \text{ is positive definite}$$

by (3.3). We now change coordinates by the near-identity transformation

$$y_0 = x_0, \qquad y_1 = \mu(x)\,x_1$$

where the real factor $\mu(x)$ (near 1 for $x_1$ near 0) is to be chosen such that

$$\tfrac12\,y_1^TA(y_0)\,y_1 = \hat V(x_0,x_1).$$

Since the right-hand side equals

$$\hat V(x_0,x_1) - \hat V(x_0,0) - x_1^T\nabla_{x_1}\hat V(x_0,0) = \tfrac12\,x_1^TA(x_0)\,x_1 + r(x)$$

with $r(x) = O(\|x_1\|^3)$, the choice

$$\mu(x) = \sqrt{1 + \frac{2\,r(x)}{x_1^TA(x_0)\,x_1}}$$

does the trick. □
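The choice of $\mu$ in the proof can be verified on a one-dimensional toy potential. The sketch below assumes $d = 0$, $m = 1$ and the hypothetical potential $\hat V(x_1) = \tfrac12 a x_1^2 + x_1^3$, so that $A = a$ and $r(x_1) = x_1^3$; the function names are illustrative.

```python
import math

def v_hat(x1, a=2.0):
    """Toy constraining potential with quadratic part a*x1^2/2 and cubic remainder."""
    return 0.5 * a * x1 * x1 + x1 ** 3

def mu(x1, a=2.0):
    """Scaling factor from the proof; requires x1 != 0 and 1 + 2*r/(a*x1^2) > 0."""
    r = x1 ** 3                       # remainder r(x) = V_hat - quadratic part
    return math.sqrt(1.0 + 2.0 * r / (a * x1 * x1))

x1 = 0.1
y1 = mu(x1) * x1                      # new coordinate y1 = mu(x) * x1
print(0.5 * 2.0 * y1 * y1, v_hat(x1))  # both sides of (1/2) y1^T A y1 = V_hat(x)
```

Since $r(x) = O(\|x_1\|^3)$, the quotient under the square root is $O(\|x_1\|)$, which is why $\mu$ is close to 1 near the constraint manifold.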

We remark that Lemma 3.2 could be obtained as a corollary to the Morse lemma, 
for which we refer to Abraham & Marsden (1978) and Crouzeix & Rappaz (1989). 

The change to the local coordinates $x = (x_0,x_1)$ such that $V(q) = 0$ if and only
if $x_1 = 0$ for $q = \psi(x)$, is not numerically constructive from the mere knowledge
of an expression for the potential $V$. However, in many situations the manifold $\mathcal V$
can be described by constraints $g(q) = 0$, and $x_1 = g$ can then be extended to a full
set of coordinates. The above transformation from $x$ to $y$ can be done numerically.
In the usual way, the transformation $q = \chi(y)$ of the position coordinates extends
to a canonical transformation by setting $p_y = \chi'(y)^Tp$ for the conjugate momenta;
see Example VI.5.2.

Solutions of (3.1) are in general oscillatory with frequencies of size $\sim\varepsilon^{-1}$.
There exist, however, special solutions having arbitrarily many time derivatives
bounded independently of $\varepsilon$, which for arbitrary $N \ge 1$ stay $O(\varepsilon^N)$ close to a
manifold $\mathcal V^{\varepsilon,N}$ that has a distance $O(\varepsilon)$ to $\mathcal V$. See Lubich (1993), where also implicit
Runge-Kutta methods for the approximation of the smooth solutions are studied. In
this section we are, however, interested in approximating general oscillatory
solutions of bounded energy.
XIV.3.2 Transformation to Adiabatic Variables

We start from a Hamiltonian (3.1) in coordinates $(p,q)$ where the constraining
potential is already of the form (3.4) for $q = (q_0,q_1)$. We note that for a system of
bounded energy, we then have $q_1 = O(\varepsilon)$.

We now perform a series of canonical transformations that take the Hamiltonian 
into a form that is better suited for a direct numerical treatment and for the er¬ 
ror analysis of multiple time-stepping methods. The transformations are similar to 
those for the time-dependent case treated in Sect. XIV.2.1, but here they appear in a 
permuted order. 

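Transforming the Stiffness Matrix into the Identity. We write the Cholesky
decomposition of the stiffness matrix as

$$A(q_0) = C(q_0)^{-T}C(q_0)^{-1}$$

and change to variables

$$q_0 = \tilde q_0, \qquad q_1 = C(q_0)\,\tilde q_1$$

along with the conjugate momenta

$$\tilde p_0 = p_0 + \Bigl(\frac{\partial}{\partial q_0}\bigl(C(q_0)\tilde q_1\bigr)\Bigr)^Tp_1, \qquad \tilde p_1 = C(q_0)^Tp_1.$$

With the transformed mass matrix $\tilde M(\tilde q) = B(\tilde q)M(\tilde q_0,C(\tilde q_0)\tilde q_1)B(\tilde q)^T$ (for the
matrix $B(\tilde q)$ that transforms $\tilde p = B(\tilde q)p$) and the potential $\tilde U(\tilde q) = U(\tilde q_0,C(\tilde q_0)\tilde q_1)$,
the Hamiltonian takes the simplified form (we omit all tildes)

$$H = \tfrac12\,p^TM(q)^{-1}p + \frac{1}{2\varepsilon^2}\,q_1^Tq_1 + U(q). \qquad (3.5)$$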


Eliminating Off-Diagonal Blocks in the Mass Matrix. We write the mass matrix
$M(q)$ as

$$M = \begin{pmatrix}M_{00} & M_{01}\\ M_{10} & M_{11}\end{pmatrix}.$$

With $G(q_0) = -M_{00}(q_0,0)^{-1}M_{01}(q_0,0)$, we transform

$$q_0 = \bar q_0 + G(\bar q_0)\,\bar q_1, \qquad q_1 = \bar q_1,$$

with the conjugate momenta

$$\bar p_0 = p_0 + \Bigl(\frac{\partial}{\partial\bar q_0}\bigl(G(\bar q_0)\bar q_1\bigr)\Bigr)^Tp_0, \qquad \bar p_1 = p_1 + G(\bar q_0)^Tp_0.$$

This canonical change of variables eliminates $M_{01}$ and $M_{10}$ in the transformed mass
matrix $\bar M(\bar q_0,0)$ and keeps the Schur complement on the block diagonal: with the
symmetric positive definite matrices

$$M_0(q_0) = M_{00}(q_0,0), \qquad M_1(q_0) = \bigl(M_{11} - M_{10}M_{00}^{-1}M_{01}\bigr)(q_0,0),$$

the transformation puts the Hamiltonian into the form (we omit all bars)

$$H = \tfrac12\,p_0^TM_0(q_0)^{-1}p_0 + \tfrac12\,p_1^TM_1(q_0)^{-1}p_1 + \frac{1}{2\varepsilon^2}\,q_1^Tq_1 + \tfrac12\,p^TR(q)p + U(q_0 + G(q_0)q_1,\,q_1) \qquad (3.6)$$

where $R$ is a smooth matrix-valued function satisfying

$$R(q_0,0) = 0. \qquad (3.7)$$
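The block elimination can be traced through in the smallest case. The sketch below assumes scalar blocks (a symmetric 2x2 mass matrix evaluated at $q_1 = 0$) and uses the momentum map $\bar p = Bp$ with $B = \begin{pmatrix}1 & 0\\ G & 1\end{pmatrix}$, so that the transformed mass matrix is $BMB^T$; the function name `eliminate_off_diagonal` is illustrative.

```python
def eliminate_off_diagonal(m00, m01, m11):
    """Return G = -m01/m00 and the transformed mass matrix B M B^T."""
    g = -m01 / m00
    B = [[1.0, 0.0], [g, 1.0]]                 # p_bar_0 = p_0, p_bar_1 = p_1 + G*p_0
    M = [[m00, m01], [m01, m11]]
    BM = [[sum(B[i][k] * M[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    BMBt = [[sum(BM[i][k] * B[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return g, BMBt

g, M_bar = eliminate_off_diagonal(2.0, 1.0, 3.0)
print(g, M_bar)  # off-diagonal entries vanish, lower-right entry is m11 - m01^2/m00
```

The lower-right entry is the Schur complement $M_{11} - M_{10}M_{00}^{-1}M_{01}$, which is the matrix $M_1$ appearing in (3.6).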


Diagonalizing the Mass Matrix of the Fast Variables. We diagonalize

$$M_1(q_0) = Q(q_0)\,\Omega(q_0)^{-2}\,Q(q_0)^T$$

with the diagonal matrix $\Omega(q_0) = \mathrm{diag}(\omega_j(q_0))$ of frequencies and an orthogonal
matrix $Q(q_0)$, which depends smoothly on $q_0$ if the frequencies are separated. We
transform

$$q_0 = \hat q_0, \qquad q_1 = Q(q_0)\,\hat q_1$$

with the conjugate momenta

$$\hat p_0 = p_0 + \Bigl(\frac{\partial}{\partial q_0}\bigl(Q(q_0)\hat q_1\bigr)\Bigr)^Tp_1, \qquad \hat p_1 = Q(q_0)^Tp_1.$$

The matrix

$$Y(q) = \Bigl(\frac{\partial}{\partial q_0}\bigl(Q(q_0)q_1\bigr)\Bigr)^TQ(q_0)$$

is of size $O(q_1)$ but it is this expression which may become large near avoided
crossings of eigenvalues. We consider the associated matrix

$$X(q) = \begin{pmatrix}0 & X_{01}\\ X_{10} & X_{11}\end{pmatrix} = \begin{pmatrix}0 & -M_0^{-1}Y\\ -Y^TM_0^{-1} & Y^TM_0^{-1}Y\end{pmatrix}. \qquad (3.8)$$

With a matrix $R(q)$ satisfying (3.7), which is a sum of the appropriately transformed
previous matrix $R$ and the above matrix $X$, the Hamiltonian in the new variables
$(\hat p,\hat q)$ becomes (we omit all hats)

$$H = \tfrac12\,p_0^TM_0(q_0)^{-1}p_0 + \tfrac12\,p_1^T\Omega(q_0)^2p_1 + \frac{1}{2\varepsilon^2}\,q_1^Tq_1 + \tfrac12\,p^TR(q)p + U\bigl(q_0 + G(q_0)Q(q_0)q_1,\ Q(q_0)q_1\bigr). \qquad (3.9)$$
$$\tfrac12\,p^TR(q)p = \varepsilon^{1/2}c(p_0,q_0)^Tq_1 + p_1^TL(p_0,q_0)^Tq_1 + \varepsilon^{-1/2}\tau(p_1,p_1,q_1;p_0,q_0) + \rho(p,q), \qquad (3.11)$$

with a vector $c$, a matrix $L$, a function $\tau$ that is trilinear in $p_1, p_1, q_1$, and a remainder
of size $\rho(p,q) = O(\varepsilon^2)$ for $p_1, q_1 = O(\varepsilon^{1/2})$, whose partial derivatives with respect
to $p_1, q_1$ are of size $O(\varepsilon^{3/2})$, and with respect to $p_0, q_0$ of size $O(\varepsilon^2)$.

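Equations of Motion. The differential equations now take the form

$$\begin{aligned}
\dot p_0 &= -\nabla_{q_0}\Bigl(\tfrac12\,p_0^TM_0(q_0)^{-1}p_0 + U(q_0,0)\Bigr) - \nabla_{q_0}\Bigl(\frac{1}{2\varepsilon}\,p_1^T\Omega(q_0)p_1 + \frac{1}{2\varepsilon}\,q_1^T\Omega(q_0)q_1\Bigr) + f_0(p,q)\\
\dot q_0 &= M_0(q_0)^{-1}p_0 + g_0(p,q)\\
\begin{pmatrix}\dot p_1\\ \dot q_1\end{pmatrix} &= \frac1\varepsilon\begin{pmatrix}0 & -\Omega(q_0)\\ \Omega(q_0) & 0\end{pmatrix}\begin{pmatrix}p_1\\ q_1\end{pmatrix} + \begin{pmatrix}f_1(p,q)\\ g_1(p,q)\end{pmatrix}
\end{aligned} \qquad (3.12)$$

with the functions

$$\begin{pmatrix}f_0\\ f_1\end{pmatrix} = -\nabla_q\Bigl(\tfrac12\,p^TR(q)p + U(T(q_0)q) - U(q_0,0)\Bigr), \qquad \begin{pmatrix}g_0\\ g_1\end{pmatrix} = \nabla_p\Bigl(\tfrac12\,p^TR(q)p\Bigr).$$

We note the magnitudes $f_0 = O(\varepsilon)$, $g_0 = O(\varepsilon)$ and $f_1 = O(\varepsilon^{1/2})$, $g_1 = O(\varepsilon^{1/2})$
in the case of separated eigenfrequencies, where the diagonalization is smooth with
bounded derivatives. By (3.11) we have (omitting the arguments $p_0, q_0$ in $c, L, T_1$)

$$\begin{aligned}
f_1 &= -\varepsilon^{1/2}c - Lp_1 + \varepsilon^{-1/2}a(p_1,p_1;p_0,q_0) - \varepsilon^{1/2}T_1^T\nabla U(q_0,0) + O(\varepsilon^{3/2})\\
g_1 &= L^Tq_1 + \varepsilon^{-1/2}b(p_1,q_1;p_0,q_0) + O(\varepsilon^{3/2})
\end{aligned} \qquad (3.13)$$

where the functions $a$ and $b$ are bilinear in their first two arguments.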

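The System in Adiabatic Variables. We finally leave the canonical framework
and transform to adiabatic variables as in (2.16). Along a solution $(p(t), q(t))$ of the
system (3.12) we consider the diagonal phase matrix $\Phi(t)$ defined by

$$\dot\Phi = \Lambda(q_0) \quad\text{with}\quad
\Lambda(q_0) = \begin{pmatrix} \Omega(q_0) & 0 \\ 0 & -\Omega(q_0) \end{pmatrix}.$$

With the constant unitary matrix $\Gamma$ of (2.14), which diagonalizes the matrix in
(3.12), we introduce the adiabatic variables

$$\eta = \varepsilon^{-1/2} \exp\Bigl(-\frac{i}{\varepsilon}\Phi\Bigr)\, \Gamma^* \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} \tag{3.14}$$

and denote the inverse transform as

$$\begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = \varepsilon^{1/2} \begin{pmatrix} P_1 \\ Q_1 \end{pmatrix} \eta
 = \varepsilon^{1/2}\, \Gamma \exp\Bigl(\frac{i}{\varepsilon}\Phi\Bigr)\, \eta. \tag{3.15}$$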


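The differential equations (3.12) for $p_1, q_1$ then turn into

$$\dot\eta = \varepsilon^{-1/2} \exp\Bigl(-\frac{i}{\varepsilon}\Phi\Bigr)\, \Gamma^* \begin{pmatrix} f_1 \\ g_1 \end{pmatrix}
 = \varepsilon^{-1/2} P_1^* f_1 + \varepsilon^{-1/2} Q_1^* g_1$$

with the arguments $(p_0, \varepsilon^{1/2} P_1\eta, q_0, \varepsilon^{1/2} Q_1\eta)$ in the functions $f_1, g_1$. Inserting the
expressions for $f_1$ and $g_1$ from (3.13), we obtain as in (2.22) and (2.23), with

$$W = -\frac12 \begin{pmatrix} L - L^T & L + L^T \\ L + L^T & L - L^T \end{pmatrix}, \tag{3.16}$$

the differential equation

$$\begin{aligned}
\dot\eta ={}& \exp\Bigl(-\frac{i}{\varepsilon}\Phi\Bigr)\, W(p_0,q_0)\, \exp\Bigl(\frac{i}{\varepsilon}\Phi\Bigr)\, \eta && (3.17) \\
 &+ \exp\Bigl(-\frac{i}{\varepsilon}\Phi\Bigr)\, \Gamma^* \begin{pmatrix} a(P_1\eta, P_1\eta; p_0, q_0) \\ b(P_1\eta, Q_1\eta; p_0, q_0) \end{pmatrix} && (3.18) \\
 &- P_1^* \bigl( c(p_0,q_0) + T_1(q_0)^T \nabla U(q_0,0) \bigr) + r(p_0, q_0, P_1\eta, Q_1\eta) && (3.19)
\end{aligned}$$

with the remainder $r(p_0, q_0, P_1\eta, Q_1\eta) = \mathcal{O}(\varepsilon)$.

Adiabatic Invariants. For a solution with bounded energy, both $p_1(t)$ and $q_1(t)$ in
(3.12) are of size $\mathcal{O}(\varepsilon^{1/2})$ and hence

$$\eta(t) = \mathcal{O}(1).$$

We now integrate both sides of the above differential equation from 0 to $t$. The
integral of the terms in (3.19) is $\mathcal{O}(\varepsilon)$, as is seen by partial integration since $P_1^*(t)$
is oscillatory with an $\mathcal{O}(\varepsilon)$ integral and $p_0$, $q_0$ have bounded derivatives.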

We now suppose that the eigenfrequencies $\omega_j(t) := \omega_j(q_0(t))$ remain separated
and bounded away from 0: there is a constant $\delta > 0$ such that for any pair $\omega_j(t)$ and
$\omega_k(t)$ with $j \ne k$, the lower bounds

$$|\omega_j(t) - \omega_k(t)| \ge \delta, \qquad \omega_j(t) \ge \tfrac12\,\delta \tag{3.20}$$

hold for all $t$ under consideration. In this situation, as in Sect. XIV.2.1, the integral
from 0 to $t$ of the term (3.17) is bounded by $\mathcal{O}(\varepsilon)$, since the matrix $W$ has zero
diagonal.

It remains to study the term (3.18) with the bilinear functions $a$ and $b$. This
term has only oscillatory components if the following non-resonance condition is
satisfied: for all $j, k, l$ and all combinations of signs,

$$|\omega_j(t) \pm \omega_k(t) \pm \omega_l(t)| \ge \delta \tag{3.21}$$

with a positive $\delta$ independent of $\varepsilon$. In this case, the integral over the term (3.18)
is also of size $\mathcal{O}(\varepsilon)$, and we obtain

$$\eta(t) = \eta(0) + \mathcal{O}(\varepsilon) \quad\text{for } t \le \text{Const}. \tag{3.22}$$

If condition (3.21) is weakened to requiring that for all $j, k, l = 1, \dots, m$,

$$\omega_j(t) \pm \omega_k(t) \pm \omega_l(t) \ \text{ has a finite number of at most simple zeros} \tag{3.23}$$

in the considered time interval, then the estimate deteriorates to (see Exercise 1)

$$\eta(t) = \eta(0) + \mathcal{O}(\varepsilon^{1/2}) \quad\text{for } t \le \text{Const}. \tag{3.24}$$
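
The step from $\mathcal{O}(\varepsilon)$ in (3.22) to $\mathcal{O}(\varepsilon^{1/2})$ in (3.24) mirrors the behaviour of oscillatory
integrals $\int \exp(i\varphi(s)/\varepsilon)\,ds$ when $\dot\varphi$ has a simple zero. The following sketch illustrates
this on two model phases that are assumptions made here for illustration only,
$\varphi(s) = s$ (no zero of $\dot\varphi$) and $\varphi(s) = (s - 1/2)^2/2$ (one simple zero); the function
names and the Simpson quadrature are likewise not taken from the text.

```python
import cmath
import math

def simpson(g, a, b, n):
    """Composite Simpson rule with n (even) subintervals for a complex-valued g."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def osc_integral(phi, eps, n=20000):
    """Approximate the integral over [0, 1] of exp(i * phi(s) / eps)."""
    return simpson(lambda s: cmath.exp(1j * phi(s) / eps), 0.0, 1.0, n)

def no_zero(s):
    # model phase (assumption): phi'(s) = 1 never vanishes
    return s

def simple_zero(s):
    # model phase (assumption): phi'(s) = s - 1/2 has one simple zero
    return 0.5 * (s - 0.5) ** 2

if __name__ == "__main__":
    for eps in (1e-2, 1e-3):
        i0 = abs(osc_integral(no_zero, eps))
        i1 = abs(osc_integral(simple_zero, eps))
        # first ratio stays bounded by 2, second stays near sqrt(2*pi)
        print(eps, i0 / eps, i1 / math.sqrt(eps))
```

For the first phase the integral is bounded by $2\varepsilon$; for the second, the stationary point
contributes a term of size $\sqrt{2\pi\varepsilon}$, which dominates for small $\varepsilon$.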

The actions

$$I_j = |\eta_j|^2 \qquad (j = 1, \dots, m) \tag{3.25}$$

are thus adiabatic invariants:

$$I_j(t) = I_j(0) + \mathcal{O}(\varepsilon) \quad\text{for } t \le \text{Const} \tag{3.26}$$

in case of (3.22), and up to $\mathcal{O}(\varepsilon^{1/2})$ in case of (3.24).

The Slow System. Since the oscillatory energy equals

$$\frac{1}{2\varepsilon}\, p_1^T \Omega(q_0)\, p_1 + \frac{1}{2\varepsilon}\, q_1^T \Omega(q_0)\, q_1 = \sum_{j=1}^{m} I_j\, \omega_j(q_0),$$

the differential equations (3.12) for the slow variables $p_0, q_0$ become, up to $\mathcal{O}(\varepsilon)$,

$$\begin{aligned}
\dot p_0 &= -\nabla_{q_0}\Bigl(\tfrac12\, p_0^T M_0(q_0)^{-1} p_0 + U(q_0,0)\Bigr) - \sum_{j=1}^{m} I_j\, \nabla_{q_0}\omega_j(q_0) \\
\dot q_0 &= M_0(q_0)^{-1} p_0.
\end{aligned} \tag{3.27}$$

Compared with the constrained system with Hamiltonian $\tfrac12\, p^T M(q)^{-1} p + U(q)$ on
the configuration manifold $V$, the slow motion is thus driven by the additional
potential $\sum_j I_j\, \omega_j(q_0)$ depending on the actions $I_j$. See also Rubin & Ungar (1957),
Takens (1980), and Bornemann (1998) for different derivations and discussions of
the correction potential.

Avoided Crossing of Frequencies and Takens Chaos. If the distance $\delta$ of
frequencies in (3.20) becomes so small at a point $q_0(t)$ that $\delta^2 \le \varepsilon$, then there can
again occur $\mathcal{O}(1)$ changes in adiabatic invariants $I_j$, as in the Zener example of
Sect. XIV.1.1. In the present situation of solution-dependent frequencies, however,
the level to which $I_j$ jumps after the avoided crossing depends very sensitively on
the slow solution variables $q_0(t)$ through the terms $\exp(\pm\frac{i}{\varepsilon}\Phi)$ in (3.17). In turn, the
slow motion of $p_0$, $q_0$ after the avoided crossing depends on the new values of $I_j$
through (3.27). The effect is that the slow motion depends very sensitively on
perturbations of the initial values in the case of an avoided crossing; see Takens (1980).
The indeterminacy of the slow motion in the limit $\varepsilon \to 0$ is termed Takens chaos by
Bornemann (1998).


XIV.3.3 Integrators in Adiabatic Variables 

A long-time-step integrator for the oscillatory mechanical system with Hamiltonian 
(3.1) can now be obtained as follows: 

Solve the slow system (3.27) in tandem with applying an adiabatic integrator
(see Sect. XIV.1.2) to a simplified equation for the adiabatic variables,

$$\dot\eta = \exp\Bigl(-\frac{i}{\varepsilon}\Phi\Bigr)\, W(p_0,q_0)\, \exp\Bigl(\frac{i}{\varepsilon}\Phi\Bigr)\, \eta,$$

where $W$ is given by (3.16) with a simplified matrix $L$: with $v_0 = M_0(q_0)^{-1} p_0$, let

$$L(p_0,q_0) = -\Omega(q_0)^{1/2}\, \frac{d}{d\tau}\Big|_{\tau=0} Q(q_0 + \tau v_0)^T\, Q(q_0)\, \Omega(q_0)^{-1/2}.$$

This matrix $L$ captures the principal terms, coming from the matrix $X_{01}$ in (3.8),
which are responsible for a change of the adiabatic invariants due to an avoided
crossing as long as the frequency separation condition (3.20) holds with a possibly
$\varepsilon$-dependent $\delta \gg \varepsilon$, e.g., with $\delta \sim \varepsilon^{1/2}$ where $\mathcal{O}(1)$ changes occur in the adiabatic
invariants. Because of the Takens chaos, it cannot be expected that such an integrator
yields a good approximation to “the” solution, but the method can approximate
an almost-solution (having a small defect in the differential equations) that passes
through the avoided crossing zone, and it detects the change of adiabatic invariants.
The properties of integrators of this type are currently under investigation (Lorenz
& Lubich 2006).

Further we refer to Jahnke (2003, 2004b) for the construction and analysis of 
adiabatic integrators for mixed quantum-classical molecular dynamics, where simi¬ 
larly a nonlinear coupling of slow and fast, oscillatory motions occurs. 


XIV.3.4 Analysis of Multiple Time-Stepping Methods 

The error behaviour of the impulse and mollified impulse method applied to an
oscillatory Hamiltonian system (3.1) with well-separated frequencies can be analysed
in the adiabatic variables in the same way as we did in Sections XIV.2.3 and XIV.2.4
for the case of time-dependent frequencies. Analogous formulas and the same
conclusions hold; essentially we need to replace the argument $t$ by $q_0$ in the appearing
functions. However, their behaviour in the situation of an avoided crossing with
Takens chaos is presently not understood.


XIV.4 Exercises 

1. Show that

$$\int_0^t \exp\Bigl(\frac{i}{\varepsilon}\,\varphi(s)\Bigr)\, ds = \mathcal{O}(\varepsilon^{1/(m+1)})$$

if $\lambda := \dot\varphi$ has finitely many zeros of order at most $m$ in the interval $[0, t]$.

Hint: Use the method of stationary phase; see, e.g., Olver (1974) or van der
Corput (1934).

2. Show that the adiabatic variables $\eta(t)$ of (1.4) remain approximately constant
also in the following cases of non-separated eigenvalues:

(a) a multiple eigenvalue $\lambda_j(t)$ of constant multiplicity $m$ for all $t$ and the
orthogonal basis $v_{j,1}(t), \dots, v_{j,m}(t)$ of the corresponding eigenspace chosen
such that the derivatives $\dot v_{j,l}(t)$ are orthogonal to the eigenspace for all $t$;

(b) a crossing of eigenvalues, $\lambda_j(t_*) = \lambda_k(t_*)$ with $\dot\lambda_j(t_*) \ne \dot\lambda_k(t_*)$, for which
the eigenvectors are smooth functions of $t$ in a neighbourhood of $t_*$; see also
Born & Fock (1928) for crossings where $\lambda_j - \lambda_k$ can have zeros of higher
multiplicity.

3. Let the differential equation (1.1) with smooth skew-hermitian $Z(t)$ be
transformed locally over $[t_0, t_0 + h]$ to $z(t) = \exp(-\frac{t}{\varepsilon} Z_*)\, y(t)$, so that

$$\dot z = \frac{1}{\varepsilon} \exp\Bigl(-\frac{t}{\varepsilon} Z_*\Bigr)\, \bigl(Z(t) - Z_*\bigr)\, \exp\Bigl(\frac{t}{\varepsilon} Z_*\Bigr)\, z$$

with $Z_* = Z(t_0 + h/2)$. Consider the averaged midpoint rule

$$z_1 = z_0 + \frac{1}{\varepsilon} \int_{t_0}^{t_1} \exp\Bigl(-\frac{s}{\varepsilon} Z_*\Bigr)\, \bigl(\widetilde Z(s) - Z_*\bigr)\, \exp\Bigl(\frac{s}{\varepsilon} Z_*\Bigr)\, ds \; \frac12 (z_0 + z_1), \tag{4.1}$$

where $\widetilde Z(t)$ is the quadratic interpolation polynomial through $Z(t_0)$, $Z_*$, $Z(t_1)$.
Show that the local error $z_1 - z(t_1)$ is of size $\mathcal{O}(h^4/\varepsilon^2)$, which is $\mathcal{O}(h^2)$ only
for $h = \mathcal{O}(\varepsilon)$. Explain why the error bound cannot be improved to $\mathcal{O}(h^2)$ for
$h = \mathcal{O}(\varepsilon^\alpha)$ with $\alpha < 1$.

Hint: See the proofs of Theorems 2.1(i) and 3.1 in Hochbruck & Lubich
(1999b), cf. also Iserles (2004).

4. In the situation of the previous exercise, let $U$ be a unitary matrix of
eigenvectors of $Z_*$, and let $D(t)$ be the diagonal matrix containing the diagonal entries
of $U^* (Z(t) - Z_*) U$. Find a modification of the above averaged midpoint rule
by terms that use only $D(t)$, such that the local error is $\mathcal{O}(h^2)$ for $h \le \varepsilon^{3/4}$ if
the eigenvalues of $Z_*$ are all separated by a distance $\delta$ independent of $\varepsilon$.

5. Compare the error behaviour of the averaged midpoint rules (1.12) and (4.1) 
near the avoided crossing of the eigenvalues in the Zener matrix (1.9). 

6. Formulate symmetric modifications of the adiabatic integrators (1.12) and
(1.13) that use function evaluations at the grid points $t_n$ and $t_{n+1}$ instead of
$t_{n+1/2}$.

7. Consider the differential equation $\dot y = f(y) + g(t)$ with a smooth function
$f(y)$ and a function $g(t) = \mathcal{O}(1)$ with $\dot g(t) = \mathcal{O}(\delta^{-1})$ with respect to a small
parameter $\delta$. For the modified midpoint rule

$$y_1 = y_0 + h f\Bigl(\frac{y_0 + y_1}{2}\Bigr) + \int_{t_0}^{t_1} g(t)\, dt,$$

show that the local error satisfies $y_1 - y(t_1) = \mathcal{O}(h^3/\delta)$.

8. Write the Hamiltonian system (XIII.9.2) in adiabatic variables and relate this to 
the first terms of the modulated Fourier expansion. 

9. Compare the impulse method of Sect. XIV.2.3 with the method based on the
splitting

$$H = \Bigl( \tfrac12\, p^T M(t)^{-1} p + \tfrac{1}{2\varepsilon^2}\, q^T A(t)\, q \Bigr) + \bigl( U(q, t) + e \bigr).$$

10. Show that Theorem 2.6 remains valid for the choice S(t) = 0 in (2.37). This 
corresponds to the projection to the constraint manifold in the mollified impulse 
method as proposed by Izaguirre, Reich & Skeel (1999). 



Chapter XV. 

Dynamics of Multistep Methods 


Multistep methods are the basis of important codes for nonstiff differential
equations (Adams methods) and for stiff problems (BDF methods). We study here their
applicability to long-time integrations of Hamiltonian or reversible systems.

This chapter starts with numerical experiments which illustrate that the long-time
behaviour of classical multistep methods is in general disappointing. They either
behave as non-symplectic and non-symmetric one-step methods, or they exhibit
undesired instabilities (parasitic solutions). Certain multistep methods for second
order equations or partitioned multistep methods, however, have a much better
long-time behaviour. They are promising methods, because in a constant step size
mode they can be easily implemented, and high order can be obtained with one
function evaluation per step. We characterize such methods by studying their
underlying one-step method, their symplecticity, their conservation properties, as well as
their long-term stability.


XV.1 Numerical Methods and Experiments

We present the numerical methods treated in this chapter, and in numerical
experiments we look at their behaviour on Hamiltonian systems.

XV.1.1 Linear Multistep Methods

For first order systems of differential equations $\dot y = f(y)$, linear multistep methods
are defined by the formula

$$\sum_{j=0}^{k} \alpha_j\, y_{n+j} = h \sum_{j=0}^{k} \beta_j\, f(y_{n+j}), \tag{1.1}$$

where $\alpha_j$, $\beta_j$ are real parameters, $\alpha_k \ne 0$, and $|\alpha_0| + |\beta_0| > 0$. For an application
of this formula we need a starting procedure which, in addition to an initial value
$y(t_0) = y_0$, provides approximations $y_1, \dots, y_{k-1}$ to $y(t_0 + h), \dots, y(t_0 + (k-1)h)$.
The approximations $y_n$ to $y(t_0 + nh)$ for $n \ge k$ can then be computed recursively
from (1.1). In the case $\beta_k = 0$ we have an explicit method, otherwise it is implicit
and the numerical solution $y_{n+k}$ has to be computed iteratively.

Since the fundamental work of Dahlquist (1956) it is common to denote the generating
polynomials of the coefficients by

$$\rho(\zeta) = \sum_{j=0}^{k} \alpha_j\, \zeta^j, \qquad \sigma(\zeta) = \sum_{j=0}^{k} \beta_j\, \zeta^j.$$

For the classical theory of multistep methods we refer the reader to Chap. III of Hairer,
Nørsett & Wanner (1993). We just recall some important definitions.

Order. A multistep method has order $r$ if, when applied with exact starting values to
the problem $\dot y = t^q$ ($0 \le q \le r$), it integrates the problem without error. This is
equivalent to the requirement that

$$\rho(e^h) - h\,\sigma(e^h) = \mathcal{O}(h^{r+1}) \quad\text{for } h \to 0. \tag{1.2}$$

Stability. Method (1.1) is stable if, when applied to $\dot y = 0$, it yields for all
$y_0, \dots, y_{k-1}$ a bounded numerical solution. This is equivalent to the requirement
that the polynomial $\rho(\zeta)$ satisfies the root condition, i.e., all roots of $\rho(\zeta) = 0$
satisfy $|\zeta| \le 1$, and those on the unit circle are simple roots. The method is called
strictly stable, if all roots are inside the unit circle with the exception of $\zeta = 1$.

Convergence. If a multistep method is stable and of order $r \ge 1$, it is convergent
of order $r$ for all sufficiently smooth problems. This means that, assuming starting
approximations with an error bounded by $\mathcal{O}(h^r)$, the global error satisfies
$y_n - y(t_0 + nh) = \mathcal{O}(h^r)$ on compact intervals $nh \le T$.

Symmetry. If the coefficients of a multistep formula (1.1) satisfy

$$\alpha_{k-j} = -\alpha_j, \qquad \beta_{k-j} = \beta_j \quad\text{for all } j, \tag{1.3}$$

then the method is called symmetric. Condition (1.3) implies that for every zero $\zeta$
of $\rho(\zeta)$ also its inverse $\zeta^{-1}$ is a zero. Hence, for stable symmetric methods all zeros
of $\rho(\zeta)$ are simple and lie on the unit circle.
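
The order condition (1.2) and the symmetry condition (1.3) can be checked
mechanically. Expanding $\rho(e^h) - h\sigma(e^h)$ in powers of $h$, the coefficient of $h^q/q!$ is
$\sum_j \alpha_j j^q - q \sum_j \beta_j j^{q-1}$, so the order is the largest $r$ for which these vanish
for $q = 0, \dots, r$. The sketch below uses exact rational arithmetic; the function
names and the list layout (coefficients stored for $j = 0, \dots, k$) are conventions
chosen here, not notation from the text.

```python
from fractions import Fraction as F

def lmm_order(alpha, beta, max_order=12):
    """Largest r with rho(e^h) - h*sigma(e^h) = O(h^(r+1)); -1 if rho(1) != 0."""
    for q in range(max_order + 2):
        lhs = sum(F(a) * j**q for j, a in enumerate(alpha))
        rhs = sum(q * F(b) * j**(q - 1) for j, b in enumerate(beta)) if q else 0
        if lhs != rhs:
            return q - 1
    return max_order + 1  # all tested conditions hold

def is_symmetric(alpha, beta):
    """Condition (1.3): alpha_{k-j} = -alpha_j and beta_{k-j} = beta_j."""
    k = len(alpha) - 1
    return all(F(alpha[k - j]) == -F(alpha[j]) and F(beta[k - j]) == F(beta[j])
               for j in range(k + 1))

# Three classical 2-step methods, coefficients listed for j = 0, 1, 2.
ADAMS2 = ([0, -1, 1], [F(-1, 2), F(3, 2), 0])
BDF2 = ([F(1, 2), -2, F(3, 2)], [0, 0, 1])
MIDPOINT = ([-1, 0, 1], [0, 2, 0])

if __name__ == "__main__":
    for name, (a, b) in [("Adams", ADAMS2), ("BDF", BDF2), ("midpoint", MIDPOINT)]:
        print(name, lmm_order(a, b), is_symmetric(a, b))
```

Of the three methods, only the explicit midpoint rule satisfies (1.3).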

Example 1.1. We consider the pendulum equation (I.1.13), and we apply the
following multistep methods: the 2-step explicit Adams method

$$y_{n+2} = y_{n+1} + h\Bigl(\tfrac32 f_{n+1} - \tfrac12 f_n\Bigr), \tag{1.4}$$

the 2-step backward differentiation formula (BDF)

$$\tfrac32\, y_{n+2} - 2 y_{n+1} + \tfrac12\, y_n = h f_{n+2}, \tag{1.5}$$

and the (2-step) symmetric explicit midpoint rule

$$y_{n+2} = y_n + 2h f_{n+1}. \tag{1.6}$$

Fig. 1.1. Solutions of the pendulum problem (I.1.13); explicit Adams with step size $h = 0.5$,
initial value $(p_0, q_0) = (0, 0.7)$; BDF with step size $h = 0.5$, initial value $(p_0, q_0) =
(0, 0.95)$; explicit midpoint rule with $h = 0.4$ and initial value $(p_0, q_0) = (1.1, 0)$

1 Germund Dahlquist, born: 16 January 1925 in Uppsala (Sweden), died: 8 February 2005.

For all methods we take $y_1 = y_0 + h f_0$ as the approximation for $y(t_0 + h)$. The
results of the first 108 steps are shown in Fig. 1.1. We observe that the first two
methods, as expected, behave similarly to the explicit and implicit Euler method
(the numerical solution spirals either outwards or inwards). This will be rigorously
explained in Sect. XV.2.1 below. However, as might not be expected, the symmetric
method (1.6) does not behave like the implicit midpoint rule (cf. Fig. I.1.4); it shows
undesired increasing oscillations (parasitic solutions).
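
The experiment can be repeated with a few lines of code. The sketch below assumes
the pendulum in the form $\dot p = -\sin q$, $\dot q = p$ with energy $H = p^2/2 - \cos q$; the
function names and the fixed-point iteration used for the implicit BDF formula are
choices made here, not prescribed by the text.

```python
import math

def f(y):
    """Pendulum vector field for y = (p, q): p' = -sin q, q' = p."""
    p, q = y
    return (-math.sin(q), p)

def energy(y):
    p, q = y
    return 0.5 * p * p - math.cos(q)

def adams2(y0, y1, h):
    # formula (1.4)
    f0, f1 = f(y0), f(y1)
    return tuple(y1[i] + h * (1.5 * f1[i] - 0.5 * f0[i]) for i in range(2))

def bdf2(y0, y1, h):
    # formula (1.5), solved by fixed-point iteration (contraction factor 2h/3)
    y2 = y1
    for _ in range(100):
        f2 = f(y2)
        y2 = tuple((4 * y1[i] - y0[i] + 2 * h * f2[i]) / 3 for i in range(2))
    return y2

def midpoint2(y0, y1, h):
    # formula (1.6)
    f1 = f(y1)
    return tuple(y0[i] + 2 * h * f1[i] for i in range(2))

def run(step, y0, h, n_steps):
    """Return [y_0, ..., y_n]; y_1 is obtained from one explicit Euler step."""
    f0 = f(y0)
    ys = [y0, tuple(y0[i] + h * f0[i] for i in range(2))]
    while len(ys) <= n_steps:
        ys.append(step(ys[-2], ys[-1], h))
    return ys

if __name__ == "__main__":
    cases = [("Adams", adams2, (0.0, 0.7), 0.5),
             ("BDF", bdf2, (0.0, 0.95), 0.5),
             ("midpoint", midpoint2, (1.1, 0.0), 0.4)]
    for name, step, y0, h in cases:
        ys = run(step, y0, h, 108)
        print(name, energy(ys[0]), energy(ys[-1]))
```

Printing the energy along the trajectory shows the outward and inward spiralling of
the first two methods as a growth and a decay of $H$, respectively.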

After this negative experience with classical multistep methods, the obvious 
question is: are there multistep methods which have a long-time behaviour that is 
comparable to symplectic and/or symmetric one-step methods? 

XV.1.2 Multistep Methods for Second Order Equations 

Many important Hamiltonian systems are second order differential equations 

$$\ddot y = f(y), \tag{1.7}$$

where the force $f$ is independent of the velocity $\dot y$. Introducing the new variable
$v = \dot y$, we obtain the system $\dot y = v$, $\dot v = f(y)$ of first order equations. If we apply
a multistep method (1.1) with generating polynomials $\rho^*(\zeta) = \sum_{j=0}^{k^*} \alpha_j^* \zeta^j$ and
$\sigma^*(\zeta) = \sum_{j=0}^{k^*} \beta_j^* \zeta^j$ to this system, we get

$$\sum_{j=0}^{k^*} \alpha_j^*\, y_{n+j} = h \sum_{j=0}^{k^*} \beta_j^*\, v_{n+j}, \qquad
\sum_{j=0}^{k^*} \alpha_j^*\, v_{n+j} = h \sum_{j=0}^{k^*} \beta_j^*\, f(y_{n+j}).$$

An elimination of the $v$-variables then yields

$$\sum_{j=0}^{k} \alpha_j\, y_{n+j} = h^2 \sum_{j=0}^{k} \beta_j\, f(y_{n+j}), \tag{1.8}$$

where $k = 2k^*$, $\rho(\zeta) = \rho^*(\zeta)^2$ and $\sigma(\zeta) = \sigma^*(\zeta)^2$. We consider here methods
(1.8) which do not necessarily originate from a multistep method for first order
equations, and we denote the generating polynomials of the coefficients $\alpha_j$ and $\beta_j$
again by $\rho(\zeta)$ and $\sigma(\zeta)$. From the classical theory (see Sect. III.10 of Hairer, Nørsett
& Wanner 1993) we recall the following definitions and results.

Order. A method (1.8) has order $r$ if its generating polynomials satisfy

$$\rho(e^h) - h^2 \sigma(e^h) = \mathcal{O}(h^{r+2}) \quad\text{for } h \to 0. \tag{1.9}$$

Stability. Method (1.8) is stable if all zeros of the polynomial $\rho(\zeta)$ satisfy $|\zeta| \le 1$,
and those on the unit circle are at most double zeros. Observe that for methods
originating from (1.1) all zeros are double. The method is called strictly stable, if
all zeros are inside the unit circle with the exception of $\zeta = 1$.

Convergence. If a multistep method (1.8) is stable, of order $r \ge 1$ and if the starting
values are accurate enough, the global error satisfies $y_n - y(t_0 + nh) = \mathcal{O}(h^r)$ on
compact intervals $nh \le T$.

Symmetry. If the coefficients of (1.8) satisfy

$$\alpha_{k-j} = \alpha_j, \qquad \beta_{k-j} = \beta_j \quad\text{for all } j, \tag{1.10}$$

then the method is symmetric. Again, for every zero $\zeta$ of $\rho(\zeta)$ the value $\zeta^{-1}$ is also
a zero. Hence, stable symmetric methods have all zeros of $\rho(\zeta)$ on the unit circle
and they are at most of multiplicity two.

Dahlquist (1956) noticed that double zeros of $\rho(\zeta)$ on the unit circle can lead to
an exponential error growth. Lambert & Watson (1976) analyzed in detail the
application of (1.8) to the linear test equation $\ddot y = -\omega^2 y$. They found that with
symmetric methods for which $\rho(\zeta)$ does not have double roots on the unit circle other than
$\zeta = 1$, the numerical solution remains close to a periodic orbit (for sufficiently small
step sizes). For example, the Störmer-Verlet method $y_{n+1} - 2y_n + y_{n-1} = h^2 f_n$
satisfies this property for $0 < h\omega < 2$ (see Sect. I.5.2). The study of the long-time
behaviour of symmetric methods (1.8) was then put forward by the article of
Quinlan & Tremaine (1990), where an excellent performance for simulations of the outer
solar system is reported.
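
For the Störmer-Verlet method applied to $\ddot y = -\omega^2 y$, the bound $0 < h\omega < 2$ can be
read off the characteristic equation $\zeta^2 - (2 - h^2\omega^2)\zeta + 1 = 0$ of the resulting linear
recurrence. The following sketch, with a function name chosen here for illustration,
computes its two roots.

```python
import cmath

def verlet_roots(h_omega):
    """Roots of zeta^2 - (2 - (h*omega)^2) * zeta + 1 = 0."""
    b = -(2.0 - h_omega ** 2)
    disc = cmath.sqrt(b * b - 4.0)
    return (-b + disc) / 2, (-b - disc) / 2

if __name__ == "__main__":
    for h_omega in (0.5, 1.9, 2.1):
        print(h_omega, [abs(z) for z in verlet_roots(h_omega)])
```

For $h\omega < 2$ both roots are complex conjugate with modulus one, so the numerical
solution stays bounded; for $h\omega > 2$ they are real and one of them lies outside the
unit circle.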

Example 1.2. We consider the Kepler problem (I.2.2) with initial values (I.2.11)
and eccentricity $e = 0.2$. We apply the following three methods with constant step
size $h = 0.01$ on the interval of length $2\pi \cdot 10^5$ (i.e., $10^5$ periods):

$$\begin{aligned}
\text{(A)}\quad & y_{n+4} - 2y_{n+3} + y_{n+2} = h^2 \Bigl( \tfrac76 f_{n+3} - \tfrac{5}{12} f_{n+2} + \tfrac13 f_{n+1} - \tfrac{1}{12} f_n \Bigr) \\
\text{(B)}\quad & y_{n+4} - 2y_{n+2} + y_n = h^2 \Bigl( \tfrac43 f_{n+3} + \tfrac43 f_{n+2} + \tfrac43 f_{n+1} \Bigr) \\
\text{(C)}\quad & y_{n+4} - 2y_{n+3} + 2y_{n+2} - 2y_{n+1} + y_n = h^2 \Bigl( \tfrac76 f_{n+3} - \tfrac13 f_{n+2} + \tfrac76 f_{n+1} \Bigr).
\end{aligned}$$

Fig. 1.2. Error in the total energy for the three linear multistep methods of Example 1.2
applied to the Kepler problem with $e = 0.2$

All three methods are of order $r = 4$; method (A) is strictly stable, whereas methods
(B) and (C) are symmetric. For method (B) the $\rho$-polynomial has a double root at
$\zeta = -1$, for method (C) it does not have double roots other than 1. Starting values
$y_1$, $y_2$, and $y_3$ are computed very accurately with a high-order Runge-Kutta method.

The error in the total energy is plotted for all three methods in Fig. 1.2. On
the first 10 periods, all methods behave similarly and no error growth is observed.
Beyond this interval, method (A) shows a linear error growth (as it is the case for
non-symplectic and non-symmetric one-step methods), method (B) has an
exponential error growth, and for method (C) the error remains bounded of size $\mathcal{O}(h^4)$ on
the whole interval of integration. One of the aims of this chapter is to explain the
excellent long-time behaviour of method (C).
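
The stated properties of the methods (A), (B), (C) can be verified in exact
arithmetic. By (1.9), order $r$ means that $\sum_j \alpha_j j^q = q(q-1) \sum_j \beta_j j^{q-2}$ holds for
$q = 0, \dots, r+1$. The sketch below assumes the coefficients $7/6, -5/12, 1/3, -1/12$
for (A), $4/3, 4/3, 4/3$ for (B) and $7/6, -1/3, 7/6$ for (C); the helper names are
chosen here for illustration.

```python
from fractions import Fraction as F

def order_second(alpha, beta, max_order=12):
    """Largest r with rho(e^h) - h^2 sigma(e^h) = O(h^(r+2)), cf. (1.9)."""
    for q in range(max_order + 3):
        lhs = sum(F(a) * j**q for j, a in enumerate(alpha))
        rhs = (sum(q * (q - 1) * F(b) * j**(q - 2) for j, b in enumerate(beta))
               if q >= 2 else 0)
        if lhs != rhs:
            return q - 2
    return max_order + 1  # all tested conditions hold

def poly_and_derivative(coeffs, x):
    """Value and first derivative at x of the polynomial sum_j coeffs[j] x^j."""
    p = sum(F(c) * x**j for j, c in enumerate(coeffs))
    dp = sum(j * F(c) * x**(j - 1) for j, c in enumerate(coeffs) if j > 0)
    return p, dp

# Coefficients listed for j = 0, ..., 4.
A = ([0, 0, 1, -2, 1], [F(-1, 12), F(1, 3), F(-5, 12), F(7, 6), 0])
B = ([1, 0, -2, 0, 1], [0, F(4, 3), F(4, 3), F(4, 3), 0])
C = ([1, -2, 2, -2, 1], [0, F(7, 6), F(-1, 3), F(7, 6), 0])

if __name__ == "__main__":
    for name, (a, b) in [("A", A), ("B", B), ("C", C)]:
        print(name, "order", order_second(a, b),
              "rho(-1), rho'(-1) =", poly_and_derivative(a, -1))
```

For (B) both $\rho(-1)$ and $\rho'(-1)$ vanish, which is the double root at $\zeta = -1$; for (C)
one has $\rho(\zeta) = (\zeta - 1)^2(\zeta^2 + 1)$, so the only double root is $\zeta = 1$.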

Stabilized Version of (1.8). Due to the double zeros (of modulus one) of the
characteristic polynomial of the difference equation $\sum_j \alpha_j y_{n+j} = 0$, we have an
undesired propagation of rounding errors (especially for long-time integrations). To
overcome this difficulty, we split the characteristic polynomial $\rho(\zeta)$ into

$$\rho(\zeta) = \rho_A(\zeta) \cdot \rho_B(\zeta), \tag{1.11}$$

such that each polynomial

$$\rho_A(\zeta) = \sum_{j=0}^{k_A} \alpha_j^{(A)} \zeta^j, \qquad \rho_B(\zeta) = \sum_{j=0}^{k_B} \alpha_j^{(B)} \zeta^j$$

has only simple roots of modulus one. Introducing the new variable
$h v_n := \sum_{j=0}^{k_A} \alpha_j^{(A)} y_{n+j}$, the recurrence relation (1.8) becomes equivalent to

$$\sum_{j=0}^{k_A} \alpha_j^{(A)} y_{n+j} = h v_n, \qquad \sum_{j=0}^{k_B} \alpha_j^{(B)} v_{n+j} = h \sum_{j=0}^{k} \beta_j f_{n+j}. \tag{1.12}$$

This formula, which for the Störmer-Verlet scheme corresponds to the one-step
formulation (I.1.17), is much better suited for an implementation. If the splitting
is such that $\rho_A'(1) = 1$, the discretization (1.12) is consistent with the first order
partitioned system $\dot y = v$, $\dot v = f(y)$.
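
As an illustration of (1.12), consider the Störmer-Verlet scheme with
$\rho(\zeta) = (\zeta - 1)^2$ and the splitting $\rho_A(\zeta) = \rho_B(\zeta) = \zeta - 1$. The sketch below implements
both the two-term recurrence (1.8) and the stabilized form; the function names and
the scalar test problem $\ddot y = -y$ used in the usage lines are assumptions made here.

```python
import math

def two_term(f, y0, y1, h, n):
    """Recurrence y_{k+2} - 2 y_{k+1} + y_k = h^2 f(y_{k+1}); returns [y_0..y_n]."""
    ys = [y0, y1]
    for _ in range(n - 1):
        ys.append(2 * ys[-1] - ys[-2] + h * h * f(ys[-1]))
    return ys

def stabilized(f, y0, y1, h, n):
    """Form (1.12) with rho_A = rho_B = zeta - 1:
    y_{k+1} - y_k = h v_k,  v_{k+1} - v_k = h f(y_{k+1}); returns [y_0..y_n]."""
    y, v = y0, (y1 - y0) / h
    ys = [y0]
    for _ in range(n):
        y = y + h * v
        ys.append(y)
        v = v + h * f(y)
    return ys

if __name__ == "__main__":
    h, n = 0.1, 1000
    a = two_term(lambda y: -y, 1.0, math.cos(h), h, n)
    b = stabilized(lambda y: -y, 1.0, math.cos(h), h, n)
    print(max(abs(u - w) for u, w in zip(a, b)))
```

In exact arithmetic both loops produce the same sequence; they differ only in how
rounding errors are propagated, which becomes visible on much longer integration
intervals than the one used here.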
XV.1.3 Partitioned Multistep Methods

Motivated by the stabilized version (1.12) of multistep methods for second order
equations, let us consider general partitioned systems of differential equations

$$\dot y = f(y, v), \qquad \dot v = g(y, v), \tag{1.13}$$

where, needless to say, $y$ and $v$ may be vectors. The idea is to apply different
multistep methods to different components. We thus get

$$\sum_{j=0}^{k} \alpha_j^{(A)} y_{n+j} = h \sum_{j=0}^{k} \beta_j^{(A)} f_{n+j}, \qquad
\sum_{j=0}^{k} \alpha_j^{(B)} v_{n+j} = h \sum_{j=0}^{k} \beta_j^{(B)} g_{n+j}, \tag{1.14}$$

where $f_n = f(y_n, v_n)$ and $g_n = g(y_n, v_n)$. We can take the same $k$ for both
methods without loss of generality, if we abandon the assumption $|\alpha_0| + |\beta_0| > 0$.

Such a method is of order $r$, if both methods are of order $r$. It is stable (strictly
stable, symmetric, ...), if both methods are stable (strictly stable, symmetric, ...).

Example 1.3. For our next experiment we use the symmetric methods

$$\begin{aligned}
\text{(A)}:\quad & y_{n+3} - y_{n+2} + y_{n+1} - y_n = h\,(f_{n+2} + f_{n+1}) \\
\text{(B)}:\quad & v_{n+3} - v_{n+1} = 2h\, g_{n+2}.
\end{aligned} \tag{1.15}$$

Both methods are of order 2, and their $\rho$-polynomials

$$\rho_A(\zeta) = (\zeta - 1)(\zeta^2 + 1) \quad\text{and}\quad \rho_B(\zeta) = (\zeta - 1)(\zeta + 1)$$

do not have common zeros with the exception of $\zeta = 1$.




Fig. 1.3. Three versions of the methods (1.15) applied with step size $h = 50$ (days) to the
outer solar system. For method (B) only the numerical orbits of Jupiter and Saturn are plotted.
The time intervals are given in units of 10 000 days

We choose the outer solar system with the data as described in Sect. I.2.4, and
we apply the methods in three versions: (i) as partitioned method (AB), where the
positions are treated by method (A) and the velocities by method (B); (ii) method
(A) is applied to all components; (iii) method (B) is applied to all components.
The numerical results are shown in Fig. 1.3. Whereas the individual methods show
instabilities on rather short time intervals, the partitioned method gives a correct
picture even with a large step size $h = 50$.


XV.2 The Underlying One-Step Method 

Much insight into the long-time behaviour of multistep methods can be gained by
relating their numerical solution to one-step methods. This then allows for an
application of the considerations of the preceding sections.


XV.2.1 Strictly Stable Multistep Methods

It was a surprising result when Kirchgraber (1986) proved that strictly stable
multistep methods are essentially equivalent to one-step methods. Although this one-step
method is “quite exotic” (Eirola & Nevanlinna 1988), it is the key for a better
understanding of the dynamics of strictly stable methods.

Theorem 2.1 (Kirchgraber 1986). Consider a strictly stable linear multistep
method (1.1) applied with a sufficiently small step size $h$. Then, there exists a
one-step method $\Phi_h$ such that for starting approximations computed by $y_j = \Phi_h^j(y_0)$,
$j = 1, \dots, k-1$, the numerical solution of (1.1) is identical to that obtained by the
one-step method, i.e., $y_{n+1} = \Phi_h(y_n)$ for all $n \ge 0$.

Proof. The idea is to reformulate the multistep method (1.1) in such a way that the
Invariant Manifold Theorem of Sect. XII.3 can be applied. To keep the notation as
simple as possible, let us consider the case $k = 3$.

We write the method in the form

$$\begin{pmatrix} y_{n+3} \\ y_{n+2} \\ y_{n+1} \end{pmatrix} =
\begin{pmatrix} -a_2 & -a_1 & -a_0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} y_{n+2} \\ y_{n+1} \\ y_n \end{pmatrix}
+ h \begin{pmatrix} F_h(y_n, y_{n+1}, y_{n+2}) \\ 0 \\ 0 \end{pmatrix} \tag{2.1}$$

with $a_j = \alpha_j/\alpha_k$, and we transform the appearing matrix $A$ to Jordan canonical
form $J = T^{-1} A T$. We thus get

$$Z_{n+1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & d_{11} & d_{12} \\ 0 & d_{21} & d_{22} \end{pmatrix} Z_n + h\, G(Z_n),
\qquad Z_n = T^{-1} \begin{pmatrix} y_{n+2} \\ y_{n+1} \\ y_n \end{pmatrix}. \tag{2.2}$$

Since the method is strictly stable, 1 is a simple eigenvalue of $A$, and all other
eigenvalues are less than 1 in modulus. Consequently, the matrix $D = (d_{ij})$ satisfies
$\|D\| < 1$ in a suitable norm. Partitioning $Z_n = (\xi_n, \eta_n)^T$ into its first component
$\xi_n$ and the rest (collected in $\eta_n$), we see that (2.2) is of the form (XII.3.1) with
$L_{xx}$, $L_{xy}$, $L_{yx}$ of size $\mathcal{O}(h)$, and $L_{yy} = \|D\| < 1$. Theorem XII.3.1 thus yields the
existence of a function $\eta = s(\xi)$ such that the manifolds

$$\bigl\{ (\xi, s(\xi))^T \,;\ \xi \in \mathbb{R}^d \bigr\} \quad\text{and}\quad
M_h = \bigl\{ T\, (\xi, s(\xi))^T \,;\ \xi \in \mathbb{R}^d \bigr\}$$

are invariant under the mappings (2.2) and (2.1), respectively. The function $s(\xi)$ is
Lipschitz continuous with constant $\lambda = \mathcal{O}(h)$.

Since the first column of $T$, which is the eigenvector corresponding to the
eigenvalue 1 of $A$, is given by $(1, 1, 1)^T$, the last component of $T (\xi, s(\xi))^T$ satisfies
$y = \xi + g(\xi)$ where $g(\xi)$ is Lipschitz continuous with constant $\mathcal{O}(h)$. By the Banach
fixed-point theorem this equation has a unique solution $\xi = r(y)$. Consequently, the
manifold $M_h$ can be parametrized in terms of $y$ as

$$M_h = \bigl\{ (\Psi_h(y), \Phi_h(y), y)^T \,;\ y \in \mathbb{R}^d \bigr\}.$$

Its invariance under (2.1) implies that

$$\begin{pmatrix} \Psi_h(\hat y) \\ \Phi_h(\hat y) \\ \hat y \end{pmatrix} =
\begin{pmatrix} -a_2 & -a_1 & -a_0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} \Psi_h(y) \\ \Phi_h(y) \\ y \end{pmatrix}
+ h \begin{pmatrix} F_h(y, \Phi_h(y), \Psi_h(y)) \\ 0 \\ 0 \end{pmatrix}$$

and consequently $\hat y = \Phi_h(y)$ and $\Phi_h(\hat y) = \Psi_h(y)$, so that $\Psi_h(y) = \Phi_h^2(y)$. This
holds for all $y$, and thus proves the statement of the theorem. □

Example 2.2. For a scalar linear problem $\dot y = \lambda y$, the application of a multistep
method yields a difference equation with characteristic polynomial $\rho(\zeta) - h\lambda\,\sigma(\zeta)$.
Denoting its zeros by $\zeta_1(h\lambda), \dots, \zeta_k(h\lambda)$, where $\zeta_1(0) = 1$ and $|\zeta_j(0)| < 1$ for
$j \ge 2$, the numerical solution can be written as (assuming distinct $\zeta_j(h\lambda)$)

$$y_n = c_1\, \zeta_1(h\lambda)^n + c_2\, \zeta_2(h\lambda)^n + \dots + c_k\, \zeta_k(h\lambda)^n.$$

The coefficients $c_1, \dots, c_k$ depend on $h\lambda$ and are determined by the starting
approximations $y_0, \dots, y_{k-1}$. In this situation the underlying one-step method is the
mapping $y_0 \mapsto \zeta_1(h\lambda)\, y_0$. Observe that $\zeta_1(z)$ is in general not a rational function as
we are used to with Runge-Kutta methods.
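
For 2-step methods the zeros of $\rho(\zeta) - z\sigma(\zeta)$ follow from the quadratic formula, and
the principal root $\zeta_1(z)$ is visibly an algebraic, not a rational, function of $z$. The
following sketch evaluates the roots for the three methods (1.4), (1.5), (1.6) at the
purely imaginary value $z = 0.5i$; this value, the function name, and the rule "the
principal root is the one closest to $e^z$" are choices made here for illustration.

```python
import cmath

def char_roots(alpha, beta, z):
    """Roots of rho(zeta) - z*sigma(zeta) for a 2-step method, principal root first.

    alpha, beta hold the coefficients for j = 0, 1, 2."""
    c0, c1, c2 = (a - z * b for a, b in zip(alpha, beta))
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    roots = [(-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)]
    roots.sort(key=lambda r: abs(r - cmath.exp(z)))  # closest to exp(z) first
    return roots

ADAMS2 = ([0.0, -1.0, 1.0], [-0.5, 1.5, 0.0])
BDF2 = ([0.5, -2.0, 1.5], [0.0, 0.0, 1.0])
MIDPOINT = ([-1.0, 0.0, 1.0], [0.0, 2.0, 0.0])

if __name__ == "__main__":
    z = 0.5j
    for name, (a, b) in [("Adams", ADAMS2), ("BDF", BDF2), ("midpoint", MIDPOINT)]:
        print(name, [abs(r) for r in char_roots(a, b, z)])
```

On the imaginary axis the principal root has modulus larger than one for the
explicit Adams method and smaller than one for BDF, in agreement with the
outward and inward spirals of Example 1.1. For the explicit midpoint rule both roots
have modulus one, so the parasitic root is not damped.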

Remark 2.3 (Asymptotic Phase). For arbitrary $y_0, y_1, \dots, y_{k-1}$ close to the
exact solution, there exists $\hat y_0$ such that the multistep solution $\{y_n\}$ and the one-step
solution $\{\hat y_n\}$, given by $\hat y_{n+1} = \Phi_h(\hat y_n)$, approach each other exponentially fast, i.e.,

$$\|y_n - \hat y_n\| \le \text{Const} \cdot \rho^n \quad\text{for all } n \ge 0 \tag{2.3}$$

with some $\rho$ satisfying $0 < \rho < 1$ (see Exercise XII.3). This is due to the attractivity
of the invariant manifold $M_h$. A proof is given in Stoffer (1993), and it is based on
techniques of Nipp & Stoffer (1992). This result explains why strictly stable linear
multistep methods have the same long-time behaviour as one-step methods.

In the context of “geometric numerical integration” we are mainly interested in
symplectic and/or symmetric methods which, for linear problems, are characterized
by the condition $\zeta_1(-z)\,\zeta_1(z) = 1$ (see Sect. VI.4.2). This, however, is only possible
for symmetric multistep methods (Exercise 1) which cannot be strictly stable.


XV.2.2 Formal Analysis for Weakly Stable Methods 

The proof and the statement of Theorem 2.1 break down as soon as at least one root
of $\rho(\zeta)$, different from 1, has modulus one. Moreover, Example 2.2 shows that we
cannot expect a property like (2.3) with $\rho < 1$. All we can hope for is to find an
underlying one-step method as a formal series in $h$. Surprisingly, this provides a lot
of insight into the long-time dynamics of weakly stable multistep methods.

Theorem 2.4. Consider a linear multistep method (1.1), and assume that $\zeta = 1$ is
a single root of $\rho(\zeta) = 0$. Then there exists a unique formal expansion

$$\Phi_h(y) = y + h\, d_1(y) + h^2 d_2(y) + \dots \tag{2.4}$$

such that

$$\sum_{j=0}^{k} \alpha_j\, \Phi_h^j(y) = h \sum_{j=0}^{k} \beta_j\, f\bigl(\Phi_h^j(y)\bigr),$$

where identity is understood in the sense of formal power series in $h$.

If the multistep method is of order $r$, then also the underlying one-step method
is of order $r$, i.e., $\Phi_h(y) - \varphi_h(y) = \mathcal{O}(h^{r+1})$.

The formal series for $\Phi_h(y)$ is called “step-transition operator” in the Chinese
literature (see e.g., Feng (1995), page 274). We call it “underlying one-step method”.
Notice that this theorem does not require any stability assumption.

Proof. Expanding $\Phi_h^j(y)$ and $f(\Phi_h^j(y))$ into powers of $h$, a comparison of the
coefficients yields

$$\begin{aligned}
\rho'(1)\, d_1(y) &= \sigma(1)\, f(y) \\
\rho'(1)\, d_2(y) &= -\frac{\rho''(1)}{2}\, d_1'(y)\, d_1(y) + \sigma'(1)\, f'(y)\, d_1(y) \\
\rho'(1)\, d_j(y) &= \dots ,
\end{aligned} \tag{2.5}$$

where the three dots represent known functions depending on derivatives of $f(y)$
and on $d_i(y)$ with $i < j$. Since $\rho'(1) \ne 0$ by assumption, unique coefficient
functions $d_j(y)$ are obtained recursively. The statement on the order follows from the
fact that the exact flow $\varphi_h(y)$ has a defect $\mathcal{O}(h^{r+1})$ in the multistep formula. □

The computation of the previous proof shows that the series (2.4) is a B-series.
This follows rigorously from the results of Sect. III.1.4. Whereas the B-series
representation of Runge-Kutta methods converges for sufficiently small $h$, this is in
general not the case for (2.4); see the next example.

Example 2.5. Consider a consistent two-step method

$$\alpha_2 y_{n+2} + \alpha_1 y_{n+1} + \alpha_0 y_n = h\,(\beta_2 f_{n+2} + \beta_1 f_{n+1} + \beta_0 f_n),$$

and apply it to the simple system $\dot y = f(t)$, $\dot t = 1$. The $y$-component of the
underlying one-step method then takes the form

$$\Phi_h(t_0, y_0) = y_0 + \sum_{j \ge 1} a_j\, h^j f^{(j-1)}(t_0). \tag{2.6}$$

Putting $f(t) = e^t$ yields

$$A(\zeta) = \sum_{j \ge 1} a_j\, \zeta^j = \zeta\, \frac{\beta_2 e^{2\zeta} + \beta_1 e^{\zeta} + \beta_0}{\alpha_2 (1 + e^{\zeta}) + \alpha_1}$$

for the generating function of the coefficients $a_j$. Since this function has finite poles,
the radius of convergence of $A(\zeta)$ is finite. Therefore, the radius of convergence
of the series (2.6) has to be zero as soon as $f^{(j)}(t_0)$ behaves like $j!\, \mu\, \kappa^j$ (this is
typically the case for analytic functions). Independently of whether the method
is strictly stable or not, the series (2.6) usually does not converge.


Both Theorem 2.1 and Theorem 2.4 extend in a straightforward manner to
partitioned multistep methods (1.14). To get analogous results for multistep methods
(1.8) for second order differential equations, one has to introduce an approximation
for the velocity $v = \dot y$. This will be explained in more detail in Sect. XV.3 below.


XV.3 Backward Error Analysis 

The backward error analysis for multistep methods (Hairer 1999) is presented in 
two steps: 

• for “smooth” numerical solutions (obtained by the underlying one-step method); 

• for the general case. 

The idealized situation of no parasitic terms already gives much insight into
conservation properties of the method (see Sect. XV.4). The study of the general case is,
however, necessary for getting estimates for the parasitic solutions (Sect. XV.5), so
that rigorous statements on the long-time behaviour are possible.

XV.3.1 Modified Equation for Smooth Numerical Solutions 

The formal backward error analysis of Chap. IX could be directly applied to the un¬ 
derlying one-step method of Sect. XV.2.2. However, due to the non-convergence of 
the series for ^(y), difficulties may arise as soon as rigorous estimates are desired. 
We prefer to derive the modified differential equation directly from the multistep 
formula and thus avoid the use of the underlying one-step method. 
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Theorem 3.1. Consider a linear multistep method (1.1), and assume that p( 1) = 0 
and p'( 1) = cr(l) ^ 0. Then there exist unique h-independentfunctions fj(y) such 
that, for every truncation index N, every solution of 

V = f(y ) + hf 2 {y) + h 2 f 3 (y) + ... + h N ~ 1 f N (y ) (3.1) 

satisfies 

k k 

y2ajy(t + jh) = hy2/3jf(y(t + jh)) + 0(h N+1 ). (3.2) 

j =0 3=0 

If the multistep method is of order r, then fj(y) = 0 for 2 < j < r. If the method is 
symmetric, then fj(y) = 0 for all even j, so that the modified equation (3.1) has an 
expansion in even powers of h. 

Proof. Using the Lie derivative $(D_i g)(y) = g'(y) f_i(y)$ (with $f_1(y) = f(y)$) and
$D = D_1 + hD_2 + h^2 D_3 + \ldots$, the solution of (3.1) with initial value $y(t) = y$
satisfies $y(t+jh) = e^{jhD} y + O(h^{N+1})$ and $f(y(t+jh)) = e^{jhD} f(y) + O(h^{N+1})$
(by Taylor expansion). We thus have

$$\rho(e^{hD})\, y = h\, \sigma(e^{hD}) f(y) + O(h^{N+1}). \tag{3.3}$$

With the expansion $x\sigma(e^x)/\rho(e^x) = 1 + \mu_1 x + \mu_2 x^2 + \ldots$ this becomes

$$\dot y = (1 + \mu_1 hD + \mu_2 h^2 D^2 + \ldots) f(y) + O(h^N). \tag{3.4}$$

A comparison with (3.1) yields $f_1(y) = f(y)$, and

$$f_j(y) = \sum_{l \ge 1} \mu_l \sum_{j_1 + \ldots + j_l = j-1} (D_{j_1} \cdots D_{j_l} f)(y) \tag{3.5}$$

for $j \ge 2$, which uniquely defines the functions $f_j(y)$ in a recursive manner. □
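
The coefficients $\mu_j$ entering (3.4) and (3.5) are easy to tabulate. The following sketch is not part of the original text: it assumes that the method is given by two coefficient lists `alpha` and `beta` (hypothetical names) with $\rho(\zeta) = \sum_j \alpha_j \zeta^j$ and $\sigma(\zeta) = \sum_j \beta_j \zeta^j$, and it expands $x\sigma(e^x)/\rho(e^x)$ in exact rational arithmetic. For the explicit midpoint rule this series is $x/\sinh x$, so only even powers of $x$ appear, in line with the statement on symmetric methods.

```python
from fractions import Fraction
from math import factorial


def exp_series(j, n):
    """Taylor coefficients of exp(j*x) up to x**n as exact rationals."""
    return [Fraction(j) ** i / factorial(i) for i in range(n + 1)]


def poly_of_exp(coeffs, n):
    """Series of p(e^x) = sum_j coeffs[j] * exp(j*x), truncated after x**n."""
    out = [Fraction(0)] * (n + 1)
    for j, c in enumerate(coeffs):
        for i, e in enumerate(exp_series(j, n)):
            out[i] += Fraction(c) * e
    return out


def series_div(a, b, n):
    """First n+1 coefficients of the quotient a(x)/b(x); needs b[0] != 0."""
    q = []
    for i in range(n + 1):
        s = a[i] - sum(q[m] * b[i - m] for m in range(i))
        q.append(s / b[0])
    return q


def mu_coefficients(alpha, beta, n):
    """Coefficients [1, mu_1, ..., mu_n] of x*sigma(e^x)/rho(e^x)."""
    rho = poly_of_exp(alpha, n + 1)
    sigma = poly_of_exp(beta, n)
    if rho[0] != 0:
        raise ValueError("consistency requires rho(1) = 0")
    # rho(e^x) = x*(rho[1] + rho[2]*x + ...), so the factor x cancels.
    return series_div(sigma, rho[1:], n)


# Explicit midpoint rule y_{n+2} - y_n = 2h f_{n+1}.
mu_midpoint = mu_coefficients([-1, 0, 1], [0, 2, 0], 4)
print(mu_midpoint)
```

The odd coefficients $\mu_1$ and $\mu_3$ vanish for this method, and $\mu_2$ is the first coefficient that contributes a perturbation term to (3.1).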
Lemma 3.2. If $f(y)$ is analytic and bounded by $M$ in $B_R(y_0)$, then we have

$$\|f_j(y)\| \le \mu M \Bigl(\frac{\eta M j}{R}\Bigr)^{j-1} \quad\text{for}\quad \|y - y_0\| \le R/2, \tag{3.6}$$

where $\mu$ and $\eta$ depend only on the coefficients $\alpha_j$, $\beta_j$ of the multistep method.

Proof. The estimate (3.6) is obtained as in the proof of Theorem IX.7.5. We just
sketch the main idea in the notation used there. With $\delta = R/(2(J-1))$ we have
$\|f_j\|_j \le \delta b_j$, where the generating function $b(\zeta) = \sum_{j \ge 1} b_j \zeta^j$ of the $b_j$ satisfies

$$b(\zeta) = \frac{M\zeta}{\delta}\Bigl(1 + \sum_{l \ge 1} |\mu_l|\, b(\zeta)^l\Bigr).$$

By the implicit function theorem, $b(\zeta)$ is analytic and bounded in a disc of radius
$c\delta/M$ centred at the origin ($c$ is a positive constant depending only on the
coefficients of the multistep method). The estimate (3.6) then follows from Cauchy's
inequalities as in the proof of Theorem IX.7.5. □

It is remarkable that, although the Taylor series of the underlying one-step
method generally diverges, the coefficient functions of the modified differential
equation satisfy the same estimate as for Runge-Kutta methods. This enables us
to prove an analogue of Theorem IX.7.6 which, for one-step methods, is the main
ingredient for exponentially small error estimates. One can prove that for suitably
chosen $N = N(h)$ and for $h \le h_0/4$ with $h_0 = R/(e\eta M)$, the solution of (3.1)
satisfies

$$\Bigl\| \sum_{j=0}^{k} \alpha_j\, y(t+jh) - h \sum_{j=0}^{k} \beta_j\, f\bigl(y(t+jh)\bigr) \Bigr\| \le h\gamma M e^{-h_0/h},$$

where $\gamma$ depends only on the multistep formula. The proof of this statement is
similar to that of Theorem IX.7.6. We skip details and refer to Hairer (1999).

For strictly stable multistep methods, Theorem 2.1 together with the Invariant
Manifold Theorem XII.3.2 thus imply that the underlying one-step method is
exponentially close to the exact solution of the truncated modified equation. The parasitic
solution terms are rapidly damped out by the property (2.3) of asymptotic phase. The
same conclusions as for one-step methods can therefore be drawn.

For symmetric methods the situation is not so simple. One has to study the
parasitic solution components to get information on the long-time behaviour of the
numerical solution of (1.1). The basic techniques will be prepared in Sect. XV.3.2.


Partitioned Multistep Methods. The extension of the modified differential
equation to methods (1.14) is straightforward. There exist functions $f_j(y,v)$ and $g_j(y,v)$
such that the exact solution of

$$\begin{aligned}
\dot y &= f(y,v) + h f_2(y,v) + \ldots + h^{N-1} f_N(y,v) \\
\dot v &= g(y,v) + h g_2(y,v) + \ldots + h^{N-1} g_N(y,v)
\end{aligned} \tag{3.7}$$

satisfies the multistep formula (1.14) up to a defect of size $O(h^{N+1})$. The coefficient
functions can be computed by comparing (3.7) to

$$\begin{aligned}
\dot y &= \bigl(1 + \mu_1^{(A)} hD + \mu_2^{(A)} h^2 D^2 + \ldots\bigr) f(y,v) + O(h^N) \\
\dot v &= \bigl(1 + \mu_1^{(B)} hD + \mu_2^{(B)} h^2 D^2 + \ldots\bigr) g(y,v) + O(h^N),
\end{aligned} \tag{3.8}$$

where the real numbers $\mu_j^{(A)}$ and $\mu_j^{(B)}$ are given by
$x\sigma^{(A)}(e^x)/\rho^{(A)}(e^x) = 1 + \mu_1^{(A)} x + \mu_2^{(A)} x^2 + \ldots$ and by
$x\sigma^{(B)}(e^x)/\rho^{(B)}(e^x) = 1 + \mu_1^{(B)} x + \mu_2^{(B)} x^2 + \ldots$,
respectively. The Lie operator is defined by $D = D_1 + hD_2 + h^2 D_3 + \ldots$, where
$(D_j\Psi)(y,v) = \Psi_y(y,v) f_j(y,v) + \Psi_v(y,v) g_j(y,v)$, and it corresponds to the time
derivative of solutions of (3.7).


Multistep Methods for Second Order Differential Equations. The method (1.8)
for differential equations $\ddot y = f(y)$ can be treated in a similar way. In the absence
of derivative approximations we get a modified differential equation of the second
order

$$\ddot y = f(y) + h f_2(y,\dot y) + \ldots + h^{N-1} f_N(y,\dot y), \tag{3.9}$$

where the perturbation terms also depend on $\dot y$. Its exact solution satisfies the
multistep relation (1.8) up to a defect of size $O(h^{N+2})$, if (3.9) is equivalent to

$$\ddot y = (1 + \mu_1 hD + \mu_2 h^2 D^2 + \ldots) f(y) + O(h^N), \tag{3.10}$$

where $x^2\sigma(e^x)/\rho(e^x) = 1 + \mu_1 x + \mu_2 x^2 + \ldots$, and the time derivative is
given by the Lie operator $D = D_1 + hD_2 + h^2 D_3 + \ldots$ with
$(D_1\Psi)(y,\dot y) = \Psi_y(y,\dot y)\,\dot y + \Psi_{\dot y}(y,\dot y) f(y)$ and
$(D_j\Psi)(y,\dot y) = \Psi_{\dot y}(y,\dot y) f_j(y,\dot y)$ for $j \ge 2$. A comparison
of equal powers of $h$ in (3.9) and (3.10) uniquely defines the coefficient
functions $f_j(y,\dot y)$.

If the multistep method (1.8) is complemented with a difference formula for
approximations of the derivative $v = \dot y$ at grid points,

$$v_n = \frac{1}{h} \sum_{j=-l}^{l} \delta_j\, y_{n+j}, \tag{3.11}$$

we get an additional modified differential equation

$$v = (1 + \nu_1 hD + \nu_2 h^2 D^2 + \ldots)\,\dot y. \tag{3.12}$$

The coefficients $\nu_j$ are given by $x^{-1}\delta(e^x) = 1 + \nu_1 x + \nu_2 x^2 + \ldots$, where
$\delta(\zeta) = \sum_{j=-l}^{l} \delta_j \zeta^j$. For given $y$, this relation gives a formal one-to-one
correspondence between $v$ and $\dot y$. Consequently, the differential equation (3.10) combined
with (3.12) can be considered as a first order differential system for the variables $y$
and $v$.


XV.3.2 Parasitic Modified Equations 

In practice, due to the necessity of starting approximations $y_1, \ldots, y_{k-1}$, the
numerical solution of a multistep method does not lie on a solution of (3.1). For methods
where initial perturbations are not damped out sufficiently fast (cf. property (2.3)
of asymptotic phase), an additional investigation is therefore needed for the study
of the propagation of perturbations in the starting approximations. Let us start with
two illustrative numerical experiments.

Example 3.3. Consider the explicit, linear 3-step method

$$y_{n+3} - y_{n+2} + y_{n+1} - y_n = h(f_{n+2} + f_{n+1}), \tag{3.13}$$

with characteristic polynomial $\rho(\zeta) = (\zeta - 1)(\zeta^2 + 1)$, and apply it to the pendulum
equation (I.1.13). For a better illustration of the propagation of errors we consider
starting approximations $y_1$, $y_2$ that are rather far from the exact solution passing
through $y_0$. The result is shown in Fig. 3.1. We observe that the numerical solution
does not lie on one smooth curve, but on four curves, and every fourth solution
approximation is on the same curve.

Fig. 3.1. Numerical solution of (3.13) applied to the pendulum equation. The initial
approximations $y_0 = (1.9, 0.4)$, $y_1 = (1.7, 0.2)$, $y_2 = (2.1, 0)$ are indicated by black bullets; the
solution points $y_3, y_7, y_{11}, \ldots$ in grey

Fig. 3.2. Numerical solution of the explicit midpoint rule (3.14) applied to the pendulum
equation. The initial approximations $y_0 = (1.9, 0.4)$, $y_1 = (1.7, 0.2)$ are indicated by black
bullets; the solution points $y_2, y_4, y_6, \ldots$ in grey


This example shows an unexpectedly good long-time behaviour. Although the
starting approximations are far from a smooth solution, the distance of the
numerical approximations to a smooth solution curve does not increase. This is, however,
not the typical situation, as can be seen from our next experiment.

Example 3.4. We consider the explicit midpoint rule

$$y_{n+2} - y_n = 2h f_{n+1}, \tag{3.14}$$

which has $\rho(\zeta) = (\zeta - 1)(\zeta + 1)$ as characteristic polynomial. This time, the
numerical solution (see Fig. 3.2) lies on two smooth curves. In contrast to the previous
example, an unacceptable linear growth of the perturbations can be observed.
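
The two experiments can be repeated with a few lines of code. The sketch below is only an illustration under stated assumptions: the component ordering $y = (p, q)$, the step size `h = 0.1`, and the helper names `run_multistep`, `pendulum` and `energy` are our own choices and are not taken from the text or from the figure captions.

```python
import math


def pendulum(y):
    """Pendulum vector field for y = (p, q) with H = p**2/2 - cos(q)."""
    p, q = y
    return (-math.sin(q), p)


def energy(y):
    """Hamiltonian of the pendulum, used to monitor the numerical solution."""
    p, q = y
    return 0.5 * p * p - math.cos(q)


def run_multistep(alpha, beta, start, h, n_steps, f=pendulum):
    """Explicit k-step method sum_j alpha[j]*y_{n+j} = h*sum_j beta[j]*f_{n+j}.

    beta[k] = 0 is assumed (explicit method); `start` holds y_0, ..., y_{k-1}.
    """
    k = len(alpha) - 1
    ys = [tuple(y) for y in start]
    fs = [f(y) for y in ys]
    for n in range(n_steps):
        new = []
        for c in range(2):
            s = sum(h * beta[j] * fs[n + j][c] - alpha[j] * ys[n + j][c]
                    for j in range(k))
            new.append(s / alpha[k])
        ys.append(tuple(new))
        fs.append(f(ys[-1]))
    return ys


start = [(1.9, 0.4), (1.7, 0.2), (2.1, 0.0)]
sol_313 = run_multistep([-1, 1, -1, 1], [0, 1, 1, 0], start, 0.1, 400)
sol_314 = run_multistep([-1, 0, 1], [0, 2, 0], start[:2], 0.1, 400)

# Energy deviation along the subsequences that are drawn in grey in the figures.
dev_313 = [energy(y) - energy(sol_313[3]) for y in sol_313[3::4]]
dev_314 = [energy(y) - energy(sol_314[2]) for y in sol_314[2::2]]
```

Plotting the subsequences `sol_313[j::4]` for `j = 0, 1, 2, 3` and `sol_314[j::2]` for `j = 0, 1` is one way to look for the four and two curves described in the examples. For a vanishing vector field the recursion (3.13) is exactly 4-periodic, which is the simplest instance of this pattern.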

To be able to explain this behaviour of the multistep solutions, we complement
the analysis of the modified equation for smooth numerical solutions with so-called
parasitic modified equations. This theory has been developed by Hairer (1999) for
first order differential equations, and extended to second order systems by Hairer &
Lubich (2004).

Consider a stable, symmetric multistep method (1.1) and denote the zeros of its
characteristic polynomial $\rho(\zeta)$ by $\zeta_1 = 1$ (principal root) and $\zeta_2, \ldots, \zeta_k$ (parasitic
roots). We then enumerate the set of all finite products,

$$\{\zeta_\ell\}_{\ell \in \mathcal{I}} = \{\zeta = \zeta_1^{m_1} \cdot \ldots \cdot \zeta_k^{m_k}\,;\ m_j \ge 0\}. \tag{3.15}$$

It is $\{1, i, -i, -1\}$ for method (3.13) and $\{1, -1\}$ for the explicit midpoint rule
(3.14). The set of subscripts $\mathcal{I}$ can be finite or infinite. We let $\mathcal{I}^* = \mathcal{I} \setminus \{1\}$, and
we denote by $\mathcal{I}_N$ and $\mathcal{I}_N^*$ the finite subsets of elements which, in the representation
(3.15), have $\sum_j m_j < N$.
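
The two sets quoted after (3.15) can be reproduced by forming products of the roots. The helper `product_closure` below is a hypothetical name introduced for this illustration; it collects all products of at most `max_factors` roots and identifies values that agree after rounding, which is an assumption that is adequate for roots on the unit circle such as these.

```python
def product_closure(roots, max_factors, digits=12):
    """All products of at most `max_factors` of the given roots.

    Values are stored as rounded (real, imag) pairs so that numerically
    equal products are counted once.
    """
    def key(z):
        z = complex(z)
        # adding 0.0 turns a possible -0.0 into 0.0
        return (round(z.real, digits) + 0.0, round(z.imag, digits) + 0.0)

    found = {key(1)}
    frontier = {key(1)}
    for _ in range(max_factors):
        frontier = {key(complex(*p) * r) for p in frontier for r in roots} - found
        found |= frontier
    return sorted(found)


print(product_closure([1, 1j, -1j], 4))   # zeros of rho for method (3.13)
print(product_closure([1, -1], 4))        # zeros of rho for the midpoint rule
```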

Motivated by the previous examples and by representations of the asymptotic
expansion of the global error of weakly stable multistep methods (see for example
Sect. III.9 of Hairer, Nørsett & Wanner, 1993), we aim at writing the general solution
$y_n$ of the multistep method (1.1) in the form

$$y_n = y(nh) + \sum_{\ell \in \mathcal{I}^*} \zeta_\ell^n\, z_\ell(nh), \tag{3.16}$$

where $y(t)$ and $z_\ell(t)$ are smooth functions (with derivatives bounded independently
of $h$). The following result extends Theorem 3.1.

Theorem 3.5. Consider a stable, consistent, and symmetric multistep method (1.1).
For every truncation index $N \ge 2$, there then exist $h$-independent functions
$f_{\ell,j}(y, z^*)$ with $z^* = (z_\ell)_{\ell=2}^{k}$ such that for every solution of

$$\begin{aligned}
\dot y &= f_{1,1}(y,z^*) + h f_{1,2}(y,z^*) + \ldots + h^{N-1} f_{1,N}(y,z^*) \\
\dot z_\ell &= f_{\ell,1}(y,z^*) + h f_{\ell,2}(y,z^*) + \ldots + h^{N-1} f_{\ell,N}(y,z^*) && \text{for } 2 \le \ell \le k \\
z_\ell &= h f_{\ell,2}(y,z^*) + \ldots + h^{N} f_{\ell,N+1}(y,z^*) && \text{for } \ell > k \\
z_\ell &= 0 && \text{for } \ell \notin \mathcal{I}_N
\end{aligned} \tag{3.17}$$

with initial values $z_\ell(0) = O(h)$ for $2 \le \ell \le k$, the function

$$x(t) = y(t) + \sum_{\ell \in \mathcal{I}^*} \zeta_\ell^{t/h}\, z_\ell(t) \tag{3.18}$$

satisfies

$$\sum_{j=0}^{k} \alpha_j\, x(t+jh) = h \sum_{j=0}^{k} \beta_j\, f\bigl(x(t+jh)\bigr) + O(h^{N+1}). \tag{3.19}$$

For $z^* = 0$ the differential equation for $y$ is the same as that of Theorem 3.1. The
solutions of (3.17) satisfy $z_\ell(t) = \overline{z_j(t)}$ whenever $\zeta_\ell = \overline{\zeta_j}$, and this relation holds
for the initial values. Moreover, $z_\ell(t) = O(h^{m+1})$ on bounded time intervals if $\zeta_\ell$
is a product of no fewer than $m \ge 2$ roots of $\rho(\zeta)$.

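Proof. We let $z_1(t) := y(t)$ and insert the finite sum (3.18) into (3.19). This yields

$$\sum_{j=0}^{k} \alpha_j\, x(t+jh) = \sum_{j=0}^{k} \alpha_j \sum_{\ell \in \mathcal{I}} \zeta_\ell^{(t+jh)/h}\, e^{jhD} z_\ell(t)
= \sum_{\ell \in \mathcal{I}} \zeta_\ell^{t/h} \sum_{j=0}^{k} \alpha_j \zeta_\ell^{j}\, e^{jhD} z_\ell(t)
= \sum_{\ell \in \mathcal{I}} \zeta_\ell^{t/h}\, \rho(\zeta_\ell e^{hD})\, z_\ell(t),$$

where, as usual, $D$ represents differentiation with respect to time. We then expand
$f(x(t))$ into a Taylor series around $y(t)$,

$$\begin{aligned}
f(x(t)) &= \sum_{m \ge 0} \frac{1}{m!}\, f^{(m)}(y(t)) \Bigl( \sum_{\ell \in \mathcal{I}^*} \zeta_\ell^{t/h} z_\ell(t), \ldots, \sum_{\ell \in \mathcal{I}^*} \zeta_\ell^{t/h} z_\ell(t) \Bigr) \\
&= \sum_{\ell \in \mathcal{I}} \zeta_\ell^{t/h} \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y(t)) \bigl( z_{\ell_1}(t), \ldots, z_{\ell_m}(t) \bigr).
\end{aligned}$$

This gives, as above,

$$\sum_{j=0}^{k} \beta_j\, f\bigl(x(t+jh)\bigr) = \sum_{\ell \in \mathcal{I}} \zeta_\ell^{t/h}\, \sigma(\zeta_\ell e^{hD}) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr). \tag{3.20}$$

Comparing coefficients of $\zeta_\ell^{t/h}$ for $\ell \in \mathcal{I}_N$ in (3.19) thus yields

$$\rho(\zeta_\ell e^{hD})\, z_\ell = h\, \sigma(\zeta_\ell e^{hD}) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr) \tag{3.21}$$

(for $\ell = 1$ and $m = 0$ the sum is understood to include the term $f(y)$). With the
expansion $x\sigma(\zeta_\ell e^x)/\rho(\zeta_\ell e^x) = \mu_{\ell,0} + \mu_{\ell,1} x + \mu_{\ell,2} x^2 + \ldots$ for $1 \le \ell \le k$, where
$\zeta_\ell$ is a simple root of $\rho(\zeta)$, this equation becomes

$$\dot z_\ell = (\mu_{\ell,0} + \mu_{\ell,1} hD + \ldots) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr), \tag{3.22}$$

and with $\sigma(\zeta_\ell e^x)/\rho(\zeta_\ell e^x) = \mu_{\ell,0} + \mu_{\ell,1} x + \mu_{\ell,2} x^2 + \ldots$ for $\ell > k$, where $\rho(\zeta_\ell) \ne 0$,

$$z_\ell = h\,(\mu_{\ell,0} + \mu_{\ell,1} hD + \ldots) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr). \tag{3.23}$$

In the usual way (elimination of the first and higher derivatives by the differential
equations and by the differentiated third relation of (3.17)) this allows us to define
recursively the functions $f_{\ell,j}(y, z^*)$.

From this construction process it follows that on bounded time intervals we have
$z_\ell(t) = O(h)$ for all $\ell \ge 2$, and $z_\ell(t) = O(h^{m+1})$ if $\zeta_\ell$ is a product of no fewer
than $m \ge 2$ roots of $\rho(\zeta)$. In (3.20) and in the above construction of the coefficient
functions $f_{\ell,j}(y, z^*)$ we have neglected terms that contain at least $N$ factors $z_j$. This
gives rise to the $O(h^{N+1})$ term in (3.19). □

Initial values $y(0)$, $z_\ell(0)$, $\ell = 2, \ldots, k$, for the system (3.17) are obtained from
the starting approximations $y_0, \ldots, y_{k-1}$ via the relation

$$y_j = y(jh) + \sum_{\ell \in \mathcal{I}^*} \zeta_\ell^{j}\, z_\ell(jh), \qquad j = 0, 1, \ldots, k-1. \tag{3.24}$$

For $h = 0$ this represents a linear Vandermonde system for $y(0)$, $z_\ell(0)$. The
Implicit Function Theorem thus proves the local existence of a solution of (3.24) for
sufficiently small step sizes $h$. If $y_j$, $j = 0, \ldots, k-1$, approximate a solution $y_{ex}(t)$ of
$\dot y = f(y)$ with an error $O(h^s)$ (with $s \le r + 1$, where $r$ is the order of the method),
then $y(0) - y_{ex}(0) = O(h^s)$ and $z_\ell(0) = O(h^s)$ for $\ell = 2, \ldots, k$.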
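
The Vandermonde system obtained from (3.24) for $h = 0$ can be made concrete for method (3.13), whose roots are $1, i, -i$. The sketch below is an illustration only: it treats scalar data (the first components of the starting values quoted in the caption of Fig. 3.1), and the helper names `solve_linear` and `initial_values_h0` are hypothetical.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(b)
    m = [list(map(complex, row)) + [complex(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            fac = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= fac * m[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def initial_values_h0(roots, start):
    """Solve y_j = sum_l roots[l]**j * w_l for j = 0, ..., k-1.

    w_0 plays the role of y(0) (root 1), the remaining w_l that of z_l(0).
    """
    k = len(roots)
    vandermonde = [[complex(z) ** j for z in roots] for j in range(k)]
    return solve_linear(vandermonde, start)


roots = [1, 1j, -1j]           # zeros of rho for method (3.13)
start = [1.9, 1.7, 2.1]        # first components of y_0, y_1, y_2
y0, z2, z3 = initial_values_h0(roots, start)
print(y0, z2, z3)
```

For real starting values the two parasitic components come out as complex conjugates of each other, in agreement with the conjugation property stated in Theorem 3.5.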

The representation (3.16) of the numerical solution and the (principal and
parasitic) modified equations (3.17) will be the main ingredients for the study of
long-term stability of multistep methods in Sect. XV.5. An extension of the previous
theorem to partitioned multistep methods is more or less straightforward. We leave the
details as an exercise for the reader.

Multistep Methods for Second Order Differential Equations. A completely
analogous result can be proved for stable, symmetric multistep methods (1.8)
applied to $\ddot y = f(y)$. We again denote the zeros of $\rho(\zeta)$ by $\zeta_1 = 1$ and $\zeta_\ell$, $\ell = 2, \ldots, q$.
Notice, however, that $\zeta_1 = 1$ is always a double zero, and the others can be simple
or double zeros of $\rho(\zeta)$, and that $q < k$. We consider the index sets $\mathcal{I}$, $\mathcal{I}^*$, $\mathcal{I}_N$, and
$\mathcal{I}_N^*$ as in (3.15).

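Theorem 3.6. Consider a stable, consistent, and symmetric multistep method (1.8).
For every truncation index $N \ge 2$, there then exist $h$-independent functions
$f_{\ell,j}(y, \dot y, z^*)$ (where $z^*$ denotes the vector collecting as elements $z_\ell, \dot z_\ell$ if $\zeta_\ell$ is a
double root, and $z_\ell$ if $\zeta_\ell$ is a simple root of $\rho(\zeta)$) such that for every solution of

$$\begin{aligned}
\ddot y &= f_{1,1}(y,\dot y,z^*) + h f_{1,2}(y,\dot y,z^*) + \ldots + h^{N-1} f_{1,N}(y,\dot y,z^*) \\
\ddot z_\ell &= f_{\ell,1}(y,\dot y,z^*) + \ldots + h^{N-1} f_{\ell,N}(y,\dot y,z^*) && \text{if } \rho(\zeta_\ell) = \rho'(\zeta_\ell) = 0 \\
\dot z_\ell &= h f_{\ell,2}(y,\dot y,z^*) + \ldots + h^{N} f_{\ell,N+1}(y,\dot y,z^*) && \text{if } \rho(\zeta_\ell) = 0,\ \rho'(\zeta_\ell) \ne 0 \\
z_\ell &= h^2 f_{\ell,3}(y,\dot y,z^*) + \ldots + h^{N+1} f_{\ell,N+2}(y,\dot y,z^*) && \text{if } \rho(\zeta_\ell) \ne 0 \\
z_\ell &= 0 && \text{for } \ell \notin \mathcal{I}_N
\end{aligned} \tag{3.25}$$

with initial values $z_\ell(0) = O(h)$ for $2 \le \ell \le q$, the function

$$x(t) = y(t) + \sum_{\ell \in \mathcal{I}^*} \zeta_\ell^{t/h}\, z_\ell(t) \tag{3.26}$$

satisfies

$$\sum_{j=0}^{k} \alpha_j\, x(t+jh) = h^2 \sum_{j=0}^{k} \beta_j\, f\bigl(x(t+jh)\bigr) + O(h^{N+2}). \tag{3.27}$$

For $z^* = 0$ the differential equation for $y$ is the same as in (3.9). The solutions of
(3.25) satisfy $z_\ell(t) = \overline{z_j(t)}$ whenever $\zeta_\ell = \overline{\zeta_j}$, and this relation holds for the initial
values. Moreover, $z_\ell(t) = O(h^{m+2})$ on bounded time intervals if $\zeta_\ell$ is a product of
no fewer than $m \ge 2$ roots of $\rho(\zeta)$.

Proof. In complete analogy to the proof of Theorem 3.5 we obtain

$$\rho(\zeta_\ell e^{hD})\, z_\ell = h^2 \sigma(\zeta_\ell e^{hD}) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr) \tag{3.28}$$

which differs from (3.21) only in the factor $h^2$. Depending on whether $\zeta_\ell$ is
a double, a simple, or not a zero of $\rho(\zeta)$, we expand $x^2\sigma(\zeta_\ell e^x)/\rho(\zeta_\ell e^x)$ or
$x\sigma(\zeta_\ell e^x)/\rho(\zeta_\ell e^x)$ or $\sigma(\zeta_\ell e^x)/\rho(\zeta_\ell e^x)$ into a series of powers of $x$, and we denote
its coefficients by $\mu_{\ell,j}$. This then yields

$$\ddot z_\ell = (\mu_{\ell,0} + \mu_{\ell,1} hD + \ldots) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr), \tag{3.29}$$

if $\rho(\zeta_\ell) = \rho'(\zeta_\ell) = 0$, but $\rho''(\zeta_\ell) \ne 0$ (in particular for $\ell = 1$ and $\zeta_1 = 1$),

$$\dot z_\ell = h\,(\mu_{\ell,0} + \mu_{\ell,1} hD + \ldots) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr), \tag{3.30}$$

if $\rho(\zeta_\ell) = 0$, but $\rho'(\zeta_\ell) \ne 0$, and

$$z_\ell = h^2 (\mu_{\ell,0} + \mu_{\ell,1} hD + \ldots) \sum_{m \ge 0} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = \zeta_\ell} f^{(m)}(y) \bigl( z_{\ell_1}, \ldots, z_{\ell_m} \bigr), \tag{3.31}$$

if $\rho(\zeta_\ell) \ne 0$. The rest of the proof is identical to that of Theorem 3.5. □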

For the system of modified equations (3.25) we need initial values $y(0)$, $\dot y(0)$,
$z_\ell(0)$, $\dot z_\ell(0)$ if $\zeta_\ell$ is a double root of $\rho(\zeta)$, and $z_\ell(0)$ if $\zeta_\ell$ is a simple root. These
initial values can be obtained from the starting approximations $y_0, \ldots, y_{k-1}$ via the
relation (3.24).

Lemma 3.7. Consider a stable, symmetric multistep method (1.8) of order $r$, and
let the starting approximations $y_0, \ldots, y_{k-1}$ satisfy $y_j - y_{ex}(jh) = O(h^s)$ with
$2 \le s \le r + 2$. Then there exist (locally) unique initial values for the system (3.25)
such that its solution exactly satisfies (3.24).

These initial values satisfy $z_\ell(0) = \overline{z_j(0)}$ if $\zeta_\ell = \overline{\zeta_j}$, and

$$\begin{aligned}
&y(0) - y_{ex}(0) = O(h^s), \qquad h\dot y(0) - h\dot y_{ex}(0) = O(h^s), \\
&z_\ell(0) = O(h^s), \qquad h\dot z_\ell(0) = O(h^s) \qquad \text{if } \zeta_\ell \text{ is a double root}, \\
&z_\ell(0) = O(h^s) \qquad \text{if } \zeta_\ell \text{ is a simple root}.
\end{aligned} \tag{3.32}$$

Proof. We scale the derivatives by $h$, and consider $y(0)$, $h\dot y(0)$, $z_\ell(0)$ and $h\dot z_\ell(0)$ as
unknowns in the system (3.24), where $y(t)$ and $z_\ell(t)$ are a solution of (3.25). For
$h = 0$ a linear, confluent Vandermonde system is obtained. Since its matrix is
invertible, the Implicit Function Theorem proves the statement. □

XV.4 Can Multistep Methods be Symplectic?

Readers might be astonished to find a question mark in the title. The reason is that
we shall present two definitions of symplecticity of multistep methods applied to a
Hamiltonian system

$$\dot p = -H_q(p,q), \qquad \dot q = H_p(p,q). \tag{4.1}$$

One works in the phase space of the exact flow, the other in a higher dimensional
space. But which one is suitable? We further show that certain multistep methods
can preserve energy over long times, even if they are not symplectic.


XV.4.1 Non-Symplecticity of the Underlying One-Step Method 

A conjecture due to Feng Kang. (Y.-F. Tang 1993) 

A natural definition of symplecticity consists of the requirement that the underlying
one-step method (Theorem 2.4) be symplectic. This means that the (truncated)
modified equation (3.1) is Hamiltonian. Unfortunately, we have the following negative
result.

Theorem 4.1 (Tang 1993). The underlying one-step method of a consistent linear
multistep method (1.1) cannot be symplectic.

Proof. We show that the first perturbation term in the modified equation (3.1) is in
general not Hamiltonian. From (3.4) we know that $f_{r+1}(y) = \mu_r (D_1^r f)(y)$ which
(omitting the non-zero error constant $\mu_r$) is given by

$$\sum_{\tau \in T,\, |\tau| = r+1} \alpha(\tau) F(\tau)(y) = (r+1)! \sum_{\tau \in T,\, |\tau| = r+1} \frac{1}{\sigma(\tau)}\, b(\tau) F(\tau)(y) \tag{4.2}$$

with $b(\tau) = 1/\gamma(\tau)$ for $|\tau| = r + 1$ (Theorem III.1.3 and (III.1.27)). Suppose
now that (4.2) is Hamiltonian for all separable Hamiltonian vector fields $f(y) =
J^{-1}\nabla H(y)$. Theorem IX.10.4 then implies

$$b(u \circ v) + b(v \circ u) = 0 \quad \text{for all } u, v \in T \text{ with } |u| + |v| = r + 1.$$

This, however, is in contradiction with

$$\frac{1}{\gamma(u \circ v)} + \frac{1}{\gamma(v \circ u)} = \frac{1}{\gamma(u)} \cdot \frac{1}{\gamma(v)},$$

which is a consequence of Theorem VI.7.6 (because the exact solution is a
symplectic transformation and, as a B-series, has coefficients $a(\tau) = 1/\gamma(\tau)$). □

A similar negative result holds for a much larger class of integration methods.
For example, it is proved by Hairer & Leone (1998) that, among the class of
one-leg methods (see (4.7) below), only the implicit mid-point rule is symplectic (in the
sense that the underlying one-step method is symplectic).

Partitioned Linear Multistep Methods. We know at least one symplectic method
of the form (1.14). It is the symplectic Euler method (VI.3.1), which combines the
implicit and the explicit Euler methods. However, nothing better exists within the
class of partitioned multistep methods, as is shown in the next theorem.

Theorem 4.2. If the underlying one-step method of a consistent, partitioned linear
multistep method (1.14) is symplectic for all separable Hamiltonian systems, then
its order satisfies $r \le 1$.

Proof. Suppose that the order of the method is $r \ge 2$. By (3.8), the dominant
perturbation term in the modified differential equation is $\mu_r^{(A)} h^r (D_1^r f)(y,z)$ for the
$y$-component and $\mu_r^{(B)} h^r (D_1^r g)(y,z)$ for the $z$-component (at least one of the
coefficients $\mu_r^{(A)}$ and $\mu_r^{(B)}$ is non-zero). This is a P-series with coefficients
$b(\tau) = \mu_r^{(A)}/\gamma(\tau)$ if $\tau \in TP_p$, $|\tau| = r + 1$ and $b(\tau) = \mu_r^{(B)}/\gamma(\tau)$ if $\tau \in TP_q$, $|\tau| = r + 1$.
If the underlying one-step method is symplectic (i.e., the modified differential
equation is locally Hamiltonian), Theorem IX.10.4 implies that

$$b(u \circ v) + b(v \circ u) = 0 \quad \text{for } u \in TP_p,\ v \in TP_q,\ |u| + |v| = r + 1. \tag{4.3}$$

Taking for $u \in TP_p$ the tree with one vertex, and for $v \in TP_q$ an arbitrary tree with
$|v| = r$, condition (4.3) gives the first relation of

$$\frac{\mu_r^{(A)}}{(r+1)\gamma(v)} + \frac{\mu_r^{(B)}\, r}{(r+1)\gamma(v)} = 0, \qquad
\frac{\mu_r^{(B)}}{(r+1)\gamma(v)} + \frac{\mu_r^{(A)}\, r}{(r+1)\gamma(v)} = 0.$$

Exchanging the colour of the vertices gives the second relation. Together the two
relations force $\mu_r^{(A)} = \mu_r^{(B)} = 0$ unless $r = 1$, which contradicts our
assumption $r \ge 2$. □


If we restrict our considerations to Hamiltonian systems with

$$H(p,q) = \tfrac12\, p^T C p + c^T p + U(q), \tag{4.4}$$

where the kinetic energy is at most quadratic in $p$, we can find symplectic,
partitioned multistep methods of order two. Indeed, the combination of the trapezoidal
rule with the explicit midpoint rule

$$p_{n+1} - p_n = -\frac{h}{2}\bigl(\nabla U(q_{n+1}) + \nabla U(q_n)\bigr), \qquad q_{n+1} - q_{n-1} = 2h\,(C p_n + c) \tag{4.5}$$

has the Störmer-Verlet method as underlying one-step method. This is seen as
follows: since the Hamiltonian is separable, formula (VI.3.4) yields the first formula
of (4.5). The second relation is a consequence of $q_{n+1} = q_n + h(C p_{n+1/2} + c)$ and
$p_{n+1/2} + p_{n-1/2} = 2p_n$, and uses the linearity of $H_p(p,q)$.
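
The claim that a Störmer-Verlet trajectory satisfies both relations of (4.5) can be checked numerically. The sketch below is an illustration under our own assumptions: a scalar problem with $C = 1$, $c = 0.3$ and $U(q) = -\cos q$, and the hypothetical helper names `verlet_step` and `residuals_4_5`.

```python
import math


def verlet_step(p, q, h, C=1.0, c=0.3):
    """One Stormer-Verlet step for H = C*p**2/2 + c*p + U(q), U(q) = -cos(q)."""
    p_half = p - 0.5 * h * math.sin(q)          # grad U(q) = sin(q)
    q_new = q + h * (C * p_half + c)
    p_new = p_half - 0.5 * h * math.sin(q_new)
    return p_new, q_new


def residuals_4_5(h=0.1, n_steps=50, p0=0.4, q0=1.9, C=1.0, c=0.3):
    """Largest defects of a Verlet trajectory in the two relations of (4.5)."""
    ps, qs = [p0], [q0]
    for _ in range(n_steps):
        p, q = verlet_step(ps[-1], qs[-1], h, C, c)
        ps.append(p)
        qs.append(q)
    # trapezoidal rule for p
    r_p = max(abs(ps[n + 1] - ps[n]
                  + 0.5 * h * (math.sin(qs[n + 1]) + math.sin(qs[n])))
              for n in range(n_steps))
    # explicit midpoint rule for q
    r_q = max(abs(qs[n + 1] - qs[n - 1] - 2 * h * (C * ps[n] + c))
              for n in range(1, n_steps))
    return r_p, r_q


r_p, r_q = residuals_4_5()
print(r_p, r_q)
```

Both defects are at the level of rounding errors, which is what the argument via $p_{n+1/2} + p_{n-1/2} = 2p_n$ predicts.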

Also for this special class of Hamiltonian systems we cannot achieve high order 
and symplecticity at the same time. 

Theorem 4.3. If the underlying one-step method of a consistent, partitioned linear
multistep method (1.14) is symplectic for all Hamiltonian systems with Hamiltonian
of the form (4.4), then its order satisfies $r \le 2$.

Proof. The beginning is the same as that for Theorem 4.2. We let $r \ge 2$ be the order
of the method (A) so that $\mu_r^{(A)} \ne 0$. Instead of (4.3) we now have to use

$$b(u \circ\circ\, v) - b(v \circ\circ\, u) = 0 \quad \text{for } u, v \in TN_p,\ |u| + |v| = r, \tag{4.6}$$

which also follows from Theorem IX.10.4. Taking for $u \in TN_p$ the tree with one
vertex, and for $v \in TN_p$ an arbitrary tree with $|v| = r - 1$, condition (4.6) gives the
relation

$$\frac{\mu_r^{(A)}\,(r-1)}{2\,(r+1)\,\gamma(v)} - \frac{\mu_r^{(A)}}{r\,(r+1)\,\gamma(v)} = 0,$$

which is contradictory for $r > 2$, because $\mu_r^{(A)} \ne 0$. □

Remark 4.4. We believe that the statement of Theorem 4.3 remains true if we
restrict our consideration to Hamiltonian functions (4.4) with $c = 0$ and invertible
matrix $C$. Since multistep methods (1.8) for second order differential equations can
be converted into partitioned multistep methods, this then implies that methods (1.8)
cannot be symplectic unless the order satisfies $r \le 2$.

XV.4.2 Symplecticity in the Higher-Dimensional Phase Space 

We present here a second approach for the definition of symplecticity of multistep 
methods (more precisely, of one-leg methods). It is much inspired by the G-stability 
theory of Dahlquist (1975) for the study of stiff differential equations. 

To simplify the nonlinear stability theory of linear multistep methods (1.1),
Dahlquist (1975) introduced the so-called one-leg methods, which are defined by
the relation

$$\sum_{j=0}^{k} \alpha_j\, y_{n+j} = h f\Bigl( \sum_{j=0}^{k} \beta_j\, y_{n+j} \Bigr), \tag{4.7}$$

where the normalization $\sigma(1) = \sum_j \beta_j = 1$ is assumed. In fact, there is a close
relationship between the numerical solution of (4.7) and (1.1), and their long-time
behaviour is the same (see Sect. V.6 of Hairer & Wanner, 1996). In the following
we consider the super-vectors $Y_n = (y_{n+k-1}, \ldots, y_n)^T$ collecting $k$ consecutive
approximations of the solution.

Definition 4.5. Let $G$ be an invertible symmetric matrix of dimension $k$. A $k$-step
multistep or one-leg method is called G-symplectic if

$$Y_{n+1}^T (G \otimes S)\, Y_{n+1} = Y_n^T (G \otimes S)\, Y_n, \tag{4.8}$$

whenever the differential equation $\dot y = f(y)$ has $y^T S y$ as invariant (with symmetric
$S$), i.e., the vector field satisfies $y^T S f(y) = 0$ for all $y$.

It is of course also possible to express this definition in terms of differential
forms. As a consequence of Lemma VI.4.1 the conservation of quadratic first
integrals is equivalent to symplecticity (Bochev & Scovel 1994).

In contrast to the negative results of Sect. XV.4.1, there exist many
G-symplectic methods. We have the following result.

Theorem 4.6 (Eirola & Sanz-Serna 1992). Every irreducible symmetric one-leg
method (4.7) is G-symplectic for some matrix $G$.

Proof. We recall that a one-leg method is irreducible if the generating polynomials
$\rho(\zeta)$ and $\sigma(\zeta)$ have no common zeros.

Construction of G. The symmetry relation (1.3) implies $\rho(1/\zeta) = -\zeta^{-k}\rho(\zeta)$
and $\sigma(1/\zeta) = \zeta^{-k}\sigma(\zeta)$. Consequently, the polynomial $\rho(\zeta)\sigma(\omega) + \rho(\omega)\sigma(\zeta)$
vanishes for $\omega = 1/\zeta$, and contains the factor $\zeta\omega - 1$. We then define $G$ by

$$\frac12 \bigl( \rho(\zeta)\sigma(\omega) + \rho(\omega)\sigma(\zeta) \bigr) = (\zeta\omega - 1) \sum_{i,j=1}^{k} g_{ij}\, \zeta^{i-1} \omega^{j-1}. \tag{4.9}$$

The matrix $G$ obtained in this way is symmetric.

Regularity of G. Applying the geometric series we get

$$\sum_{i,j=1}^{k} g_{ij}\, \zeta^{i-1} \omega^{j-1} = -\frac12 \bigl( \rho(\zeta)\sigma(\omega) + \rho(\omega)\sigma(\zeta) \bigr) \bigl( 1 + \zeta\omega + \zeta^2\omega^2 + \ldots \bigr),$$

where the identity holds as formal power series. Suppose that the matrix $G$ is not
invertible. Then there exists a nonzero vector $u = (u_0, u_1, \ldots, u_{k-1})^T$ such that $Gu = 0$.
We formally replace the appearances of $\omega^j$ with $u_j$ for $j < k$ and with
zero for $j \ge k$. This gives an identity of the form $0 = \rho(\zeta)a(\zeta) + \sigma(\zeta)b(\zeta)$ with
polynomials $a(\zeta)$ and $b(\zeta)$ of degree at most $k - 1$, and we get a contradiction with
the irreducibility of the method.

G-Symplecticity. We next replace in (4.9) $\zeta^i\omega^j$ with $y_{n+i}^T S\, y_{n+j}$. Together with
(4.7) this yields

$$h \Bigl( \sum_{i=0}^{k} \beta_i\, y_{n+i} \Bigr)^T S f\Bigl( \sum_{i=0}^{k} \beta_i\, y_{n+i} \Bigr) = Y_{n+1}^T (G \otimes S)\, Y_{n+1} - Y_n^T (G \otimes S)\, Y_n,$$

where $Y_n = (y_n, \ldots, y_{n+k-1})^T$. This proves (4.8) for all functions $f(y)$ satisfying
$y^T S f(y) = 0$. □
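
The construction of $G$ in the proof amounts to an exact polynomial division by $\zeta\omega - 1$. The following sketch carries it out with rational arithmetic. It is our own illustration: the function name `g_matrix` is hypothetical, the coefficient lists follow $\rho(\zeta) = \sum_j \alpha_j\zeta^j$, $\sigma(\zeta) = \sum_j \beta_j\zeta^j$, and both lists are rescaled so that $\sigma(1) = 1$ as assumed in (4.7).

```python
from fractions import Fraction


def g_matrix(alpha, beta):
    """Matrix G of (4.9) for a symmetric k-step method.

    alpha and beta are rescaled by sigma(1), which leaves the method unchanged.
    """
    k = len(alpha) - 1
    s = sum(Fraction(b) for b in beta)
    a = [Fraction(x) / s for x in alpha]
    b = [Fraction(x) / s for x in beta]
    # c[i][j]: coefficient of zeta**i * omega**j on the left-hand side of (4.9)
    c = [[(a[i] * b[j] + a[j] * b[i]) / 2 for j in range(k + 1)]
         for i in range(k + 1)]
    # (zeta*omega - 1) * sum_ij q[i][j] zeta**i omega**j has the coefficients
    # q[i-1][j-1] - q[i][j]; solve this recursion for q.
    q = [[Fraction(0)] * (k + 1) for _ in range(k + 1)]
    for i in range(k + 1):
        for j in range(k + 1):
            prev = q[i - 1][j - 1] if i > 0 and j > 0 else Fraction(0)
            q[i][j] = prev - c[i][j]
    # The division is exact only if no power k survives in the quotient.
    if any(q[i][k] != 0 or q[k][i] != 0 for i in range(k + 1)):
        raise ValueError("left-hand side is not divisible by zeta*omega - 1")
    return [row[:k] for row in q[:k]]


G_midpoint = g_matrix([-1, 0, 1], [0, 2, 0])        # explicit midpoint rule
G_3step = g_matrix([-1, 1, -1, 1], [0, 1, 1, 0])    # method (3.13)
print(G_midpoint)
print(G_3step)
```

Since (4.8) is homogeneous in $G$, only the pattern of the entries matters; a common positive factor can be dropped.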


Example 4.7. We consider the explicit midpoint rule (1.6), which is also a one-leg
method, and the 3-step method (3.13). By Theorem 4.6 the one-leg versions are
G-symplectic. Following the constructive proof of this theorem we find, up to
positive scalar factors (which do not affect (4.8)),

$$G = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \text{and} \qquad G = \begin{pmatrix} 0 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & 0 \end{pmatrix},$$

respectively. We apply both methods to two closely related Hamiltonian systems,
namely the pendulum equation with $H(p,q) = p^2/2 - \cos q$ and a perturbed
problem with $H(p,q) = p^2/2 - \cos q\,(1 - p/6)$, and we study the preservation of the
Hamiltonian (see Fig. 4.1). The result is somewhat surprising. The midpoint rule
behaves well for the perturbed problem, but shows a linear error growth in the
Hamiltonian for the pendulum problem. On the other hand, the weakly stable 3-step method
behaves well for the pendulum equation (which is in agreement with the stable
behaviour of Fig. 3.1), but has an exponential error growth for the perturbed problem.
Notice that different scales are used in the four pictures.

Fig. 4.1. Numerical Hamiltonian $H(p_n, q_n) - H(p_0, q_0)$ of the explicit mid-point rule and the
3-step method (3.13), applied with step size $h = 0.01$ to the pendulum problem ($H(p,q) =
p^2/2 - \cos q$) and to a perturbed problem ($H(p,q) = p^2/2 - \cos q\,(1 - p/6)$) on the interval
[0,1100] (only every 131st step is drawn)

The above example illustrates that G-symplecticity of a numerical method is
not sufficient for a good long-time behaviour. It is necessary to bring the
parasitic solution components under control.
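
The gap between G-symplecticity and good behaviour of the numerical solution itself can already be seen on a linear problem. The sketch below is our own illustration and not one of the experiments of Fig. 4.1: it applies the explicit midpoint rule to the harmonic oscillator $\dot y = Jy$, for which $y^T y$ is an invariant ($S = I$), with deliberately poor starting values. For $G$ proportional to the $2 \times 2$ exchange matrix, the quadratic form in (4.8) is a multiple of $y_{n+1}^T y_n$. The function names are hypothetical.

```python
def rotation_field(y):
    """f(y) = J y for the harmonic oscillator; satisfies y^T f(y) = 0."""
    return (-y[1], y[0])


def midpoint_drifts(h=0.05, n_steps=200):
    """Drift of y_{n+1}^T y_n and of y_n^T y_n along the explicit midpoint rule."""
    ys = [(1.0, 0.0), (0.9, 0.3)]        # y_1 is far from the exact solution
    for n in range(n_steps):
        fx, fy = rotation_field(ys[n + 1])
        ys.append((ys[n][0] + 2 * h * fx, ys[n][1] + 2 * h * fy))
    g_form = [ys[n + 1][0] * ys[n][0] + ys[n + 1][1] * ys[n][1]
              for n in range(n_steps + 1)]
    norm2 = [y[0] ** 2 + y[1] ** 2 for y in ys]
    drift_g = max(abs(g - g_form[0]) for g in g_form)
    drift_norm = max(abs(v - norm2[0]) for v in norm2)
    return drift_g, drift_norm


drift_g, drift_norm = midpoint_drifts()
print(drift_g, drift_norm)
```

The form $y_{n+1}^T y_n$ is preserved up to rounding errors, whereas the invariant $y_n^T y_n$ of the differential equation is visibly disturbed by the parasitic component introduced through the starting values.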


XV.4.3 Modified Hamiltonian of Multistep Methods 

After the negative results of Sect. XV.4.1, we are fortunately also able to prove
positive results concerning the near conservation of the Hamiltonian.

Theorem 4.8. For a symmetric, consistent linear multistep method (1.1) of order $r$
applied to $\dot y = J^{-1}\nabla H(y)$, there exists a series of the form

$$\widetilde H(y) = H(y) + h^r H_{r+1}(y) + h^{r+2} H_{r+3}(y) + \ldots, \tag{4.10}$$

which is a formal first integral of the modified equation (3.1) without truncation.

Proof. With $\rho(e^x)/(x\sigma(e^x)) = 1 + \gamma_r x^r + \gamma_{r+2} x^{r+2} + \ldots$ it follows from (3.3)
that the solution of the modified differential equation satisfies

$$(1 + \gamma_r h^r D^r + \gamma_{r+2} h^{r+2} D^{r+2} + \ldots)\,\dot y = J^{-1}\nabla H(y) + O(h^N), \tag{4.11}$$

where, due to the symmetry of the method, only odd derivatives of $y(t)$ appear. We
multiply both sides with $\dot y^T J$ so that the right-hand side becomes the total derivative
$\frac{d}{dt} H(y)$. On the left-hand side we note $\dot y^T J \dot y = 0$, $\dot y^T J y^{(3)} = \frac{d}{dt}\bigl(\dot y^T J \ddot y\bigr)$ and
similarly for higher derivatives

$$\dot y^T J y^{(2m+1)} = \frac{d}{dt} \Bigl( \dot y^T J y^{(2m)} - \ddot y^T J y^{(2m-1)} + \ldots \pm y^{(m)T} J y^{(m+1)} \Bigr). \tag{4.12}$$

We thus obtain a time derivative of an expression in which the appearing derivatives
can be substituted as functions of $y$ via the modified differential equation (3.1).
Altogether this yields

$$-\frac{d}{dt} \bigl( h^r H_{r+1}(y) + h^{r+2} H_{r+3}(y) + \ldots \bigr) = \frac{d}{dt} H(y) + O(h^N),$$

which proves the statement. □
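
As a worked illustration of this construction (our own computation, not part of the original proof), consider the lowest order $r = 2$ and the explicit midpoint rule (3.14) with $\rho(\zeta) = \zeta^2 - 1$, $\sigma(\zeta) = 2\zeta$. The conventions $y = (p, q)$ and $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ in the last line are assumptions made for the pendulum evaluation.

```latex
\begin{align*}
\frac{\rho(e^x)}{x\,\sigma(e^x)} &= \frac{\sinh x}{x}
   = 1 + \tfrac16 x^2 + \tfrac1{120} x^4 + \ldots
   \quad\Longrightarrow\quad \gamma_2 = \tfrac16 , \\
\dot y^T J \bigl( \dot y + \gamma_2 h^2 y^{(3)} \bigr)
   &= \gamma_2 h^2 \frac{d}{dt} \bigl( \dot y^T J \ddot y \bigr)
   = \frac{d}{dt} H(y) + O(h^4) , \\
\widetilde H(y) &= H(y) - \tfrac16 h^2\, f(y)^T J f'(y) f(y) + O(h^4) ,
   \qquad f = J^{-1} \nabla H , \\
H(p,q) = \tfrac12 p^2 - \cos q :\quad
H_3(p,q) &= -\tfrac16 \bigl( \sin^2 q + p^2 \cos q \bigr) .
\end{align*}
```

In the third line $\dot y$ and $\ddot y$ have been replaced by $f(y)$ and $f'(y)f(y)$, which is permitted up to terms of size $O(h^2)$ by the modified equation (3.1).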


The statement of the previous theorem is somewhat surprising. The underlying
one-step method, although not symplectic, nearly conserves the Hamiltonian for
general $H(y)$ (not even reversibility is required). This indicates that the condition
(IX.9.20) can be satisfied for all trees also by non-symplectic methods.

For partitioned multistep methods we do not know of a similar result unless
we restrict our consideration to Hamiltonians of the form (4.4). In this case we are
concerned with multistep methods for second order differential equations.


Theorem 4.9. For a symmetric, consistent linear multistep method (1.8) of order $r$
applied to $\ddot y = -\nabla U(y)$, there exists a series of the form

$$\widetilde H(y,\dot y) = \tfrac12\, \dot y^T \dot y + U(y) + h^r H_{r+1}(y,\dot y) + h^{r+2} H_{r+3}(y,\dot y) + \ldots, \tag{4.13}$$

which is a formal first integral of the modified equation (3.9) without truncation.

Proof. The proof is very similar to that of the previous theorem. We expand
$\rho(e^x)/(x^2\sigma(e^x)) = 1 + \gamma_r x^r + \gamma_{r+2} x^{r+2} + \ldots$, and similar to (3.10) we obtain

$$(1 + \gamma_r h^r D^r + \gamma_{r+2} h^{r+2} D^{r+2} + \ldots)\,\ddot y = -\nabla U(y) + O(h^N). \tag{4.14}$$

This time we multiply both sides with $\dot y^T$. The right-hand side becomes the total
derivative $-\frac{d}{dt} U(y)$, and for the left-hand side we use $\dot y^T \ddot y = \frac{d}{dt}\bigl(\tfrac12 \dot y^T \dot y\bigr)$ and for higher
even-order derivatives

$$\dot y^T y^{(2m)} = \frac{d}{dt} \Bigl( \dot y^T y^{(2m-1)} - \ddot y^T y^{(2m-2)} + \ldots \pm \tfrac12\, y^{(m)T} y^{(m)} \Bigr). \tag{4.15}$$

Integrating and substituting second and higher derivatives of $y$ via the modified
differential equation (3.9) yields the desired formal first integral close to the
Hamiltonian of the system. □

The formal first integral (4.13) does not depend on how approximations to the
derivative $v = \dot y$ are obtained. If the derivative at grid points is numerically
computed with the formula (3.11), then one can use the one-to-one correspondence
(3.12) to express the coefficient functions of the modified differential equation in
terms of $y$ and $v$.

XV.4.4 Modified Quadratic First Integrals 

Symplectic one-step methods exactly preserve quadratic first integrals (Sect. IV.2).
This is not true for the underlying one-step method of symmetric multistep methods.
However, as we shall prove in this section, it nearly preserves such first integrals.

Theorem 4.10. Let $Q(y) = y^T C y$ (with a symmetric matrix $C$) be a first integral
of $\dot y = f(y)$. For a symmetric, consistent linear multistep method (1.1) of order $r$,
there then exists a series of the form

$$\widetilde Q(y) = y^T C y + h^r Q_{r+1}(y) + h^{r+2} Q_{r+3}(y) + \ldots, \tag{4.16}$$

which is a formal first integral of the modified equation (3.1) without truncation.

Proof. We multiply (4.11), with $f(y)$ in place of $J^{-1}\nabla H(y)$, with $y^T C$ and thus obtain

$$y^T C\,(1 + \gamma_r h^r D^r + \gamma_{r+2} h^{r+2} D^{r+2} + \ldots)\,\dot y = y^T C f(y) + O(h^N).$$

Since $y^T C y$ is a first integral, the term on the right-hand side vanishes. For the terms
on the left-hand side we notice that $y^T C \dot y = \tfrac12 \frac{d}{dt}\bigl(y^T C y\bigr)$ and that

$$y^T C y^{(2m+1)} = \frac{d}{dt} \Bigl( y^T C y^{(2m)} - \dot y^T C y^{(2m-1)} + \ldots \pm \tfrac12\, y^{(m)T} C y^{(m)} \Bigr). \tag{4.17}$$

As in the proofs of Sect. XV.4.3 we now deduce the statement. □

A similar result holds for second order differential equations and methods (1.8). 
This concerns for example the total angular momentum in 7V-body systems. 

Theorem 4.11. Suppose thaty = f(y) has L(y , y) = y T Ey as first integral, i.e., E 
is skew-symmetric and y T Ef(y) = 0. For a symmetric, consistent linear multistep 
method (1.8) of order r, there then exists a series of the form 

L{y,y) = y T Ey + h r L r+1 (y,y) + h r+2 L r+ 3 (y,y) + , (4.18) 

which is a formal first integral of the modified equation (3.9) without truncation. 

Proof. Multiplying (4.14) with y T E gives 

y T E( 1 + 7 r h r D r + lr+2 h r+2 D r+2 + ...))/ = y T Ef(y) + 0(h N ). 

The term at the right vanishes. Since E is a skew-symmetric matrix, we have for the 
terms to the left that y T Ey = ^y T Ey and that 

y T Ey ( - 2m+2) = j (y T Ey i2m+1) - y T Dy {2m) + ... ± y^ m)T Ey {m+1) ). (4.19) 
This yields the statement as in the previous proofs. □

Remark 4.12. Noticing that the underlying one-step method of a symmetric multistep
method can be expressed as a formal B-series (cf. Sect. XV.2.2), it follows
from (4.17) that the modified first integral of Theorem 4.10 is of the form (VI.8.6).
By Theorem VI.8.5 the underlying one-step method is therefore conjugate to a
symplectic integrator.

A similar result holds for symmetric methods (1.8) complemented with a symmetric
derivative approximation (3.11). The variables $v$ and $\dot y$ are related via (3.12),
which has an expansion in even powers of $h$. Substituting $\dot y = \dot y(y, v)$ from this relation
into the modified first integral (4.18), we obtain an expression of the form (VI.8.11).
Here, the elementary differentials correspond to the system $\dot y = v$, $\dot v = f(y)$ ($v$ has
to be identified with $z$). Theorem VI.8.8 combined with Theorem 4.11 proves that
the underlying one-step method is conjugate to a symplectic integrator.


XV.5 Long-Term Stability 

The results of Sects. XV.4.3 and XV.4.4 imply the near conservation of the total 
energy and of the angular momentum in $N$-body problems for numerical solutions
of the underlying one-step method of multistep methods. This, however, is of no 
value as long as the parasitic solutions of the multistep method are not under control. 
The present section is devoted to the study of the stability of numerical solutions 
over long time intervals. 


XV.5.1 Role of Growth Parameters 


The analysis of this section is based on the representation

$$y_n = y(nh) + \sum_{\ell \in \mathcal I^*} \zeta_\ell^n\, z_\ell(nh) \qquad (5.1)$$

of the numerical solution of a multistep method (cf. formula (3.16)).

Linear Multistep Methods for First Order Equations. By Theorem 3.5 the parasitic
components $z_\ell$ (for $2 \le \ell \le k$) are the solution of a differential equation
which, by (3.22), is of the form

$$\dot z_\ell = \mu_\ell f'(y) z_\ell + \ldots . \qquad (5.2)$$

This is just the variational equation of $\dot y = f(y)$ scaled by

$$\mu_\ell = \frac{\sigma(\zeta_\ell)}{\zeta_\ell\, \rho'(\zeta_\ell)} , \qquad (5.3)$$

which is the so-called growth parameter as introduced by Dahlquist (1959) and
motivated there by a linear stability analysis (see Exercise 5).

We shall illustrate with the examples of Sect. XV.3.2 that the study of the truncated
equation (5.2) already gives considerable insight into the long-time behaviour of
multistep methods.

Fig. 5.1. First component of the solution of the pendulum equation (grey) together with the
Euclidean norm of the solution $v(t)$ of the scaled variational equation (5.4)


Example 5.1. For the pendulum equation, the truncated equation (5.2) is

$$\begin{pmatrix} \dot y_1 \\ \dot y_2 \end{pmatrix} = \begin{pmatrix} y_2 \\ -\sin y_1 \end{pmatrix}, \qquad \begin{pmatrix} \dot v_1 \\ \dot v_2 \end{pmatrix} = \mu \begin{pmatrix} 0 & 1 \\ -\cos y_1 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}. \qquad (5.4)$$

We fix initial values as $y(0) = (1.9, 0.4)^T$ and $v(0) = (0.1, 0.1)^T$. Figure 5.1 shows
the solution component $y_1(t)$ in grey, and the Euclidean norm of $v(t)$ as solid black
line, once for $\mu = -1$ and once for $\mu = -0.5$. We notice that the function $v(t)$
remains small and bounded for $\mu = -0.5$, and that it increases linearly for $\mu = -1$.

This agrees perfectly with the observations of Figs. 3.1 and 3.2, because the
method (3.13) has growth parameter $\mu_\ell = -0.5$ for the roots $\zeta_\ell = \pm i$, whereas the
explicit midpoint rule (3.14) has $\mu_\ell = -1$ for $\zeta_\ell = -1$.
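
The growth parameters quoted in Example 5.1 can be checked directly from formula (5.3). The following is a minimal sketch in plain Python. The helper names (`polyval`, `polyder`, `growth_parameter`) are illustrative. For the explicit midpoint rule we use $\rho(\zeta) = \zeta^2 - 1$, $\sigma(\zeta) = 2\zeta$. The polynomials used for method (3.13), $\rho(\zeta) = (\zeta - 1)(\zeta^2 + 1)$ and $\sigma(\zeta) = \zeta^2 + \zeta$, are an assumption, since that formula is not restated in this section.

```python
def polyval(coeffs, z):
    """Evaluate a polynomial (coefficients from lowest to highest degree) by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def polyder(coeffs):
    """Coefficients of the derivative polynomial, again lowest degree first."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def growth_parameter(rho, sigma, zeta):
    """Formula (5.3): sigma(zeta) / (zeta * rho'(zeta)) at a simple zero zeta of rho."""
    return polyval(sigma, zeta) / (zeta * polyval(polyder(rho), zeta))


# Explicit midpoint rule: rho = zeta^2 - 1, sigma = 2*zeta, parasitic root -1.
mu_midpoint = growth_parameter([-1, 0, 1], [0, 2, 0], -1)

# Assumed form of method (3.13): rho = (zeta - 1)(zeta^2 + 1), sigma = zeta^2 + zeta.
rho_313 = [-1, 1, -1, 1]      # zeta^3 - zeta^2 + zeta - 1
sigma_313 = [0, 1, 1, 0]
mu_313 = [growth_parameter(rho_313, sigma_313, z) for z in (1j, -1j)]

print(mu_midpoint)   # growth parameter at zeta = -1
print(mu_313)        # growth parameters at zeta = +i and zeta = -i
```

Under these assumptions the computed values reproduce $\mu_\ell = -1$ for the midpoint rule and $\mu_\ell = -0.5$ for the roots $\pm i$.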

The same analysis for partitioned multistep methods allows one to better understand
the behaviour of the different methods in Fig. 1.3. The leading term of the
parasitic modified equations depends on whether $\zeta_\ell$ is a root of both polynomials
$\rho_A(\zeta)$ and $\rho_B(\zeta)$, or only of one of them. This is very similar to the situation
encountered with multistep methods for second order differential equations which we
treat next.

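Linear Multistep Methods for Second Order Equations. Theorem 3.6 tells us
that the modified equation for the parasitic components $z_\ell$ depends on the multiplicity
of the root $\zeta_\ell$. Consider a stable, symmetric method (1.8) for $\ddot y = f(y)$. If $\zeta_\ell$ is a
double root of $\rho(\zeta)$, formula (3.29) yields

$$\ddot z_\ell = \mu_\ell f'(y) z_\ell + \ldots , \qquad \mu_\ell = \frac{2\, \sigma(\zeta_\ell)}{\zeta_\ell^2\, \rho''(\zeta_\ell)} , \qquad (5.5)$$

where we have not written terms containing at least two factors $z_j$. If $\zeta_\ell$ is a single
root of $\rho(\zeta)$, we get from (3.30) that

$$\dot z_\ell = h \mu_\ell f'(y) z_\ell + \ldots , \qquad \mu_\ell = \frac{\sigma(\zeta_\ell)}{\zeta_\ell\, \rho'(\zeta_\ell)} . \qquad (5.6)$$
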


There is an enormous difference between the parasitic modified equations corresponding
to double or single roots of $\rho(\zeta)$. Equation (5.5) is the complete analogue
of (5.2) and, as before, the long-time behaviour is hardly predictable and strongly
depends on the growth parameter. For single roots, however, we are concerned with
a first order differential equation (5.6) having an additional factor $h$ as bonus. For
the analysis of Sects. XV.5.2 and XV.5.3 it is important to have only single roots.

Definition 5.2. A symmetric multistep method (1.8) for second order differential
equations is called s-stable if, apart from the double root at 1, all zeros of $\rho(\zeta)$ are
simple and of modulus one.
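
Definition 5.2 can be checked mechanically once the zeros of $\rho$ are known. The following plain-Python sketch does this by evaluating $\rho$ and its derivatives at the given zeros. It assumes that methods (B) and (C) of Example 1.2 are 4-step methods with $\rho_B(\zeta) = (\zeta - 1)^2 (\zeta + 1)^2$ and $\rho_C(\zeta) = (\zeta - 1)^2 (\zeta^2 + 1)$; this is inferred from the description of their parasitic roots later in this section and is not quoted from Example 1.2. All function names are illustrative.

```python
def polyval(coeffs, z):
    """Evaluate a polynomial (coefficients from lowest to highest degree) by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def polyder(coeffs):
    """Coefficients of the derivative, again lowest degree first."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def multiplicity(coeffs, z, tol=1e-12):
    """Multiplicity of z as a zero of the polynomial (0 if z is not a zero)."""
    m = 0
    while coeffs and abs(polyval(coeffs, z)) < tol:
        m += 1
        coeffs = polyder(coeffs)
    return m


def is_s_stable(rho, zeros, tol=1e-12):
    """Check Definition 5.2, given rho and a list of its distinct zeros."""
    accounted = 0
    for z in zeros:
        m = multiplicity(rho, z, tol)
        accounted += m
        required = 2 if abs(z - 1) < tol else 1   # double root only at 1
        if abs(abs(z) - 1) > tol or m != required:
            return False
    # every zero of rho must appear in the list (counted with multiplicity)
    return accounted == len(rho) - 1


rho_B = [1, 0, -2, 0, 1]     # (zeta - 1)^2 (zeta + 1)^2, assumed form of method (B)
rho_C = [1, -2, 2, -2, 1]    # (zeta - 1)^2 (zeta^2 + 1), assumed form of method (C)

print(is_s_stable(rho_B, [1, -1]))        # method (B): parasitic root -1
print(is_s_stable(rho_C, [1, 1j, -1j]))   # method (C): parasitic roots +i, -i
```

The zeros are supplied by the caller; a general-purpose root finder would need extra care near the double root at 1, which is why the sketch avoids one.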

The linearized parasitic modified equations give much insight into the long-time
behaviour of multistep methods. To get rigorous estimates over long times, however,
further considerations are necessary. A partial result is given by Cano & Sanz-Serna
(1998) for multistep methods (1.8) applied to equations $\ddot y = f(y)$ with periodic
exact solution. There, the first terms of the asymptotic error expansion for the global
error are computed, and their growth as a function of time is studied. We shall follow
the approach of Hairer & Lubich (2004) who exploit the Hamiltonian structure of
second order differential equations.

XV.5.2 Hamiltonian of the Full Modified System 

In the remainder of this section we restrict our consideration to s-stable, irreducible
linear multistep methods

$$\sum_{j=0}^{k} \alpha_j q_{n+j} = -h^2 \sum_{j=0}^{k} \beta_j \nabla U(q_{n+j}), \qquad (5.7)$$

applied to Hamiltonian systems written as

$$\ddot q = -\nabla U(q), \qquad (5.8)$$

where $U(q)$ is assumed to be real-analytic in the considered region.

The key to proving long-time error estimates is the observation that much of the 
Hamiltonian structure is conserved in the modified equations (3.25). The results and 
techniques of this subsection are closely related to those of Sect. XIII.6.3 developed 
for numerical methods for oscillatory differential equations. 

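We let $\mathbf z = (z_\ell)_{\ell \in \mathcal I_N}$ and define $\mathcal U(\mathbf z)$ as

$$\mathcal U(\mathbf z) = U(z_0) + \sum_{m \ge 1} \frac{1}{m!} \sum_{\zeta_{\ell_1} \cdots \zeta_{\ell_m} = 1} U^{(m)}(z_0)(z_{\ell_1}, \ldots, z_{\ell_m}), \qquad (5.9)$$

where the second sum is over all indices $\ell_1 \in \mathcal I_N^*, \ldots, \ell_m \in \mathcal I_N^*$ with $\zeta_{\ell_1} \cdots \zeta_{\ell_m} = 1$
(using the notation of Sect. XV.3.2). Since the roots of $\rho(\zeta)$ different from $\zeta_1 = 1$
are complex and appear in pairs (Exercise 3), also the functions $z_\ell$ appear in pairs.
It is convenient to use the notation $z_{-\ell} = z_j$ if $\bar\zeta_\ell = \zeta_j$.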

It follows from (3.28) with $f(q) = -\nabla U(q)$ that every solution of the truncated
modified equation (3.25) satisfies

$$\rho(\zeta_\ell e^{hD}) z_\ell = -h^2 \sigma(\zeta_\ell e^{hD}) \nabla_{z_{-\ell}} \mathcal U(\mathbf z) + O(h^{N+2}) \qquad (5.10)$$

(for all $\ell \in \mathcal I$) as long as

$$y \in K, \qquad \|\dot y\| \le M, \qquad \|z_\ell\| \le \delta \quad \text{for } 1 < \ell < k, \qquad (5.11)$$

where $K$ is a compact subset of the domain of analyticity of $U(q)$, $M > 0$ some
bound on the derivative, and $0 < \delta = O(h)$ is a sufficiently small constant (note
that this implies $\|z_\ell\| \le \delta$ for all $\ell \in \mathcal I^*$ if $h$ is sufficiently small, cf. the algebraic
relations of (3.25)).

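For ease of presentation, we assume for the moment that $\sigma(\zeta_\ell) \ne 0$ for all
$\ell \in \mathcal I_N$ (we know that this holds for $1 < \ell < k$, because the method is irreducible).
We apply the operator $\sigma^{-1}(\zeta_\ell e^{hD})$ to both sides of (5.10) and divide by $h^2$:

$$h^{-2} \Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_\ell e^{hD}) z_\ell = -\nabla_{z_{-\ell}} \mathcal U(\mathbf z) + O(h^N). \qquad (5.12)$$

We then multiply with $\dot z_{-\ell}^T$, sum over all $\ell \in \mathcal I_N$, and thus obtain

$$h^{-2} \sum_{\ell \in \mathcal I_N} \dot z_{-\ell}^T \Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_\ell e^{hD}) z_\ell + \frac{d}{dt}\, \mathcal U(\mathbf z) = O(h^N). \qquad (5.13)$$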

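We now show that also the first expression on the left-hand side is a total derivative
of a function depending on $\mathbf z$ and its time derivatives. For this we note that

$$\Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_\ell e^{ix}) = \sum_{j \ge 0} c_{\ell,j}\, x^j \quad \text{with real coefficients } c_{\ell,j} = (-1)^j c_{-\ell,j}. \qquad (5.14)$$

This holds because the symmetry of the multistep method yields $(\rho/\sigma)(1/\zeta) =
(\rho/\sigma)(\zeta)$ and hence, for real $x$, $\overline{(\rho/\sigma)(\zeta_\ell e^{ix})} = (\rho/\sigma)(\bar\zeta_\ell e^{-ix}) = (\rho/\sigma)(\zeta_\ell e^{ix})$.
With the expansion (5.14) we obtain

$$\Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_\ell e^{hD}) z_\ell = \sum_{j=0}^{N+1} c_{\ell,j} (-ih)^j z_\ell^{(j)} + O(h^{N+2}). \qquad (5.15)$$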

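To study (5.13) we apply the relation (4.12) for the real function $y = z_1$ and for
$z_\ell$ corresponding to $\zeta_\ell = -1$, while for the complex-valued functions $z = z_\ell$, with
complex conjugate $\bar z = z_{-\ell}$, we use

$$\mathrm{Re}\, \dot{\bar z}^T z^{(2m)} = \mathrm{Re}\, \frac{d}{dt} \Bigl( \dot{\bar z}^T z^{(2m-1)} - \ddot{\bar z}^T z^{(2m-2)} + \ldots \mp \tfrac12 (\bar z^{(m)})^T z^{(m)} \Bigr),$$
$$\mathrm{Im}\, \dot{\bar z}^T z^{(2m+1)} = \mathrm{Im}\, \frac{d}{dt} \Bigl( \dot{\bar z}^T z^{(2m)} - \ddot{\bar z}^T z^{(2m-1)} + \ldots \pm (\bar z^{(m)})^T z^{(m+1)} \Bigr).$$
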
Together with (5.15) these relations show that the terms

$$\dot z_{-\ell}^T \Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_\ell e^{hD}) z_\ell + \dot z_\ell^T \Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_{-\ell} e^{hD}) z_{-\ell} = \sum_{j=0}^{N+1} c_{\ell,j}\, 2\, \mathrm{Re} \bigl( (-ih)^j\, \dot{\bar z}_\ell^T z_\ell^{(j)} \bigr) + O(h^{N+2})$$

give a total derivative (up to the remainder term). Hence the left-hand side of (5.13)
can be written as the time derivative of a function which depends on $z_\ell$, $\ell \in \mathcal I_N$, and
on their derivatives. Using the modified equation (3.25) we eliminate all $z_\ell$ corresponding
to $\zeta_\ell$ with $\rho(\zeta_\ell) \ne 0$ and their derivatives, the first and higher derivatives
of $z_\ell$ (for $1 < \ell < k$), and the second and higher derivatives of $y = z_1$. We thus get
a function

$$\mathcal H(y, \dot y, z^*) = H_0(y, \dot y, z^*) + \ldots + h^{N-1} H_{N-1}(y, \dot y, z^*) \qquad (5.16)$$

with $z^* = (z_\ell)_{\ell=2}^{k-1}$, such that

$$\frac{d}{dt}\, \mathcal H \bigl( y(t), \dot y(t), z^*(t) \bigr) = O(h^N), \qquad (5.17)$$

along solutions of (3.25) that stay in a set defined by (5.11). The function $\mathcal H$ is
therefore an almost-invariant of the system (3.25).

If, however, $\sigma(\zeta)$ does have a zero $\zeta_\ell$, then we omit the corresponding term from
the sum in (5.13). Hence the term $\dot z_{-\ell}^T \nabla_{z_{-\ell}} \mathcal U(\mathbf z)$ is missing from $(d/dt)\, \mathcal U(\mathbf z)$ and
must therefore be compensated in the remainder term. Since $\zeta_\ell$ is a product of no
fewer than two zeros of $\rho(\zeta)$, it follows from (3.31) and from $\sigma(\zeta_\ell) = 0$ that $z_\ell =
O(h^3 \delta^2)$, as long as $\|z_j\| \le \delta$ for $1 < j < k$. We further have $\nabla_{z_{-\ell}} \mathcal U(\mathbf z) = O(\delta^2)$,
so that the remainder term in (5.17) is augmented by $O(h^3 \delta^4)$.

We summarize the above considerations (Hairer & Lubich 2004) as follows.

Theorem 5.3. Every solution of the truncated modified equation (3.25) satisfies,
with $\mathcal H$ from (5.16),

$$\mathcal H(y(t), \dot y(t), z^*(t)) = \mathcal H(y(0), \dot y(0), z^*(0)) + O(t h^N) + O(t h^3 \delta^4) \qquad (5.18)$$

as long as the solution stays in the set defined by (5.11). Moreover,

$$\mathcal H(y, \dot y, z^*) = H(y, \dot y) + O(h^r) + O(h \delta^2). \qquad (5.19)$$

The closeness to the Hamiltonian $H(y, \dot y) = \frac12 \|\dot y\|^2 + U(y)$ follows also directly
from the above construction. For $z^* = 0$ we have $\mathcal H(y, \dot y, 0) = \widetilde H(y, \dot y)$,
where $\widetilde H$ is the modified energy from Theorem 4.9.

We will use Theorem 5.3 in Section XV.6.1 to infer the long-time near-conservation
of the Hamiltonian along numerical solutions. Before that we need to bound the
parasitic components.

XV.5.3 Long-Time Bounds for Parasitic Solution Components 

The modified equations have further almost-invariants which are close to the squares
of the norms of the parasitic components that correspond to the roots of $\rho(\zeta)$. We
derive them here and use them to show that all parasitic solution components remain
small over very long times. The techniques used in this subsection are similar to
those in Sects. XIII.6 and XIII.7.

We consider $\ell$ with $1 < \ell < k$ for which $\zeta_\ell$ is a simple root of $\rho(\zeta)$ and $\sigma(\zeta_\ell) \ne 0$.
The dominant term on the left-hand side of (5.12) is $-c_{\ell,1}\, i h^{-1} \dot z_\ell$. Since

$$\frac{d}{dt} \|z_\ell\|^2 = z_{-\ell}^T \dot z_\ell + z_\ell^T \dot z_{-\ell}, \qquad (5.20)$$

we multiply (5.12) with $z_{-\ell}^T$ and the corresponding equation for $-\ell$ with $z_\ell^T$, and
we form the difference, so that the dominant term on the left-hand side becomes
$-c_{\ell,1}\, i h^{-1} \frac{d}{dt} \|z_\ell\|^2$ (note $c_{-\ell,1} = -c_{\ell,1}$). Dividing by $-c_{\ell,1}\, i h^{-1}$ gives

$$\frac{i}{c_{\ell,1} h} \Bigl( z_{-\ell}^T \Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_\ell e^{hD}) z_\ell - z_\ell^T \Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_{-\ell} e^{hD}) z_{-\ell} \Bigr) = \frac{i h}{c_{\ell,1}} \Bigl( -z_{-\ell}^T \nabla_{z_{-\ell}} \mathcal U(\mathbf z) + z_\ell^T \nabla_{z_\ell} \mathcal U(\mathbf z) \Bigr). \qquad (5.21)$$

We first estimate the right-hand expression. Since

$$\nabla_{z_{-\ell}} \mathcal U(\mathbf z) = \nabla^2 U(z_0) z_\ell + O(\delta^2),$$

as long as (5.11) is satisfied, we obtain from the symmetry of the Hessian that the
right-hand side of (5.21) is of size $O(h \delta^3)$. The dominant $O(h \delta^3)$ term is present
only if $\zeta_\ell$ can be written as the product of two roots of $\rho(\zeta)$ other than 1. If this is
not the case, the expression (5.21) is of size $O(h \delta^4)$.

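Using the expansion (5.15) on the left-hand side of (5.21) and the relations (for
$z = z_\ell$)

$$\mathrm{Re}\, \bar z^T z^{(2m+1)} = \mathrm{Re}\, \frac{d}{dt} \Bigl( \bar z^T z^{(2m)} - \dot{\bar z}^T z^{(2m-1)} + \ldots \mp \tfrac12 (\bar z^{(m)})^T z^{(m)} \Bigr),$$
$$\mathrm{Im}\, \bar z^T z^{(2m+2)} = \mathrm{Im}\, \frac{d}{dt} \Bigl( \bar z^T z^{(2m+1)} - \dot{\bar z}^T z^{(2m)} + \ldots \pm (\bar z^{(m)})^T z^{(m+1)} \Bigr),$$

we obtain that (5.21) is, up to $O(h^N)$, the total derivative of a function depending
on $\mathbf z$ and its derivatives.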

By construction the dominant term is $\|z_\ell\|^2$. The following terms have at least
one more power of $h$ and at least one derivative which by (3.25) gives rise to an
additional factor $h$. Eliminating higher derivatives with the help of (3.25), we arrive
at a function of the form

$$\mathcal K_\ell(y, \dot y, z^*) = \|z_\ell\|^2 + h^2 K_{\ell,2}(y, \dot y, z^*) + \ldots + h^{N-1} K_{\ell,N-1}(y, \dot y, z^*). \qquad (5.22)$$

As we have seen, its total derivative is of size $O(h \delta^3)$ or smaller. We summarize
these considerations in the following theorem.

Theorem 5.4. Along every solution of the truncated modified equation (3.25) the
function $\mathcal K_\ell(y, \dot y, z^*)$ satisfies for $1 < \ell < k$

$$\mathcal K_\ell(y(t), \dot y(t), z^*(t)) = \mathcal K_\ell(y(0), \dot y(0), z^*(0)) + O(t h^N) + O(t h \delta^3) \qquad (5.23)$$

as long as the solution stays in the set defined by (5.11). The second error term is
replaced by $O(t h \delta^4)$ if no root of $\rho(\zeta)$ other than 1 is the product of two other roots.
Moreover,

$$\mathcal K_\ell(y, \dot y, z^*) = \|z_\ell\|^2 + O(h^2 \delta^2). \qquad (5.24)$$

This result allows us to write the numerical solution in a form that is suitable for
deriving long-time error estimates. Let us first collect the necessary assumptions:

(A1) the multistep method (5.7) is symmetric, s-stable, and of order $r$;

(A2) the potential function $U(q)$ of (5.8) is defined and analytic in an open neighbourhood
of a compact set $K$;

(A3) the starting approximations $q_0, \ldots, q_{k-1}$ are such that the initial values for
(3.25) obtained from Lemma 3.7 satisfy $y(0) \in K$, $\|\dot y(0)\| \le M$, and
$\|z_\ell(0)\| \le \delta/2$ for $1 < \ell < k$;

(A4) the numerical solution $\{q_n\}$ stays for $0 \le nh \le T$ in a compact set $K_0$ which
has a positive distance to the boundary of $K$.

Theorem 5.5 (Hairer & Lubich 2004). Assume (A1)-(A4). For sufficiently small
$h$ and $\delta$ and for a fixed truncation index $N$ (large enough such that $h^N = O(\delta^4)$),
there exist functions $y(t)$ and $z_\ell(t)$ on an interval of length

$$T = O((h\delta)^{-1})$$

such that

• $q_n = y(nh) + \sum_{\ell \in \mathcal I^*} \zeta_\ell^n z_\ell(nh)$ for $0 \le nh \le T$;

• on every subinterval $[jh, (j+1)h)$ the functions $y(t)$, $z_\ell(t)$ are a solution of the
system (3.25);

• the functions $y(t)$, $z_\ell(t)$ have jump discontinuities of size $O(h^{N+2})$ at the grid
points $jh$;

• $\|z_\ell(t)\| \le \delta$ for $0 \le t \le T$.

If no root of $\rho(\zeta)$ other than 1 is the product of two other roots, all these estimates
are valid on an interval of length $T = O((h\delta^2)^{-1})$.

Proof. To define the functions $y(t)$, $z_\ell(t)$ on the interval $[jh, (j+1)h)$ we consider
the $k$ consecutive numerical solution values $q_j, q_{j+1}, \ldots, q_{j+k-1}$. We compute initial
values for (3.25) according to Lemma 3.7, and we let $y(t)$, $z_\ell(t)$ be a solution of
(3.25) on $[jh, (j+1)h)$. Because their defect is $O(h^N)$ and $O(h^{N+1})$, respectively,
such a construction yields jump discontinuities of size $O(h^{N+2})$ at the grid points.

It follows from Theorem 5.4 that $\mathcal K_\ell(y(t), \dot y(t), z^*(t))$ remains constant up to an
error of size $O(h^2 \delta^3)$ on the interval $[jh, (j+1)h)$. Taking into account the jump
discontinuities, we find that

$$\mathcal K_\ell(y(t), \dot y(t), z^*(t)) \le \mathcal K_\ell(y(0), \dot y(0), z^*(0)) + C_1 t h \delta^3 + C_2 t h^{N+1} \qquad (5.25)$$

as long as $\|z_\ell(t)\| \le \delta$. By (5.24) this then implies

$$\|z_\ell(t)\|^2 \le \|z_\ell(0)\|^2 + C_1 t h \delta^3 + C_2 t h^{N+1} + C_3 h^2 \delta^2. \qquad (5.26)$$

The assumption $\|z_\ell(t)\| \le \delta$ is certainly satisfied as long as $C_1 t h \delta \le 1/4$,
$C_2 t h^{N+1} \le \delta^2/4$, and $C_3 h^2 \le 1/4$, so that the right-hand side of (5.26) is
bounded by $\delta^2$. This proves not only the estimate for $\|z_\ell(t)\|$, but at the same time
it guarantees recursively that the above construction of the functions $y(t)$, $z_\ell(t)$ is
feasible. □

Notice that for initial values computed by a sufficiently accurate one-step
method the constant $\delta$ can be chosen as small as $O(h^{r+2})$, where $r$ is the order
of the multistep method (cf. Lemma 3.7). The above estimates are therefore valid
on very long time intervals.

Example 5.6. To illustrate the long-time behaviour of the parasitic terms $z_\ell$ we
consider the pendulum equation $\ddot q = -\sin q$, and we apply the symmetric multistep
methods (B) and (C) of Example 1.2. For method (C), the starting values are chosen
far from a smooth solution, so that the propagation of the parasitic terms in the
numerical solution can be better observed. We compute the velocity approximation
by

$$v_n = \frac{1}{12h} \bigl( 8 (q_{n+1} - q_{n-1}) - (q_{n+2} - q_{n-2}) \bigr). \qquad (5.27)$$



Fig. 5.2. Stable propagation of perturbations in the starting values for method (C) of Example
1.2; initial values are $q_0 = 1.141$, $q_1 = 1.158$, $q_2 = 1.178$, and $q_3 = 1.206$



Fig. 5.3. Unstable propagation of perturbations in the starting values, for method (B) of
Example 1.2; initial values are $q_0 = 1.147$, $q_1 = 1.183$, $q_2 = 1.255$, and $q_3 = 1.286$

Figure 5.2 shows the numerical solution $(q_n, v_n)$ for $n \ge 2$. The values for $n =
2, 3, 4, 5$ are indicated by larger black bullets. The parasitic roots of method (C) are
$\pm i$ and both are simple. The numerical solution is therefore of the form

$$q_n = y(nh) + i^n z_1(nh) + (-i)^n \bar z_1(nh) + (-1)^n z_2(nh).$$

One observes in Fig. 5.2 that the functions $z_j(t)$ not only remain bounded and small,
but they stay essentially constant over the considered interval. This should be compared
to Fig. 3.1, where the parasitic functions $z_j(t)$ are bounded, but not constant.

Method (B) has a double parasitic root at $-1$ and, therefore, is not s-stable.
Its numerical solution behaves like $q_n = y(nh) + (-1)^n z(nh)$. In Fig. 5.3 every
second approximation is drawn in grey. One sees that the numerical solution stays
on two smooth curves $y(t) + z(t)$ and $y(t) - z(t)$ which, however, do not remain
close to each other.


XV.6 Explanation of the Long-Time Behaviour 

The bounds on the parasitic solution components of Sect. XV.5.3 allow us to get rigorous
statements on the long-time behaviour of multistep methods (5.7) for second
order differential equations. The following results are taken from Hairer & Lubich
(2004). We do not know of similar results for multistep methods (1.1).

XV.6.1 Conservation of Energy and Angular Momentum 

The energy conservation is now a direct consequence of Theorems 5.3 and 5.5. We
shall use the representation of $q_n$ in terms of functions $y(t)$, $z_\ell(t)$ as in Theorem 5.5.
Taking into account the jump discontinuities of these functions, Theorem 5.3 yields

$$\mathcal H(y(t), \dot y(t), z^*(t)) = \mathcal H(y(0), \dot y(0), z^*(0)) + O(t h^3 \delta^4) + O(t h^{N+1}).$$

We have $\delta = O(h^{r+1})$ if the starting approximations are computed by an $r$th order
one-step method. If $N$ is chosen sufficiently large, this together with (5.19) implies

$$H(y(t), \dot y(t)) = H(y(0), \dot y(0)) + O(h^r) \quad \text{for } 0 \le t \le T = O(h^{-r-2}).$$

If the velocity approximation $p_n = v_n$ is given by an $r$th order finite difference
formula (3.11), it follows from Theorem 5.5 that $p_n = \dot y(nh) + O(h^r)$ provided
the truncation index $N$ is sufficiently large. This proves the following result, and
explains the excellent long-time behaviour of method (C) in Fig. 1.2.

Theorem 6.1 (Total Energy). For a problem $\ddot q = -\nabla U(q)$ with total energy
$H(p, q) = \frac12 p^T p + U(q)$, the numerical solution of an s-stable symmetric multistep
method (5.7) of order $r$ satisfies

$$H(q_n, p_n) = H(q_0, p_0) + O(h^r) \quad \text{for } nh \le h^{-r-2}.$$

If no root of $\rho(\zeta)$ other than 1 is a product of two other roots, the statement holds
on intervals of length $O(h^{-2r-3})$. □
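
The interval lengths in Theorem 6.1 follow from Theorem 5.5 by simple bookkeeping. As a sketch, assuming $\delta = O(h^{r+1})$ as stated above and ignoring constants:

```latex
T = O\bigl((h\delta)^{-1}\bigr) = O\bigl(h^{-1} h^{-(r+1)}\bigr) = O(h^{-r-2}), \qquad
T = O\bigl((h\delta^{2})^{-1}\bigr) = O\bigl(h^{-1} h^{-2(r+1)}\bigr) = O(h^{-2r-3}),
\\
t\, h^{3} \delta^{4} \le h^{-r-2} \cdot h^{3} \cdot h^{4r+4} = h^{3r+5}
\quad \text{for } t \le h^{-r-2}, \qquad
t\, h^{3} \delta^{4} \le h^{-2r-3} \cdot h^{3} \cdot h^{4r+4} = h^{2r+4}
\quad \text{for } t \le h^{-2r-3}.
```

In both cases the accumulated drift $O(t h^3 \delta^4)$ stays well below $O(h^r)$, so the error in the energy is governed by the $O(h^r)$ term.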

We assume next that the differential equation $\ddot q = -\nabla U(q)$ has a quadratic
first integral of the form $L(q, \dot q) = \dot q^T A q$ (e.g., the angular momentum in $N$-body
problems). This means that $A$ is skew-symmetric and $\nabla U(q)^T A q = 0$. The last
equation can also be interpreted as the invariance relation $U(e^{\tau A} q) = U(q)$. This
property implies for $\mathcal U(\mathbf z)$, given by (5.9), that $\mathcal U(e^{\tau A} \mathbf z) = \mathcal U(\mathbf z)$ (here $e^{\tau A} \mathbf z =
(e^{\tau A} z_\ell)_{\ell \in \mathcal I}$). Along solutions $\mathbf z(t)$ of the modified equations (5.10) we therefore
have up to terms of size $O(h^N)$

$$0 = \frac{d}{d\tau} \Big|_{\tau = 0}\, \mathcal U(e^{\tau A} \mathbf z) = \sum_{\ell \in \mathcal I} z_{-\ell}^T A^T \nabla_{z_{-\ell}} \mathcal U(\mathbf z) = \sum_{\ell \in \mathcal I} h^{-2} z_{-\ell}^T A \Bigl( \frac{\rho}{\sigma} \Bigr)(\zeta_\ell e^{hD}) z_\ell .$$


If $\sigma(\zeta)$ has a root $\zeta_\ell$, then the corresponding term is omitted from the last sum, leading
to a remainder term which in the worst case is $O(h^3 \delta^4)$, as in Theorem 5.3. As
in the previous proofs, the last sum is, for skew-symmetric $A$, the total derivative of
a function

$$\mathcal L(y, \dot y, z^*) = L_0(y, \dot y, z^*) + \ldots + h^{N-1} L_{N-1}(y, \dot y, z^*)$$

which satisfies (under the same assumptions as in Theorem 5.3)

$$\mathcal L(y(t), \dot y(t), z^*(t)) = \mathcal L(y(0), \dot y(0), z^*(0)) + O(t h^3 \delta^4) + O(t h^{N+1})$$

and

$$\mathcal L(y, \dot y, z^*) = L(y, \dot y) + O(h^r) + O(\delta^2 / h). \qquad (6.1)$$

We therefore obtain the following result.

Theorem 6.2 (Angular Momentum). Let $L(q, \dot q) = \dot q^T A q$ be a first integral of
$\ddot q = -\nabla U(q)$. The numerical solution of an s-stable symmetric multistep method
(5.7) of order $r$ then satisfies

$$L(q_n, p_n) = L(q_0, p_0) + O(h^r) \quad \text{for } nh \le h^{-r-2}.$$

If no root of $\rho(\zeta)$ other than 1 is a product of two other roots, the statement holds
on intervals of length $O(h^{-2r-3})$. □

XV.6.2 Linear Error Growth for Integrable Systems 

The differential equation $\ddot q = -\nabla U(q)$, written as $\dot q = v$, $\dot v = -\nabla U(q)$, is
reversible with respect to the involution $v \mapsto -v$. Assume that it is also an integrable
system in the sense of Definition XI.1.1, and denote by $a = I(q, v)$ the action
variables, and by $\omega(a)$ the frequencies of the system.

By Theorem 5.5, the numerical solution can be written as $q_n = y(nh) +
\sum_{\ell \in \mathcal I^*} \zeta_\ell^n z_\ell(nh)$, where (at least locally) $y(t)$ is the solution of a modified
differential equation (first equation of (3.25))

$$\ddot y = f_{0,0}(y, \dot y, z^*) + h f_{0,1}(y, \dot y, z^*) + \ldots + h^{N-1} f_{0,N-1}(y, \dot y, z^*) \qquad (6.2)$$

which, for $z^* = 0$, becomes the reversible modified differential equation (3.9). Since
$z_j(t) = O(\delta)$ (see Theorem 5.5) and since $z^*$ appears at least quadratically in (6.2),
this equation is an $O(\delta^2)$ perturbation of (3.9). We are now in the position to apply
the results of Lemma XI.2.1 and Theorem XI.3.1. The additional (non-reversible)
perturbation of size $O(\delta^2)$ in the differential equation (6.2) produces an error term
of size $O(t \delta^2)$ in the action variables and of size $O(t^2 \delta^2)$ in the angle variables. If
$\delta = O(h^{r+1})$, these terms are negligible with respect to those already appearing in
Theorem XI.3.1. The errors due to the jump discontinuities (Theorem 5.5) are also
negligible. We have thus proved the following statement.


Theorem 6.3. Consider applying the s-stable symmetric multistep method (5.7) of
order $r$ to an integrable reversible system $\ddot q = -\nabla U(q)$ with real-analytic potential
$U$. Suppose that $\omega^* \in \mathbb R^d$ satisfies the diophantine condition (X.2.4). Then,
there exist positive constants $C$, $c$ and $h_0$ such that the following holds for all
step sizes $h \le h_0$: every numerical solution $(q_n, v_n)$ starting with frequencies
$\omega_0 = \omega(I(q_0, v_0))$ such that $\|\omega_0 - \omega^*\| \le c\, |\log h|^{-\nu-1}$, satisfies

$$\|(q_n, v_n) - (q(t), v(t))\| \le C t h^r, \qquad \|I(q_n, v_n) - I(q_0, v_0)\| \le C h^r \qquad \text{for } 0 \le t = nh \le h^{-r}.$$

The constants $h_0$, $c$, $C$ depend on $d$, $\gamma$, $\nu$ and on bounds of the potential. □


XV.7 Practical Considerations 

In computations with multistep methods one can observe resonance phenomena, if 
relatively large step sizes are used. This and the use of variable step sizes are the 
subject of this section. 

XV.7.1 Numerical Instabilities and Resonances 

Soon after Quinlan and Tremaine’s methods were published, however, 
Alar Toomre discovered a disturbing feature of the methods, ... 

(G.D. Quinlan 1999) 

It is a simple task to derive multistep methods of high order. Consider, for example,
methods of the form (1.8) for second order differential equations $\ddot y = f(y)$. Their
order is determined by the condition (1.9). We choose arbitrarily $\rho(\zeta)$ such that
$\zeta = 1$ is a double zero and the stability condition is satisfied. Condition (1.9) then
gives

$$\sigma(\zeta) = \rho(\zeta) / \log^2 \zeta + O((\zeta - 1)^r).$$

Expanding the right-hand expression into a Taylor series at $\zeta = 1$ and truncating
suitably, this yields the corresponding $\sigma$ polynomial. If we take

$$\rho(\zeta) = (\zeta - 1)^2 (\zeta^6 + \zeta^4 + \zeta^3 + \zeta^2 + 1), \qquad (7.1)$$

we get in this way Method SY8 of Table 7.1, a method proposed by Quinlan &
Tremaine (1990) for computations in celestial mechanics. All methods of Table 7.1
are 8-step methods, of order 8, and symmetric, i.e., the relations $\alpha_i = \alpha_{k-i}$ and
$\beta_i = \beta_{k-i}$ are satisfied. Therefore, we present the coefficients only for $i \le k/2$.

Table 7.1. Symmetric multistep methods for second order problems; $k = 8$ and order $r = 8$

             SY8                      SY8B                      SY8C
  $i$   $\alpha_i$   $12096\,\beta_i$    $\alpha_i$   $120960\,\beta_i$    $\alpha_i$   $8640\,\beta_i$
   0       1            0         1             0         1           0
   1      -2        17671         0        192481        -1       13207
   2       2       -23622         0          6582         0       -8934
   3      -1        61449      -1/2        816783         0       42873
   4       0       -50516        -1       -156812         0      -33812
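
The construction just described (choose $\rho$, expand $\rho(\zeta)/\log^2 \zeta$ at $\zeta = 1$, truncate) can be sketched in a few lines of exact rational arithmetic. The code below is an illustration under the following assumptions: it writes $\rho(\zeta) = (\zeta - 1)^2 q(\zeta)$, uses $\log(1+x) = x\,(1 - x/2 + x^2/3 - \ldots)$ so that $\rho/\log^2 = q(1+x)/(\log(1+x)/x)^2$ with $x = \zeta - 1$, and truncates after $(\zeta - 1)^{k-1}$ to obtain an explicit method. The function names are hypothetical and not taken from any code accompanying the book.

```python
from fractions import Fraction
from math import comb


def shift_to_one(coeffs):
    """Coefficients in x of p(1 + x), for p given lowest degree first."""
    out = [Fraction(0)] * len(coeffs)
    for i, a in enumerate(coeffs):
        for j in range(i + 1):
            out[j] += Fraction(a) * comb(i, j)
    return out


def series_mul(a, b, n):
    """First n coefficients of the product of two power series."""
    out = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            out[i + j] += ai * bj
    return out


def series_div(num, den, n):
    """First n coefficients of num/den, assuming den[0] != 0."""
    num = list(num) + [Fraction(0)] * n
    den = list(den) + [Fraction(0)] * n
    out = []
    for m in range(n):
        acc = num[m] - sum(out[i] * den[m - i] for i in range(m))
        out.append(acc / den[0])
    return out


def sigma_from_rho(q, n):
    """beta_0..beta_{n-1} with sigma = rho/log^2 + O((zeta-1)^n), rho = (zeta-1)^2 q."""
    qx = shift_to_one(q)                                           # q(1 + x)
    log_over_x = [Fraction((-1) ** j, j + 1) for j in range(n)]    # log(1+x)/x
    c = series_div(qx, series_mul(log_over_x, log_over_x, n), n)   # Taylor coefficients at zeta = 1
    beta = [Fraction(0)] * n
    for j, cj in enumerate(c):                                     # expand c_j (zeta - 1)^j
        for i in range(j + 1):
            beta[i] += cj * comb(j, i) * (-1) ** (j - i)
    return beta


# Factor q of (7.1): zeta^6 + zeta^4 + zeta^3 + zeta^2 + 1 (lowest degree first).
q_sy8 = [1, 0, 1, 1, 1, 0, 1]
beta_sy8 = sigma_from_rho(q_sy8, 8)
print([12096 * b for b in beta_sy8])   # to be compared with the SY8 column of Table 7.1
```

The same routine applied to $q(\zeta) = \zeta^2 + 1$ with four Taylor terms gives $\sigma(\zeta) = (7\zeta^3 - 2\zeta^2 + 7\zeta)/6$. In both cases $\beta_0 = 0$ and the symmetry $\beta_i = \beta_{k-i}$ come out of the computation rather than being imposed.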

These methods give approximations $y_n$ to the solution of the differential equation.
If also derivative approximations are needed, we get them by finite differences,
e.g., for the 8th order methods of Table 7.1 we use

$$\dot y_n = \frac{1}{840 h} \Bigl( 672 (y_{n+1} - y_{n-1}) - 168 (y_{n+2} - y_{n-2}) + 32 (y_{n+3} - y_{n-3}) - 3 (y_{n+4} - y_{n-4}) \Bigr). \qquad (7.2)$$
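
The weights 672, -168, 32, -3 with common denominator $840h$ are those of the standard 8th order central difference for the first derivative. A small sketch in plain Python (the function name `velocity` and the test function $\sin t$ are chosen here for illustration) applies the formula to sampled data:

```python
import math

# Weights multiplying (y[n+m] - y[n-m]) for m = 1, 2, 3, 4; common denominator 840*h.
WEIGHTS = (672, -168, 32, -3)


def velocity(y, n, h):
    """Approximate the derivative at grid index n from the samples y[n-4], ..., y[n+4]."""
    total = sum(w * (y[n + m] - y[n - m]) for m, w in enumerate(WEIGHTS, start=1))
    return total / (840 * h)


h = 0.1
samples = [math.sin(j * h) for j in range(9)]      # sin(t) at t = 0, h, ..., 8h
err = velocity(samples, 4, h) - math.cos(4 * h)    # error of the approximation at t = 4h
print(err)

# The formula is exact for polynomials up to degree 8, e.g. t^8 on the integer grid.
print(velocity([j ** 8 for j in range(9)], 4, 1), 8 * 4 ** 7)
```

Since the formula needs four neighbours on each side, derivative approximations are only available for indices that are at least four steps away from both ends of the computed trajectory.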

We apply this method to the Kepler problem (I.2.2), once with eccentricity $e = 0$
and once with $e = 0.2$, and initial values (I.2.11), such that the period of the exact
solution is $2\pi$. Starting approximations are computed accurately with a high order
Runge-Kutta method. We apply Method SY8 with many different step sizes ranging
from $2\pi/30$ to $2\pi/95$, and we plot in Fig. 7.1 the maximum error of the total energy
as a function of $2\pi/h$ (where $h$ denotes the step size). We see that in general the error
decreases with the step size, but there is an extremely large error for $h \approx 2\pi/60$.
For $e \ne 0$, further peaks can be observed at integral multiples of 5 and 6. It is our
aim to understand this behaviour.

Instabilities. We put $z = q_1 + i q_2$ so that the Kepler problem becomes

$$\ddot z = \psi(|z|)\, z, \qquad \psi(r) = -r^{-3},$$

and we choose initial values such that $z(t) = e^{it}$ is a circular motion (eccentricity
$e = 0$). The numerical solution of (1.8) is therefore defined by the relation

$$\sum_{j=0}^{k} \alpha_j z_{n+j} = h^2 \sum_{j=0}^{k} \beta_j\, \psi(|z_{n+j}|)\, z_{n+j}. \qquad (7.3)$$

Approximating $\psi(|z_{n+j}|)$ with $\psi(1) = -\omega^2$, we get a linear recurrence relation
with characteristic polynomial

$$S(\omega h, \zeta) = \rho(\zeta) + \omega^2 h^2 \sigma(\zeta).$$

Fig. 7.1. Maximum error in the total energy during the integration of 2500 orbits of the Kepler
problem as a function of the number of steps per period


The principal roots of $S(\omega h, \zeta) = 0$ satisfy $\zeta_1(\omega h) \approx e^{i \omega h}$ and $\zeta_2(\omega h) \approx e^{-i \omega h}$,
and we have $|\zeta_j(\omega h)| = 1$ for all $j$ and for sufficiently small $h$, because the method
is symmetric (Exercise 2). As a consequence of $|\zeta_1(\omega h)| = 1$, the values $z_n :=
\zeta_1(\omega h)^n$ are not only a solution of the linear recurrence relation, but also of the
nonlinear relation (7.3). Our aim is to study the stability of this numerical solution.
We therefore consider a perturbed solution

$$z_n = \zeta_1(\omega h)^n (1 + u_n).$$

Using $|z_n| = 1 + \frac12 (u_n + \bar u_n) + O(|u_n|^2)$ and neglecting the quadratic and higher
order terms of $|u_n|$ in the relation (7.3), we get

$$\sum_{j=0}^{k} (\alpha_j + \omega^2 h^2 \beta_j)\, \zeta_1(\omega h)^j u_{n+j} = \frac{h^2}{2}\, \psi'(1) \sum_{j=0}^{k} \beta_j\, \zeta_1(\omega h)^j (u_{n+j} + \bar u_{n+j}).$$

Considering also the complex conjugate of this relation, and eliminating $\bar u_{n+j}$, we
obtain a linear recurrence relation for $u_n$ with characteristic polynomial

$$S(\omega h, \zeta_1(\omega h) \zeta) \cdot S(\omega h, \zeta_1(\omega h)^{-1} \zeta) + O(h^2). \qquad (7.4)$$

For small $h$, its zeros are close to $\zeta_1(\omega h)^{-1} \zeta_j$ and $\zeta_1(\omega h) \zeta_l$. If two of these zeros
collapse, the $O(h^2)$ terms in (7.4) can produce a root of modulus larger than one,
so that instability occurs. This is the case, if two roots $\zeta_j$, $\zeta_l$ of $\rho(\zeta) = 0$ satisfy
$\zeta_j \zeta_l^{-1} \approx \zeta_1^2 \approx e^{2 i \omega h}$, or

$$\theta_j - \theta_l \approx \frac{4\pi}{N}, \qquad (7.5)$$

where $\zeta_j = e^{i \theta_j}$ and $h = 2\pi/N$.

For the Method SY8 of Table 7.1, the spurious zeros of $\rho(\zeta)$ have arguments
$\pm 4\pi/5$, $\pm 2\pi/5$, and $\pm 2\pi/6$. With $\theta_j = 2\pi/5$ and $\theta_l = 2\pi/6$, the condition (7.5)
gives $N = 60$ as a candidate for instability. This explains the experiment of Fig. 7.1
for $e = 0$. A study of the stability of orbits with eccentricity $e \ne 0$ (see Quinlan
1999) shows that instabilities can also occur when $4\pi/N$ is replaced with $2q\pi/N$
($q = 2, 3, \ldots$) in the relation (7.5).

Fig. 7.2. Maximum error in the total energy during the integration of 2500 orbits of the Kepler
problem as a function of the number of steps per period

To avoid these instabilities as far as possible, Quinlan (1999) constructed symmetric
multistep methods, where the spurious roots of $\rho(\zeta) = 0$ are well spread
out on the unit circle and far from $\zeta = 1$. As a result he proposes Method SY8B
of Table 7.1. The same experiment as above yields the results of Fig. 7.2. The
$\rho$-polynomial of Method SY8B is

$$\rho(\zeta) = (\zeta - 1)^2 (\zeta^6 + 2\zeta^5 + 3\zeta^4 + 3.5\zeta^3 + 3\zeta^2 + 2\zeta + 1),$$

and the $\theta_j$ of the spurious roots are $\pm 2\pi/2.278$, $\pm 2\pi/3.353$, and $\pm 2\pi/4.678$. The
condition (7.5) is satisfied only for $N \le 23.67$, which implies that no instability
occurs for $e = 0$ in the region of the experiment of Fig. 7.2.
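
The candidate values of $N$ can be listed mechanically from condition (7.5). The sketch below, in plain Python, takes the arguments of the spurious zeros as quoted in the text (for SY8B these are rounded values), forms all differences $\theta_j - \theta_l$ modulo $2\pi$, and reports $N = 4\pi/(\theta_j - \theta_l)$. The function name `resonance_candidates` is illustrative, and the modulo-$2\pi$ reduction is an assumption about how the condition should be read for arbitrary pairs of roots.

```python
import cmath
import math


def polyval(coeffs, z):
    """Evaluate a polynomial (coefficients from lowest to highest degree) by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def resonance_candidates(thetas):
    """All N = 4*pi/d with d = (theta_j - theta_l) mod 2*pi over ordered pairs j != l, largest first."""
    out = []
    for j, tj in enumerate(thetas):
        for l, tl in enumerate(thetas):
            if j != l:
                d = (tj - tl) % (2 * math.pi)
                out.append(4 * math.pi / d)
    return sorted(out, reverse=True)


# Arguments of the spurious zeros as quoted in the text (4*pi/5 is written as 2*pi/2.5).
sy8 = [s * 2 * math.pi / d for d in (5, 2.5, 6) for s in (1, -1)]
sy8b = [s * 2 * math.pi / d for d in (2.278, 3.353, 4.678) for s in (1, -1)]

# Sanity check: the SY8 arguments are zeros of zeta^6 + zeta^4 + zeta^3 + zeta^2 + 1.
q_sy8 = [1, 0, 1, 1, 1, 0, 1]
residual = max(abs(polyval(q_sy8, cmath.exp(1j * t))) for t in sy8)

print(residual)
print(resonance_candidates(sy8)[0])     # largest candidate N for SY8
print(resonance_candidates(sy8b)[0])    # largest candidate N for SY8B
```

The largest candidate is the one that matters for small step sizes: all other pairs of spurious roots are further apart and therefore correspond to fewer steps per period.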

To illustrate the importance of high order methods, we included in Fig. 7.2 the 
results of the second order partitioned multistep method (1.15). 

XV.7.2 Extension to Variable Step Sizes 

Variable step size multistep methods for second order differential equations $\ddot y =
f(y)$ are of the form

$$\sum_{j=0}^{k} \alpha_j(h_n, \ldots, h_{n+k-1})\, y_{n+j} = h_{n+k-1}^2 \sum_{j=0}^{k} \beta_j(h_n, \ldots, h_{n+k-1})\, f(y_{n+j}),$$

where the coefficients $\alpha_j$ and $\beta_j$ are allowed to depend on the step sizes $h_n, \ldots,
h_{n+k-1}$, more precisely, on the ratios $h_{n+1}/h_n, \ldots, h_{n+k-1}/h_{n+k-2}$. They yield
approximations $y_n$ to $y(t_n)$ on a variable grid given by $t_{n+1} = t_n + h_n$. Such a
method is of order $r$ (cf. formula (1.9)), if

$$\sum_{j=0}^{k} \alpha_j(h_n, \ldots, h_{n+k-1})\, y(t_{n+j}) = h_{n+k-1}^2 \sum_{j=0}^{k} \beta_j(h_n, \ldots, h_{n+k-1})\, \ddot y(t_{n+j}) \qquad (7.6)$$

for all polynomials $y(t)$ of degree $\le r + 1$. It is stable, if the $\rho$-polynomial with
coefficients $\alpha_j(h, \ldots, h)$ (constant step size) satisfies the stability condition of
Sect. XV.1.2 (see Theorem III.5.7 of Hairer, Nørsett & Wanner (1993) and Cano
& Duran (2003a)).

All methods of Sect. XV.7.1 can be extended to symmetric, variable step size
integrators. This has been discovered by Cano & Duran (2003b). For clarity of notation
we let $\alpha_j$, $\beta_j$ ($j = 0, \ldots, k$) be the coefficients of such a fixed step size
method. Cano & Duran propose putting

$$\beta_j(h_n, \ldots, h_{n+k-1}) = \frac{h_n}{h_{n+k-1}}\, \beta_j \qquad (7.7)$$

and to determine $\alpha_j(h_n, \ldots, h_{n+k-1})$ such that symmetry and order $k - 2$ (for
arbitrary step sizes) are achieved. We also suppose (7.7), but we determine the
coefficients $\alpha_j(h_n, \ldots, h_{n+k-1})$ such that (7.6) holds for all polynomials $y(t)$ of
degree $\le k$. This uniquely determines these coefficients whenever $h_n > 0, \ldots,
h_{n+k-1} > 0$ (Vandermonde type system) and gives the following properties.

Lemma 7.1. For even $k$, let $(\alpha_j, \beta_j)$ define a symmetric, stable $k$-step method
(1.8) of order $k$, and consider the variable step size method given by (7.7) and
$\alpha_j(h_n, \ldots, h_{n+k-1})$ such that (7.6) holds for all polynomials $y$ satisfying
$\deg y \le k$. This method extends the fixed step size formula, i.e.,

$$\alpha_j(h, \ldots, h) = \alpha_j, \qquad \beta_j(h, \ldots, h) = \beta_j, \tag{7.8}$$

it satisfies the symmetry relations

$$\alpha_j(h_n, \ldots, h_{n+k-1}) = \alpha_{k-j}(h_{n+k-1}, \ldots, h_n), \qquad h_{n+k-1}^2\, \beta_j(h_n, \ldots, h_{n+k-1}) = h_n^2\, \beta_{k-j}(h_{n+k-1}, \ldots, h_n), \tag{7.9}$$

and it is of order $k - 1$ for arbitrary step sizes. Moreover, it behaves like a method
of order $k$, if $h_{n+1} = h_n(1 + O(h_n))$ uniformly in $n$.

Proof. The relation (7.8) for $\beta_j$ follows at once from (7.7), and for $\alpha_j$ it is a
consequence of the uniqueness of the solution of the linear system for the $\alpha_j$.

The second condition of (7.9) follows directly from (7.7) and from the symmetry
of the underlying fixed step size method ($\beta_{k-j} = \beta_j$ for all $j$). Inserting
(7.7) into (7.6), replacing $y(t)$ with $y(t_{n+k} + t_n - t)$, and reversing the order of
$h_n, \ldots, h_{n+k-1}$ yields

$$\sum_{j=0}^{k} \alpha_j(h_{n+k-1}, \ldots, h_n)\, y(t_{n+k-j}) = h_n h_{n+k-1} \sum_{j=0}^{k} \beta_j\, \ddot y(t_{n+k-j}).$$

Using $\beta_{k-j} = \beta_j$ this shows that $\alpha_{k-j}(h_{n+k-1}, \ldots, h_n)$ satisfies exactly the same
linear system as $\alpha_j(h_n, \ldots, h_{n+k-1})$, so that also the first relation of (7.9) is
verified.

By definition, the variable step size method is at least of order $k - 1$. Under the
assumption $h_{n+1} = h_n(1 + O(h_n))$ the defect in (7.6) is of the form

$$h_n^{k+1} D(h_n, \ldots, h_{n+k-1}) = h_n^{k+1} D(h_n, \ldots, h_n) + O(h_n^{k+2})$$

for all sufficiently smooth $y(t)$. Since the constant step size method is of order $k$,
the expression $D(h_n, \ldots, h_n)$ is of size $O(h_n)$, so that we observe convergence of
order $k$. □
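
As a small illustration of Lemma 7.1, the following sketch computes the coefficients $\alpha_j(h_n, \ldots, h_{n+k-1})$ by imposing (7.6) on the monomials $1, t, \ldots, t^k$, with the right-hand side scaled according to (7.7). It is only a sketch under stated assumptions, not the code used for the experiments: the names `solve` and `variable_alpha` are hypothetical, exact rational arithmetic replaces floating point, and the Störmer-Verlet coefficients ($k = 2$, $\alpha = (1, -2, 1)$, $\beta = (0, 1, 0)$) are chosen here only as the simplest symmetric test case.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact arithmetic."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        # pick a row with a nonzero pivot and move it into place
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def variable_alpha(beta, steps):
    """alpha_j(h_n, ..., h_{n+k-1}) such that (7.6) holds for degree <= k.

    beta  -- fixed step size coefficients beta_0, ..., beta_k
    steps -- step sizes h_n, ..., h_{n+k-1}
    """
    steps = [Fraction(h) for h in steps]
    k = len(steps)
    t = [Fraction(0)]                  # grid t_n, ..., t_{n+k}, with t_n = 0
    for h in steps:
        t.append(t[-1] + h)
    # h_{n+k-1}^2 * beta_j(h_n, ...) = h_n * h_{n+k-1} * beta_j by (7.7)
    scale = steps[0] * steps[-1]
    A, b = [], []
    for m in range(k + 1):             # impose (7.6) for y(t) = t^m
        A.append([tj ** m for tj in t])
        ydd = [m * (m - 1) * tj ** (m - 2) if m >= 2 else 0 for tj in t]
        b.append(scale * sum(bj * v for bj, v in zip(beta, ydd)))
    return solve(A, b)


beta = [0, 1, 0]                       # Stormer-Verlet (illustrative choice)
h0, h1 = Fraction(1, 10), Fraction(1, 5)
print(variable_alpha(beta, [h0, h0]))          # constant step sizes
print(variable_alpha(beta, [h0, h1]))          # variable step sizes
print(variable_alpha(beta, [h1, h0])[::-1])    # reversed steps, reversed order
```

For constant step sizes the sketch returns the fixed step size coefficients, as stated in (7.8), and the last two printed lists coincide, which is the first relation of (7.9) for this example.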

The symmetry relation (7.9) has the following interpretation: if the approximations
$y_n, \ldots, y_{n+k-1}$ used with step sizes $h_n, \ldots, h_{n+k-1}$ yield $y_{n+k}$, then the
values $y_{n+k}, \ldots, y_{n+1}$ applied with $h_{n+k-1}, \ldots, h_n$ yield $y_n$ as a result (since the
coefficients $\alpha_j$ and $\beta_j$ only depend on step size ratios and the multistep formula
only on $h_{n+k-1}^2$, the same result is obtained with $-h_{n+k-1}, \ldots, -h_n$). This is the
analogue of the definition of symmetry for one-step methods.

For obtaining a good long-time behaviour, the step sizes also have to be chosen
in a symmetric and reversible way (see Sect. VIII.3). One possibility is to take step
sizes

$$h_{n+k-1} = \frac{\varepsilon}{2}\bigl(\sigma(y_{n+k-1}) + \sigma(y_{n+k})\bigr), \tag{7.10}$$

where $\varepsilon > 0$, and $\sigma(y)$ is a given positive monitor function. This condition is an
implicit equation for $h_{n+k-1}$, because $y_{n+k}$ depends on $h_{n+k-1}$. It has to be solved
iteratively. Notice, however, that for an explicit multistep formula no further force
evaluations are necessary during this iteration. Such a choice of the step size
guarantees that whenever $h_{n+k-1}$ is chosen when stepping from $y_n, \ldots, y_{n+k-1}$ with
$h_n, \ldots, h_{n+k-2}$ to $y_{n+k}$, the step size $h_n$ is chosen when stepping backwards from
$y_{n+k}, \ldots, y_{n+1}$ with $h_{n+k-1}, \ldots, h_{n+1}$ to $y_n$.

Implementation. For given initial values $y_0, \dot y_0$, the starting approximations $y_1,
\ldots, y_{k-1}$ should be computed accurately (for example, by a high-order Runge-Kutta
method) with step sizes satisfying (7.10). The solution of the scalar nonlinear
equation (7.10) has to be done carefully in order to reduce the overhead of the
method. In our code we use $h_{n+k-1} := h_{n+k-2}^2 / h_{n+k-3}$ as predictor, and we apply
modified Newton iterations with the derivative approximated by finite differences.

The coefficients $\alpha_j(h_n, \ldots, h_{n+k-1})$ have to be computed anew in every iteration.
We use the basis

$$p_i(t) = \prod_{j=0}^{i-1} (t - t_{n+j}), \qquad i = 0, \ldots, k$$

for the polynomials of degree $\le k$ in (7.6). This leads to a linear triangular system
for $\alpha_0, \ldots, \alpha_k$. As noticed by Cano & Durán (2003b), the coefficients $p_i(t_j)$ and
$\ddot p_i(t_j)$ can be obtained efficiently from the recurrence relations

$$p_0(t) = 1, \qquad p_{i+1}(t) = (t - t_{n+i})\, p_i(t)$$
$$\dot p_0(t) = 0, \qquad \dot p_{i+1}(t) = (t - t_{n+i})\, \dot p_i(t) + p_i(t)$$
$$\ddot p_0(t) = 0, \qquad \ddot p_{i+1}(t) = (t - t_{n+i})\, \ddot p_i(t) + 2\, \dot p_i(t).$$
During the iterations for the solution of the nonlinear equation (7.10) only the values
of $p_i(t_{n+k})$ have to be updated.

Fig. 7.3. Maximum error in the total energy during the integration of 2500 orbits of the Kepler
problem as a function of the number of steps per period
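
The recurrence relations for the basis polynomials and their first two derivatives can be sketched as follows. This is a minimal illustration and not the authors' code; the function name `basis_values` and its argument layout are assumptions made for the example, and `nodes` stands for the grid points $t_n, t_{n+1}, \ldots$ that define the products.

```python
def basis_values(nodes, t):
    """Values p_i(t), p_i'(t), p_i''(t) for p_i(t) = prod_{j<i} (t - nodes[j]).

    Returns three lists of length len(nodes) + 1, indexed by i.
    """
    p, dp, ddp = [1.0], [0.0], [0.0]          # p_0 = 1, p_0' = 0, p_0'' = 0
    for node in nodes:
        # product rule applied to p_{i+1}(t) = (t - node) * p_i(t)
        p_next = (t - node) * p[-1]
        dp_next = (t - node) * dp[-1] + p[-1]
        ddp_next = (t - node) * ddp[-1] + 2.0 * dp[-1]
        p.append(p_next)
        dp.append(dp_next)
        ddp.append(ddp_next)
    return p, dp, ddp


# nodes 0, 1, 2 and evaluation point t = 3:
# p_3(t) = t (t - 1)(t - 2), p_3'(t) = 3 t^2 - 6 t + 2, p_3''(t) = 6 t - 6
p, dp, ddp = basis_values([0.0, 1.0, 2.0], 3.0)
print(p, dp, ddp)
```

Each new polynomial costs a constant number of operations per evaluation point, which is why only the values at the newest grid point need to be recomputed inside the step size iteration.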

Numerical Experiment. We repeat the experiment of Fig. 7.1 with the method
SY8, but this time in the variable step size version and with $\sigma(y) = \|y\|^2$ as step
size monitor. We have computed 2500 periods of the Kepler problem with eccentricity
$e = 0.2$, and we have plotted in Fig. 7.3 the maximal error in the Hamiltonian
as a function of the number of steps per period (for a comparison we have also
included the result of the fixed step size implementation). Similar to (7.2) we use
approximations $\dot y_n$ that are the derivative of the interpolation polynomial passing
through $y_n, y_{n\pm 1}, y_{n\pm 2}, \ldots$ such that the correct order is obtained. The computation
is stopped when the error exceeds $10^{-2}$.

As expected, the error is smaller for the variable step size version, and it is seen
that the peaks due to numerical resonances are now much less pronounced, although
they are not completely removed. For large step sizes, the performance deteriorates,
but this is not a serious problem, because these methods are recommended only for
high accuracy computations.

It should be remarked that the overhead, due to the computation of the coefficients
$\alpha_j$ and the solution of the nonlinear equation (7.10), is rather high. Therefore,
the use of variable step sizes is recommended only when force evaluations $f(y)$ are
expensive or when constant step sizes are not appropriate. Cano & Durán (2003b)
report an excellent performance of symmetric, variable step size multistep methods
for computations of the outer solar system.

Despite the resonances and instabilities, then, symmetric methods can
still be a better choice than Störmer methods for long integrations of planetary
orbits provided that the user is aware of the dangers.

(G.D. Quinlan 1999) 
XV.8 Multi-Value or General Linear Methods

General linear methods form a class of integration methods that covers Runge-Kutta
as well as multistep methods. It is therefore of interest to study which of the results
on the long-time behaviour can be extended.

So-called multi-value or general linear methods are defined by $Y_{n+1} = G_h(Y_n)$,
where

$$Y_{n+1} = D Y_n + h B f(U_{n+1}), \qquad U_{n+1} = C Y_n + h A f(U_{n+1}) \tag{8.1}$$

with $f(U_{n+1}) = \bigl(f(u_{n+1}^{(1)}), \ldots, f(u_{n+1}^{(s)})\bigr)^T$ for $U_{n+1} = \bigl(u_{n+1}^{(1)}, \ldots, u_{n+1}^{(s)}\bigr)^T$, and
$Y_n = \bigl(y_n^{(1)}, \ldots, y_n^{(k)}\bigr)^T$. We use a sloppy notation in the sense that the matrices $D, \ldots$
should be replaced with $D \otimes I, B \otimes I, \ldots$. For a computation, a starting procedure
$S_h$ and a finishing procedure $F_h$, which extracts the numerical approximation $y_n$
from $Y_n$, have to be added (see Fig. 8.1). We assume the existence of a vector $e$ such
that with $\mathbb{1} = (1, \ldots, 1)^T$

$$D e = e, \qquad C e = \mathbb{1} \tag{8.2}$$

holds (preconsistency conditions). The vector $Y_n$ is then an approximation to $e\, y(t_n)$
(more precisely to $e \otimes y(t_n)$).

For Runge-Kutta methods, $D = (1)$ is the one-dimensional identity, $B =
(b_1, \ldots, b_s)$, $C = \mathbb{1}$, and $A$ is the usual Runge-Kutta matrix. For multistep methods,
we have $Y_n = (y_{n+k-1}, \ldots, y_n)^T$, and $D$ is the $k \times k$ matrix with characteristic
polynomial $\rho(\zeta)$ as in (2.1). For a detailed treatment of general linear methods we
refer the reader to Chap. 4 of the monograph of Butcher (1987), and to Chap. III.8
of Hairer, Nørsett & Wanner (1993).
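
To make the formulation (8.1) concrete, the following sketch performs one step $Y_{n+1} = G_h(Y_n)$ for a scalar problem. It is a minimal illustration under stated assumptions: the function name `glm_step` is hypothetical, $A$ is assumed strictly lower triangular so that the stages can be computed one after the other, and the particular matrices used for the explicit midpoint rule $y_{n+2} = y_n + 2h f(y_{n+1})$ are one possible choice written down for this example, not taken from the text.

```python
def glm_step(A, B, C, D, f, Y, h):
    """One step of (8.1) for scalar y, with strictly lower triangular A."""
    s, k = len(A), len(D)
    F = []                                        # stage values f(u^(i))
    for i in range(s):
        u = sum(C[i][j] * Y[j] for j in range(k))
        u += h * sum(A[i][j] * F[j] for j in range(i))
        F.append(f(u))
    return [sum(D[i][j] * Y[j] for j in range(k))
            + h * sum(B[i][j] * F[j] for j in range(s))
            for i in range(k)]


def f(y):
    return -y                                     # test problem y' = -y


# Classical Runge-Kutta method of order 4: D = (1), C = 1, B = b^T
A_rk = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
B_rk = [[1 / 6, 1 / 3, 1 / 3, 1 / 6]]
C_rk = [[1], [1], [1], [1]]
D_rk = [[1]]
print(glm_step(A_rk, B_rk, C_rk, D_rk, f, [1.0], 0.1))

# Explicit midpoint rule with Y_n = (y_{n+1}, y_n)^T (illustrative choice)
A_mp, B_mp, C_mp, D_mp = [[0]], [[2], [0]], [[1, 0]], [[0, 1], [1, 0]]
print(glm_step(A_mp, B_mp, C_mp, D_mp, f, [0.9, 1.0], 0.1))
```

In the second example the matrix $D$ has the characteristic polynomial $\zeta^2 - 1$, and with $e = (1, 1)^T$ the preconsistency conditions (8.2) are satisfied.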



Fig. 8.1. Illustration of a multi-value method $Y_{n+1} = G_h(Y_n)$ with starting procedure $S_h$
and finishing procedure $F_h$


XV.8.1 Underlying One-Step Method and Backward Error 
Analysis 

In analogy to multistep methods, a method (8.1) is called strictly stable, if all
eigenvalues of $D$ are inside the unit circle with the exception of the single eigenvalue
$\zeta = 1$. An extension of Kirchgraber's result (Theorem 2.1) to strictly stable general
linear methods is given by Stoffer (1993).
Theorem 8.1. Consider a strictly stable general linear method $Y_{n+1} = G_h(Y_n)$, and a
finishing procedure $y_n = F_h(Y_n) = d^T Y_n + O(h)$. Assume that (8.2) and $d^T e = 1$ hold.

(i) Then there exist a unique one-step method $\Phi_h(y)$ and a unique starting
procedure $S_h(y)$ such that $G_h \circ S_h = S_h \circ \Phi_h$ and $F_h \circ S_h = \mathrm{Id}$ hold.

(ii) The manifold $\mathcal{M}_h = \{S_h(y)\,;\ y \in \mathbb{R}^d\}$ is invariant under $G_h$, and it is
exponentially attractive.

Proof. Since the method is strictly stable, there exists a matrix $T$ such that

$$T^{-1} D T = \begin{pmatrix} 1 & 0 \\ 0 & \Lambda \end{pmatrix} \qquad\text{with}\qquad \|\Lambda\| < 1,$$

and $T e_1 = e$ (where $e_1 = (1, 0, \ldots, 0)^T$). The proof closely follows that of
Theorem 2.1. With the transformation $(\xi_n, \eta_n)^T = Z_n = T^{-1} Y_n$, the general linear
method (8.1) becomes

$$\begin{pmatrix} \xi_{n+1} \\ \eta_{n+1} \end{pmatrix} = \begin{pmatrix} \xi_n \\ \Lambda \eta_n \end{pmatrix} + h\, T^{-1} B f(U_{n+1}) \tag{8.3}$$

with $U_{n+1} = C T Z_n + h A f(U_{n+1})$. As before, Theorem XII.3.1 can be applied and
yields the existence of an attractive manifold $\mathcal{N}_h = \{(\xi, s(\xi))\,;\ \xi \in \mathbb{R}^d\}$, which is
invariant under the mapping (8.3). We now invert the restriction of $F_h$ onto the
manifold $\mathcal{N}_h$. Due to $d^T e = 1$ and $T e_1 = e$, we have for $Z = Z(\xi) = (\xi, s(\xi))^T$
that

$$y = F_h(T Z(\xi)) = d^T T Z(\xi) + \ldots = \xi + g(\xi), \tag{8.4}$$

where $g(\xi)$ is Lipschitz continuous with constant $O(h)$. By the Banach fixed-point
theorem the equation (8.4) has a unique solution $\xi = r(y)$. Putting

$$S_h(y) = T Z(r(y)) = T \begin{pmatrix} r(y) \\ s(r(y)) \end{pmatrix},$$

we have found the unique starting procedure satisfying $F_h \circ S_h = \mathrm{Id}$ and
$T^{-1} S_h(y) \in \mathcal{N}_h$. We finally define $\Phi_h = F_h \circ G_h \circ S_h$ and $\mathcal{M}_h = \{T Z\,;\ Z \in \mathcal{N}_h\}$,
so that all statements of the theorem are verified. □

It is our aim to extend the concept of an underlying one-step method to nearly 
all (including weakly stable) general linear methods. 

Theorem 8.2. Consider a general linear method (8.1), and assume that $\zeta = 1$
is a single eigenvalue of the propagation matrix $D$. Furthermore, let $G_h(Y)$ and
$F_h(Y) = d^T Y + \ldots$ have expansions in powers of $h$, and assume that (8.2) and
$d^T e = 1$ hold. Then there exist a unique formal one-step method

$$\Phi_h(y) = y + h\, d_1(y) + h^2 d_2(y) + \ldots$$

and a unique formal starting procedure

$$S_h(y) = e\, y + h\, S_1(y) + h^2 S_2(y) + \ldots,$$

such that formally $G_h \circ S_h = S_h \circ \Phi_h$ and $F_h \circ S_h = \mathrm{Id}$ hold.

Proof. Expanding $S_h(\Phi_h(y))$ and $G_h(S_h(y))$ into powers of $h$, a comparison of
the coefficients yields

$$e\, d_j(y) + (I - D)\, S_j(y) = \ldots, \tag{8.5}$$

where the right-hand side depends on known functions and on $d_i(y)$, $S_i(y)$ with $i < j$.
Similarly, the condition $F_h(S_h(y)) = y$ leads to

$$d^T S_j(y) = \ldots. \tag{8.6}$$

Due to the fact that $\zeta = 1$ is a single eigenvalue of $D$, and that $d^T e \ne 0$, the system
(8.5)-(8.6) uniquely determines $d_j(y)$ and $S_j(y)$. □

Backward Error Analysis for Smooth Numerical Solutions. The formal analysis
of Chap. IX can be directly applied to the underlying one-step method of
Theorem 8.2. This yields a modified differential equation, but only for the smooth
numerical solution (cf. Sect. XV.3.1). Notice that this modified equation depends on
the choice of the finishing procedure $F_h$.


XV.8.2 Symplecticity and Symmetry 

Before giving a precise meaning to the symplecticity and symmetry of general linear 
methods, we establish the following lemma. 

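Lemma 8.3. For a general linear method $Y_{n+1} = G_h(Y_n)$ we consider two different
finishing procedures $y_n = F_h(Y_n)$ and $\widehat y_n = \widehat F_h(Y_n)$:

[Diagram: the general linear method $G_h$ maps $Y_0 \to Y_1 \to Y_2 \to \cdots$; the starting
procedures $S_h$ and $\widehat S_h$ map $y_0$ and $\widehat y_0$ to $Y_0$; the finishing procedures $F_h$ and
$\widehat F_h$ map each $Y_n$ to $y_n$ and $\widehat y_n$; the underlying one-step methods $\Phi_h$ and $\widehat\Phi_h$
map $y_0 \to y_1 \to y_2 \to \cdots$ and $\widehat y_0 \to \widehat y_1 \to \widehat y_2 \to \cdots$.]

The two corresponding one-step methods $\Phi_h(y)$ and $\widehat\Phi_h(y)$ (given by
Theorem 8.2) are then conjugate to each other, i.e.,

$$\alpha_h^{-1} \circ \widehat\Phi_h \circ \alpha_h = \Phi_h \qquad\text{with}\qquad \alpha_h = \widehat F_h \circ S_h. \tag{8.7}$$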
Proof. The equations involving the underlying one-step methods or the starting
procedures have to be understood in the sense of formal series. By Theorem 8.2 we have
$S_h(y) = e\, y + O(h)$ and also $\widehat S_h(y) = e\, y + O(h)$. It thus follows from $\widehat F_h \circ \widehat S_h = \mathrm{Id}$
that $\alpha_h(y)$ is $O(h)$-close to the identity and therefore invertible. □

The transformation $\alpha_h$ in the phase space is $O(h)$-close to the identity. The
relation $\alpha_h^{-1} \circ \widehat\Phi_h^{\,n} \circ \alpha_h = \Phi_h^{\,n}$, which is a consequence of (8.7), therefore implies
that the numerical solutions of $\Phi_h$ and $\widehat\Phi_h$ remain $O(h)$-close for all times. This
means that the long-time behaviour of both methods is exactly the same.

Consequently, for a given general linear method Gh, it is sufficient to require 
symplecticity or symmetry for one finishing procedure only. 

Definition 8.4 (Symplecticity). A general linear method $G_h$ is called symplectic
if there exists a finishing procedure $F_h$ such that the underlying one-step method
$\Phi_h$ of Theorem 8.2 is symplectic, i.e., $\Phi_h'(y)^T J\, \Phi_h'(y) = J$ in the sense of formal
series.

The study of symplecticity of linear multistep methods (Sect. XV.4.1) was rather 
disappointing. We could not find one linear multistep method whose underlying one- 
step method is symplectic. For general linear methods, some necessary conditions 
for the symplecticity of the underlying one-step method are known which are hard 
to satisfy (Hairer & Leone 1998). For the moment, no symplectic general linear 
method (not equivalent to a one-step method) is known, and we conjecture that such 
a method does not exist, even in the class of partitioned general linear methods 
(treating the p and q variables by different methods). 

After the disappointing non-existence conjecture of symplectic multi-value methods,
we turn our attention to symmetric methods. We know from the previous chapters
that for reversible Hamiltonian systems, the long time behaviour of symmetric
one-step methods can be as good as that for symplectic methods. There are several
definitions of symmetric general linear methods in the literature. However, they
are either tailored to very special situations (e.g., Hairer, Nørsett & Wanner 1993),
or they do not allow the proof of results that are expected to hold for symmetric
methods.

Definition 8.5 (Symmetry). A general linear method $G_h$ is called symmetric if
there exists a finishing procedure $F_h$ such that the underlying one-step method $\Phi_h$
of Theorem 8.2 is symmetric, i.e., $\Phi_{-h}(y) = \Phi_h^{-1}(y)$ in the sense of formal series.

Example 8.6. Consider the trapezoidal method in the role of $G_h$ and the explicit
Euler method with step size $-\gamma h$ as finishing procedure:

$$G_h:\quad Y_{n+1} = Y_n + \frac{h}{2}\bigl(f(Y_n) + f(Y_{n+1})\bigr)$$
$$F_h:\quad y_{n+1} = Y_{n+1} - \gamma h\, f(Y_{n+1})$$

The corresponding starting procedure and underlying one-step methods are then the
implicit Euler method and the following 2-stage Runge-Kutta method:

$$S_h:\quad Y_n = y_n + \gamma h\, f(Y_n)$$

$$\Phi_h:\quad \begin{array}{c|cc} \gamma & \gamma & \\ 1 + \gamma & 1/2 + \gamma & 1/2 \\ \hline & 1/2 + \gamma & 1/2 - \gamma \end{array}$$

The method $\Phi_h$ is symmetric only for $\gamma = 0$, for $\gamma = 1/2$, and for $\gamma = -1/2$.
This example demonstrates that the symmetry of the underlying one-step method
strongly depends on the finishing procedure.

On the other hand, this example shows that the 2-stage Runge-Kutta method
is symmetric in the sense of Definition 8.5 for all $\gamma$ (because it is conjugate to the
trapezoidal rule). It is not symmetric according to the definition of Chap. V.
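
The dependence on $\gamma$ in Example 8.6 can be observed numerically by evaluating the symmetry defect $\Phi_{-h}(\Phi_h(y)) - y$. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the scalar test problem $\dot y = -y^2$ and the values $h = 0.1$, $y = 1$ are arbitrary choices, and the implicit equations are solved by plain fixed-point iteration, which is adequate only because $|h|$ is small.

```python
def fixed_point(g, x, iterations=200):
    """Solve x = g(x) by plain fixed-point iteration (small |h| assumed)."""
    for _ in range(iterations):
        x = g(x)
    return x


def phi(h, y, gamma, f):
    """Underlying one-step method Phi_h = F_h o G_h o S_h of Example 8.6."""
    # S_h: implicit Euler step of size gamma * h
    Y0 = fixed_point(lambda Y: y + gamma * h * f(Y), y)
    # G_h: trapezoidal rule
    Y1 = fixed_point(lambda Y: Y0 + 0.5 * h * (f(Y0) + f(Y)), Y0)
    # F_h: explicit Euler step of size -gamma * h
    return Y1 - gamma * h * f(Y1)


def symmetry_defect(h, y, gamma, f):
    """Phi_{-h}(Phi_h(y)) - y, which vanishes for a symmetric method."""
    return phi(-h, phi(h, y, gamma, f), gamma, f) - y


def f(y):
    return -y * y                      # nonlinear scalar test problem


for gamma in (0.0, 0.5, -0.5, 0.25):
    print(gamma, symmetry_defect(0.1, 1.0, gamma, f))
```

A nonlinear problem is needed here: for $\dot y = \lambda y$ the starting and finishing procedures cancel exactly, so that $\Phi_h$ reproduces the trapezoidal rule for every $\gamma$ and the defect vanishes.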

A Useful Criterion for Symmetry. Definition 8.5 is rather impractical for verifying
the symmetry of a given general linear method. We give here algebraic conditions
for the coefficients $A, B, C, D$ of a general linear method (8.1), which are sufficient
for the method to be symmetric. We assume that the finishing procedure $y_{n+1} =
F_h(Y_{n+1})$ is given by

$$y_{n+1} = \widetilde D\, Y_{n+1} + h \widetilde B f(V_{n+1}), \qquad V_{n+1} = \widetilde C\, Y_{n+1} + h \widetilde A f(V_{n+1}), \tag{8.8}$$

in complete analogy to method (8.1).

Lemma 8.7 (Adjoint Method). Let $Y_{n+1} = G_h(Y_n)$ be the general linear method
given by $A, B, C, D$ (with invertible $D$), $y_{n+1} = F_h(Y_{n+1})$ the finishing procedure
given by $\widetilde A, \widetilde B, \widetilde C, \widetilde D$, and denote by $\Phi_h$ its underlying one-step method. Then, the
underlying one-step method of

$$G_h^*:\quad A^* = C D^{-1} B - A, \quad B^* = D^{-1} B, \quad C^* = C D^{-1}, \quad D^* = D^{-1}$$
$$F_h^*:\quad \widetilde A^* = -\widetilde A, \quad \widetilde B^* = -\widetilde B, \quad \widetilde C^* = \widetilde C, \quad \widetilde D^* = \widetilde D$$

is the adjoint method of $\Phi_h$.

Proof. Substituting $h \leftrightarrow -h$ and $Y_{n+1} \leftrightarrow Y_n$ in (8.1) yields

$$U_{n+1} = C Y_{n+1} - h A f(U_{n+1}), \qquad Y_n = D Y_{n+1} - h B f(U_{n+1}).$$

Extracting $Y_{n+1}$ from the second relation and inserting it into the first gives

$$U_{n+1} = C D^{-1} Y_n + h\,(C D^{-1} B - A)\, f(U_{n+1})$$
$$Y_{n+1} = D^{-1} Y_n + h\, D^{-1} B f(U_{n+1}),$$

which is exactly method $G_h^*$. The same replacements in the finishing procedure

$$V_{n+1} = \widetilde C\, Y_n - h \widetilde A f(V_{n+1}), \qquad y_n = \widetilde D\, Y_n - h \widetilde B f(V_{n+1})$$

and in the diagram of Theorem 8.2 prove the statement. □
Theorem 8.8. If there exist an invertible matrix $Q$ (satisfying $Qe = e$ with $e$ given
by (8.2)) and a permutation matrix $P$ such that

$$P^{-1} A P = C D^{-1} B - A, \qquad Q^{-1} B P = D^{-1} B, \qquad P^{-1} C Q = C D^{-1}, \qquad Q^{-1} D Q = D^{-1}, \tag{8.9}$$

then the general linear method (8.1) is symmetric.

Proof. We consider the change of variables $Y_n \mapsto Q Y_n$, $U_n \mapsto P U_n$ in the method
(8.1). Since $P$ is a permutation matrix, we have $f(PU) = P f(U)$, so that the
method becomes

$$P U_{n+1} = C Q Y_n + h A P f(U_{n+1}), \qquad Q Y_{n+1} = D Q Y_n + h B P f(U_{n+1}).$$

The assumption (8.9) implies that this method is the same as the adjoint method
of Lemma 8.7. Taking a finishing procedure $F_h$ in such a way that $y_{n+1} =
F_h(Q Y_{n+1})$ is identical to the finishing procedure $y_{n+1} = F_h^*(Y_{n+1})$ of the adjoint
method (i.e., $\widetilde B = 0$ and $\widetilde D$ such that $\widetilde D Q = \widetilde D$), we obtain $\Phi_h^* = \Phi_h$. This
proves the statement. □

The sufficient condition of Theorem 8.8 reduces to the known criteria for classical
methods. Let us give some examples:

• For Runge-Kutta methods we have $D = (1)$, $B = b^T$ a row vector, and $C = \mathbb{1}$.
With $Q = (1)$ and $P$ the permutation matrix that reverses the order of the elements
of a vector, we get

$$b^T P = b^T, \qquad P A P = \mathbb{1} b^T - A,$$

which is the same as (V.2.4).

• Multistep methods in their form as general linear methods (Sect. XV.8) satisfy the
condition of Theorem 8.8 if

$$\alpha_i = -\alpha_{k-i}, \qquad \beta_i = \beta_{k-i}. \tag{8.10}$$

One can take for $P$ and $Q$ the permutation matrices (reversing the order of the
elements of a vector) of dimension $k + 1$ and $k$, respectively.
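
The Runge-Kutta case of the criterion, $b^T P = b^T$ and $P A P = \mathbb{1} b^T - A$ with $P$ reversing the order of the stages, is easy to check mechanically. The following sketch does this in exact arithmetic; the function name `satisfies_rk_criterion` is hypothetical, and the test tableaux (trapezoidal rule, implicit midpoint rule, explicit Euler method, and the 2-stage method of Example 8.6 with $\gamma = 1/4$) are chosen here for illustration.

```python
from fractions import Fraction as Fr


def satisfies_rk_criterion(A, b):
    """Check b^T P = b^T and P A P = 1 b^T - A, with P reversing the stages."""
    s = len(b)
    # (b^T P)_j = b_{s-1-j}
    if any(b[s - 1 - j] != b[j] for j in range(s)):
        return False
    # (P A P)_{ij} = a_{s-1-i, s-1-j} and (1 b^T - A)_{ij} = b_j - a_{ij}
    return all(A[s - 1 - i][s - 1 - j] == b[j] - A[i][j]
               for i in range(s) for j in range(s))


half = Fr(1, 2)
trapezoidal = ([[0, 0], [half, half]], [half, half])
midpoint = ([[half]], [1])
explicit_euler = ([[0]], [1])
g = Fr(1, 4)
example_8_6 = ([[g, 0], [half + g, half]], [half + g, half - g])

for name, (A, b) in [("trapezoidal", trapezoidal), ("midpoint", midpoint),
                     ("explicit Euler", explicit_euler),
                     ("Example 8.6", example_8_6)]:
    print(name, satisfies_rk_criterion(A, b))
```

Since the condition of Theorem 8.8 is only sufficient, a negative answer for this particular choice of $P$ and $Q$ does not by itself exclude symmetry in the sense of Definition 8.5.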


XV.8.3 Growth Parameters 

For a rigorous study of the long-time behaviour of general linear methods it is not
sufficient to investigate smooth numerical solutions. One has to get bounds on the
parasitic solution components, which are present when one considers the general
linear method without any starting and finishing procedure. This is certainly difficult,
as it is for multistep methods (1.1). We restrict here our analysis to the linearized
parasitic modified equation.

The eigenvalues of the matrix $D$ in (8.1) will play the role of the zeros of $\rho(\zeta)$ in
(1.1). We denote them by $\zeta_1 = 1$ and $\zeta_2, \ldots, \zeta_k$, and we assume that they are simple
and of modulus one. Motivated by the analysis for multistep methods we write the
approximations $Y_n$ as

$$Y_n = Y(nh) + \sum_{\ell \in \mathcal{I}^*} \zeta_\ell^n Z_\ell(nh) \tag{8.11}$$

with smooth functions $Y(t)$ and $Z_\ell(t)$. The index set $\mathcal{I}^*$ has the same meaning as in
Sect. XV.3.2. We insert (8.11) into (8.1) and compare coefficients of $\zeta_\ell^n$. This gives
with $t = nh$

$$Y(t + h) = D Y(t) + h B f(C Y(t)) + O(h^2)$$
$$\zeta_\ell Z_\ell(t + h) = D Z_\ell(t) + h B f'(C Y(t))\, C Z_\ell(t) + O(h^2). \tag{8.12}$$

To get an amenable form of the modified equations we write the vectors $Y(t)$, $Z_\ell(t)$
in the basis of eigenvectors of $D$, which we denote by $w_1 = e$ and $w_2, \ldots, w_k$:

$$Y(t) = \sum_{j=1}^{k} y_j(t)\, w_j, \qquad Z_\ell(t) = \sum_{j=1}^{k} z_{\ell,j}(t)\, w_j.$$

Inserted into (8.12) and expanded into a series of $h$ yields

$$\dot y_1 = f(y_1) + O(h),$$

and algebraic relations of the form $y_j(t) = O(h)$ for $j \ge 2$. Similarly, we get
algebraic relations $z_{\ell,j}(t) = O(h)$ if $j \ne \ell$, and the function $z_\ell(t) := z_{\ell,\ell}(t)$
satisfies

$$\dot z_\ell = \mu_\ell f'(y_1)\, z_\ell + O(h) \qquad\text{with}\qquad \mu_\ell = \zeta_\ell^{-1}\, w_\ell^* B C w_\ell, \tag{8.13}$$

where $w_\ell^*$ is the left eigenvector of $D$ corresponding to the eigenvalue $\zeta_\ell$. This is in
perfect analogy to the computations of Sect. XV.5.1.
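
As a worked instance of (8.13), the sketch below evaluates the growth parameters of the explicit midpoint rule $y_{n+2} = y_n + 2h f(y_{n+1})$ written as a general linear method. Several points are assumptions made for this illustration and do not come from the text: the function name `growth_parameter`, the matrices $B$, $C$, $D$ for $Y_n = (y_{n+1}, y_n)^T$, and the normalization $w_\ell^* w_\ell = 1$ of the left eigenvector, which is enforced by dividing by $w_\ell^* w_\ell$.

```python
def growth_parameter(zeta, w_left, w_right, B, C):
    """mu = zeta^{-1} (w* B)(C w) / (w* w) for k values and s stages.

    B is a k x s matrix, C is an s x k matrix; w_left and w_right are a left
    and a right eigenvector of D for the eigenvalue zeta.
    """
    k, s = len(w_right), len(C)
    norm = sum(w_left[i] * w_right[i] for i in range(k))
    wB = [sum(w_left[i] * B[i][j] for i in range(k)) for j in range(s)]
    Cw = [sum(C[j][i] * w_right[i] for i in range(k)) for j in range(s)]
    return sum(wB[j] * Cw[j] for j in range(s)) / (zeta * norm)


# Explicit midpoint rule: D = [[0, 1], [1, 0]] has the eigenvalues +1 and -1.
# D is symmetric, so left and right eigenvectors coincide.
B = [[2], [0]]
C = [[1, 0]]
mu_principal = growth_parameter(1, [1, 1], [1, 1], B, C)
mu_parasitic = growth_parameter(-1, [1, -1], [1, -1], B, C)
print(mu_principal, mu_parasitic)
```

The value obtained for $\zeta = -1$ agrees with the classical expression $\sigma(\zeta)/(\zeta \rho'(\zeta))$ for the growth parameters of a linear multistep method, here with $\rho(\zeta) = \zeta^2 - 1$ and $\sigma(\zeta) = 2\zeta$.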

This analysis can be extended straightforwardly to partitioned general linear
methods, where different methods are applied to the components $y$ and $v$ of a
partitioned differential equation. Unfortunately, we do not know of any results that
would extend those of Sect. XV.6 to general linear methods.


XV.9 Exercises 

1. Let $\zeta_1(z)$ be the principal root of the characteristic equation $\rho(\zeta) - z\sigma(\zeta) = 0$.
Prove that for irreducible multistep methods the condition $\zeta_1(-z)\,\zeta_1(z) = 1$ (in
a neighbourhood of $z = 0$) is equivalent to the symmetry of the method.

2. (Lambert & Watson 1976). Prove that stable, symmetric linear multistep methods
(1.8) for second order differential equations, for which the polynomial $\rho(\zeta)$
has only simple zeros (with the exception of $\zeta = 1$), have a non-vanishing interval
of periodicity, i.e., the roots $\zeta_i(z)$ of $\rho(\zeta) - z^2\sigma(\zeta) = 0$ satisfy $|\zeta_i(iy)| = 1$
for sufficiently small real $y$.

Hint. Simple roots cannot leave the unit circle under small perturbations of $y$.

3. Consider a symmetric, s-stable multistep method (1.8). If it is irreducible (no
common factors of $\rho(\zeta)$ and $\sigma(\zeta)$), then $k$ is even. Hence $\rho(-1) \ne 0$.

4. Using Theorem XII.3.2, prove that the underlying one-step method of a strictly
stable $r$th order linear multistep method has order $r$.

5. (Dahlquist 1959). Consider the linear problem $\dot y = \lambda y$ and apply a symmetric
linear multistep method (1.1) as in Example 2.2. Prove that for $t = nh$ and
$h \to 0$,

$$\zeta_j(\lambda h)^n \approx \zeta_j^n\, e^{\mu_j \lambda t},$$

where $\mu_j$ is the growth parameter.

6. Consider a general linear method (8.1). If there exist an invertible symmetric
matrix $G$ and a diagonal matrix $\Lambda$ such that

$$\begin{pmatrix} D^T G D - G & D^T G B - C^T \Lambda \\ B^T G D - \Lambda C & B^T G B - A^T \Lambda - \Lambda A \end{pmatrix} = 0, \tag{9.1}$$

then the method is G-symplectic.

Hint. Adapt the proof of Burrage & Butcher for B-stability (see Hairer & Wanner
(1996), page 358).

7. A Runge-Kutta method can be considered as a general linear method with $D =
(1)$, $C = \mathbb{1}$. Prove that the condition (9.1) is equivalent to the symplecticity
condition of Chap. VI.

8. Extend the definition of G-symplecticity to partitioned general linear methods, 
and prove that the condition 


( D T GD - G D t GB - C T A \ 
V B T GD - AC B T GB - A T A - AA) 


(9.2) 


implies that the method is G-symplectic. 

9. Construct general linear methods of order $r > 2$, for which all growth parameters
are positive. Find such methods, which have a smaller degree of implicitness
than symmetric one-step methods of the same order.

10. Write a Maple program that checks the coefficients of Table 7.1. After defining
rho := ρ(z), use the instructions

> sigma := taylor(rho/(log(z)*log(z)), z=1, 8);

> factor(expand(convert(sigma,polynom)));

11. Construct partitioned general linear methods which are symmetric, explicit, of 
high order, and for which the matrices D and D have distinct eigenvalues (with 
the exception of 1). Compared to multistep methods, smaller dimensions of the 
matrices D and D are possible. 
first integrals 5,97,211,375

- long-time near-preservation 413,448 

- quadratic 212,591 
fixed-point iteration 330 
flow 2 

- discrete 3 

- exact 2,49,200 





- isospectral 107 

- numerical 3,49 

- Poisson 261,265 
frequencies 399 

- diophantine 406 
Frobenius norm 132 

G-symplectic 587 
Gauss methods 34,101, 333 

- symmetric 147 

- symplectic 192 
Gaussian wavepacket 296 
Gautschi-type methods 473,477 
general linear methods 609 

- strictly stable 609 

- symmetric 611 

- weakly stable 610 

generalized coordinate partitioning 117 
generating functions 195,197,201,204, 
288, 344 

- for partitioned RK methods 199 

- for Runge-Kutta methods 198 
geometrical numerical algebra 131 
GL(n), general linear group 119 

gl(n), Lie algebra of n × n matrices 119
Grassmann manifold 131,135 
growth parameter 592,614 

Hénon-Heiles problem 380
Hall set 78 

Hamilton’s principle 204,205 
Hamilton-Jacobi equation 200,391 
Hamiltonian 4,181,257 

- elementary 373,384 

- global 186 
- local 185,234

- modified 343,375 

Hamiltonian perturbation theory 389,404 

- basic scheme 405 

- Birkhoff normalization 412 

- KAM theory 410,423 

- perturbation series 406 
Hamiltonian systems 4,180 

- constrained 239,258,289 

- integrable 390 

- non-canonical 237 

- perturbed integrable 404 
harmonic oscillator 

- varying frequency 546 
heavy top 283 
Hénon-Heiles model 15
Hopf algebra 65 

IE 150 


implementation 303,325 
implicit midpoint rule 3, 34,190,192, 
223,270 

- averaged 537 

- symmetry 145 

- symplecticity 190 
impulse method 317,475,550 

- mollified 476 

index reduction 239,241 
inertia ellipsoid 275 
integrability lemma 186 
integrable systems 601 

- Hamiltonian 390 

- reversible 437 
invariant manifold 574 

- attractive 460,574, 610 
invariant torus 397,423 

- long-time near-preservation 422,451 

- of numerical integrator 433,453,467 

- of reversible map 451 

- of symplectic map 431 

- weakly attractive 464 
invariants 2,5,97 

- adiabatic 531,533,545,562 

- linear 99 

- polynomial 105 

- quadratic 101 

- weak 109 
involution 

- first integrals in 391 
irreducible 

- Runge-Kutta methods 220 
isospectral flow 107,403 
isospectral methods 107 
iteration 

- fixed-point 330 

- Newton-type 331 

Jacobi identity 118,255 
KAM theory 

- Hamiltonian 410,423 

- reversible 445 

- reversible near-identity map 451 

- symplectic near-identity map 431 
KAM torus 

- sticky 412 

Kepler problem 8,25,46,111,150,234, 
416, 603 

- perturbed 12,26, 304 
Kepler’s second law 9 
kernel 

- of processing methods 158 
kinetic energy 180,237
Kolmogorov’s iteration 410 
Kolmogorov’s theorem 423 

Lagrange equations 181 

Lagrange multipliers 111,132,237,279 

Lax pair 403 

leap-frog method 7 

left-invariant 289 

Legendre transform 181 

- discrete 206 
Leibniz’ rule 255 
Lennard-Jones potential 19 
Lie algebras 118,286 

Lie bracket 89,118,261 

- differential operators 89 
Lie derivative 87, 348, 362 

- of B-series 370

- of P-series 382 

Lie group methods 123,351 

- symmetric 169 
Lie groups 118 

- quadratic 128 

Lie midpoint rule 127 

Lie operator 261 

Lie-Euler method 126 

Lie-Poisson reduction 289 

Lie-Poisson systems 274,286 

Lie-Trotter splitting 47 

Lindstedt-Poincaré series 406

linear error growth 12,413,414,448, 601 

linear multistep methods 

- weakly stable 575 
linear stability 23 
Liouville lemma 392 
Liouville’s theorem 227 

Lobatto IIIA - IIIB pair 102,192,210, 
247, 352, 386 

Lobatto IIIA methods 34,377 

- symmetric 147 
Lobatto IIIA-IIIB pair 40 
Lobatto IIIB methods 37, 377,449 

- symmetric 147 
Lobatto IIIS 235 
Lobatto quadrature 247 
local coordinates 113 

- existence of numerical solution 167 

- symmetric methods 166 
local error 29 

- of composition methods 150,176 
long-time behaviour 

- symmetric integrators 437,455 

- symplectic integrators 389,455 


long-time energy conservation 366 
Lorenz problem 176 
Lotka-Volterra problem 1,24,175,257, 
270,271,273,340 
low-rank approximation 137 
Lyapunov exponents 131 

Magnus series 121 
manifold of rank k matrices 131 
manifolds 109,114,239,267 

- symmetric methods 161 

- symplectic 258 
Marchuk splitting 47 
matrix commutator 83 
matrix exponential 120 
matrix Lie groups 118 
mechanical systems 555 

- constrained 237,258 
merging product 75 

methods based on local coordinates 166 
methods on manifolds 97, 350 

- symmetric 161 
midpoint rule 123 

- explicit 569,580 

- implicit 3, 34,190,192,223,270 

- Lie 127 

- modified 171 

modified differential equation 337 

- B-series 369 

- constrained Hamiltonian system 352 

- first integrals 351 

- Lie group methods 351 

- Lie-Poisson integrators 354 

- methods on manifolds 350 

- P-series 381 

- perturbed differential equation 466 

- Poisson integrators 347 

- reversible methods 343 

- splitting methods 348 

- symmetric methods 342 

- symplectic methods 343 

- trees 369 

- variable steps 356 
modified equation 

- parasitic 579 

modified Hamiltonian 343, 375,589 

- global 344, 353 
modified midpoint rule 171 
modulated Fourier expansion 496 

- exact solution 486,496 

- Hamiltonian 503 

- multi-frequency 519 

- numerical solution 488,498 
molecular dynamics 18
mollified impulse method 476,554 
momenta 181 

- conjugate 181 

- discrete 206 
moments of inertia 100 
momentum 

- angular 9,98,100,101,173 

- linear 98,173 
momentum conservation 600 
Moser-Veselov algorithm 281 
multi-force methods 478 
multi-value methods 609 

- symmetric 611 
multiple time scales 472,479 
multiple time stepping 316,475 
multirate methods 316 
multistep methods 567 

- backward error analysis 576 

- G-symplectic 587 

- partitioned 572 

- second order equations 569 

- strictly stable 568,573 

- symmetric 568,570 

- symplectic 585 

- variable step sizes 605 
Munthe-Kaas methods 125 

N-body system 13,98

- energy-momentum methods 173 
Newton-type iteration 331 
Noether’s theorem 210 
non-resonant frequencies 406 
non-resonant step size 433,498,511 
Nyström methods 41,69,96,104

- symplectic 194 

O(n), orthogonal group 119 
one-leg methods 587 
one-step method 8,29,187 

- underlying 573,609 
optimal control 235 
order 29 

- of a tree 53,67 

- of symmetric local coordinates 167 

- of symmetric projection 162 
order conditions 

- composition methods 71,75, 80,93,94 

- Crouch-Grossman methods 124 

- Nyström methods 69

- partitioned RK methods 39,69 

- processing methods 159 

- RK methods 29,51,56,58 


- splitting methods 80,92 

- symmetric composition 155 

- symmetrized 177 
ordered subtrees 60 
ordered trees 60 
oriented area 183 
oriented free trees 388 
orthogonal matrices 118 
orthogonality constraints 131 
oscillatory differential equations 21,471, 

531 

oscillatory energy 22,479,484,505,510, 
517,524 

outer solar system 8,13,112 

P-series 68,214 

- symplectic 217,219 
parametrization 

- tangent space 117 
partial differential equations 

- linear 262 

partitioned Runge-Kutta methods 38, 
102,148 

- diagonally implicit 149 

- symmetric 148 

- symplectic 193,208,231 
partitioned systems 3,66 
pendulum 4,5,110,181,185,188,367, 

396,593 

- double 233 

- spherical 238,254 

- stiff spring 526 
perturbation series 

- averaging 459 

- Hamiltonian 406 

- reversible 444 
perturbation theory 

- dissipative 455 

- Hamiltonian 389,404 

- reversible 437 
phase space 2 
Poincaré cut 16
Poisson 

- bracket 255,257 

- flow 261,265 

- integrators 270,272, 300 

- maps 268 

- systems 254,257,297 
Poisson structures 265 

- canonical 254 

- general 256 

polar decomposition 134 
polynomial invariants 105 
potential energy 181,237
precession 12,26 
processing 

- of composition methods 158 

- order conditions 159 
projection 

- symplectic 259 
projection methods 109,351 

- standard 110 

- Stiefel manifolds 133 

- symmetric 161 

- symmetric non-reversible 166 
pseudo-inverse of a matrix 116 
pseudo-symplectic methods 436 

QR algorithm 108 
QR decomposition 134 
quadratic invariants 101 

- near conservation 225 
quadratic Lie groups 128 
quantum dynamics 293 
quasi-periodic flow 399 
quaternions 281 

r-RESPA method 318,475 
Radau methods 34 
rank k matrix manifold 131 
RATTLE 245,280, 352, 388 
resonance 

- numerical 482,485, 602 
resonance module 517 
reversibility 239,311 

- of symmetric local coordinates 168 

- of symmetric projection 163 
reversible maps 143,144 
reversible methods 343 
reversible perturbation theory 437 

- basic scheme 443 

- Birkhoff normalization 447 

- KAM theory 445 

- perturbation series 444 
reversible systems 143 

- integrable 437 

- perturbed integrable 442 
reversible vector fields 144 
ρ-compatibility condition 145
ρ-reversible 143

- maps 144 

- vector field 143 
Riccati equation 134 

rigid body 99,163,274,280,288,441, 
449 

- Hamiltonian theory 278 


Rodrigues formula 141 
rooted trees 53 
rounding error 322 
Runge-Kutta methods 27, 28, 101, 311, 325, 333

- ρ-compatibility 145 

- additive 50 

- adjoint method 147 

- implicit 29 

- irreducible 220 

- partitioned 38,148 

- symmetric 146 

- symplectic 191,231 
Runge-Lenz-Pauli vector 26 

s-stable 594 
Schrödinger equation 293 

- nonlinear 273 
semiclassical dynamics 293 
separable partitioned systems 231 
SHAKE 245 

simplifying assumptions 96 

sine function 473,481 

singular value decomposition 133 

SL(n), special linear group 119,130 

sl(n), special linear Lie algebra 119 

small denominators 406 

SO(n), special orthogonal group 119 

so(n), skew-symmetric matrices 119 

spherical pendulum 238,254 

splitting 

- fast-slow 317 

- Lie-Trotter 47 

- Marchuk 47 

- of ordered tree 370 

- Strang 47,230 

splitting methods 47, 48, 91, 193, 252, 270, 284, 298, 348

- ρ-compatibility 145 

- negative steps 82 

- of higher order 82 

- order conditions 80 
Sp(n), symplectic group 119 
sp(n), symplectic Lie algebra 119 
stability 

- linear 23 

- long-term 592 
stability function 194 
starting approximations 326 

- order 327 
step size control 

- integrating, reversible 314, 357, 449, 538





- proportional, reversible 310, 313, 356, 449

- standard 303 

- structure-preserving 310 
step size function 308, 311 
Stiefel manifold 131 
Störmer-Verlet scheme 7, 9, 39, 48, 189, 270, 318, 349, 386, 472, 586

- as classical limit 300 

- as composition method 148 

- as Nyström method 41 

- as processing method 159 

- as splitting method 48 

- as variational integrator 208 

- energy conservation 368,513 

- linear error growth 414 

- symmetry 42,145 

- symplecticity 48,190 

- variable step size 308, 309, 312, 313, 315

Strang splitting 47,230, 315, 348 
structure constants 286 
submanifold 109 

- symplectic 259 
subtrees 

- ordered 60 
summation 

- compensated 323 
superconvergence 32,37,250 
Suzuki’s fractals 45,46,153 
switching lemma 76 

symmetric collocation methods 146,176 
symmetric composition 94 

- of first order methods 150 

- of symmetric methods 150,154 
symmetric composition methods 149 

- of order 6 156 

- of order 8 157 

- of order 10 158 

symmetric Lie group methods 169 
symmetric methods 3, 42, 143, 144, 342, 612

- explicit 148 

- symmetric composition 154 
symmetric methods on manifolds 161 
symmetric projection 161 

- existence of numerical solution 162 

- non-reversible 166 

symmetric Runge-Kutta methods 146, 176

symmetric splitting method 177 
symmetrized order conditions 177 
symmetry 289,311,613 


- of Gauss methods 147 

- of Lobatto 147 

- of symmetric local coordinates 168 
symmetry coefficient 57, 67,72 
symplectic 183,196,241 

- B-series 217 

- maps 268 

- P-series 217 

- projection 259 

- submanifold 258,295 

symplectic Euler method 4, 48, 189, 193, 230, 242, 270, 340, 346, 349, 383

- as splitting method 48 

- energy conservation 368 

- variable step size 307 
symplectic methods 187,612 

- as variational integrators 207 

- based on generating functions 203 

- irreducible 222 

- Nyström methods 194 

- partitioned Runge-Kutta methods 193, 208

- Runge-Kutta methods 191 

- variable step size 306 
symplectic submanifold 259 
symplecticity 244,585 

Takens chaos 563 
tangent bundle 239 
tangent space 114,120 

- parametrization 117,134 
θ-method 147 

- adjoint 148 
three-body problem 321,390 
time transformation 306, 356 
time-reversible methods 144 
Toda flow 109 

Toda lattice 402,414,440,449 
total differential 186,196 
total energy 9, 18, 21, 98, 479, 484, 510, 524, 600
transformations 

- adiabatic 531,532 

- averaging 458 

- canonical 186 

- reversibility preserving 438 

- symplectic 182,183,196,241 
trapezoidal rule 28,194,223, 312 
trees 51,217,369 

- bi-coloured 66 

- equivalence class 384 

- ordered 60 

- oriented free 388 





- rooted 53 
∞-trees 72 

trigonometric methods 473 
triple jump 44,46,153 
true anomaly 9 
two-body problem 9,25 
two-force methods 478 

underlying one-step method 573,609 

Van der Pol’s equation 455 
variational integrators 204 
variational problem 205,237 
variational splitting 271 


vector fields 2 

- divergence-free 227 

- reversible 143,144 

Verlet method 7, 39, 48, 189, 270, 318, 472, 513

- adaptive 309 
Verlet-I method 318,475 

volume preservation 105,113,227,231 
volume-preserving integrators 228 

weak invariants 109 
work-precision diagrams 150, 153, 156, 157, 334, 336, 482, 604, 605, 608
Vk-transformation 235 



